Wednesday, April 10, 2013

Basic Regular Expression Patterns VII


1. Problem : Check if the given string is a valid IP address. IPv4 addresses are written in dot-decimal notation: 4 decimal parts separated by dots, where each part ranges from 0 to 255.
  
   Answer :
   var patt = /^((2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})\.){3}(2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})$/;
   patt.test("200.10.0.1");  // true
   patt.test("100.100.0.1"); // true
   patt.test("340.100.0.1"); // false

  
   Explanation :
   First, look at the ((2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})\.){3} part. Once it is clear, the remaining part is easy to follow.
   ((2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})\.){3} means that a pattern is repeated 3 times. In an IP address, "{digits}." (digits followed by a dot) occurs 3 times, and then one final sequence of digits appears. For example, in "255.255.255.123", "255." occurs three times. Let's break up the RegExp.
   a. 2[0-4][0-9] matches a number from 200 to 249
   b. | means OR
   c. 25[0-5] matches a number from 250 through 255. Up to this point, the numbers 200 to 255 are covered
   d. [1]?[0-9]?[0-9]{1} matches a number from 0 to 199 (it also accepts a two-digit form with a leading zero, such as "07"). So up to this point, all numbers from 0 to 255 are matched and captured. This RegExp is discussed in Basic Regular Expression Patterns IV
   e. \. means a literal dot
   f. So, together, ((2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})\.){3} means a number from 0 to 255 followed by a dot, three times in a row
   g. The last (2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1}) means one more number from 0 to 255 appears at the end.
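
   The breakdown above can be checked piece by piece. The following is only a minimal sketch, not part of the original answer: it builds the same expression from a string holding the sub-pattern for one part, so each alternative can be tested on its own. The names octet, octetOnly and ipPattern are made up for this illustration.

```javascript
// Sub-pattern for one part of the address, copied from steps a to d.
var octet = "(2[0-4][0-9]|25[0-5]|[1]?[0-9]?[0-9]{1})";

// A pattern for a single part, to try the alternatives on their own.
var octetOnly = new RegExp("^" + octet + "$");

// Steps e to g: "part followed by a dot" three times, then one last part.
// The dot is written as "\\." because the backslash must be escaped in a string.
var ipPattern = new RegExp("^(" + octet + "\\.){3}" + octet + "$");

console.log(octetOnly.test("249")); // true  (2[0-4][0-9])
console.log(octetOnly.test("255")); // true  (25[0-5])
console.log(octetOnly.test("199")); // true  ([1]?[0-9]?[0-9]{1})
console.log(octetOnly.test("256")); // false (no alternative matches)

console.log(ipPattern.test("255.255.255.123")); // true
console.log(ipPattern.test("255.255.255"));     // false (only three parts)
console.log(ipPattern.test("340.100.0.1"));     // false (340 is above 255)
```

   Building the pattern from a string is just a convenience here to avoid typing the sub-pattern twice; the literal form in the answer is equivalent.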

   Google also provides a very handy tool for generating RegEx for validating any specific IP address range.

2. Problem : How to check for doubled words and remove them from a given string in JavaScript
   Suppose we have the following string, which has some words repeated inside it.
    

 var str = "We love love our our God God";
  

  The problem is how to remove the duplicated words.

   Answer :
   var patt = /\b(\w+)\s+\1/g;
   var str = "We love love our our God God";
   var c = str.replace(patt, "$1");
   console.log(c);


   Output :
   We love our God

   Explanation :
   a. \b means a word boundary. This makes the match start at the beginning of a word, so "love" is picked up as a whole word.
   b. (\w+) captures a sequence of one or more word characters (letters, digits or underscore)
   c. \s+ means one or more whitespace characters come next
   d. \1 is a backreference to the first capture, so it matches the same text that (\w+) has just matched. This is what detects "love love", "our our", "the the", etc.
   e. /g means we want a global search. Otherwise the search would stop after it found the first doubled word, "love love".
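
   The effect of the g flag from step e can be seen by running the pattern from the answer with and without it. This is only a small sketch; the variable names firstOnly, all and found are made up for the illustration.

```javascript
var str = "We love love our our God God";

// Without the g flag, replace() stops after the first doubled word.
var firstOnly = str.replace(/\b(\w+)\s+\1/, "$1");
console.log(firstOnly); // We love our our God God

// With the g flag, every doubled word is replaced.
var all = str.replace(/\b(\w+)\s+\1/g, "$1");
console.log(all); // We love our God

// match() with the g flag returns an array of the doubled words found:
// "love love", "our our" and "God God".
var found = str.match(/\b(\w+)\s+\1/g);
console.log(found);
```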

   When we use the JavaScript replace function, we also need to provide the new string that takes the place of the matched text. By specifying "$1" as the second argument, we tell replace to use the first capture, i.e. the text matched by (\w+), as the replacement for each match. So when the RegExp engine finds "love love", "$1" refers to "love", and the doubled word is replaced with a single "love". Because of the global flag /g, the engine then continues searching and finds "our our"; this time "$1" refers to "our", so "our our" is replaced with a single "our". The same happens for "God God".
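
   Two edge cases are worth trying with this pattern. It has no word boundary after \1, so "the theme" is treated as a doubled "the", and it consumes only one repeat per match, so a word written three times still leaves a pair behind. The sketch below compares it with a possible stricter variant; the variant and the names original and stricter are assumptions made for this illustration, not part of the original answer.

```javascript
// Pattern from the answer above.
var original = /\b(\w+)\s+\1/g;

// Hypothetical stricter variant:
//   \1\b       the repeated word must end at a word boundary
//   (?: ... )+ one or more repeats are consumed in a single match
var stricter = /\b(\w+)(?:\s+\1\b)+/g;

console.log("the theme".replace(original, "$1")); // theme
console.log("the theme".replace(stricter, "$1")); // the theme

console.log("love love love".replace(original, "$1")); // love love
console.log("love love love".replace(stricter, "$1")); // love

// The example string from the problem gives the same result as before.
console.log("We love love our our God God".replace(stricter, "$1")); // We love our God
```

   Both patterns are case sensitive, so "The the" is not treated as a doubled word by either of them.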
