Tuesday, May 28, 2013

String trimming with C

Trimming means removing whitespace from the beginning and end of a string. First, we will write a function that removes all trailing whitespace from a given string.

The logic is:

i) Scan the string from the right end in a loop, stopping as soon as a non-whitespace character is found at the end (there is nothing more to trim).
ii) If a whitespace is found at location n and location n-1 holds a non-whitespace character, put a '\0' (NULL) character there and stop.
iii) Otherwise, replace each whitespace with a NULL character and keep scanning.

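#include <stdio.h>
#include <string.h>

void right_trim(char *str);

int main(void)
{
  char str[35] = "  India  is my   country  " ;
 
  /* TRIM end whitespaces */
  right_trim(str); 
 
  /* Print after right trimming; the brackets show where the string ends */
  printf("[%s]\n", str);
  return 0;
}


/* Right Trim function definition (only ' ' counts as whitespace here) */
void right_trim(char *str)
{
  int i,len;
  len = (int)strlen(str);
  i = len - 1;
 
  /* Make sure STRING is not empty */
  if( len > 0 )
  {

       /* This Loop scans the string from end; i >= 0 stops it
          from reading before the start of the array */
       while( i >= 0 )
       {
         /* A non-space at the end: nothing (more) to trim */
         if( str[i] != ' ' )
         {
           break;
         }

         /* If a space is found at nth position
            and n-1 position holds a character.
            i > 0 is tested first so str[-1] is never read */
         if( i > 0 && str[i] == ' ' && str[i-1] != ' ' )
         {
           str[i] = 0;

           /* BREAK THE LOOP - Important*/
           break;
         }
         else if( str[i] == ' ' )
         {
            str[i] = 0;
         }
               
         i--;
       }
   
   }
}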


The right_trim() function takes a pointer to a string, which means it modifies the original string, not a copy of it. Its main loop scans the string from the right end and places a NULL character at a position when one of the following conditions is satisfied:

a. If a whitespace is found and the character before it is a non-whitespace, we can assume it marks the end of the last word, so we put a NULL character at that position.

Suppose we have the string "PHP  " (2 spaces at the end), with 'P' at index 0 and spaces at indices 3 and 4. The loop scans the string from the end (index 4). In the first pass only the else condition str[i] == ' ' holds, because index 3 is also a space, so a NULL is placed at position 4. In the second pass the first condition holds (a space at index 3 preceded by the non-space 'P' at index 2), so a NULL character is placed at position 3, making the string "PHP". At this point we must break out of the loop.

Take another example, "Nice PHP  " (2 spaces at the end). After the loop puts '\0' at index 8 (a space preceded by 'P'), if we did not break out of the loop, it would keep moving left and find the same pattern at index 4 (a space preceded by 'e'). It would put a '\0' there too, making the string "Nice", which is not what we want.

b. The else branch's condition str[i] == ' ' is just as important. It clears every space that is preceded by another space, which is what lets us right-trim an all-whitespace string like "    ": a '\0' is placed at each of its positions.
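Steps a and b place a '\0' at each trailing space in turn. A shorter variant, which is not the author's code, only needs to find the last non-whitespace character and put a single '\0' right after it. The sketch below uses the hypothetical name right_trim_ws and assumes isspace() from <ctype.h> as the definition of whitespace, so tabs and newlines are trimmed as well as spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not the function from this post: removes
   trailing whitespace as defined by isspace() (space, tab, newline,
   carriage return, vertical tab, form feed). */
void right_trim_ws(char *str)
{
    size_t len;

    assert(str != NULL);
    len = strlen(str);

    /* Walk left while the last remaining character is whitespace.
       The cast to unsigned char keeps isspace() well defined for
       characters with negative char values. */
    while (len > 0 && isspace((unsigned char)str[len - 1]))
        len--;

    /* One '\0' right after the last non-whitespace character is
       enough; for an all-whitespace string it lands at index 0. */
    str[len] = '\0';
}

/* Usage: the argument must be a modifiable array, for example
     char buf[] = "Nice PHP \t\n";
     right_trim_ws(buf);       buf now holds "Nice PHP"
   Passing a string literal directly would be undefined behavior,
   because string literals must not be modified. */
```

Because the function works through a pointer, like right_trim(), the caller's array is changed in place.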

Now let's see how to left-trim a string like "   NICE PHP   ", which has 3 spaces at both the front and the end. Here is the logic:

i) Start scanning the string from the beginning. If the first character is not a whitespace, there is nothing to trim, so stop.
ii) If a position holds a whitespace (' ') and the next character is a non-whitespace, break the loop. That next character is the valid starting point of the string, so we copy from that position into another array or into the same array.
iii) Otherwise, keep putting 0 in every position that holds a whitespace.

Step (iii) handles all-whitespace strings like "  " (2 spaces). Here is the implementation:
   
#include <stdio.h>
#include <string.h>

void left_trim(char *str);

int main(void)
{

  char str[35] = "  NICE PHP  " ;
 
  /* TRIM beginning whitespaces */
  left_trim(str); 
 
  /* Print after LEFT trimming; the brackets show where the string starts */
  printf("[%s]\n", str);
  return 0;
}


/* LEFT Trim function definition (only ' ' counts as whitespace here) */
void left_trim(char *str)
{
  int i,len, copy_flag=0;
  len = (int)strlen(str);
  i = 0;
 
  /* Make sure STRING is not empty */
  if( len > 0  )
  {

       /* This Loop scans the string from beginning */
       while(str[i])
       {
         /* A non-space here means there are no leading spaces
            (this can only happen at i == 0): nothing to trim */
         if( str[i] != ' ' )
         {
           break;
         }

         /* If a space is found at nth position
            and n+1 position holds a character
*/
         if( i < len-1 && str[i] == ' ' && str[i+1] != ' ' )
         {

           /* Set the FLAG to denote that Shifting/Copying is required */
           copy_flag = 1; 

           /* BREAK THE LOOP - Important */
           break;
         }
         else if( str[i] == ' ' )
         {
            str[i] = 0;
         }
               
         i++;
       }
   
   }
   

   /* Left Shifting is required */
   if( copy_flag )
   {
      /* i+1 holds valid start of the string as
         prior to that, all positions hold whitespace (or '\0')
      */
      int fpos = i+1;
      int target_pos = 0;
     
      /* start shifting the string towards left */
      while( str[fpos] )
      {
        /* Write/Shift */
        str[target_pos] = str[fpos];

        target_pos++;
        fpos++;
      }
     
      /* Denote new ending */
      str[target_pos] = 0;

   }

}


The while(str[i]) loop determines where the first non-whitespace character appears within the string. If such a character is found, we break out of the loop with i holding the position of the whitespace just before it, and a flag is set to denote that the array values need to be shifted to the left. Then we shift the array to the left. If no non-whitespace character is found, '\0' is inserted at every position, which does no harm. For example, take the string "  NICE PHP  " (with 2 spaces at the beginning and end). After the 1st pass of the loop the string holds "\0 NICE PHP  ", because the first position gets a '\0' from this code block:

else if( str[i] == ' ' )
{
  str[i] = 0;
}


This block runs when i is 0. It does no harm, because the rest of the string, "NICE PHP  ", is then shifted to the left and overwrites that NULL character.
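The flag-and-shift idea can also be written with the standard library: locate the first non-whitespace character, then move everything from there (including the terminating '\0') to the start of the array in one call. The sketch below is a swapped-in variant, not the author's code; the name left_trim_ws is hypothetical, and it assumes isspace() from <ctype.h> as the definition of whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not the function from this post: removes
   leading whitespace as defined by isspace(). */
void left_trim_ws(char *str)
{
    const char *start;

    assert(str != NULL);

    /* Find the first non-whitespace character (or the '\0'). */
    start = str;
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* Shift the rest of the string, including its '\0', to the front.
       memmove() is used instead of memcpy() or strcpy() because the
       source and destination regions overlap. */
    if (start != str)
        memmove(str, start, strlen(start) + 1);
}
```

Copying strlen(start) + 1 bytes moves the terminator along with the text, so no separate "denote new ending" step is needed.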

This approach is rather lengthy, so let's try a shorter one. Here we keep shifting the array characters one position to the left until the first character is a non-whitespace character. Check the implementation below.

#include <stdio.h>
#include <string.h>

void left_trim(char *str);

int main(void)
{

  char str[35] = "  NICE PHP  " ;
 
  /* TRIM beginning whitespaces */
  left_trim(str); 
 
  /* Print after LEFT trimming; the brackets show where the string starts */
  printf("[%s]\n", str);
  return 0;
}


/* LEFT Trim function definition (only ' ' counts as whitespace here) */
void left_trim(char *str)
{
       int j;

       /* Repeat while the 1st position holds a space
          (str[0] == ' ' also implies the string is not empty) */
       while( str[0] == ' ' )
       {
           /* LEFT Shift array characters by one position;
              this overwrites the space at position 0 */
           j = 1;
           while( str[j] )
           {
             str[j-1] = str[j];
             j++;
           }
         
           /* Denote End of the string */
           str[j-1] = 0;
       }
}


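The second approach needs fewer lines of code and is easier to understand, although it shifts the whole string once for every leading space. The trimmed string is "NICE PHP  " (the trailing spaces stay, because only the left side is trimmed).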
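Left and right trimming are usually combined into a single trim. One way to do that, sketched below with the hypothetical name trim_both and isspace() from <ctype.h> as the whitespace test, is to find both boundaries first and then shift the kept characters once, instead of once per leading space as in the second approach. This is an illustrative variant, not code from this post.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trims whitespace (per isspace()) from both
   ends of a modifiable string, in place. */
void trim_both(char *str)
{
    const char *start;
    size_t len;

    assert(str != NULL);

    /* Left side: skip leading whitespace. */
    start = str;
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* Right side: drop trailing whitespace from what remains. */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    /* Move the kept characters to the front and terminate once.
       memmove() is safe for this overlapping copy. */
    memmove(str, start, len);
    str[len] = '\0';
}
```

For "   NICE PHP   " this leaves "NICE PHP", and an all-whitespace string becomes empty, matching what the left and right trims above produce when applied one after the other.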
