Thursday, February 13, 2014

Text File Reading in PHP

A text file can be read in many ways. Reading PDFs or Excel files is different, however, because their formats are complex. We rarely need to read such complex file types, so we'll stick to reading plain text files.

Text files are usually read in one of four ways: i) all at once, ii) character by character, iii) line by line, or iv) in fixed-size chunks of bytes. Check the code below.

First, let's read the entire content of a file at once.

<?php
// Method 1 :: file_get_contents()
// file_get_contents() reads the whole content into a string;
// nl2br() turns each newline into <br>, so the browser
// shows the text in its original format
echo $str = nl2br(file_get_contents("test.txt"));

// Method 2 :: readfile() writes the file to the output.
// Without an active output buffer, the content is sent
// straight to the browser
readfile("test.txt");

// If we start an output buffer first, readfile() writes into
// that buffer. Cleaning the buffer with ob_end_clean() discards
// the content, so it is not displayed on the browser
ob_start();
readfile("test.txt");
ob_end_clean();

// Let's get all the buffered content into a string:
// ob_get_clean() returns the buffer content as a string and
// ends buffering. This again prints the content in its
// original format
ob_start();
readfile("test.txt");
$str = ob_get_clean();
echo nl2br($str);

// Method 3 :: Grab the whole content using fread()

// Open the file in read-only mode ("r")
$file_name = "test.txt";
$fp = fopen($file_name, "r") or die("File can't be opened");
// Read as many bytes as the file size, i.e. the whole content
echo nl2br(fread($fp, filesize($file_name)));
fclose($fp);
?>
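To check that the three methods above really return the same content, here is a small self-contained sketch. It writes a temporary sample file (via tempnam() in the system temp directory) instead of relying on test.txt, and the helper names readWithFileGetContents(), readWithReadfile() and readWithFread() are hypothetical, introduced only for this illustration.

```php
<?php
// Sketch: read one file with the three "all at once" methods and check
// that they return identical strings. Helper names and the temporary
// file are illustrative assumptions, not part of the original article.

function readWithFileGetContents(string $path): string
{
    $content = file_get_contents($path);
    if ($content === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $content;
}

function readWithReadfile(string $path): string
{
    ob_start();                      // capture what readfile() prints
    if (readfile($path) === false) {
        ob_end_clean();              // discard the buffer on failure
        throw new RuntimeException("Cannot read $path");
    }
    return ob_get_clean();           // return the captured text and stop buffering
}

function readWithFread(string $path): string
{
    $fp = fopen($path, "r");
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    clearstatcache(true, $path);     // make sure filesize() is not stale
    $size = filesize($path);
    // fread() requires a length greater than 0, so handle empty files
    $content = ($size > 0) ? fread($fp, $size) : "";
    fclose($fp);
    return $content;
}

// Create a small sample file in the system temp directory
$path = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($path, "Line 1 :: THIS IS JUST a TEST\r\nLine 2 :: THIS IS JUST a TEST\r\n");

$a = readWithFileGetContents($path);
$b = readWithReadfile($path);
$c = readWithFread($path);
unlink($path);                       // clean up the temporary file

echo ($a === $b && $b === $c) ? "All three methods agree\n" : "Mismatch\n";
```

The output-buffer trick in readWithReadfile() is the same one used in Method 2: readfile() always prints, so wrapping it in ob_start() and ob_get_clean() is what turns its output into a return value.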

Now, let's read the file character by character. Check the code below.

<?php
// Open the file in read-only mode
$file_name = "test.txt";
$fp = fopen($file_name, "r") or die("File can't be opened");

// Loop through every character until fgetc() returns false (EOF)
while (($char = fgetc($fp)) !== false)
{
  if (ord($char) == 10)   // 10 is the ASCII code of "\n" (line feed)
    echo "<br>";
  else
    echo $char;
}

// Close the file handle/pointer
fclose($fp);
?>

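The ord() function returns the ASCII code of the given character. Whenever we reach the end of a line, we print a <br> element to keep the original formatting. A line feed ("\n") has ASCII code 10 and a carriage return ("\r") has code 13. Windows files end each line with "\r\n", while Unix-style files use "\n" alone, so only the line feed is guaranteed to appear at the end of every line.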
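Below is a self-contained variant of the character loop, sketched under a few assumptions: it reads from an in-memory stream (php://memory) instead of test.txt so it runs anywhere, the function name charsToHtml() is hypothetical, and it returns the HTML as a string rather than echoing each character. It treats "\n" as the end of a line and drops "\r", so Windows and Unix line endings give the same result.

```php
<?php
// Sketch: copy a stream character by character with fgetc(), turning each
// line ending into "<br>". Assumes lines end in "\n" (ASCII 10) or "\r\n"
// (ASCII 13 then 10); a lone "\r" is simply dropped.
// charsToHtml() is a hypothetical helper name.

function charsToHtml($fp): string
{
    $html = "";
    while (($char = fgetc($fp)) !== false) {
        $code = ord($char);
        if ($code === 10) {          // "\n" ends the line
            $html .= "<br>\n";
        } elseif ($code !== 13) {    // skip "\r", keep everything else
            $html .= $char;
        }
    }
    return $html;
}

// php://memory gives an in-memory stream, so no file is needed
$fp = fopen("php://memory", "r+");
fwrite($fp, "Line 1\r\nLine 2\nLine 3");
rewind($fp);                         // move back to the start before reading
echo charsToHtml($fp);
fclose($fp);
```

Collecting into a string keeps the loop easy to test; in a web page you could echo each piece directly, as the original loop does.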

Next, let's read a file line by line using the fgets() function. Check the code below.

<?php
// Open the file in read-only mode
$file_name = "test.txt";
$fp = fopen($file_name, "r") or die("File can't be opened");

// Loop through lines; fgets() returns false at the end of the file
while (($str = fgets($fp)) !== false)
  echo $str . "<br>";

// Close the file handle/pointer
fclose($fp);
?>

The above code reads the lines one by one from the beginning. To read a file line by line from the end, check the article "How to read a file backward in PHP?".
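For small files only, one rough way to get the lines in reverse order is to load them all with file() and flip the array with array_reverse(). This is a minimal sketch, not necessarily what the linked article does: the helper name linesBackward() is hypothetical, and because the whole file is held in memory it is no substitute for a real backward reader on large files.

```php
<?php
// Sketch: return the lines of a small file from last to first.
// file() loads every line into memory, so this suits small files only.
// linesBackward() is a hypothetical helper name.

function linesBackward(string $path): array
{
    // FILE_IGNORE_NEW_LINES drops the line ending; rtrim() below makes
    // sure no stray "\r" is left over from Windows-style line endings
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $lines = array_map(fn($line) => rtrim($line, "\r"), $lines);
    return array_reverse($lines);
}

// Create a sample file in the system temp directory
$path = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($path, "Line 1\r\nLine 2\r\nLine 3\r\n");

foreach (linesBackward($path) as $line) {
    echo $line, "<br>\n";
}
unlink($path);                       // clean up the temporary file
```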

Next, let's read a file in fixed-size chunks of bytes, which is again done with fread().

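<?php
$file_name = "test.txt";
$fp = fopen($file_name, "r") or die("File can't be opened");

// Loop through the file 10 bytes at a time; fread() returns an
// empty string at the end of the file (or false on failure)
while (($data = fread($fp, 10)) !== false && $data !== "")
  echo nl2br($data);

// Close file
fclose($fp);
?>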

The above code reads 10 bytes at a time, newline characters included. Let's tweak the output of this program and discover something interesting.

Let's change the while loop as shown below:

<?php
// Loop through the file 10 bytes at a time
while (($data = fread($fp, 10)) !== false && $data !== "")
{
  // Make the invisible line-ending characters visible
  $data = str_replace("\n", "@", $data);
  $data = str_replace("\r", "#", $data);
  echo "[$data]<br>";
}
?>

Suppose the file test.txt has the following content ::

Line 1 :: THIS IS JUST a TEST
Line 2 :: THIS IS JUST a TEST
Line 3 :: THIS IS JUST a TEST

and if this file was saved on Windows (for example, in Notepad) and the above code is run on XAMPP or WAMP, PHP would see the file content as follows:

Line 1 :: THIS IS JUST a TEST\r\n
Line 2 :: THIS IS JUST a TEST\r\n
Line 3 :: THIS IS JUST a TEST\r\n

On Windows, a newline consists of two characters: '\r' (carriage return) and '\n' (line feed). The output of the above code proves that:

[Line 1 :: ]
[THIS IS JU]
[ST a TEST#]
[@Line 2 ::]
[ THIS IS J]
[UST a TEST]
[#@Line 3 :]
[: THIS IS ]
[JUST a TES]
[T#@]


The above output proves that i) fread() does not stop at newline characters and ii) a newline on Windows is '\r\n'.
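If you are not on Windows, or your editor saves Unix line endings, the sketch below reproduces the bracketed output by writing "\r\n" explicitly into an in-memory stream (php://memory). readChunks() is a hypothetical helper; it collects the chunks in an array instead of echoing them, which makes them easy to inspect.

```php
<?php
// Sketch: split a stream into fixed-size chunks with fread(), replacing
// "\r" with "#" and "\n" with "@" as in the example above.
// readChunks() is a hypothetical helper name.

function readChunks($fp, int $length): array
{
    $chunks = [];
    // fread() returns "" at end of file (false only on failure),
    // so check both instead of relying on a loose != comparison
    while (($data = fread($fp, $length)) !== false && $data !== "") {
        $chunks[] = str_replace(["\r", "\n"], ["#", "@"], $data);
    }
    return $chunks;
}

// Build the three sample lines with Windows-style "\r\n" endings
$fp = fopen("php://memory", "r+");
for ($i = 1; $i <= 3; $i++) {
    fwrite($fp, "Line $i :: THIS IS JUST a TEST\r\n");
}
rewind($fp);                         // start reading from position 0

foreach (readChunks($fp, 10) as $chunk) {
    echo "[$chunk]<br>\n";
}
fclose($fp);
```

Each line is 29 visible characters plus "\r\n", so the 93-byte content yields nine full 10-byte chunks and a final 3-byte chunk, "T#@".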

In the next example, we store the starting position of every chunk in an array named $bytes_pos. It holds 0 (zero) initially, because the file pointer starts at position 0. Each fread() call reads 10 bytes and moves the file pointer 10 bytes forward, and ftell() returns that new position. This way, the $bytes_pos array stores positions like 0, 10, 20, 30, 40 and so on. However, after the last chunk is read (which may be shorter than 10 bytes), ftell() returns the end-of-file position, and that value is stored as the last item of $bytes_pos. This causes a small problem in the example below.

Let's try something not very useful, but different, with fread(): reading all these 10-byte data packs backward. Check the code below.

<?php
$read_length = 10;

$file_name = "test.txt";
$fp = fopen($file_name, "r") or die("File can't be opened");

// Store the first position, 0: file reading starts from
// position 0 within the file
$bytes_pos = array(0);

// Loop through the chunks and store each position in an array
while (($data = fread($fp, $read_length)) !== false && $data !== "")
  $bytes_pos[] = ftell($fp);

// Reverse the array for reading it backward
$bytes_pos = array_reverse($bytes_pos);

// Finally, read 10 bytes from each stored position
foreach ($bytes_pos as $pos)
{
  // Move the file pointer
  fseek($fp, $pos);

  // Read
  $data = fread($fp, $read_length);
  $data = str_replace("\n", "@", $data);
  $data = str_replace("\r", "#", $data);
  echo "[$data]<br>";
}

// Close file
fclose($fp);
?>

Here again, we read 10 bytes at a time and stored each position in the array $bytes_pos. The ftell() function returns the current position of the file pointer. Once the array was filled with positions 10 bytes apart, we reversed it. Then we iterated through the array with a foreach loop, used the fseek() function to move the file pointer to each stored position, and read 10 bytes from there with fread(). Check the output below. It is almost the reverse of the previous output.

[]
[T#@]
[JUST a TES]
[: THIS IS ]
[#@Line 3 :]
[UST a TEST]
[ THIS IS J]
[@Line 2 ::]
[ST a TEST#]
[THIS IS JU]
[Line 1 :: ]

The first line, "[]", appears because the end-of-file position is stored as the last item in the array, as discussed earlier, and reading from the end of the file returns an empty string. This problem can be overcome by removing the last item from that array (for example, with array_pop()) before reversing it.
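Applying that fix, a minimal sketch of the backward reader might look like the one below. It drops the end-of-file position with array_pop() before reversing; readChunksBackward() and the in-memory sample stream (php://memory) are illustrative assumptions rather than the original code.

```php
<?php
// Sketch: read fixed-size chunks backward, removing the end-of-file
// position that ftell() records after the last read.
// readChunksBackward() is a hypothetical helper name.

function readChunksBackward($fp, int $length): array
{
    $positions = [0];
    while (($data = fread($fp, $length)) !== false && $data !== "") {
        $positions[] = ftell($fp);   // where the next chunk starts
    }
    array_pop($positions);           // remove the end-of-file position

    $chunks = [];
    foreach (array_reverse($positions) as $pos) {
        fseek($fp, $pos);            // jump to the start of the chunk
        $data = fread($fp, $length);
        $chunks[] = str_replace(["\r", "\n"], ["#", "@"], $data);
    }
    return $chunks;
}

// Build the three sample lines with Windows-style "\r\n" endings
$fp = fopen("php://memory", "r+");
for ($i = 1; $i <= 3; $i++) {
    fwrite($fp, "Line $i :: THIS IS JUST a TEST\r\n");
}
rewind($fp);

foreach (readChunksBackward($fp, 10) as $chunk) {
    echo "[$chunk]<br>\n";
}
fclose($fp);
```

The output now starts directly with "[T#@]". For an empty file, array_pop() removes the only stored position (0), so no empty "[]" chunk is produced at all.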
