Friday, July 26, 2013

Handling XML in PHP - I

XML documents are much stricter than HTML. XML is case sensitive, whereas HTML tags can be written in any case. HTML collapses runs of whitespace, whereas XML preserves whitespace. In HTML some tags may be left unclosed; for example, the <br> tag does not need a closing </br> tag. In XML, every tag must be properly closed. Attribute quoting is essential in XML: the value of an element attribute must be wrapped in quotes, which is not mandatory in HTML. Lastly, in XML, special characters inside element content and attribute values must be escaped; for example, an ampersand (&) must be written as &amp; and a less-than sign (<) as &lt;. PHP's htmlspecialchars() function can do this for us. The htmlentities() function is less suitable, because by default it emits HTML named entities such as &nbsp; that plain XML does not define.
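As a minimal sketch of the escaping step described above, the snippet below wraps htmlspecialchars() in a small helper. The helper name xml_escape() and the sample text are assumptions made for illustration; the ENT_XML1 flag needs PHP 5.4 or later, and the type declarations need PHP 7.

```php
<?php
// Hypothetical helper (not part of PHP): escape text so it can be placed
// inside XML element content or attribute values.
function xml_escape(string $text): string
{
    // ENT_XML1 limits the output to the five entities XML predefines;
    // ENT_QUOTES makes sure single quotes are escaped too (as &apos;).
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$title = 'Tom & Jerry <cartoon>';
echo '<title>' . xml_escape($title) . '</title>', "\n";
// prints: <title>Tom &amp; Jerry &lt;cartoon&gt;</title>
```

Escaping at the point where text is written into the document keeps the rest of the code free to work with the raw, unescaped values.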

PHP offers several extensions for reading and manipulating XML documents. Here we look at the SimpleXML extension, which is quite capable of handling general XML documents.

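An XML document should start with an XML declaration carrying a version number, as shown below. It is a declaration rather than a tag; XML 1.0 makes it optional, but including it is recommended.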

<?xml version="1.0"?>

Generating an XML document in PHP requires sending the correct Content-Type header for the document. A simple example is given below.

<?php
header('Content-type:text/xml');
echo "<?xml version='1.0'?>";
echo <<<XML
 <student>
  <fname>John</fname>
  <lname>Smith</lname>
  <roll_no>109</roll_no>
  <class>VII</class>
 </student>
XML;
?>


The XML declaration <?xml version='1.0'?> may cause a problem if it is written directly in a PHP file. It begins with <?, which is exactly the PHP short opening tag, so with the short_open_tag directive set to On, PHP would try to execute it as PHP code. To prevent this, the XML declaration is printed with echo or print.

To parse an XML document, we use the simplexml_load_file() function. Check the example below. Here we assume that an XML file test.xml has the following content.

<?xml version='1.0'?>
<students>
 <student id='1' section='A'>
  <!-- Teacher : Niel Hertz -->
  <fname>John</fname>
  <lname>Smith</lname>
  <roll_no>109</roll_no>
  <class>VII</class>
 </student>
 <student id='2' section='B'>
  <!-- Teacher : Jim Cartel -->
  <fname>Jeff</fname>
  <lname>Smith</lname>
  <roll_no>110</roll_no>
  <class>VIII</class>
 </student>
</students>


Notice that XML elements can have attributes, just as HTML elements do, but in XML the attribute values must be quoted. An XML document can also contain comments, similar to HTML comments.

Now our PHP code loads this file and parses it.

<?php
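// Load the XML file into an object
$all_students = simplexml_load_file('test.xml');

// simplexml_load_file() returns false if the file
// cannot be read or is not well-formed XML
if( $all_students === false )
{
  die('Could not load test.xml');
}

// This prints an array-like structure
// which is much easier to browse
// print_r( $all_students );
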
// Browse each node
foreach( $all_students->student as $st )
{
  echo "<br>ID :: {$st['id']}, Student Name : {$st->fname} {$st->lname}, ";
  echo "Roll : {$st->roll_no}, Class : {$st->class}";
}
?>


Output ::
ID :: 1, Student Name : John Smith, Roll : 109, Class : VII
ID :: 2, Student Name : Jeff Smith, Roll : 110, Class : VIII


The simplexml_load_file() function reads the XML document and returns the root element 'students' as a SimpleXMLElement object, stored here in $all_students. Through $all_students->student we can iterate over all the <student> nodes much like an array, and each student is again a SimpleXMLElement object. So, in the foreach loop, the $st variable refers to each student node in turn, and its child nodes are available as properties of that object. Hence $st->fname prints the content of the <fname> node under that <student> node. Basically, SimpleXML exposes XML elements as object properties, which makes them easier to handle.

Also notice how the 'id' attribute of each <student> element is accessed with array syntax, $st['id']. All attributes of a node can be read this way. Comments, however, are not exposed by the SimpleXML extension.
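One detail worth illustrating: both $st['id'] and $st->fname are SimpleXMLElement objects rather than plain strings, so it is usually safer to cast them before comparing or storing them. The sketch below uses a shortened, assumed version of the student data and also loops over every attribute with the attributes() method.

```php
<?php
// Shortened sample data, assumed for this illustration only.
$xml = <<<'XML'
<students>
 <student id='1' section='A'>
  <fname>John</fname>
 </student>
</students>
XML;

$root = simplexml_load_string($xml);
$st   = $root->student[0];

// Cast the SimpleXMLElement values to ordinary PHP scalars.
$id    = (int) $st['id'];
$fname = (string) $st->fname;

// attributes() lets us loop over every attribute of the node.
$attrs = [];
foreach ($st->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

echo "ID: $id, First name: $fname\n";
print_r($attrs);
```

Casting matters most in comparisons: the uncast objects work fine inside a double-quoted string, but a strict comparison such as $st['id'] === '1' would be false.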

The SimpleXML extension comes with another function called simplexml_load_string(), which loads the XML from a string. Check the example below.

<?php
$str = <<<XML
<?xml version='1.0'?>
<students>
 <student id='1' section='A'>
  <!-- Teacher : Niel Hertz -->
  <fname>John</fname>
  <lname>Smith</lname>
  <roll_no>109</roll_no>
  <class>VII</class>
 </student>
 <student id='2' section='B'>
  <!-- Teacher : Jim Cartel -->
  <fname>Jeff</fname>
  <lname>Smith</lname>
  <roll_no>110</roll_no>
  <class>VIII</class>
 </student>
</students>
XML;

// Now load the XML and turn it into an object
$all_students = simplexml_load_string($str);
?>
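simplexml_load_string() returns false when the string is not well-formed XML, and by default it also emits PHP warnings. A possible way to handle this, sketched below, is to switch libxml to internal error collection and read the messages afterwards. The helper name parse_xml_or_null() and the sample strings are assumptions made for this example; the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of SimpleXML): parse an XML string and
// return null on failure, collecting libxml messages into $errors.
function parse_xml_or_null(string $xml, array &$errors = []): ?SimpleXMLElement
{
    // Collect errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc;
}

$errors = [];
$good = parse_xml_or_null('<students><student id="1"/></students>', $errors);
$good_errors = $errors;

// The <student> tag is never closed, so this string is not well-formed.
$bad = parse_xml_or_null('<students><student></students>', $errors);

echo $bad === null ? "Parse failed:\n" . implode("\n", $errors) . "\n" : "Parsed\n";
```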


The same effect can be achieved with a SimpleXMLIterator object, as shown below.

<?php
// $str variable is defined above
// Load the XML and turn it into an Iterator object

$all_students = new SimpleXMLIterator($str);

// Rewind is necessary to move the pointer
// to the first element

$all_students->rewind();

// Now start looping; valid() returns false
// once the pointer moves past the last element

while( $all_students->valid() )
{
  // Get the current item
  $st = $all_students->current();
 
  // Print
  echo "<br>ID :: {$st['id']}, Student Name : {$st->fname} {$st->lname}, Roll : {$st->roll_no}, Class : {$st->class}";
 
  // Move the pointer to next item
  $all_students->next();
}
?>


The code above is quite self-explanatory. The rewind() and next() methods move the iterator pointer to the beginning and to the next item respectively. The current() method returns the current item.
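Because SimpleXMLIterator is an iterator, the manual rewind(), current() and next() calls can also be left to foreach. The following is a small sketch with shortened sample data assumed for the illustration; in each pass the key is the element name and the value is the current node.

```php
<?php
// Shortened sample data, assumed for this illustration only.
$xml = <<<'XML'
<students>
 <student id='1'><fname>John</fname></student>
 <student id='2'><fname>Jeff</fname></student>
</students>
XML;

$lines = [];
foreach (new SimpleXMLIterator($xml) as $tag => $st) {
    // $tag is the element name ("student"); $st is the current node.
    $lines[] = $tag . ' ' . $st['id'] . ': ' . $st->fname;
}

echo implode("\n", $lines), "\n";
```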

Below, we use another class, SimpleXMLElement, to convert an XML string to an object.

<?php
// Convert to SimpleXMLElement object
$all_students = new SimpleXMLElement($str);

// Show count of total immediate children under root
echo $all_students->count() . "<br>";

// Browse thru children for Printing
foreach($all_students->children() as $st)
{
 echo "<br>ID:{$st['id']}, ". $st->getName()." Name:{$st->fname} {$st->lname}, ";
 echo "Roll:{$st->roll_no}, Class:{$st->class}, Section:{$st['section']}";
}
?>


The count() method of a SimpleXMLElement object counts the children of an element. Initially, the variable $all_students holds the whole XML document, starting at the root element. Hence calling count() reports "2" (i.e. the 2 <student> tags under the root element <students>). The children() method of the SimpleXMLElement class finds the children of the given node. In our case, the 2 <student> nodes are listed as children of the root element <students>. So, the foreach loop prints the details of each child node, that is, each <student> tag.

Let's add attributes and child nodes to the above XML structure.

<?php
// Now load the XML and turn it into an object
$all_students = new SimpleXMLElement($str);

// Get All the children
$child = $all_students->children();

// Browse thru children for adding
// Attributes and Child elements

for($i=0; $i<count($child); $i++ )
{
  // Get each child
  $st = $child[$i];
 
  // Create a username text
  $username = strtolower($st->fname . "_" . $st->lname);
  $username = str_replace(" ", "", $username );
 
  // Add a new attribute called 'username' to each <student> node
  $st->addAttribute('username', $username );
 
  // Add a child Node called nickname inside each <student> node
  $st->addChild('nickname', $username );
}
// Now print the XML on browser
// SET the header

header('Content-Type:text/xml');
// Print XML
echo $all_students->asXML();
?>


The above code is quite self-explanatory. We simply loop through each child element that occurs under the root element 'students'. We can add an attribute to any node using the addAttribute() method, which takes the attribute name and value as its first and second parameters respectively. The addChild() method adds a child under a node and takes the new node's name and content as its first and second parameters respectively. Finally, the asXML() method returns the new XML as a string, which we echo; for the browser to display the XML correctly, we must set the Content-Type in the header() call. The output is shown below.

<students>
 <student id="1" section="A" username="john_smith">
  <!-- Teacher : Niel Hertz -->
  <fname>John</fname>
  <lname>Smith</lname>
  <roll_no>109</roll_no>
  <class>VII</class>
  <nickname>john_smith</nickname>
 </student>
 <student id="2" section="B" username="jeff_smith">
  <!-- Teacher : Jim Cartel -->
  <fname>Jeff</fname>
  <lname>Smith</lname>
  <roll_no>110</roll_no>
  <class>VIII</class>
  <nickname>jeff_smith</nickname>
 </student>
</students>


Notice that the new 'username' attributes and <nickname> child elements have been added to the XML structure.

XPath is used to navigate through elements and attributes in an XML document. SimpleXML supports XPath, which means we can run an XPath query on any XML data. The xpath() method of the SimpleXML extension searches for the SimpleXML nodes matching the given XPath expression. Check one example below.

<?php
// Load the XML and turn it into an object
$all_students = new SimpleXMLElement($str);

// Get all first Names only
$first_names = $all_students->xpath('/students/student/fname');
foreach ($first_names as $fname)
{
  echo " $fname";
}
?>


The above code displays the content of all <fname> tags matching the XPath '/students/student/fname'. The xpath() method takes the path as its argument and returns all the nodes that match it. On error, xpath() returns FALSE; otherwise it returns an array of SimpleXMLElement nodes. If no matching node is found, an empty array is returned.
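XPath expressions can also filter on attributes. As a sketch, using shortened sample data assumed for the illustration, the query below selects the <fname> of every student whose 'section' attribute is "B" and guards against a non-array return value before looping.

```php
<?php
// Shortened sample data, assumed for this illustration only.
$xml = <<<'XML'
<students>
 <student id='1' section='A'><fname>John</fname></student>
 <student id='2' section='B'><fname>Jeff</fname></student>
</students>
XML;

$root = new SimpleXMLElement($xml);

// Select <fname> of students whose section attribute is "B".
$matches = $root->xpath('/students/student[@section="B"]/fname');

$names = [];
// xpath() does not return an array on error, so check before looping.
if (is_array($matches)) {
    foreach ($matches as $node) {
        $names[] = (string) $node;
    }
}

print_r($names);
```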


Check out the 2nd part of this article - Handling XML in PHP - II

To see Complex XML parsing using SimpleXMLElement, check article Parsing Complex XML with SimpleXML in PHP
