Thursday, June 30, 2016

Parsing Complex XML with SimpleXML in PHP

Let's parse a fairly complex XML document using SimpleXML in PHP. We will work with the XML data shown below.

<NodeLevel1>
 <NodeLevel2>
  <NodeLevel3>
   <RaceDay RaceDayDate="2016-06-29" >
    <Meeting MeetingCode="BR" MtgId="1299709952" VenueName="Doomben" >
     <Pool PoolType="DD" DisplayStatus="SELLING"></Pool>
     <Pool PoolType="XD" DisplayStatus="PAYING"></Pool>
     <Pool PoolType="TT" DisplayStatus="CLOSED"></Pool>
     <Pool PoolType="QD" DisplayStatus="CLOSED"></Pool>
     <MultiPool PoolType="XD" DisplayStatus="PAYING"></MultiPool>
     <Race RaceNo="1" RaceTime="12:53" RaceName="2YO HANDICAP" />
     <Race RaceNo="2" RaceTime="13:23" RaceName="BM 75 HANDICAP" />
     <Race RaceNo="3" RaceTime="13:53" RaceName="MAIDEN PLATE" >
      <TipsterTip TipsterId="0" Tips="4"/>
      <TipsterTip TipsterId="5" Tips="1-9-4-8"/>
      <Pool PoolType="A2" Available="Y" Abandoned="N" />
      <Pool PoolType="EX" Available="Y" Abandoned="N" />
      <Pool PoolType="F4" Available="Y" Abandoned="N" />
      <Runner RunnerNo="1" RunnerName="ALL TROOPS" />
      <Runner RunnerNo="2" RunnerName="SEQ THE STAR" />
      <Runner RunnerNo="3" RunnerName="SHADOW LAWN"/>
      <Runner RunnerNo="4" RunnerName="FREQUENDLY" />
     </Race>
     <Tipster TipsterId="0" TipsterName="LATE MAIL"/>
     <Tipster TipsterId="1" TipsterName="RADIO TAB"/>
     <Tipster TipsterId="2" TipsterName="TRACKMAN"/>
    </Meeting>
   </RaceDay>
   <RaceDay RaceDayDate="2016-06-30" >
    <Meeting MeetingCode="MR" MtgId="2299719559" VenueName="Lucas" >
     <Pool PoolType="CC" DisplayStatus="SELLING"></Pool>
     <Pool PoolType="YD" DisplayStatus="PAYING"></Pool>
     <Pool PoolType="VT" DisplayStatus="CLOSED"></Pool>
     <Pool PoolType="MD" DisplayStatus="CLOSED"></Pool>
     <MultiPool PoolType="VD" PoolDisplayStatus="PAYING"></MultiPool>
     <Race RaceNo="1" RaceTime="12:53" RaceName="R2YO BHANDI" />
     <Race RaceNo="2" RaceTime="13:23" RaceName="XX 75 ZINDA" />
     <Race RaceNo="3" RaceTime="13:53" RaceName="PLATE RAIDEN" >
      <TipsterTip TipsterId="0" Tips="5"/>
      <TipsterTip TipsterId="5" Tips="2-1-4-8-4"/>
      <Pool PoolType="A2" Available="Y" Abandoned="N" />
      <Pool PoolType="EX" Available="Y" Abandoned="N" />
      <Pool PoolType="F4" Available="Y" Abandoned="N" />
      <Runner RunnerNo="1" RunnerName="ALL BROOKS" />
      <Runner RunnerNo="2" RunnerName="MIDDLE STAR" />
      <Runner RunnerNo="3" RunnerName="LONELY LAWN"/>
      <Runner RunnerNo="4" RunnerName="OBLIV" />
     </Race>
     <Tipster TipsterId="0" TipsterName="EARLY MAIL"/>
     <Tipster TipsterId="1" TipsterName="RADIO CAB"/>
     <Tipster TipsterId="2" TipsterName="JACKMAN"/>
    </Meeting>
   </RaceDay>
  </NodeLevel3>
 </NodeLevel2>
</NodeLevel1>

Notice that the <NodeLevel3> node contains two <RaceDay> nodes, and each <RaceDay> has its own <Meeting> node. Each <Meeting> node in turn has <Pool>, <MultiPool>, <Race> and <Tipster> nodes as its children. Finally, a <Race> node can have <TipsterTip>, <Pool> and <Runner> nodes under it, although some <Race> nodes are empty.

Most of the nodes have attributes, and some have descendants. We will traverse all the <RaceDay> nodes and read their children and attributes. Let's start.

<?php
$xml_source = <<<EOD
<NodeLevel1>
 <NodeLevel2>
  <NodeLevel3>
   <RaceDay RaceDayDate="2016-06-29" >
    <Meeting MeetingCode="BR" MtgId="1299709952" VenueName="Doomben" >
     <Pool PoolType="DD" DisplayStatus="SELLING"></Pool>
     <Pool PoolType="XD" DisplayStatus="PAYING"></Pool>
     <Pool PoolType="TT" DisplayStatus="CLOSED"></Pool>
     <Pool PoolType="QD" DisplayStatus="CLOSED"></Pool>
     <MultiPool PoolType="XD" DisplayStatus="PAYING">
     </MultiPool>
     <Race RaceNo="1" RaceTime="12:53" RaceName="2YO HANDICAP"/>
     <Race RaceNo="2" RaceTime="13:23" RaceName="BM7 HANDICAP"/>
     <Race RaceNo="3" RaceTime="13:53" RaceName="MAIDEN PLATE">
      <TipsterTip TipsterId="0" Tips="4"/>
      <TipsterTip TipsterId="5" Tips="1-9-4-8"/>
      <Pool PoolType="A2" Available="Y" Abandoned="N" />
      <Pool PoolType="EX" Available="Y" Abandoned="N" />
      <Pool PoolType="F4" Available="Y" Abandoned="N" />
      <Runner RunnerNo="1" RunnerName="ALL TROOPS" />
      <Runner RunnerNo="2" RunnerName="SEQ THE STAR" />
      <Runner RunnerNo="3" RunnerName="SHADOW LAWN"/>
      <Runner RunnerNo="4" RunnerName="FREQUENDLY" />
     </Race>
     <Tipster TipsterId="0" TipsterName="LATE MAIL"/>
     <Tipster TipsterId="1" TipsterName="RADIO TAB"/>
     <Tipster TipsterId="2" TipsterName="TRACKMAN"/>
    </Meeting>
   </RaceDay>
   <RaceDay RaceDayDate="2016-06-30" >
    <Meeting MeetingCode="MR" MtgId="2299719559" VenueName="Las" >
     <Pool PoolType="CC" PoolDisplayStatus="SELLING"></Pool>
     <Pool PoolType="YD" PoolDisplayStatus="PAYING"></Pool>
     <Pool PoolType="VT" PoolDisplayStatus="CLOSED"></Pool>
     <Pool PoolType="MD" PoolDisplayStatus="CLOSED"></Pool>
     <MultiPool PoolType="VD" DisplayStatus="PAYING">
     </MultiPool>
     <Race RaceNo="1" RaceTime="12:53" RaceName="R2YO BHANDI" />
     <Race RaceNo="2" RaceTime="13:23" RaceName="XX 75 ZINDA" />
     <Race RaceNo="3" RaceTime="13:53" RaceName="PLATE RAIDEN" >
      <TipsterTip TipsterId="0" Tips="5"/>
      <TipsterTip TipsterId="5" Tips="2-1-4-8-4"/>
      <Pool PoolType="A2" Available="Y" Abandoned="N" />
      <Pool PoolType="EX" Available="Y" Abandoned="N" />
      <Pool PoolType="F4" Available="Y" Abandoned="N" />
      <Runner RunnerNo="1" RunnerName="ALL BROOKS" />
      <Runner RunnerNo="2" RunnerName="MIDDLE STAR" />
      <Runner RunnerNo="3" RunnerName="LONELY LAWN"/>
      <Runner RunnerNo="4" RunnerName="OBLIV" />
     </Race>
     <Tipster TipsterId="0" TipsterName="EARLY MAIL"/>
     <Tipster TipsterId="1" TipsterName="RADIO CAB"/>
     <Tipster TipsterId="2" TipsterName="JACKMAN"/>
    </Meeting>
   </RaceDay>
  </NodeLevel3>
 </NodeLevel2>
</NodeLevel1>
EOD;
?>

See how I have declared the XML in a string using Heredoc in PHP.

$xml_source = <<<EOD

When using Heredoc, we need to make sure that nothing, not even a blank space, follows the opening identifier on its line. So "<<<EOD" must be followed immediately by a newline ("\n"), which means that in the editor we press ENTER right after typing "<<<EOD" to move to a new line.

Heredoc helps us avoid quoting problems with ' and ". Notice that all the node attributes are wrapped in double quotes, which would need escaping inside a double-quoted PHP string. Alternatively, we can define the XML as a single-quoted string, as shown below.

/// We make sure that all single quotes are escaped
$xml_source = '<NodeLevel1><NodeLevel2><NodeLevel3>' .
              '<RaceDay Name="John O\'Neal" > ..... '; 
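
A third option, not used in this post, is Nowdoc (available since PHP 5.3). It looks like Heredoc, but the opening identifier is wrapped in single quotes and nothing inside the body is interpolated, so neither quotes nor "$" signs need escaping. The sketch below is only an illustration; the <Item> node and its Owner and Label attributes are made up for this example and are not part of the racing XML.

```php
<?php
// Nowdoc: the opening identifier is single-quoted, so the body is taken
// literally. "$total" stays as text instead of being read as a variable.
// <Item>, Owner and Label are hypothetical names used only for this demo.
$item_xml = <<<'EOD'
<Item Owner="John O'Neal" Label="$total"/>
EOD;

$item  = new SimpleXMLElement($item_xml);
$owner = (string) $item['Owner'];
$label = (string) $item['Label'];

echo $owner, ' | ', $label, "\n";
```

With plain Heredoc, PHP would try to substitute a variable named $total into the string, so Nowdoc is the safer choice whenever the XML itself may contain "$".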
 
Ok, now let's proceed.

// LOAD the XML Root Object
$all_nodes = new SimpleXMLElement($xml_source);

// BROWSE to Certain PATH/NODE
$all_nodelevel3 = $all_nodes
                  ->xpath('/NodeLevel1/NodeLevel2/NodeLevel3');

// PRINT what we got
print_r($all_nodelevel3);

The above code loads the XML data and creates a SimpleXMLElement object from it. We then navigate to the "/NodeLevel1/NodeLevel2/NodeLevel3" node in the XML tree. The xpath() method searches the SimpleXML node for nodes matching the XPath expression passed as its argument and returns them as an array of SimpleXMLElement objects. Note that we don't add a trailing slash ('/') to the end of the XPath expression.

To get the root <NodeLevel1> node, we would pass "/NodeLevel1" as the argument to the xpath() method.
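
To make the return value of xpath() more concrete, here is a small sketch. It uses a trimmed-down stand-in for the XML above (only the two <RaceDay> nodes, with no children), and the "/NodeLevel1/NoSuchNode" path is a deliberately wrong path chosen for illustration.

```php
<?php
// Trimmed-down stand-in for the sample XML: two empty <RaceDay> nodes only.
$xml_source = <<<EOD
<NodeLevel1>
 <NodeLevel2>
  <NodeLevel3>
   <RaceDay RaceDayDate="2016-06-29"/>
   <RaceDay RaceDayDate="2016-06-30"/>
  </NodeLevel3>
 </NodeLevel2>
</NodeLevel1>
EOD;

$all_nodes = new SimpleXMLElement($xml_source);

// xpath() returns a plain PHP array of SimpleXMLElement objects.
$matches       = $all_nodes->xpath('/NodeLevel1/NodeLevel2/NodeLevel3');
$match_count   = count($matches);
$first_name    = $matches[0]->getName();
$raceday_count = count($matches[0]->RaceDay);

// A path that matches nothing gives an empty array, so foreach is a no-op.
$missing = $all_nodes->xpath('/NodeLevel1/NoSuchNode');

echo "$match_count match(es), first is <$first_name> ";
echo "with $raceday_count <RaceDay> children\n";
echo 'Wrong path matched ', count($missing), " node(s)\n";
```

Because the result is always an array when the query itself is valid, we loop over it with foreach even when we expect a single match.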

Now, let's print all the <Runner> nodes in the above XML.

<?php
// LOAD the XML Root Object
$all_nodes = new SimpleXMLElement($xml_source);

// BROWSE to all <NodeLevel3>
$all_nodelevel3 = $all_nodes->xpath('/NodeLevel1/NodeLevel2/NodeLevel3');

// LOOP THRU <NodeLevel3> nodes
foreach($all_nodelevel3 as $nodelevel3)
{
  // GET All <RaceDay> Nodes
  $all_racedays = $nodelevel3->RaceDay;

  // LOOP THRU All <RaceDay> Nodes
  foreach($all_racedays as $raceday)
  {
    // GET ALL <Meeting>
    $all_meeting = $raceday->Meeting;

    // LOOP THRU <Meeting>
    foreach($all_meeting as $meeting)
    {
      // GET ALL <Race> nodes
      $all_race = $meeting->Race;

      // LOOP THRU <Race> Nodes
      foreach($all_race as $race)
      {
        // GET ALL <Runner> nodes
        $all_runners = $race->Runner;

        // Note that some <Race> nodes don't have
        // <Runner> nodes under them,
        // so we check if <Runner> nodes exist
        if($all_runners)
        {
          // LOOP THRU <Runner> nodes
          foreach($all_runners as $runner)
          {
            // GET <Runner> Node's attributes
            $atts = $runner->attributes();

            // LOOP THRU Attributes
            $str = "";
            foreach($atts as $key => $val)
            {
              $str .= "$key => $val, ";
            }

            // PRINT
            echo "RUNNER  :: $str <br>";
          }
        }
      }
    }
  }
}
?>

Check the output below:

RUNNER :: RunnerNo => 1, RunnerName => ALL TROOPS, 
RUNNER :: RunnerNo => 2, RunnerName => SEQ THE STAR, 
RUNNER :: RunnerNo => 3, RunnerName => SHADOW LAWN, 
RUNNER :: RunnerNo => 4, RunnerName => FREQUENDLY, 
RUNNER :: RunnerNo => 1, RunnerName => ALL BROOKS, 
RUNNER :: RunnerNo => 2, RunnerName => MIDDLE STAR, 
RUNNER :: RunnerNo => 3, RunnerName => LONELY LAWN, 
RUNNER :: RunnerNo => 4, RunnerName => OBLIV, 

Notice how we used the foreach loop to traverse the nodes and go deeper into the XML tree. We use foreach because the <RaceDay>, <Meeting>, <Race> and <Runner> nodes may appear any number of times within their parent node in the XML tree.

We even used foreach($all_nodelevel3 as $nodelevel3), because xpath() returns an array of matches and several <NodeLevel3> nodes could exist within a single <NodeLevel2> node.
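
If only the <Runner> nodes are of interest, the nested loops can be collapsed into one XPath query. This is a different technique from the one used above, shown here only as a sketch: the "//Runner" expression selects every <Runner> node anywhere in the document, whatever its depth. The XML below is a shortened version of the sample so that the snippet runs on its own.

```php
<?php
// Shortened version of the sample XML: one meeting, one race with runners.
$xml_source = <<<EOD
<NodeLevel1><NodeLevel2><NodeLevel3>
 <RaceDay RaceDayDate="2016-06-29">
  <Meeting MeetingCode="BR">
   <Race RaceNo="1"/>
   <Race RaceNo="3">
    <Runner RunnerNo="1" RunnerName="ALL TROOPS"/>
    <Runner RunnerNo="2" RunnerName="SEQ THE STAR"/>
   </Race>
  </Meeting>
 </RaceDay>
</NodeLevel3></NodeLevel2></NodeLevel1>
EOD;

$all_nodes = new SimpleXMLElement($xml_source);

// "//Runner" matches <Runner> nodes at any depth below the root.
$lines = [];
foreach ($all_nodes->xpath('//Runner') as $runner) {
    $lines[] = 'RUNNER :: ' . $runner['RunnerNo'] . ' => ' . $runner['RunnerName'];
}

echo implode("\n", $lines), "\n";
```

The nested foreach version is still the better fit when we also need data from the parent <RaceDay>, <Meeting> or <Race> nodes along the way.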

Secondly, we used the attributes() method to get all the attributes of a node.
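
When we already know the attribute name, we can also read it directly with array-style access instead of looping over attributes(). A minimal sketch follows, using one <Runner> node from the sample; the "Weight" attribute is a made-up name that does not exist in the XML and is there only to show what happens with a missing attribute.

```php
<?php
$runner = new SimpleXMLElement('<Runner RunnerNo="4" RunnerName="FREQUENDLY"/>');

// Array-style access reads a single attribute. The value is still a
// SimpleXMLElement, so cast it before comparing or doing arithmetic.
$runner_no   = (int) $runner['RunnerNo'];
$runner_name = (string) $runner['RunnerName'];

// "Weight" is a hypothetical attribute; isset() reports that it is absent.
$has_weight = isset($runner['Weight']);

// Collect every attribute into a plain array of strings.
$atts = [];
foreach ($runner->attributes() as $key => $val) {
    $atts[$key] = (string) $val;
}

echo "#$runner_no $runner_name\n";
print_r($atts);
```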

Hope this helps.