PHP XML Parser
PHP provides useful XML parsing features, but they can be surprisingly difficult to use in some applications. The difficulty arises from the need either to write a schema-specific parser for each document type or to limit the schema to simple data.
The SimpleXML extension (simplexml_load_file() and simplexml_load_string()) is available as of PHP v5, but it can be ‘pretty useless when parsing mixed content’ (to use an expression from the PHP web site).
To escape these limitations we developed our own general-purpose XML parser. Originally written for use under PHP v4 with amazon.com ECS data, it has matured into a solid tool that we use for all our projects that require parsing XML data. It works perfectly with PHP v4 or v5.
It is fast, reliable and schema agnostic, and it returns all content, whether simple, complex or mixed, in an ordinary PHP array structure. In addition, it handles all operations, including retrieval of data from a local or remote source, and correctly handles error conditions.
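The mixed-content limitation is easy to reproduce. The following is a minimal sketch, assuming PHP 5 or later with the SimpleXML extension enabled; the sample markup and variable names are made up for illustration.

```php
<?php
// Mixed content: text interleaved with a child element.
$xml = simplexml_load_string('<note>Call <b>Alice</b> before noon</note>');

// Casting the element to a string returns only its own text nodes joined
// together, so the position of <b> within the sentence is lost.
$text = (string) $xml;

// The child is reachable by name, but with no record of where it sat.
$bold = (string) $xml->b;

echo $text, "\n", $bold, "\n";
```

The text on either side of the child element comes back as one string, and the child has to be fetched separately, so the original sentence cannot be rebuilt from these two values alone.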
How can I get it?
Our XML parser code represents much work and value for our own applications and customers, and is not freely available for download.
That said, neither do we restrict its use with licenses or other encumbrances.
For business applications we would like an opportunity to provide it with support to meet your needs. For others, just tell us how you would like to use it and we will try to accommodate you.
In any event, just ask! You will find us easy to work with and our PHP XML parser without equal for your applications!
Who uses it?
Our XML parser is in use on many production web sites and is used as a component of many command line utilities which routinely handle documents with thousands of elements and very large data sets.
Because it is the parser we use in our amazon online store systems, it is currently in use on dozens of online store web sites, where it has proven to be stable. In this application alone, it handles tens of thousands of product data requests each day.
How to use it
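Using our XML parser is as easy as this…
$source = 'FILENAME or URL of source XML';  // replace with the filename or URL of your XML source
include('xml_parser_class.php');
$xmlparser = new xml_parser_class();
// parse_source() returns the parsed array on success
if($parsed_data = $xmlparser->parse_source($source)){
    print_r($parsed_data);
}
else{
    // otherwise the status messages describe what went wrong
    print_r($xmlparser->status_msgs);
}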
There are several options that may be configured as defaults, or set at runtime:
- Select between HTML or plain text status messages
- Array element name prefix for attribute values
- Array element name for multi-valued XML element marker
- Array element name for simple data portion of complex data elements
- Default parser character encoding
- Name of the cached-time attribute added to the root of a cached XML source document
Additional functionality:
- May be used to cache remote data with added timestamp attribute
- May use PHP fopen() or Unix sockets for remote sources
- Generates status / error messages for all operations
- May be dynamically reset for parsing multiple documents
- Parses ANY XML document without advance knowledge of schema
- May be included in parent objects for easy extension and maintenance
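To illustrate the idea of caching a remote document with a timestamp attribute on its root, here is a minimal sketch. It is not the parser's own code: it uses PHP's DOMDocument, assumes PHP 7 or later, and the function names stamp_for_cache() and cache_is_fresh() and the attribute name cached_time are hypothetical.

```php
<?php
// Hypothetical helper: add a timestamp attribute to the document root and
// return the XML, ready to be written to a cache file.
function stamp_for_cache(string $xml, string $attr_name = 'cached_time'): string
{
    libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings
    $doc = new DOMDocument();
    if ($xml === '' || !$doc->loadXML($xml)) {
        throw new RuntimeException('Source is not well-formed XML');
    }
    $doc->documentElement->setAttribute($attr_name, (string) time());
    return $doc->saveXML();
}

// Hypothetical helper: decide whether a cached copy is still usable.
function cache_is_fresh(string $cached_xml, int $max_age, string $attr_name = 'cached_time'): bool
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if ($cached_xml === '' || !$doc->loadXML($cached_xml)) {
        return false;
    }
    // A missing attribute yields an empty string, which casts to 0.
    $stamp = (int) $doc->documentElement->getAttribute($attr_name);
    return $stamp > 0 && (time() - $stamp) <= $max_age;
}

$stamped = stamp_for_cache('<feed><item>1</item></feed>');
var_dump(cache_is_fresh($stamped, 3600));
```

Keeping the timestamp inside the document means the cached file carries its own age, so no separate bookkeeping file is needed.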
Example
For an example of how our parser produces an array from XML data, consider the following document…
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="http://allentech.net/xmldocs">
    <origin>
        <generator>Allen Technology text2xml, http://allentech.net</generator>
        <date>2008-02-13 18:56:36</date>
        <source>test_source.txt</source>
        <command>/usr/local/bin/text2xml -a test_source.txt</command>
    </origin>
    <line>A sample XML document.</line>
    <section number="1" name="A Section Heading">
        <document name="A dummy document element"/>
        <section number="1.1" name="A Sub-section Heading">
            Simple data in a complex element
            <line>This is a document line</line>
            <line>And another document line</line>
            <line>Three is the charm!</line>
        </section>
    </section>
</document>
Which produces this array structure…
Array
(
    [document] => Array
        (
            [ATT:xmlns] => http://allentech.net/xmldocs
            [origin] => Array
                (
                    [generator] => Allen Technology text2xml, http://allentech.net
                    [date] => 2008-02-13 18:56:36
                    [source] => test_source.txt
                    [command] => /usr/local/bin/text2xml -a test_source.txt
                )
            [line] => A sample XML document.
            [section] => Array
                (
                    [ATT:number] => 1
                    [ATT:name] => A Section Heading
                    [document] => Array
                        (
                            [ATT:name] => A dummy document element
                        )
                    [section] => Array
                        (
                            [ATT:number] => 1.1
                            [ATT:name] => A Sub-section Heading
                            [SDATA] => Simple data in a complex element
                            [line] => Array
                                (
                                    [MVT] => 1
                                    [0] => This is a document line
                                    [1] => And another document line
                                    [2] => Three is the charm!
                                )
                        )
                )
        )
)
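For readers curious how a structure like this can be built without knowing the schema, the following is a rough sketch using PHP's built-in event-based XML functions. It is not our parser's code: the function name xml_to_array_sketch() is hypothetical, the ATT:, SDATA and MVT names are hard-coded, and it assumes PHP 7 or later with the xml extension.

```php
<?php
// Sketch only: convert XML to a nested array using ATT:, SDATA and MVT keys.
function xml_to_array_sketch(string $xml): array
{
    $parser = xml_parser_create();
    // Keep element and attribute names in their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $root  = [];
    $stack = [];  // one frame per open element: name, data so far, raw text

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $data = [];
            foreach ($attrs as $key => $value) {
                $data['ATT:' . $key] = $value;
            }
            $stack[] = ['name' => $name, 'data' => $data, 'text' => ''];
        },
        function ($p, $name) use (&$stack, &$root) {
            $frame = array_pop($stack);
            $text  = trim($frame['text']);
            $value = $frame['data'];
            if ($value === []) {
                $value = $text;              // simple element: plain string
            } elseif ($text !== '') {
                $value['SDATA'] = $text;     // text inside a complex element
            }
            if ($stack === []) {
                $root[$name] = $value;       // closing the document element
                return;
            }
            $parent =& $stack[count($stack) - 1]['data'];
            if (!array_key_exists($name, $parent)) {
                $parent[$name] = $value;     // first occurrence
            } elseif (is_array($parent[$name]) && !empty($parent[$name]['MVT'])) {
                $parent[$name][] = $value;   // third and later occurrences
            } else {
                // second occurrence: switch to an indexed, MVT-marked list
                $parent[$name] = ['MVT' => true, $parent[$name], $value];
            }
        }
    );
    xml_set_character_data_handler($parser, function ($p, $chunk) use (&$stack) {
        if ($stack !== []) {
            $stack[count($stack) - 1]['text'] .= $chunk;
        }
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $root;
}

$result = xml_to_array_sketch(
    '<doc><a>one</a><s n="1">text<line>x</line><line>y</line></s><e k="v"/></doc>'
);
print_r($result);
```

This sketch takes shortcuts: SDATA is added when an element closes, so it appears after the child elements rather than before them, and text scattered between children is simply joined together. It also cannot tell an MVT marker apart from a child element that happens to be named MVT.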
Some things to note:
1. Multi-valued elements are indexed numerically, with a corresponding MVT element set to true
2. Element attribute names are prefixed with ATT:
3. The simple data content of a complex element is named SDATA
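Because an element may come back either as a single value or as an MVT-marked list, code that reads the array often wants to treat both cases the same way. The helper below is a small sketch of that idea; element_values() is a hypothetical name and not part of the parser class, and the sample data is copied from the inner section of the array shown above.

```php
<?php
// Hypothetical helper: always return a list of values for an element,
// whether the parser stored one value or an MVT-marked set.
function element_values($node, string $mvt_key = 'MVT'): array
{
    if (is_array($node) && !empty($node[$mvt_key])) {
        unset($node[$mvt_key]);        // drop the marker, keep 0, 1, 2, ...
        return array_values($node);
    }
    return [$node];                    // single value: wrap it in a list
}

// Fragment shaped like the inner <section> of the example array.
$section = [
    'ATT:number' => '1.1',
    'ATT:name'   => 'A Sub-section Heading',
    'SDATA'      => 'Simple data in a complex element',
    'line'       => [
        'MVT' => true,
        'This is a document line',
        'And another document line',
        'Three is the charm!',
    ],
];

foreach (element_values($section['line']) as $i => $line) {
    echo $i + 1, ': ', $line, "\n";
}
echo $section['ATT:name'], ' / ', $section['SDATA'], "\n";
```

With a helper like this, the same loop works whether the source document contained one line element or many.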