Well, in the first place, even though XML is a subset of SGML, a valid SGML file does not have to be a well-formed XML file. XML is stricter and does not use all the features that SGML offers.
Since DOMDocument is based on XML (not SGML), it is not fully compatible with SGML input.
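Here is a minimal sketch that shows the incompatibility in practice. In this OFX-style fragment `<CODE>` has no end tag, which SGML allows, so a strict `DOMDocument::loadXML()` call rejects it. The fragment is made up for illustration.

```php
<?php
// Minimal sketch: an OFX-style SGML fragment whose <CODE> element has no
// end tag is not well-formed XML, so a strict loadXML() call fails.
$sgml = "<OFX>\n<STATUS>\n<CODE>0\n</STATUS>\n</OFX>";

$save = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$doc = new DOMDocument();
$ok = $doc->loadXML($sgml);               // false: the input is not well-formed
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($save);

echo $ok ? "loaded\n" : "rejected with " . count($errors) . " libxml error(s)\n";
```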
Aside from this problem, please see 2.2 Open Financial Exchange Headers in Ofexfin1.doc, which explains that:
The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header
and further:
A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.
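The layout described in the spec (headers, a blank line, then the SGML body) can be sketched for a file that is already in memory as a string. This is a minimal sketch: the function name `splitOfxHeaders` is hypothetical and not part of the spec or any library, and it assumes the headers are plain `KEY:VALUE` lines.

```php
<?php
/**
 * Hypothetical helper (not from the OFX spec or any library):
 * splits an OFX 1.x document held in memory into its header map
 * and the SGML body that starts after the first blank line.
 */
function splitOfxHeaders(string $raw): array
{
    // The first empty (or whitespace-only) line separates headers from body.
    $parts = preg_split('/\R[ \t]*\R/', ltrim($raw), 2);
    $headerBlock = $parts[0];
    $body = $parts[1] ?? '';

    $headers = [];
    foreach (preg_split('/\R/', $headerBlock) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ':') === false) {
            continue; // skip anything that is not a KEY:VALUE pair
        }
        [$key, $value] = explode(':', $line, 2);
        $headers[$key] = $value;
    }
    return [$headers, $body];
}

// Usage with a shortened, made-up OFX file:
$sample = "OFXHEADER:100\r\nDATA:OFXSGML\r\nCHARSET:1252\r\n\r\n<OFX>\r\n<SIGNONMSGSRSV1>\r\n";
[$headers, $body] = splitOfxHeaders($sample);
```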
So, find the first blank line and treat everything before it as headers. Then load the SGML part (everything after that line) into a DOMDocument, after first converting the SGML to XML:
    $source = fopen('file.ofx', 'r');
    if (!$source) {
        throw new Exception('Unable to open OFX file.');
    }

    // read the OFX headers (KEY:VALUE lines) up to the first blank line
    $headers = array();
    $charsets = array(
        1252 => 'WINDOWS-1252',
    );
    while (!feof($source)) {
        $line = trim(fgets($source));
        if ($line === '') {
            break;
        }
        list($header, $value) = explode(':', $line, 2);
        $headers[$header] = $value;
    }

    // fall back to UTF-8 when the CHARSET header is missing or not mapped
    $charset = isset($headers['CHARSET'], $charsets[$headers['CHARSET']])
        ? $charsets[$headers['CHARSET']]
        : 'UTF-8';

    $buffer = '';

    // dead-cheap SGML to XML conversion
    // see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
    while (!feof($source)) {
        $line = trim(fgets($source));
        if ($line === '') {
            continue;
        }
        $line = iconv($charset, 'UTF-8', $line);
        // a line like "<CODE>0" has data but no end tag: append "</CODE>"
        if (substr($line, -1, 1) !== '>') {
            list($tag) = explode('>', $line, 2);
            $line .= '</' . substr($tag, 1) . '>';
        }
        $buffer .= $line . "\n";
    }
    fclose($source);

    // use DOMDocument with non-standard recover mode
    $doc = new DOMDocument();
    $doc->recover = true;
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;

    $save = libxml_use_internal_errors(true);
    $doc->loadXML($buffer);
    libxml_clear_errors();
    libxml_use_internal_errors($save);

    echo $doc->saveXML();
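The tag-closing rule in the second loop is easier to reason about in isolation. The sketch below moves it into a hypothetical `sgmlLineToXml()` function. As an addition that is not in the code above, it escapes the element value with `htmlspecialchars()`, so a raw `&` in data such as a payee name does not break `loadXML()`. `double_encode` is set to `false` so values that already use entities like `&amp;` are not escaped twice. The function still assumes one element per line.

```php
<?php
// Hypothetical helper isolating the "dead-cheap" rule: a line such as
// "<CODE>0" (data, but no end tag) gets "</CODE>" appended. Lines that
// already end in ">" (aggregate open/close tags) are returned unchanged.
function sgmlLineToXml(string $line): string
{
    $line = trim($line);
    if ($line === '' || substr($line, -1) === '>') {
        return $line;
    }
    $parts = explode('>', $line, 2);   // e.g. ["<CODE", "0"]
    if (count($parts) < 2) {
        return $line;                  // no tag on this line: leave it alone
    }
    [$tag, $value] = $parts;
    $name = substr($tag, 1);
    // Escape the data (addition, not in the original loop) so "&" stays valid XML.
    $value = htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES, 'UTF-8', false);
    return '<' . $name . '>' . $value . '</' . $name . '>';
}
```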
This code example outputs the following (reformatted) XML, which also shows that the DOMDocument loaded the data correctly:
    <?xml version="1.0"?>
    <OFX>
      <SIGNONMSGSRSV1>
        <SONRS>
          <STATUS>
            <CODE>0</CODE>
            <SEVERITY>INFO</SEVERITY>
          </STATUS>
          <DTSERVER>20130331073401</DTSERVER>
          <LANGUAGE>SPA</LANGUAGE>
        </SONRS>
      </SIGNONMSGSRSV1>
      <BANKMSGSRSV1>
        <STMTTRNRS>
          <TRNUID>0</TRNUID>
          <STATUS>
            <CODE>0</CODE>
            <SEVERITY>INFO</SEVERITY>
          </STATUS>
          <STMTRS>
            <CURDEF>COP</CURDEF>
            <BANKACCTFROM> ...</BANKACCTFROM>
          </STMTRS>
        </STMTTRNRS>
      </BANKMSGSRSV1>
    </OFX>
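Once the document loads, ordinary DOM tools work on it. Here is a minimal sketch using `DOMXPath`. The XML string is a trimmed copy of the sign-on part of the output shown above, and the XPath expressions assume the element nesting shown there.

```php
<?php
// Minimal sketch: read individual values from the converted document.
$xml = <<<XML
<OFX>
  <SIGNONMSGSRSV1>
    <SONRS>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <DTSERVER>20130331073401</DTSERVER>
      <LANGUAGE>SPA</LANGUAGE>
    </SONRS>
  </SIGNONMSGSRSV1>
</OFX>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Wrapping the path in string(...) makes evaluate() return the text directly.
$code     = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/STATUS/CODE)');
$server   = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/DTSERVER)');
$language = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/LANGUAGE)');

printf("code=%s server=%s language=%s\n", $code, $server, $language);
```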
I don't know whether this can be validated against the DTD. Maybe it works. Also, if the SGML does not put each tagged value on a line of its own (exactly one element per line), this fragile conversion will break.
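If you rely on this conversion anyway, it may be worth checking its precondition first instead of silently producing a wrong tree. Here is a minimal sketch with a hypothetical `fitsOneElementPerLine()` function. Every line that the conversion would close must look exactly like `<TAG>value`; otherwise the input is rejected. This does not validate against the OFX DTD.

```php
<?php
// Hypothetical guard: returns false when a line the line-based conversion
// would close (one not ending in ">") is anything other than "<TAG>value",
// e.g. "<STATUS><CODE>0", which would be closed with the wrong tag.
function fitsOneElementPerLine(string $sgml): bool
{
    foreach (preg_split('/\R/', $sgml) as $line) {
        $line = trim($line);
        if ($line === '' || substr($line, -1) === '>') {
            continue; // the conversion leaves these lines untouched
        }
        if (!preg_match('/^<[A-Za-z0-9._]+>[^<>]*$/', $line)) {
            return false;
        }
    }
    return true;
}
```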