How to parse this OFX file?

This is the original file, as it comes from my bank (don't worry, there's nothing sensitive; I cut out the middle part with all the transactions).

Open Financial Exchange (OFX) is a data-stream format for exchanging financial information that evolved from Microsoft's Open Financial Connectivity (OFC) and Intuit's Open Exchange file formats.

Now I need to parse it. I have already seen that question, but this is not a duplicate, because I am interested in how to do it.

I'm sure I could come up with some clever regular expressions that would do the job, but that is ugly and error-prone (the format may change, some fields may be missing, the formatting or whitespace may differ, etc.).

    OFXHEADER:100
    DATA:OFXSGML
    VERSION:102
    SECURITY:NONE
    ENCODING:USASCII
    CHARSET:1252
    COMPRESSION:NONE
    OLDFILEUID:NONE
    NEWFILEUID:NONE

    <OFX>
    <SIGNONMSGSRSV1>
    <SONRS>
    <STATUS>
    <CODE>0
    <SEVERITY>INFO
    </STATUS>
    <DTSERVER>20110420000000[+1:CET]
    <LANGUAGE>ENG
    </SONRS>
    </SIGNONMSGSRSV1>
    <BANKMSGSRSV1>
    <STMTTRNRS>
    <TRNUID>1
    <STATUS>
    <CODE>0
    <SEVERITY>INFO
    </STATUS>
    <STMTRS>
    <CURDEF>EUR
    <BANKACCTFROM>
    <BANKID>20404
    <ACCTID>02608983629
    <ACCTTYPE>CHECKING
    </BANKACCTFROM>
    <BANKTRANLIST>
    <DTSTART>20110207
    <DTEND>20110419
    <STMTTRN>
    <TRNTYPE>XFER
    <DTPOSTED>20110205000000[+1:CET]
    <TRNAMT>-6.12
    <FITID>C74BD430D5FF2521
    <NAME>unbekannt
    <MEMO>BILLA DANKT 1265P K2 05.02.UM 17.49
    </STMTTRN>
    <STMTTRN>
    <TRNTYPE>XFER
    <DTPOSTED>20110207000000[+1:CET]
    <TRNAMT>-10.00
    <FITID>C74BE0F90A657901
    <NAME>unbekannt
    <MEMO>AUTOMAT 13177 KARTE2 07.02.UM 10:22
    </STMTTRN>
    ............................. goes on like this ........................
    <STMTTRN>
    <TRNTYPE>XFER
    <DTPOSTED>20110418000000[+1:CET]
    <TRNAMT>-9.45
    <FITID>C7A5071492D14D29
    <NAME>unbekannt
    <MEMO>HOFER DANKT 0408P K2 18.04.UM 18.47
    </STMTTRN>
    </BANKTRANLIST>
    <LEDGERBAL>
    <BALAMT>1992.29
    <DTASOF>20110420000000[+1:CET]
    </LEDGERBAL>
    </STMTRS>
    </STMTTRNRS>
    </BANKMSGSRSV1>
    </OFX>

I am currently using this code, which gives me the desired result:

    <?php
    $files = array();
    $files[] = '***_2011001.ofx';
    $files[] = '***_2011002.ofx';
    $files[] = '***_2011003.ofx';

    system('touch file.csv && chmod 777 file.csv');
    $fp = fopen('file.csv', 'w');

    foreach ($files as $file) {
        echo $file."...\n";
        $content = file_get_contents($file);
        // Collapse the file onto one line; note this also strips spaces inside values such as MEMO
        $content = str_replace("\n", "", $content);
        $content = str_replace(" ", "", $content);

        $regex = '|<STMTTRN><TRNTYPE>(.+?)<DTPOSTED>(.+?)<TRNAMT>(.+?)<FITID>(.+?)<NAME>(.+?)<MEMO>(.+?)</STMTTRN>|';
        echo preg_match_all($regex, $content, $matches, PREG_SET_ORDER)." matches... \n";

        foreach ($matches as $match) {
            echo ".";
            array_shift($match); // drop the full match, keep only the captured fields
            fputcsv($fp, $match);
        }
        echo "\n";
    }
    echo "done.\n";
    fclose($fp);

It's really ugly, and if this were a valid XML file I would personally kill myself for it, but how can I do it better?

2 answers

Your code seems fine, given that the file is not XML or even valid SGML. The only thing you could do is try to write a more general SAX-like parser. That is, you read the input stream one block at a time (where a block can be anything, for example a line or just a fixed number of characters), and you call a callback function every time you come across an <ELEMENT>. You could even go as fancy as writing a parser class in which you register callback functions that listen for certain elements.

It would be more general and less ugly (for some definition of ugly), but it would also be more code to maintain. It's nice to do and nice to have if you need to parse this file format a lot (or in many different variations). If the code you posted is the only place where you do this, then just KISS.
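A minimal sketch of such a callback-based reader might look like the following. The class name `OfxSgmlReader` and its `on()` / `parse()` methods are made up for illustration and are not part of any library. To keep it short it tokenizes the whole string at once instead of reading the stream block by block, and it assumes values never contain a literal `<`.

```php
<?php
// Hypothetical callback-driven reader; the names are illustrative only.
class OfxSgmlReader
{
    // Registered callbacks, keyed by element name (e.g. 'TRNAMT' or '/STMTTRN').
    private $handlers = array();

    // Register a callback that fires whenever <$element> is encountered.
    public function on($element, callable $callback)
    {
        $this->handlers[$element][] = $callback;
    }

    // Walk the input tag by tag and fire the matching callbacks.
    public function parse($content)
    {
        // Each token is "<TAG>" or "</TAG>" plus any text up to the next "<".
        preg_match_all('~<(/?[A-Z0-9.]+)>([^<]*)~', $content, $tokens, PREG_SET_ORDER);
        foreach ($tokens as $token) {
            $element = $token[1];
            $value = trim($token[2]);
            if (isset($this->handlers[$element])) {
                foreach ($this->handlers[$element] as $callback) {
                    $callback($value, $element);
                }
            }
        }
    }
}

// Two transactions taken from the file in the question.
$sample = <<<'OFX'
<BANKTRANLIST> <DTSTART>20110207 <DTEND>20110419
<STMTTRN> <TRNTYPE>XFER <DTPOSTED>20110205000000[+1:CET] <TRNAMT>-6.12
<FITID>C74BD430D5FF2521 <NAME>unbekannt <MEMO>BILLA DANKT 1265P K2 05.02.UM 17.49 </STMTTRN>
<STMTTRN> <TRNTYPE>XFER <DTPOSTED>20110207000000[+1:CET] <TRNAMT>-10.00
<FITID>C74BE0F90A657901 <NAME>unbekannt <MEMO>AUTOMAT 13177 KARTE2 07.02.UM 10:22 </STMTTRN>
</BANKTRANLIST>
OFX;

$rows = array();
$current = array();
$reader = new OfxSgmlReader();

// Start a fresh record on <STMTTRN> and store it on </STMTTRN>.
$reader->on('STMTTRN', function () use (&$current) {
    $current = array();
});
$reader->on('/STMTTRN', function () use (&$rows, &$current) {
    $rows[] = $current;
});

// Collect the leaf elements we care about into the current record.
foreach (array('TRNTYPE', 'DTPOSTED', 'TRNAMT', 'FITID', 'NAME', 'MEMO') as $field) {
    $reader->on($field, function ($value, $element) use (&$current) {
        $current[$element] = $value;
    });
}

$reader->parse($sample);
```

After `parse()` returns, `$rows` holds one associative array per transaction, which can be passed to `fputcsv()` as in the question. Unlike the single long regex, a missing or reordered field only leaves a key unset instead of dropping the whole transaction.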

    // Load data string; $fLoc is the path to the OFX file
    $str = file_get_contents($fLoc);
    $MArr = array(); // Final assembled master array

    // Fetch all transactions
    preg_match_all("/<STMTTRN>(.*)<\/STMTTRN>/msU", $str, $m);
    if ( !empty($m[1]) ) {
        $recArr = $m[1];
        unset($str, $m);
        // Parse each transaction record
        foreach ( $recArr as $i => $str ) {
            $_arr = array();
            preg_match_all("/(^\s*<(?'key'.*)>(?'val'.*)\s*$)/m", $str, $m);
            foreach ( $m["key"] as $j => $key ) {
                $_arr[$key] = trim($m["val"][$j]); // Reassemble array key => val
            }
            array_push($MArr, $_arr);
        }
    }
    print_r($MArr);
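The per-record regex above anchors on line starts and ends (`^` and `$` with the `m` flag), so it expects one element per line. If the whitespace in the file might vary, a variant that reads each value up to the next `<` may be more forgiving. This is only a sketch: `parseOfxRecord` is a hypothetical helper name, and it assumes values never contain a literal `<`.

```php
<?php
// Hypothetical helper: turn the body of one <STMTTRN> block into a
// key => value array without relying on line breaks.
function parseOfxRecord($record)
{
    $fields = array();
    // A leaf element is "<KEY>" followed by its value, which runs until the next "<".
    preg_match_all('~<([A-Z0-9.]+)>([^<]*)~', $record, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $fields[$pair[1]] = trim($pair[2]);
    }
    return $fields;
}

// First transaction from the question, all on one line.
$record = ' <TRNTYPE>XFER <DTPOSTED>20110205000000[+1:CET] <TRNAMT>-6.12 '
        . '<FITID>C74BD430D5FF2521 <NAME>unbekannt <MEMO>BILLA DANKT 1265P K2 05.02.UM 17.49 ';
$fields = parseOfxRecord($record);
```

It could stand in for the inner `preg_match_all` and `foreach` in the loop above, for example `array_push($MArr, parseOfxRecord($str));`.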
