Python XML parsing from website

I am trying to parse XML from a website and I am stuck. The XML is quoted below. I have two questions: what is the best way to read XML from a website, and how do I dig into the XML to get the rate I need?

I need the number in base:OBS_VALUE, which is 0.12.

What I have so far:

    from xml.dom import minidom
    import urllib

    document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
    web = urllib.urlopen(document)
    get_web = web.read()
    xmldoc = minidom.parseString(document)

    ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0]
    ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0]

    for line in ff_series:
        price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data
        print(price)

XML from the website:

    <Header>
      <ID>FFD</ID>
      <Test>false</Test>
      <Name xml:lang="en">Federal Funds daily averages</Name>
      <Prepared>2013-05-08</Prepared>
      <Sender id="FRBNY">
        <Name xml:lang="en">Federal Reserve Bank of New York</Name>
        <Contact>
          <Name xml:lang="en">Public Information Web Team</Name>
          <Email>ny.piwebteam@ny.frb.org</Email>
        </Contact>
      </Sender>
      <!--ReportingBegin></ReportingBegin-->
    </Header>
    <ff:DataSet>
      <ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A">
        <ffbase:Key>
          <base:FREQ>D</base:FREQ>
          <base:RATE>FF</base:RATE>
          <base:MATURITY>O</base:MATURITY>
          <ffbase:FF_SCOPE>D</ffbase:FF_SCOPE>
        </ffbase:Key>
        <ff:Obs OBS_CONF="F" OBS_STATUS="A">
          <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
          <base:OBS_VALUE>0.12</base:OBS_VALUE>
2 answers

If you want to stick with xml.dom.minidom, try this:

    from xml.dom import minidom
    import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead

    url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
    # fetch the XML text from the URL
    xml_str = urllib.urlopen(url_str).read()
    xmldoc = minidom.parseString(xml_str)

    obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE')

    # prints the first base:OBS_VALUE it finds
    print(obs_values[0].firstChild.nodeValue)

    # prints the second base:OBS_VALUE it finds
    print(obs_values[1].firstChild.nodeValue)

    # prints all base:OBS_VALUE in the XML doc
    for obs_val in obs_values:
        print(obs_val.firstChild.nodeValue)

However, if you want to use lxml, use underrun's solution. Your original code also had an error: you were passing the document variable, which holds the web address, to the parser. You need to parse the XML returned from the site, which in your example is the get_web variable.
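
For readers on Python 3, the same minidom approach might look like the sketch below. It is not part of the original answer: the `fetch_xml` and `obs_values` helper names are made up for illustration, and the sample document uses placeholder namespace URIs (`urn:example:...`) because the real ones are not shown in the question.

```python
from urllib.request import urlopen
from xml.dom import minidom

URL = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'

# Cut-down sample modelled on the XML in the question.
# The namespace URIs are placeholders (assumption), not the real ones.
SAMPLE_XML = """<ff:DataSet xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:Series>
    <ff:Obs OBS_CONF="F" OBS_STATUS="A">
      <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
      <base:OBS_VALUE>0.12</base:OBS_VALUE>
    </ff:Obs>
  </ff:Series>
</ff:DataSet>"""


def fetch_xml(url):
    """Download the XML and return the raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def obs_values(xml_text):
    """Return the text of every base:OBS_VALUE element, in document order."""
    xmldoc = minidom.parseString(xml_text)  # parse the XML text, not the URL
    nodes = xmldoc.getElementsByTagName('base:OBS_VALUE')
    return [node.firstChild.nodeValue for node in nodes]


print(obs_values(SAMPLE_XML))  # ['0.12']
```

Against the live feed this would be called as `obs_values(fetch_xml(URL))`, assuming the URL still serves the same format.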


Take a look at your code:

    document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
    web = urllib.urlopen(document)
    get_web = web.read()
    xmldoc = minidom.parseString(document)

I'm not sure you have document set correctly, unless you really want http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr, because that is what you get: the parentheses only group in this case, and string literals listed next to each other are automatically concatenated.
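
A quick way to see the effect of the two adjacent string literals is to inspect the value directly. This small check uses only the assignment from the question:

```python
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')

# The parentheses only group the expression; without a comma there is no tuple.
print(type(document).__name__)  # str

# The two literals were joined, so the URL now ends in "dailyr".
print(document[-6:])  # dailyr
```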

After that, you do some work to create get_web, but then you don't use it on the next line. Instead, you try to parse document, which is the URL ...

Beyond that, I would strongly suggest using ElementTree, preferably lxml's ElementTree ( http://lxml.de/ ). lxml's etree parser also accepts a file-like object, which can be a urllib object. If you did that, after straightening out the rest of your code, you could do this:

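    from lxml import etree
    import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead

    url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
    root = etree.parse(urllib.urlopen(url))

    # XPath needs a prefix-to-URI map. This assumes the ff and base prefixes
    # are declared on the root element; a default namespace (key None) is
    # dropped because XPath does not accept it as a prefix.
    ns = dict((k, v) for k, v in root.getroot().nsmap.items() if k)

    for obs in root.xpath('//ff:DataSet/ff:Series/ff:Obs', namespaces=ns):
        # xpath() returns a list, so take the first match before reading .text
        price = obs.xpath('./base:OBS_VALUE', namespaces=ns)[0].text
        print(price)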
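
If installing lxml is not an option, the standard library's xml.etree.ElementTree can probably do a similar job. The sketch below is an illustration rather than the answer's own method: the `observations` helper is hypothetical, and the namespace URIs are placeholders because the question does not show the real xmlns declarations.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption); replace them with the URIs
# from the xmlns declarations of the real document.
NS = {'ff': 'urn:example:ff', 'base': 'urn:example:base'}

# Cut-down sample modelled on the XML in the question.
SAMPLE_XML = """<ff:DataSet xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:Series>
    <ff:Obs OBS_CONF="F" OBS_STATUS="A">
      <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
      <base:OBS_VALUE>0.12</base:OBS_VALUE>
    </ff:Obs>
  </ff:Series>
</ff:DataSet>"""


def observations(xml_text):
    """Return a list of (date, value) string pairs, one per ff:Obs element."""
    root = ET.fromstring(xml_text)
    result = []
    # './/ff:Obs' matches ff:Obs elements at any depth below the root
    for obs in root.iterfind('.//ff:Obs', NS):
        day = obs.findtext('base:TIME_PERIOD', namespaces=NS)
        price = obs.findtext('base:OBS_VALUE', namespaces=NS)
        result.append((day, price))
    return result


print(observations(SAMPLE_XML))  # [('2013-05-07', '0.12')]
```

The values come back as strings; convert with `float(price)` if you need to do arithmetic on the rate.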