Getting a URL with RCurl returns a different date format than the browser shows

I am trying to scrape a mobile-friendly webpage with RCurl at the following URL:

http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685

Using this code:

    library(RCurl)
    # Log the request/response and pretend to be a (Chrome) browser
    options(RCurlOptions = list(
      verbose = TRUE,
      useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.ABC Safari/525.13"
    ))
    # The URL must be a quoted string
    inurl <- getURL("http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685")

Note that I tried setting the user agent to a Chrome browser string; I get the same results with or without it. When I open the URL in Chrome, the dates are displayed with a time of day:

[Screenshot of the text on the web page]

The HTML source matches what is displayed:

 Last Updated: 24-Aug-2009 11:36<br> First Reported: 24-Aug-2009 11:24<br> 

But in R, after fetching the page, the dates are formatted like this, with no time component:

 Last Updated: 2009-08-24<br> First Reported: 2009-08-24<br> 

Any ideas what is going on here? I suspect the server varies its response depending on the user agent, the client (browser vs. curl), the region, the language or something similar, but I cannot work out which RCurl option I need to set to change this.

Answer

It looks like the server chooses the date format based on the Accept-Language request header, which a browser sends but RCurl does not send unless you add it:

    library(RCurl)
    getURL("http://m.fire.tas.gov.au/?pageId=incidentDetails&closed_incident_no=161685",
           httpheader = c("Accept-Language" = "en-US,en;q=0.5"))

This works for me: it returns First Reported: 24-Aug-2009 11:24<br> and so on. I found the header by inspecting the requests Firefox sends, using the HttpFox Firefox add-on.
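If you fetch many incident pages, it may help to wrap the header in a function and to check which date format actually came back. The following is a minimal sketch, assuming (as in the answer above) that Accept-Language is what the server keys on; fetch_with_language, first_reported and has_browser_format are hypothetical helper names, and the default header value is simply the one that worked above.

    # Sketch only. Assumption: the server picks its date format from the
    # Accept-Language request header, as reported in the answer above.
    fetch_with_language <- function(url, lang = "en-US,en;q=0.5") {
      # httpheader is passed through getURL's ... as a curl option
      RCurl::getURL(url, httpheader = c("Accept-Language" = lang))
    }

    # Hypothetical helper: extract the "First Reported" value from a single
    # HTML string; returns NA if the label is not found.
    first_reported <- function(html) {
      m <- regmatches(html, regexpr("First Reported: [^<]*", html))
      if (length(m) == 0) return(NA_character_)
      sub("^First Reported: ", "", m)
    }

    # TRUE for the browser-style "24-Aug-2009 11:24" format,
    # FALSE for the date-only "2009-08-24" format.
    has_browser_format <- function(stamp) {
      grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{4} [0-9]{2}:[0-9]{2}$", stamp)
    }

For example, has_browser_format(first_reported(fetch_with_language(url))) would be expected to return TRUE when the server honours the header, and FALSE if you still get the 2009-08-24 style. RCurl::getURL is called with the package prefix so the helpers can be defined before RCurl is attached.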
