How to get a very long XML string from an SQL database using R?

I have a script to get an XML file from an SQL database. Here is how I do it:

  library(RODBC)
  library(XML)

  myconn <- odbcConnect("mydsn")
  query.text <- "SELECT xmlfield FROM db WHERE id = 12345"
  doc <- sqlQuery(myconn, query.text, stringsAsFactors=FALSE)
  doc <- iconv(doc[1,1], from="latin1", to="UTF-8")
  doc <- xmlInternalTreeParse(doc, encoding="UTF-8")

However, parsing failed for one particular database row, although it worked when I copied the contents of this field into a separate file and parsed that. After two days of trial and error, I identified the main problem. Requesting small XML files this way causes no problems, but when I request larger files, the string is cut off after 65534 characters. The end of the XML file is therefore missing and the file cannot be parsed.
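A quick way to spot affected rows is to check whether a returned string has exactly the length at which the cut-off happens. This is only a minimal sketch: the helper name `looks_truncated` is made up for illustration, and the two default lengths are assumptions (65534 is the length observed here, 65535 is the limit named in the manual quoted in the answer below).

```r
# Hypothetical helper: flag strings whose byte length equals one of the
# suspected cut-off lengths (65534 as observed, 65535 as documented).
looks_truncated <- function(x, limits = c(65534L, 65535L)) {
  nchar(x, type = "bytes") %in% limits
}

short_doc <- "<root><item>ok</item></root>"
long_doc  <- strrep("a", 65534)  # stand-in for a field that was cut off

looks_truncated(short_doc)  # FALSE
looks_truncated(long_doc)   # TRUE
```

A field that happens to be exactly that long would be flagged too, so a hit is a hint to look closer, not proof.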

I thought this might be a general limitation of ODBC connections on my computer. However, another program that also uses ODBC to retrieve the same XML field from the same database does so without any problems. Therefore, I assume this is a problem with R.

Any ideas how to fix this?

sql r odbc
2 answers

I wrote to the author of the package and finally got the following answer:

Your inability to read is not my problem, and this is not a reasonable excuse.

The manual says

"\item[Character types] Character types can be classified in three ways: fixed or variable length, by the maximum size, and by the character set used. The most commonly used types\footnote{The SQL names for these are \code{CHARACTER VARYING} and \code{CHARACTER}, but these are too cumbersome for routine use.} are \code{varchar} for short strings of variable length (up to some maximum) and \code{char} for short strings of fixed length (usually right-padded with spaces). The value of 'short' differs by DBMS and is at least 254, often a few thousand - often other types will be available for longer character strings. There is a sanity check which will allow only strings of up to 65535 bytes when reading: this can be removed by recompiling \pkg{RODBC}."

This manual can be found in the doc directory of the RODBC package. The information is not included in the reference manual.

In the meantime, I found a good way to extract my data without using RODBC, so I did not try to recompile the package. But I hope this answer will be useful to others who run into the same problem.


If you want to change the source of RODBC and recompile it, it is easy enough to use github and the devtools package:
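A minimal sketch of that step, assuming you have put a copy of the RODBC sources on GitHub with the 65535-byte check in the C code edited out. The repository name `your-user/RODBC` and the wrapper `install_patched_rodbc` are placeholders, not a real fork.

```r
# Hypothetical fork: replace "your-user/RODBC" with a repository that holds
# the RODBC sources with the sanity check removed.
install_patched_rodbc <- function(repo = "your-user/RODBC") {
  # Install devtools first if it is not available yet.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Download the sources from GitHub, compile, and install them.
  devtools::install_github(repo)
}

# install_patched_rodbc()  # builds from source, so a C toolchain is needed
```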

Now you can read in long strings. However, you may get errors from attempts to allocate too much memory; the line following the sanity check is:

  thisHandle->ColData[i].pData = Calloc(nRows * (datalen + 1), char); 

Therefore, the easiest way to proceed is to set the rows_at_time = 1 argument in your sqlQuery call from R.
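Applied to the query from the question, that could look roughly like the sketch below. The DSN, table, and column names are the ones from the question; the wrapper function `fetch_xml` is made up for illustration and assumes a build of RODBC without the length check.

```r
# Hypothetical wrapper around the query from the question, reading one row
# at a time so the buffer is allocated for a single row only.
fetch_xml <- function(dsn, id) {
  conn <- RODBC::odbcConnect(dsn)
  on.exit(RODBC::odbcClose(conn))
  query <- sprintf("SELECT xmlfield FROM db WHERE id = %d", as.integer(id))
  res <- RODBC::sqlQuery(conn, query, stringsAsFactors = FALSE,
                         rows_at_time = 1)
  # On failure sqlQuery returns error messages instead of a data frame.
  if (!is.data.frame(res)) stop(paste(res, collapse = "\n"))
  res[1, 1]
}

# doc <- fetch_xml("mydsn", 12345)
```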

HTH

