Import an SQL query with comments into R from a file

Several posters have asked similar questions here, and their answers got me 80% of the way to reading text files containing SQL queries into R for use as input to RODBC:

Import multi-line SQL query in one line

RODBC temporary error when connecting to MS SQL Server

However, my SQL files contain a few comments (like --comment on this and that). My question is: how can I either strip the comment lines from a query on import, or make sure the resulting string keeps its line breaks, so that the actual query is not appended to a comment?

For example, query6.sql:

    --query 6
    select a6.column1,
    a6.column2,
    count(a6.column3) as counts
    --count the number of occurences in table 1
    from data.table a6
    group by a6.column1

becomes:

    sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
    sqlStr
    "--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurences in table 1from data.table a6 group by a6.column1"

when read into R.

6 answers

Are you sure you cannot just use it as is? This works even though it spans several lines and has a comment:

    > library(sqldf)
    > sql <- "select * -- my select statement
    + from BOD
    + "
    > sqldf(sql)
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8

This also works:

    > sql2 <- c("select * -- my select statement", "from BOD")
    > sql2.paste <- paste(sql2, collapse = "\n")
    > sqldf(sql2.paste)
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8
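
The same idea carries over to a query stored in a file: collapsing the lines with "\n" instead of a space keeps every -- comment terminated at the end of its own line. A minimal sketch, assuming a temporary file stands in for SQL/query6.sql (the names `sql_path` and `sql_str` are illustrative only):

```r
# Write the example query to a temporary file (a stand-in for SQL/query6.sql).
sql_path <- tempfile(fileext = ".sql")
writeLines(c(
  "--query 6",
  "select a6.column1,",
  "a6.column2,",
  "count(a6.column3) as counts",
  "--count the number of occurences in table 1",
  "from data.table a6",
  "group by a6.column1"
), sql_path)

# Collapse with a newline so each "--" comment ends where its line ends.
sql_str <- paste(readLines(sql_path), collapse = "\n")
cat(sql_str, "\n")

unlink(sql_path)
```

Whether the database driver accepts embedded newlines and comments depends on the backend, so this is worth trying against the actual connection before stripping anything.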

I had problems with the other answer, so I modified Roman's and made a small function. It has worked for all my test cases, including multiple comments, full-line comments and partial-line comments.

    read.sql <- function(filename, silent = TRUE) {
        q <- readLines(filename, warn = !silent)
        q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
        q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
        q <- paste(q, collapse = " ")
        return(q)
    }
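
For illustration, here is a hedged usage sketch of read.sql on a temporary file. The function body is repeated only so the snippet runs on its own; the file contents and the names `tmp` and `cleaned` are made up for the example:

```r
# Copy of read.sql from above so this snippet is self-contained.
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)]          # drop full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q) # cut trailing comments
  paste(q, collapse = " ")
}

# Hypothetical test file with a full-line and a trailing comment.
tmp <- tempfile(fileext = ".sql")
writeLines(c(
  "--query 6",
  "select a6.column1, --trailing comment",
  "count(a6.column3) as counts",
  "from data.table a6",
  "group by a6.column1"
), tmp)

cleaned <- read.sql(tmp)
print(cleaned)

unlink(tmp)
```

One caveat: the pattern "--.*" knows nothing about SQL quoting, so a -- inside a string literal would be cut off as well.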

Something like this?

    > cat("--query 6
    + select a6.column1, 
    + a6.column2,
    + count(a6.column3) as counts
    + --count the number of occurences in table 1 
    + from data.table a6
    + group by a6.column1", file = "query6.sql")
    > 
    > my.q <- readLines("query6.sql")
    Warning message:
    In readLines("query6.sql") : incomplete final line found on 'query6.sql'
    > my.q
    [1] "--query 6"                                    "select a6.column1, "
    [3] "a6.column2,"                                  "count(a6.column3) as counts"
    [5] "--count the number of occurences in table 1 " "from data.table a6"
    [7] "group by a6.column1"
    > find.com <- grepl("--", my.q)
    > 
    > my.q <- my.q[!find.com]
    > paste(my.q, collapse = " ")
    [1] "select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1"
    > 
    > unlink("query6.sql")
    > rm(list = ls())
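
Note that grepl("--", ...) flags any line containing a comment, so a line that holds both code and a trailing comment is dropped entirely. A small sketch of the difference, using made-up lines (the names `dropped` and `stripped` are illustrative only):

```r
my.q <- c("select a6.column1, --trailing comment",
          "a6.column2",
          "from data.table a6")

# Whole-line filter: the first line is removed together with its comment.
dropped <- my.q[!grepl("--", my.q)]

# Cutting from "--" to the end of each line keeps the code before the comment.
stripped <- sub("--.*", "", my.q)

print(paste(dropped, collapse = " "))
print(paste(stripped, collapse = " "))
```

The whole-line filter is fine when comments always sit on their own lines, as in query6.sql; otherwise the per-line substitution is the safer of the two.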

I had to solve a similar problem recently in a different language, and I still find it easier to implement in R:

    readSQLFile <- function(fname, retainNewLines=FALSE) {
        lines <- readLines(fname)

        #remove -- type comments
        lines <- vapply(lines, function(x) {
            #handle /* -- */ type comments
            if (grepl("/\\*(.*)--", x))
                return(x)
            strsplit(x,"--")[[1]][1]
        }, character(1))

        #remove /* */ type comments
        sqlstr <- paste(lines, collapse=ifelse(retainNewLines, "&&&&&&&&&&" , " "))
        sqlstr <- gsub("/\\*(.|\n)*?\\*/","",sqlstr)
        if (retainNewLines) {
            sqlstr <- strsplit(sqlstr, "&&&&&&&&&&")[[1]]
            sqlstr <- sqlstr[sqlstr!=""]
        }

        sqlstr
    } #readSQLFile

    #example
    fname <- tempfile("sql",fileext=".sql")
    cat("--query 6
    select a6.column1, --trailing comments
    a6.column2, ---test triple -
    count(a6.column3) as counts, --/* funny comment */
    a6.column3 - a6.column4 ---test single -
    /*count the number of occurences in table 1;
    test another comment style
    */
    from data.table a6 /* --1st weirdo comment */
    /* --2nd weirdo comment */
    group by a6.column1\n", file=fname)

    #remove new lines
    readSQLFile(fname)

    #retain new lines
    readSQLFile(fname, TRUE)

    unlink(fname)

You can use readChar() instead of readLines(). I had persistent problems with mixed comments (-- or /* */), and this has always worked well for me.

    sql <- readChar(path.to.file, file.size(path.to.file))
    query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
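
The reason this helps is that readChar() returns the file as a single string with its line breaks intact, so -- comments still end at the end of their line. A minimal sketch without a database connection, assuming a temporary file with made-up contents (`tmp` is a hypothetical name):

```r
# Hypothetical two-line query with a trailing "--" comment.
tmp <- tempfile(fileext = ".sql")
writeLines(c("select * -- my select statement", "from BOD"), tmp)

# readChar() reads the whole file into one string, newlines included.
sql <- readChar(tmp, file.size(tmp))
cat(sql)

unlink(tmp)
```

The resulting string can then be passed to sqlQuery() as in the snippet above, provided the backend accepts comments in the query text.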

Summary

clean_query function:

  • Removes all mixed comments.
  • Creates single line output
  • Accepts an SQL path or text string
  • Simple

Function

    require(tidyverse)

    # pass in either a text query or path to a sql file
    clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){

      # if sql path, read, otherwise assume text input
      if( str_detect(text_or_path, "(?i)\\.sql$") ){

        text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")

      }

      # echo original query to the console
      # (unnecessary, but helpful for status if passing sequential queries to a db)
      cat("\nThe query you're processing is: \n", text_or_path, "\n\n")

      # return
      text_or_path %>%
        # remove all demarked /* */ sql comments
        gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
        # remove all demarked -- comments
        gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
        # remove everything after the query-end semicolon
        gsub(pattern = ';.*', replacement = ' ') %>%
        #remove any line break, tab, etc.
        gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
        # remove extra whitespace
        gsub(pattern = ' +', replacement = ' ')

    }

You could chain the regular expressions together into one obscurely long expression if you wanted to, but I recommend readable code.



Output for "query6.sql"

 [1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 " 



Additional text input example

    query <- "
    /* this query has intentionally messy comments */
    Select
       COL_A -- with a comment here
      ,COL_B
      ,COL_C
    FROM
      -- and some helpful comment here
      Database.Datatable
    ;
    -- or wherever
    /* and some more comments here */
    "

Call Function:

 clean_query(query) 

Output:

 [1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable " 



If you want to check reading from a .sql file:

    temp_path <- path.expand("~/query.sql")
    cat(query, file = temp_path)
    clean_query(temp_path)
    file.remove(temp_path)
