How to remove comments from a raw SQL file

I need to strip comments and empty lines from an existing SQL file. The file has more than 10,000 lines, so cleaning it by hand is not an option.

I have a small Python script, but I don't know how to handle comment-like text inside multi-line INSERT statements.

The code:

    f = open('file.sql', 'r')
    t = list(filter(lambda x: not x.startswith('--')
                    and not x.isspace(), f.readlines()))
    f.close()
    t  # <- here the cleaned data should be

How it should work:

This should be removed:

 -- normal sql comment 

This should remain like this:

    CREATE FUNCTION func1(a integer) RETURNS void
        LANGUAGE plpgsql
        AS $$
    BEGIN
        -- comment
        [...]
    END;
    $$;

    INSERT INTO public.texts (multilinetext) VALUES ('
    and more lines here \'
    -- part of text \'
    [...]
    ');
5 answers

Try the sqlparse module.

Updated example: this leaves comments inside INSERT values and comments inside CREATE FUNCTION blocks untouched. You can adjust the behavior by extending the IGNORE set:

    import sqlparse
    from sqlparse import tokens

    queries = '''
    CREATE FUNCTION func1(a integer) RETURNS void
        LANGUAGE plpgsql
        AS $$
    BEGIN
        -- comment
    END;
    $$;
    SELECT -- comment
    * FROM -- comment
    TABLE foo;
    -- comment
    INSERT INTO foo VALUES ('a -- foo bar');
    INSERT INTO foo VALUES ('
    a
    -- foo bar'
    );
    '''

    IGNORE = set(['CREATE FUNCTION',])  # extend this

    def _filter(stmt, allow=0):
        # Collect the leading DDL/keyword tokens, e.g. "CREATE FUNCTION"
        ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
        start = ' '.join(d.value for d in ddl[:2])
        if ddl and start in IGNORE:
            allow = 1
        for tok in stmt.tokens:
            # Drop top-level comment tokens unless the statement is ignored
            if allow or not isinstance(tok, sqlparse.sql.Comment):
                yield tok

    for stmt in sqlparse.split(queries):
        sql = sqlparse.parse(stmt)[0]
        print(sqlparse.sql.TokenList([t for t in _filter(sql)]))

Output:

    CREATE FUNCTION func1(a integer) RETURNS void
        LANGUAGE plpgsql
        AS $$
    BEGIN
        -- comment
    END;
    $$;
    SELECT * FROM TABLE foo;
    INSERT INTO foo VALUES ('a -- foo bar');
    INSERT INTO foo VALUES ('
    a
    -- foo bar'
    );

Adding an updated answer :)

    import sqlparse

    sql_example = """--comment
    SELECT * from test;
    INSERT INTO test VALUES ('
    -- test
    a
    ');
    """
    print(sqlparse.format(sql_example, strip_comments=True).strip())

Output:

    SELECT * from test;
    INSERT INTO test VALUES ('
    -- test
    a
    ');

It achieves the same result, but it also covers the other corner cases and is more concise.


This is a variation on samplebias's answer that works with your example:

    import sqlparse

    sql_example = """--comment
    SELECT * from test;
    INSERT INTO test VALUES ('
    -- test
    a
    ');
    """

    new_sql = []
    for statement in sqlparse.parse(sql_example):
        # Keep every top-level token that is not a comment
        new_tokens = [stm for stm in statement.tokens
                      if not isinstance(stm, sqlparse.sql.Comment)]
        new_statement = sqlparse.sql.TokenList(new_tokens)
        # str() stands in for to_unicode(), which older sqlparse releases provided
        new_sql.append(str(new_statement))

    print(sqlparse.format("\n".join(new_sql)))

Output:

    SELECT * from test;
    INSERT INTO test VALUES ('
    -- test
    a
    ');

This can be done using regular expressions. First split the file on string literals, then strip the comments from the parts that are not strings. The following Perl program does this:

    #! /usr/bin/perl -w

    # Read whole file.
    my $file = join ('', <>);

    # Split by strings including the strings.
    my @major_parts = split (/('(?:[^'\\]++|\\.)*+')/, $file);

    foreach my $part (@major_parts) {
        if ($part =~ /^'/) {
            # Print the part if it is a string.
            print $part;
        }
        else {
            # Split by comments removing the comments
            my @minor_parts = split (/^--.*$/m, $part);
            # Print the remaining parts.
            print join ('', @minor_parts);
        }
    }
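For readers who would rather stay in Python, the split-on-strings idea from the Perl program can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a drop-in replacement: the names `STRING_RE`, `COMMENT_RE` and `strip_line_comments` are made up for illustration, only single-quoted literals with backslash escapes are recognized (dollar-quoted bodies such as `$$ ... $$` are not protected), and an apostrophe inside a comment would throw off the split.

```python
import re

# Single-quoted string literal with backslash escapes, mirroring the Perl pattern.
# The capture group makes re.split keep the literals in the result list.
STRING_RE = re.compile(r"('(?:[^'\\]|\\.)*')")

# A comment line: "--" at the start of a line up to (not including) the newline.
COMMENT_RE = re.compile(r"^--.*$", re.MULTILINE)


def strip_line_comments(sql):
    """Remove '--' comment lines that lie outside single-quoted strings."""
    parts = STRING_RE.split(sql)
    cleaned = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indices hold the captured string literals: keep them verbatim.
            cleaned.append(part)
        else:
            # Even indices hold ordinary SQL text: drop the comment lines.
            cleaned.append(COMMENT_RE.sub("", part))
    return "".join(cleaned)
```

`re.split` places the captured groups at odd indices, which is what lets the loop tell literals from ordinary SQL. As in the Perl version, a removed comment leaves its empty line behind, so blank lines would still need a separate pass.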

Another option is a single regular expression in Python:

    import re

    # Remove comments, i.e. lines beginning with optional whitespace and '--' (using the multi-line flag)
    query = re.sub(r'^\s*--.*\n?', '', query, flags=re.MULTILINE)

The regex explained:

  • ^ start of line
  • \s whitespace character
  • \s* zero or more whitespace characters
  • -- two hyphens (literal text)
  • .* zero or more of any character (i.e. the rest of the line)
  • \n newline character
  • ? makes the preceding newline optional, so a comment on the last line still matches
  • flags=re.MULTILINE (or re.M) multi-line modifier

"When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline)"

See the Python regex documentation for more details:

https://docs.python.org/3/library/re.html
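To see the pattern in action, here is a small sketch that applies it to a made-up query. The helper name `strip_comment_lines` and the sample SQL are assumptions for illustration only.

```python
import re

# Same pattern as above, compiled once and written as a raw string.
COMMENT_LINE = re.compile(r'^\s*--.*\n?', re.MULTILINE)


def strip_comment_lines(query):
    """Delete every line that consists only of optional whitespace and a '--' comment."""
    return COMMENT_LINE.sub('', query)


query = (
    "-- leading comment\n"
    "SELECT *\n"
    "    -- indented comment\n"
    "FROM test;\n"
)

print(strip_comment_lines(query))
```

Both comment lines disappear together with their newlines, leaving only the two-line SELECT. Because the pattern works line by line and has no notion of string literals, a line starting with `--` inside a multi-line quoted value would be removed as well.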
