Removing a newline from a CSV file

I am trying to process a CSV file in Python that has a ^M character in the middle of each row, which acts as a newline. I cannot open the file in any mode except "rU".

If I open the file in "rU" mode, it reads the ^M as a newline and splits the row there (creating a new line), which gives me twice as many lines.

I want to remove that newline completely. How?

1 answer

Please note that the docs say:

csvfile can be any object that supports the iterator protocol and returns a string each time the next() method is called. File objects and list objects are suitable.
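
To make the quoted sentence concrete, here is a minimal sketch (Python 3, where the iterator method is spelled `__next__()` rather than `next()`). The sample rows are made up for illustration; the point is only that `csv.reader` is equally happy with a list or a generator of strings.

```python
import csv

# csv.reader accepts any iterable of strings, not just an open file.
lines = ["name,score\n", "alice,10\n", "bob,7\n"]  # hypothetical sample data
rows_from_list = list(csv.reader(lines))

# A generator works too, which is what makes pre-filtering possible.
upper = (line.upper() for line in lines)
rows_from_generator = list(csv.reader(upper))

print(rows_from_list)
print(rows_from_generator)
```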

So you can always attach a filter to the file before passing it to a reader or DictReader. Instead of this:

    with open('myfile.csv', 'rU') as myfile:
        for row in csv.reader(myfile):
            ...  # handle each row here

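Do this:

    import csv

    # Python 3: newline='\n' splits lines only on '\n', so each stray '\r'
    # stays inside its line for the filter to remove. Universal-newline mode
    # ('rU') would already have turned it into a line break by this point.
    with open('myfile.csv', newline='\n') as myfile:
        filtered = (line.replace('\r', '') for line in myfile)
        for row in csv.reader(filtered):
            ...  # handle each row here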

That '\r' is how you write ^M in Python (and C). So this simply strips out every ^M character, wherever it appears, by replacing each one with an empty string.
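
One caveat is worth checking on your own data: the filter can only remove a '\r' that is still there when the line reaches it. The sketch below (Python 3) compares universal-newline reading with `newline='\n'` on a throwaway file. The helper name `read_rows` and the sample bytes are assumptions made up for this illustration.

```python
import csv
import os
import tempfile

def read_rows(path, newline):
    """Read CSV rows from path, stripping carriage returns from each line."""
    with open(path, newline=newline) as f:
        filtered = (line.replace('\r', '') for line in f)
        return list(csv.reader(filtered))

# Hypothetical sample: the second record has a stray '\r' inside a field.
sample = b"id,comment\r\n1,hello \rworld\r\n"
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(sample)

universal = read_rows(path, newline=None)  # '\r' is translated to '\n' first
only_lf = read_rows(path, newline="\n")    # '\r' survives until the filter
os.remove(path)

print(universal)
print(only_lf)
```

With universal newlines the record is already split in two before the filter runs, so three rows come back; with `newline='\n'` the filter sees the '\r', removes it, and two rows come back.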


In reply to the comment "I guess I want to change the file permanently, rather than filter it":

First, if you want to modify the file before running your Python script on it, why not do it from outside of Python? sed, tr, many text editors, and so on can all do this for you. Here is a GNU sed example:

 gsed -i'' 's/\r//g' myfile.csv 

But if you want to do it in Python, it is not much more verbose, and you may find it more readable, so:

You cannot really change a file in place if you want to insert or delete from the middle. The usual solution is to write a new file and then either move the new file over the old one (Unix only) or delete the old one (cross-platform).

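Cross-platform version:

    import os

    os.rename('myfile.csv', 'myfile.csv.bak')
    # newline='\n' on both sides: read without translating '\r',
    # and write '\n' unchanged on every platform.
    with open('myfile.csv.bak', newline='\n') as infile, \
            open('myfile.csv', 'w', newline='\n') as outfile:
        for line in infile:
            outfile.write(line.replace('\r', ''))
    os.remove('myfile.csv.bak')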

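Less awkward, but Unix-only, version:

    import os
    import tempfile

    # dir='.' keeps the temporary file on the same filesystem as the target,
    # so the rename at the end cannot fail by crossing devices.
    with open('myfile.csv', newline='\n') as myfile, \
            tempfile.NamedTemporaryFile('w', newline='\n', dir='.',
                                        delete=False) as temp:
        for line in myfile:
            temp.write(line.replace('\r', ''))
    os.rename(temp.name, 'myfile.csv')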
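
If you need this more than once, the same write-then-swap idea can be wrapped in a function. This is only a sketch under a few assumptions: Python 3.3 or later, a function name (`strip_carriage_returns`) invented here, and `os.replace` swapped in for `os.rename` because it overwrites an existing destination on Windows as well as Unix.

```python
import os
import tempfile

def strip_carriage_returns(path):
    """Rewrite the file at `path` with every carriage return removed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final swap
    # stays on the same filesystem.
    with open(path, newline='\n') as src, \
            tempfile.NamedTemporaryFile('w', newline='\n', dir=directory,
                                        delete=False) as tmp:
        for line in src:
            tmp.write(line.replace('\r', ''))
    # os.replace overwrites an existing destination on Unix and Windows.
    os.replace(tmp.name, path)

# Hypothetical demo on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(b"1,hello \rworld\r\n2,ok\r\n")
strip_carriage_returns(demo_path)
with open(demo_path, 'rb') as f:
    cleaned = f.read()
os.remove(demo_path)
print(cleaned)
```

Note that this removes the '\r' of any '\r\n' line endings too, so the result uses plain '\n' terminators.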