Garbage in a file after truncate(0) in Python

Suppose there is a test.txt file containing the string 'test'.

Now consider the following Python code:

    f = open('test.txt', 'r+')
    f.read()
    f.truncate(0)
    f.write('passed')
    f.flush()

I expect test.txt to contain 'passed' now, but instead there are some weird characters in test.txt!

Update: flush after truncate does not help.

+6
6 answers

This is because truncation does not change the position of the stream.

When you read() the file, you move the position to the end, so subsequent writes go to the file from that position. However, when you call flush(), it seems that it not only tries to write the buffer to the file but also does some error checking and fixes the current file position. When flush() is called after truncate(0), it writes nothing (the buffer is empty), then checks the file size and places the position at the first applicable place (which is 0).
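To make the position behaviour concrete, here is a minimal sketch. It assumes Python 3 and uses binary mode so that byte offsets are directly visible; the temporary path is only illustrative and not part of the question.

```python
import os
import tempfile

# Create a scratch file containing b'test' (illustrative path, not from the question).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write(b'test')

with open(path, 'rb+') as f:
    f.read()                        # position is now 4, the end of the file
    f.truncate(0)                   # the file shrinks to 0 bytes...
    pos_after_truncate = f.tell()   # ...but the position has not moved
    f.write(b'passed')              # so this write starts at the old offset

with open(path, 'rb') as f:
    data = f.read()

print(pos_after_truncate)
print(data)
```

On a POSIX system the position stays at 4 and the file ends up as four NUL bytes followed by `passed`, which is the "garbage" from the question. What fills the gap on other platforms may differ.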

UPDATE

Python's file functions are NOT just wrappers around their C standard library equivalents, but knowing the C functions helps you understand more precisely what is happening.

From the ftruncate man page :

The value of the seek pointer shall not be modified by a call to ftruncate().
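The same rule can be observed from Python through the thin `os` wrappers around the system calls. This is only a sketch, assuming a POSIX platform and Python 3.3 or later (where `os.ftruncate` is available); the scratch file comes from `tempfile` and is not part of the question.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'test')                    # the file offset is now 4
    os.ftruncate(fd, 0)                      # shrink the file to 0 bytes
    offset = os.lseek(fd, 0, os.SEEK_CUR)    # query the offset without moving it
    size = os.fstat(fd).st_size              # current file size
finally:
    os.close(fd)
    os.remove(path)

print(offset, size)
```

The file size drops to 0 while the offset stays where the last write left it, which is exactly what the man page promises.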

From the fflush man page :

If stream points to an input stream or an update stream into which the most recent operation was input, that stream is flushed if it is seekable and is not already at end-of-file. Flushing an input stream discards any buffered input and adjusts the file pointer such that the next input operation accesses the byte after the last one read.

This means that if you put flush before truncate it has no effect. I checked, and that was indeed the case.

But for putting flush after truncate:

If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() causes any unwritten data for that stream to be written to the file, and the st_ctime and st_mtime fields of the underlying file are marked for update.

The man page does not mention the seek pointer when explaining output streams whose last operation was not input. (Here our last operation is truncate.)

UPDATE 2

I found something in the Python source code, at Python-3.2.2\Modules\_io\fileio.c:837:

    #ifdef HAVE_FTRUNCATE
    static PyObject *
    fileio_truncate(fileio *self, PyObject *args)
    {
        PyObject *posobj = NULL; /* the new size wanted by the user */
    #ifndef MS_WINDOWS
        Py_off_t pos;
    #endif

    ...

    #ifdef MS_WINDOWS
        /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
           so don't even try using it. */
        {
            PyObject *oldposobj, *tempposobj;
            HANDLE hFile;

    ////// THIS LINE //////////////////////////////////////////////////////////////
            /* we save the file pointer position */
            oldposobj = portable_lseek(fd, NULL, 1);
            if (oldposobj == NULL) {
                Py_DECREF(posobj);
                return NULL;
            }

            /* we then move to the truncation position */
            ...

            /* Truncate.  Note that this may grow the file! */
            ...

    ////// AND THIS LINE //////////////////////////////////////////////////////////
            /* we restore the file pointer position in any case */
            tempposobj = portable_lseek(fd, oldposobj, 0);
            Py_DECREF(oldposobj);
            if (tempposobj == NULL) {
                Py_DECREF(posobj);
                return NULL;
            }
            Py_DECREF(tempposobj);
        }
    #else

    ...

    #endif /* HAVE_FTRUNCATE */

Look at the two lines that I marked ( /////This Line///// ). If your platform is Windows, then it saves the position and restores it after the truncation.

To my surprise, most of the flush functions inside Python 3.2.2 either did nothing or did not call the C fflush function at all. The truncate part of 3.2.2 was also very undocumented. However, I found something interesting in the Python 2.7.2 sources. First, I found this in Python-2.7.2\Objects\fileobject.c:812, in the truncate implementation:

    /* Get current file position.  If the file happens to be open for
     * update and the last operation was an input operation, C doesn't
     * define what the later fflush() will do, but we promise truncate()
     * won't change the current position (and fflush() *does* change it
     * then at least on Windows).  The easiest thing is to capture
     * current pos now and seek back to it at the end.
     */

To summarize, I think this is completely platform dependent. I checked with the default Python 3.2.2 for Windows x64 and got the same results as you. I don't know what happens on *nixes.

+5

Yes, it is true that truncate() does not move the position, but that said, the fix is dead simple:

    f.read()
    f.seek(0)
    f.truncate(0)
    f.close()

This works perfectly ;)
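Applied to the original question, the same idea might look like the sketch below. It assumes Python 3; `replace_contents` is a hypothetical helper name chosen for illustration, and the temporary path is likewise made up.

```python
import os
import tempfile

def replace_contents(path, new_text):
    """Hypothetical helper: read a file, then overwrite it in place."""
    with open(path, 'r+') as f:
        old_text = f.read()   # moves the position to the end of the file
        f.seek(0)             # rewind before truncating
        f.truncate()          # no argument: cut at the current position (0)
        f.write(new_text)     # the write now starts at offset 0
    return old_text

# Set up a scratch file containing 'test', as in the question.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('test')

old = replace_contents(path, 'passed')
with open(path) as f:
    result = f.read()

print(old, result)
```

Seeking first means truncate() and the following write agree on where the file starts, so no gap is left behind.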

+3

Truncation does not change the file position.

Also note that even if the file is opened in read + write mode, you cannot simply switch between the two types of operation (a seek operation is required to be able to switch from read to write or vice versa).

+2

I expect the following is the code you meant to write:

    open('test.txt').read()
    open('test.txt', 'w').write('passed')
0

If someone is in the same boat as me, here is my problem along with the solution:

  • I have a program that is always on, i.e. it never stops; it keeps polling data and writing to a log file.
  • The problem is that I want to split the main file as soon as it reaches the 10 MB mark, so I wrote the following program.
  • I also found a solution to the problem of truncate leaving null values in the file, which caused further trouble.

Below is an illustration of how I solved this problem.

    import datetime
    import os
    from random import randint
    from shutil import copyfile

    f1 = open('client.log', 'wb')  # binary mode, since os.urandom() returns bytes
    nowTime = datetime.datetime.now().time()
    f1.write(os.urandom(1024 * 1024 * 15))  # adding random values worth 15 MB
    f1.flush()  # make sure the data has been written out before checking the size

    if int(os.path.getsize('client.log') / 1048576) > 10:  # file has passed the 10 MB mark
        print('File size limit exceeded, needs trimming')
        dst = 'client_' + str(randint(0, 999999)) + '.log'
        copyfile('client.log', dst)  # copying file to another one
        print('Copied content to ' + str(dst))
        print('Erasing current file')
        f1.truncate(0)  # truncating works fine but leaves the position at the old end
        f1.seek(0)  # very important after truncate so that new data begins from 0
        print('File truncated successfully')

    f1.write(b'This is fresh content')  # dummy content
    f1.close()
    print('All jobs processed')
0

It depends. If you want to keep the file open and access it without closing it, then flush will force the data to be written to the file. If you close the file immediately after the flush, then you do not need the flush, because closing will do it for you. This is my understanding from the docs.
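A small sketch of that difference, assuming CPython 3 with its default buffered file objects; the file name and the written bytes are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'client.log')

f = open(path, 'wb')
f.write(b'hello')                          # small write, held in Python's buffer
size_before_flush = os.path.getsize(path)  # typically still 0 at this point
f.flush()                                  # push the buffer to the operating system
size_after_flush = os.path.getsize(path)
f.write(b' world')
f.close()                                  # close() flushes whatever is still buffered
size_after_close = os.path.getsize(path)

print(size_before_flush, size_after_flush, size_after_close)
```

The size seen before the flush depends on the buffering strategy, so treat that first number as implementation-specific. The sizes after flush() and after close() reflect everything written up to that point.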

-1
