Python: how to capture output to a text file? (now only 25 of 530 lines)

Question

I have done quite a bit of lurking on SO and a fair amount of searching and reading, but I must also admit that I am a relative beginner at programming in general. I am trying to learn as I go, and so I have been playing with Python's NLTK. In the script below, I can get everything to work, except that it only writes what would be the first screen of a multi-screen output, at least that is how I think of it.

Here's the script:

    #! /usr/bin/env python

    import nltk

    # First we have to open and read the file:
    thefile = open('all_no_id.txt')
    raw = thefile.read()

    # Second we have to process it with nltk functions to do what we want
    tokens = nltk.wordpunct_tokenize(raw)
    text = nltk.Text(tokens)

    # Now we can actually do stuff with it:
    concord = text.concordance("cultural")

    # Now to save this to a file
    fileconcord = open('ccord-cultural.txt', 'w')
    fileconcord.writelines(concord)
    fileconcord.close()

And here is the beginning of the output file:

    Building index...
    Displaying 25 of 530 matches:
    y .  The Baobab Tree : Stories of Cultural Continuity The continuity evident
     regardless of ethnicity , and the cultural legacy of Africa as well . This Af

What am I missing here to write all 530 matches to a file?

+4

python python-2.7 nltk

John laudun Jun 15 '12 at 2:59

2 answers

Update:

I found a thread in the nltk group about writing the output of text.concordance to a file. It is from 2010 and says:

The documentation for the Text class says: "intended to support initial exploration of texts (via the interactive console) .... If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead."

If nothing has changed in the package since then, this may be the source of your problem.

--- earlier ---

I do not see a problem with writing to a file using writelines():

file.writelines(sequence)
Write a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value. (The name is intended to match readlines(); writelines() does not add line separators.)

Note the last part: writelines() does not add line separators. Have you viewed the output file in different editors? Perhaps the data is there but is not displayed correctly because the end-of-line separators are missing.
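To illustrate the point about line separators, here is a minimal sketch. It assumes Python 3 and uses an in-memory io.StringIO buffer in place of a real file; the sample strings are made up for the example.

```python
import io

# Made-up sample data standing in for lines of output.
lines = ["first match", "second match", "third match"]

# writelines() writes each string as-is, with no separator added,
# so the items run together.
buf = io.StringIO()
buf.writelines(lines)
run_together = buf.getvalue()

# Appending "\n" to each item yourself gives one entry per line.
buf = io.StringIO()
buf.writelines(line + "\n" for line in lines)
one_per_line = buf.getvalue()

print(repr(run_together))
print(repr(one_per_line))
```

If the output file looks like one very long line in an editor, this is the likely cause.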

Are you sure this part generates the data you want to output?

  concord = text.concordance("cultural")

I am not familiar with nltk, so I am just asking as part of troubleshooting possible sources of the problem.

+2

Levon Jun 15 '12 at 3:06

bezmax · Accepted Answer · 2012-06-15T03:49:21+0000

text.concordance(self, word, width=79, lines=25) seems to take other parameters, as per the manual.

I see no way to extract the concordance index. However, the concordance printing code contains the line lines = min(lines, len(offsets)), so you can just pass sys.maxint as the last argument:

 concord = text.concordance("cultural", 75, sys.maxint)
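The clamping trick can be seen in isolation in the rough sketch below. show_matches is a hypothetical stand-in written for this example, not NLTK's code; only the min(lines, len(offsets)) line mirrors what is quoted above. It assumes Python 3, where sys.maxsize plays the role of Python 2's sys.maxint.

```python
import sys


def show_matches(offsets, lines=25):
    """Hypothetical stand-in for the printing step: return the offsets that would be shown."""
    # Same clamp as quoted above: never show more than there are matches.
    lines = min(lines, len(offsets))
    return offsets[:lines]


# Pretend there are 530 matches, as in the question.
offsets = list(range(530))

default_shown = show_matches(offsets)           # default limit of 25
all_shown = show_matches(offsets, sys.maxsize)  # sys.maxint in Python 2

print(len(default_shown), len(all_shown))
```

Because of the min(), any sufficiently large limit simply means "all matches", so there is no need to know the count in advance.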

Added:

Now, looking at your original code, I don't see how it could have worked before. text.concordance returns nothing; it prints everything to stdout using print. So a simple option would be to redirect stdout to your file, for example:

    import sys

    # ....

    # Open the file
    fileconcord = open('ccord-cultural.txt', 'w')
    # Save the old stdout stream
    tmpout = sys.stdout
    # Redirect all "print" calls to that file
    sys.stdout = fileconcord
    # Call the method
    text.concordance("cultural", 200, sys.maxint)
    # Close the file
    fileconcord.close()
    # Reset stdout in case you need to print something else
    sys.stdout = tmpout
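For readers on Python 3.4 or later, the same idea can be sketched with contextlib.redirect_stdout, which restores stdout automatically even if an error occurs. print_report below is a hypothetical stand-in for a method like text.concordance that prints and returns None, and the temporary file path is only for the example.

```python
import contextlib
import os
import tempfile


def print_report(n):
    """Hypothetical stand-in for text.concordance: prints its results, returns None."""
    print("Displaying %d of %d matches:" % (n, n))
    for i in range(n):
        print("match %d" % i)


# Example output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "report.txt")

# Everything printed inside the with-block goes to the file;
# stdout is restored and the file is closed on exit.
with open(path, "w") as out, contextlib.redirect_stdout(out):
    result = print_report(3)

# Read the file back to confirm what was captured.
with open(path) as f:
    captured = f.read().splitlines()
```

Note that result is None here, which is why assigning the return value of a printing function and passing it to writelines() cannot put anything useful in the file.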

Another option is to use the appropriate classes directly and omit the Text wrapper. Just copy the bits from here and combine them with the bits here, and you're done.
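As a rough illustration of what bypassing the wrapper means, the sketch below builds a tiny concordance in plain Python and returns the lines instead of printing them. It is not NLTK's implementation: build_index, concordance_lines, the bracketed output format, and the sample sentence are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each lowercased token to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok.lower()].append(pos)
    return index


def concordance_lines(tokens, word, context=3):
    """Return one string per match, with `context` tokens on each side."""
    index = build_index(tokens)
    out = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - context):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + context])
        out.append("%s [%s] %s" % (left, tokens[pos], right))
    return out


# Made-up sample text; in the question the tokens come from the input file.
tokens = "the cultural legacy of Africa and Cultural continuity".split()
matches = concordance_lines(tokens, "cultural", context=2)
```

Since the function returns a list of strings, every match can be written out with something like fileconcord.writelines(line + "\n" for line in matches), with no limit of 25 and no stdout redirection.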
