Return a list of words after reading a file in Python

I have a text file called test.txt. I want to read it and return a list of all the words in the file (with newlines removed).

This is my current code:

    def read_words(words_file):
        open_file = open(words_file, 'r')
        words_list = []
        contents = open_file.readlines()
        for i in range(len(contents)):
            words_list.append(contents[i].strip('\n'))
        open_file.close()
        return words_list

Running this code creates this list:

 ['hello there how is everything ', 'thank you all', 'again', 'thanks a lot'] 

I want the list to look like this:

 ['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot'] 
4 answers

Replace the words_list.append(...) line in the for loop with the following:

 words_list.extend(contents[i].split()) 

This splits each line on whitespace and then adds each item of the resulting list to words_list.
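To make the difference between append and extend concrete, here is a small sketch. It assumes the file contents from the question and holds them in a list, as readlines() would return them, so the example does not need a file on disk.

```python
# Assumed contents of test.txt, in the form readlines() returns them.
contents = [
    'hello there how is everything \n',
    'thank you all\n',
    'again\n',
    'thanks a lot\n',
]

appended = []
extended = []
for line in contents:
    appended.append(line.strip('\n'))  # adds the whole line as a single item
    extended.extend(line.split())      # adds each whitespace-separated word

print(appended)  # four items, one per line
print(extended)  # twelve items, one per word
```

Calling split() with no arguments also discards the trailing newline and any repeated spaces, so the separate strip('\n') is no longer needed.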

Or, as an alternative, rewrite the entire function as a list comprehension:

    def read_words(words_file):
        return [word for line in open(words_file, 'r') for word in line.split()]

Depending on the size of the file, this may be just as simple:

    with open(file) as f:
        words = f.read().split()
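The reason a single split() is enough is that, with no arguments, it splits on any run of whitespace, newlines included. A minimal sketch, using a string that stands in for what f.read() would return for the question's file (the contents are assumed):

```python
# Stand-in for f.read(): the whole file as one string (assumed contents).
text = 'hello there how is everything \nthank you all\nagain\nthanks a lot\n'

# No-argument split() treats spaces and newlines alike and drops empty strings.
words = text.split()
print(words)
```

The trade-off is that the whole file is held in memory as one string before it is split, which is why the file size matters here.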

Here is how I would write it:

    def read_words(words_file):
        with open(words_file, 'r') as f:
            ret = []
            for line in f:
                ret += line.split()
            return ret

    print(read_words('test.txt'))

The function can be shortened slightly with itertools, but I personally find the result less readable:

    import itertools

    def read_words(words_file):
        with open(words_file, 'r') as f:
            return list(itertools.chain.from_iterable(line.split() for line in f))

    print(read_words('test.txt'))

The good thing about the second version is that it can be made entirely generator-based, and thus avoid holding all of the file's words in memory at once.
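A generator-based variant might look like the sketch below. The name iter_words is hypothetical and not part of the answer above. The function yields one word at a time, so only the current line is held in memory.

```python
import itertools

def iter_words(words_file):
    """Yield the words of a file one at a time (hypothetical helper name)."""
    with open(words_file, 'r') as f:
        # The file stays open while the caller is still consuming words.
        for word in itertools.chain.from_iterable(line.split() for line in f):
            yield word
```

A caller can loop over iter_words('test.txt') directly, or wrap it in list(...) to get the same result as read_words.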


There are several ways to do this. Here are a few:

If you do not mind duplicate words:

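    import itertools

    def getWords(filepath):
        with open(filepath) as f:
            # chain.from_iterable flattens the per-line word lists into one sequence
            return list(itertools.chain.from_iterable(line.split() for line in f))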

If you want each word to appear only once (this returns a set):

Note: this does not preserve word order

    def getWords(filepath):
        with open(filepath) as f:
            # Python 2.7+: set comprehension (outer loop over lines comes first)
            return {word for line in f for word in line.split()}
            # Python 2.6 equivalent:
            # return set(word for line in f for word in line.split())

If you want unique words and want to preserve the word order:

    import itertools

    def getWords(filepath):
        with open(filepath) as f:
            words = []
            pos = {}  # word -> index of its first occurrence
            position = itertools.count()
            for line in f:
                for word in line.split():
                    if word not in pos:
                        pos[word] = next(position)
                        words.append(word)
        return sorted(words, key=pos.__getitem__)
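As a different technique from the one above, and assuming Python 3.7 or later (where regular dicts keep insertion order), the same order-preserving de-duplication can be sketched with dict.fromkeys. The function name and the sample lines are made up for illustration.

```python
def unique_words_in_order(lines):
    """Return each word once, in order of first appearance (hypothetical helper)."""
    # dict keys are unique and, on Python 3.7+, keep insertion order.
    return list(dict.fromkeys(word for line in lines for word in line.split()))

# Made-up sample lines; an open file object could be passed instead.
sample = ['thank you all\n', 'thank you again\n']
print(unique_words_in_order(sample))
```

On older Python versions this does not guarantee order, so the explicit bookkeeping shown above is the safer choice there.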

If you want a dictionary of word counts:

    import collections
    import itertools

    def getWords(filepath):
        with open(filepath) as f:
            return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))
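For readers who have not used collections.Counter, here is a short sketch of what the returned object offers. The word list is made up for illustration and stands in for the words read from a file.

```python
import collections

# Made-up word list standing in for the words of a file.
words = 'thank you thank you all thank'.split()
counts = collections.Counter(words)

print(counts['thank'])        # 3
print(counts['missing'])      # 0: absent words count as zero instead of raising KeyError
print(counts.most_common(2))  # [('thank', 3), ('you', 2)]
```

A Counter is a dict subclass, so it can be used wherever a plain word-to-count dictionary is expected.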

Hope this helps.
