How to convert a text file to a list while removing duplicate words and sorting the list in Python?

I am trying to read lines in a file, split lines into words and add individual words to the list if they are not already on the list. Finally, the words must be sorted. I have been trying to get this right for a while, and I understand the concepts, but I'm not sure how to get the exact language and placement. Here is what I have:

filename = raw_input("Enter file name: ")
openedfile = open(filename)
lst = list()
for line in openedfile:
    line.rstrip()
    words = line.split()
for word in words:
    if word not in lst:
        lst.append(words)
print lst
+4
4 answers

If you are splitting a text file into words on whitespace, just call split() on the whole contents. There is nothing to gain from reading each line and stripping it, because split() with no arguments already handles newlines and surrounding whitespace.
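To see why the per-line stripping is unnecessary, here is a small check written for Python 3. The sample string is made up for illustration; it stands in for the contents of a file with repeated spaces, a tab and a blank line.

```python
# Hypothetical file contents: repeated spaces, a tab, a blank line, a trailing newline.
text = "the quick  brown\n\tfox\n\nthe end\n"

# Per-line approach: strip each line, then split it.
per_line = []
for line in text.splitlines():
    per_line.extend(line.rstrip().split())

# Whole-text approach: a single split() call over everything.
whole = text.split()

print(per_line == whole)  # True
print(whole)              # ['the', 'quick', 'brown', 'fox', 'the', 'end']
```

Both approaches give the same list, so the single call is enough when words are separated only by whitespace.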

So, to get the initial list of words, you only need:

filename = raw_input("Enter file name: ")
openedfile = open(filename)
wordlist = openedfile.read().split()

Then, to remove the duplicates, convert the list to a set:

wordset = set(wordlist)

And finally, to get a sorted list, pass the set to sorted():

words = sorted(wordset)

Putting it all together:

filename = raw_input("Enter file name: ")
with open(filename) as stream:
    words = sorted(set(stream.read().split()))

(NB: the with statement closes the file automatically when the block ends.)
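For reference, the same idea as a small Python 3 sketch. The function name unique_sorted_words and the temporary file contents are assumptions made for this example only; in Python 3, raw_input is spelled input and print is a function.

```python
import os
import tempfile


def unique_sorted_words(path):
    """Return the distinct whitespace-separated words in the file, sorted."""
    with open(path) as stream:
        return sorted(set(stream.read().split()))


# Small demonstration using a temporary file with made-up contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("pear apple\napple fig pear\n")

result = unique_sorted_words(tmp.name)
os.remove(tmp.name)
print(result)  # ['apple', 'fig', 'pear']
```

Note that the comparison is case-sensitive and punctuation stays attached to words, exactly as in the snippet above.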

+2

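First, in the line

lst.append(words)

you are appending the whole words list rather than a single word. Use lst.append(word) instead.

Second, to sort the list, call

lst.sort()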

+1

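In the for loop for word in words:, you are calling lst.append(words), which appends the whole words list to lst each time. What you want is lst.append(word).

Also, the second for loop, for word in words:, has to be nested inside for line in openedfile:, otherwise only the words of the last line are processed.

Finally, once all the words have been collected, sort the list in place with lst.sort() before printing it.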
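A minimal Python 3 sketch of the question's loop with those three changes applied: append(word) instead of append(words), the inner loop nested inside the outer one, and a sort at the end. The io.StringIO object and its text are stand-ins for the opened file, used here only so the example runs without a file on disk.

```python
import io

# Stand-in for the opened file (made-up contents); StringIO iterates line by line.
openedfile = io.StringIO("but soft what light\nwhat light through yonder\n")

lst = []
for line in openedfile:
    words = line.split()
    # Nested loop, so the words of every line are processed, not just the last line.
    for word in words:
        if word not in lst:
            lst.append(word)  # append the single word, not the whole list

lst.sort()
print(lst)  # ['but', 'light', 'soft', 'through', 'what', 'yonder']
```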

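Also, open the file using with, so that it is closed automatically when the block ends.

You could also use a set(), since a not in test on a list is O(n) and gets slow for large files. A set stores only unique elements, so you do not need the membership check at all.

Afterwards, list(..) converts the set back to a list so that it can be sorted.

Example -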

filename = raw_input("Enter file name: ")
with open(filename) as openedfile:
    s = set()
    for line in openedfile:
        # split() with no arguments already discards surrounding whitespace
        words = line.split()
        # update() adds every word at once; the set ignores duplicates
        s.update(words)
    lst = list(s)
    lst.sort()
    print lst
+1

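A few notes:

  • A set is the better container here. It keeps only unique items, so duplicates are dropped automatically.
  • Open the file in read mode, 'r', since you only need to read from it.
  • Sorting is very simple (for sets too): just use sorted(), which returns a new sorted list.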
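A quick Python 3 illustration of the last point, using a made-up set of words: sorted() accepts any iterable, including a set, and returns a new list without changing the set itself.

```python
words = {"pear", "apple", "fig", "apple"}  # the duplicate "apple" is dropped by the set

ordered = sorted(words)

print(type(ordered).__name__)       # list
print(ordered)                      # ['apple', 'fig', 'pear']
print(sorted(words, reverse=True))  # ['pear', 'fig', 'apple']
print(len(words))                   # 3
```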

Something like this works:

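filename = raw_input("Enter file name: ")

words = set()
with open(filename, 'r') as myfile:
    # Iterate over the file directly instead of loading every line with readlines()
    for line in myfile:
        # split() with no argument handles tabs and repeated spaces and never yields ''
        new_words = line.split()
        words.update(new_words)

print sorted(words)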
+1