How to eliminate duplicate entries in Python while maintaining case sensitivity?

I am looking for a way to remove duplicate entries from a Python list, but with a twist: the final list should be case sensitive, with a preference for uppercase words.

For example, between Cup and cup I need to keep Cup, not cup. Unlike other common solutions that suggest applying lower() first, I would like to preserve the string's case here, and in particular to prefer the string that starts with a capital letter over the lowercase one.

Again, I am trying to turn this list: [Hello, hello, world, world, poland, Poland]

into this:

[Hello, world, Poland]

How should I do it?

Thanks in advance.

4 answers

This does not preserve the order of the words, but it creates a list of "unique" words with a preference for capitalized ones.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']

If you want to maintain the order in which the words appear in words, you can use collections.OrderedDict:

In [43]: import collections

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
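
The same idea can be wrapped in a small function. This is only a sketch: the name prefer_titlecase is made up for illustration, and it relies on a plain dict keeping insertion order (Python 3.7+) instead of OrderedDict. Note that istitle()/title() only recognise title case, so an all-caps word such as 'HELLO' is not treated as a duplicate of 'hello'.

```python
def prefer_titlecase(words):
    """Hypothetical helper: drop duplicates, preferring title-cased words."""
    # dict.fromkeys removes exact duplicates and keeps first-seen order (3.7+).
    unique = dict.fromkeys(words)
    # Keep a word if it is title-cased, or if no title-cased twin is present.
    return [w for w in unique if w.istitle() or w.title() not in unique]


words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
print(prefer_titlecase(words))               # ['Hello', 'world', 'Poland']

# Limitation: only title case is recognised, so both spellings survive here.
print(prefer_titlecase(['HELLO', 'hello']))  # ['HELLO', 'hello']
```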

Using a set to track words that have already been seen:

def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word

Usage:

>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']

UPDATE

The previous version does not take the preference for uppercase over lowercase into account. In the updated version, I used min, as @TheSoundDefense did.

import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
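
A quick check that the updated version picks the capitalized word even when the lowercase one comes first. The function is repeated so the snippet runs on its own; the list() wrapper, the variable name key and the sample input are my additions, not part of the answer above.

```python
import collections


def uniq(words):
    seen = collections.OrderedDict()  # remembers first-seen order of each key
    for word in words:
        key = word.lower()
        # min() keeps whichever spelling sorts first; uppercase sorts lower.
        seen[key] = min(word, seen.get(key, word))
    return list(seen.values())  # list() so the result prints as a plain list


result = uniq(['hello', 'Hello', 'world', 'world', 'poland', 'Poland'])
print(result)  # ['Hello', 'world', 'Poland']
```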

A for/else loop that compares words case-insensitively and keeps the preferred spelling with min:

orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
  for i in range(len(unique_list)):
    if unique_list[i].lower() == word.lower():
      unique_list[i] = min(word, unique_list[i])
      break
  else:
    unique_list.append(word)

min prefers the capitalized word, because uppercase letters compare lower than lowercase ones.
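
Python compares strings code point by code point, and the ASCII uppercase letters (65-90) come before the lowercase ones (97-122). Here is a small sketch, with sample words of my own, of what that means for min, including the side effect that an all-caps spelling beats a title-case one:

```python
# Uppercase ASCII letters have smaller code points than lowercase ones.
print(ord('H'), ord('h'))     # 72 104

# So the capitalized spelling is the "smaller" string.
print(min('hello', 'Hello'))  # Hello

# Comparison continues letter by letter, so all-caps wins over title case.
best = min(['hello', 'Hello', 'HELLO'])
print(best)                   # HELLO
```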

There are some great answers here, but hopefully this is something simple, different and useful. This code satisfies the conditions of your test (consecutive pairs of matching words), but it will not cope with anything more complicated, such as non-consecutive pairs, unpaired words, or non-string items. For anything more complicated, I would take a different approach.

p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']

def pref_upper(p):
    q = []
    a = 0
    b = 1

    # Walk the list two items at a time; matching words must sit in adjacent pairs.
    for x in range(len(p) // 2):  # integer division so range() receives an int
        if p[a][0].isupper() and p[b][0].isupper():
            q.append(p[a])
        if p[a][0].isupper() and p[b][0].islower():
            q.append(p[a])
        if p[a][0].islower() and p[b][0].isupper():
            q.append(p[b])
        if p[a][0].islower() and p[b][0].islower():
            q.append(p[b])
        a += 2
        b += 2
    return q

print(pref_upper(p1))
print(pref_upper(p2))