Sort a list of strings based on regular expression matching or something similar

Question

I have a text file that looks something like this:

    random text random text, can be anything blabla %A blabla
    random text random text, can be anything blabla %D blabla
    random text random text, can be anything blabla blabla %F
    random text random text, can be anything blabla blabla
    random text random text, %C can be anything blabla blabla

When I read it in with readlines(), it becomes a list of lines. Now I want this list to be sorted by the letter after the %. So basically, when sorting is applied to the above, it should look like this:

    random text random text, can be anything blabla %A blabla
    random text random text, %C can be anything blabla blabla
    random text random text, can be anything blabla %D blabla
    random text random text, can be anything blabla blabla %F
    random text random text, can be anything blabla blabla

Is there a good way to do this, or will I have to break each row into columns, move the letters to a specific column, and then sort with key=operator.itemgetter(col)?

thanks

+4

python sorting

garg Jul 04 '09 at 15:30

4 answers

How about this? Hope this helps.

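    def k(line):
        # Take whatever follows the first '%' and keep only its first character.
        v = line.partition("%")[2]
        v = v[0] if v else 'z'  # here z stands for the max value
        return v

    # Text mode and print(...) with parentheses run on both Python 2 and 3.
    print(''.join(sorted(open('data.txt'), key=k)))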
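The 'z' sentinel works as long as no marker letter sorts after it. A minimal sketch of an alternative, assuming the same line format as in the question: return a tuple so that lines without a marker always sort last, whatever the letter is. The name `marker_key` and the in-memory `lines` list are made up for illustration.

```python
def marker_key(line):
    # partition returns (before, sep, after); sep is '' when no '%' is found.
    _, sep, after = line.partition("%")
    letter = after[:1]  # '' if nothing follows the '%'
    # False sorts before True, so lines that have a marker come first.
    return (not (sep and letter), letter)

lines = [
    "random text random text, can be anything blabla %A blabla\n",
    "random text random text, can be anything blabla %D blabla\n",
    "random text random text, can be anything blabla blabla %F\n",
    "random text random text, can be anything blabla blabla\n",
    "random text random text, %C can be anything blabla blabla\n",
]

result = sorted(lines, key=marker_key)
print("".join(result))
```

This is a swapped-in variant of the answer's key function, not the answer's own code; the trade-off is a slightly longer key in exchange for not depending on which characters can follow the %.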

+3

sunqiang Jul 04 '09 at 15:40

You can pass a custom key function to sort() to control what gets compared. Using lambda syntax, you can write this inline, for example:

    strings.sort(key=lambda str: re.sub(".*%", "", str))

Calling re.sub(".*%", "", str) removes everything up to and including the last percent sign (the .* is greedy), so if there is a percent sign in the line, the comparison uses what comes after it; otherwise it uses the whole line.

Pedantically speaking, this doesn't use only the letter following the percent sign; it uses everything after it. If you want to use the letter, and only the letter, try this slightly longer line:

    strings.sort(key=lambda str: re.sub(".*%(.).*", "\\1", str))
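To see what the two patterns actually produce as sort keys, here is a small check. It assumes a made-up sample list (`strings`) that includes a line with two percent signs and a line with none; the helper names `after_percent` and `letter_only` are only for illustration. Because `.*` is greedy, both patterns anchor on the last % in a line, and a line with no % is left unchanged and compared as a whole.

```python
import re

strings = [
    "foo %D bar",
    "50% off, then %B here",  # two percent signs
    "no marker here",
    "baz %A qux",
]

def after_percent(s):
    # Everything after the last '%' (greedy .*), or the whole string if none.
    return re.sub(r".*%", "", s)

def letter_only(s):
    # Only the character right after the last '%', or the whole string if none.
    return re.sub(r".*%(.).*", r"\1", s)

for s in strings:
    print(repr(after_percent(s)), repr(letter_only(s)))

by_letter = sorted(strings, key=letter_only)
print(by_letter)
```

Note that the unmarked line ends up last here only because its lowercase text happens to sort after the uppercase letters; that is a property of the sample data, not of the pattern.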

+1

John Kugelman Jul 04 '09 at 15:40

Here is a quick and dirty approach. Without knowing more about the requirements of the sort, I do not know if this satisfies your needs.

Suppose your list is stored in 'listoflines':

 listoflines.sort( key=lambda x: x[x.find('%'):] )

Note that this sorts all lines without the % character by their final character.
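That behaviour comes from str.find returning -1 when the % is absent, which turns the slice into x[-1:]. A short sketch, assuming a made-up `listoflines` and a hypothetical helper `from_percent` that pushes unmarked lines to the end instead:

```python
listoflines = [
    "blabla %D blabla",
    "no marker, ends in z",
    "blabla blabla %A",
    "no marker, ends in a",
]

# find() returns -1 when '%' is absent, so the key is just the last character.
keys = [x[x.find('%'):] for x in listoflines]
print(keys)

def from_percent(x):
    # Hypothetical variant: lines without '%' sort after all the others,
    # and are compared with each other as whole lines.
    i = x.find('%')
    return (i == -1, x[i:] if i != -1 else x)

ordered = sorted(listoflines, key=from_percent)
print(ordered)
```

When the lines come from readlines(), the final character is usually the newline, so with the original one-liner all unmarked lines share the same key and keep their relative order.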

+1

Brandon E. Taylor Jul 04 '09 at 15:45

llimllib · Accepted Answer · 2009-07-04T15:38:59+0000

    In [1]: def grp(pat, txt):
       ...:     r = re.search(pat, txt)
       ...:     return r.group(0) if r else '&'

    In [2]: y
    Out[2]:
    ['random text random text, can be anything blabla %A blabla',
     'random text random text, can be anything blabla %D blabla',
     'random text random text, can be anything blabla blabla %F',
     'random text random text, can be anything blabla blabla',
     'random text random text, %C can be anything blabla blabla']

    In [3]: y.sort(key=lambda l: grp("%\w", l))

    In [4]: y
    Out[4]:
    ['random text random text, can be anything blabla %A blabla',
     'random text random text, %C can be anything blabla blabla',
     'random text random text, can be anything blabla %D blabla',
     'random text random text, can be anything blabla blabla %F',
     'random text random text, can be anything blabla blabla']
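Outside an interactive session, the same idea might look like the sketch below. It adds the `import re` that the session does not show, and it precompiles the pattern as a raw string; the name `MARKER` is an assumption for illustration. The '&' fallback works because it sorts just after '%' in ASCII, so lines with no match end up last.

```python
import re

# Raw string avoids the invalid-escape warning that "%\w" triggers on newer Pythons.
MARKER = re.compile(r"%\w")

def grp(pat, txt):
    # Return the matched text, or '&' (which sorts after '%') if nothing matches.
    r = pat.search(txt)
    return r.group(0) if r else '&'

y = [
    'random text random text, can be anything blabla %A blabla',
    'random text random text, can be anything blabla %D blabla',
    'random text random text, can be anything blabla blabla %F',
    'random text random text, can be anything blabla blabla',
    'random text random text, %C can be anything blabla blabla',
]

y.sort(key=lambda l: grp(MARKER, l))

for line in y:
    print(line)
```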
