Effectively replace multiline list strings with single_line list string

Question

Effectively replace multiline list strings with single_line list string

I am trying to parse the output from OS X mdls . For some keys, a value is a list of values. I need to fix these key pairs, values correctly. All lists of values begin with ( and then end with ) .

I need to be able to iterate over all key pairs, values, so that I can parse several outputs correctly (i.e. mdls run several files to create one output, where there is no difference between where the metadata of the file is located and the other starts). I have an example code below.

Is there a more efficient way to do this?

 import re mdls_output = """kMDItemAuthors = ( margheim ) kMDItemContentCreationDate = 2015-07-10 14:41:01 +0000 kMDItemContentModificationDate = 2015-07-10 14:41:01 +0000 kMDItemContentType = "com.adobe.pdf" kMDItemContentTypeTree = ( "com.adobe.pdf", "public.data", "public.item", "public.composite-content", "public.content" ) kMDItemCreator = "Safari" kMDItemDateAdded = 2015-07-10 14:41:01 +0000 """ mdls_lists = re.findall(r"^\w+\s+=\s\(\n.*?\n\)$", mdls_output, re.S | re.M) single_line_lists = [re.sub(r'\s+', ' ', x.strip()) for x in mdls_lists] for i, mdls_list in enumerate(mdls_lists): mdls_output = mdls_output.replace(mdls_list, single_line_lists[i]) print(mdls_output)

+5

python regex

smargh Jul 10 '15 at 16:46

source share

1 answer

meuh · Accepted Answer · 2015-07-10T17:54:13+0000

You can use the python regex routine , which can take a function as a replacement string . The function is called for each match with the matching object. The returned string replaces the match.

 def myfn(m): return re.sub(r'\s+', ' ', m.group().strip()) pat = re.compile(r"^\w+\s+=\s\(\n.*?\n\)$", re.S | re.M) mdls_output = pat.sub(myfn, mdls_output)

Effectively replace multiline list strings with single_line list string

More articles: