Effectively replace multiline list strings with single_line list string

I am trying to parse the output from OS X mdls . For some keys, a value is a list of values. I need to fix these key pairs, values ​​correctly. All lists of values ​​begin with ( and then end with ) .

I need to be able to iterate over all key pairs, values, so that I can parse several outputs correctly (i.e. mdls run several files to create one output, where there is no difference between where the metadata of the file is located and the other starts). I have an example code below.

Is there a more efficient way to do this?

 import re mdls_output = """kMDItemAuthors = ( margheim ) kMDItemContentCreationDate = 2015-07-10 14:41:01 +0000 kMDItemContentModificationDate = 2015-07-10 14:41:01 +0000 kMDItemContentType = "com.adobe.pdf" kMDItemContentTypeTree = ( "com.adobe.pdf", "public.data", "public.item", "public.composite-content", "public.content" ) kMDItemCreator = "Safari" kMDItemDateAdded = 2015-07-10 14:41:01 +0000 """ mdls_lists = re.findall(r"^\w+\s+=\s\(\n.*?\n\)$", mdls_output, re.S | re.M) single_line_lists = [re.sub(r'\s+', ' ', x.strip()) for x in mdls_lists] for i, mdls_list in enumerate(mdls_lists): mdls_output = mdls_output.replace(mdls_list, single_line_lists[i]) print(mdls_output) 
+5
source share
1 answer

You can use the python regex routine , which can take a function as a replacement string . The function is called for each match with the matching object. The returned string replaces the match.

 def myfn(m): return re.sub(r'\s+', ' ', m.group().strip()) pat = re.compile(r"^\w+\s+=\s\(\n.*?\n\)$", re.S | re.M) mdls_output = pat.sub(myfn, mdls_output) 
+2
source

All Articles