I am trying to parse the output of the OS X `mdls` command. For some keys, the value is a list of values, and I need to parse these key-value pairs correctly. Every list value begins with `(` and ends with `)`.
I need to be able to iterate over all key-value pairs so that I can also parse combined output correctly. When `mdls` is run on several files, it produces one output with nothing marking where one file's metadata ends and the next file's begins. My example code is below.
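Since the combined output has no separator, one way to recover per-file records is a heuristic. The sketch below assumes that a key never appears twice within a single file's metadata, so a repeated key is taken to mean that the next file has started. This is an assumption, not something `mdls` guarantees, and it would merge two files whose key sets do not overlap. `group_by_file` is a hypothetical helper that accepts any iterable of `(key, value)` pairs.

```python
def group_by_file(pairs):
    """Split a flat stream of (key, value) pairs into one dict per file.

    Heuristic (assumption): a key already present in the current record
    marks the start of the next file's metadata.
    """
    records = []
    current = {}
    for key, value in pairs:
        if key in current:
            # Repeated key: close the current record and start a new one.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records
```

For example, `group_by_file([("kMDItemCreator", "a"), ("kMDItemFSName", "x.pdf"), ("kMDItemCreator", "b")])` starts a second record at the second `kMDItemCreator`.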
Is there a more efficient way to do this?
```python
import re

mdls_output = """kMDItemAuthors                 = (
    margheim
)
kMDItemContentCreationDate     = 2015-07-10 14:41:01 +0000
kMDItemContentModificationDate = 2015-07-10 14:41:01 +0000
kMDItemContentType             = "com.adobe.pdf"
kMDItemContentTypeTree         = (
    "com.adobe.pdf",
    "public.data",
    "public.item",
    "public.composite-content",
    "public.content"
)
kMDItemCreator                 = "Safari"
kMDItemDateAdded               = 2015-07-10 14:41:01 +0000
"""

# Find every multi-line list value: "key = (" ... ")" on its own line.
mdls_lists = re.findall(r"^\w+\s+=\s\(\n.*?\n\)$", mdls_output, re.S | re.M)
# Collapse each list onto a single line.
single_line_lists = [re.sub(r"\s+", " ", x.strip()) for x in mdls_lists]
# Substitute the collapsed versions back into the full output.
for mdls_list, single_line in zip(mdls_lists, single_line_lists):
    mdls_output = mdls_output.replace(mdls_list, single_line)
print(mdls_output)
```