Hachoir - getting data from a group

I'm trying to use Hachoir to extract metadata from a video file. It works well enough until I try to use "get" or similar to retrieve the width and height values.

I assumed this would be:

metadata.get('width') 

But this causes an error (the object does not have a width property).

When I run the following:

 for data in sorted(metadata):
     if len(data.values) > 0:
         print data.key, data.values[0].value

All that is returned is information from the General group.

When I use:

 metadata.exportPlaintext()

... information from "Common", "Video stream" and "Audio stream" is returned. I could just parse the resulting text and pull out the height and width values, but I would rather do it properly using metadata.get('width') or similar.

After looking at the source code, I thought I could use the following:

 for key, metadata in metadata.__groups.iteritems(): 

The idea was to iterate through the groups in metadata, but this fails with an error saying the "AsfMetadata" object has no "_groups" attribute. I'm sure that shouldn't happen, since I thought "AsfMetadata" was a subclass of MultipleMetadata, which does have such a variable.

I'm probably missing something completely obvious.
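The error is consistent with Python's private name mangling: inside a class body, an attribute written as self.__groups is stored as _ClassName__groups, where ClassName is the class that defines it. From outside, neither _groups nor __groups exists, and a subclass such as AsfMetadata does not get its own mangled copy. The sketch below assumes, as the __groups spelling in the snippet above suggests, that MultipleMetadata stores its groups this way; both classes are hypothetical stand-ins, not hachoir's code.

```python
# Hypothetical stand-in for a class like hachoir's MultipleMetadata,
# assuming it stores its groups under a double-underscore name.
class MultipleMetadataLike(object):
    def __init__(self):
        # Inside the class body, "self.__groups" is mangled by Python
        # to "self._MultipleMetadataLike__groups".
        self.__groups = {"video[1]": "video group", "audio[1]": "audio group"}

    def iterGroups(self):
        # A public accessor defined in the same class resolves the mangled name.
        return iter(self.__groups.values())


class AsfMetadataLike(MultipleMetadataLike):
    """Hypothetical subclass standing in for AsfMetadata."""


meta = AsfMetadataLike()

# Neither spelling works from outside the class:
print(hasattr(meta, "_groups"))    # False
print(hasattr(meta, "__groups"))   # False
# The mangled name uses the *defining* class, not the subclass:
print(hasattr(meta, "_MultipleMetadataLike__groups"))  # True
print(hasattr(meta, "_AsfMetadataLike__groups"))       # False
# The public accessor is the safe way in:
print(sorted(meta.iterGroups()))   # ['audio group', 'video group']
```

This is why going through a public method (such as the iterGroups() call used in the second answer) is preferable to reaching for the private attribute.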

2 answers

This seems less straightforward for WMV files. I converted the metadata for such videos into a defaultdict, which makes it easier to get the image width:

 from collections import defaultdict
 from pprint import pprint

 from hachoir_core.cmd_line import unicodeFilename
 from hachoir_metadata import metadata
 from hachoir_parser import createParser

 # using this example http://archive.org/details/WorkToFishtestwmv
 filename = './test_wmv.wmv'
 filename, realname = unicodeFilename(filename), filename
 parser = createParser(filename)

 # See what keys you can extract
 for k, v in metadata.extractMetadata(parser)._Metadata__data.iteritems():
     if v.values:
         print v.key, v.values[0].value

 # Turn the tags into a defaultdict
 metalist = metadata.extractMetadata(parser).exportPlaintext()
 meta = defaultdict(defaultdict)
 for item in metalist:
     if item.endswith(':'):
         k = item[:-1]              # section header, e.g. "Video stream #1"
     else:
         tag, value = item.split(': ', 1)  # split once: values may contain ': '
         tag = tag[2:]              # drop the leading "- "
         meta[k][tag] = value

 print meta['Video stream #1']['Image width']  # 320 pixels
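The parsing loop above assumes that section headers end with ':' and that items look like '- Tag: value'. A minimal sketch of the same idea as a reusable function, under that same assumed format: it checks for items before headers (so a value that happens to end in ':' is not mistaken for a header) and splits only on the first ': ' (so values that themselves contain ': ' survive). plaintext_to_dict is a hypothetical name, and the sample lines are made up to match the assumed format; they are not real hachoir output.

```python
from collections import defaultdict


def plaintext_to_dict(lines):
    """Group '- Tag: value' lines under the most recent 'Header:' line.

    Assumes the line format used by the answer above: items start with
    '- ' and section headers end with ':'.
    """
    meta = defaultdict(dict)
    section = None
    for line in lines:
        if line.startswith('- ') and section is not None:
            # Split only on the first ': ' so values containing ': ' survive.
            tag, _, value = line[2:].partition(': ')
            meta[section][tag] = value
        elif line.endswith(':'):
            section = line[:-1]          # e.g. 'Video stream #1'
    return meta


# Hypothetical sample lines in the assumed format (not real hachoir output).
sample = [
    'Metadata:',
    '- Duration: 1 min 2 sec',
    'Video stream #1:',
    '- Image width: 320 pixels',
    '- Image height: 240 pixels',
    'Audio stream #1:',
    '- Comment: Language: en',
]
meta = plaintext_to_dict(sample)
print(meta['Video stream #1']['Image width'])  # 320 pixels
print(meta['Audio stream #1']['Comment'])      # Language: en
```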

To get width x height from the top-level metadata, or from the first group that has size information, without accessing private attributes or parsing text output, you can use file_metadata.iterGroups():

 #!/usr/bin/env python
 import sys
 from itertools import chain

 # $ pip install hachoir-{core,parser,metadata}
 from hachoir_core.cmd_line import unicodeFilename
 from hachoir_metadata import extractMetadata
 from hachoir_parser import createParser

 file_metadata = extractMetadata(createParser(unicodeFilename(sys.argv[1])))
 # Check the top-level metadata first, then each group
 it = chain([file_metadata], file_metadata.iterGroups())
 print("%sx%s" % next((metadata.get('width'), metadata.get('height'))
                      for metadata in it
                      if metadata.has('width') and metadata.has('height')))
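The generator passed to next() is a first-match search: it yields (width, height) for the first object, top-level metadata first and then each group, that has both keys. Without a default, next() raises StopIteration when nothing matches; the sketch below passes None as the default instead. FakeMetadata and first_size are hypothetical names; FakeMetadata implements only the has()/get() calls the answer uses, so the search logic can be run without hachoir or a media file.

```python
from itertools import chain


class FakeMetadata(object):
    """Hypothetical stand-in exposing has()/get() like the answer's objects."""

    def __init__(self, name, **values):
        self.name = name
        self._values = values

    def has(self, key):
        return key in self._values

    def get(self, key, default=None):
        return self._values.get(key, default)


def first_size(root, groups):
    """Return (width, height) from root or the first group having both, else None."""
    candidates = chain([root], groups)
    return next(((m.get('width'), m.get('height'))
                 for m in candidates
                 if m.has('width') and m.has('height')),
                None)  # default avoids StopIteration when nothing matches


root = FakeMetadata('Common', duration='1 min')
groups = [FakeMetadata('Audio stream #1', sample_rate=44100),
          FakeMetadata('Video stream #1', width=320, height=240)]
print(first_size(root, groups))      # (320, 240)
print(first_size(root, groups[:1]))  # None
```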

To convert metadata to a dictionary (non-recursively, i.e. you have to iterate over the groups yourself if needed):

 def metadata_as_dict(metadata):
     # Multi-valued items become lists; single values are unwrapped
     return {item.key: ([v.value for v in item.values]
                        if len(item.values) > 1
                        else item.values[0].value)
             for item in metadata if item.values}
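Because metadata_as_dict() only looks at one level, a caller that wants every group has to apply it to each group separately. A minimal sketch, assuming items expose .key and a list of .values whose elements have .value (the attributes the function above relies on). Item, Value and metadata_tree() are hypothetical names, and the caller is assumed to supply (name, group) pairs rather than relying on any particular hachoir accessor.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the attributes metadata_as_dict() relies on:
# each item has a .key and a list of .values, each value has a .value.
Value = namedtuple('Value', 'value')
Item = namedtuple('Item', 'key values')


def metadata_as_dict(metadata):
    # Multi-valued items become lists; single values are unwrapped;
    # items without any value are skipped.
    return {item.key: ([v.value for v in item.values]
                       if len(item.values) > 1
                       else item.values[0].value)
            for item in metadata if item.values}


def metadata_tree(named_groups):
    """Apply metadata_as_dict to each (name, metadata) pair supplied by the caller."""
    return {name: metadata_as_dict(group) for name, group in named_groups}


common = [Item('duration', [Value('1 min')]), Item('title', [])]
video = [Item('width', [Value(320)]), Item('height', [Value(240)]),
         Item('language', [Value('en'), Value('fr')])]
tree = metadata_tree([('Common', common), ('Video stream #1', video)])
print(tree['Video stream #1']['width'])     # 320
print(tree['Video stream #1']['language'])  # ['en', 'fr']
print(tree['Common'])                       # {'duration': '1 min'}
```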
