Creating a JPEG file from source data in Python

I have this strange xml file which apparently contains jpeg image data:

<?xml version="1.0" encoding="UTF-8"?>
<AttachmentDocument xmlns="http://echa.europa.eu/schemas/iuclid5/20070330" documentReferencePK="ECB5-d18039fe-6fb0-44d6-be9e-d6ade38be543/0" encoding="0" fileSize="5788" fileTimestamp="2007-04-17T12:38:44Z" parentDocumentPK="ECB5-fb07efbf-ee93-4cdd-865b-49efa51cbd15/0" version="2007-03-19T14:13:29Z">
  <modificationHistory>
    <modification date="2007-05-10T09:00:00Z">
      <comment>Created</comment>
      <modificationBy>European Commision/Joint Research Centre/European Chemicals Bureau</modificationBy>
    </modification>
  </modificationHistory>
  <ownershipProtection copyProtection="false" fractionalDocument="false" sealed="false"/>
  <fileName>33952-38-4-V2.jpeg</fileName>
  <fileMimetype>image/jpeg</fileMimetype>
  <rawContent>
H4sIAAAAAAAAAO2XZ1AU65qAe5iBIQwgOCMZRkHJCIgEySBhyEEyIyDgMBKHLEFQBJEoIHBEQFQE
JUjOSo4iOQ+Ss2QkSZhZvLXn7j11726d3draH1vn7Xp+dH1fd/XzvV+//TZxlDgNnNNQRakCIBAA
gM4OgEgA5JQNVBRv6RrcQGLsBO+52WOQ3iJCwkgeLw+sCwaJ0lBDauipqCG9xUV5BZB29ndtvJw8
kTgvGyes531K4jigDJCTkUHJSMmhUCgFBTklDE4No6KCMdGfp4WzMXOwszGzsiK5hLiRlwQ4WVl5
...
  </rawContent>
  <MD5>0d80850b0c4085500f80e1430b90c70910d4110cc0d7</MD5>
</AttachmentDocument>

(Full version here.) However, I cannot read the image from it.

My attempt:

 from PIL import Image
 import StringIO
 import base64

 # I've eliminated all newlines and tabs to produce the data string
 data = "H4sIAAAAAAAAAO2XZ1AU65qAe5..."
 im = Image.open(StringIO.StringIO(base64.b64decode(data)))

But I get an error message:

 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
 raise IOError("cannot identify image file")
1 answer

If you inspect the output of the base64 decoding, you will see that it is a gzip file: it starts with the gzip magic bytes 1f 8b, which is why the base64 text begins with "H4sI". Decompress that data and you will get the desired JPEG.
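The two steps described here (base64-decode, then gunzip) can be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the actual file: the function names decode_raw_content and save_jpeg are made up for illustration, and it assumes the text of the rawContent element has already been pulled out of the XML as a string. It uses zlib rather than the gzip module so that the same code should run on both Python 2.7 and Python 3.

```python
import base64
import zlib


def decode_raw_content(b64_text):
    """Return the decompressed bytes stored in a <rawContent> element.

    Hypothetical helper: b64_text is the element's text as a string.
    """
    # Drop the line breaks and indentation that wrap the base64 text.
    compact = "".join(b64_text.split())
    raw = base64.b64decode(compact)
    # A gzip stream starts with the magic bytes 1f 8b.
    if raw[:2] != b"\x1f\x8b":
        raise ValueError("decoded payload is not gzip data")
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(raw, 16 + zlib.MAX_WBITS)


def save_jpeg(b64_text, out_path):
    """Decode the payload and write it to out_path (hypothetical helper)."""
    jpeg_bytes = decode_raw_content(b64_text)
    with open(out_path, "wb") as handle:
        handle.write(jpeg_bytes)
    return jpeg_bytes
```

With the decompressed bytes in hand, passing them to PIL through an in-memory file object, for example Image.open(io.BytesIO(jpeg_bytes)), or opening the file written by save_jpeg, should let PIL identify the image.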

The comment embedded in the image reads:

 CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), default quality 