Python File Slurp w / endian conversion

It was recently asked how to make a slurp file in python , and the accepted answer suggested something like:

with open('x.txt') as x: f = x.read()

How can I do this to read the file and convert the data view to endian?

For example, I have a 1 GB binary that only contains a bunch of single-point floats packaged as a large endian, and I want to convert it to a small endian and dump into a numpy array. Below is the function that I wrote to execute this and some real code that calls it. I use struct.unpackendian for conversion and try to speed things up using mmap.

Then my question is: am I using slurp correctly with mmapand struct.unpack? Is there a cleaner and faster way to do this? Now I have work, but I would really like to know how to do it better.

Thanks in advance!

#!/usr/bin/python
from struct import unpack
import mmap
import numpy as np

def mmapChannel(arrayName,  fileName,  channelNo,  line_count,  sample_count):
    """
    We need to read in the asf internal file and convert it into a numpy array.
    It is stored as a single row, and is binary. Thenumber of lines (rows), samples (columns),
    and channels all come from the .meta text file
    Also, internal format files are packed big endian, but most systems use little endian, so we need
    to make that conversion as well.
    Memory mapping seemed to improve the ingestion speed a bit
    """
    # memory-map the file, size 0 means whole file
    # length = line_count * sample_count * arrayName.itemsize
    print "\tMemory Mapping..."
    with open(fileName, "rb") as f:
        map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        map.seek(channelNo*line_count*sample_count*arrayName.itemsize)

        for i in xrange(line_count*sample_count):
            arrayName[0, i] = unpack('>f', map.read(arrayName.itemsize) )[0]

        # Same method as above, just more verbose for the maintenance programmer.
        #        for i in xrange(line_count*sample_count): #row
        #            be_float = map.read(arrayName.itemsize) # arrayName.itemsize should be 4 for float32
        #            le_float = unpack('>f', be_float)[0] # > for big endian, < for little endian
        #            arrayName[0, i]= le_float

        map.close()
    return arrayName

print "Initializing the Amp HH HV, and Phase HH HV arrays..."
HHamp = np.ones((1,  line_count*sample_count),  dtype='float32')
HHphase = np.ones((1,  line_count*sample_count),  dtype='float32')
HVamp = np.ones((1,  line_count*sample_count),  dtype='float32')
HVphase = np.ones((1,  line_count*sample_count),  dtype='float32')



print "Ingesting HH_Amp..."
HHamp = mmapChannel(HHamp, 'ALPSRP042301700-P1.1__A.img',  0,  line_count,  sample_count)
print "Ingesting HH_phase..."
HHphase = mmapChannel(HHphase, 'ALPSRP042301700-P1.1__A.img',  1,  line_count,  sample_count)
print "Ingesting HV_AMP..."
HVamp = mmapChannel(HVamp, 'ALPSRP042301700-P1.1__A.img',  2,  line_count,  sample_count)
print "Ingesting HV_phase..."
HVphase = mmapChannel(HVphase, 'ALPSRP042301700-P1.1__A.img',  3,  line_count,  sample_count)

print "Reshaping...."
HHamp_orig = HHamp.reshape(line_count, -1)
HHphase_orig = HHphase.reshape(line_count, -1)
HVamp_orig = HVamp.reshape(line_count, -1)
HVphase_orig = HVphase.reshape(line_count, -1)
+5
source share
4 answers
with open(fileName, "rb") as f:
  arrayName = numpy.fromfile(f, numpy.float32)
arrayName.byteswap(True)

Pretty hard to beat for speed and brevity ;-). For byteswap see here (argument Truemeans "do it in place"); for fromfile see here .

, ( , ). , , byteswap , byteswap, :

if struct.pack('=f', 2.3) == struct.pack('<f', 2.3):
  arrayName.byteswap(True)

i.e, batwap .

+6

@Alex Martelli:

arr = numpy.fromfile(filename, numpy.dtype('>f4'))
# no byteswap is needed regardless of endianess of the machine
+7

ASM, CorePy. , , - . / 1 , .

- C, python. DEM () . , script.

0
source

I would expect something like this faster

arrayName[0] = unpack('>'+'f'*line_count*sample_count, map.read(arrayName.itemsize*line_count*sample_count))

Please do not use mapas a variable name

0
source

All Articles