Is the Python implementation of Avro slow?

I am reading data from an Avro file using the avro library. It takes about a minute to load 33K objects from the file. This seems very slow to me, especially since the Java version reads the same file in about 1 second.

Here is the code. Am I doing something wrong?

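from argparse import ArgumentParser
from time import time

import avro.datafile
import avro.io


def load(filename):
    """Read every record in the Avro file and return the record count."""
    num_records = 0
    # The with block closes the file even if decoding fails.
    with open(filename, "rb") as fo:
        reader = avro.datafile.DataFileReader(fo, avro.io.DatumReader())
        for record in reader:
            num_records += 1
    return num_records


def main(argv=None):
    parser = ArgumentParser(description="Read avro file")
    parser.add_argument("filename", nargs="?", default="events.avro",
                        help="Avro file to read")
    # parse_args(None) falls back to sys.argv[1:].
    args = parser.parse_args(argv)

    start = time()
    num_records = load(args.filename)
    end = time()

    print("{0} records in {1} seconds".format(num_records, end - start))


if __name__ == "__main__":
    main()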
2 answers

The avro Python package available on PyPI is pure Python, so I'm not surprised that it is slower than Java by an order of magnitude or more.

There is a C implementation of Avro, but as far as I know, no one has built a Python extension on top of it.


For Python there is fastavro, a faster implementation that uses Cython.

https://bitbucket.org/tebeka/fastavro
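
To compare the two libraries on the same file, something along these lines might work. It is only a sketch: it assumes fastavro is installed and that `fastavro.reader` takes a binary file object and yields records, and the helper names (`count_records`, `timed_count`, `avro_reader`, `fastavro_reader`) are made up for illustration.

```python
from time import time


def count_records(filename, make_reader):
    """Count records in a file using the given reader factory.

    make_reader is any callable that takes a binary file object and
    returns an iterable of records.
    """
    num_records = 0
    with open(filename, "rb") as fo:
        for _record in make_reader(fo):
            num_records += 1
    return num_records


def timed_count(filename, make_reader):
    """Return (record count, elapsed seconds) for one full pass over the file."""
    start = time()
    num_records = count_records(filename, make_reader)
    return num_records, time() - start


def avro_reader(fo):
    """Reader factory for the pure-Python avro package used in the question."""
    import avro.datafile
    import avro.io
    return avro.datafile.DataFileReader(fo, avro.io.DatumReader())


def fastavro_reader(fo):
    """Reader factory for fastavro (third-party, assumed to be installed)."""
    import fastavro
    return fastavro.reader(fo)
```

Calling `timed_count("events.avro", avro_reader)` and then `timed_count("events.avro", fastavro_reader)` gives the record count and elapsed time for each library. The imports sit inside the factories so that either library can be missing without breaking the other measurement.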
