If you are working with really large datasets, keeping everything in plain in-memory Python structures may not be the best choice. Consider storing the data in a database such as MySQL or PostgreSQL and accessing it through SQLAlchemy, which makes it easy to work with potentially huge datasets through small Python objects. For example, given a type definition like this:
from datetime import datetime

from sqlalchemy import Column, DateTime, Enum, ForeignKey, Integer, \
    MetaData, PickleType, String, Text, Table, LargeBinary
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property, deferred, object_session, \
    relation, backref

SqlaBaseClass = declarative_base()

class MyDataObject(SqlaBaseClass):
    __tablename__ = 'datarows'

    eltid = Column(Integer, primary_key=True)
    name = Column(String(50, convert_unicode=True),
                  nullable=False, unique=True, index=True)
    created = Column(DateTime)
    updated = Column(DateTime, default=datetime.today)
    mylargecontent = deferred(Column(LargeBinary))

    def __init__(self, name):
        self.name = name
        self.created = datetime.today()

    def __repr__(self):
        return "<MyDataObject name='%s'>" % (self.name,)
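Before querying, you need an engine and a session. Here is a minimal setup sketch; the SQLite URL, the engine and dbsession names, and the sample rows are assumptions added for illustration, not part of the original answer:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Any SQLAlchemy-supported database URL works here; a local SQLite file is
# just a convenient stand-in for MySQL or PostgreSQL in this sketch.
engine = create_engine('sqlite:///mydata.db')

# Create the 'datarows' table defined by MyDataObject if it does not exist.
SqlaBaseClass.metadata.create_all(engine)

# Open a session; this is the 'dbsession' used in the examples below.
Session = sessionmaker(bind=engine)
dbsession = Session()

# Add a couple of sample rows so the query below has something to return.
dbsession.add(MyDataObject('first row'))
dbsession.add(MyDataObject('second row'))
dbsession.commit()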
Then you can easily access all rows with small data objects:
# set up database connection; open dbsession; ...

for elt in dbsession.query(MyDataObject).all():
    print(elt.eltid)
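One detail worth noting for very large tables: .all() builds a Python list of every ORM object up front. A hedged sketch of an alternative, using the standard Query.yield_per batching (the batch size of 1000 is an arbitrary assumption):

# yield_per(1000) asks the ORM to process rows in batches of 1000 instead of
# holding the entire result list of objects in memory at once.
for elt in dbsession.query(MyDataObject).yield_per(1000):
    print(elt.eltid)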
The point is that you can add as many fields to your data as you want, along with whatever indexes you need to speed up your searches. Most importantly, when you work with a MyDataObject, you can mark potentially huge fields as deferred, so they are only loaded when you actually access them.
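To illustrate that deferred behavior, here is a short sketch; it assumes the sample rows from the setup above, and the attribute-access pattern is standard SQLAlchemy deferred-column loading:

# Querying MyDataObject does not fetch the LargeBinary column yet, because
# the mapping wraps it in deferred(...).
obj = dbsession.query(MyDataObject).filter_by(name='first row').one()
print(obj.name)  # already loaded together with the row

# Only this attribute access triggers a second SELECT that loads the
# (potentially huge) deferred column for this one object.
blob = obj.mylargecontent
print(len(blob) if blob is not None else 'empty')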