Python sqlalchemy column values

I have 6 tables in my SQLite database, each with 6 columns (Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.

How can I use SQLAlchemy to sort by the Date column, find duplicate dates, and delete those rows?

+7
python sqlalchemy
3 answers

Assuming this is your model:

    from sqlalchemy import Column, Integer, String, DateTime
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class MyTable(Base):
        __tablename__ = 'my_table'

        id = Column(Integer, primary_key=True)
        date = Column(DateTime)
        user = Column(String)
        # the other columns do not really matter here;
        # what is important is that `id` is the primary key
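The snippets below assume an open session. A minimal setup for a SQLite file like the one in the question might look like this (the file name is just a placeholder, not part of the original answer):

    # minimal session setup (my assumption); adjust the path to your database file
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine("sqlite:///example.db")
    Base.metadata.create_all(engine)  # only needed if the table does not exist yet
    session = sessionmaker(bind=engine)()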

The following are two ways to delete data:

  • Find duplicates, mark them for deletion and complete the transaction
  • Create a single SQL query that will perform the deletion directly in the database.

For both of them, an auxiliary query will be used:

    from sqlalchemy import func

    # helper subquery: find the first row (smallest primary key) for each unique date
    subq = (
        session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
        .group_by(MyTable.date)
        .subquery("date_min_id")
    )
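If you want to verify what this helper emits, printing the query before wrapping it in a subquery shows the generated SQL (a small debugging aid, not part of the original answer):

    # prints something like:
    #   SELECT my_table.date AS my_table_date, min(my_table.id) AS min_id
    #   FROM my_table GROUP BY my_table.date
    print(
        session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
        .group_by(MyTable.date)
    )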

Option-1: find duplicates, mark them for deletion and complete the transaction

    from sqlalchemy import and_

    # query to find all duplicates: same date as the first row, but a different id
    q_duplicates = (
        session.query(MyTable)
        .join(subq, and_(MyTable.date == subq.c.date,
                         MyTable.id != subq.c.min_id))
    )

    for x in q_duplicates:
        print("Will delete %s" % x)
        session.delete(x)

    session.commit()
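After the commit, a quick sanity check (my own addition, not from the answer) can confirm that no date occurs more than once any more:

    # number of dates that still have more than one row; should be 0
    remaining = (
        session.query(MyTable.date)
        .group_by(MyTable.date)
        .having(func.count(MyTable.id) > 1)
        .count()
    )
    print("duplicate dates left: %d" % remaining)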

Option-2: create a single SQL query that will perform the deletion directly in the database

    # ids of all duplicate rows, as a subquery
    sq = (
        session.query(MyTable.id)
        .join(subq, and_(MyTable.date == subq.c.date,
                         MyTable.id != subq.c.min_id))
        .subquery("subq")
    )

    # single bulk DELETE executed directly in the database
    dq = (
        session.query(MyTable)
        .filter(MyTable.id.in_(sq))
        .delete(synchronize_session=False)
    )
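Note that delete(synchronize_session=False) returns the number of rows matched and runs inside the current transaction, so (with the default non-autocommitting session) it still needs a commit:

    session.commit()
    print("deleted %d duplicate rows" % dq)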
+2

Inspired by Find Duplicate Values in an SQL Table, this can help you select the duplicate dates:

    from sqlalchemy import func

    query = (
        session.query(MyTable)
        .group_by(MyTable.date)
        .having(func.count(MyTable.date) > 1)
        .all()
    )
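A slight variant (my own sketch, assuming the same MyTable model as in the first answer) returns just the duplicated dates together with how often each occurs, which is handy for reporting:

    dup_dates = (
        session.query(MyTable.date, func.count(MyTable.date).label("cnt"))
        .group_by(MyTable.date)
        .having(func.count(MyTable.date) > 1)
        .all()
    )
    for date, cnt in dup_dates:
        print(date, cnt)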

If you just want to display the unique dates, distinct() is what you might need.
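For example, assuming the same MyTable model as above:

    # only the distinct dates, one row each
    unique_dates = session.query(MyTable.date).distinct().all()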

+1

While I like the object-oriented approach of SQLAlchemy, sometimes I find it easier to use plain SQL directly. Since the rows have no primary key, we need SQLite's row id (_ROWID_) to target the records to delete, and I don't think the ORM exposes it.

So, first we connect to the database:

    from sqlalchemy import create_engine

    # create_engine() already returns an Engine, so no extra .engine step is needed
    # (Engine.execute(), used below, is the pre-2.0 SQLAlchemy API)
    eng = create_engine(r'sqlite:///C:\temp\example.db')

Then list all entries:

    for row in eng.execute("SELECT * FROM TableA;"):
        print(row)

And to display all duplicate entries where the dates are identical:

    for row in eng.execute("""
        SELECT * FROM {table}
        WHERE {field} IN (SELECT {field} FROM {table}
                          GROUP BY {field}
                          HAVING COUNT(*) > 1)
        ORDER BY {field};
        """.format(table="TableA", field="Date")):
        print(row)

Now that we have identified all the duplicates, we should probably reconcile them first if their other fields differ:

 eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12' ;"); eng.execute("UPDATE TableA SET NormalA=4, specialA=8 WHERE Date = '2015-18-12' ;"); 

Finally, to keep the first inserted record and delete the more recent duplicates:

    print(eng.execute("""
        DELETE FROM {table}
        WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
        """.format(table="TableA", field="Date")).rowcount)

Or keep the last inserted record and delete the other duplicates:

    print(eng.execute("""
        DELETE FROM {table}
        WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
        """.format(table="TableA", field="Date")).rowcount)
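As a final check (my own addition), re-running the duplicate query should now return no rows:

    for row in eng.execute(
        "SELECT Date, COUNT(*) FROM TableA GROUP BY Date HAVING COUNT(*) > 1;"
    ):
        print(row)  # prints nothing if the cleanup worked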
+1
