SQLite pivot table such as SQL should be

I have some data. 224,000 rows in an SQLite database. I want to extract temporary information from it in order to provide a data visualization tool. In fact, each row in db is an event that (among other things, does not have much meaning) represents a group of time in seconds from the time of the era and the name responsible for it. I want to extract how many events each name has for each week in db.

It is quite simple:

SELECT COUNT(*), name, strf("%W:%Y", time, "unixepoch") FROM events GROUP BY strf("%W:%Y", time, "unixepoch"), name ORDER BY time 

and we get about six thousand rows of data.

 count name week:year 23............ fudge.......23:2009 etc... 

But I don't need a row for every name in every week - I need a row for every name and a column for every week, for example:

 Name 23:2009 24:2009 25:2009 fudge........23............6............19 fish.........1.............0............12 etc... 

Now the monitoring process has been running for 69 weeks, and the number of unique names is 502. It’s clear that I’m far from any solution that includes hard coding of all columns and, especially, rows. I am less inattentive to something related to repeating a batch, say with python executemany (), but I am ready to accept it if necessary. SQL must be installed, damn it.

+4
source share
2 answers

A good approach in such cases is not pushing SQL to such an extent that it becomes confusing, difficult to understand and maintain. Let SQL do what is convenient and can handle query results in Python.

Here is a cut out version of a simple crosstab generator that I wrote. The full version delivers totals of rows / columns / greats.

You will notice that it has a built-in β€œgroup by” - the original use case was to summarize the data received from Excel files using Python and xlrd.

row_key and col_key that you supply do not have to be strings, as in the example; they can be tuples - for example, (year, week) in your case - or they can be integers. you have a string column name mapping with an integer sort key.

 import sys class CrossTab(object): def __init__( self, missing=0, # what to return for an empty cell. Alternatives: '', 0.0, None, 'NULL' ): self.missing = missing self.col_key_set = set() self.cell_dict = {} self.headings_OK = False def add_item(self, row_key, col_key, value): self.col_key_set.add(col_key) try: self.cell_dict[row_key][col_key] += value except KeyError: try: self.cell_dict[row_key][col_key] = value except KeyError: self.cell_dict[row_key] = {col_key: value} def _process_headings(self): if self.headings_OK: return self.row_headings = list(sorted(self.cell_dict.iterkeys())) self.col_headings = list(sorted(self.col_key_set)) self.headings_OK = True def get_col_headings(self): self._process_headings() return self.col_headings def generate_row_info(self): self._process_headings() for row_key in self.row_headings: row_dict = self.cell_dict[row_key] row_vals = [row_dict.get(col_key, self.missing) for col_key in self.col_headings] yield row_key, row_vals def dump(self, f=None, header=None, footer='', ): if f is None: f = sys.stdout alist = self.__dict__.items() alist.sort() if header is not None: print >> f, header for attr, value in alist: print >> f, "%s: %r" % (attr, value) if footer is not None: print >> f, footer if __name__ == "__main__": data = [ ['Rob', 'Morn', 240], ['Rob', 'Aft', 300], ['Joe', 'Morn', 70], ['Joe', 'Aft', 80], ['Jill', 'Morn', 100], ['Jill', 'Aft', 150], ['Rob', 'Aft', 40], ['Rob', 'aft', 5], ['Dozy', 'Aft', 1], # Dozy doesn't show up till lunch-time ['Nemo', 'never', -1], ] NAME, TIME, AMOUNT = range(3) xlate_time = {'morn': "AM", "aft": "PM"} print ctab = CrossTab(missing=None, ) # ctab.dump(header='=== after init ===') for s in data: ctab.add_item( row_key=s[NAME], col_key= xlate_time.get(s[TIME].lower(), "XXXX"), value=s[AMOUNT]) # ctab.dump(header='=== after add_item ===') print ctab.get_col_headings() # ctab.dump(header='=== after get_col_headings ===') for x in ctab.generate_row_info(): print x 

Output:

 ['AM', 'PM', 'XXXX'] ('Dozy', [None, 1, None]) ('Jill', [100, 150, None]) ('Joe', [70, 80, None]) ('Nemo', [None, None, -1]) ('Rob', [240, 345, None]) 
+4
source

I will make your request first

 SELECT COUNT(*), name, strf("%W:%Y", time, "unixepoch") FROM events GROUP BY strf("%W:%Y", time, "unixepoch"), name ORDER BY time 

and then do post processing using python.

Thus, you do not need to iterate over 224 000 lines, but more than 6000 lines. You can easily store these 6,000 lines in memory (for processing using Python). I think you can also store 224,000 lines in memory, but that requires a lot more memory.

However: Newer versions of sqlite support the group_concat aggregation function. Maybe you can use this function to rotate using SQL? I cannot try because I am using an older version.

+1
source

All Articles