Our application / data
We have a Python web application with Users, who have Transactions (which carry Commissions, Fees, etc.), Contacts, which receive EmailMessages, and Activities that take place on Transactions (Document loaded, Status changed, etc.).
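To make the structure concrete, here is a simplified sketch of the relevant schema; the table and column names are illustrative, not our exact schema:

```python
# Illustrative, simplified version of the schema described above
# (names are approximate placeholders, not our real schema).
import psycopg2

SCHEMA_SKETCH = """
CREATE TABLE IF NOT EXISTS users          (id serial PRIMARY KEY, name text);
CREATE TABLE IF NOT EXISTS transactions   (id serial PRIMARY KEY,
                                           user_id int REFERENCES users);
CREATE TABLE IF NOT EXISTS fees           (id serial PRIMARY KEY,
                                           transaction_id int REFERENCES transactions,
                                           fee_type text,        -- commission, charge, ...
                                           amount numeric(12, 2));
CREATE TABLE IF NOT EXISTS contacts       (id serial PRIMARY KEY,
                                           transaction_id int REFERENCES transactions,
                                           email text);
CREATE TABLE IF NOT EXISTS email_messages (id serial PRIMARY KEY,
                                           contact_id int REFERENCES contacts,
                                           sent_at timestamptz);
CREATE TABLE IF NOT EXISTS activities     (id serial PRIMARY KEY,
                                           transaction_id int REFERENCES transactions,
                                           kind text,            -- 'document_loaded', 'status_changed', ...
                                           occurred_at timestamptz);
"""

def create_schema_sketch(dsn):
    # Convenience helper to create the illustrative tables in an empty database.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SCHEMA_SKETCH)
```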
Our reports
We generate spreadsheet reports for our clients detailing information such as the number of documents loaded into transactions, the amounts of the various types of fees and charges, counts of activities, and so on. In some cases these reports provide statistics for a customer's account for each month of a given year (each month in a separate row of the spreadsheet).
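As an example of the kind of aggregation these reports need, the per-month "documents loaded" number is conceptually a query like this (again using the illustrative names from the schema sketch above):

```python
# Hypothetical per-month "documents loaded" aggregation, run from Python
# with psycopg2 (table and column names follow the schema sketch above).
import psycopg2

MONTHLY_DOCS_SQL = """
SELECT date_trunc('month', a.occurred_at) AS month,
       count(*)                           AS documents_loaded
FROM activities a
JOIN transactions t ON t.id = a.transaction_id
WHERE t.user_id = %(user_id)s
  AND a.kind = 'document_loaded'
  AND a.occurred_at >= %(year_start)s
  AND a.occurred_at <  %(year_end)s
GROUP BY 1
ORDER BY 1;
"""

def monthly_stats(conn, user_id, year):
    # One result row per month -> one row in the client's spreadsheet.
    with conn.cursor() as cur:
        cur.execute(MONTHLY_DOCS_SQL, {
            "user_id": user_id,
            "year_start": f"{year}-01-01",
            "year_end": f"{year + 1}-01-01",
        })
        return cur.fetchall()
```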
Our problem
We have reached a point with our web application where some of the spreadsheet reports we generate take minutes to build (everyone is waiting on Postgres), despite efforts to optimize queries and add indexes; we run only on SSDs and have enough RAM to hold the database in memory. In fact, some basic reports have become too expensive to run as simple aggregate queries against our production database.
The solutions I'm considering
1. Denormalize statistics into existing tables in Postgres
2. Cache computed statistics in Memcached (a rough sketch follows this list)
3. Reduce/simplify the queries by moving some of the number crunching into Python
4. Run the expensive reports in the background and notify administrators when they are ready
5. Move reporting to a separate data warehouse (e.g. Redshift)
6. Sharding
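For option 2, what I have in mind is roughly the following; pymemcache, the key scheme, and the one-hour TTL are placeholders, and monthly_stats is the illustrative query function from the earlier sketch:

```python
# Hypothetical sketch of option 2: cache computed report statistics in
# memcached with a TTL so that repeated report requests skip Postgres.
import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
CACHE_TTL_SECONDS = 60 * 60  # placeholder: recompute at most once per hour

def cached_monthly_stats(conn, user_id, year):
    key = f"monthly_stats:{user_id}:{year}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)

    # Fall back to the expensive aggregate query (monthly_stats from the
    # earlier sketch) and store the result as JSON, since memcached values
    # must be bytes or strings.
    rows = monthly_stats(conn, user_id, year)
    cache.set(key, json.dumps(rows, default=str), expire=CACHE_TTL_SECONDS)
    return rows
```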
Options 1-4 feel like short-term patches to me. Of those, 4 seems the most workable, but the option I keep coming back to is 5 (a separate data warehouse such as Redshift). Option 6, sharding, seems like more complexity than our situation warrants.
My questions
Is Redshift the right approach here, or is it overkill ("a sledgehammer to crack a nut") for data of our size? Or should we keep the statistics in Postgres, for example in summary tables or materialized views maintained by triggers, cron jobs, etc.?
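If we stayed in Postgres, I picture something like the following; the view name, columns, and refresh schedule are only illustrative:

```python
# Hypothetical alternative to Redshift: keep a per-user, per-month summary
# in Postgres as a materialized view and refresh it on a schedule, so the
# report queries only read pre-aggregated rows.
import psycopg2

CREATE_VIEW_SQL = """
CREATE MATERIALIZED VIEW IF NOT EXISTS monthly_user_stats AS
SELECT t.user_id,
       date_trunc('month', a.occurred_at) AS month,
       count(*) FILTER (WHERE a.kind = 'document_loaded') AS documents_loaded
FROM transactions t
JOIN activities a ON a.transaction_id = t.id
GROUP BY t.user_id, date_trunc('month', a.occurred_at);
"""

def refresh_monthly_stats(dsn):
    # Run periodically (e.g. nightly from cron); reports then read from
    # monthly_user_stats instead of aggregating the raw rows on demand.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(CREATE_VIEW_SQL)
        cur.execute("REFRESH MATERIALIZED VIEW monthly_user_stats;")
```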
Amazon QuickSight also looks interesting, but I am not sure it fits our case of generating spreadsheet-style reports for clients.
Has anyone solved a similar problem? Is Redshift a good fit for this kind of reporting, or is there a better approach we should be considering?