Note: scroll down to the Background section for more details. Assume the project uses Python, Django and South.
What is the best way to import the following CSV into a PostgreSQL database with the related tables Person, Account and AccountType?
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Things to take into account:
- Administrator users can change the database model and the CSV import view at runtime through the user interface.
- The stored CSV-to-database table and field mappings are used when regular users import CSV files.
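The stored mappings described above can be as simple as a list assigning each CSV column to a (table, field) pair. Below is a minimal sketch, assuming the four sample columns map to `Person.first_name`, `Person.last_name`, `AccountType.name` and `Account.category` (hypothetical field names, as are `COLUMN_MAPPING` and `split_row`). It parses the sample rows with the standard `csv` module and groups each row's values by target table:

```python
import csv
import io

# Hypothetical stored mapping: CSV column index -> (table, field).
# In the real application this would be loaded from wherever
# administrators save their mappings.
COLUMN_MAPPING = [
    ("Person", "first_name"),
    ("Person", "last_name"),
    ("AccountType", "name"),
    ("Account", "category"),
]


def split_row(row, mapping):
    """Group one CSV row's values by target table according to the mapping."""
    if len(row) != len(mapping):
        raise ValueError(f"expected {len(mapping)} columns, got {len(row)}")
    per_table = {}
    for value, (table, field) in zip(row, mapping):
        per_table.setdefault(table, {})[field] = value
    return per_table


SAMPLE_CSV = '''"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
'''

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
first = split_row(rows[0], COLUMN_MAPPING)
print(first)
# {'Person': {'first_name': 'john', 'last_name': 'doe'}, 'AccountType': {'name': 'savings'}, 'Account': {'category': 'personal'}}
```

Keeping the mapping as data rather than code means a change made by an administrator in the UI only has to update the stored list; the import code itself stays the same.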
So far I have considered two approaches:
- ETL API approach: give an ETL API the spreadsheet, the CSV-to-database table/field mappings and the connection details for the target database; the API then loads the spreadsheet and populates the target tables. Looking at pygrametl, I don't think what I am aiming for is possible with it. In fact, I'm not sure any ETL API does this.
- Row-level insert approach: parse the CSV-to-database table/field mappings, parse the spreadsheet, and generate SQL INSERT statements in join order (referenced tables first).
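If "join order" means foreign-key dependency order, generating the inserts amounts to a topological sort of the tables, so that a referenced row exists before any row that points to it. A minimal sketch using the standard-library `graphlib` module (Python 3.9+), assuming `Account` references both `Person` and `AccountType`; the `FOREIGN_KEYS` map and the `insert_order` helper are hypothetical:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schema metadata: table -> set of tables it references via FK.
FOREIGN_KEYS = {
    "Person": set(),
    "AccountType": set(),
    "Account": {"Person", "AccountType"},
}


def insert_order(foreign_keys):
    """Return tables ordered so every referenced table precedes its referrers."""
    try:
        # static_order() yields each node only after all of its predecessors.
        return list(TopologicalSorter(foreign_keys).static_order())
    except CycleError as exc:
        # Circular FKs cannot be satisfied by ordering alone; they need
        # nullable columns or deferred constraints instead.
        raise ValueError(f"circular foreign keys: {exc.args[1]}") from exc


order = insert_order(FOREIGN_KEYS)
print(order)
```

Person and AccountType have no dependencies, so their relative order is arbitrary; only Account is guaranteed to come last.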
I implemented the second approach, but I am struggling with flaws in the algorithm and with the complexity of the code. Is there a Python ETL API that does what I want? Or an approach that does not involve reinventing the wheel?
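One problem any row-level importer has to handle is duplicate parent rows: the sample CSV names "john","doe" four times but should produce a single Person. Below is a minimal get-or-create sketch, inserting parents before children and caching ids so repeated parents are looked up only once. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the schema (snake_case table names, unique constraints serving as natural keys, `account.category` holding the personal/business column) and helper names are hypothetical:

```python
import sqlite3

# Hypothetical schema standing in for the PostgreSQL tables.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    UNIQUE (first_name, last_name)
);
CREATE TABLE account_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE account (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    account_type_id INTEGER NOT NULL REFERENCES account_type(id),
    category TEXT NOT NULL,
    UNIQUE (person_id, account_type_id, category)
);
"""


def get_or_create(conn, cache, table, values):
    """Return the id of the row matching `values`, inserting it if missing."""
    key = (table, tuple(sorted(values.items())))
    if key in cache:
        return cache[key]
    # Table and column names come from trusted schema metadata, never from
    # CSV data; the CSV values themselves are passed as query parameters.
    columns = sorted(values)
    params = [values[c] for c in columns]
    where = " AND ".join(f"{c} = ?" for c in columns)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", params).fetchone()
    if row is None:
        placeholders = ", ".join("?" for _ in columns)
        cur = conn.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            params,
        )
        row_id = cur.lastrowid
    else:
        row_id = row[0]
    cache[key] = row_id
    return row_id


def import_rows(conn, rows):
    """Import CSV rows one at a time, parents first, in a single transaction."""
    cache = {}
    with conn:  # commits on success, rolls back the whole import on error
        for first, last, type_name, category in rows:
            person_id = get_or_create(
                conn, cache, "person", {"first_name": first, "last_name": last})
            type_id = get_or_create(conn, cache, "account_type", {"name": type_name})
            get_or_create(conn, cache, "account", {
                "person_id": person_id,
                "account_type_id": type_id,
                "category": category,
            })


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = [
    ("john", "doe", "savings", "personal"),
    ("john", "doe", "savings", "business"),
    ("john", "doe", "checking", "personal"),
    ("john", "doe", "checking", "business"),
    ("jemma", "donut", "checking", "personal"),
]
import_rows(conn, rows)
for table in ("person", "account_type", "account"):
    print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
# prints: person 2, account_type 2, account 5 (one per line)
```

Running `import_rows` a second time leaves the counts unchanged, because each lookup finds the existing row. Within Django, `QuerySet.get_or_create()` offers the same lookup-then-insert behaviour per model, which may remove much of the hand-written SQL.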
Background
The company I work for wants to move hundreds of project tables hosted on SharePoint into databases. We have nearly finished a web application that meets this need: it lets an administrator define/model a database for each project, store spreadsheets in it and define the viewing experience. At this stage, switching to a commercial tool is not an option. Think of the web application as an alternative to django-admin (although it is not one), with a DB modeling UI, CSV import/export, custom views and modular code for project-specific customizations.
The implemented CSV import interface is cumbersome and buggy, so I'm trying to get feedback and find alternative approaches.