I collect a bunch of reports manually every day, and it takes forever, so I am thinking about automating the whole process. I need to extract data from: (1) HTML, (2) CSV / XLS, (3) PDF. So far I have only scraped data from CSV and HTML with PHP, and I am wondering whether there are any reliable libraries or techniques for extracting tabular data from PDFs in PHP?
I have also just started learning Python, and it looks like it would be worth trying PDFMiner in combination with Scrapy. Would that be a better approach? Or are there other options?
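To make the goal concrete, here is a minimal sketch of what I have in mind for the CSV and HTML parts in Python, using only the standard library. The names `TableParser`, `rows_from_csv` and `rows_from_html` are placeholders of my own, and it assumes simple, non-nested tables. The PDF step is deliberately left out, since that is the part I am asking about.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: assumes flat tables with no nesting.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_csv(text):
    """Parse CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


def rows_from_html(text):
    """Parse HTML text and return all table rows found in it."""
    parser = TableParser()
    parser.feed(text)
    parser.close()
    return parser.rows
```

Both functions return the same shape (a list of rows, each a list of strings), so whatever ends up handling the PDFs could ideally feed into the same downstream code.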
Please let me know. Thanks!