I have several PDF documents in a folder that has a specific structure:

I want to parse information out of each PDF. Note that the paragraphs have different lengths.
Obviously, I am not asking you to solve the problem for me, but I need some guidance on how this can be achieved.
I have used Nokogiri before, and I essentially need something similar, but for PDF files.
So, the pseudo-result for my example would look like this:
- ItemA
  - Title: ItemA
  - File: 123456789.pdf
  - Image: ImageA.png (the image was stored on disk)
  - Subtitle1: Content for subtitle 1
  - Subtitle2: Content for subtitle 2
  - Subtitle3: Content for subtitle 3
- TitleB
  - [...]
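
To make the target more concrete, here is a rough Ruby sketch of the structure I have in mind for one item. The `Item` struct and the `build_item` helper are names I made up purely for illustration, and the sketch assumes the title, image path, and heading/text pairs have already been pulled out of the PDF somehow. That extraction step is exactly the part I am missing, so it is not shown.

```ruby
# Hypothetical container for one parsed item; the field names simply
# mirror the pseudo-result and are not taken from any library.
Item = Struct.new(:title, :file, :image, :sections, keyword_init: true)

# Hypothetical helper: turns already-extracted [heading, text] pairs
# into an Item. The PDF parsing that would produce these pairs is assumed.
def build_item(title:, file:, image:, pairs:)
  Item.new(title: title, file: file, image: image, sections: pairs.to_h)
end

item = build_item(
  title: "ItemA",
  file: "123456789.pdf",
  image: "ImageA.png", # assumed to have been written to disk already
  pairs: [
    ["Subtitle1", "Content for subtitle 1"],
    ["Subtitle2", "Content for subtitle 2"],
    ["Subtitle3", "Content for subtitle 3"]
  ]
)

puts item.to_h
```

Keeping the subtitles in a hash rather than as fixed fields seems like a reasonable fit, since the paragraphs vary in length and I am not sure every item has the same number of subtitles.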