I am doing a streaming read of an S3 object using a BufferedReader.
I need to do two things with this object:
- Pass it to the csv reader SuperCSV
- Get the raw strings and save them in a lazy sequence
Currently, I have to use two different BufferedReaders: one as an argument to the SuperCSV reader class and one to initialize the lazy sequence of raw lines. I am effectively loading the S3 object twice, which is expensive ($) and slow.
One of my colleagues noted that something similar to the Unix tee command is what I'm looking for. A BufferedReader that could somehow be "split", so that it loads a chunk of data once and passes a copy to both the lazy sequence and the CSV reading function, would be helpful.
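To make the tee idea concrete, here is a minimal sketch in Java. `LineTeeReader` is a hypothetical name, not a class from SuperCSV or the JDK: it passes characters through unchanged to whoever reads from it (for example a BufferedReader handed to the CSV reader) and reports each completed raw line to a callback. It assumes lines end in `\n` or `\r\n`, and it treats a newline inside a quoted CSV field as a line break.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper: a pass-through Reader that also emits every raw line it sees.
class LineTeeReader extends FilterReader {
    private final Consumer<String> lineSink;            // receives each raw line
    private final StringBuilder current = new StringBuilder();

    LineTeeReader(Reader in, Consumer<String> lineSink) {
        super(in);
        this.lineSink = lineSink;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c < 0) {
            finish();
        } else {
            accept((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n < 0) {
            finish();
            return n;
        }
        for (int i = off; i < off + n; i++) {
            accept(buf[i]);
        }
        return n;
    }

    // Collect characters until '\n', then hand the line (without "\r\n" or "\n") to the sink.
    private void accept(char c) {
        if (c == '\n') {
            int end = current.length();
            if (end > 0 && current.charAt(end - 1) == '\r') {
                current.setLength(end - 1);
            }
            lineSink.accept(current.toString());
            current.setLength(0);
        } else {
            current.append(c);
        }
    }

    // At end of stream, emit a final line that had no trailing newline.
    private void finish() {
        if (current.length() > 0) {
            lineSink.accept(current.toString());
            current.setLength(0);
        }
    }
}
```

The callback fires as the downstream consumer pulls data, so the source is read only once. Because a wrapping BufferedReader reads ahead in chunks, the raw lines can arrive at the callback somewhat before the CSV reader has parsed them.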
I am also looking into whether it is possible to wrap a lazy sequence in a BufferedReader and pass it to SuperCSV. I have had several Java heap problems when passing very large lazy sequences to multiple consumers, so I am a bit worried about using this solution.
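For reference, wrapping a sequence of lines in a Reader could look roughly like the sketch below. `LineIteratorReader` is a hypothetical name, and the sketch assumes the lazy sequence can be exposed as a `java.util.Iterator<String>` of lines without their terminators. The reader itself holds only the current line, but that does not remove the heap risk: if any other consumer keeps a reference to the head of the sequence, every realized line stays reachable.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

// Hypothetical adapter: exposes an iterator of lines as a character stream.
class LineIteratorReader extends Reader {
    private final Iterator<String> lines;
    private String current = "";   // line currently being served, with '\n' appended
    private int pos = 0;           // next unread position within `current`

    LineIteratorReader(Iterator<String> lines) {
        this.lines = lines;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Pull the next line only when the current one is used up.
        while (pos >= current.length()) {
            if (!lines.hasNext()) {
                return -1;         // end of stream
            }
            current = lines.next() + "\n";
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release; the iterator's owner manages the underlying source.
    }
}
```

An instance can be passed wherever a Reader is accepted, for example `new BufferedReader(new LineIteratorReader(lines))`.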
Another solution is to simply download the file locally and then open two streams on that file. This defeats the original motivation for streaming, which is being able to start working with the file as soon as data begins to arrive.
The final solution, which I would consider only if nothing else works, is to implement my own CSV reader that returns both the parsed CSV and the original unparsed string. If you have used a very robust CSV reader that can return both a Java hash of the parsed CSV data and the original unparsed string, please let me know!
Thanks!