I have two Spark streams. The first carries product data: the supplier's price, the currency, the description, and the supplier ID. Each record is enriched with a category (guessed by analyzing the description) and with the price converted to dollars, and is then saved to a Parquet dataset.
The second stream contains auction data for these products: the price at which each one was sold and the sale date.
Given that a product can arrive in the first stream today and be sold a year later, how can I join the second stream with the full history stored in the first stream's Parquet dataset?
The end result should be something like the average daily income per price range ...
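To make the desired result concrete, here is a minimal plain-Python sketch of the logic (not Spark code). All field names (`product_id`, `price_usd`, `sold_price_usd`, `sold_on`), the sample rows, and the fixed-width price ranges are assumptions made up for illustration. It also assumes that "income" means the sale price and that the average is taken only over days that had at least one sale.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows from the first stream's Parquet history (field names assumed).
products = [
    {"product_id": 1, "supplier_id": "s1", "category": "toys", "price_usd": 8.0},
    {"product_id": 2, "supplier_id": "s2", "category": "tools", "price_usd": 40.0},
    {"product_id": 3, "supplier_id": "s1", "category": "toys", "price_usd": 45.0},
]

# Hypothetical rows from the second stream (auction results).
sales = [
    {"product_id": 1, "sold_price_usd": 12.0, "sold_on": date(2020, 1, 1)},
    {"product_id": 2, "sold_price_usd": 50.0, "sold_on": date(2020, 1, 1)},
    {"product_id": 3, "sold_price_usd": 70.0, "sold_on": date(2020, 1, 2)},
]


def price_range(price_usd, width=25.0):
    """Assumed bucketing: fixed-width ranges such as [0, 25), [25, 50)."""
    low = (price_usd // width) * width
    return (low, low + width)


def average_daily_income(products, sales):
    """Join sales to the product history, then average daily income per price range."""
    by_id = {p["product_id"]: p for p in products}

    # (price range, day) -> total income on that day
    daily = defaultdict(float)
    for sale in sales:
        product = by_id.get(sale["product_id"])
        if product is None:
            continue  # sale with no matching product in the history
        key = (price_range(product["price_usd"]), sale["sold_on"])
        daily[key] += sale["sold_price_usd"]

    # price range -> list of daily totals, averaged over days with sales
    per_range = defaultdict(list)
    for (bucket, _day), income in daily.items():
        per_range[bucket].append(income)
    return {bucket: sum(v) / len(v) for bucket, v in per_range.items()}


result = average_daily_income(products, sales)
print(result)
```

The part I cannot work out is the first step, the lookup against the product history, when the sales arrive as a stream and the history lives in a Parquet dataset that keeps growing.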