Apache Flink is not tied to any specific storage system or format. The best place to store the results computed by Flink depends on your use case.
- Are you running a batch or a streaming job?
- What do you want to do with the result?
- Do you need batch (full scan), point, or continuous streaming access to the data?
- What format does the data have: flat structured (relational), nested, blob, ...?
Depending on the answers to these questions, you can choose from various storage backends, such as:
- Apache HDFS for batch access (with different storage formats, such as Parquet, ORC, or a custom binary format)
- Apache Kafka, if you want to access the data as a stream
- a key-value store, such as Apache HBase or Apache Cassandra, for point access to the data
- a database, such as MongoDB, MySQL, ...
Flink provides OutputFormats for most of these systems (some through a wrapper for Hadoop OutputFormats). The "best" system depends on your use case.
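To make the rule of thumb concrete, here is a minimal sketch in plain Java that maps each access pattern to the candidate systems listed above. `SinkChooser`, `Access`, and `candidates` are hypothetical names made up for illustration; none of this is part of the Flink API, and the databases (MongoDB, MySQL) are left out because the answer does not tie them to a single access pattern.

```java
import java.util.List;

// Hypothetical helper, NOT part of Flink: it only encodes the rule of thumb above.
public class SinkChooser {

    // The three access patterns named in the answer.
    enum Access { BATCH, STREAM, POINT }

    // Returns the candidate storage systems for a given access pattern.
    static List<String> candidates(Access access) {
        switch (access) {
            case BATCH:
                // Full scans over files; the storage format is a separate choice.
                return List.of("Apache HDFS (Parquet, ORC, custom binary)");
            case STREAM:
                // Consumers read the results as a continuous stream.
                return List.of("Apache Kafka");
            case POINT:
                // Lookups of individual records by key.
                return List.of("Apache HBase", "Apache Cassandra");
            default:
                throw new IllegalArgumentException("unknown access pattern: " + access);
        }
    }

    public static void main(String[] args) {
        // Print the candidates for every access pattern.
        for (Access a : Access.values()) {
            System.out.println(a + " -> " + candidates(a));
        }
    }
}
```

In a real job you would still have to weigh the data format and what you plan to do with the result, so treat the lookup as a starting point rather than a decision procedure.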