Converting a Spark DataFrame to an Arrow object in Scala/Java

I have been using Apache Arrow with Spark in Python for some time, and could easily convert between DataFrames and Arrow objects by using Pandas as an intermediary.

Recently I moved from Python to Scala to interact with Spark, and using Arrow is not as intuitive in Scala (Java) as it is in Python. My main need is to convert a Spark DataFrame (or an RDD, since it is easy to convert between the two) to an Arrow object as quickly as possible. My initial thought was to go to Parquet first and then from Parquet to Arrow, as I remembered there was a PR for reading Parquet into Arrow. However, after looking at the Arrow Java docs for a while I could not find a Parquet-to-Arrow function, so please correct me if I am mistaken. Does this feature not exist in the Java version? Is there any other way to get from a Spark DataFrame to an Arrow object? Perhaps converting the DataFrame columns to arrays and then converting those to Arrow objects? A rough sketch of that last idea is shown below.
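To make that last idea concrete, here is the kind of thing I have in mind (untested sketch; it assumes a single non-null integer column, a hypothetical column name, and the Arrow Java vector API):

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.{IntVector, VectorSchemaRoot}
import org.apache.spark.sql.DataFrame

// Rough sketch: collect one integer column of a Spark DataFrame to the driver
// and copy it into an Arrow IntVector. Assumes the column is non-null and
// small enough to collect.
def intColumnToArrow(df: DataFrame, colName: String): VectorSchemaRoot = {
  val values = df.select(colName).collect().map(_.getInt(0))

  val allocator = new RootAllocator(Long.MaxValue)
  val vector = new IntVector(colName, allocator)
  vector.allocateNew(values.length)
  values.zipWithIndex.foreach { case (v, i) => vector.setSafe(i, v) }
  vector.setValueCount(values.length)

  // Wrap the single vector in a VectorSchemaRoot (Arrow's table-like container)
  VectorSchemaRoot.of(vector)
}
```

Obviously this collects everything to the driver, which is exactly what I would like to avoid for larger data, hence the question.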

Any help would be greatly appreciated. Thank you.

EDIT: I found the following class, which converts a Parquet schema to an Arrow schema, but it does not seem to return an Arrow object populated from the Parquet file, which is what I need: https://github.com/apache/parquet-mr/blob/70f28810a5547219e18ffc3465f519c454fee6e5/parquet-arrow/src/main/java/org/apache/parquet/arrow/schema/SchemaConverter.java
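For reference, my (possibly mistaken) understanding of how that class is used, just to illustrate that it maps only the schema and not the data; the fromParquet/getArrowSchema names are my assumption from skimming the source:

```scala
import org.apache.parquet.arrow.schema.SchemaConverter
import org.apache.parquet.schema.MessageTypeParser

// Assumption: SchemaConverter.fromParquet returns a mapping from which the
// Arrow schema can be pulled. This converts the *schema* only, not the rows.
val parquetSchema = MessageTypeParser.parseMessageType(
  "message example { required int32 id; required binary name (UTF8); }")

val converter = new SchemaConverter()
val arrowSchema = converter.fromParquet(parquetSchema).getArrowSchema
println(arrowSchema)
```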
