How to unnest data using SparkR?

Using SparkR, how can nested arrays be "exploded"? I tried using explode as follows:

 dat <- nested_spark_df %>% 
     mutate(a=explode(metadata)) %>%
     head()

but although the above does not raise an exception, it does not promote the nested fields in metadata to the top level. Essentially, I'm looking for behavior similar to Hive's LATERAL VIEW explode(), without relying on a HiveContext.

Please note that the code snippet uses the non-standard evaluation (NSE) provided by SparkRext. I think the plain SparkR equivalent would be something like ... %>% mutate(a=explode(nested_spark_df$metadata)) ... or similar.

EDIT

I tried using LATERAL VIEW explode(...) in the SparkR::sql function. It seems to work great with Parquet and ORC data. However, when working with nested Avro data, I tried:

dat <- collect(sql(HiveContext,
                   paste0("SELECT a.id, ax.arrival_airport, x.arrival_runway ",
                          "FROM avrodb.flight a ",  
                             "LATERAL VIEW explode(a.metadata) a AS ax ",
                          "WHERE ax.arrival_airport='ATL'")))

only to get the following error, although the same query works as expected when avrodb is swapped for parquetdb.

Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 5.0 failed 4 times, most recent failure: Lost task 4.3 in stage 5.0 (TID 1345, dev-dn04.myorg.org): org.apache.avro.AvroTypeException: Found metadata, expecting union
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avr
Calls: <Anonymous> ... collect -> collect -> .local -> callJStatic -> invokeJava

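This happens even though I am using the Databricks Avro package for Spark. Reading the same data through a SQLContext (instead of a HiveContext) works, but then I cannot work out how to use explode() effectively. The same HQL also runs fine directly in Hive, so the problem appears only when it is run through SparkR::sql(HiveContext, hql).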


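@Sim: the approach that worked is to follow the explode with a select, because the exploded values are still nested one level deep. For example: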

dat <- nested_spark_df %>% 
 mutate(a=explode(nested_spark_df$metadata)) %>%
 select("id", "a.fld1", "a.fld2")

The result is a SparkR DataFrame with three columns: id, fld1 and fld2 (no a. prepended).

I had been expecting explode to behave like Pig's flatten, which is not what it does.
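For readers who want to try the explode-then-select pattern end to end, here is a minimal sketch. It assumes Spark 2.0 or later, where sparkR.session() replaces the SQLContext/HiveContext used in the question, and it uses plain SparkR rather than the SparkRext pipe syntax. The sample data and the names nested, exploded and flat are made up for illustration; only the column names id, metadata, fld1 and fld2 follow the post.

```r
library(SparkR)

# Assumption: a local Spark 2.x+ installation is available to SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data: one row whose metadata column is an array of structs.
nested <- sql(paste(
  "SELECT 1 AS id,",
  "array(named_struct('fld1', 'a', 'fld2', 'b'),",
  "      named_struct('fld1', 'c', 'fld2', 'd')) AS metadata"))

# Step 1: explode yields one row per array element; each element is still a struct named a.
exploded <- select(nested, nested$id, alias(explode(nested$metadata), "a"))

# Step 2: select the struct fields by dotted path to lift them to the top level.
flat <- select(exploded, "id", "a.fld1", "a.fld2")

printSchema(flat)        # columns should be id, fld1, fld2
result <- collect(flat)  # bring the small result back as a local data.frame

sparkR.session.stop()
```

The second select is what does the flattening: explode by itself only multiplies rows and leaves each element as a single struct column.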


dplyr , , . . , explode() Spark. , , DSL explode (. ), SQL sql().
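The SQL route through sql() can be sketched in the same way. This is a hedged example, assuming Spark 2.0 or later, where LATERAL VIEW is handled by Spark SQL itself and no HiveContext is needed. The view name nested_tbl, the aliases t, exploded and m, and the sample data are hypothetical.

```r
library(SparkR)

# Assumption: a local Spark 2.x+ installation is available to SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data with the same shape as in the question.
nested <- sql(paste(
  "SELECT 1 AS id,",
  "array(named_struct('fld1', 'a', 'fld2', 'b'),",
  "      named_struct('fld1', 'c', 'fld2', 'd')) AS metadata"))

# Register the DataFrame so it can be referenced by name in SQL.
createOrReplaceTempView(nested, "nested_tbl")

# LATERAL VIEW explode produces one row per array element, exposed as struct m.
flat_sql <- sql(paste(
  "SELECT t.id, m.fld1, m.fld2",
  "FROM nested_tbl t",
  "LATERAL VIEW explode(t.metadata) exploded AS m",
  "WHERE m.fld1 = 'c'"))

result_sql <- collect(flat_sql)

sparkR.session.stop()
```

Note that the lateral view gets its own alias (exploded) that differs from the table alias (t), which keeps references such as t.id unambiguous.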
