The key point is that only the first query takes a long time, because that is when the file metadata gets loaded. The reason is that SparkSQL does not store partition metadata in the Hive metastore. For Hive partitioned tables, the partition information has to be stored in the metastore. How this behaves depends on how the table was created. From the information provided, it sounds like you created a SparkSQL table.
SparkSQL stores the table schema (which includes the partition columns) and the root directory of your table, but it still discovers each partition directory on S3 dynamically when the query is run. My understanding is that this is a tradeoff, so that you do not need to add new partitions manually whenever the table is updated.
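To make the idea of dynamic partition discovery more concrete, here is a minimal sketch in plain Python. It is not Spark's actual implementation: the helper names `parse_partition_path` and `discover_partitions` are hypothetical, and it walks a local directory instead of S3. It only assumes the usual `key=value` directory layout under the table root. On S3 every directory listing is a network request, which is roughly why this step is slow for a table with many partitions.

```python
import os


def parse_partition_path(rel_path):
    """Turn 'year=2023/month=01' into {'year': '2023', 'month': '01'}.

    Returns None if any path segment is not of the form key=value.
    """
    spec = {}
    for segment in rel_path.split(os.sep):
        key, sep, value = segment.partition("=")
        if not sep or not key:
            return None
        spec[key] = value
    return spec


def discover_partitions(root):
    """List every partition directory under `root` that holds data files.

    Nothing is read from a metastore: the partitions are found purely by
    walking the directory tree, one listing per directory.
    """
    partitions = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if dirpath == root or not filenames:
            continue  # skip the root itself and directories without files
        spec = parse_partition_path(os.path.relpath(dirpath, root))
        if spec is not None:
            partitions.append(spec)
    return partitions
```

A new directory such as `year=2024/month=01` is picked up by the next call to `discover_partitions` without registering it anywhere, at the cost of walking the whole tree again. That mirrors the tradeoff described above.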