Hadoop | Creditvidya

Read Spark Parquet Output Using Python Job Using Arrow

Jun 10, 2021 · 2 min read · Hadoop BigData Apache Spark Apache Arrow Python Parquet Pandas Dataframe

We run many different jobs on our Spark clusters, and each produces output in the format its job specification calls for. The data formats we generally use are JSON, Parquet, and Parquet with data partitioning. In one case, we wanted the Parquet output data generated by one of the …