First, I want to thank you for this great library!

I need to merge hundreds of small Parquet files into bigger ones. Sadly, they do not all share the same schema (e.g. some have missing columns), nor is the schema known at compile time.
I am wondering what the most efficient way would be to get only the schema of a Parquet file.
Currently I am looking at the first RowParquetRecord, but it might contain NullValues.
Furthermore, I am interested in whether there is a complete list of how to properly map Scala types to fields, like this: Types.primitive(INT32, OPTIONAL).as(LogicalTypeAnnotation.dateType()).named("Birthday")
Thanks
Parquet4s doesn't expose the file schema in its own API (that is something that could be added). However, you can easily access it by calling the original Java API that Parquet4s uses under the hood. Check org.apache.parquet.hadoop.ParquetFileReader, e.g.:
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.schema.MessageType

val reader = ParquetFileReader.open(inputFile, readerOptions)
try {
  val schema: MessageType = reader.getFileMetaData.getSchema
  // ...
} finally reader.close()
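Building on that, the approach above can be extended to both of the original questions: reading only the footer of each file and unioning the resulting schemas handles files with missing columns, and the same Types builder shows the common Scala-type-to-field mappings. This is a minimal sketch, assuming parquet-hadoop (and its Hadoop dependencies) on the classpath; mergedSchemaOf, the field names, and the exact set of mappings are illustrative, not part of Parquet4s:

```scala
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.io.InputFile
import org.apache.parquet.schema.{LogicalTypeAnnotation, MessageType, Types}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._
import org.apache.parquet.schema.Type.Repetition.OPTIONAL

// Read only the footer metadata of a file to get its schema,
// without touching any row data.
def schemaOf(file: InputFile): MessageType = {
  val reader = ParquetFileReader.open(file)
  try reader.getFileMetaData.getSchema
  finally reader.close()
}

// Union the per-file schemas so that files with missing columns
// still fit into one merged schema.
def mergedSchemaOf(files: Seq[InputFile]): MessageType =
  files.map(schemaOf).reduce(_ union _)

// Common Scala-type-to-field mappings, built with the same Types builder
// (all OPTIONAL here, so an absent column simply yields a null value):
val schema: MessageType = Types.buildMessage()
  .addField(Types.primitive(INT32, OPTIONAL).named("anInt"))      // Int
  .addField(Types.primitive(INT64, OPTIONAL).named("aLong"))      // Long
  .addField(Types.primitive(BOOLEAN, OPTIONAL).named("aBool"))    // Boolean
  .addField(Types.primitive(DOUBLE, OPTIONAL).named("aDouble"))   // Double
  .addField(Types.primitive(BINARY, OPTIONAL)
    .as(LogicalTypeAnnotation.stringType()).named("aString"))     // String
  .addField(Types.primitive(INT32, OPTIONAL)
    .as(LogicalTypeAnnotation.dateType()).named("birthday"))      // java.time.LocalDate
  .named("merged")
```

Marking every field OPTIONAL is deliberate here: when the input files have differing columns, a required field that is missing from one file would make that file unreadable under the merged schema.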