Thanks @sanjibansg. I can add a bit of context. These are based on Arrow's CSV reader implementation. There is a similar "giant block of CSV options" in pandas. I think my big question (for the Substrait community) would be whether something like this is in scope for Substrait and, if so, how it should be added.
I think it should be partially added to core Substrait. Some of these options seem very Arrow-specific (for example, use threads), while others seem very generic (for example, delimiter).
Let's start by focusing on adding the things that are common to most delimited text readers. Then we can potentially define some structured hints that may be useful but could be ignored. For example, use threads feels like a hint rather than semantic information: an implementation could ignore it and still provide logically equivalent results. Some of these options also don't really make sense here. For example, I don't know what column names would mean in the context of Substrait (and several properties focus on them).
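To make the semantic-versus-hint distinction concrete, here is a minimal Python sketch. It is not the Substrait spec or Arrow's API: `DelimitedTextOptions`, `ReaderHints`, `read_rows`, and all field names are hypothetical, chosen only to mirror the examples mentioned in this thread (delimiter as a semantic option, use threads as a hint). The reader uses Python's standard `csv` module as a stand-in for any delimited text reader.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class DelimitedTextOptions:
    """Semantic options (hypothetical): changing one changes the rows produced."""
    delimiter: str = ","
    quote_char: str = '"'
    header_lines_to_skip: int = 0


@dataclass(frozen=True)
class ReaderHints:
    """Advisory settings (hypothetical): a consumer may ignore them entirely."""
    use_threads: bool = True
    block_size: int = 1 << 20


def read_rows(text, options, hints=None):
    """Parse delimited text using only the semantic options.

    `hints` is accepted but deliberately unused, to show that dropping it
    still yields logically equivalent rows.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=options.delimiter,
        quotechar=options.quote_char,
    )
    rows = list(reader)
    # Skip the requested number of leading lines (for example, a header row).
    return rows[options.header_lines_to_skip:]


if __name__ == "__main__":
    data = "a|b\n1|2\n"
    opts = DelimitedTextOptions(delimiter="|", header_lines_to_skip=1)
    with_hints = read_rows(data, opts, ReaderHints(use_threads=False))
    without_hints = read_rows(data, opts)
    print(with_hints == without_hints)
```

Under this split, anything placed in the semantic group would need to be honored by every consumer, while the hint group could live in an optional, ignorable part of the plan.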
With reference to #138, we can have the implementation for CSV file format by defining the required messages. (Prototype code can be found here)

and then the `file_type` can be defined by `one_of`.

We can proceed with this and can then develop a generic implementation using `google.protobuf.Any`, with separate `.proto` files defining various file formats.