Description
Is your feature request related to a problem? Please describe.
I wish I could use cuDF to directly convert a struct column in a cudf.Series to a string representation using something like cudf.Series.struct.astype(str). Currently, cuDF does not seem to support a straightforward way to cast a struct-type column to a string, which makes it challenging to work with struct data when I need a human-readable format or to pass it to other systems expecting strings.
Describe the solution you’d like
I’d like a method such as cudf.Series.struct.astype(str) that converts a struct column in a cudf.Series to a string representation. For example, if I have a struct column with fields like {"a": 1, "b": 2}, it could output a string like "{a: 1, b: 2}" or another consistent format. This would make it easier to manipulate, display, or export struct data without needing complex workarounds.
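To pin down the output format, here is a minimal host-side sketch in plain Python of what each row could render to. The function name struct_to_str, the "null" spelling for missing values, and the recursive handling of nested structs are all assumptions for illustration, not existing cuDF behavior.

```python
def struct_to_str(value):
    """Render one struct row (a dict) in the proposed "{a: 1, b: 2}" style.

    Hypothetical reference formatter; not part of cuDF.
    """
    if value is None:
        # Assumption: nulls are spelled "null"; any consistent token would do.
        return "null"
    if isinstance(value, dict):
        # Nested structs are formatted recursively, field order preserved.
        inner = ", ".join(f"{key}: {struct_to_str(val)}" for key, val in value.items())
        return "{" + inner + "}"
    # Leaf values fall back to Python's own string conversion.
    return str(value)


print(struct_to_str({"a": 1, "b": 2}))
print(struct_to_str({"a": {"x": 1.5}, "b": None}))
```

A GPU implementation would not have to follow this line by line, but it gives a concrete expected result to compare against.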
Describe alternatives you’ve considered
I’ve considered extracting the individual fields of the struct with cudf.Series.struct.explode(), or accessing them one at a time with cudf.Series.struct.field(), and then concatenating them manually into a string. However, this is cumbersome, especially for structs with many fields or nested structures, and it doesn’t scale well for large datasets. Another option is converting the data to pandas and using pandas string operations, but this defeats the purpose of using cuDF for GPU acceleration.
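For reference, the manual workaround looks roughly like the sketch below. It is written against the generic pandas-style interface (columns, indexing by name, astype(str), and + for string concatenation), so it assumes cuDF string columns accept scalar-string concatenation in both operand orders. The helper name struct_fields_to_str is hypothetical, and nulls and nested structs are not handled.

```python
def struct_fields_to_str(fields):
    """Join the columns of an exploded struct into "{a: 1, b: 2}" strings.

    `fields` is expected to be the DataFrame-like result of
    Series.struct.explode(). Hypothetical helper, flat structs only.
    """
    joined = None
    for name in fields.columns:
        # Cast each field to string and prefix it with its field name.
        piece = f"{name}: " + fields[name].astype(str)
        # Chain the pieces together with ", " between fields.
        joined = piece if joined is None else joined + ", " + piece
    if joined is None:
        raise ValueError("struct has no fields to format")
    return "{" + joined + "}"
```

With cuDF this would be called as struct_fields_to_str(s.struct.explode()), where s is the struct Series. Even this flat case takes one cast and two concatenations per field, and a nested struct needs the whole thing repeated per level, which is the overhead a built-in conversion would remove.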
Additional context
This feature would be particularly useful for debugging, logging, or interoperability with systems that expect string data. For example, in a dataset with a struct column whose fields are an integer x and a float y, being able to call .struct.astype(str) would simplify workflows significantly. pandas already provides a general-purpose astype(str), so supporting the same conversion for cuDF struct columns would align with user expectations.