BUG: make to_json with JSON Table Schema work correctly with string dtype #61889

Open
@jorisvandenbossche

(Noticed because of some doctest failures, cf. #61886.)

Currently, for strings stored as object dtype, we seem to assume that object-dtype columns actually hold strings, and we encode them as such in the schema part of the JSON Table Schema output:

>>> import pandas as pd
>>> pd.Series(["a", "b", None], dtype=object).to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

But the now-default string dtype is still treated as a generic custom extension dtype:

>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

(Note "type":"string" versus "type":"any","extDtype":"str".)
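
To make the difference concrete, here is a minimal sketch that uses only the standard library json module (no pandas call) to parse the two outputs quoted above and read the declared field type. The helper name field_types is hypothetical and only for illustration. It also confirms that the "data" part is identical in both outputs, so only the schema differs.

```python
import json


def field_types(table_json: str) -> dict:
    """Map each schema field name to its (type, extDtype) pair.

    Hypothetical helper: it only parses the JSON Table Schema text returned
    by to_json(orient="table"); it does not call pandas itself.
    """
    schema = json.loads(table_json)["schema"]
    # extDtype is only present for extension dtypes, so default to None.
    return {f["name"]: (f["type"], f.get("extDtype")) for f in schema["fields"]}


# Outputs copied verbatim from the two examples above.
object_out = '{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
str_out = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

print(field_types(object_out))  # {'values': ('string', None)}
print(field_types(str_out))     # {'values': ('any', 'str')}

# The row data is the same; only the schema entry for the column differs.
print(json.loads(object_out)["data"] == json.loads(str_out)["data"])  # True
```

With the change proposed below, the str-dtype output would be expected to declare "string" as its type as well; that is the proposal of this issue, not current behavior.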

Given that the Table Schema spec has a "string" type, let's also use that when exporting our string dtype.

Metadata

Assignees

No one assigned

Labels

Bug, IO JSON, Strings
