(Noticed because of some doctest failures, cf. #61886.)
Currently, for strings stored as object dtype, we seem to assume that object dtype values are actually strings, and we encode them as such in the schema part of the JSON Table Schema output:
>>> pd.Series(["a", "b", None], dtype=object).to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
But for the now-default string dtype, this is still seen as some custom extension dtype:
>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
(note the "type":"string" vs "type":"any","extDtype":"str")
Given that the Table Schema spec has a "string" type, let's also use that when exporting our string dtype.
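To make the target output concrete, here is a minimal sketch that post-processes the JSON from the `dtype="str"` example so its schema matches the object-dtype one. It is not how pandas builds the schema; `normalize_string_fields` and its `string_ext_dtypes` parameter are hypothetical names used only for illustration, and the set of extension dtype names to treat as strings is an assumption.

```python
import json

# Output copied verbatim from the dtype="str" example above.
raw = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'


def normalize_string_fields(table_json, string_ext_dtypes=("str",)):
    """Hypothetical helper: report string extension dtypes as "type": "string".

    `string_ext_dtypes` is an assumed list of extDtype names to treat as strings.
    """
    doc = json.loads(table_json)
    for field in doc["schema"]["fields"]:
        if field.get("type") == "any" and field.get("extDtype") in string_ext_dtypes:
            field["type"] = "string"
            # Drop the extension marker so the field matches the object-dtype output.
            del field["extDtype"]
    return doc


normalized = normalize_string_fields(raw)
print(normalized["schema"]["fields"])
# prints [{'name': 'values', 'type': 'string'}]
```

Note that dropping `extDtype` discards the information a reader could use to restore the original dtype, so whether the exporter should keep it alongside `"type":"string"` is a separate design question.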