GH-48241: [Python] Scalar inferencing doesn't infer UUID #48727

Open
tadeja wants to merge 11 commits into apache:main from tadeja:48241-scalar-infer-UUID

@tadeja
Contributor

@tadeja tadeja commented Jan 5, 2026

Rationale for this change

This closes #48241, #44224 and #43855.
Currently uuid.UUID objects are not inferred/converted automatically in PyArrow, requiring users to explicitly specify the type.

What changes are included in this PR?

Adding support for Python's uuid.UUID objects in PyArrow's type inference and conversion.

Are these changes tested?

Yes, added test_uuid_scalar_from_python() and test_uuid_array_from_python() in test_extension.py.

Are there any user-facing changes?

Users can now pass Python uuid.UUID objects directly to PyArrow functions like pa.scalar() and pa.array() without specifying the type:

import uuid
import pyarrow as pa

pa.scalar(uuid.uuid4())

<pyarrow.UuidScalar: UUID('958174b9-3a5c-4cdd-8fc5-d51a2fc55784')>

pa.array([uuid.uuid4()])

<pyarrow.lib.UuidArray object at 0x1217725f0>
[
73611FD81F764A209C8B9CDBADDA1F53
]
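
The two reprs above look different: the scalar shows the canonical hyphenated UUID, while the array appears to print raw bytes as uppercase hex. Below is a minimal stdlib-only sketch of how the two forms relate, assuming the UUID extension type stores the 16 bytes of `uuid.UUID.bytes` unchanged. The variable names are illustrative and not part of this PR.

```python
import uuid

# Fixed UUID taken from the scalar example above, so the result is reproducible.
u = uuid.UUID("958174b9-3a5c-4cdd-8fc5-d51a2fc55784")

# A UUID is 128 bits; .bytes gives them as 16 bytes in big-endian order.
raw = u.bytes

# Hex-encoding those bytes yields the hyphen-free form that resembles
# the array repr.
hex_repr = raw.hex().upper()
print(hex_repr)

# The conversion is lossless: the same bytes rebuild the original object.
round_tripped = uuid.UUID(bytes=raw)
print(round_tripped == u)
```

Under that assumption, each array entry is the hyphen-free hex form of the UUID that the scalar repr would show for the same value.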

@tadeja
Contributor Author

tadeja commented Jan 5, 2026

@AlenkaF Could you recommend a good place to document this UUID change? I see @amoeba indicated the need for documentation in his draft pull request #44242.
ӇƛƤƤƳ_ƝЄƜ_ƳЄƛƦ:)

@AlenkaF
Member

AlenkaF commented Jan 7, 2026

Happy New Year! ❤️

I would suggest adding the documentation to the Extending PyArrow page, under the Canonical extension types section, as a separate subsection next to the Fixed size tensor one.

@tadeja
Contributor Author

tadeja commented Jan 16, 2026

@AlenkaF, @rok do you have a chance to review this one? It should enable multiple UUID use cases.

Member

@rok rok left a comment

Looks good to me. Two minor nits.

@github-actions github-actions bot added the awaiting merge label and removed the awaiting review label Jan 16, 2026
@github-actions github-actions bot added the awaiting changes label and removed the awaiting merge label Jan 22, 2026
Member

@rok rok left a comment

Looks good. Just minor suggestions for comments.

@github-actions github-actions bot added the awaiting merge label and removed the awaiting changes label Jan 22, 2026
@rok
Member

rok commented Jan 22, 2026

@pitrou could you take a look at this PR? The Cython change in particular could use your expertise.

@tadeja tadeja force-pushed the 48241-scalar-infer-UUID branch from 8b1e4a2 to 2974e12 Compare February 13, 2026 13:23
@tadeja
Contributor Author

tadeja commented Feb 20, 2026

@pitrou, any wise thoughts on changes here?

GetUuidStaticSymbols();
uuid_static_initialized = true;
}
#endif
Member

We're duplicating code here between different module imports. It would be really nice to write something like this:

// Imports a module at most once and caches whatever the callback extracts.
template <typename Data>
struct ModuleOnceRunner {
  std::string module_name;
  Data data{};
#ifdef Py_GIL_DISABLED
  std::once_flag initialized;
#else
  bool initialized = false;
#endif

  template <typename Func>
  Data* Run(Func&& func) {
    auto wrapper_func = [&]() {
      OwnedRef module;
      // If the import fails, `data` keeps its default-initialized value.
      if (ImportModule(module_name, &module).ok()) {
        data = func(std::move(module));
      }
    };
#ifdef Py_GIL_DISABLED
    std::call_once(initialized, wrapper_func);
#else
    if (!initialized) {
      initialized = true;
      wrapper_func();
    }
#endif
    // Later calls skip the import and return the cached result.
    return &data;
  }
};

struct UuidModuleData {
  PyObject* UUID_class = nullptr;
};

UuidModuleData* InitUuidStaticData() {
  static ModuleOnceRunner<UuidModuleData> runner{"uuid"};
  return runner.Run([&](OwnedRef module) -> UuidModuleData {
    UuidModuleData data;
    OwnedRef ref;
    if (ImportFromModule(module.obj(), "UUID", &ref).ok()) {
      data.UUID_class = ref.obj();
    }
    return data;
  });
}

I think @rok can help.

Comment on lines 1292 to 1293
ARROW_ASSIGN_OR_RAISE(auto converter, (MakeConverter<PyConverter, PyConverterTrait>(
options.type, options, pool)));
Member

Does MakeConverter support extension types here? I see that we only unwrap the extension type in the inference path above.

@tadeja tadeja requested a review from rok February 25, 2026 18:52