Skip to content

3436: AVRO-1759: [java] Automatic union types for sealed classes#13

Open
martin-augment wants to merge 3 commits intomainfrom
pr-3436-2025-10-24-12-29-53
Open

3436: AVRO-1759: [java] Automatic union types for sealed classes#13
martin-augment wants to merge 3 commits intomainfrom
pr-3436-2025-10-24-12-29-53

Conversation

@martin-augment
Copy link
Owner

@martin-augment martin-augment commented Oct 24, 2025

3436: To review by AI

Summary by CodeRabbit

  • New Features

    • Added support for Java sealed classes in the reflection module, enabling automatic serialization and deserialization of polymorphic types with permitted subclasses.
  • Tests

    • Added comprehensive test coverage for polymorphic encoding and decoding of sealed class hierarchies.

@coderabbitai
Copy link

coderabbitai bot commented Oct 24, 2025

Walkthrough

Avro's reflection-based serialization gains support for sealed classes by introducing a new union resolver that checks both Union annotations and sealed class permits. Internal union handling refactors to use Class[] arrays instead of annotation objects, enabling automatic polymorphic type discovery. A new test validates sealed interface serialization with concrete implementations.

Changes

Cohort / File(s) Summary
Sealed class union support
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
Adds getUnion(AnnotatedElement) utility to resolve union types from Union annotations or sealed class permits via reflection. Refactors getAnnotatedUnion, createSchema, createFieldSchema, and getMessage to use Class[] union representation instead of Union annotation objects. Introduces static initialization block to cache references to SpecificData's sealed-class introspection methods.
Polymorphic encoding test
lang/java/java17-test/src/test/java/org/apache/avro/reflect/TestPolymorphicEncoding.java
New test file validating polymorphic encode/decode using sealed Animal interface with Cat and Dog implementations. Includes helper methods for binary serialization via ReflectDatumWriter/Reader and verifies round-trip equality.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ReflectData
    participant AnnotationCheck
    participant SealedCheck
    participant SpecificData
    
    Client->>ReflectData: createSchema(Animal.class)
    ReflectData->>ReflectData: getUnion(Animal.class)
    ReflectData->>AnnotationCheck: Check for @Union annotation
    alt Union annotation present
        AnnotationCheck-->>ReflectData: Return Union.labelled types
    else No annotation
        ReflectData->>SealedCheck: Check if class is sealed
        SealedCheck->>SpecificData: isSealed(Animal.class)
        SpecificData-->>SealedCheck: true
        SealedCheck->>SpecificData: getPermittedSubclasses(Animal.class)
        SpecificData-->>SealedCheck: [Cat.class, Dog.class]
        SealedCheck-->>ReflectData: Return permitted subclasses
    end
    ReflectData-->>Client: Schema with Cat/Dog union types
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The changes involve core reflection logic modifications in ReflectData.java with union resolution refactoring affecting multiple methods, combined with Java 17 sealed-class feature integration requiring careful verification of backward compatibility and reflection correctness.

Poem

🐰 Sealed classes hop into view,
Permits paint the union true,
Reflection whispers which subclass to choose,
Cat and Dog now refuse to lose!
Polymorphic dreams in Avro's embrace, 🎀

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "3436: AVRO-1759: [java] Automatic union types for sealed classes" directly and clearly describes the main change in the changeset. The raw summary confirms that the primary modification introduces reflection-based support for sealed classes, enabling automatic union types through cached SpecificData methods that expose permitted subclasses. The title is concise, specific, and avoids vague language; it accurately conveys the purpose of the changes across both the ReflectData.java modifications and the supporting test file. A teammate reviewing the git history would immediately understand that this PR adds automatic union type support for sealed classes in the Java implementation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-3436-2025-10-24-12-29-53

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the Java label Oct 24, 2025
@claude
Copy link

claude bot commented Oct 24, 2025

PR Review: AVRO-1759 - Automatic union types for sealed classes

Summary

This PR adds automatic union type detection for Java sealed classes, eliminating the need for explicit @Union annotations when using sealed interfaces/classes. This is a nice improvement that leverages modern Java features (Java 17+).


Code Quality & Best Practices

✅ Strengths

  1. Backward compatibility: The reflection-based approach gracefully falls back to null on pre-Java 17 runtimes, maintaining compatibility
  2. Clean abstraction: The getUnion() method nicely encapsulates the logic for both annotation-based and sealed-class unions
  3. Appropriate test location: The test is correctly placed in java17-test module which targets Java 17+
  4. Good test coverage: The test validates both serialization and deserialization with multiple concrete types

⚠️ Issues & Concerns

1. CRITICAL BUG: Wrong class used for reflection (ReflectData.java:77)

Class<? extends Class> classClass = SpecificData.class.getClass();

This should be:

Class<? extends Class> classClass = Class.class;

Explanation: You're getting the metaclass of SpecificData.class (which is Class<SpecificData>), not the Class class itself. This means isSealed() and getPermittedSubclasses() methods will never be found, and the sealed class feature will silently fail.

Impact: The sealed class feature is currently non-functional due to this bug. The methods will always be null, and the test may be passing due to other reasons or test environment specifics.

2. Missing null check for AnnotatedElement (ReflectData.java:842)

private Class[] getUnion(AnnotatedElement element) {
    Union union = element.getAnnotation(Union.class);
    if (union != null) {
        return union.value();
    }
    
    if (element instanceof Class) {  // Good - checking instance type

While the instanceof check protects the sealed class logic, the method signature accepts AnnotatedElement which could theoretically be null depending on callers. Consider adding a null check at the start if this is public/protected API.

3. Overly broad exception handling (ReflectData.java:845)

} catch (ReflectiveOperationException e) {
    throw new AvroRuntimeException(e);
}

This catches all reflection exceptions but could hide legitimate issues. Consider:

  • Catching specific exceptions (IllegalAccessException, InvocationTargetException)
  • Preserving the exception message in the AvroRuntimeException
  • Adding context about what operation failed

Suggested improvement:

} catch (IllegalAccessException | InvocationTargetException e) {
    throw new AvroRuntimeException("Failed to check sealed class status for " + element, e);
}

4. Inconsistent error handling between static init and runtime

The static initializer silently catches NoSuchMethodException and sets methods to null (lines 80-86), but the runtime code throws AvroRuntimeException on reflection errors. This is acceptable but should be documented - the static init failing is expected on Java <17, but runtime failures indicate real problems.


Performance Considerations

Minor concerns:

  1. Repeated reflection calls: Each call to getUnion() for a sealed class performs reflection. Consider caching the results in a ConcurrentHashMap<Class<?>, Class[]> to avoid repeated reflection overhead
  2. Method invocation overhead: Using Method.invoke() adds overhead vs direct method calls, but this is acceptable since schema creation is typically not a hot path

Security Concerns

✅ No major security issues identified

  1. The reflection API usage is read-only (querying type information)
  2. No dynamic class loading or code execution
  3. Properly handles missing methods on older Java versions

Test Coverage

✅ Good basic coverage

The test validates:

  • Schema generation for sealed interfaces
  • Serialization and deserialization
  • Multiple concrete implementations (Cat and Dog)
  • Round-trip data integrity

📋 Additional test cases to consider:

  1. Nested sealed classes: Test sealed class hierarchies
  2. Empty sealed class: What happens if permitted subclasses is empty?
  3. Mixed annotations: What if a sealed class also has @Union annotation? (Current code would use the annotation, which seems correct)
  4. Non-final permitted subclasses: Test with non-sealed or sealed permitted subclasses
  5. Schema compatibility: Verify that sealed class schemas are compatible with explicitly annotated union schemas
  6. Error cases: Test behavior on Java <17 (should work but not detect sealed classes)

Recommendations

Must Fix:

  1. Fix the critical bug on line 77 - Change SpecificData.class.getClass() to Class.class
  2. Add a test that would fail without the sealed class detection to ensure the feature actually works

Should Fix:

  1. Improve exception handling with more specific catch clauses and better error messages
  2. Add caching for sealed class reflection results to improve performance
  3. Add test cases for edge cases mentioned above

Nice to Have:

  1. Add JavaDoc to the getUnion() method explaining the sealed class detection feature
  2. Consider logging at DEBUG level when sealed classes are detected (for troubleshooting)

Overall Assessment

This is a well-intentioned feature that improves the developer experience for modern Java, but it contains a critical bug that prevents it from working. Once the bug on line 77 is fixed, this will be a solid addition to Avro.

Recommendation: Request changes to fix the critical bug before merging.


Code References

  • Critical bug: lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java:77
  • Exception handling: lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java:845
  • New method: lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java:833

@github-actions
Copy link

Major Issue

  • lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java:735getUnion() now returns the direct results of Class.getPermittedSubclasses() and feeds them straight into Schema.createUnion(). That list can contain sealed or non‑sealed abstract types, not just concrete finals. When a permitted subtype is abstract/non‑final, its schema can’t match any concrete instance written later (e.g., Animal permits Mammal, Mammal permits Cat, Dog). The generated union will contain Mammal, but the writer resolves unions using the concrete schema for Cat, so an AvroTypeException is thrown at runtime. The implementation should expand permitted subclasses recursively (and probably skip non‑final types) before building the union, or otherwise ensure every branch corresponds to an encodable concrete type.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java (3)

920-923: Unify union resolution at field level.

Prefer calling getUnion(field) to centralize logic and enable sealed-class detection consistently at field sites.

Suggested diff:

-    Union union = field.getAnnotation(Union.class);
-    if (union != null)
-      return getAnnotatedUnion(union.value(), names);
+    Class[] union = getUnion(field);
+    if (union != null)
+      return getAnnotatedUnion(union, names);

966-973: Apply centralized union derivation to parameters.

To keep behavior consistent (annotations first, sealed fallback), use getUnion(parameter) rather than inspecting annotations manually.

Minimal change:

-        else if (annotation instanceof Union) // union
-          paramSchema = getAnnotatedUnion(((Union) annotation).value(), names);
+        else if (annotation instanceof Union) // union
+          paramSchema = getAnnotatedUnion(((Union) annotation).value(), names);
...
+      Class[] paramUnion = getUnion(parameter);
+      if (paramUnion != null) {
+        paramSchema = getAnnotatedUnion(paramUnion, names);
+      }

833-851: Add defensive null-check for GET_PERMITTED_SUBCLASSES_METHOD; sorting and flattening lack test evidence.

The null-check for GET_PERMITTED_SUBCLASSES_METHOD (line 843) is a valid defensive improvement, even though both methods are initialized together in the same try-catch block. Adding it is recommended for clarity and robustness.

However, the sorting-by-name suggestion lacks evidence of necessity—no deterministic union ordering requirement is documented or tested in the codebase. Similarly, the flattening of nested sealed hierarchies is speculative; there are no tests for sealed class unions at all, making this an unverified edge-case optimization rather than a confirmed issue.

The core null-check should be applied. The sorting and flattening suggestions should be treated as optional enhancements pending concrete use cases or test failures.

lang/java/java17-test/src/test/java/org/apache/avro/reflect/TestPolymorphicEncoding.java (1)

40-47: Optional: Validate schema shape too.

Add an assertion that ReflectData.get().getSchema(Animal.class) is a UNION with Cat and Dog branches to guard against regressions.

Example (non-diff):

Schema s = ReflectData.get().getSchema(Animal.class);
assertEquals(Schema.Type.UNION, s.getType());
assertEquals(
  Arrays.asList(Cat.class.getName(), Dog.class.getName()),
  s.getTypes().stream().map(t -> t.getFullName()).sorted().toList()
);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a80fc3f and 479223a.

📒 Files selected for processing (2)
  • lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java (7 hunks)
  • lang/java/java17-test/src/test/java/org/apache/avro/reflect/TestPolymorphicEncoding.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lang/java/java17-test/src/test/java/org/apache/avro/reflect/TestPolymorphicEncoding.java (2)
lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java (1)
  • DataFileStream (50-400)
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java (1)
  • ReflectData (69-1118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: maven4
  • GitHub Check: Java Interop (ubuntu-latest)
  • GitHub Check: Java Interop (ubuntu-24.04-arm)
  • GitHub Check: Java Test (ubuntu-24.04-arm)
  • GitHub Check: Java Test (ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: claude-review
  • GitHub Check: codex
🔇 Additional comments (4)
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java (4)

41-41: No issues; import is appropriate.


727-731: OK: class-level union derivation is now shared.

This refactor reads cleanly. Ensure branches are deterministically ordered to keep schema stable across runs. The ordering concern is addressed if you sort inside getUnion for sealed-class paths (see separate comment).


853-858: LGTM: updated utility for union schema creation.

Signature change to Class[] is consistent and reads well.


981-982: OK: method return union handling updated.

Adapting to Class[] via getAnnotatedUnion is consistent with the refactor.

Comment on lines +76 to +89
static {
Class<? extends Class> classClass = SpecificData.class.getClass();
Method isSealed;
Method getPermittedSubclasses;
try {
isSealed = classClass.getMethod("isSealed");
getPermittedSubclasses = classClass.getMethod("getPermittedSubclasses");
} catch (NoSuchMethodException e) {
isSealed = null;
getPermittedSubclasses = null;
}
IS_SEALED_METHOD = isSealed;
GET_PERMITTED_SUBCLASSES_METHOD = getPermittedSubclasses;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Safer reflective lookup and future-proofing.

  • Use Class.class directly; current SpecificData.class.getClass() is indirect.
  • Guard both methods being non-null to avoid NPEs.
  • Consider catching SecurityException too.

Apply this diff:

-  static {
-    Class<? extends Class> classClass = SpecificData.class.getClass();
+  static {
+    Class<?> classClass = Class.class;
     Method isSealed;
     Method getPermittedSubclasses;
     try {
       isSealed = classClass.getMethod("isSealed");
       getPermittedSubclasses = classClass.getMethod("getPermittedSubclasses");
-    } catch (NoSuchMethodException e) {
+    } catch (NoSuchMethodException | SecurityException e) {
       isSealed = null;
       getPermittedSubclasses = null;
     }
     IS_SEALED_METHOD = isSealed;
     GET_PERMITTED_SUBCLASSES_METHOD = getPermittedSubclasses;
   }
🤖 Prompt for AI Agents
In lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java around
lines 76 to 89, replace the indirect reflective lookup using
SpecificData.class.getClass() with Class.class directly, wrap getMethod calls in
a try/catch that catches both NoSuchMethodException and SecurityException, and
ensure both IS_SEALED_METHOD and GET_PERMITTED_SUBCLASSES_METHOD are assigned
safely (set to null if either lookup fails) so downstream code does not get an
NPE; i.e., obtain Method references from Class.class, catch
NoSuchMethodException and SecurityException, and only assign non-null Methods
(or null) to the static fields.

Comment on lines +51 to +53
try (DataFileStream<T> dataFileReader = new DataFileStream<>(new ByteArrayInputStream(toDecode, 0, toDecode.length),
datumReader);) {
List<T> toReturn = new ArrayList<>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix try-with-resources syntax (trailing semicolon).

The extra semicolon inside the resource list causes a compile error.

Apply this diff:

-    try (DataFileStream<T> dataFileReader = new DataFileStream<>(new ByteArrayInputStream(toDecode, 0, toDecode.length),
-        datumReader);) {
+    try (DataFileStream<T> dataFileReader = new DataFileStream<>(
+        new ByteArrayInputStream(toDecode, 0, toDecode.length), datumReader)) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try (DataFileStream<T> dataFileReader = new DataFileStream<>(new ByteArrayInputStream(toDecode, 0, toDecode.length),
datumReader);) {
List<T> toReturn = new ArrayList<>();
try (DataFileStream<T> dataFileReader = new DataFileStream<>(
new ByteArrayInputStream(toDecode, 0, toDecode.length), datumReader)) {
List<T> toReturn = new ArrayList<>();
🤖 Prompt for AI Agents
In
lang/java/java17-test/src/test/java/org/apache/avro/reflect/TestPolymorphicEncoding.java
around lines 51 to 53, the try-with-resources resource list incorrectly contains
a trailing semicolon which causes a compile error; remove the extra semicolon
after the DataFileStream resource declaration so the try declaration reads with
no semicolon inside the parentheses, leaving the rest of the try block and
resource initialization unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants