
@yingsu00
Contributor

What problem does this PR solve?

Today, the Hive connector’s internal structures are widely referenced across unrelated modules (e.g., exec/tests, dwio, and others). This creates an undesirable coupling and violates the design principle of modular connector boundaries. Before introducing the new Iceberg connector, we should refactor the connector layer to establish a clean, common connector interface that external components depend on, rather than exposing or leaking implementation details of a specific connector.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

To decouple Hive and other specific connectors from the exec and core modules, we need to put the connector names in a central location at connectors/ConnectorNames.h. The idea is similar to dwio::common::FileFormat, where all file formats are specified. Modules outside of the connectors module can then reference this central, connector-agnostic header instead of connector-specific headers like HiveConnector.h.
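As a rough illustration of the idea, a central names header might look something like the sketch below. The actual contents of connectors/ConnectorNames.h are not shown in this description, so the namespace, the constant names, the string values, and the `isHiveConnector` helper are all hypothetical and only meant to show the shape of the dependency.

```cpp
// Illustrative sketch only: NOT the actual connectors/ConnectorNames.h.
// Every identifier and string value below is an assumption.
#include <string_view>

namespace connector_names_sketch {

// Hypothetical central list of connector names. Callers include only this
// header and never a connector-specific one such as HiveConnector.h.
inline constexpr std::string_view kHiveConnectorName = "hive";
inline constexpr std::string_view kIcebergConnectorName = "iceberg";

// Hypothetical helper showing how code outside the connectors module could
// branch on a connector name without seeing any Hive implementation details.
constexpr bool isHiveConnector(std::string_view name) {
  return name == kHiveConnectorName;
}

} // namespace connector_names_sketch
```

With a header of this shape, a test or operator in exec would depend on the shared constants only, so adding a new connector such as Iceberg would not require touching Hive headers.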

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

N/A

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No
  • Yes (Description: ...)

@CLAassistant

CLAassistant commented Jan 21, 2026

CLA assistant check
All committers have signed the CLA.

@frankobe frankobe requested a review from ZacBlanco January 21, 2026 02:47
@yingsu00 yingsu00 force-pushed the ConnectorNames branch 2 times, most recently from 2bfb849 to 775ce9d Compare January 21, 2026 09:38
Collaborator

@ZacBlanco ZacBlanco left a comment

LGTM! Thanks Ying

@ZacBlanco
Collaborator

@yingsu00 I rebased your PR to get it ready to merge, but it looks like there are some failures in CI now. Could you take a look? Thanks!
