gsoc26: Format Conversion Layer (Layer 2) + CLI refactor (#59, #61) #62

Open
DhanashreePetare wants to merge 7 commits into dbpedia:gsoc-2026 from DhanashreePetare:feature/format-conversion

DhanashreePetare (Collaborator) commented Jun 16, 2026

Pull Request

Description

Implements Layer 2 (Format Conversion) for the Databus Python Client download pipeline, bringing it to feature parity with the Java client as described in Frey et al. Users can now convert between RDF serialization formats and tabular formats on the fly during download using the new --format flag. This PR also refactors the compression CLI (Issue #61) by replacing --convert-to / --convert-from with a single --compression flag.

Related Issues
Issue #59 (Format and Mapping Conversion Layer — Layer 2)
Issue #61 (Refactor CLI compression)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • This change requires a documentation update
  • Housekeeping

Checklist:

  • My code follows the ruff code style of this project.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
    • poetry run pytest - all tests passed
    • poetry run ruff check - no linting errors

What was added:

databusclient/filehandling/format.py: Layer 2, with TripleHandler, QuadHandler, and TSDHandler classes that use rdflib.Graph, rdflib.Dataset, and list[list[str]], respectively, as intermediate representations. Each handler exposes read(), write(), and convert().
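
To illustrate the read() / write() / convert() shape, here is a minimal sketch of a tabular handler that uses list[list[str]] as its intermediate representation. It is not the code from this PR: the DELIMITERS table, the method signatures, and the choice of CSV and TSV as the supported formats are assumptions made for the example.

```python
import csv
from pathlib import Path

# Hypothetical format table; the PR does not list the tabular formats it supports.
DELIMITERS = {"csv": ",", "tsv": "\t"}


class TSDHandler:
    """Illustrative handler whose intermediate representation is list[list[str]]."""

    def read(self, path: Path, fmt: str) -> list[list[str]]:
        # newline="" lets the csv module handle line endings and quoted newlines.
        with open(path, newline="", encoding="utf-8") as fh:
            return [row for row in csv.reader(fh, delimiter=DELIMITERS[fmt])]

    def write(self, rows: list[list[str]], path: Path, fmt: str) -> None:
        with open(path, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh, delimiter=DELIMITERS[fmt]).writerows(rows)

    def convert(self, src: Path, src_fmt: str, dst: Path, dst_fmt: str) -> None:
        # Conversion is simply read-into-IR followed by write-from-IR.
        self.write(self.read(src, src_fmt), dst, dst_fmt)
```

Under the same assumptions, a triple or quad handler would follow the same pattern with an rdflib.Graph or rdflib.Dataset in place of the row list.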

What was changed:

  • --format replaces --convert-format; short aliases added (nt, ttl, rdf, xml, nq, jsonld)
  • --compression replaces --convert-to / --convert-from (the source compression is auto-detected from the file extension)
  • Download pipeline: decompress once → convert → recompress once (previously it decompressed and recompressed twice)
  • The original file is deleted after a successful conversion
  • Temp files are handled safely via tempfile.NamedTemporaryFile
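
The pipeline change above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: detect_compression, convert_download, and the OPENERS table are hypothetical names, only gzip and bz2 are covered, and the target compression is inferred from the destination file name rather than taken from the --compression flag.

```python
import bz2
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical extension table; the real client may support other codecs.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def detect_compression(path: Path) -> Optional[str]:
    """Return the compression suffix (e.g. '.gz'), or None if the file is plain."""
    return path.suffix if path.suffix in OPENERS else None


def convert_download(src: Path, dst: Path, convert: Callable[[Path, Path], None]) -> None:
    """Decompress once, convert, recompress once, then delete the original."""
    src_comp, dst_comp = detect_compression(src), detect_compression(dst)
    tmp_paths: list[Path] = []
    try:
        # Step 1: decompress the source into a temp file (skipped for plain input).
        plain_in = src
        if src_comp:
            with tempfile.NamedTemporaryFile(delete=False) as tmp:
                tmp_paths.append(Path(tmp.name))
                with OPENERS[src_comp](src, "rb") as fh:
                    shutil.copyfileobj(fh, tmp)
            plain_in = Path(tmp.name)

        # Step 2: convert, writing to a temp file if the output must be compressed.
        plain_out = dst
        if dst_comp:
            with tempfile.NamedTemporaryFile(delete=False) as tmp:
                tmp_paths.append(Path(tmp.name))
            plain_out = Path(tmp.name)
        convert(plain_in, plain_out)

        # Step 3: recompress the converted output exactly once.
        if dst_comp:
            with open(plain_out, "rb") as fh, OPENERS[dst_comp](dst, "wb") as out:
                shutil.copyfileobj(fh, out)

        # The original is removed only after every step above has succeeded.
        if src != dst:
            src.unlink()
    finally:
        for p in tmp_paths:
            p.unlink(missing_ok=True)
```

Because the temp files are created with delete=False and cleaned up in the finally block, they can be reopened by name on any platform and are still removed if the conversion raises.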

Tests:

  • Nine format round-trip tests in tests/test_format_round_trips.py (one per format, with the IR captured before conversion)
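
A round-trip test of this kind might look like the sketch below: the intermediate representation is fixed first, written out in the target format, read back, and compared with the original. The helpers and the CSV/TSV formats are stand-ins chosen so the example runs with the standard library alone; they are not taken from tests/test_format_round_trips.py.

```python
import csv
import tempfile
from pathlib import Path

DELIMITERS = {"csv": ",", "tsv": "\t"}  # hypothetical format table


def write_rows(rows: list[list[str]], path: Path, fmt: str) -> None:
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter=DELIMITERS[fmt]).writerows(rows)


def read_rows(path: Path, fmt: str) -> list[list[str]]:
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh, delimiter=DELIMITERS[fmt]))


def round_trip(rows: list[list[str]], fmt: str) -> list[list[str]]:
    """Write the IR in the given format and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"sample.{fmt}"
        write_rows(rows, path, fmt)
        return read_rows(path, fmt)


def test_round_trips() -> None:
    # The IR is captured before any conversion, so the comparison is against
    # the original data rather than against a previously converted copy.
    original = [["id", "label"], ["1", "a,b"], ["2", "tab\there"]]
    for fmt in DELIMITERS:
        assert round_trip(original, fmt) == original
```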

Closes #59
Closes #61

coderabbitai Bot commented Jun 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f9ce7af-e4f8-46a2-970c-903a786ded61
