Add HtmlIO asset handler for text/html (C2PA 2.4 Appendix A.7) #2188
Open
erik-sv wants to merge 5 commits into
Add a TextIO asset handler that embeds and extracts C2PA JUMBF manifests in plain text files using the c2pa-text crate. The crate encodes binary manifest data as invisible Unicode Variation Selectors, following the C2PA text embedding specification (Section A.7). The handler implements CAIReader, CAIWriter, and AssetPatch for full read/write/patch support. Hash object positions span the entire text content with an exclusion range covering the embedded manifest bytes. Registers "txt" and "text/plain" as supported types in the MIME utility and adds TextIO to all three handler maps (readers, writers, file-based). The c2pa-text reference implementation is at: https://github.com/encypherai/c2pa-text
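As a rough illustration of how binary data can ride along as invisible variation selectors, here is a minimal sketch. The byte-to-code-point layout (bytes 0..=15 to U+FE00..=U+FE0F, bytes 16..=255 to U+E0100..=U+E01EF) is an assumption made for this example, and the function names are hypothetical. This is not the c2pa-text API, and it leaves out any framing (magic bytes, length header, placement rules) the real wrapper may use.

```rust
/// Map one byte to an invisible variation selector.
/// Assumed layout: 0..=15 -> U+FE00..=U+FE0F, 16..=255 -> U+E0100..=U+E01EF.
fn byte_to_selector(b: u8) -> char {
    let cp = if b < 16 {
        0xFE00 + b as u32
    } else {
        0xE0100 + (b as u32 - 16)
    };
    char::from_u32(cp).expect("variation selectors are valid scalar values")
}

/// Inverse mapping; returns None for any character that is not a variation selector.
fn selector_to_byte(c: char) -> Option<u8> {
    match c as u32 {
        cp @ 0xFE00..=0xFE0F => Some((cp - 0xFE00) as u8),
        cp @ 0xE0100..=0xE01EF => Some((cp - 0xE0100 + 16) as u8),
        _ => None,
    }
}

/// Encode a byte slice as a run of variation selectors (hypothetical helper).
fn encode_bytes(data: &[u8]) -> String {
    data.iter().copied().map(byte_to_selector).collect()
}

/// Collect every variation selector in `text` back into bytes, skipping visible characters.
fn decode_selectors(text: &str) -> Vec<u8> {
    text.chars().filter_map(selector_to_byte).collect()
}
```

Appending `encode_bytes(&payload)` to visible text and calling `decode_selectors` on the result recovers the payload, because ordinary characters are skipped by the inverse mapping.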
Git dependencies are rejected by crates.io during publish. c2pa-text v1.1.0 is already published on crates.io, so reference it directly.
- Bump c2pa-text from 1.1.0 to 2.0.0 (the released crates.io version that adds the structured (A.9) and HTML (A.7) pipelines; TextIO's embed_manifest/extract_manifest usage is unchanged).
- Resolve a merge conflict left in sdk/Cargo.toml by the prior 'Merge branch main' commit: keep c2pa-text and drop the 'config' dependency, which main removed (it is unused in the sdk).

The sdk compiles and the text_io tests pass.
Implements the C2PA HTML embedding method (Appendix A.7) on top of c2pa-text 2.0.0:
- read: extracts a Base64-encoded Manifest Store from an inline <script type="application/c2pa"> element. External <link rel="c2pa-manifest"> references are recognized but treated as an external (non-embedded) store.
- write: embeds the Manifest Store as a <script> element in the document <head>, replacing any existing C2PA element so writes are idempotent.
- object locations: the c2pa.hash.data exclusion covers the <script> element (Appendix A.7.1.3).
- remove: strips the C2PA <script> element, restoring clean HTML.

Registers html / htm / text/html in the reader, writer, and AssetIO handler maps and the MIME extension<->type tables. Includes round-trip, replace+remove, object-location, and no-manifest tests.
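The read, write, and remove behavior described above can be sketched with plain string handling. This is only an illustration under stated assumptions: the helper names are hypothetical, the search is a naive case-sensitive match on an attribute-free <head> and an exactly formatted opening tag, and the payload is treated as an already Base64-encoded string. The actual handler goes through c2pa_text::html, whose API is not shown in this PR text.

```rust
use std::ops::Range;

const OPEN_TAG: &str = r#"<script type="application/c2pa">"#;
const CLOSE_TAG: &str = "</script>";

/// Byte range of the whole C2PA <script> element (tags included), if present.
fn find_c2pa_script(html: &str) -> Option<Range<usize>> {
    let start = html.find(OPEN_TAG)?;
    let body_start = start + OPEN_TAG.len();
    let end = body_start + html[body_start..].find(CLOSE_TAG)? + CLOSE_TAG.len();
    Some(start..end)
}

/// Base64 text between the tags (left undecoded in this sketch).
fn extract_payload(html: &str) -> Option<&str> {
    let range = find_c2pa_script(html)?;
    Some(html[range.start + OPEN_TAG.len()..range.end - CLOSE_TAG.len()].trim())
}

/// Strip the C2PA element, returning the document unchanged if there is none.
fn remove_manifest(html: &str) -> String {
    match find_c2pa_script(html) {
        Some(r) => format!("{}{}", &html[..r.start], &html[r.end..]),
        None => html.to_string(),
    }
}

/// Remove any existing element first, then insert a fresh one right after <head>.
/// Returns None when the document has no literal "<head>" tag.
fn embed_manifest(html: &str, base64_store: &str) -> Option<String> {
    let clean = remove_manifest(html);
    let head_end = clean.find("<head>")? + "<head>".len();
    Some(format!(
        "{}{}{}{}{}",
        &clean[..head_end],
        OPEN_TAG,
        base64_store,
        CLOSE_TAG,
        &clean[head_end..]
    ))
}
```

Because `embed_manifest` always removes before inserting, embedding twice leaves exactly one C2PA element, and `remove_manifest` on the result gives back the original document. The range returned by `find_c2pa_script` is the span that an exclusion covering the whole element would describe.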
Summary
Adds an HtmlIO asset handler implementing the C2PA 2.4 Appendix A.7 HTML embedding method, built on the c2pa-text 2.0.0 crate. This is a companion to the text/plain handler in #2117 and covers a distinct embedding method, so it is proposed separately for clean review.

What it does
- Read (read_cai): extracts a Base64-encoded C2PA Manifest Store from an inline <script type="application/c2pa"> element. An external <link rel="c2pa-manifest"> reference is recognized but treated as an external (non-embedded) store.
- Write (write_cai): embeds the Manifest Store as a <script> element in the document <head>, removing any existing C2PA element first so writes are idempotent.
- Object locations: the c2pa.hash.data exclusion range covers the entire <script> element, per Appendix A.7.1.3.
- Remove: strips the C2PA <script> element, restoring clean HTML.

Registers html / htm / text/html in the reader, writer, and AssetIO handler maps and the MIME extension↔type tables.

Scope
This handler implements only the HTML method (A.7). It does not touch:
- TextIO, Add text/plain asset handler via c2pa-text crate #2117.
- SvgIO.

Dependency
Stacked on #2117, which introduces the c2pa-text 2.0.0 dependency this handler uses (c2pa_text::html). Until #2117 merges, the diff here also shows that version bump; it will narrow to the HTML changes once #2117 lands.

Tests
cargo test -p c2pa --lib --features file_io html_io: the round-trip, replace+remove, object-location, and no-manifest cases pass, plus the jumbf_io reader/writer/AssetIO registration tests.
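The property the object-location case relies on can be pictured as follows: a data hash that skips the exclusion range does not change when the embedded manifest changes, but does change when the surrounding content changes. The sketch below is an illustration only. The Exclusion struct and hash_excluding function are hypothetical names, and FNV-1a is used purely as a dependency-free stand-in for the SHA-2 hash a real c2pa.hash.data assertion would use.

```rust
/// Hypothetical stand-in for an exclusion entry: byte offset and length.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Exclusion {
    start: usize,
    length: usize,
}

/// FNV-1a over `bytes`, continuing from `state`.
/// Stand-in only: not cryptographic and not what C2PA uses.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3))
}

/// Hash everything before and after the excluded span, skipping the span itself.
fn hash_excluding(data: &[u8], ex: Exclusion) -> u64 {
    let end = ex.start + ex.length;
    let h = fnv1a(0xcbf2_9ce4_8422_2325, &data[..ex.start]);
    fnv1a(h, &data[end..])
}
```

With the exclusion set to the full <script> element, two copies of a page that differ only in the embedded manifest hash to the same value, which is why the manifest can be written after the hash has been computed.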