
RFC for table encryption #2183

Draft
xanderbailey wants to merge 4 commits into apache:main from xanderbailey:xb/encryption_rfc


@xanderbailey
Contributor

Which issue does this PR close?

RFC for table encryption
Part of: #2034
Rough draft with some of the key parts: #2042

  • Closes #.

What changes are included in this PR?

Are these changes tested?

@mbutrovich
Collaborator

Not specific feedback for the RFC, just sharing some context links and previous discussion that was helpful in getting the EncryptionFactory design into DataFusion that interfaces with the parquet crate. Comet makes good use of this design with a custom EncryptionFactory that uses JNI to interface with Spark-based custom KMSs.

apache/datafusion#16779
apache/datafusion#15216 (comment)

Also tagging @ggershinsky in case he has any cycles to read this. His guidance was instrumental on the DataFusion and Arrow-rs PME work, as well as on the Iceberg Java encryption implementation.

@ggershinsky

Thanks, I'll be glad to have a look.

```
Master Key (in KMS)
  └── wraps → KEK (Key Encryption Key) — stored in table metadata as EncryptedKey
        └── wraps → DEK (Data Encryption Key) — stored in StandardKeyMetadata per file
```


Some DEKs (those for manifest list files) are also stored in table metadata as EncryptedKey. These DEKs are packaged in a StandardKeyMetadata (along with the AAD prefix and file length). The serialized StandardKeyMetadata is encrypted/wrapped by the KEK and stored in the table metadata / encrypted_keys structure.

The manifest file DEKs are packaged in StandardKeyMetadata and stored as-is (without encryption) in manifest list files. The manifest list files themselves are then encrypted.

The data file DEKs are packaged in StandardKeyMetadata and stored as-is (without encryption) in manifest files. The manifest files themselves are then encrypted.
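To make the three cases in this comment concrete, here is a minimal Rust sketch of where each file's StandardKeyMetadata ends up and what protects it at rest. All type and function names (`FileKind`, `KeyMetadataHome`, `Protection`, `key_metadata_placement`) are hypothetical and chosen for illustration; they are not taken from the RFC or from iceberg-rust.

```rust
/// Kinds of encrypted files in the table's metadata tree.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileKind {
    ManifestList,
    Manifest,
    Data,
}

/// Where the StandardKeyMetadata (which carries the DEK) for a file is stored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyMetadataHome {
    /// Stored in table metadata as an EncryptedKey entry.
    TableMetadata,
    /// Stored inside the manifest list file that references the manifest.
    ManifestList,
    /// Stored inside the manifest file that references the data file.
    Manifest,
}

/// What protects the stored StandardKeyMetadata at rest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protection {
    /// The serialized StandardKeyMetadata is wrapped by a KEK.
    WrappedByKek,
    /// Stored as-is; the parent file that holds it is itself encrypted.
    ParentFileEncryption,
}

/// Maps a file kind to the placement described in the review comment.
pub fn key_metadata_placement(kind: FileKind) -> (KeyMetadataHome, Protection) {
    match kind {
        FileKind::ManifestList => (KeyMetadataHome::TableMetadata, Protection::WrappedByKek),
        FileKind::Manifest => (KeyMetadataHome::ManifestList, Protection::ParentFileEncryption),
        FileKind::Data => (KeyMetadataHome::Manifest, Protection::ParentFileEncryption),
    }
}
```

Only the manifest list case involves a KEK; the other two rely on the encryption of the parent file.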


- **Master keys** live in the KMS and never leave it
- **KEKs** are wrapped by the master key and stored in `TableMetadata.encryption_keys`
- **DEKs** are wrapped by a KEK and stored per-file in `StandardKeyMetadata`


Only manifest list DEKs are wrapped by a KEK. Other DEKs are stored in the parent files and protected by those files' encryption, i.e. by the parent DEKs.

load_manifest_list(file_io, table_metadata)
1. Look up encryption_key_id in table_metadata.encryption_keys


This step also needs to unwrap the KEK (via a KMS client).

a. file_io.new_encrypted_output(path) → AGS1-encrypting OutputFile
b. em.wrap_key_metadata() → EncryptedKey for table metadata
c. Store key_id on Snapshot.encryption_key_id
3. Table updates include AddEncryptionKey for new KEKs


This step also needs to wrap the KEK (via a KMS client).
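The two KMS steps raised in the review comments (wrapping the KEK on write, unwrapping it on read) can be sketched as a round trip. This is an illustration under loud assumptions, not the RFC's design: `KmsClient`, `ToyKms`, `wrap_for_table_metadata`, and `unwrap_manifest_list_key_metadata` are hypothetical names, the `EncryptedKey` struct is a simplified stand-in, the use of `encrypted_by_id` to point at the wrapping key is assumed, and `toy_xor` is an insecure placeholder where a real implementation would use authenticated encryption such as AES-GCM.

```rust
use std::collections::HashMap;

/// Hypothetical KMS interface; a real client would call an external service.
pub trait KmsClient {
    fn wrap_key(&self, key: &[u8], master_key_id: &str) -> Option<Vec<u8>>;
    fn unwrap_key(&self, wrapped: &[u8], master_key_id: &str) -> Option<Vec<u8>>;
}

/// Placeholder transform standing in for real key wrapping. NOT secure.
fn toy_xor(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// In-memory stand-in for a KMS; master keys never leave this struct.
pub struct ToyKms {
    master_keys: HashMap<String, Vec<u8>>,
}

impl ToyKms {
    pub fn new(master_key_id: &str, master_key: &[u8]) -> Self {
        let mut master_keys = HashMap::new();
        master_keys.insert(master_key_id.to_string(), master_key.to_vec());
        ToyKms { master_keys }
    }
}

impl KmsClient for ToyKms {
    fn wrap_key(&self, key: &[u8], master_key_id: &str) -> Option<Vec<u8>> {
        self.master_keys.get(master_key_id).map(|m| toy_xor(key, m))
    }
    fn unwrap_key(&self, wrapped: &[u8], master_key_id: &str) -> Option<Vec<u8>> {
        // XOR is its own inverse, so unwrapping reuses the same transform.
        self.master_keys.get(master_key_id).map(|m| toy_xor(wrapped, m))
    }
}

/// Simplified stand-in for an EncryptedKey entry in table metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EncryptedKey {
    pub key_id: String,
    pub encrypted_key_metadata: Vec<u8>,
    /// Assumption: id of the key that wrapped this entry.
    pub encrypted_by_id: String,
}

/// Write path: wrap the KEK via the KMS, then wrap the manifest list's
/// serialized key metadata with the plaintext KEK.
pub fn wrap_for_table_metadata(
    kms: &dyn KmsClient,
    master_key_id: &str,
    kek_id: &str,
    kek: &[u8],
    manifest_list_key_id: &str,
    key_metadata: &[u8],
) -> Option<(EncryptedKey, EncryptedKey)> {
    let kek_entry = EncryptedKey {
        key_id: kek_id.to_string(),
        encrypted_key_metadata: kms.wrap_key(kek, master_key_id)?,
        encrypted_by_id: master_key_id.to_string(),
    };
    let manifest_list_entry = EncryptedKey {
        key_id: manifest_list_key_id.to_string(),
        encrypted_key_metadata: toy_xor(key_metadata, kek),
        encrypted_by_id: kek_id.to_string(),
    };
    Some((kek_entry, manifest_list_entry))
}

/// Read path: look up the snapshot's key entry, unwrap its KEK via the KMS,
/// then unwrap the manifest list key metadata with that KEK.
pub fn unwrap_manifest_list_key_metadata(
    kms: &dyn KmsClient,
    encryption_keys: &HashMap<String, EncryptedKey>,
    snapshot_key_id: &str,
) -> Option<Vec<u8>> {
    let entry = encryption_keys.get(snapshot_key_id)?;
    let kek_entry = encryption_keys.get(&entry.encrypted_by_id)?;
    let kek = kms.unwrap_key(&kek_entry.encrypted_key_metadata, &kek_entry.encrypted_by_id)?;
    Some(toy_xor(&entry.encrypted_key_metadata, &kek))
}
```

In this sketch the plaintext KEK exists only transiently on the client, and the master key is only ever used inside the KMS stand-in.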

@xanderbailey
Contributor Author

Thanks for taking a look, @ggershinsky. I've tried to fill in some of the details here.
