Add CASE session initiator (Sigma1/Sigma2/Sigma3) #408

Closed
jonlil wants to merge 5 commits into project-chip:main from viska-ab:feature/case-initiator

@jonlil jonlil commented Apr 11, 2026

Closes #371

Related: #407 (fixes stale CASE exchange on the responder side — discovered while testing this initiator)

Summary

Adds client-side CASE session initiation: rs-matter can now initiate secure sessions with Matter devices by sending Sigma1, processing Sigma2, and sending Sigma3.

This implements the initiator side described in #371:

We should also support the other way around, where rs-matter can initiate with CASE-Sigma1 and ongoing. This is beneficial not just for the controller use case, but also for the device use case, when the device needs to actively initiate an exchange over a new session because of pending reporting over a subscription.

Implementation

  • New module sc/case/initiator.rs (~550 lines)
  • CaseInitiator::initiate() — performs the full Sigma1→Sigma2→Sigma3 handshake
  • Establishes a new encrypted session on success, ready for IM exchanges
  • Uses the existing Exchange::initiate_for_session() transport API
  • Gated behind case-initiator feature flag to avoid binary size impact on server-only embedded builds
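
The handshake order in the list above can be sketched as a small state machine. This is only an illustration: `InitiatorState`, `Event`, and `step` are hypothetical names that do not come from the PR, and the final status-report step is assumed from the general CASE flow rather than taken from this code.

```rust
/// Hypothetical states of a CASE initiator (not the PR's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitiatorState {
    Idle,
    AwaitingSigma2,
    AwaitingStatus,
    Established,
    Failed,
}

/// Hypothetical events that drive the handshake forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// Sigma1 was sent to the responder.
    Sigma1Sent,
    /// Sigma2 was received, decrypted, and verified; Sigma3 is sent in response.
    Sigma2Verified,
    /// The responder confirmed the session after Sigma3 (assumed final step).
    StatusSuccess,
    /// Any decode, verification, or transport error.
    Error,
}

/// Advance the handshake; any out-of-order event fails the attempt.
pub fn step(state: InitiatorState, event: Event) -> InitiatorState {
    use Event::*;
    use InitiatorState::*;
    match (state, event) {
        (Idle, Sigma1Sent) => AwaitingSigma2,
        (AwaitingSigma2, Sigma2Verified) => AwaitingStatus,
        (AwaitingStatus, StatusSuccess) => Established,
        _ => Failed,
    }
}
```

Treating every unexpected event as a failure keeps the sketch strict: a Sigma2 that arrives before Sigma1 was sent never yields a session.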

Feature flag

[dependencies]
rs-matter = { features = ["case-initiator"] }

Without the feature, the module is not compiled — zero cost for devices that only act as CASE responders.
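
As a rough sketch of how this kind of gating usually looks in Rust: the module body below is a placeholder, and `initiator_enabled` is a hypothetical helper added only so the effect is observable. Only the feature name `case-initiator` comes from the PR.

```rust
/// Compiled only when the crate is built with `--features case-initiator`.
/// The contents are a placeholder, not the PR's module.
#[cfg(feature = "case-initiator")]
pub mod initiator {
    pub fn describe() -> &'static str {
        "CASE initiator compiled in"
    }
}

/// Hypothetical helper: reports whether the gated module is part of this build.
pub fn initiator_enabled() -> bool {
    cfg!(feature = "case-initiator")
}
```

With `#[cfg]`, the gated item is removed before type checking, so a build without the feature contains none of its code.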

Testing

Tested in production with a Matter controller (rs-matter client) connecting to a Matter bridge (rs-matter server) with 8 endpoints (lamps, switches, sensors). The controller uses CaseInitiator::initiate() to establish CASE sessions and then subscribes to device attributes via IM. Running continuously with repeated reconnects after server restarts — which is how we discovered and fixed the stale exchange issue in #407.

@jonlil jonlil force-pushed the feature/case-initiator branch from ff5cc33 to 2f926b3 Compare April 11, 2026 12:30

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the CASE (Certificate Authenticated Session Establishment) initiator implementation, enabling controllers to establish secure sessions with commissioned Matter devices. The changes include a new case-initiator feature and the CaseInitiator module which handles the Sigma1, Sigma2, and Sigma3 handshake phases. Feedback focuses on correcting the hardcoded fabric index by adding it to the controller credentials and ensuring that CASE Authenticated Tag (CAT) IDs are properly extracted from the responder's certificate to support accurate access control evaluation.

Comment on lines +101 to +116
/// Controller's Intermediate CA Certificate (TLV encoded, optional)
pub icac: Option<&'a [u8]>,
/// Root CA certificate (TLV encoded, shared with the device)
pub root_ca: &'a [u8],
/// Controller's operational private key
pub secret_key: CanonPkcSecretKeyRef<'a>,
/// Identity Protection Key (operational key)
pub ipk: CanonAeadKeyRef<'a>,
/// Fabric ID
pub fabric_id: u64,
/// Controller's node ID in the fabric
pub node_id: u64,
}

/// CASE Initiator for establishing secure sessions with commissioned Matter devices.
///

high

The ControllerCredentials struct is missing the local fabric index (fab_idx). In Matter, while fabric_id is the 64-bit identifier shared across the network, fab_idx is the 1-based local index used to identify the fabric in the device's internal storage. This index is required when establishing a new session to correctly associate it with the local fabric state.

Suggested change
- /// Controller's Intermediate CA Certificate (TLV encoded, optional)
- pub icac: Option<&'a [u8]>,
- /// Root CA certificate (TLV encoded, shared with the device)
- pub root_ca: &'a [u8],
- /// Controller's operational private key
- pub secret_key: CanonPkcSecretKeyRef<'a>,
- /// Identity Protection Key (operational key)
- pub ipk: CanonAeadKeyRef<'a>,
- /// Fabric ID
- pub fabric_id: u64,
- /// Controller's node ID in the fabric
- pub node_id: u64,
- }
- /// CASE Initiator for establishing secure sessions with commissioned Matter devices.
- ///
+ pub struct ControllerCredentials<'a> {
+     /// Controller's Node Operational Certificate (TLV encoded)
+     pub noc: &'a [u8],
+     /// Controller's Intermediate CA Certificate (TLV encoded, optional)
+     pub icac: Option<&'a [u8]>,
+     /// Root CA certificate (TLV encoded, shared with the device)
+     pub root_ca: &'a [u8],
+     /// Controller's operational private key
+     pub secret_key: CanonPkcSecretKeyRef<'a>,
+     /// Identity Protection Key (operational key)
+     pub ipk: CanonAeadKeyRef<'a>,
+     /// Fabric ID
+     pub fabric_id: u64,
+     /// Controller's node ID in the fabric
+     pub node_id: u64,
+     /// Local fabric index
+     pub fab_idx: NonZeroU8,
+ }


exchange.acknowledge().await?;

info!(

high

The fabric index (fab_idx) is hardcoded to 1. This is incorrect as the local fabric index can be any value from 1 to 254 depending on how many fabrics are commissioned on the controller. It should use the fab_idx provided in the credentials.

Suggested change
- info!(
+ fab_idx: creds.fab_idx,

fab_idx: unwrap!(NonZeroU8::new(1)),
cat_ids: peer_catids,
},
Some(dec_key),

medium

The initiator is currently initializing peer_catids with default values (empty). It should extract the CASE Authenticated Tag (CAT) IDs from the responder's Node Operational Certificate (NOC) received in Sigma2. These IDs are necessary for proper Access Control List (ACL) evaluation for subsequent interactions over this session.

Suggested change
- Some(dec_key),
+ let mut peer_catids: NocCatIds = Default::default();
+ responder_noc.get_cat_ids(&mut peer_catids)?;

Add client-side CASE handshake (Sigma1/Sigma2/Sigma3) for controllers
that need to initiate secure sessions with Matter devices.

Gated behind `case-initiator` feature flag to avoid binary size impact
on server-only builds (embedded MCUs with limited flash).

550 lines of new code in sc/case/initiator.rs — not compiled unless
the feature is enabled.
@jonlil jonlil force-pushed the feature/case-initiator branch from 2f926b3 to 3ba3d56 Compare April 11, 2026 12:56

jonlil commented Apr 11, 2026

Addressed all three review items:

  1. fab_idx in ControllerCredentials: added a fab_idx: NonZeroU8 field, no longer hardcoded to 1
  2. Use creds.fab_idx in session creation: SessionMode::Case { fab_idx: creds.fab_idx, .. }
  3. Extract peer CAT IDs from responder NOC: responder_noc.get_cat_ids(&mut peer_catids)? is now called after certificate chain verification, matching the responder-side pattern in case.rs:287

Tested: verified CASE session still establishes correctly with our production setup (controller with fab_idx=1, single fabric).
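
A minimal sketch of the fab_idx change, assuming simplified stand-in types: the `SessionMode` below is a hypothetical reduction (the real one also carries `cat_ids`), and `fab_idx_from_raw` and `case_session_mode` are illustrative helpers that are not part of rs-matter. The 1 to 254 range is the one stated in the review comment.

```rust
use std::num::NonZeroU8;

/// Simplified, hypothetical stand-in for the session mode of a CASE session.
#[derive(Debug, PartialEq, Eq)]
pub enum SessionMode {
    Case { fab_idx: NonZeroU8 },
}

/// Validate a raw local fabric index. Zero is never valid, and the review
/// gives 1..=254 as the usable range, so 255 is rejected as well.
pub fn fab_idx_from_raw(raw: u8) -> Option<NonZeroU8> {
    if raw > 254 {
        return None;
    }
    NonZeroU8::new(raw)
}

/// Build the session mode from the caller-supplied index instead of a constant.
pub fn case_session_mode(fab_idx: NonZeroU8) -> SessionMode {
    SessionMode::Case { fab_idx }
}
```

Carrying the index as `NonZeroU8` rules out zero at the type level, which leaves only the upper bound to check at runtime.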

@github-actions

PR #408: Size comparison from 2af46c4 to 354383c

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
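| platform | target | config | section | 2af46c4 | 354383c | change | % change |
|---|---|---|---|---|---|---|---|
| (core) | riscv32imac-unknown-none-elf | infodefmt-optz-ltofat | FLASH | 403870 | 403870 | 0 | 0.0 |
| (core) | riscv32imac-unknown-none-elf | infodefmt-optz-ltofat | RAM | 68072 | 68072 | 0 | 0.0 |
| (core) | thumbv6m-none-eabi | infodefmt-optz-ltofat | FLASH | 337860 | 337880 | 20 | 0.0 |
| (core) | thumbv6m-none-eabi | infodefmt-optz-ltofat | RAM | 64312 | 64320 | 8 | 0.0 |
| (core) | thumbv7em-none-eabi | infodefmt-optz-ltofat | FLASH | 312748 | 312760 | 12 | 0.0 |
| (core) | thumbv7em-none-eabi | infodefmt-optz-ltofat | RAM | 63800 | 63808 | 8 | 0.0 |
| (core) | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 839687 | 839687 | 0 | 0.0 |
| (core) | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 66616 | 66616 | 0 | 0.0 |
| dimmable-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 1957576 | 1957592 | 16 | 0.0 |
| dimmable-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 52352 | 52352 | 0 | 0.0 |
| onoff-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 1890784 | 1890784 | 0 | 0.0 |
| onoff-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 52056 | 52056 | 0 | 0.0 |
| onoff-light-bt | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 3218056 | 3218744 | 688 | 0.0 |
| onoff-light-bt | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 5784 | 5784 | 0 | 0.0 |
| speaker | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 850424 | 850424 | 0 | 0.0 |
| speaker | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 2856 | 2856 | 0 | 0.0 |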

jonlil added 2 commits April 12, 2026 12:16
The CASE initiator was using the raw IPK epoch key directly in dest_id
computation and all key derivations (S2K, S3K, session keys). The CASE
responder uses the HKDF-derived operational key (GroupKey v1.0), causing
a permanent 'Fabric Index mismatch' on every CASE handshake.

Derive the operational key from epoch key + compressed fabric ID before
use, matching the responder's GroupKeys::update() derivation.

jonlil commented Apr 12, 2026

Pushed two fixes based on testing:

IPK operational key derivation — The initiator was using the raw IPK epoch key directly for destination ID computation and all CASE key derivations (S2K, S3K, session keys). The responder derives an operational key from the epoch key via HKDF (GroupKey v1.0), so handshakes always failed with "Fabric Index mismatch". The initiator now derives the operational key from epoch_key + compressed_fabric_id before use, matching the responder's GroupKeys::update() derivation. This is what enables dynamic fabric discovery — the responder uses the destination ID to identify which fabric a Sigma1 belongs to, and that only works when both sides compute it with the same derived key.

Review fixes:

  • ControllerCredentials now carries fab_idx (local fabric index) instead of hardcoding NonZeroU8::new(1), so multi-fabric setups work correctly.
  • Extract peer CAT IDs from the responder's NOC and pass them into the session, rather than leaving them as Default::default().
  • Fabric::compute_compressed_fabric_id changed from pub(crate) to pub since the CASE initiator needs it for the IPK derivation.
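
The IPK derivation described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `EpochKey`, `OperationalKey`, `derive_operational_key`, and `toy_kdf` are hypothetical names, the salt/input/info roles follow the general shape of an HKDF call, and `toy_kdf` is a non-cryptographic stand-in for HKDF-SHA256 so that the sketch runs with the standard library alone. Only the "GroupKey v1.0" label and the inputs (epoch key, compressed fabric ID) come from the text above.

```rust
/// Info label named in the PR description.
const GRP_KEY_INFO: &[u8] = b"GroupKey v1.0";

/// Raw IPK epoch key as provisioned (hypothetical newtype).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EpochKey(pub [u8; 16]);

/// Key derived from the epoch key; the one both CASE peers must use.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OperationalKey(pub [u8; 16]);

/// Derive the operational key from the epoch key and compressed fabric ID.
/// `kdf(salt, input_key, info)` stands in for HKDF-SHA256 and is injected
/// so this sketch has no crypto dependency.
pub fn derive_operational_key<F>(
    kdf: F,
    epoch: &EpochKey,
    compressed_fabric_id: &[u8; 8],
) -> OperationalKey
where
    F: Fn(&[u8], &[u8], &[u8]) -> [u8; 16],
{
    OperationalKey(kdf(&compressed_fabric_id[..], &epoch.0[..], GRP_KEY_INFO))
}

/// Toy stand-in for HKDF-SHA256. NOT cryptographic; illustration only.
/// All three inputs must be non-empty.
pub fn toy_kdf(salt: &[u8], input_key: &[u8], info: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = input_key[i % input_key.len()] ^ salt[i % salt.len()] ^ info[i % info.len()];
    }
    out
}
```

Keeping the epoch key and the derived key in distinct types is one way to make the original mistake, passing the raw epoch key into the destination ID computation, fail at compile time.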

@github-actions

PR #408: Size comparison from 2af46c4 to e735fad

Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
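| platform | target | config | section | 2af46c4 | e735fad | change | % change |
|---|---|---|---|---|---|---|---|
| (core) | riscv32imac-unknown-none-elf | infodefmt-optz-ltofat | FLASH | 403870 | 403870 | 0 | 0.0 |
| (core) | riscv32imac-unknown-none-elf | infodefmt-optz-ltofat | RAM | 68072 | 68072 | 0 | 0.0 |
| (core) | thumbv6m-none-eabi | infodefmt-optz-ltofat | FLASH | 337860 | 337880 | 20 | 0.0 |
| (core) | thumbv6m-none-eabi | infodefmt-optz-ltofat | RAM | 64312 | 64320 | 8 | 0.0 |
| (core) | thumbv7em-none-eabi | infodefmt-optz-ltofat | FLASH | 312748 | 312760 | 12 | 0.0 |
| (core) | thumbv7em-none-eabi | infodefmt-optz-ltofat | RAM | 63800 | 63808 | 8 | 0.0 |
| (core) | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 839687 | 839687 | 0 | 0.0 |
| (core) | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 66616 | 66616 | 0 | 0.0 |
| dimmable-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 1957576 | 1957608 | 32 | 0.0 |
| dimmable-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 52352 | 52352 | 0 | 0.0 |
| onoff-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 1890784 | 1890752 | -32 | -0.0 |
| onoff-light | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 52056 | 52056 | 0 | 0.0 |
| onoff-light-bt | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 3218056 | 3218168 | 112 | 0.0 |
| onoff-light-bt | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 5784 | 5784 | 0 | 0.0 |
| speaker | x86_64-unknown-linux-gnu | infologs-optz-ltofat | FLASH | 850424 | 850424 | 0 | 0.0 |
| speaker | x86_64-unknown-linux-gnu | infologs-optz-ltofat | RAM | 2856 | 2856 | 0 | 0.0 |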

@ivmarkov ivmarkov left a comment

@jonlil Thank you for this PR!
I also created #410 and was therefore able to compare yours with #410.

What is reassuring is that their approach is pretty much equivalent to yours!

I would nevertheless suggest we go with #410 because it has less code duplication. As I mention in the code review of your PR, it seems the LLM in your case was unable to grasp the purpose of the casep.rs module, which is to be a shared set of CASE-related algorithms usable by either the responder or the initiator. In their case the LLM did a tad better, or maybe theirs was coded manually.

Also, #410 has a unit/integration test in the tests/ folder, which is kinda nice.

Very important for you: your PR introduces ControllerCredentials, while #410 just uses Fabric, which seems more unified with the rest of the code to me. If you really need ControllerCredentials and can't use a regular Fabric, please explain why, and we can eventually re-introduce ControllerCredentials on top of #410.

const CASE_LARGE_BUF_SIZE: usize = MAX_CERT_TLV_LEN * 2 + 224;

/// Sigma2 nonce: "NCASE_Sigma2N" (13 bytes)
const SIGMA2_NONCE: AeadNonceRef = AeadNonceRef::new(&[

SIGMA2_NONCE and SIGMA3_NONCE are duplicating equivalent consts already defined in casep.rs. I suspect other/all constants are duplicated as well.

It seems the LLM was unable to grasp the purpose of the casep.rs module - which is - to be a shared set of CASE-related algorithms that can be used by either the responder or the initiator.
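
To illustrate the point about sharing, here is a minimal sketch in which both roles read one definition of each nonce. `CasePrimitives`, `Responder`, and `Initiator` are hypothetical stand-ins and not rs-matter's actual types; the Sigma2 nonce string is the one quoted in the diff above, and the Sigma3 string is assumed by analogy.

```rust
/// Hypothetical shared holder for CASE constants, in the spirit of casep.rs.
pub struct CasePrimitives;

impl CasePrimitives {
    /// Nonce for the Sigma2 encrypted payload (13 bytes).
    pub const SIGMA2_NONCE: &'static [u8; 13] = b"NCASE_Sigma2N";
    /// Nonce for the Sigma3 encrypted payload (13 bytes, assumed by analogy).
    pub const SIGMA3_NONCE: &'static [u8; 13] = b"NCASE_Sigma3N";
}

/// Responder side: encrypts Sigma2, decrypts Sigma3.
pub struct Responder;

impl Responder {
    pub fn sigma2_encrypt_nonce() -> &'static [u8] {
        CasePrimitives::SIGMA2_NONCE
    }
    pub fn sigma3_decrypt_nonce() -> &'static [u8] {
        CasePrimitives::SIGMA3_NONCE
    }
}

/// Initiator side: decrypts Sigma2, encrypts Sigma3.
pub struct Initiator;

impl Initiator {
    pub fn sigma2_decrypt_nonce() -> &'static [u8] {
        CasePrimitives::SIGMA2_NONCE
    }
    pub fn sigma3_encrypt_nonce() -> &'static [u8] {
        CasePrimitives::SIGMA3_NONCE
    }
}
```

With a single definition, the encrypting and decrypting sides cannot drift apart, which is the failure mode that duplicated constants invite.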

///
/// This contains the controller's own fabric identity, used to authenticate
/// to commissioned devices via CASE.
pub struct ControllerCredentials<'a> {

This is essentially duplicating a minimum version of the Fabric type. Why is this necessary rather than just using the Fabrics/Fabric state which is already in rs-matter?

let compressed_fabric_id =
Fabric::compute_compressed_fabric_id(crypto, root_pubkey, creds.fabric_id);

const GRP_KEY_INFO: &[u8] = &[

Hardcoded? Why? Isn't this very unsafe? Shouldn't this be part of ControllerCredentials? (That again raises the question of whether we need ControllerCredentials at all, since Fabric already holds the groups-related IPK.)

.map_err(|_| ErrorCode::InvalidData)?;
}

// Decrypt Sigma2 encrypted payload

It would be good to move a large portion of this code into CaseP as a sigma2_decrypt function, mirroring the existing sigma2_encrypt used by the responder.

}

// Verify certificate chain: NOC -> (ICAC) -> Root CA
{

The logic below is already available as CaseP::validate_certs.


// Verify Sigma2 signature
// TBS2 = { responder_noc, responder_icac?, responder_pub_key, our_pub_key }
{

The validation logic is duplicated here. It is already available as CaseP::validate_sigma3_signature, where validate_sigma3_signature should be renamed to validate_peer_tbs_signature or such to indicate that it would be used by both initiator and responder now.

Comment thread rs-matter/src/fabric.rs

/// Compute the compressed fabric ID
- pub(crate) fn compute_compressed_fabric_id<C: Crypto>(
+ pub fn compute_compressed_fabric_id<C: Crypto>(

Unnecessary if the Fabric type is also used by the initiator.

Comment thread rs-matter/Cargo.toml
[features]
default = ["os", "rustcrypto", "log"]

# Optional protocol features

Are you positive that the LLM is not hallucinating the code/flash size increase? The equivalent PR #410 shows 0% code-size increase even though its code is not hidden behind a feature.

jonlil commented Apr 14, 2026

Thanks @ivmarkov for the thorough review and for pointing me to #410 — I hadn't seen it when I opened this. Agreed that #410's approach is the right one: reusing Fabric and sharing logic via casep.rs is structurally cleaner, and having proper integration tests is a real improvement.

Closing this in favor of #410. We're going to pull #410 into our branch tree and continue running our production controller workload (edge device managing a Matter bridge with ~8 endpoints, continuous subscriptions, repeated reconnects after server restarts) against it. Happy to contribute test scenarios or report back findings on that branch.

@jonlil jonlil closed this Apr 14, 2026
@ivmarkov

> Thanks @ivmarkov for the thorough review and for pointing me to #410 — I hadn't seen it when I opened this. Agreed that #410's approach is the right one: reusing Fabric and sharing logic via casep.rs is structurally cleaner, and having proper integration tests is a real improvement.

#410 was opened after you opened yours, even if the original code was created before yours. It was just sitting in the Estincelle rs-matter fork waiting to be contributed upstream.

> Happy to contribute test scenarios or report back findings on that branch.

Would appreciate that!

Development

Successfully merging this pull request may close these issues.

Enhance CASE (case.rs) with support for session negotiation initiation