Add CASE session initiator (Sigma1/Sigma2/Sigma3)#408
Conversation
ff5cc33 to
2f926b3
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces the CASE (Certificate Authenticated Session Establishment) initiator implementation, enabling controllers to establish secure sessions with commissioned Matter devices. The changes include a new case-initiator feature and the CaseInitiator module which handles the Sigma1, Sigma2, and Sigma3 handshake phases. Feedback focuses on correcting the hardcoded fabric index by adding it to the controller credentials and ensuring that CASE Authenticated Tag (CAT) IDs are properly extracted from the responder's certificate to support accurate access control evaluation.
| /// Controller's Intermediate CA Certificate (TLV encoded, optional) | ||
| pub icac: Option<&'a [u8]>, | ||
| /// Root CA certificate (TLV encoded, shared with the device) | ||
| pub root_ca: &'a [u8], | ||
| /// Controller's operational private key | ||
| pub secret_key: CanonPkcSecretKeyRef<'a>, | ||
| /// Identity Protection Key (operational key) | ||
| pub ipk: CanonAeadKeyRef<'a>, | ||
| /// Fabric ID | ||
| pub fabric_id: u64, | ||
| /// Controller's node ID in the fabric | ||
| pub node_id: u64, | ||
| } | ||
|
|
||
| /// CASE Initiator for establishing secure sessions with commissioned Matter devices. | ||
| /// |
There was a problem hiding this comment.
The ControllerCredentials struct is missing the local fabric index (fab_idx). In Matter, while fabric_id is the 64-bit identifier shared across the network, fab_idx is the 1-based local index used to identify the fabric in the device's internal storage. This index is required when establishing a new session to correctly associate it with the local fabric state.
| /// Controller's Intermediate CA Certificate (TLV encoded, optional) | |
| pub icac: Option<&'a [u8]>, | |
| /// Root CA certificate (TLV encoded, shared with the device) | |
| pub root_ca: &'a [u8], | |
| /// Controller's operational private key | |
| pub secret_key: CanonPkcSecretKeyRef<'a>, | |
| /// Identity Protection Key (operational key) | |
| pub ipk: CanonAeadKeyRef<'a>, | |
| /// Fabric ID | |
| pub fabric_id: u64, | |
| /// Controller's node ID in the fabric | |
| pub node_id: u64, | |
| } | |
| /// CASE Initiator for establishing secure sessions with commissioned Matter devices. | |
| /// | |
| pub struct ControllerCredentials<'a> { | |
| /// Controller's Node Operational Certificate (TLV encoded) | |
| pub noc: &'a [u8], | |
| /// Controller's Intermediate CA Certificate (TLV encoded, optional) | |
| pub icac: Option<&'a [u8]>, | |
| /// Root CA certificate (TLV encoded, shared with the device) | |
| pub root_ca: &'a [u8], | |
| /// Controller's operational private key | |
| pub secret_key: CanonPkcSecretKeyRef<'a>, | |
| /// Identity Protection Key (operational key) | |
| pub ipk: CanonAeadKeyRef<'a>, | |
| /// Fabric ID | |
| pub fabric_id: u64, | |
| /// Controller's node ID in the fabric | |
| pub node_id: u64, | |
| /// Local fabric index | |
| pub fab_idx: NonZeroU8, | |
| } |
|
|
||
| exchange.acknowledge().await?; | ||
|
|
||
| info!( |
There was a problem hiding this comment.
| fab_idx: unwrap!(NonZeroU8::new(1)), | ||
| cat_ids: peer_catids, | ||
| }, | ||
| Some(dec_key), |
There was a problem hiding this comment.
The initiator is currently initializing peer_catids with default values (empty). It should extract the CASE Authenticated Tag (CAT) IDs from the responder's Node Operational Certificate (NOC) received in Sigma2. These IDs are necessary for proper Access Control List (ACL) evaluation for subsequent interactions over this session.
| Some(dec_key), | |
| let mut peer_catids: NocCatIds = Default::default(); | |
| responder_noc.get_cat_ids(&mut peer_catids)?; |
Add client-side CASE handshake (Sigma1/Sigma2/Sigma3) for controllers that need to initiate secure sessions with Matter devices. Gated behind `case-initiator` feature flag to avoid binary size impact on server-only builds (embedded MCUs with limited flash). 550 lines of new code in sc/case/initiator.rs — not compiled unless the feature is enabled.
2f926b3 to
3ba3d56
Compare
|
Addressed all three review items:
Tested: verified CASE session still establishes correctly with our production setup (controller with fab_idx=1, single fabric). |
|
PR #408: Size comparison from 2af46c4 to 354383c Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
|
The CASE initiator was using the raw IPK epoch key directly in dest_id computation and all key derivations (S2K, S3K, session keys). The CASE responder uses the HKDF-derived operational key (GroupKey v1.0), causing a permanent 'Fabric Index mismatch' on every CASE handshake. Derive the operational key from epoch key + compressed fabric ID before use, matching the responder's GroupKeys::update() derivation.
|
Pushed two fixes based on testing: IPK operational key derivation — The initiator was using the raw IPK epoch key directly for destination ID computation and all CASE key derivations (S2K, S3K, session keys). The responder derives an operational key from the epoch key via HKDF (GroupKey v1.0), so handshakes always failed with "Fabric Index mismatch". The initiator now derives the operational key from Review fixes:
|
|
PR #408: Size comparison from 2af46c4 to e735fad Full report (8 builds for (core), dimmable-light, onoff-light, onoff-light-bt, speaker)
|
ivmarkov
left a comment
There was a problem hiding this comment.
@jonlil Thank you for this PR!
I also created #410 and was therefore able to compare yours with #410.
What is re-assuring is that their approach is pretty much equivalent to yours!
I would nevertheless suggest we go with #410 due to the less code duplication in #410. As I mention during the code review of your PR, it seems the LLM in your case was unable to grasp the purpose of the casep.rs module - which is - to be a shared set of CASE-related algorithms that can be used by either the responder or the initiator. In their case the LLM did a tad better, or maybe theirs was coded manually.
Also, #410 has a unit/integration test in the tests/ folder, which is kinda nice.
Very important for you: Your PR is introducing ControllerCredentials, while #410 just uses Fabric which seems a bit more unified with the rest for me. If you really need ControllerCredentials and can't use a regular Fabric, please explain why and eventually we can re-introduce ControllerCredentials on top of #410
| const CASE_LARGE_BUF_SIZE: usize = MAX_CERT_TLV_LEN * 2 + 224; | ||
|
|
||
| /// Sigma2 nonce: "NCASE_Sigma2N" (13 bytes) | ||
| const SIGMA2_NONCE: AeadNonceRef = AeadNonceRef::new(&[ |
There was a problem hiding this comment.
SIGMA2_NONCE and SIGMA3_NONCE are duplicating equivalent consts already defined in casep.rs. I suspect other/all constants are duplicated as well.
It seems the LLM was unable to grasp the purpose of the casep.rs module - which is - to be a shared set of CASE-related algorithms that can be used by either the responder or the initiator.
| /// | ||
| /// This contains the controller's own fabric identity, used to authenticate | ||
| /// to commissioned devices via CASE. | ||
| pub struct ControllerCredentials<'a> { |
There was a problem hiding this comment.
This is essentially duplicating a minimum version of the Fabric type. Why is this necessary rather than just using the Fabrics/Fabric state which is already in rs-matter?
| let compressed_fabric_id = | ||
| Fabric::compute_compressed_fabric_id(crypto, root_pubkey, creds.fabric_id); | ||
|
|
||
| const GRP_KEY_INFO: &[u8] = &[ |
There was a problem hiding this comment.
Hardcoded? Why? Isn't this very unsafe? Shouldn't this be part of ControllerCredentials (and therefore the question if we need that is highlighted again, as Fabric already has groups-related IPK).
| .map_err(|_| ErrorCode::InvalidData)?; | ||
| } | ||
|
|
||
| // Decrypt Sigma2 encrypted payload |
There was a problem hiding this comment.
Would be good a large portion of this code to be moved to CaseP in the form of sigma2_decrypt which would be similar to the already existing sigma2_encrypt used by the responder.
| } | ||
|
|
||
| // Verify certificate chain: NOC -> (ICAC) -> Root CA | ||
| { |
There was a problem hiding this comment.
Logic below already available as CaseP::validate_certs
|
|
||
| // Verify Sigma2 signature | ||
| // TBS2 = { responder_noc, responder_icac?, responder_pub_key, our_pub_key } | ||
| { |
There was a problem hiding this comment.
The validation logic is duplicated here. It is already available as CaseP::validate_sigma3_signature, where validate_sigma3_signature should be renamed to validate_peer_tbs_signature or such to indicate that it would be used by both initiator and responder now.
|
|
||
| /// Compute the compressed fabric ID | ||
| pub(crate) fn compute_compressed_fabric_id<C: Crypto>( | ||
| pub fn compute_compressed_fabric_id<C: Crypto>( |
There was a problem hiding this comment.
Unnecessary if the Fabric type is also used by the initiator.
| [features] | ||
| default = ["os", "rustcrypto", "log"] | ||
|
|
||
| # Optional protocol features |
There was a problem hiding this comment.
Are you positive that the LLM is not hallucinating the code/flash size increase? The equivalent PR #410 shows 0% code-size increase even though its code is not hidden behind a feature.
|
Thanks @ivmarkov for the thorough review and for pointing me to #410 — I hadn't seen it when I opened this. Agreed that #410's approach is the right one: reusing Closing this in favor of #410. We're going to pull #410 into our branch tree and continue running our production controller workload (edge device managing a Matter bridge with ~8 endpoints, continuous subscriptions, repeated reconnects after server restarts) against it. Happy to contribute test scenarios or report back findings on that branch. |
#410 was opened after you opened yours, even if the original code was created before yours. It was just sitting in the Estincelle
Would appreciate that! |
Closes #371
Related: #407 (fixes stale CASE exchange on the responder side — discovered while testing this initiator)
Summary
Adds client-side CASE session initiation — rs-matter can now initiate secure sessions with Matter devices by sending CASESigma1 and processing Sigma2/Sigma3.
This implements the initiator side described in #371:
Implementation
sc/case/initiator.rs(~550 lines)CaseInitiator::initiate()— performs the full Sigma1→Sigma2→Sigma3 handshakeExchange::initiate_for_session()transport APIcase-initiatorfeature flag to avoid binary size impact on server-only embedded buildsFeature flag
Without the feature, the module is not compiled — zero cost for devices that only act as CASE responders.
Testing
Tested in production with a Matter controller (rs-matter client) connecting to a Matter bridge (rs-matter server) with 8 endpoints (lamps, switches, sensors). The controller uses
CaseInitiator::initiate()to establish CASE sessions and then subscribes to device attributes via IM. Running continuously with repeated reconnects after server restarts — which is how we discovered and fixed the stale exchange issue in #407.