Support for RFC-7049 Canonical CBOR key ordering #13

@vmx

This library explicitly targets RFC-8949, so this request might be out of scope.

In the project I want to use cbor4ii for, I'm stuck with RFC-7049 Canonical CBOR key ordering. This means that keys are sorted by the length of their encoded form first, and only then byte-wise. I wonder if that could perhaps be added behind a feature flag. Here is an implementation that seems to work. I didn't create a PR as this clearly needs more discussion first.

    fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
    where
        K: ser::Serialize,
        V: ser::Serialize,
        I: IntoIterator<Item = (K, V)>,
    {
        #[cfg(not(feature = "use_std"))]
        use crate::alloc::vec::Vec;
        use serde::ser::SerializeMap;

        // TODO vmx 2022-04-04: This could perhaps be upstreamed, or the
        // `cbor4ii::serde::buf_writer::BufWriter` could be made public.
        impl enc::Write for Vec<u8> {
            type Error = crate::alloc::collections::TryReserveError;

            fn push(&mut self, input: &[u8]) -> Result<(), Self::Error> {
                self.try_reserve(input.len())?;
                self.extend_from_slice(input);
                Ok(())
            }
        }

        // CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
        // This was later revised with RFC-8949, but we need to stick to the original order to stay
        // compatible with existing data.
        // We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
        // comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
        // with the length. Once sorted they are written to the actual output.
        let mut buffer: Vec<u8> = Vec::new();
        let mut mem_serializer = Serializer::new(&mut buffer);
        let mut serializer = Collect {
            bounded: true,
            ser: &mut mem_serializer,
        };
        let mut entries = Vec::new();
        for (key, value) in iter {
            serializer.serialize_entry(&key, &value)
                .map_err(|_| enc::Error::Msg("Map entry cannot be serialized.".into()))?;
            entries.push(serializer.ser.writer.clone());
            serializer.ser.writer.clear();
        }

        TypeNum::new(major::MAP << 5, entries.len() as u64).encode(&mut self.writer)?;
        entries.sort_unstable();
        for entry in entries {
            self.writer.push(&entry)?;
        }

        Ok(())
    }
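
To illustrate how the two orderings differ, here is a minimal standalone sketch that compares already-encoded keys. The function names and the hand-encoded sample keys are made up for this example and are not part of cbor4ii; it assumes every key uses the shortest definite-length encoding.

```rust
use std::cmp::Ordering;

/// Length-first ordering as described in RFC 7049, section 3.9:
/// the shorter encoded key sorts first, ties are broken byte-wise.
fn rfc7049_key_cmp(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Ordering as described in RFC 8949, section 4.2.1:
/// plain byte-wise lexicographic comparison of the encoded keys.
fn rfc8949_key_cmp(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Hand-encoded CBOR keys, used as hypothetical sample data.
fn sample_keys() -> Vec<Vec<u8>> {
    vec![
        vec![0x62, 0x61, 0x61], // text string "aa"
        vec![0x19, 0x01, 0x00], // unsigned integer 256
        vec![0x61, 0x7a],       // text string "z"
        vec![0x0a],             // unsigned integer 10
    ]
}

/// Returns the sample keys sorted with the given comparison function.
fn sorted_with(cmp: fn(&[u8], &[u8]) -> Ordering) -> Vec<Vec<u8>> {
    let mut keys = sample_keys();
    keys.sort_by(|a, b| cmp(a.as_slice(), b.as_slice()));
    keys
}
```

With these sample keys, `sorted_with(rfc7049_key_cmp)` puts `"z"` (2 bytes) before `256` (3 bytes), while `sorted_with(rfc8949_key_cmp)` puts `256` first because its leading byte `0x19` is smaller than `0x61`. When all keys are text strings, the two orderings should agree, since the string header grows with the string length. That is why the plain byte-wise sort in the implementation above is enough for my use case.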

I'd also like to note that I need even more changes for my use case (it's a subset of CBOR), for which I will need to fork this library. Nonetheless, I think this would be a useful addition, and I'd prefer to keep the fork as minimal as possible. I'm bringing this up to make clear that it won't be a showstopper if this change isn't accepted.
