-
Notifications
You must be signed in to change notification settings - Fork 6
[ISSUE #72]✨Add UTF-8 validation methods and deprecate unsafe conversions in CheetahString #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ions in CheetahString
📝 WalkthroughWalkthroughThis pull request introduces UTF-8 validation pathways for CheetahString by adding safe constructor methods Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds UTF-8 validation methods and introduces safety warnings for existing unsafe byte-to-string conversions in CheetahString. It addresses issue #72 by providing safer alternatives (try_from_vec and try_from_bytes) that validate UTF-8 encoding before creating CheetahString instances.
Key Changes:
- Introduces new error handling infrastructure with
ErrorandResulttypes - Adds UTF-8-validating methods
try_from_vec()andtry_from_bytes() - Documents existing unsafe
From<Vec<u8>>andFrom<&[u8]>implementations with safety warnings
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/error.rs | New error module defining Error enum for UTF-8 validation errors with proper Display and std::error::Error trait implementations |
| src/lib.rs | Exports new Error and Result types from the error module |
| src/cheetah_string.rs | Adds try_from_vec and try_from_bytes validation methods; documents existing unsafe conversions; adds safety comments to unsafe code blocks |
| tests/basic.rs | Comprehensive test suite covering new validation methods, invalid UTF-8 handling, and various CheetahString constructors and conversions |
Comments suppressed due to low confidence (1)
src/cheetah_string.rs:322
- The
from_arc_vecmethod acceptsArc<Vec<u8>>without performing UTF-8 validation, but later code assumes the bytes are valid UTF-8 when usingfrom_utf8_unchecked. This violates the safety invariant documented in the code and can lead to undefined behavior if invalid UTF-8 bytes are passed in.
This method should either:
- Add UTF-8 validation before creating the CheetahString, or
- Be marked as unsafe with clear documentation about the caller's responsibility to ensure valid UTF-8, or
- Be deprecated in favor of a safe alternative
pub fn from_arc_vec(s: Arc<Vec<u8>>) -> Self {
CheetahString {
inner: InnerString::ArcVecString(s),
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn test_try_from_vec_method() { | ||
| let bytes = vec![104, 101, 108, 108, 111]; | ||
| let s = CheetahString::try_from_vec(bytes).unwrap(); | ||
| assert_eq!(s, "hello"); | ||
|
|
||
| let invalid = vec![0xFF, 0xFE]; | ||
| assert!(CheetahString::try_from_vec(invalid).is_err()); | ||
| } | ||
|
|
||
| #[test] |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests test_try_from_vec_method and test_try_from_valid_vec/test_try_from_invalid_vec are duplicating test coverage for the same functionality. Consider removing test_try_from_vec_method since the behavior is already covered by the more descriptive pair of tests that separate valid and invalid cases.
| fn test_try_from_vec_method() { | |
| let bytes = vec![104, 101, 108, 108, 111]; | |
| let s = CheetahString::try_from_vec(bytes).unwrap(); | |
| assert_eq!(s, "hello"); | |
| let invalid = vec![0xFF, 0xFE]; | |
| assert!(CheetahString::try_from_vec(invalid).is_err()); | |
| } | |
| #[test] |
| // SAFETY: ArcVecString is only created from validated UTF-8 sources. | ||
| // All constructors ensure this invariant is maintained. | ||
| unsafe { std::str::from_utf8_unchecked(s.as_ref()) } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safety comment claims "ArcVecString is only created from validated UTF-8 sources" and "All constructors ensure this invariant is maintained." However, the from_arc_vec method at line 318 does not validate UTF-8, creating a path for invalid UTF-8 to reach this unsafe code. This violates the documented safety invariant and can cause undefined behavior.
| // SAFETY: ArcVecString is only created from validated UTF-8 sources. | |
| // All constructors ensure this invariant is maintained. | |
| unsafe { std::str::from_utf8_unchecked(s.as_ref()) } | |
| // Enforce that ArcVecString only contains valid UTF-8. | |
| std::str::from_utf8(s.as_ref()) | |
| .expect("ArcVecString must contain valid UTF-8") |
| // SAFETY: Bytes variant should only be created from valid UTF-8 sources | ||
| unsafe { String::from_utf8_unchecked(b.to_vec()) } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safety comment claims "Bytes variant should only be created from valid UTF-8 sources", but the from_bytes method does not perform UTF-8 validation. This allows invalid UTF-8 to reach this unsafe code, which can cause undefined behavior when converting to String.
| // SAFETY: Bytes variant should only be created from valid UTF-8 sources | |
| unsafe { String::from_utf8_unchecked(b.to_vec()) } | |
| String::from_utf8(b.to_vec()) | |
| .expect("Bytes variant must contain valid UTF-8") |
| pub fn try_from_bytes(b: &[u8]) -> Result<Self, Utf8Error> { | ||
| let s = std::str::from_utf8(b)?; |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try_from_bytes method returns Result<Self, Utf8Error> instead of using the new crate::Result<T> type alias (which is Result<T, crate::Error>). This creates an inconsistent error handling API. Consider changing the return type to crate::Result<Self> to maintain consistency with the error handling infrastructure introduced in this PR.
| pub fn try_from_bytes(b: &[u8]) -> Result<Self, Utf8Error> { | |
| let s = std::str::from_utf8(b)?; | |
| pub fn try_from_bytes(b: &[u8]) -> crate::Result<Self> { | |
| let s = std::str::from_utf8(b).map_err(Into::into)?; |
| /// if the bytes are not valid UTF-8. Consider using `CheetahString::try_from_bytes()` | ||
| /// for safe UTF-8 validation. | ||
| /// | ||
| /// This implementation will be deprecated in a future version. |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation says "This implementation will be deprecated in a future version" but doesn't use the #[deprecated] attribute. Consider adding the #[deprecated] attribute with a message directing users to try_from_bytes() to provide compiler warnings and IDE hints for users of this API.
| /// This implementation will be deprecated in a future version. | |
| /// This implementation will be deprecated in a future version. | |
| #[deprecated( | |
| note = "Use CheetahString::try_from_bytes() for safe UTF-8 validation instead; this unsafe From<&[u8]> impl will be removed in a future version." | |
| )] |
| /// if the bytes are not valid UTF-8. Consider using `CheetahString::try_from_vec()` | ||
| /// for safe UTF-8 validation. | ||
| /// | ||
| /// This implementation will be deprecated in a future version. |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation says "This implementation will be deprecated in a future version" but doesn't use the #[deprecated] attribute. Consider adding the #[deprecated] attribute with a message directing users to try_from_vec() to provide compiler warnings and IDE hints for users of this API.
| /// This implementation will be deprecated in a future version. | |
| /// This implementation will be deprecated in a future version. | |
| #[deprecated(note = "use CheetahString::try_from_vec() instead for safe UTF-8 validation")] |
| InnerString::Bytes(b) => { | ||
| // SAFETY: Bytes variant is only created from validated UTF-8 sources. | ||
| // The from_bytes constructor ensures this invariant. | ||
| unsafe { std::str::from_utf8_unchecked(b.as_ref()) } | ||
| } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safety comment claims "Bytes variant is only created from validated UTF-8 sources" and "The from_bytes constructor ensures this invariant." However, the from_bytes method does not perform UTF-8 validation, which means this safety claim is incorrect and the use of from_utf8_unchecked here can cause undefined behavior if invalid UTF-8 is passed through that constructor.
| } => { | ||
| // SAFETY: ArcVecString should only be created from valid UTF-8 sources | ||
| unsafe { String::from_utf8_unchecked(s.to_vec()) } | ||
| } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safety comment claims "ArcVecString should only be created from valid UTF-8 sources", but the from_arc_vec method at line 318 does not validate UTF-8. This creates a path for invalid UTF-8 data to reach this unsafe code, which can cause undefined behavior when converting to String.
| /// let invalid = vec![0xFF, 0xFE]; | ||
| /// assert!(CheetahString::try_from_vec(invalid).is_err()); | ||
| /// ``` | ||
| pub fn try_from_vec(v: Vec<u8>) -> Result<Self, Utf8Error> { |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try_from_vec and try_from_bytes methods return Result<Self, Utf8Error> instead of using the new crate::Result<T> type alias (which is Result<T, crate::Error>). This creates an inconsistent error handling API. Consider changing the return type to crate::Result<Self> to maintain consistency with the error handling infrastructure introduced in this PR.
| pub fn try_from_vec(v: Vec<u8>) -> Result<Self, Utf8Error> { | |
| pub fn try_from_vec(v: Vec<u8>) -> crate::Result<Self> { |
| fn test_try_from_bytes_method() { | ||
| let bytes = b"hello world"; | ||
| let s = CheetahString::try_from_bytes(bytes).unwrap(); | ||
| assert_eq!(s, "hello world"); | ||
|
|
||
| let invalid = &[0xFF, 0xFE]; | ||
| assert!(CheetahString::try_from_bytes(invalid).is_err()); | ||
| } | ||
|
|
||
| #[test] |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests test_try_from_bytes_method and test_try_from_valid_bytes/test_try_from_invalid_bytes are duplicating test coverage for the same functionality. Consider removing test_try_from_bytes_method since the behavior is already covered by the more descriptive pair of tests that separate valid and invalid cases.
| fn test_try_from_bytes_method() { | |
| let bytes = b"hello world"; | |
| let s = CheetahString::try_from_bytes(bytes).unwrap(); | |
| assert_eq!(s, "hello world"); | |
| let invalid = &[0xFF, 0xFE]; | |
| assert!(CheetahString::try_from_bytes(invalid).is_err()); | |
| } | |
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/cheetah_string.rs (3)
262-266: Critical soundness bug:from_veccreates ArcVecString without UTF-8 validation.This public method creates
ArcVecStringfrom aVec<u8>without validating UTF-8. Later,as_str()(lines 357-361) usesfrom_utf8_uncheckedon this data, which causes undefined behavior if the bytes aren't valid UTF-8.Either:
- Mark this method as
unsafeand document the UTF-8 requirement, or- Deprecate it in favor of
try_from_vec, or- Add UTF-8 validation here
🔎 Recommended fix: Add UTF-8 validation
#[inline] pub fn from_vec(s: Vec<u8>) -> Self { + // Validate UTF-8 to maintain safety invariants + let _ = std::str::from_utf8(&s).expect("from_vec requires valid UTF-8 bytes"); CheetahString { inner: InnerString::ArcVecString(Arc::new(s)), } }Or better, deprecate it:
+#[deprecated(since = "0.x.0", note = "Use `try_from_vec` for safe UTF-8 validation")] #[inline] pub fn from_vec(s: Vec<u8>) -> Self { CheetahString { inner: InnerString::ArcVecString(Arc::new(s)), } }
318-322: Critical soundness bug:from_arc_veccreates ArcVecString without UTF-8 validation.Same issue as
from_vec- this createsArcVecStringwithout UTF-8 validation, violating the safety invariant relied upon byas_str().🔎 Recommended fix: Mark as unsafe or add validation
Option 1 - Mark as unsafe:
#[inline] -pub fn from_arc_vec(s: Arc<Vec<u8>>) -> Self { +pub unsafe fn from_arc_vec(s: Arc<Vec<u8>>) -> Self { CheetahString { inner: InnerString::ArcVecString(s), } }Option 2 - Add validation:
#[inline] pub fn from_arc_vec(s: Arc<Vec<u8>>) -> Self { + let _ = std::str::from_utf8(&s).expect("from_arc_vec requires valid UTF-8 bytes"); CheetahString { inner: InnerString::ArcVecString(s), } }
344-350: Critical soundness bug:from_bytescreates Bytes variant without UTF-8 validation.The
from_bytesmethod creates aBytesvariant without validating UTF-8. Theas_str()method later usesfrom_utf8_uncheckedon this data (lines 363-367), which assumes allBytesare valid UTF-8. This can cause undefined behavior.Additionally, the
bytes::Bytestype does not guarantee UTF-8 encoding - it's just a byte buffer.🔎 Recommended fix: Add UTF-8 validation or mark as unsafe
#[inline] #[cfg(feature = "bytes")] pub fn from_bytes(b: bytes::Bytes) -> Self { + let _ = std::str::from_utf8(&b).expect("from_bytes requires valid UTF-8"); CheetahString { inner: InnerString::Bytes(b), } }Or provide a safe alternative:
#[cfg(feature = "bytes")] pub fn try_from_bytes_buf(b: bytes::Bytes) -> Result<Self, Utf8Error> { std::str::from_utf8(&b)?; Ok(CheetahString { inner: InnerString::Bytes(b), }) }
🧹 Nitpick comments (2)
src/error.rs (1)
6-13: Consider deferring unused error variants.The
IndexOutOfBoundsandInvalidCharBoundaryvariants are defined but not currently used in this PR. While forward-planning is good, you may want to consider adding these when they're actually needed to keep the API surface minimal.tests/basic.rs (1)
1-296: Comprehensive test coverage, but missing tests for unsafe constructors.The test suite provides excellent coverage of the CheetahString API, including the new
try_from_bytesandtry_from_vecmethods with both valid and invalid UTF-8 inputs. However, there are no tests for the unsafe constructors that were identified as having soundness issues:
from_vec(with invalid UTF-8)from_arc_vec(with invalid UTF-8)from_bytes(with invalid UTF-8)Consider adding tests that demonstrate the undefined behavior risk with these methods (though these tests would need to be marked as ignored or compile_fail since they trigger UB).
Example test to demonstrate the soundness issue
#[test] #[should_panic] // or #[ignore] since this is UB fn test_from_vec_invalid_utf8_causes_ub() { let invalid = vec![0xFF, 0xFE]; let s = CheetahString::from_vec(invalid); // This call to as_str() triggers undefined behavior let _ = s.as_str(); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/cheetah_string.rssrc/error.rssrc/lib.rstests/basic.rs
🧰 Additional context used
🧬 Code graph analysis (3)
src/lib.rs (1)
src/serde.rs (1)
cheetah_string(22-84)
src/cheetah_string.rs (1)
src/error.rs (1)
from(40-42)
tests/basic.rs (1)
src/cheetah_string.rs (24)
new(250-252)is_empty(397-406)len(385-394)as_str(353-370)default(20-24)from_static_str(255-259)from(29-31)from(36-38)from(43-45)from(57-61)from(81-85)from(90-95)from(100-105)from(119-121)from(162-164)from(169-171)from(176-201)from(236-238)from_string(332-336)borrow(504-506)try_from_bytes(312-315)try_from_vec(286-292)empty(243-247)from_str(67-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (6)
src/lib.rs (1)
22-28: LGTM! Clean error module integration.The error module is properly declared and its public types are exposed at the crate root following idiomatic Rust patterns.
src/error.rs (3)
15-27: LGTM! Clear and informative error messages.The Display implementation provides user-friendly error messages for all variants.
29-37: LGTM! Proper std::error::Error implementation.The conditional implementation correctly exposes the underlying
Utf8Erroras the error source when the std feature is enabled.
39-46: LGTM! Standard error handling patterns.The
Fromimplementation andResulttype alias follow idiomatic Rust conventions for error handling.src/cheetah_string.rs (2)
268-292: LGTM! Safe UTF-8 validation implemented correctly.This method properly validates UTF-8 before creating the
CheetahString, providing a safe alternative to the unsafefrom_vecmethod. The documentation with examples is also excellent.
294-315: LGTM! Safe UTF-8 validation implemented correctly.This method properly validates UTF-8 before construction. Good documentation and examples.
| /// # Safety Warning | ||
| /// | ||
| /// This implementation uses `unsafe` code and may cause undefined behavior | ||
| /// if the bytes are not valid UTF-8. Consider using `CheetahString::try_from_bytes()` | ||
| /// for safe UTF-8 validation. | ||
| /// | ||
| /// This implementation will be deprecated in a future version. | ||
| impl From<&[u8]> for CheetahString { | ||
| #[inline] | ||
| fn from(b: &[u8]) -> Self { | ||
| // SAFETY: This is unsafe and may cause UB if bytes are not valid UTF-8. | ||
| // This will be deprecated in favor of try_from_bytes in the next version. | ||
| CheetahString::from_slice(unsafe { std::str::from_utf8_unchecked(b) }) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: This From implementation is unsound and must be deprecated immediately.
This implementation creates ArcString via from_utf8_unchecked, but it's callable from safe code without any validation. While you've documented that it "will be deprecated in a future version," this should be:
- Marked with
#[deprecated]attribute now - Or removed entirely in favor of
try_from_bytes
The safety warning in documentation doesn't prevent misuse from safe code.
🔎 Recommended fix: Add deprecation attribute
+#[deprecated(since = "0.x.0", note = "Use `CheetahString::try_from_bytes()` instead for safe UTF-8 validation")]
impl From<&[u8]> for CheetahString {
#[inline]
fn from(b: &[u8]) -> Self {
- // SAFETY: This is unsafe and may cause UB if bytes are not valid UTF-8.
- // This will be deprecated in favor of try_from_bytes in the next version.
CheetahString::from_slice(unsafe { std::str::from_utf8_unchecked(b) })
}
}| /// # Safety Warning | ||
| /// | ||
| /// This implementation uses `unsafe` code and may cause undefined behavior | ||
| /// if the bytes are not valid UTF-8. Consider using `CheetahString::try_from_vec()` | ||
| /// for safe UTF-8 validation. | ||
| /// | ||
| /// This implementation will be deprecated in a future version. | ||
| impl From<Vec<u8>> for CheetahString { | ||
| #[inline] | ||
| fn from(v: Vec<u8>) -> Self { | ||
| // SAFETY: This is unsafe and may cause UB if bytes are not valid UTF-8. | ||
| // This will be deprecated in favor of try_from_vec in the next version. | ||
| CheetahString::from_slice(unsafe { std::str::from_utf8_unchecked(&v) }) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: This From implementation is unsound and must be deprecated immediately.
Same issue as From<&[u8]> - this uses from_utf8_unchecked from safe code without validation.
🔎 Recommended fix: Add deprecation attribute
+#[deprecated(since = "0.x.0", note = "Use `CheetahString::try_from_vec()` instead for safe UTF-8 validation")]
impl From<Vec<u8>> for CheetahString {
#[inline]
fn from(v: Vec<u8>) -> Self {
- // SAFETY: This is unsafe and may cause UB if bytes are not valid UTF-8.
- // This will be deprecated in favor of try_from_vec in the next version.
CheetahString::from_slice(unsafe { std::str::from_utf8_unchecked(&v) })
}
}🤖 Prompt for AI Agents
In src/cheetah_string.rs around lines 72 to 86, the From<Vec<u8>> implementation
unsafely uses from_utf8_unchecked; mark this impl as deprecated and make it safe
by delegating to the fallible conversion: add a deprecation attribute (e.g.
#[deprecated(since = "x.y.z", note = "unsound: use CheetahString::try_from_vec
instead")]) to the impl, and replace the body to call
CheetahString::try_from_vec(v).expect("From<Vec<u8>> is deprecated and requires
valid UTF-8; use try_from_vec for fallible conversion") so the code no longer
performs unchecked UTF-8 conversion while signaling deprecation and guiding
users to the safe API.
| } => { | ||
| // SAFETY: ArcVecString should only be created from valid UTF-8 sources | ||
| unsafe { String::from_utf8_unchecked(s.to_vec()) } | ||
| } | ||
| #[cfg(feature = "bytes")] | ||
| CheetahString { | ||
| inner: InnerString::Bytes(b), | ||
| } => unsafe { String::from_utf8_unchecked(b.to_vec()) }, | ||
| } => { | ||
| // SAFETY: Bytes variant should only be created from valid UTF-8 sources | ||
| unsafe { String::from_utf8_unchecked(b.to_vec()) } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all code that constructs ArcVecString or Bytes variants
echo "=== Searching for ArcVecString construction ==="
rg -n -C3 'InnerString::ArcVecString'
echo -e "\n=== Searching for Bytes variant construction ==="
rg -n -C3 'InnerString::Bytes'Repository: mxsm/cheetah-string
Length of output: 7539
🏁 Script executed:
# Let's look at the specific constructors mentioned
sed -n '262,266p' src/cheetah_string.rs
echo "---"
sed -n '318,322p' src/cheetah_string.rs
echo "---"
sed -n '346,350p' src/cheetah_string.rs
echo "---"
# Also check if there's a from_vec_checked constructor
sed -n '280,293p' src/cheetah_string.rsRepository: mxsm/cheetah-string
Length of output: 934
Fix the UTF-8 safety invariant for ArcVecString and Bytes variants.
The unsafe conversions at lines 186-196 use from_utf8_unchecked assuming valid UTF-8, but public constructors violate this invariant:
from_vec(line 262) createsArcVecStringfrom arbitraryVec<u8>without validationfrom_arc_vec(line 318) createsArcVecStringfrom arbitraryArc<Vec<u8>>without validationfrom_bytes(line 346) createsBytesvariant from arbitrarybytes::Byteswithout validation
Calling to_string() or as_str() on these unvalidated instances triggers undefined behavior. The safety comments claiming "all constructors ensure this invariant" are false.
Either validate UTF-8 in these constructors (like try_from_vec does at line 288) or mark them unsafe to explicitly document the responsibility falls to callers.
🤖 Prompt for AI Agents
In src/cheetah_string.rs around lines 186-196, the unsafe use of
String::from_utf8_unchecked for ArcVecString and Bytes assumes a UTF-8 invariant
that is not actually enforced by the public constructors at lines 262
(from_vec), 318 (from_arc_vec) and 346 (from_bytes) (only try_from_vec at 288
validates); fix by either (A) validating input in those constructors (use
std::str::from_utf8 or String::from_utf8 and return Result or panic on invalid
UTF-8) so the unchecked conversions are safe, or (B) mark those constructors
unsafe and update docs/comments to require callers ensure valid UTF-8; pick one
approach and make it consistent across all constructors and any places that rely
on the invariant (e.g., as_str/to_string) so no unchecked conversion can be
reached with unvalidated data.
| InnerString::ArcVecString(s) => { | ||
| // SAFETY: ArcVecString is only created from validated UTF-8 sources. | ||
| // All constructors ensure this invariant is maintained. | ||
| unsafe { std::str::from_utf8_unchecked(s.as_ref()) } | ||
| } | ||
| #[cfg(feature = "bytes")] | ||
| InnerString::Bytes(b) => std::str::from_utf8(b.as_ref()).unwrap(), | ||
| InnerString::Bytes(b) => { | ||
| // SAFETY: Bytes variant is only created from validated UTF-8 sources. | ||
| // The from_bytes constructor ensures this invariant. | ||
| unsafe { std::str::from_utf8_unchecked(b.as_ref()) } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe code relies on violated invariant - undefined behavior risk.
The safety comments claim "ArcVecString is only created from validated UTF-8 sources" and "Bytes variant is only created from validated UTF-8 sources," but as flagged in other comments, the following constructors violate this invariant:
from_vec(line 262)from_arc_vec(line 318)from_bytes(line 346)
This makes the from_utf8_unchecked calls here unsound. The invariant must be maintained by ALL constructors, or this code will cause undefined behavior.
🤖 Prompt for AI Agents
In src/cheetah_string.rs around lines 357 to 367, the unsafe from_utf8_unchecked
calls rely on an invariant that some constructors do not maintain, making this
unsound; fix by ensuring the invariant is enforced or by removing the unsafe
conversion: either validate UTF-8 in every constructor that can produce
ArcVecString/Bytes (from_vec, from_arc_vec, from_bytes) and return a Result or
panic on invalid UTF-8, or replace the unchecked conversions here with safe
std::str::from_utf8 and propagate or handle the error; update the safety
comments to reflect the chosen strategy.
fix #72
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.