-
Notifications
You must be signed in to change notification settings - Fork 1
Implement pickle more #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThis change implements enhanced Python pickle protocol support (protocol >= 2) by refactoring object state serialization logic in the core VM, adding getstate/setstate methods to five I/O classes for custom pickle handling, and introducing getnewargs_ex constant support for advanced object reconstruction. Changes
Sequence DiagramsequenceDiagram
participant Pickle as Pickle Protocol
participant Reduce as common_reduce
participant NewObj as reduce_newobj
participant GetState as object_getstate
participant Helpers as Helper Functions
Pickle->>Reduce: call reduce (protocol >= 2)
Reduce->>NewObj: call reduce_newobj
NewObj->>Helpers: get_new_arguments
Helpers-->>NewObj: (args, kwargs)
NewObj->>GetState: object_getstate
GetState->>Helpers: is_getstate_overridden?
alt User override exists
Helpers-->>GetState: call user __getstate__
else Use default
Helpers-->>GetState: call default implementation
end
GetState-->>NewObj: state
NewObj->>Helpers: get_items_iter
Helpers-->>NewObj: listitems, dictitems
NewObj-->>Reduce: (newobj, newargs, state, listitems, dictitems)
Reduce-->>Pickle: reduce tuple
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@crates/vm/src/builtins/object.rs`:
- Around line 637-647: The code calls the retrieved __getstate__ attribute
unconditionally, which fails if a class sets __getstate__ = None; modify
object_getstate to, after obtaining getstate via obj.get_attr(identifier!(vm,
__getstate__), vm) and before calling getstate.call(...), check whether the
attribute is None (using the PyObject/Value API available in this
module/context) and if so return the None state (e.g.
Ok(vm.ctx.none()))—otherwise proceed to call getstate; keep the existing
fallback to object_getstate_default and existing error propagation from get_attr
intact.
- Around line 210-229: The basicsize calculation for pickle currently omits
space for automatic weakref storage, causing types with a "__weakref__" slot to
falsely fail the size check; update the block in object.rs (around basicsize,
obj.class(), slot_names) to detect if the type provides weakref storage and add
core::mem::size_of::<PyObjectRef>() to basicsize when present—do this by
checking the type's slots for a "__weakref__" entry (e.g., inspect slot_names
for "__weakref__" or use the heap-type extension/heap type metadata on
obj.class() to detect automatic weakref support) rather than relying on a
non-existent PyTypeFlags::HAS_WEAKREF, then re-run the existing comparison
against obj.class().slots.basicsize.
In `@crates/vm/src/stdlib/io.rs`:
- Around line 4311-4334: __setstate__ currently overwrites zelf.buffer.write()
without honoring active exports, bypassing the resize guard used by
write/truncate; change the code to acquire the same resizable/export guard used
by the write/truncate paths before replacing the buffer: use zelf.buffer(vm)?
(the same accessor used below) to obtain the buffer handle that enforces
exports/resize safety (the guard used by write/truncate), then replace the inner
BufferedIO (BufferedIO::new(Cursor::new(...))) via that guarded handle instead
of directly calling zelf.buffer.write(), so active exports are checked and the
buffer protocol invariants are preserved.
- Around line 4106-4127: The __setstate__ implementation mutates zelf.buffer via
write() before calling zelf.buffer(vm)? which performs the closed-object
validation; change the flow to validate first by calling zelf.buffer(vm)? to
obtain the validated guard and then perform mutations through that guard
(replace the BufferedIO via the guard and seek using the validated guard),
removing the earlier direct write() call and ensuring no mutation occurs before
the closed check.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/vm/src/builtins/object.rscrates/vm/src/stdlib/io.rscrates/vm/src/vm/context.rs
🧰 Additional context used
🧬 Code graph analysis (2)
crates/vm/src/builtins/object.rs (2)
crates/vm/src/builtins/tuple.rs (4)
class(41-43)class(527-529)len(194-196)__getnewargs__(352-362)crates/vm/src/protocol/mapping.rs (1)
items(182-188)
crates/vm/src/stdlib/io.rs (1)
crates/vm/src/stdlib/itertools.rs (7)
__setstate__(101-130)__setstate__(558-567)__setstate__(645-654)__setstate__(961-976)__setstate__(1111-1118)__setstate__(1364-1386)__setstate__(1866-1873)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Collect regression test data
- GitHub Check: Collect benchmark data
- GitHub Check: Collect code coverage data
- GitHub Check: Collect what is left data
🔇 Additional comments (2)
crates/vm/src/vm/context.rs (1)
137-139: LGTM: interned magic name added cleanly.This keeps the const-name set aligned with pickle protocol helpers.
crates/vm/src/stdlib/io.rs (1)
160-163: LGTM: PyDict import is appropriate.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| if required { | ||
| let mut basicsize = obj.class().slots.basicsize; | ||
| // if obj.class().slots.dict_offset > 0 | ||
| // && !obj.class().slots.flags.has_feature(PyTypeFlags::MANAGED_DICT) | ||
| // { | ||
| // basicsize += std::mem::size_of::<PyObjectRef>(); | ||
| // } | ||
| // if obj.class().slots.weaklist_offset > 0 { | ||
| // basicsize += std::mem::size_of::<PyObjectRef>(); | ||
| // } | ||
| // Start with PyBaseObject_Type's basicsize | ||
| let mut basicsize = vm.ctx.types.object_type.slots.basicsize; | ||
|
|
||
| // Add __dict__ size if type has dict | ||
| if obj.class().slots.flags.has_feature(PyTypeFlags::HAS_DICT) { | ||
| basicsize += core::mem::size_of::<PyObjectRef>(); | ||
| } | ||
|
|
||
| // Add slots size | ||
| if let Some(ref slot_names) = slot_names { | ||
| basicsize += core::mem::size_of::<PyObjectRef>() * slot_names.__len__(); | ||
| } | ||
|
|
||
| // Fail if actual type's basicsize > expected basicsize | ||
| if obj.class().slots.basicsize > basicsize { | ||
| return Err( | ||
| vm.new_type_error(format!("cannot pickle {:.200} object", obj.class().name())) | ||
| vm.new_type_error(format!("cannot pickle '{}' object", obj.class().name())) | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Verification inconclusive
Script executed:
#!/bin/bash
# Search for PyTypeFlags definitions and weakref-related code
rg -n "HAS_WEAKREF|HAS_DICT" crates/vm/src -g '*.rs' | head -20Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/vm/src/vm/context.rs:515: flags: PyTypeFlags::heap_type_flags() | PyTypeFlags::HAS_DICT,
crates/vm/src/vm/context.rs:628: class.slots.flags.contains(PyTypeFlags::HAS_DICT),
crates/vm/src/types/slot.rs:228: const HAS_DICT = 1 << 40;
crates/vm/src/object/payload.rs:66: let dict = if cls.slots.flags.has_feature(PyTypeFlags::HAS_DICT) {
crates/vm/src/object/core.rs:636: .has_feature(crate::types::PyTypeFlags::HAS_DICT)
crates/vm/src/exceptions.rs:577: flags(BASETYPE, HAS_DICT)
crates/vm/src/exception_group.rs:18: flags: PyTypeFlags::heap_type_flags() | PyTypeFlags::HAS_DICT,
crates/vm/src/class.rs:145: if Self::TP_FLAGS.has_feature(PyTypeFlags::HAS_DICT) {
crates/vm/src/builtins/function.rs:470: flags(HAS_DICT, METHOD_DESCRIPTOR)
crates/vm/src/builtins/module.rs:168:#[pyclass(with(GetAttr, Initializer, Representable), flags(BASETYPE, HAS_DICT))]
crates/vm/src/builtins/module.rs:195: .has_feature(crate::types::PyTypeFlags::HAS_DICT)
crates/vm/src/builtins/object.rs:69: // Only create dict if the class has HAS_DICT flag (i.e., __slots__ was not defined
crates/vm/src/builtins/object.rs:74: .has_feature(crate::types::PyTypeFlags::HAS_DICT)
crates/vm/src/builtins/object.rs:215: if obj.class().slots.flags.has_feature(PyTypeFlags::HAS_DICT) {
crates/vm/src/builtins/object.rs:447: |typ: &Py<PyType>| typ.slots.flags.has_feature(PyTypeFlags::HAS_DICT);
crates/vm/src/builtins/type.rs:302: // Inherit HAS_DICT from any base in MRO that has it
crates/vm/src/builtins/type.rs:306: .any(|b| b.slots.flags.has_feature(PyTypeFlags::HAS_DICT))
crates/vm/src/builtins/type.rs:308: slots.flags |= PyTypeFlags::HAS_DICT
crates/vm/src/builtins/type.rs:368: if base.slots.flags.has_feature(PyTypeFlags::HAS_DICT) {
crates/vm/src/builtins/type.rs:369: slots.flags |= PyTypeFlags::HAS_DICT
Script executed:
#!/bin/bash
# Search for PyTypeFlags enum definition
rg -n "pub struct PyTypeFlags|pub enum.*Flags|HAS_WEAKREF|HAS_DICT" crates/vm/src -g '*.rs' -B 2 -A 10 | head -50Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/vm/src/vm/context.rs-513- name: interned_name.as_str(),
crates/vm/src/vm/context.rs-514- basicsize: 0,
crates/vm/src/vm/context.rs:515: flags: PyTypeFlags::heap_type_flags() | PyTypeFlags::HAS_DICT,
crates/vm/src/vm/context.rs-516- ..PyTypeSlots::default()
crates/vm/src/vm/context.rs-517- };
crates/vm/src/vm/context.rs-518- PyType::new_heap(
crates/vm/src/vm/context.rs-519- name,
crates/vm/src/vm/context.rs-520- bases,
crates/vm/src/vm/context.rs-521- attrs,
crates/vm/src/vm/context.rs-522- slots,
crates/vm/src/vm/context.rs-523- self.types.type_type.to_owned(),
crates/vm/src/vm/context.rs-524- self,
crates/vm/src/vm/context.rs-525- )
--
crates/vm/src/vm/context.rs-626- pub fn new_base_object(&self, class: PyTypeRef, dict: Option<PyDictRef>) -> PyObjectRef {
crates/vm/src/vm/context.rs-627- debug_assert_eq!(
crates/vm/src/vm/context.rs:628: class.slots.flags.contains(PyTypeFlags::HAS_DICT),
crates/vm/src/vm/context.rs-629- dict.is_some()
crates/vm/src/vm/context.rs-630- );
crates/vm/src/vm/context.rs-631- PyRef::new_ref(object::PyBaseObject, class, dict).into()
crates/vm/src/vm/context.rs-632- }
crates/vm/src/vm/context.rs-633-
crates/vm/src/vm/context.rs-634- pub fn new_code(&self, code: impl code::IntoCodeObject) -> PyRef<PyCode> {
crates/vm/src/vm/context.rs-635- let code = code.into_code_object(self);
crates/vm/src/vm/context.rs-636- PyRef::new_ref(PyCode { code }, self.types.code_type.to_owned(), None)
crates/vm/src/vm/context.rs-637- }
crates/vm/src/vm/context.rs-638-}
--
crates/vm/src/types/slot.rs-213- #[derive(Copy, Clone, Debug, PartialEq)]
crates/vm/src/types/slot.rs-214- #[non_exhaustive]
crates/vm/src/types/slot.rs:215: pub struct PyTypeFlags: u64 {
crates/vm/src/types/slot.rs-216- const MANAGED_DICT = 1 << 4;
crates/vm/src/types/slot.rs-217- const SEQUENCE = 1 << 5;
crates/vm/src/types/slot.rs-218- const MAPPING = 1 << 6;
crates/vm/src/types/slot.rs-219- const DISALLOW_INSTANTIATION = 1 << 7;
crates/vm/src/types/slot.rs-220- const IMMUTABLETYPE = 1 << 8;
crates/vm/src/types/slot.rs-221- const HEAPTYPE = 1 << 9;
crates/vm/src/types/slot.rs-222- const BASETYPE = 1 << 10;
crates/vm/src/types/slot.rs-223- const METHOD_DESCRIPTOR = 1 << 17;
crates/vm/src/types/slot.rs-224- // For built-in types that match the subject itself in pattern matching
crates/vm/src/types/slot.rs-225- // (bool, int, float, str, bytes, bytearray, list, tuple, dict, set, frozenset)
crates/vm/src/types/slot.rs-226- // This is not a stable API
crates/vm/src/types/slot.rs-227- const _MATCH_SELF = 1 << 22;
crates/vm/src/types/slot.rs:228: const HAS_DICT = 1 << 40;
crates/vm/src/types/slot.rs-229-
crates/vm/src/types/slot.rs-230- #[cfg(debug_assertions)]
crates/vm/src/types/slot.rs-231- const _CREATED_WITH_FLAGS = 1 << 63;
crates/vm/src/types/slot.rs-232- }
crates/vm/src/types/slot.rs-233-}
crates/vm/src/types/slot.rs-234-
Script executed:
#!/bin/bash
# Look at the exact implementation in object.rs around lines 210-229
sed -n '200,240p' crates/vm/src/builtins/object.rsRepository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
// let state = object_get_dict(obj.clone(), obj.ctx()).unwrap();
let Some(state) = obj.dict() else {
return Ok(vm.ctx.none());
};
state.into()
};
let slot_names =
type_slot_names(obj.class(), vm).map_err(|_| vm.new_type_error("cannot pickle object"))?;
if required {
// Start with PyBaseObject_Type's basicsize
let mut basicsize = vm.ctx.types.object_type.slots.basicsize;
// Add __dict__ size if type has dict
if obj.class().slots.flags.has_feature(PyTypeFlags::HAS_DICT) {
basicsize += core::mem::size_of::<PyObjectRef>();
}
// Add slots size
if let Some(ref slot_names) = slot_names {
basicsize += core::mem::size_of::<PyObjectRef>() * slot_names.__len__();
}
// Fail if actual type's basicsize > expected basicsize
if obj.class().slots.basicsize > basicsize {
return Err(
vm.new_type_error(format!("cannot pickle '{}' object", obj.class().name()))
);
}
}
if let Some(slot_names) = slot_names {
let slot_names_len = slot_names.__len__();
if slot_names_len > 0 {
let slots = vm.ctx.new_dict();
for i in 0..slot_names_len {
let borrowed_names = slot_names.borrow_vec();
let name = borrowed_names[i].downcast_ref::<PyStr>().unwrap();
let Ok(value) = obj.get_attr(name, vm) else {
continue;
Script executed:
#!/bin/bash
# Search for copyreg._slotnames and how it handles __weakref__
rg -n "_slotnames|slotnames" crates -g '*.rs' -g '*.py' -A 3 -B 1Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/vm/src/vm/context.rs-219- __slots__,
crates/vm/src/vm/context.rs:220: __slotnames__,
crates/vm/src/vm/context.rs-221- __str__,
crates/vm/src/vm/context.rs-222- __sub__,
crates/vm/src/vm/context.rs-223- __subclasscheck__,
--
crates/vm/src/builtins/object.rs-163- // let attributes = typ.attributes.read();
crates/vm/src/builtins/object.rs:164: // if let Some(slot_names) = attributes.get(identifier!(vm.ctx, __slotnames__)) {
crates/vm/src/builtins/object.rs-165- // return match_class!(match slot_names.clone() {
crates/vm/src/builtins/object.rs-166- // l @ super::PyList => Ok(Some(l)),
crates/vm/src/builtins/object.rs-167- // _n @ super::PyNone => Ok(None),
crates/vm/src/builtins/object.rs-168- // _ => Err(vm.new_type_error(format!(
crates/vm/src/builtins/object.rs:169: // "{:.200}.__slotnames__ should be a list or None, not {:.200}",
crates/vm/src/builtins/object.rs-170- // typ.name(),
crates/vm/src/builtins/object.rs-171- // slot_names.class().name()
crates/vm/src/builtins/object.rs-172- // ))),
--
crates/vm/src/builtins/object.rs-176- let copyreg = vm.import("copyreg", 0)?;
crates/vm/src/builtins/object.rs:177: let copyreg_slotnames = copyreg.get_attr("_slotnames", vm)?;
crates/vm/src/builtins/object.rs:178: let slot_names = copyreg_slotnames.call((typ.to_owned(),), vm)?;
crates/vm/src/builtins/object.rs-179- let result = match_class!(match slot_names {
crates/vm/src/builtins/object.rs-180- l @ super::PyList => Some(l),
crates/vm/src/builtins/object.rs-181- _n @ super::PyNone => None,
crates/vm/src/builtins/object.rs:182: _ => return Err(vm.new_type_error("copyreg._slotnames didn't return a list or None")),
crates/vm/src/builtins/object.rs-183- });
crates/vm/src/builtins/object.rs-184- Ok(result)
crates/vm/src/builtins/object.rs-185-}
--
crates/vm/Lib/python_builtins/__reducelib.py-49- state = getattr(obj, "__dict__", None)
crates/vm/Lib/python_builtins/__reducelib.py:50: names = slotnames(cls) # not checking for list
crates/vm/Lib/python_builtins/__reducelib.py-51- if names is not None:
crates/vm/Lib/python_builtins/__reducelib.py-52- slots = {}
crates/vm/Lib/python_builtins/__reducelib.py-53- for name in names:
--
crates/vm/Lib/python_builtins/__reducelib.py-73-
crates/vm/Lib/python_builtins/__reducelib.py:74:def slotnames(cls):
crates/vm/Lib/python_builtins/__reducelib.py-75- if not isinstance(cls, type):
crates/vm/Lib/python_builtins/__reducelib.py-76- return None
crates/vm/Lib/python_builtins/__reducelib.py-77-
crates/vm/Lib/python_builtins/__reducelib.py-78- try:
crates/vm/Lib/python_builtins/__reducelib.py:79: return cls.__dict__["__slotnames__"]
crates/vm/Lib/python_builtins/__reducelib.py-80- except KeyError:
crates/vm/Lib/python_builtins/__reducelib.py-81- pass
crates/vm/Lib/python_builtins/__reducelib.py-82-
crates/vm/Lib/python_builtins/__reducelib.py:83: slotnames = copyreg._slotnames(cls)
crates/vm/Lib/python_builtins/__reducelib.py:84: if not isinstance(slotnames, list) and slotnames is not None:
crates/vm/Lib/python_builtins/__reducelib.py:85: raise TypeError("copyreg._slotnames didn't return a list or None")
crates/vm/Lib/python_builtins/__reducelib.py:86: return slotnames
Script executed:
#!/bin/bash
# Find the copyreg._slotnames implementation
rg -n "_slotnames" crates/vm/Lib -A 20 | head -80Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/vm/Lib/python_builtins/__reducelib.py:79: return cls.__dict__["__slotnames__"]
crates/vm/Lib/python_builtins/__reducelib.py-80- except KeyError:
crates/vm/Lib/python_builtins/__reducelib.py-81- pass
crates/vm/Lib/python_builtins/__reducelib.py-82-
crates/vm/Lib/python_builtins/__reducelib.py:83: slotnames = copyreg._slotnames(cls)
crates/vm/Lib/python_builtins/__reducelib.py-84- if not isinstance(slotnames, list) and slotnames is not None:
crates/vm/Lib/python_builtins/__reducelib.py:85: raise TypeError("copyreg._slotnames didn't return a list or None")
crates/vm/Lib/python_builtins/__reducelib.py-86- return slotnames
Script executed:
#!/bin/bash
# Look for copyreg module implementation
find crates -name "copyreg*" -o -name "*copyreg*"Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/vm/Lib/core_modules/copyreg.py
Script executed:
#!/bin/bash
# Check __reducelib.py for _slotnames and weakref handling
sed -n '1,150p' crates/vm/Lib/python_builtins/__reducelib.pyRepository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
# Modified from code from the PyPy project:
# https://bitbucket.org/pypy/pypy/src/default/pypy/objspace/std/objectobject.py
# The MIT License
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
import copyreg
def _abstract_method_error(typ):
methods = ", ".join(sorted(typ.__abstractmethods__))
err = "Can't instantiate abstract class %s with abstract methods %s"
raise TypeError(err % (typ.__name__, methods))
def reduce_2(obj):
cls = obj.__class__
try:
getnewargs = obj.__getnewargs__
except AttributeError:
args = ()
else:
args = getnewargs()
if not isinstance(args, tuple):
raise TypeError("__getnewargs__ should return a tuple")
try:
getstate = obj.__getstate__
except AttributeError:
state = getattr(obj, "__dict__", None)
names = slotnames(cls) # not checking for list
if names is not None:
slots = {}
for name in names:
try:
value = getattr(obj, name)
except AttributeError:
pass
else:
slots[name] = value
if slots:
state = state, slots
else:
state = getstate()
listitems = iter(obj) if isinstance(obj, list) else None
dictitems = iter(obj.items()) if isinstance(obj, dict) else None
newobj = copyreg.__newobj__
args2 = (cls,) + args
return newobj, args2, state, listitems, dictitems
def slotnames(cls):
if not isinstance(cls, type):
return None
try:
return cls.__dict__["__slotnames__"]
except KeyError:
pass
slotnames = copyreg._slotnames(cls)
if not isinstance(slotnames, list) and slotnames is not None:
raise TypeError("copyreg._slotnames didn't return a list or None")
return slotnames
Script executed:
#!/bin/bash
# Search for __weakref__ references in the codebase
rg -n "__weakref__" crates -g '*.rs' -g '*.py'Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
crates/doc/src/data.inc.rs:483: "_curses.error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:629: "_curses_panel.error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:691: "_dbm.error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:730: "_gdbm.error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:839: "_hashlib.UnsupportedDigestmodError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:912: "_interpchannels.ChannelClosedError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:937: "_interpchannels.ChannelEmptyError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:962: "_interpchannels.ChannelError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:1061: "_interpchannels.ChannelNotEmptyError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:1086: "_interpchannels.ChannelNotFoundError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:1934: "_lzma.LZMAError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:2235: "_pickle.PickleError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:2284: "_pickle.PicklingError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:2333: "_pickle.UnpicklingError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:2369: "_queue.Empty.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:3294: "_tkinter.TclError.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:3598: "binascii.Error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:3623: "binascii.Incomplete.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:4074: "builtins.DynamicClassAttribute.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:4203: "builtins.ExceptionGroup.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:9293: "termios.error.__weakref__" => "list of weak references to the object",
crates/doc/src/data.inc.rs:9558: "zlib.error.__weakref__" => "list of weak references to the object",
crates/vm/src/builtins/type.rs:1232: .any(|s| s.as_str() != "__dict__" && s.as_str() != "__weakref__");
crates/vm/src/builtins/type.rs:1261: // Check for duplicate __weakref__
crates/vm/src/builtins/type.rs:1262: if slot_name == "__weakref__" {
crates/vm/src/builtins/type.rs:1265: "__weakref__ slot disallowed: we already got one".to_owned(),
crates/vm/src/builtins/type.rs:1287: // Check if base class already has __weakref__ - can't redefine it
crates/vm/src/builtins/type.rs:1290: // 2. It's a heap type with __weakref__ in its __slots__
crates/vm/src/builtins/type.rs:1296: // Heap type with __slots__ - check if __weakref__ is in slots
crates/vm/src/builtins/type.rs:1297: Some(base_slots) => base_slots.iter().any(|s| s.as_str() == "__weakref__"),
crates/vm/src/builtins/type.rs:1300: // Builtin type - check if it has __weakref__ descriptor
crates/vm/src/builtins/type.rs:1301: let weakref_name = vm.ctx.intern_str("__weakref__");
crates/vm/src/builtins/type.rs:1307: "__weakref__ slot disallowed: we already got one".to_owned(),
Script executed:
#!/bin/bash
# Check if there's a native copyreg._slotnames implementation
cat crates/vm/Lib/core_modules/copyreg.py | head -200Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
"""Helper to provide extensibility for pickle.
This is only useful to add pickle support for extension types defined in
C, not for instances of user-defined classes.
"""
__all__ = ["pickle", "constructor",
"add_extension", "remove_extension", "clear_extension_cache"]
dispatch_table = {}
def pickle(ob_type, pickle_function, constructor_ob=None):
if not callable(pickle_function):
raise TypeError("reduction functions must be callable")
dispatch_table[ob_type] = pickle_function
# The constructor_ob function is a vestige of safe for unpickling.
# There is no reason for the caller to pass it anymore.
if constructor_ob is not None:
constructor(constructor_ob)
def constructor(object):
if not callable(object):
raise TypeError("constructors must be callable")
# Example: provide pickling support for complex numbers.
def pickle_complex(c):
return complex, (c.real, c.imag)
pickle(complex, pickle_complex, complex)
def pickle_union(obj):
import functools, operator
return functools.reduce, (operator.or_, obj.__args__)
pickle(type(int | str), pickle_union)
# Support for pickling new-style objects
def _reconstructor(cls, base, state):
if base is object:
obj = object.__new__(cls)
else:
obj = base.__new__(cls, state)
if base.__init__ != object.__init__:
base.__init__(obj, state)
return obj
_HEAPTYPE = 1<<9
_new_type = type(int.__new__)
# Python code for object.__reduce_ex__ for protocols 0 and 1
def _reduce_ex(self, proto):
assert proto < 2
cls = self.__class__
for base in cls.__mro__:
if hasattr(base, '__flags__') and not base.__flags__ & _HEAPTYPE:
break
new = base.__new__
if isinstance(new, _new_type) and new.__self__ is base:
break
else:
base = object # not really reachable
if base is object:
state = None
else:
if base is cls:
raise TypeError(f"cannot pickle {cls.__name__!r} object")
state = base(self)
args = (cls, base, state)
try:
getstate = self.__getstate__
except AttributeError:
if getattr(self, "__slots__", None):
raise TypeError(f"cannot pickle {cls.__name__!r} object: "
f"a class that defines __slots__ without "
f"defining __getstate__ cannot be pickled "
f"with protocol {proto}") from None
try:
dict = self.__dict__
except AttributeError:
dict = None
else:
if (type(self).__getstate__ is object.__getstate__ and
getattr(self, "__slots__", None)):
raise TypeError("a class that defines __slots__ without "
"defining __getstate__ cannot be pickled")
dict = getstate()
if dict:
return _reconstructor, args, dict
else:
return _reconstructor, args
# Helper for __reduce_ex__ protocol 2
def __newobj__(cls, *args):
return cls.__new__(cls, *args)
def __newobj_ex__(cls, args, kwargs):
"""Used by pickle protocol 4, instead of __newobj__ to allow classes with
keyword-only arguments to be pickled correctly.
"""
return cls.__new__(cls, *args, **kwargs)
def _slotnames(cls):
"""Return a list of slot names for a given class.
This needs to find slots defined by the class and its bases, so we
can't simply return the __slots__ attribute. We must walk down
the Method Resolution Order and concatenate the __slots__ of each
class found there. (This assumes classes don't modify their
__slots__ attribute to misrepresent their slots after the class is
defined.)
"""
# Get the value from a cache in the class if possible
names = cls.__dict__.get("__slotnames__")
if names is not None:
return names
# Not cached -- calculate the value
names = []
if not hasattr(cls, "__slots__"):
# This class has no slots
pass
else:
# Slots found -- gather slot names from all base classes
for c in cls.__mro__:
if "__slots__" in c.__dict__:
slots = c.__dict__['__slots__']
# if class has a single slot, it can be given as a string
if isinstance(slots, str):
slots = (slots,)
for name in slots:
# special descriptors
if name in ("__dict__", "__weakref__"):
continue
# mangled names
elif name.startswith('__') and not name.endswith('__'):
stripped = c.__name__.lstrip('_')
if stripped:
names.append('_%s%s' % (stripped, name))
else:
names.append(name)
else:
names.append(name)
# Cache the outcome in the class if at all possible
try:
cls.__slotnames__ = names
except:
pass # But don't die if we can't
return names
# A registry of extension codes. This is an ad-hoc compression
# mechanism. Whenever a global reference to <module>, <name> is about
# to be pickled, the (<module>, <name>) tuple is looked up here to see
# if it is a registered extension code for it. Extension codes are
# universal, so that the meaning of a pickle does not depend on
# context. (There are also some codes reserved for local use that
# don't have this restriction.) Codes are positive ints; 0 is
# reserved.
_extension_registry = {} # key -> code
_inverted_registry = {} # code -> key
_extension_cache = {} # code -> object
# Don't ever rebind those names: pickling grabs a reference to them when
# it's initialized, and won't see a rebinding.
def add_extension(module, name, code):
"""Register an extension code."""
code = int(code)
if not 1 <= code <= 0x7fffffff:
raise ValueError("code out of range")
key = (module, name)
if (_extension_registry.get(key) == code and
_inverted_registry.get(code) == key):
return # Redundant registrations are benign
if key in _extension_registry:
raise ValueError("key %s is already registered with code %s" %
(key, _extension_registry[key]))
if code in _inverted_registry:
raise ValueError("code %s is already in use for key %s" %
(code, _inverted_registry[code]))
_extension_registry[key] = code
_inverted_registry[code] = key
def remove_extension(module, name, code):
"""Unregister an extension code. For testing only."""
key = (module, name)
if (_extension_registry.get(key) != code or
_inverted_registry.get(code) != key):
raise ValueError("key %s is not registered with code %s" %
(key, code))
del _extension_registry[key]
del _inverted_registry[code]
if code in _extension_cache:
Script executed:
#!/bin/bash
# Search for where weakref descriptor/slot is actually stored/handled
rg -n "weakref" crates/vm/src/builtins/type.rs -B 2 -A 2 | head -100Repository: youknowone/RustPython
Repository: youknowone/RustPython
Exit code: 0
stdout:
347- new_type.init_slots(ctx);
348-
349: let weakref_type = super::PyWeak::static_type();
350- for base in new_type.bases.read().iter() {
351- base.subclasses.write().push(
352- new_type
353- .as_object()
354: .downgrade_with_weakref_typ_opt(None, weakref_type.to_owned())
355- .unwrap(),
356- );
--
403- Self::set_new(&new_type.slots, &new_type.base);
404-
405: let weakref_type = super::PyWeak::static_type();
406- for base in new_type.bases.read().iter() {
407- base.subclasses.write().push(
408- new_type
409- .as_object()
410: .downgrade_with_weakref_typ_opt(None, weakref_type.to_owned())
411- .unwrap(),
412- );
--
758-
759- // Register this type as a subclass of its new bases
760: let weakref_type = super::PyWeak::static_type();
761- for base in zelf.bases.read().iter() {
762- base.subclasses.write().push(
763- zelf.as_object()
764: .downgrade_with_weakref_typ_opt(None, weakref_type.to_owned())
765- .unwrap(),
766- );
--
1227- // Check if base has itemsize > 0 - can't add arbitrary slots to variable-size types
1228- // Types like int, bytes, tuple have itemsize > 0 and don't allow custom slots
1229: // But types like weakref.ref have itemsize = 0 and DO allow slots
1230- let has_custom_slots = slots
1231- .iter()
1232: .any(|s| s.as_str() != "__dict__" && s.as_str() != "__weakref__");
1233- if has_custom_slots && base.slots.itemsize > 0 {
1234- return Err(vm.new_type_error(format!(
--
1240- // Validate slot names and track duplicates
1241- let mut seen_dict = false;
1242: let mut seen_weakref = false;
1243- for slot in slots.iter() {
1244- // Use isidentifier for validation (handles Unicode properly)
--
1259- }
1260-
1261: // Check for duplicate __weakref__
1262: if slot_name == "__weakref__" {
1263: if seen_weakref {
1264- return Err(vm.new_type_error(
1265: "__weakref__ slot disallowed: we already got one".to_owned(),
1266- ));
1267- }
1268: seen_weakref = true;
1269- }
1270-
--
1285- }
1286-
1287: // Check if base class already has __weakref__ - can't redefine it
1288: // A base has weakref support if:
1289: // 1. It's a heap type without explicit __slots__ (automatic weakref), OR
1290: // 2. It's a heap type with __weakref__ in its __slots__
1291: if seen_weakref {
1292: let base_has_weakref = if let Some(ref ext) = base.heaptype_ext {
1293- match &ext.slots {
1294: // Heap type without __slots__ - has automatic weakref
1295- None => true,
1296: // Heap type with __slots__ - check if __weakref__ is in slots
1297: Some(base_slots) => base_slots.iter().any(|s| s.as_str() == "__weakref__"),
1298- }
1299- } else {
1300: // Builtin type - check if it has __weakref__ descriptor
1301: let weakref_name = vm.ctx.intern_str("__weakref__");
1302: base.attributes.read().contains_key(weakref_name)
1303- };
1304-
1305: if base_has_weakref {
1306- return Err(vm.new_type_error(
1307: "__weakref__ slot disallowed: we already got one".to_owned(),
1308- ));
1309- }
Account for weakref storage in the basicsize check.
Line 215: the expected basicsize only adds dict + slot entries. However, __weakref__ is excluded from copyreg._slotnames (see copyreg.py line 144), so types with weakref support may incorrectly fail the basicsize > expected check.
Note: The proposed fix cannot use PyTypeFlags::HAS_WEAKREF since this flag does not exist in RustPython. A correct fix would instead need to check if the type's __slots__ contains "__weakref__" (via the heaptype_ext if available) or detect automatic weakref support on heap types. Consider examining how CPython's pickle handles this case and whether the basicsize calculation should account for it separately.
🤖 Prompt for AI Agents
In `@crates/vm/src/builtins/object.rs` around lines 210 - 229, The basicsize
calculation for pickle currently omits space for automatic weakref storage,
causing types with a "__weakref__" slot to falsely fail the size check; update
the block in object.rs (around basicsize, obj.class(), slot_names) to detect if
the type provides weakref storage and add core::mem::size_of::<PyObjectRef>() to
basicsize when present—do this by checking the type's slots for a "__weakref__"
entry (e.g., inspect slot_names for "__weakref__" or use the heap-type
extension/heap type metadata on obj.class() to detect automatic weakref support)
rather than relying on a non-existent PyTypeFlags::HAS_WEAKREF, then re-run the
existing comparison against obj.class().slots.basicsize.
| /// object_getstate - calls __getstate__ method or default implementation | ||
| fn object_getstate(obj: &PyObject, required: bool, vm: &VirtualMachine) -> PyResult { | ||
| // If __getstate__ is not overridden, use the default implementation with required flag | ||
| if !is_getstate_overridden(obj, vm) { | ||
| return object_getstate_default(obj, required, vm); | ||
| } | ||
|
|
||
| // __getstate__ is overridden, call it without required | ||
| let getstate = obj.get_attr(identifier!(vm, __getstate__), vm)?; | ||
| getstate.call((), vm) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against __getstate__ = None before calling.
Line 645: if a class sets __getstate__ = None, is_getstate_overridden() returns true and this path attempts to call None, raising a TypeError. The prior behavior returned None state.
🐛 Proposed fix
// __getstate__ is overridden, call it without required
let getstate = obj.get_attr(identifier!(vm, __getstate__), vm)?;
- getstate.call((), vm)
+ if vm.is_none(&getstate) {
+ return Ok(vm.ctx.none());
+ }
+ getstate.call((), vm)🤖 Prompt for AI Agents
In `@crates/vm/src/builtins/object.rs` around lines 637 - 647, The code calls the
retrieved __getstate__ attribute unconditionally, which fails if a class sets
__getstate__ = None; modify object_getstate to, after obtaining getstate via
obj.get_attr(identifier!(vm, __getstate__), vm) and before calling
getstate.call(...), check whether the attribute is None (using the
PyObject/Value API available in this module/context) and if so return the None
state (e.g. Ok(vm.ctx.none()))—otherwise proceed to call getstate; keep the
existing fallback to object_getstate_default and existing error propagation from
get_attr intact.
| fn __setstate__(zelf: PyRef<Self>, state: PyTupleRef, vm: &VirtualMachine) -> PyResult<()> { | ||
| if state.len() != 4 { | ||
| return Err(vm.new_type_error(format!( | ||
| "__setstate__ argument should be 4-tuple, got {}", | ||
| state.len() | ||
| ))); | ||
| } | ||
|
|
||
| let content: PyStrRef = state[0].clone().try_into_value(vm)?; | ||
| // state[1] is newline - TODO: use when newline handling is implemented | ||
| let pos: u64 = state[2].clone().try_into_value(vm)?; | ||
| let dict = &state[3]; | ||
|
|
||
| // Set content | ||
| let raw_bytes = content.as_bytes().to_vec(); | ||
| *zelf.buffer.write() = BufferedIO::new(Cursor::new(raw_bytes)); | ||
|
|
||
| // Set position | ||
| zelf.buffer(vm)? | ||
| .seek(SeekFrom::Start(pos)) | ||
| .map_err(|err| os_err(vm, err))?; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid mutating closed StringIO before validation.
Line 4120 resets the buffer via write() before buffer(vm)? enforces the closed check, so a closed object can be partially mutated even though the call later errors. Prefer validating once and performing mutation through the validated guard.
🛠️ Suggested fix
- // Set content
- let raw_bytes = content.as_bytes().to_vec();
- *zelf.buffer.write() = BufferedIO::new(Cursor::new(raw_bytes));
-
- // Set position
- zelf.buffer(vm)?
- .seek(SeekFrom::Start(pos))
- .map_err(|err| os_err(vm, err))?;
+ // Validate closed state and update buffer atomically
+ let mut buffer = zelf.buffer(vm)?;
+ let raw_bytes = content.as_bytes().to_vec();
+ *buffer = BufferedIO::new(Cursor::new(raw_bytes));
+ buffer
+ .seek(SeekFrom::Start(pos))
+ .map_err(|err| os_err(vm, err))?;
+ drop(buffer);🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 4106 - 4127, The __setstate__
implementation mutates zelf.buffer via write() before calling zelf.buffer(vm)?
which performs the closed-object validation; change the flow to validate first
by calling zelf.buffer(vm)? to obtain the validated guard and then perform
mutations through that guard (replace the BufferedIO via the guard and seek
using the validated guard), removing the earlier direct write() call and
ensuring no mutation occurs before the closed check.
| #[pymethod] | ||
| fn __setstate__(zelf: PyRef<Self>, state: PyTupleRef, vm: &VirtualMachine) -> PyResult<()> { | ||
| if zelf.closed.load() { | ||
| return Err(vm.new_value_error("__setstate__ on closed file")); | ||
| } | ||
| if state.len() != 3 { | ||
| return Err(vm.new_type_error(format!( | ||
| "__setstate__ argument should be 3-tuple, got {}", | ||
| state.len() | ||
| ))); | ||
| } | ||
|
|
||
| let content: PyBytesRef = state[0].clone().try_into_value(vm)?; | ||
| let pos: u64 = state[1].clone().try_into_value(vm)?; | ||
| let dict = &state[2]; | ||
|
|
||
| // Set content | ||
| *zelf.buffer.write() = BufferedIO::new(Cursor::new(content.as_bytes().to_vec())); | ||
|
|
||
| // Set position | ||
| zelf.buffer(vm)? | ||
| .seek(SeekFrom::Start(pos)) | ||
| .map_err(|err| os_err(vm, err))?; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respect BytesIO export safety when restoring state.
Line 4328 overwrites the buffer without checking active exports. This bypasses the existing resize guard (exports) and can violate buffer protocol expectations. Use the same resizable guard used by write/truncate.
🛠️ Suggested fix
- // Set content
- *zelf.buffer.write() = BufferedIO::new(Cursor::new(content.as_bytes().to_vec()));
-
- // Set position
- zelf.buffer(vm)?
- .seek(SeekFrom::Start(pos))
- .map_err(|err| os_err(vm, err))?;
+ // Set content (respect export guards)
+ let mut buffer = zelf.try_resizable(vm)?;
+ *buffer = BufferedIO::new(Cursor::new(content.as_bytes().to_vec()));
+ buffer
+ .seek(SeekFrom::Start(pos))
+ .map_err(|err| os_err(vm, err))?;
+ drop(buffer);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[pymethod] | |
| fn __setstate__(zelf: PyRef<Self>, state: PyTupleRef, vm: &VirtualMachine) -> PyResult<()> { | |
| if zelf.closed.load() { | |
| return Err(vm.new_value_error("__setstate__ on closed file")); | |
| } | |
| if state.len() != 3 { | |
| return Err(vm.new_type_error(format!( | |
| "__setstate__ argument should be 3-tuple, got {}", | |
| state.len() | |
| ))); | |
| } | |
| let content: PyBytesRef = state[0].clone().try_into_value(vm)?; | |
| let pos: u64 = state[1].clone().try_into_value(vm)?; | |
| let dict = &state[2]; | |
| // Set content | |
| *zelf.buffer.write() = BufferedIO::new(Cursor::new(content.as_bytes().to_vec())); | |
| // Set position | |
| zelf.buffer(vm)? | |
| .seek(SeekFrom::Start(pos)) | |
| .map_err(|err| os_err(vm, err))?; | |
| #[pymethod] | |
| fn __setstate__(zelf: PyRef<Self>, state: PyTupleRef, vm: &VirtualMachine) -> PyResult<()> { | |
| if zelf.closed.load() { | |
| return Err(vm.new_value_error("__setstate__ on closed file")); | |
| } | |
| if state.len() != 3 { | |
| return Err(vm.new_type_error(format!( | |
| "__setstate__ argument should be 3-tuple, got {}", | |
| state.len() | |
| ))); | |
| } | |
| let content: PyBytesRef = state[0].clone().try_into_value(vm)?; | |
| let pos: u64 = state[1].clone().try_into_value(vm)?; | |
| let dict = &state[2]; | |
| // Set content (respect export guards) | |
| let mut buffer = zelf.try_resizable(vm)?; | |
| *buffer = BufferedIO::new(Cursor::new(content.as_bytes().to_vec())); | |
| buffer | |
| .seek(SeekFrom::Start(pos)) | |
| .map_err(|err| os_err(vm, err))?; | |
| drop(buffer); |
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 4311 - 4334, __setstate__ currently
overwrites zelf.buffer.write() without honoring active exports, bypassing the
resize guard used by write/truncate; change the code to acquire the same
resizable/export guard used by the write/truncate paths before replacing the
buffer: use zelf.buffer(vm)? (the same accessor used below) to obtain the buffer
handle that enforces exports/resize safety (the guard used by write/truncate),
then replace the inner BufferedIO (BufferedIO::new(Cursor::new(...))) via that
guarded handle instead of directly calling zelf.buffer.write(), so active
exports are checked and the buffer protocol invariants are preserved.
* Updated urllib + urllib tests * Updated http + test * Annotated failing tests * Fixed issues in httpservers, robotparser and urllib2net * Fixed windows only success * Annotated flaky test in test_logging
* Add all other bytecodes * Mark passing/failing tests
* correct buitins type * Auto-format: ruff format --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
✏️ Tip: You can customize this high-level summary in your review settings.