
Conversation

Member

@mg12 mg12 commented Dec 12, 2025

When the VM fails to claim its memory pages on a single NUMA node:

  • Before this change (incorrect): the VM ends up with its memory spread across nodes, but its VCPUs still affined to the PCPUs of a single node.
  • After this change (correct): the VM ends up with both its memory spread across nodes and its VCPUs affined to the PCPUs of all nodes (sketched below).
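For illustration, a minimal OCaml sketch of the corrected flow; set_affinity, claim_memory, node_cpus and all_cpus are hypothetical stand-ins, not xenopsd's actual Xenctrlext API:

let numa_place ~set_affinity ~claim_memory ~node_cpus ~all_cpus domid vcpus node =
  let set_all_vcpus cpus =
    for i = 0 to vcpus - 1 do
      set_affinity domid i cpus
    done
  in
  (* First affine every vCPU to the candidate node's pCPUs ... *)
  set_all_vcpus (node_cpus node) ;
  (* ... then try to claim the domain's memory on that node. *)
  match claim_memory domid node with
  | Ok () -> ()
  | Error _ ->
      (* The claim failed: the memory will be spread across nodes, so the
         fix now resets every vCPU's affinity back to all pCPUs as well. *)
      set_all_vcpus all_cpus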
2025-12-11T02:24:19.613750+00:00 genus-34-29d xenopsd-xc: [debug||41 |VM.pool_migrate R:6973ccb34e76|softaffinity]
Allocated resources: { "affinity" =
   [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18;
    19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35;
    36; 37; 38; 39];
  "cores" = 20;
  "memfree" = 63555600384 }
2025-12-11T02:24:19.613835+00:00 genus-34-29d xenopsd-xc: [ info||41 |VM.pool_migrate R:6973ccb34e76|xenops] 
Domain.numa_placement.(fun): unable to claim enough memory, domain 4 won't be hosted in a single NUMA node. (error Cannot allocate memory)
2025-12-11T02:24:19.613881+00:00 genus-34-29d xenopsd-xc: [debug||41 |VM.pool_migrate R:6973ccb34e76|xenops]
Domain.numa_placement.(fun).reset_affinity: resetting vcpu affinity for domain 4 to all CPUs:
 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19;
  20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36;
  37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53;
  54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70;
  71; 72; 73; 74; 75; 76; 77; 78; 79]

A debug line is added to verify that the reset of the vcpu affinity occurred as expected.

(Fmt.to_to_string CPUSet.pp_dump all_cpus) ;
let all_cpus_mask = CPUSet.to_mask all_cpus in
(* Reset every vCPU of the domain to the all-pCPUs mask. *)
for i = 0 to vcpus - 1 do
  set_affinity affinity xcext domid i all_cpus_mask
done
Contributor

Almost the same code appears above (calling set_affinity for all vcpus).

__FUNCTION__ domid
Unix.(error_message errno) ;
None
) |> on_none reset_affinity
Contributor

Would it work to invert the order of operations?
Instead of setting the affinity, then claiming, then resetting the affinity: can we claim first and, if successful, then set the affinity?
(Unless the claiming code in Xen looks at the affinity?)
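
For comparison, a sketch of that inverted order with the same hypothetical helpers as above: claim first, and narrow the affinity only on success, so no reset step is needed (assuming Xen's claim path does not consult the current affinity):

let numa_place_claim_first ~set_affinity ~claim_memory ~node_cpus domid vcpus node =
  match claim_memory domid node with
  | Ok () ->
      (* Claim succeeded: it is now safe to pin the vCPUs to the node. *)
      for i = 0 to vcpus - 1 do
        set_affinity domid i (node_cpus node)
      done
  | Error _ ->
      (* Claim failed: the affinity was never narrowed, so nothing to undo. *)
      ()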

Member Author

@psafont do you remember if there's any requirement to set the vcpu affinity before claiming?

Member

I don't think that setting the affinity can fail, so it makes sense for the claim to be done before setting the affinity.
