Don't include raiding units in Units::isActive; make getPosition return invalid coords #4959

Open · wants to merge 18 commits into develop

Conversation

@Bumber64 Bumber64 (Contributor) commented Sep 24, 2024

Raiding units have their inactive flag cleared shortly after they leave the map, but they are removed from units.active. Scan this vector to rule out raiding units. The move_state and can_swap flags are also unset when a unit leaves the map, which can be used to short-circuit the check.

Units::getPosition now returns invalid coords for inactive units. This impacts Units::isUnitInBox and Units::getUnitsInBox.

It also impacts Items::getPosition, but for an item in items.other.IN_PLAY, its holder is guaranteed to be active (artifact theft bugs notwithstanding).
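
(Roughly, the new getPosition behavior can be sketched as below. This is an illustration only, not the actual patch; the real function also special-cases caged units, so the exact shape of the change may differ.)

df::coord Units::getPosition(df::unit *unit)
{
    CHECK_NULL_POINTER(unit);
    if (!isActive(unit))
        return df::coord(); // default-constructed coord is invalid (all components -30000)
    return unit->pos;
}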

Only iterate active units in spectate instead of all units. (Mainly for efficiency; the Units::isActive change would've caught any potential issue with raiding units.)

return false;
else if (unit->flags1.bits.move_state || unit->flags1.bits.can_swap)
return true; // These are always unset when leaving the map
return linear_index(world->units.active, &df::unit::id, unit->id) >= 0;
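
(For context, the revised check appears to amount to roughly the following; the leading inactive-flag test is inferred from the PR description rather than copied from the diff excerpt above.)

bool Units::isActive(df::unit *unit)
{
    CHECK_NULL_POINTER(unit);
    if (unit->flags1.bits.inactive)
        return false;
    else if (unit->flags1.bits.move_state || unit->flags1.bits.can_swap)
        return true; // These are always unset when leaving the map
    // Fall back to scanning units.active to rule out off-map raiding units.
    return linear_index(world->units.active, &df::unit::id, unit->id) >= 0;
}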
Member commented:

Units::isActive is used in many many places. I'm worried that adding an O(N) search for every call will be a performance killer. I'm also concerned that changing the behavior of this function will break existing callers, both Lua and C++. I'd like to see some research done on those two points before merging this PR.

@Bumber64 Bumber64 (Contributor, Author) commented Sep 28, 2024

I did a check on my fort, and all of my units in units.active had at least one of inactive, move_state, or can_swap set. There was an inactive unit without the other two flags set that had its pos directly above my well. It's possible the combination could represent a ghost or climbing unit? Probably not going to be a performance consideration given how infrequently those occur. Then there's the actual raiding units, which you won't hit while iterating units.active, and for which the check is mandatory when iterating units.all.

I went through all the uses of isActive, getPosition, getUnitsInBox, and isUnitInBox I could find, as well as the inactive flag. There's no change in behavior to anything iterating the units.active vector (or indirectly through forCitizens/getCitizens). isDead is used instead of isActive for the case of actual dead units. Anything iterating units.all wasn't properly considering raiding units, which was just RemoteTools.cpp/spectate.

I don't fully understand plugins/preserve-rooms.cpp, but it doesn't look like the changes break anything. You'd know better than me.

@ab9rf ab9rf (Member) commented Feb 21, 2025

That call to linear_index is going to (potentially) evict the entire L1D cache and up to half of the L2 cache on the executing core (more on older processors with smaller L1 and L2 caches), which is a substantial performance penalty that I do not want in any code that is likely to be run once (or more than once) per tick. The current test requires reading one cache line (that is likely already hot); as revised this will require reading at least one line per unit in active (probably closer to two), which is a way larger performance cost.

That test is not acceptable in a high-velocity code path. If we need this information in a high-velocity code path, we need to figure out a way to cache it (on an intra-tick basis) in a manner that doesn't require thousands of scattered reads every time it's used.

For example, timestream iterates over the active unit list, calling isActive for each unit. This will make timestream much slower because each operation in timestream's adjust_units after the call to isActive will proceed with a cold L1 cache, forcing everything to be pulled down from (hopefully) L2 or L3. Given that the whole point of timestream is to increase performance, this is, to me, not acceptable.

I'm on board with offering this as a secondary Unit predicate, but changing the semantics and performance of isActive in this manner just isn't acceptable.
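
(One possible shape for such an intra-tick cache, sketched with hypothetical names rather than an existing DFHack API: rebuild a set of active unit ids once per tick, then answer membership queries in O(1).)

#include <unordered_set>

// Hypothetical helper: one O(N) rebuild per tick, O(1) lookups afterwards.
static std::unordered_set<int32_t> active_id_cache;
static int32_t active_id_cache_tick = -1;

static bool is_active_cached(df::unit *unit)
{
    int32_t tick = df::global::world->frame_counter;
    if (tick != active_id_cache_tick)
    {
        active_id_cache.clear();
        for (auto u : df::global::world->units.active)
            active_id_cache.insert(u->id);
        active_id_cache_tick = tick;
    }
    return !unit->flags1.bits.inactive && active_id_cache.contains(unit->id);
}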

- if (Units::isActive(unit) && !Units::isDead(unit) && active_unit_ids.contains(unit->id)) {
+ if (!Units::isDead(unit) && Units::isActive(unit) && active_unit_ids.contains(unit->id)) {
@Bumber64 Bumber64 (Contributor, Author) commented Jan 4, 2025

Moved isActive after isDead, which is a simple flag check. Not sure how isActive compares to testing an unordered set. The latter is probably much faster for units that are actually raiding? In that case we should test isActive afterwards.

Or maybe switch to directly testing the inactive flag (since active_unit_ids already handles the raiding units). Just curious how it performs first, though, as a stress test.
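
(A sketch of that variant, with the cheap flag tests first and the raiding-unit case left to the active_unit_ids set this code path already maintains:)

// Cheapest checks first; active_unit_ids already excludes off-map raiders.
if (!Units::isDead(unit) && !unit->flags1.bits.inactive && active_unit_ids.contains(unit->id)) {
    // ... handle the unit as before ...
}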

@@ -584,7 +584,7 @@ static command_result ListUnits(color_ostream &stream,

if (in->scan_all())
{
-    auto &vec = df::unit::get_vector();
+    auto &vec = df::global::world->units.active;
Member commented:

This looks like a backwards-incompatible change to the API that should at least be documented.
