-
Notifications
You must be signed in to change notification settings - Fork 128
Optimize build plan parallelism for faster builds #906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize build plan parallelism for faster builds #906
Conversation
This change improves build planning performance through three key optimizations: 1. **Replace sequential asyncMap with true parallel execution** - ProductPlanner now uses concurrentMap with processor-based parallelism - Product plans are created in parallel instead of sequentially - Expected improvement: 10-20% in task planning phase 2. **Eliminate aggregationQueue serialization bottleneck** - Replaced serial queue with lock-based concurrent collection - ProductPlanResultContext is now thread-safe via internal lock - Task producers can now add tasks concurrently without queuing - Expected improvement: 15-25% in task aggregation phase 3. **Implement adaptive parallelism** - Dynamically adjust parallelism based on available CPU cores - Replaced hard-coded maximumParallelism values - Better utilization of multi-core systems Overall expected impact: 10-20% reduction in build planning time for typical projects, up to 35% for projects with many parallel targets. Thread-safety analysis: - All concurrent writes protected by Lock in ProductPlanResultContext - Proper read-write ordering ensures no races - Each context processed independently - All existing tests pass with no regressions
owenv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR - I left a few comments. Can you share more detail on the projects/packages you used to benchmark these changes?
|
|
||
| package import SWBUtil | ||
| package import SWBCore | ||
| import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The os module is Darwin-only and doesn't seem to be used here
| // Compute all of the deferred tasks (in parallel). | ||
| delegate.updateProgress(statusMessage: messageShortening == .full ? "Planning deferred tasks" : "Constructing deferred tasks", showInLog: false) | ||
| await TaskGroup.concurrentPerform(iterations: productPlanResultContexts.count, maximumParallelism: 10) { i in | ||
| await TaskGroup.concurrentPerform(iterations: productPlanResultContexts.count, maximumParallelism: mediumParallelism) { i in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximumParallelism is intended to put an upper bound on the number of active tasks rather than fan out as widely as possible, since we fan out at multiple levels of the build planning process. What kind of speedup do you see if this change and the one below are applied in isolation?
| productPlanResultContext.addPlannedTasks(tasks) | ||
| } | ||
| // Direct call - thread-safe via ProductPlanResultContext's internal lock | ||
| productPlanResultContext.addPlannedTasks(tasks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this from an asynchronous dispatch to a blocking call with internal locking means that the aggregation and task production are less pipelined here - is this change beneficial in isolation?
| private let targetName: String | ||
|
|
||
| /// Lock to protect concurrent access to mutable state | ||
| private let lock = Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the SWBMutex shim over Lock in all new code (and allow annotating ProductPlanResultContext as Sendable), but based on the other comment I'm not convinced this is an improvement over the existing queue
| // Create the plans themselves in parallel. | ||
| var productPlans = await globalProductPlan.allTargets.asyncMap { configuredTarget in | ||
| let maxParallelism = max(1, ProcessInfo.processInfo.activeProcessorCount) | ||
| var productPlans = await globalProductPlan.allTargets.concurrentMap(maximumParallelism: maxParallelism) { configuredTarget in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to switch over to concurrentMap, but like the other changes I'm not convinced it should scale with core count unless that change yields a meaningful speedup in isolation
While adding some logs to the codebase to debug slowness building a description graph with one of my Xcode projects, I thought I'd just check with an agentic coding tool if it could identify any opportunity to optimize any of the internals of the build process. It came back with 3 potential improvements that seem very legit, so I decided that I'd open a PR with those changes to discuss with all of you if those are valid improvements, or if they are not or have the potential to introduce regressions elsewhere.
Summary
This PR improves build planning performance through optimizations to parallelization and task aggregation, targeting 10-35% faster build planning times.
Motivation
While investigating build performance bottlenecks in SWBBuildService, I identified three areas where small changes could yield significant improvements:
asyncMapfunction was executing sequentially despite its nameaggregationQueueforced all parallel task producers to serialize through a single queuemaximumParallelism: 10were too conservative for modern multi-core systemsChanges
1. Replace sequential asyncMap with true parallel execution
Sources/SWBTaskConstruction/ProductPlanning/ProductPlanner.swiftasyncMapwithconcurrentMapusing processor-based parallelism2. Eliminate aggregationQueue serialization bottleneck
Sources/SWBTaskConstruction/ProductPlanning/BuildPlan.swiftaggregationQueueProductPlanResultContextthread-safe with internalLock3. Implement adaptive parallelism
Sources/SWBTaskConstruction/ProductPlanning/BuildPlan.swiftProcessInfo.processInfo.activeProcessorCountThread-Safety Analysis
The changes maintain correctness through:
ProductPlanResultContextare protected byLock.withLockoutputNodes→ Parallel deferred task production (writes) → wait for allplannedTasksfor validationProductPlanResultContextis independent and can be processed concurrentlyTesting
Files Changed
Sources/SWBTaskConstruction/ProductPlanning/ProductPlanner.swiftSources/SWBTaskConstruction/ProductPlanning/BuildPlan.swift