Parallelize Antora component build #131
Status: Open

gpx1000 wants to merge 12 commits into KhronosGroup:main from gpx1000:update-antora.
Commits (12):
- 43b7e76 Revert "Update dependencies and fix web manifest path" (gpx1000)
- 2ad2f45 Update @antora/lunr-extension to use Khronos Group repository (gpx1000)
- d4f6764 Add sources-parallel Antora extension scaffold (gpx1000)
- c6a29c8 Introduce `antora-sources-parallel` extension (gpx1000)
- e57b1bd Add `antora:ci` script and optimize CI caching for parallel builds (gpx1000)
- 1f29702 Enhance `antora-sources-parallel` with path rebasing, Lunr optimizati… (gpx1000)
- ea8fab3 Expand `antora-sources-parallel` to shard sources by `start_paths` fo… (gpx1000)
- a227eff Add configurable Antora cache directory and optimize Lunr indexing (gpx1000)
- 8175048 Add global Lunr index builder to `antora-sources-parallel` extension (gpx1000)
- 2b0ec3a Optimize Lunr indexing with conservative IO concurrency and memory li… (gpx1000)
- df476b8 Further reduce memory and CPU usage during Lunr indexing (gpx1000)
- 0d76b41 Increase V8 heap for Lunr indexing during CI builds (gpx1000)
Submodule antora-ui-khronos updated (68 files).
New file (92 lines):

# antora-sources-parallel

Antora extension to enable parallel-friendly builds. Phase 1 provides safe concurrency hints; a future phase can implement per-source fan-out.

## Install / Use (local)

In your Antora playbook:

```yaml
antora:
  extensions:
    - require: ./extensions/antora-sources-parallel
      sources_parallel: true
      # Optional tuning:
      # min_workers: 3
      # max_workers: 8
```

This extension computes a worker count based on your CPU core count (with a floor of `min_workers`, default 3), and sets the following environment variables if they are not already set:

- `ANTORA_FETCH_CONCURRENCY`
- `ANTORA_CONCURRENCY`
- `ANTORA_SOURCES_PARALLEL_WORKERS`

These hints can be used by Antora and cooperating extensions to perform work in parallel.
## As a separate package

Once extracted to its own repository and published, you can use:

```yaml
antora:
  extensions:
    - require: antora-sources-parallel
      sources_parallel: true
```

## Experimental features

- `experimental_fanout: true` is reserved for a future release supporting per-source fan-out. It is not yet implemented and is ignored, aside from a warning.

## License

Apache-2.0
## Per-source fan-out (experimental)

This package includes an optional fan-out orchestrator that builds each content source in parallel and merges the outputs.

How to use locally:

- Ensure you are in the `docs-site` directory and have installed dependencies (`npm install`)
- Run:

```
npm run antora-fanout
```

What it does:

- Splits the playbook's `content.sources` into separate temporary playbooks
- Runs `npx antora` for each in parallel (workers derived from CPU count or env), outputting to `build/.fanout/<idx>`
- Merges all shard outputs into `build/site`

Tuning:

- Set one of these env vars to control concurrency: `ANTORA_SOURCES_PARALLEL_WORKERS`, `ANTORA_CONCURRENCY`, `ANTORA_FETCH_CONCURRENCY`

Notes:

- The standard build (`npx antora antora-playbook.yml`) remains unchanged; fan-out is opt-in
- Collisions in generated files are resolved "last writer wins" during the merge
## CI usage and performance tips

- Fast path in CI: run the parallel fan-out without Lunr indexing.
  - From `docs-site/`: `npm run antora:ci`
  - Equivalent: `node ./extensions/antora-sources-parallel/bin/fanout.js antora-playbook.yml --no-lunr`
  - You can also set `ANTORA_NO_LUNR=1` instead of passing `--no-lunr`.
- Concurrency tuning:
  - `ANTORA_SOURCES_PARALLEL_WORKERS`: hard limit for shard workers.
  - `ANTORA_CONCURRENCY` / `ANTORA_FETCH_CONCURRENCY`: generic hints also used by some tools.
  - The default worker count is `max(cpu cores, 3)`.
- Caching:
  - The extension sets `ANTORA_CACHE_DIR` to `build/.cache` if it is not already set.
  - Configure your CI to cache the `docs-site/build/.cache` directory between runs to reduce repeated work.
- Release builds (with search index):
  - Use `npm run antora-fanout` to keep Lunr enabled while still building per-source in parallel.
  - Or run the standard Antora command if you prefer the traditional single-process path.
- No Makefile changes required:
  - Keep the Makefile as-is; point CI at the npm script from `docs-site` instead.
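To illustrate the caching tip, a hypothetical GitHub Actions step using `actions/cache` might look like this (the step name, cache key, and hashed file are illustrative assumptions; adapt to your CI system):

```yaml
# Hypothetical CI fragment, not part of this PR
- name: Cache Antora work directory
  uses: actions/cache@v4
  with:
    path: docs-site/build/.cache
    key: antora-cache-${{ runner.os }}-${{ hashFiles('docs-site/antora-playbook.yml') }}
    restore-keys: |
      antora-cache-${{ runner.os }}-
```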
`docs-site/extensions/antora-sources-parallel/bin/build-lunr-from-site.js` (183 additions, 0 deletions):
```js
#!/usr/bin/env node
/*
 * Fast Lunr indexer: builds a global search index directly from the merged Antora site HTML.
 *
 * Usage:
 *   node build-lunr-from-site.js <siteDir>
 *
 * Output:
 *   <siteDir>/search-index.json
 *   <siteDir>/search-index.js (assigns window.searchIndex = { ... })
 */

const fs = require('fs')
const fsp = fs.promises
const path = require('path')
const os = require('os')
const lunr = require('lunr')

function cpuWorkers (min = 3) {
  const cores = Array.isArray(os.cpus()) ? os.cpus().length : 1
  return Math.max(min, cores || 1)
}

function getWorkers () {
  const envKeys = ['ANTORA_LUNR_IO_WORKERS', 'ANTORA_SOURCES_PARALLEL_WORKERS', 'ANTORA_CONCURRENCY', 'ANTORA_FETCH_CONCURRENCY']
  for (const k of envKeys) {
    const v = process.env[k]
    if (v && +v > 0) return +v
  }
  // Default to ultra-conservative IO concurrency to minimize memory pressure
  return 1
}

function getMaxCharsPerPage () {
  const v = process.env.ANTORA_LUNR_MAX_CHARS
  if (v && +v > 0) return +v
  // Tighter upper bound to avoid pathological pages blowing memory
  return 60000
}

function stripHtml (html) {
  // Remove script/style contents
  html = html.replace(/<script[\s\S]*?<\/script>/gi, ' ').replace(/<style[\s\S]*?<\/style>/gi, ' ')
  // Replace tags with spaces
  html = html.replace(/<[^>]+>/g, ' ')
  // Decode a few common entities
  html = html.replace(/&nbsp;/g, ' ').replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
  // Collapse whitespace
  return html.replace(/\s+/g, ' ').trim()
}

async function readFileSafe (file) {
  try {
    return await fsp.readFile(file, 'utf8')
  } catch (e) {
    return ''
  }
}

function extractTitle (html, fallback) {
  const h1 = html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i)
  if (h1 && h1[1]) return stripHtml(h1[1])
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i)
  if (title && title[1]) return stripHtml(title[1])
  return fallback || ''
}

async function listHtmlFiles (dir) {
  const out = []
  async function walk (d) {
    const entries = await fsp.readdir(d, { withFileTypes: true })
    for (const ent of entries) {
      const p = path.join(d, ent.name)
      if (ent.isDirectory()) {
        // skip some known non-page dirs if present
        if (ent.name === '_') continue
        await walk(p)
      } else if (ent.isFile()) {
        if (p.endsWith('.html')) out.push(p)
      }
    }
  }
  await walk(dir)
  return out
}

function toSiteUrl (siteDir, filePath) {
  // Convert absolute file path back to site-relative URL
  const rel = path.relative(siteDir, filePath)
  let url = rel.replace(/\\/g, '/')
  // Ensure it starts with a slash for UI expectations
  if (!url.startsWith('/')) url = '/' + url
  return url
}

async function buildIndex (siteDir) {
  let files = await listHtmlFiles(siteDir)
  // Index latest-only by default if such pages exist
  const hasLatest = files.some((f) => f.includes(`${path.sep}latest${path.sep}`) || f.includes('/latest/'))
  if (hasLatest) {
    files = files.filter((f) => f.includes(`${path.sep}latest${path.sep}`) || f.includes('/latest/'))
  }
  const ioWorkers = getWorkers()
  const maxChars = getMaxCharsPerPage()

  // Minimal doc metadata to ship with the index
  const docMeta = []

  // Prepare a Lunr builder so we can add docs incrementally (low memory)
  const builder = new lunr.Builder()
  builder.ref('id')
  builder.field('title', { boost: 10 })
  builder.field('text')

  // Optional simplified pipeline to reduce memory/CPU
  const simplePipeline = (process.env.ANTORA_LUNR_SIMPLE_PIPELINE || '1') === '1'
  if (simplePipeline) {
    // Remove heavy stemming/stopword filters; keep minimal trimmer
    builder.pipeline.reset()
    builder.searchPipeline.reset()
    if (lunr.trimmer) {
      builder.pipeline.add(lunr.trimmer)
      builder.searchPipeline.add(lunr.trimmer)
    }
  }

  // Process files in small concurrent batches for IO, but add to index immediately
  let idxNext = 0
  let inFlight = 0
  await new Promise((resolve, reject) => {
    const next = () => {
      while (inFlight < ioWorkers && idxNext < files.length) {
        const file = files[idxNext++]
        inFlight++
        ;(async () => {
          const html = await readFileSafe(file)
          const url = toSiteUrl(siteDir, file)
          const title = extractTitle(html, path.basename(file, '.html'))
          let text = stripHtml(html)
          if (maxChars && text.length > maxChars) text = text.slice(0, maxChars)
          // Add to index and discard text immediately
          builder.add({ id: url, title, text })
          docMeta.push({ id: url, title, url })
        })()
          .then(() => {
            inFlight--
            if (idxNext >= files.length && inFlight === 0) resolve()
            else next()
          })
          .catch((e) => reject(e))
      }
      if (idxNext >= files.length && inFlight === 0) resolve()
    }
    next()
  })

  const index = builder.build()
  return { docs: docMeta, index: index.toJSON() }
}

async function writeOutputs (siteDir, payload) {
  const jsonPath = path.join(siteDir, 'search-index.json')
  const jsPath = path.join(siteDir, 'search-index.js')
  const json = JSON.stringify(payload)
  await fsp.writeFile(jsonPath, json, 'utf8')
  const js = `window.searchIndex=${json};\n`
  await fsp.writeFile(jsPath, js, 'utf8')
}

async function main () {
  const siteDir = path.resolve(process.argv[2] || path.join(process.cwd(), 'build', 'site'))
  console.log(`[lunr-fast] Indexing site at: ${siteDir}`)
  const t0 = Date.now()
  const payload = await buildIndex(siteDir)
  await writeOutputs(siteDir, payload)
  const dt = ((Date.now() - t0) / 1000).toFixed(2)
  console.log(`[lunr-fast] Wrote search-index.json and search-index.js in ${dt}s`)
}

main().catch((e) => {
  console.error('[lunr-fast] fatal:', e && e.message ? e.message : e)
  process.exit(1)
})
```
Review comment:

> I don't want building the site to clean and regenerate everything in the component repositories every time, which is what this will do AFAICT.