Process large sets of metadata records in chunks to avoid long-running transactions and memory issues #9319

Open
josegar74 wants to merge 2 commits into geonetwork:main from GeoCat:44-xslprocessing-batchedit-transactions
@josegar74 josegar74 commented Jun 8, 2026

Member

This pull request introduces generic infrastructure for processing items in batches, with each batch running in its own database transaction. It removes code duplicated across several parts of the application, where large sets of metadata records are processed in chunks to avoid long-running transactions and memory issues.

Two new components were added to the jeeves.transaction package:

  • BatchItemProcessor<T>: A functional interface that defines the processing logic for a single item of type T. It allows for clean, lambda-based implementations of business logic.

    @FunctionalInterface
    public interface BatchItemProcessor<T> {
        void process(T item) throws Exception;
    }
  • BatchTransactionalProcessor<T>: The engine that orchestrates the batching. It partitions a collection of items (using Guava's Iterables.partition) and executes each batch within a new transaction managed by TransactionManager.

    • Configurable Batch Size: Defaults to 100, but can be adjusted via setBatchSize(int).
    • Transaction Management: Uses TransactionManager.runInTransaction with CREATE_NEW propagation and ALWAYS_COMMIT behavior for each batch.
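
As a rough illustration of the design described above, the following is a minimal, self-contained sketch rather than the code in this pull request. `BatchSketch` and `TransactionRunner` are hypothetical names introduced only for this example: `TransactionRunner` stands in for the call to `TransactionManager.runInTransaction`, whose real signature is not shown here. The sketch partitions with `List.subList` instead of Guava's `Iterables.partition` so that it has no dependencies, and rethrowing a failure as a `RuntimeException` is an assumption, since the description does not say how errors are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    /** Same shape as the functional interface described in the pull request. */
    @FunctionalInterface
    public interface BatchItemProcessor<T> {
        void process(T item) throws Exception;
    }

    /** Hypothetical stand-in for a transaction boundary (not the jeeves API). */
    @FunctionalInterface
    public interface TransactionRunner {
        void runInNewTransaction(Runnable work);
    }

    /** Sketch of the batching engine: one transaction per batch of items. */
    public static class BatchTransactionalProcessor<T> {
        private final TransactionRunner runner;
        private int batchSize = 100; // default stated in the description

        public BatchTransactionalProcessor(TransactionRunner runner) {
            this.runner = runner;
        }

        public void setBatchSize(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
        }

        public void process(List<T> items, BatchItemProcessor<T> processor) {
            for (int start = 0; start < items.size(); start += batchSize) {
                int end = Math.min(start + batchSize, items.size());
                List<T> batch = items.subList(start, end);
                // Each batch runs inside its own transaction boundary.
                runner.runInNewTransaction(() -> {
                    for (T item : batch) {
                        try {
                            processor.process(item);
                        } catch (Exception e) {
                            // Assumption: surface failures as unchecked exceptions.
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add(i);
        }
        int[] transactions = {0};
        BatchTransactionalProcessor<Integer> engine =
            new BatchTransactionalProcessor<>(work -> {
                transactions[0]++; // a real runner would begin and commit here
                work.run();
            });
        engine.process(ids, id -> { /* per-record business logic goes here */ });
        System.out.println("transactions used: " + transactions[0]);
    }
}
```

With the default batch size of 100, the 250 items in `main` are split into batches of 100, 100 and 50, so the runner is invoked three times. Keeping the per-item logic behind a functional interface means callers can pass a lambda and leave the chunking and transaction handling to the engine.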

The XSLT processing and batch editing APIs have been updated to use these new components.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

…mponent that processes large sets of metadata records in chunks to avoid long-running transactions and memory issues.
@josegar74 josegar74 added this to the 4.4.12 milestone Jun 8, 2026
@josegar74 josegar74 changed the title Extract common batch transactional processing logic into a generic component that processes large sets of metadata records in chunks to avoid long-running transactions and memory issues. Processes large sets of metadata records in chunks to avoid long-running transactions and memory issues. Jun 8, 2026
@josegar74 josegar74 changed the title Processes large sets of metadata records in chunks to avoid long-running transactions and memory issues. Processes large sets of metadata records in chunks to avoid long-running transactions and memory issues Jun 8, 2026
@josegar74 josegar74 changed the title Processes large sets of metadata records in chunks to avoid long-running transactions and memory issues Process large sets of metadata records in chunks to avoid long-running transactions and memory issues Jun 8, 2026
@josegar74 josegar74 marked this pull request as draft June 8, 2026 17:17
@josegar74 josegar74 marked this pull request as ready for review June 9, 2026 15:14