
Feat: Midaz #688 Add /batch Endpoint for Batch Request Processing #1779

Open
lazaroborges wants to merge 13 commits into LerianStudio:main from lazaroborges:main

Conversation


@lazaroborges lazaroborges commented Feb 3, 2026

Midaz Pull Request Checklist

Pull Request Type

  • Documentation
  • Pipeline
  • Infra
  • Ledger
  • Onboarding
  • Transaction
  • CRM
  • Tests

Checklist

Please check each item after it's completed.

  • I have tested these changes locally.
  • I have updated the documentation accordingly.
  • I have added necessary comments to the code, especially in complex areas.
  • I have ensured that my changes adhere to the project's coding standards.
  • I have checked for any potential security issues.
  • I have ensured that all tests pass.
  • I have updated the version appropriately (if applicable).
  • I have confirmed this code is ready for review.

Note: please always remember to target your PR at the develop branch instead of main.

Unable to target a development branch because I'm outside the organization.

Overview

This Pull Request introduces the /batch endpoint to the Midaz ledger, addressing the requirements outlined in MDZ-688. The implementation enables high-throughput processing by allowing multiple operations to be submitted in a single request, utilizing a resilient, multi-tenant-aware architecture.

Technical Changes

1. Batch Processing and HTTP Semantics

  • 207 Multi-Status Response: Implemented a granular response model where the API reports individual success or failure for each operation within the batch. This ensures clients can identify specific failed items without assuming the entire batch failed.
  • CQRS Integration: Batch operations are decomposed into individual commands, maintaining consistency with Midaz's existing command-driven domain logic and transactional boundaries.
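
As a hedged illustration of the 207 semantics described above (field names and the aggregation rule are assumptions, not the PR's actual models), a per-item result and the status-selection logic might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// BatchItemResult carries the outcome of one operation in the batch.
// Field names are illustrative; the PR's actual response models may differ.
type BatchItemResult struct {
	ID     string          `json:"id"`
	Status int             `json:"status"`         // per-item HTTP status
	Body   json.RawMessage `json:"body,omitempty"` // per-item response payload
}

// overallStatus returns 207 Multi-Status when item outcomes are mixed,
// otherwise the status shared by all items, so clients can identify
// specific failed items without assuming the entire batch failed.
func overallStatus(results []BatchItemResult) int {
	if len(results) == 0 {
		return http.StatusOK
	}
	first := results[0].Status
	for _, r := range results[1:] {
		if r.Status != first {
			return http.StatusMultiStatus // 207: partial success/failure
		}
	}
	return first
}

func main() {
	mixed := []BatchItemResult{{ID: "a", Status: 201}, {ID: "b", Status: 422}}
	fmt.Println(overallStatus(mixed)) // mixed outcomes -> 207
}
```

This matches the sequence diagram further down, where the handler returns 201 when every item succeeds and 207 otherwise.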

2. Production Hardening and Security

  • Tenant-Isolated Idempotency: Refactored BatchIdempotencyKey to include organizationID and ledgerID within the Redis hash tag {}. This modification addresses two critical production concerns:

    • Cross-Tenant Isolation: Prevents collisions where two distinct tenants providing the same idempotency key might retrieve each other's cached responses.
    • Redis Cluster Distribution: Spreads the load across all available hash slots in a cluster, preventing the performance bottlenecks associated with a single-slot "hot-spot."
  • Fail-Closed Rate Limiting: Updated unified-server.go to ensure the batch rate limiter is attached whenever enabled. The system now adheres to a "fail-closed" policy: if Redis is unavailable while rate limiting is required, the endpoint will respond with a 503 Service Unavailable rather than allowing unthrottled traffic.

  • Dependency Updates: Upgraded miniredis to v2.36.1 and patched go-redis/v9 against CVE-2025-29923.

  • Memory Optimization: Refactored batch response handling to avoid allocating full response bodies before truncation.
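
The tenant-scoped key layout above can be sketched as follows. The exact key format is an assumption for illustration; the point is that organizationID and ledgerID sit inside the `{...}` hash tag, so Redis Cluster routes each tenant's keys by the tag while different tenants land on different slots instead of a single hot-spot:

```go
package main

import "fmt"

// batchIdempotencyKey scopes the client-supplied idempotency key to a tenant.
// The {org:ledger} hash tag keeps one tenant's keys on one cluster slot and
// prevents two tenants with the same client key from colliding.
func batchIdempotencyKey(organizationID, ledgerID, clientKey string) string {
	return fmt.Sprintf("{%s:%s}:batch:idempotency:%s", organizationID, ledgerID, clientKey)
}

func main() {
	// Two tenants supplying the same idempotency key get distinct Redis keys.
	fmt.Println(batchIdempotencyKey("org-1", "led-1", "abc"))
	fmt.Println(batchIdempotencyKey("org-2", "led-1", "abc"))
}
```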

3. Validation and Reliability

  • Type Safety: Added explicit type assertion checks in validation functions to mitigate the risk of runtime panics during payload processing.
  • Constructor Validation: Implemented New*Handler constructors for all transaction-related adapters to ensure all mandatory dependencies are present and non-nil during application initialization.
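
The type-safety point above refers to Go's comma-ok assertion pattern. A minimal sketch (the field name and validator are hypothetical, not the PR's actual code): a bare `raw.(float64)` panics on a mismatched payload, while the ok form turns the mismatch into an ordinary validation error.

```go
package main

import (
	"errors"
	"fmt"
)

// validateAmount demonstrates the comma-ok type assertion: the second
// return value reports whether the assertion held, so a bad payload
// produces an error instead of a runtime panic.
func validateAmount(fields map[string]any) (float64, error) {
	raw, ok := fields["amount"]
	if !ok {
		return 0, errors.New("amount is required")
	}
	amount, ok := raw.(float64)
	if !ok {
		return 0, fmt.Errorf("amount must be a number, got %T", raw)
	}
	return amount, nil
}

func main() {
	_, err := validateAmount(map[string]any{"amount": "ten"})
	fmt.Println(err) // mismatch reported as an error, not a panic
}
```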

Architecture Alignment

  • Ports and Adapters: The batch logic is encapsulated within the application layer, ensuring the core domain remains decoupled from the specific HTTP transport logic.
  • Dependency Injection: Follows the established Midaz pattern by utilizing validated constructors in config.go, improving the reliability of the application's boot sequence.

Testing and Verification

  • Unit Tests: All automated tests passed, including new test cases for batch handler logic and TTL calculations.
  • Agentic Skill Tests: Successfully validated the implementation against Lerian's Ring agent skill suite to confirm architectural compliance.
  • Manual Validation: Verified 207 Multi-Status behavior for partial failures and confirmed proper key distribution in a simulated Redis Cluster environment.

Checklist

  • All commits are signed with Signed-off-by as per DCO requirements.
  • Commit messages follow the Conventional Commits specification.
  • All local tests passed (Unit and Skill suites).
  • Implementation is based on the latest source from the main branch.

Related Issues

Closes #688

lazaroborges and others added 13 commits January 28, 2026 17:53
- Added BatchHandler to process multiple API requests in a single call.
- Introduced rate limiting configuration in the ledger component.
- Integrated Redis for idempotency and rate limiting support.
- Updated .env.example with new rate limiting parameters.
- Added tests for batch processing and rate limiting functionality.
- Enhanced error handling for batch requests and rate limits.
- Added new dependencies: miniredis for testing and redismock for Redis interactions.
- Improved error response structure in batch request tests to include error codes and messages.
- Updated go.mod and go.sum to reflect new dependencies and their versions.
- Enhanced batch processing logic to track orphaned handler goroutines for better resource management.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- Implemented New*Handler functions for AssetRate, Balance, Operation, Transaction, and TransactionRoute handlers to ensure required dependencies are not nil.
- Updated the initialization of handlers in config.go to use the new constructors, improving error handling during handler creation.
- Updated validation functions to include type assertion checks for input fields, ensuring they are of the expected types before processing.
- This change enhances error handling and prevents potential panics due to type mismatches.
- Updated the handling of TTL values in the batch handler to ensure that the correct duration is passed to Redis operations.
- Adjusted tests to reflect the new TTL calculation, ensuring consistency between expected and actual values.
- Improved comments for clarity on TTL conversion from seconds to nanoseconds.
@lazaroborges lazaroborges requested review from a team as code owners February 3, 2026 16:21
@lazaroborges lazaroborges changed the title from "Feat: Midaz #688 Add /batch Endpoint for Batch Request Processing #3" to "Feat: Midaz #688 Add /batch Endpoint for Batch Request Processing" Feb 3, 2026

coderabbitai bot commented Feb 3, 2026

Walkthrough

This pull request introduces a comprehensive batch processing feature for HTTP API requests in the ledger service. The implementation includes a new BatchHandler that validates batch requests, optionally manages idempotency via Redis, processes items concurrently with a bounded worker pool, and returns aggregated responses. Complementary changes add Redis-backed rate limiting middleware supporting both per-request and per-batch-item counting, new error constants for batch and rate-limit scenarios, HTTP request/response models for batch operations, constructor-based dependency injection for HTTP handlers, and configuration wiring to enable the /v1/batch endpoint with optional authorization and rate limiting middleware. A Lua script provides atomic Redis operations for rate-limit check-and-increment logic. Transaction service is extended to expose its Redis client for unified ledger integration.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BatchHandler
    participant AuthMiddleware
    participant RateLimiter
    participant Redis
    participant WorkerPool
    participant InternalHandler
    participant Response

    Client->>BatchHandler: POST /v1/batch<br/>(batch request)
    
    BatchHandler->>BatchHandler: Validate batch<br/>(size, IDs, methods, paths)
    
    opt Authorization enabled
        BatchHandler->>AuthMiddleware: Check authorization
        AuthMiddleware->>BatchHandler: Authorized
    end
    
    opt Rate limiting enabled
        BatchHandler->>RateLimiter: Check rate limit
        RateLimiter->>Redis: Execute Lua script<br/>(check & increment)
        Redis->>RateLimiter: {allowed, count, ttl}
        RateLimiter->>BatchHandler: Allowed or 429
    end
    
    BatchHandler->>Redis: Check idempotency key<br/>(org/ledger scoped)
    alt Cached response exists
        Redis->>BatchHandler: Return cached response
        BatchHandler->>Response: Return 200 with cached data
    else Locked (in progress)
        BatchHandler->>Response: Return 409 Conflict
    else Process new batch
        Redis->>BatchHandler: Create lock
        
        BatchHandler->>WorkerPool: Distribute batch items<br/>(with concurrency limit)
        par Process Items
            WorkerPool->>InternalHandler: Process item 1
            InternalHandler->>WorkerPool: Result 1
        and
            WorkerPool->>InternalHandler: Process item 2
            InternalHandler->>WorkerPool: Result 2
        and
            WorkerPool->>InternalHandler: Process item N
            InternalHandler->>WorkerPool: Result N
        end
        
        BatchHandler->>BatchHandler: Aggregate results<br/>(statuses, bodies, errors)
        
        BatchHandler->>Redis: Store idempotency response<br/>(aggregate result)
        Redis->>BatchHandler: Stored
        
        BatchHandler->>Response: Return 201/207<br/>(BatchResponse)
    end
    
    Response->>Client: HTTP response<br/>(aggregate batch results)
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main objective of adding a /batch endpoint for batch request processing, directly addressing the core change in the PR. |
| Linked Issues check | ✅ Passed | The PR fully addresses the core requirements from issue #688: it implements the POST /batch endpoint with proper request/response structures, supports rate limiting and authentication at the batch level, enables partial success/failure semantics, and includes comprehensive validation and error handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the /batch endpoint and supporting features. Code refactoring (constructors, type assertions), dependency updates, and supporting infrastructure (rate limiting, idempotency keys) are all necessary for the batch endpoint functionality. |
| Description check | ✅ Passed | The PR description is comprehensive and well structured, following the template's checklist format with all major sections completed and detailed technical explanations provided. |





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@components/ledger/.env.example`:
- Around line 170-179: Reorder the RATE_LIMIT_* keys in
components/ledger/.env.example to satisfy dotenv-linter's alphabetical ordering:
keep RATE_LIMIT_ENABLED first, then RATE_LIMIT_MAX_BATCH_ITEMS_PER_MINUTE, then
RATE_LIMIT_MAX_BATCH_SIZE, and finally RATE_LIMIT_MAX_REQUESTS_PER_MINUTE;
update the file so the four RATE_LIMIT_* entries appear in that exact order and
keep their current values/comments.

In `@components/ledger/internal/adapters/http/in/batch.go`:
- Around line 677-691: The watchdog goroutine watching resultChan for each
reqItem.ID can accumulate under sustained timeouts; update the path that
increments orphanedHandlerCount to also emit/update a monitoring metric (e.g.,
prometheus gauge named orphaned_handler_count) and trigger an alert when it
exceeds a sensible threshold, and ensure the decrement path updates that metric
too; additionally add short docs/comments near the handler entry (and in the
handler code) instructing implementers to respect context cancellation (check
ctx.Done()) to prevent orphaned handlers, and optionally replace per-item
long-lived goroutines with a pooled/watcher worker if you need to bound
concurrent watchdogs.
- Around line 644-645: Document and encapsulate the use of Fiber v2's internal
user-context key "__local_user_context__" by (1) adding an inline comment where
fasthttpCtx.SetUserValue("__local_user_context__", reqCtx) is called explaining
this is Fiber internals and upgrade risk, (2) extracting that call into a small
helper function (e.g., setFiberLocalUserContext(fasthttpCtx, reqCtx)) so future
changes are localized, and (3) add a unit/integration test that dispatches a
batch item through the existing batch handler and asserts that handler-level
c.UserContext() receives the timeout-aware reqCtx; also add a note in
runbook/docs to verify this key after any Fiber upgrade.

In `@components/ledger/internal/bootstrap/unified-server.go`:
- Around line 130-134: The creation of BatchHandler in registerBatchEndpoint
currently logs and returns silently on error; change this to a strict failure
path: either (A) make registerBatchEndpoint return an error and propagate the
BatchHandler creation error up to the caller (modify the function signature,
return the err when BatchHandler creation fails and ensure callers handle it),
or (B) adopt fail-fast behavior by replacing logger.Errorf+return with
logger.Panicf (or panic) so the process stops on programmer errors; update all
call sites of registerBatchEndpoint accordingly.

In `@components/transaction/internal/adapters/http/in/transaction.go`:
- Around line 42-57: NewTransactionHandler currently mixes error patterns: it
calls pkg.ValidateInternalError(constant.ErrInternalServer, "Command use case
cannot be nil") and similar for qry, but other constructors use fmt.Errorf and
ValidateInternalError expects an entityType (not a message). Fix by making
NewTransactionHandler consistent with the other handlers—either replace the
ValidateInternalError calls with fmt.Errorf("command use case cannot be nil") /
fmt.Errorf("query use case cannot be nil") or, if you prefer
ValidateInternalError, pass the correct entityType (e.g., "TransactionHandler")
as the second arg and supply the error context via the constant or wrapping;
update both nil checks in NewTransactionHandler accordingly.

In `@components/transaction/internal/bootstrap/service.go`:
- Around line 136-140: GetRedisClient currently returns app.Ports.RedisClient
without any nil-safety or documentation, which is inconsistent with the explicit
nil check in GetMetadataIndexPort; update GetRedisClient (method on Service) to
either perform the same nil check and comment "caller should check" when app or
app.Ports or app.Ports.RedisClient is nil, or add a doc comment to
GetRedisClient stating the returned *redis.Client may be nil so callers must
check; reference the Service type, GetRedisClient method and Ports.RedisClient
field when making the change for consistency with GetMetadataIndexPort.

In `@pkg/net/http/scripts/rate_limit_check_and_increment.lua`:
- Around line 20-42: When TTL is -1 (key exists but no expiry) the script should
reapply the expiration so the key doesn't become permanently throttled; update
both the deny path (where you compute ttl before returning {0, current, ttl})
and the allow path after INCRBY: if redis.call('TTL', key) returns < 0, call
redis.call('EXPIRE', key, expiration) and then set ttl = expiration before
returning. Reference the variables and calls: key, current, increment,
max_items, expiration, redis.call('TTL' ...), redis.call('EXPIRE' ...), and
redis.call('INCRBY' ...).

Comment on lines +170 to +179
# RATE LIMITING CONFIGURATION
# This is the rate limiting configuration for batch requests in the ledger.
# Enable/disable rate limiting (default: false)
RATE_LIMIT_ENABLED=false
# Maximum requests per minute for standard endpoints
RATE_LIMIT_MAX_REQUESTS_PER_MINUTE=1000
# Maximum batch items per minute (counts individual items, not requests)
RATE_LIMIT_MAX_BATCH_ITEMS_PER_MINUTE=5000
# Maximum batch size per request (max items in a single batch request)
RATE_LIMIT_MAX_BATCH_SIZE=100

⚠️ Potential issue | 🟡 Minor

Fix dotenv-linter UnorderedKey warnings.
Reorder the new RATE_LIMIT_* keys to satisfy the linter expectations.

🔧 Suggested reorder
 RATE_LIMIT_ENABLED=false
-# Maximum requests per minute for standard endpoints
-RATE_LIMIT_MAX_REQUESTS_PER_MINUTE=1000
 # Maximum batch items per minute (counts individual items, not requests)
 RATE_LIMIT_MAX_BATCH_ITEMS_PER_MINUTE=5000
 # Maximum batch size per request (max items in a single batch request)
 RATE_LIMIT_MAX_BATCH_SIZE=100
+# Maximum requests per minute for standard endpoints
+RATE_LIMIT_MAX_REQUESTS_PER_MINUTE=1000
🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 177-177: [UnorderedKey] The RATE_LIMIT_MAX_BATCH_ITEMS_PER_MINUTE key should go before the RATE_LIMIT_MAX_REQUESTS_PER_MINUTE key

(UnorderedKey)


[warning] 179-179: [UnorderedKey] The RATE_LIMIT_MAX_BATCH_SIZE key should go before the RATE_LIMIT_MAX_REQUESTS_PER_MINUTE key

(UnorderedKey)


Comment on lines +644 to +645
// Key "__local_user_context__" is Fiber v2's internal key for user context.
fasthttpCtx.SetUserValue("__local_user_context__", reqCtx)

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

Fiber v2 SetUserContext user context fasthttp documentation

💡 Result:

In Fiber v2, request “user context” is handled via these Ctx methods:

  • func (c *fiber.Ctx) SetUserContext(ctx context.Context) — sets a user-provided context.Context. [1]
  • func (c *fiber.Ctx) UserContext() context.Context — returns the previously set context (or a non-nil empty context if none was set). [2]

Under the hood, Fiber stores this on the underlying fasthttp.RequestCtx using user values (the RequestCtx.SetUserValue(...) / UserValue(...) mechanism). In fasthttp, these user values are request-scoped and are cleared after the top-level request handler returns. [3]

Notes:

  • In Fiber v3, UserContext / SetUserContext were removed in favor of SetContext/Context() and having Ctx satisfy context.Context. [4]

Sources:
[1] Fiber v2 Ctx.SetUserContext docs (docs.gofiber.io)
[2] Fiber v2 Ctx.UserContext docs (docs.gofiber.io)
[3] fasthttp RequestCtx.SetUserValue docs (lifetime/cleanup semantics) (pkg.go.dev)
[4] Fiber v3 “What’s New” (removal/migration note) (docs.gofiber.io)

Citations:


🏁 Script executed:

# First, let's look at the actual code context in batch.go
cd components/ledger/internal/adapters/http/in
cat -n batch.go | sed -n '640,650p'

Repository: LerianStudio/midaz

Length of output: 589


🏁 Script executed:

# Let's also check if SetUserContext is used elsewhere in the codebase
rg "SetUserContext|SetUserValue" --type go -B 2 -A 2

Repository: LerianStudio/midaz

Length of output: 7398


🏁 Script executed:

# Let's check what fasthttp RequestCtx supports
# First, find if there are any imports from fasthttp or Fiber
cat -n batch.go | head -30

Repository: LerianStudio/midaz

Length of output: 102


🏁 Script executed:

cd components/ledger/internal/adapters/http/in
cat -n batch.go | sed -n '620,680p'

Repository: LerianStudio/midaz

Length of output: 2817


🏁 Script executed:

# Check the function signature and context
cat -n batch.go | sed -n '550,600p'

Repository: LerianStudio/midaz

Length of output: 102


🏁 Script executed:

# Find the batch.go file
find . -name "batch.go" -type f

Repository: LerianStudio/midaz

Length of output: 138


🏁 Script executed:

# Let's check the function that contains this code
rg "func.*processRequest" --type go -B 5 -A 10 components/ledger/internal/adapters/http/in/

Repository: LerianStudio/midaz

Length of output: 1609


🏁 Script executed:

# Get more context around the processRequest function to understand the full scope
cat -n ./components/ledger/internal/adapters/http/in/batch.go | sed -n '536,560p'

Repository: LerianStudio/midaz

Length of output: 1241


🏁 Script executed:

# Check if there's any Fiber context available
cat -n ./components/ledger/internal/adapters/http/in/batch.go | sed -n '580,650p'

Repository: LerianStudio/midaz

Length of output: 3082


🏁 Script executed:

# Check if there's any way this could use Fiber's proper API
# Look for imports and see if Fiber Ctx is available elsewhere
cat -n ./components/ledger/internal/adapters/http/in/batch.go | head -50

Repository: LerianStudio/midaz

Length of output: 2277


🏁 Script executed:

# Verify that the handler is indeed being called as a raw fasthttp handler
rg "h\.App\.Handler" ./components/ledger/internal/adapters/http/in/ -B 2 -A 2

Repository: LerianStudio/midaz

Length of output: 520


Reliance on Fiber internal API detail is architecturally necessary here, but document upgrade risk.

Using "__local_user_context__" directly on fasthttp.RequestCtx couples to Fiber v2's internal implementation. While the documented c.SetUserContext() method exists, it's only available on fiber.Ctx, not fasthttp.RequestCtx. Since this code deliberately creates a raw fasthttp.RequestCtx and calls handlers directly (bypassing Fiber's context wrapper), using the internal key is the only viable approach.

Mitigate this coupling by:

  • Adding a test that verifies c.UserContext() receives the timeout-aware context in batch item handlers
  • Document in code comments and runbooks that Fiber upgrades require verification that this key still exists
  • Consider extracting this to a helper function for easier future updates if Fiber's internals change

Comment on lines +677 to +691
// Spawn cleanup goroutine to track when the orphaned handler completes
go func() {
	select {
	case <-resultChan:
		// Handler completed after timeout - decrement orphan count
		orphanedHandlerCount.Add(-1)
		logger.Debugf("Batch item %s: orphaned handler completed. Remaining orphaned handlers: %d",
			reqItem.ID, orphanedHandlerCount.Load())
	case <-time.After(5 * time.Minute):
		// Handler still running after 5 minutes - log severe warning
		logger.Errorf("Batch item %s: orphaned handler still running after 5 minutes. "+
			"This indicates a handler that doesn't respect context cancellation. "+
			"Remaining orphaned handlers: %d", reqItem.ID, orphanedHandlerCount.Load())
	}
}()

🧹 Nitpick | 🔵 Trivial

Watchdog goroutine may accumulate under sustained handler timeouts.

The cleanup goroutine that monitors orphaned handlers waits up to 5 minutes. Under sustained load with consistently timing-out handlers, these watchdog goroutines could accumulate (up to orphanedHandlerCount active ones). The logging is valuable for observability, but consider:

  1. Adding a metric/alert threshold for orphanedHandlerCount to trigger investigation.
  2. Documenting that handlers should check context.Done() to avoid this scenario.

Comment on lines +130 to +134
if err != nil {
logger.Errorf("Failed to create batch handler: %v", err)

return
}

🧹 Nitpick | 🔵 Trivial

Consider returning an error or using a stricter initialization pattern.

When BatchHandler creation fails, the function logs the error and returns silently, leaving the batch endpoint unregistered. While the current constructors only fail if app is nil (which should never happen in this context), this silent failure could mask issues during future refactoring.

Consider either:

  1. Panicking during initialization (fail-fast for programmer errors)
  2. Returning an error from registerBatchEndpoint to propagate to the caller
♻️ Optional: Fail-fast approach for programmer errors
 	if err != nil {
-		logger.Errorf("Failed to create batch handler: %v", err)
-
-		return
+		// This should never happen since app is always non-nil at this point
+		panic(fmt.Sprintf("failed to create batch handler: %v", err))
 	}

Comment on lines +42 to +57
// NewTransactionHandler creates a new TransactionHandler with validation.
// Returns an error if required dependencies are nil.
func NewTransactionHandler(cmd *command.UseCase, qry *query.UseCase) (*TransactionHandler, error) {
	if cmd == nil {
		return nil, pkg.ValidateInternalError(constant.ErrInternalServer, "Command use case cannot be nil")
	}

	if qry == nil {
		return nil, pkg.ValidateInternalError(constant.ErrInternalServer, "Query use case cannot be nil")
	}

	return &TransactionHandler{
		Command: cmd,
		Query:   qry,
	}, nil
}

🧹 Nitpick | 🔵 Trivial

Inconsistent error handling pattern across handler constructors.

This constructor uses pkg.ValidateInternalError(constant.ErrInternalServer, "Command use case cannot be nil") while other handlers in this PR (NewBalanceHandler, NewOperationHandler, NewOperationRouteHandler) use fmt.Errorf("command use case cannot be nil").

Additionally, looking at the ValidateInternalError signature from the relevant code snippets, the second parameter is entityType, not a message string. The current usage passes a descriptive message where an entity type is expected.

Consider either:

  1. Aligning all handlers to use fmt.Errorf for consistency, or
  2. If using ValidateInternalError, pass an appropriate entity type (e.g., "TransactionHandler")
Option 1: Align with other handlers using fmt.Errorf
+import "fmt"
+
 // NewTransactionHandler creates a new TransactionHandler with validation.
 // Returns an error if required dependencies are nil.
 func NewTransactionHandler(cmd *command.UseCase, qry *query.UseCase) (*TransactionHandler, error) {
 	if cmd == nil {
-		return nil, pkg.ValidateInternalError(constant.ErrInternalServer, "Command use case cannot be nil")
+		return nil, fmt.Errorf("command use case cannot be nil")
 	}
 
 	if qry == nil {
-		return nil, pkg.ValidateInternalError(constant.ErrInternalServer, "Query use case cannot be nil")
+		return nil, fmt.Errorf("query use case cannot be nil")
 	}
🤖 Prompt for AI Agents
In `@components/transaction/internal/adapters/http/in/transaction.go` around lines
42 - 57, NewTransactionHandler currently mixes error patterns: it calls
pkg.ValidateInternalError(constant.ErrInternalServer, "Command use case cannot
be nil") and similar for qry, but other constructors use fmt.Errorf and
ValidateInternalError expects an entityType (not a message). Fix by making
NewTransactionHandler consistent with the other handlers—either replace the
ValidateInternalError calls with fmt.Errorf("command use case cannot be nil") /
fmt.Errorf("query use case cannot be nil") or, if you prefer
ValidateInternalError, pass the correct entityType (e.g., "TransactionHandler")
as the second arg and supply the error context via the constant or wrapping;
update both nil checks in NewTransactionHandler accordingly.

Comment on lines +136 to +140
// GetRedisClient returns the Redis client for use by other modules.
// This is used for rate limiting in unified ledger mode.
func (app *Service) GetRedisClient() *redis.Client {
return app.Ports.RedisClient
}
🧹 Nitpick | 🔵 Trivial

Consider adding nil-safety documentation or check for consistency.

GetMetadataIndexPort() now has an explicit nil check with a comment that "caller should check", but GetRedisClient() directly returns the field without similar handling. For consistency, consider either:

  1. Adding a similar nil check pattern here, or
  2. Adding a doc comment noting that the returned client may be nil
Optional: Add documentation about nil return
 // GetRedisClient returns the Redis client for use by other modules.
 // This is used for rate limiting in unified ledger mode.
+// Returns nil if Redis is not configured.
 func (app *Service) GetRedisClient() *redis.Client {
 	return app.Ports.RedisClient
 }
📝 Committable suggestion


Suggested change
 // GetRedisClient returns the Redis client for use by other modules.
 // This is used for rate limiting in unified ledger mode.
+// Returns nil if Redis is not configured.
 func (app *Service) GetRedisClient() *redis.Client {
 	return app.Ports.RedisClient
 }
🤖 Prompt for AI Agents
In `@components/transaction/internal/bootstrap/service.go` around lines 136 - 140,
GetRedisClient currently returns app.Ports.RedisClient without any nil-safety or
documentation, which is inconsistent with the explicit nil check in
GetMetadataIndexPort; update GetRedisClient (method on Service) to either
perform the same nil check and comment "caller should check" when app or
app.Ports or app.Ports.RedisClient is nil, or add a doc comment to
GetRedisClient stating the returned *redis.Client may be nil so callers must
check; reference the Service type, GetRedisClient method and Ports.RedisClient
field when making the change for consistency with GetMetadataIndexPort.
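If the doc-comment option is chosen, the contract it documents is a caller-side nil check. A minimal sketch, with `Client` standing in for `*redis.Client` and the field wiring simplified from the review context:

```go
package main

import "fmt"

// Client stands in for *redis.Client and Service for the bootstrap
// service; only the accessor name comes from the review. The point is
// the caller-side contract: the returned client may be nil when Redis
// is not configured.
type Client struct{}

type Service struct{ redisClient *Client }

// GetRedisClient returns the Redis client, or nil if Redis is not configured.
func (app *Service) GetRedisClient() *Client { return app.redisClient }

func main() {
	app := &Service{} // Redis not configured
	if c := app.GetRedisClient(); c == nil {
		// Degrade gracefully instead of dereferencing a nil client.
		fmt.Println("redis not configured; rate limiting disabled")
	}
}
```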

Comment on lines +20 to +42
-- Check if increment would exceed limit
if current + increment > max_items then
-- Rate limit exceeded, return current state without incrementing
local ttl = redis.call('TTL', key)
if ttl < 0 then
ttl = 0
end
return {0, current, ttl}
end

-- Increment atomically
local new_count = redis.call('INCRBY', key, increment)

-- Set expiration if this is a new key
if new_count == increment then
redis.call('EXPIRE', key, expiration)
end

-- Get TTL
local ttl = redis.call('TTL', key)
if ttl < 0 then
ttl = expiration
end
⚠️ Potential issue | 🟠 Major

Restore expiry when a key exists without TTL.

If TTL is -1 (key exists but no expiry), the script currently returns ttl = 0 (deny path) or ttl = expiration without reapplying expiry (allow path), risking permanent throttling. Consider re-setting expiration when TTL is missing.

🛠️ Suggested fix
-    local ttl = redis.call('TTL', key)
-    if ttl < 0 then
-        ttl = 0
-    end
+    local ttl = redis.call('TTL', key)
+    if ttl == -1 then
+        redis.call('EXPIRE', key, expiration)
+        ttl = expiration
+    elseif ttl == -2 then
+        ttl = 0
+    end
@@
-local ttl = redis.call('TTL', key)
-if ttl < 0 then
-    ttl = expiration
-end
+local ttl = redis.call('TTL', key)
+if ttl == -1 then
+    redis.call('EXPIRE', key, expiration)
+    ttl = expiration
+elseif ttl == -2 then
+    ttl = expiration
+end
🤖 Prompt for AI Agents
In `@pkg/net/http/scripts/rate_limit_check_and_increment.lua` around lines 20 -
42, When TTL is -1 (key exists but no expiry) the script should reapply the
expiration so the key doesn't become permanently throttled; update both the deny
path (where you compute ttl before returning {0, current, ttl}) and the allow
path after INCRBY: if redis.call('TTL', key) returns < 0, call
redis.call('EXPIRE', key, expiration) and then set ttl = expiration before
returning. Reference the variables and calls: key, current, increment,
max_items, expiration, redis.call('TTL' ...), redis.call('EXPIRE' ...), and
redis.call('INCRBY' ...).

Successfully merging this pull request may close these issues.

[FEATURE REQUEST]: Add /batch Endpoint for Batch Request Processing
