
Commit 1fa11c1

feat(v2): parallel batch processing (#1620)

* implement parallel processing
* test parallel processing
* document parallel processing
* code review
* sqs batch processing example
* rollback to AtomicBoolean
* complete sqs batch example with logs and traces annotations
* update doc
* update doc
* update doc

1 parent 749e973, commit 1fa11c1

File tree: 21 files changed (+1877 −180 lines)

docs/utilities/batch.md

Lines changed: 65 additions & 29 deletions

````diff
@@ -30,6 +30,7 @@ stateDiagram-v2
 
 * Reports batch item failures to reduce number of retries for a record upon errors
 * Simple interface to process each batch record
+* Parallel processing of batches
 * Integrates with Java Events library and the deserialization module
 * Build your own batch processor by extending primitives
 
@@ -110,16 +111,9 @@ You can use your preferred deployment framework to set the correct configuration
 while the `powertools-batch` module handles generating the response, which simply needs to be returned as the result of
 your Lambda handler.
 
-A complete [Serverless Application Model](https://aws.amazon.com/serverless/sam/) example can be found
-[here](https://github.com/aws-powertools/powertools-lambda-java/tree/main/examples/powertools-examples-batch) covering
-all of the batch sources.
-
-For more information on configuring `ReportBatchItemFailures`,
-see the details for [SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting),
-[Kinesis](https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-batchfailurereporting),and
-[DynamoDB Streams](https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html#services-ddb-batchfailurereporting).
-
+A complete [Serverless Application Model](https://aws.amazon.com/serverless/sam/) example can be found [here](https://github.com/aws-powertools/powertools-lambda-java/tree/main/examples/powertools-examples-batch) covering all the batch sources.
 
+For more information on configuring `ReportBatchItemFailures`, see the details for [SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting), [Kinesis](https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-batchfailurereporting), and [DynamoDB Streams](https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html#services-ddb-batchfailurereporting).
 
 
 !!! note "You do not need any additional IAM permissions to use this utility, except for what each event source requires."
@@ -150,12 +144,10 @@
     public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
         return handler.processBatch(sqsEvent, context);
     }
-
-
+
     private void processMessage(Product p, Context c) {
         // Process the product
     }
-
 }
 ```
 
@@ -276,7 +268,6 @@
     private void processMessage(Product p, Context c) {
         // process the product
     }
-
 }
 ```
 
@@ -475,6 +466,51 @@
     }
 ```
 
+## Parallel processing
+You can choose to process batch items in parallel using the `BatchMessageHandler#processBatchInParallel()`
+instead of `BatchMessageHandler#processBatch()`. Partial batch failure works the same way but items are processed
+in parallel rather than sequentially.
+
+This feature is available for SQS, Kinesis and DynamoDB Streams but cannot be
+used with SQS FIFO. In that case, an `UnsupportedOperationException` is thrown.
+
+!!! warning
+    Note that parallel processing is not always better than sequential processing,
+    and you should benchmark your code to determine the best approach for your use case.
+
+!!! info
+    To get more threads available (more vCPUs), you need to increase the amount of memory allocated to your Lambda function.
+    While it is possible to increase the number of threads using Java options or custom thread pools,
+    in most cases the defaults work well, and changing them is more likely to decrease performance
+    (see [here](https://www.baeldung.com/java-when-to-use-parallel-stream#fork-join-framework)
+    and [here](https://dzone.com/articles/be-aware-of-forkjoinpoolcommonpool)).
+    In situations where this may be useful - such as performing IO-bound work in parallel - make sure to measure before and after!
+
+
+=== "Example with SQS"
+
+    ```java hl_lines="13"
+    public class SqsBatchHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
+
+        private final BatchMessageHandler<SQSEvent, SQSBatchResponse> handler;
+
+        public SqsBatchHandler() {
+            handler = new BatchMessageHandlerBuilder()
+                    .withSqsBatchHandler()
+                    .buildWithMessageHandler(this::processMessage, Product.class);
+        }
+
+        @Override
+        public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
+            return handler.processBatchInParallel(sqsEvent, context);
+        }
+
+        private void processMessage(Product p, Context c) {
+            // Process the product
+        }
+    }
+    ```
 
 ## Handling Messages
 
@@ -490,7 +526,7 @@ In general, the deserialized message handler should be used unless you need acce
 
 === "Raw Message Handler"
 
-    ```java
+    ```java hl_lines="4 7"
     public void setup() {
         BatchMessageHandler<SQSEvent, SQSBatchResponse> handler = new BatchMessageHandlerBuilder()
                 .withSqsBatchHandler()
@@ -505,7 +541,7 @@ In general, the deserialized message handler should be used unless you need acce
 
 === "Deserialized Message Handler"
 
-    ```java
+    ```java hl_lines="4 7"
    public void setup() {
         BatchMessageHandler<SQSEvent, SQSBatchResponse> handler = new BatchMessageHandlerBuilder()
                 .withSqsBatchHandler()
@@ -529,20 +565,20 @@ provide a custom failure handler.
 Handlers can be provided when building the batch processor and are available for all event sources.
 For instance for DynamoDB:
 
-```java
-BatchMessageHandler<DynamodbEvent, StreamsEventResponse> handler = new BatchMessageHandlerBuilder()
-        .withDynamoDbBatchHandler()
-        .withSuccessHandler((m) -> {
-            // Success handler receives the raw message
-            LOGGER.info("Message with sequenceNumber {} was successfully processed",
-                m.getDynamodb().getSequenceNumber());
-        })
-        .withFailureHandler((m, e) -> {
-            // Failure handler receives the raw message and the exception thrown.
-            LOGGER.info("Message with sequenceNumber {} failed to be processed: {}"
-                , e.getDynamodb().getSequenceNumber(), e);
-        })
-        .buildWithMessageHander(this::processMessage);
+```java hl_lines="3 8"
+BatchMessageHandler<DynamodbEvent, StreamsEventResponse> handler = new BatchMessageHandlerBuilder()
+        .withDynamoDbBatchHandler()
+        .withSuccessHandler((m) -> {
+            // Success handler receives the raw message
+            LOGGER.info("Message with sequenceNumber {} was successfully processed",
+                m.getDynamodb().getSequenceNumber());
+        })
+        .withFailureHandler((m, e) -> {
+            // Failure handler receives the raw message and the exception thrown.
+            LOGGER.info("Message with sequenceNumber {} failed to be processed: {}"
+                , e.getDynamodb().getSequenceNumber(), e);
+        })
+        .buildWithMessageHander(this::processMessage);
 ```
 
 !!! info
````
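The new `!!! info` note above cautions that raising thread counts with Java options or custom thread pools usually makes things worse, and that parallel streams share the common `ForkJoinPool` sized from the available vCPUs. A minimal, stdlib-only sketch of the two options the note refers to (class and method names are invented for illustration; this is not Powertools code):

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelIoSketch {

    // Simulated IO-bound record processing: sleep briefly, then return the id.
    static int process(int id) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id;
    }

    // Default: a parallel stream runs on the common ForkJoinPool, whose size
    // follows Runtime.getRuntime().availableProcessors().
    static List<Integer> processAll(int n) {
        return IntStream.range(0, n).parallel()
                .map(ParallelIoSketch::process)
                .boxed().sorted().collect(Collectors.toList());
    }

    // Tuned alternative: run the same pipeline inside a dedicated pool so slow
    // IO does not starve other users of the common pool. Benchmark both!
    static List<Integer> processAllIn(int threads, int n) {
        ForkJoinPool ioPool = new ForkJoinPool(threads);
        try {
            return ioPool.submit(() -> processAll(n)).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Both variants process every item; only the thread pool differs.
        System.out.println(processAll(8).equals(processAllIn(4, 8)));
    }
}
```

As the doc says, measure before and after: on a low-memory Lambda with one vCPU, neither variant buys real CPU parallelism.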

examples/powertools-examples-batch/deploy/sqs/template.yml

Lines changed: 58 additions & 5 deletions

```diff
@@ -7,12 +7,10 @@ Globals:
   Function:
     Timeout: 20
     Runtime: java11
-    MemorySize: 512
-    Tracing: Active
+    MemorySize: 5400
     Environment:
       Variables:
         POWERTOOLS_LOG_LEVEL: INFO
-        POWERTOOLS_LOGGER_SAMPLE_RATE: 1.0
         POWERTOOLS_LOGGER_LOG_EVENT: true
 
 Resources:
@@ -45,6 +43,9 @@ Resources:
       AliasName: alias/powertools-batch-sqs-demo
       TargetKeyId: !Ref CustomerKey
 
+  Bucket:
+    Type: AWS::S3::Bucket
+
   DemoDlqSqsQueue:
     Type: AWS::SQS::Queue
     Properties:
@@ -96,11 +97,57 @@ Resources:
   DemoSQSConsumerFunction:
     Type: AWS::Serverless::Function
     Properties:
+      Tracing: Active
       CodeUri: ../..
       Handler: org.demo.batch.sqs.SqsBatchHandler::handleRequest
       Environment:
         Variables:
           POWERTOOLS_SERVICE_NAME: sqs-demo
+          BUCKET: !Ref Bucket
+      Policies:
+        - Statement:
+            - Sid: SQSDeleteGetAttribute
+              Effect: Allow
+              Action:
+                - sqs:DeleteMessageBatch
+                - sqs:GetQueueAttributes
+              Resource: !GetAtt DemoSqsQueue.Arn
+            - Sid: SQSSendMessageBatch
+              Effect: Allow
+              Action:
+                - sqs:SendMessageBatch
+                - sqs:SendMessage
+              Resource: !GetAtt DemoDlqSqsQueue.Arn
+            - Sid: SQSKMSKey
+              Effect: Allow
+              Action:
+                - kms:GenerateDataKey
+                - kms:Decrypt
+              Resource: !GetAtt CustomerKey.Arn
+            - Sid: WriteToS3
+              Effect: Allow
+              Action:
+                - s3:PutObject
+              Resource: !Sub ${Bucket.Arn}/*
+
+#      Events:
+#        MySQSEvent:
+#          Type: SQS
+#          Properties:
+#            Queue: !GetAtt DemoSqsQueue.Arn
+#            BatchSize: 100
+#            MaximumBatchingWindowInSeconds: 60
+
+  DemoSQSParallelConsumerFunction:
+    Type: AWS::Serverless::Function
+    Properties:
+      Tracing: Active
+      CodeUri: ../..
+      Handler: org.demo.batch.sqs.SqsParallelBatchHandler::handleRequest
+      Environment:
+        Variables:
+          POWERTOOLS_SERVICE_NAME: sqs-demo
+          BUCKET: !Ref Bucket
       Policies:
         - Statement:
             - Sid: SQSDeleteGetAttribute
@@ -121,13 +168,19 @@ Resources:
                 - kms:GenerateDataKey
                 - kms:Decrypt
               Resource: !GetAtt CustomerKey.Arn
+            - Sid: WriteToS3
+              Effect: Allow
+              Action:
+                - s3:PutObject
+              Resource: !Sub ${Bucket.Arn}/*
+
       Events:
         MySQSEvent:
           Type: SQS
           Properties:
             Queue: !GetAtt DemoSqsQueue.Arn
-            BatchSize: 2
-            MaximumBatchingWindowInSeconds: 300
+            BatchSize: 100
+            MaximumBatchingWindowInSeconds: 60
 
 Outputs:
   DemoSqsQueue:
```
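The jump from `MemorySize: 512` to `MemorySize: 5400` in this template is what makes the parallel demo meaningful: Lambda allocates CPU in proportion to configured memory, with roughly one full vCPU per 1,769 MB (per AWS Lambda documentation). A tiny illustrative sketch of that proportionality (the class and the rounding formula are my own approximation, not an AWS API):

```java
public class LambdaCpuEstimate {

    // Rough rule of thumb from the AWS Lambda docs: ~1 full vCPU per 1,769 MB
    // of configured memory; a function always gets at least a share of one.
    static int approxVcpus(int memoryMb) {
        return Math.max(1, Math.round(memoryMb / 1769f));
    }

    public static void main(String[] args) {
        System.out.println(approxVcpus(512));   // old setting: a fraction of one vCPU
        System.out.println(approxVcpus(5400));  // new setting: ~3 vCPUs, real parallelism
    }
}
```

Inside the function itself, `Runtime.getRuntime().availableProcessors()` reports the vCPUs actually granted.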

examples/powertools-examples-batch/pom.xml

Lines changed: 11 additions & 0 deletions

```diff
@@ -42,6 +42,17 @@
             <groupId>software.amazon.awssdk</groupId>
             <artifactId>sdk-core</artifactId>
             <version>${sdk.version}</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.slf4j</groupId>
+                    <artifactId>slf4j-api</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>
+            <groupId>software.amazon.awssdk</groupId>
+            <artifactId>s3</artifactId>
+            <version>${sdk.version}</version>
         </dependency>
         <dependency>
             <groupId>software.amazon.awssdk</groupId>
```
examples/powertools-examples-batch/src/main/java/org/demo/batch/sqs/AbstractSqsBatchHandler.java (new file)

Lines changed: 70 additions & 0 deletions

```java
/*
 * Copyright 2024 Amazon.com, Inc. or its affiliates.
 * Licensed under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 * http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.demo.batch.sqs;

import com.amazonaws.services.lambda.runtime.Context;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import org.demo.batch.model.Product;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.lambda.powertools.logging.Logging;
import software.amazon.lambda.powertools.tracing.Tracing;
import software.amazon.lambda.powertools.tracing.TracingUtils;

public class AbstractSqsBatchHandler {
    private static final Logger LOGGER = LoggerFactory.getLogger(AbstractSqsBatchHandler.class);
    private final ObjectMapper mapper = new ObjectMapper();
    private final String bucket = System.getenv("BUCKET");
    private final S3Client s3 = S3Client.builder().httpClient(UrlConnectionHttpClient.create()).build();
    private final Random r = new Random();

    /**
     * Simulate some processing (I/O + S3 put request)
     * @param p deserialized product
     * @param context Lambda context
     */
    @Logging
    @Tracing
    protected void processMessage(Product p, Context context) {
        TracingUtils.putAnnotation("productId", p.getId());
        TracingUtils.putAnnotation("Thread", Thread.currentThread().getName());
        MDC.put("product", String.valueOf(p.getId()));
        LOGGER.info("Processing product {}", p);

        char c = (char) (r.nextInt(26) + 'a');
        char[] chars = new char[1024 * 1000];
        Arrays.fill(chars, c);
        p.setName(new String(chars));
        try {
            File file = new File("/tmp/" + p.getId() + ".json");
            mapper.writeValue(file, p);
            s3.putObject(
                    PutObjectRequest.builder().bucket(bucket).key(p.getId() + ".json").build(),
                    RequestBody.fromFile(file));
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            MDC.remove("product");
        }
    }
}
```
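The `processMessage` above simulates heavy IO by inflating the product name to roughly a megabyte before uploading it to S3. That payload trick, isolated as a stdlib-only sketch (hypothetical class name) runnable without the AWS SDK:

```java
import java.util.Arrays;
import java.util.Random;

public class PayloadSketch {

    // Same trick as AbstractSqsBatchHandler#processMessage: pick one random
    // lowercase letter and repeat it to build a ~1 MB string payload.
    static String randomPayload(Random r) {
        char c = (char) (r.nextInt(26) + 'a');
        char[] chars = new char[1024 * 1000];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String payload = randomPayload(new Random());
        System.out.println(payload.length()); // 1024000 characters, ~1 MB
    }
}
```

The large object is what makes each record take measurable time, so sequential and parallel runs can be compared in the demo's logs and traces.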

examples/powertools-examples-batch/src/main/java/org/demo/batch/sqs/SqsBatchHandler.java

Lines changed: 7 additions & 8 deletions

```diff
@@ -4,13 +4,15 @@
 import com.amazonaws.services.lambda.runtime.RequestHandler;
 import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse;
 import com.amazonaws.services.lambda.runtime.events.SQSEvent;
+import org.demo.batch.model.Product;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import org.demo.batch.model.Product;
 import software.amazon.lambda.powertools.batch.BatchMessageHandlerBuilder;
 import software.amazon.lambda.powertools.batch.handler.BatchMessageHandler;
+import software.amazon.lambda.powertools.logging.Logging;
+import software.amazon.lambda.powertools.tracing.Tracing;
 
-public class SqsBatchHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
+public class SqsBatchHandler extends AbstractSqsBatchHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
     private static final Logger LOGGER = LoggerFactory.getLogger(SqsBatchHandler.class);
     private final BatchMessageHandler<SQSEvent, SQSBatchResponse> handler;
 
@@ -20,14 +22,11 @@ public SqsBatchHandler() {
                 .buildWithMessageHandler(this::processMessage, Product.class);
     }
 
+    @Logging
+    @Tracing
     @Override
     public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
+        LOGGER.info("Processing batch of {} messages", sqsEvent.getRecords().size());
         return handler.processBatch(sqsEvent, context);
     }
-
-
-    private void processMessage(Product p, Context c) {
-        LOGGER.info("Processing product " + p);
-    }
-
 }
```
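With `processMessage` moved into the shared `AbstractSqsBatchHandler`, this handler simply returns `handler.processBatch(sqsEvent, context)`; under `ReportBatchItemFailures`, only the records whose processing threw are reported back for retry. A stdlib-only sketch of that partial-failure contract (all names invented; this is not the Powertools implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PartialBatchSketch {

    // Returns the ids of records whose processing threw, mimicking the
    // "batch item failures" list a handler reports back to the event source.
    static List<String> processBatch(Map<String, String> batch, Consumer<String> processor) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> entry : batch.entrySet()) {
            try {
                processor.accept(entry.getValue());
            } catch (RuntimeException e) {
                failures.add(entry.getKey()); // only these ids get retried
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failed = processBatch(
                Map.of("id-1", "ok", "id-2", "boom", "id-3", "ok"),
                body -> {
                    if (body.equals("boom")) {
                        throw new RuntimeException("processing failed");
                    }
                });
        System.out.println(failed); // [id-2]
    }
}
```

The parallel variant added by this commit keeps the same contract; only the scheduling of `processor.accept` calls changes.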
