Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log Records passed to log processor OnEmit is non-mutable #5219

Closed
MrAlias opened this issue Apr 16, 2024 · 12 comments · Fixed by #5636 · May be fixed by #5469
Closed

Log Records passed to log processor OnEmit is non-mutable #5219

MrAlias opened this issue Apr 16, 2024 · 12 comments · Fixed by #5636 · May be fixed by #5469
Assignees
Labels
area:logs Part of OpenTelemetry logs blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made bug Something isn't working pkg:SDK Related to an SDK package
Milestone

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Apr 16, 2024

Blocked by:

Description

The specification requires that log processors be provided a LogRecordProcessor:

OnEmit

[...]

Parameters:

Our current implementation Defines the Processor interface as such:

type Processor interface {
	// OnEmit is called when a Record is emitted.
	//
	// OnEmit will be called independent of Enabled. Implementations need to
	// validate the arguments themselves before processing.
	//
	// Implementation should not interrupt the record processing
	// if the context is canceled.
	//
	// All retry logic must be contained in this function. The SDK does not
	// implement any retry logic. All errors returned by this function are
	// considered unrecoverable and will be reported to a configured error
	// Handler.
	//
	// Before modifying a Record, the implementation must use Record.Clone
	// to create a copy that shares no state with the original.
	OnEmit(ctx context.Context, record Record) error
/* ... */
}

Any modification made to the record passed to a processor's OnEmit method will not be applied to the subsequent processor. The record is not mutable, it does not comply with the specification requirement.

@MrAlias MrAlias added bug Something isn't working pkg:SDK Related to an SDK package area:logs Part of OpenTelemetry logs labels Apr 16, 2024
@MrAlias MrAlias changed the title Log Records passed to log processors' OnEmit need to mutable Log Records passed to log processor OnEmit is non-mutable Apr 16, 2024
@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

One solution is to have OnEmit accept a reference:

OnEmit(ctx context.Context, record *Record) error

This will mean that every record will need to be allocated to the heap.

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

Another solution is to have OnEmit return a Record

OnEmit(ctx context.Context, record Record) (Record, error)

The returned Record can be the same record passed, or a mutated clone of the Record. That returned value will be passed to the next processor in the SDK's processing chain.

@pellared
Copy link
Member

I could work on this but it would be nice to have #5093 merged first to reduce the amount of conflicts.

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 18, 2024

I could work on this but it would be nice to have #5093 merged first to reduce the amount of conflicts.

Merged

@pellared
Copy link
Member

pellared commented Apr 23, 2024

I remember what was my idea.

It was to use Processor as decorators in order to make modifications. A processor can modify the received record before passing to the next processor:

type mutatingProcessor struct {
  wrapped log.Processor
}

func (mutatingProcessor p) OnEmit(ctx context.Context, record Record) error {
  r.SetBody(log.StringValue("REDACTED"))
  return p.wrapped.OnEmit(ctx, r.Clone())
}

Thanks to it you can have even two "processor" pipelines which do not interfere with each other. So you can have one processor pipeline which e.g. has a processor redacts secrets and wraps a simple processor with stdout exporter and then a second pipeline which uses a batch processor (no redacting) with OTLP exporter:

log.NewLoggerProvider(
  log.WithProcessor(mutatingProcessor{ log.NewSimpleProcessor(stdoutExp) }), // stdout exporter with simple processor, the log records' body are redacted
  log.WithProcessor(log.NewBatchProcessor(otlpExp)), // otlp exporter with a batch processor
)

@MrAlias, can we close this issue? I do not see how this implementation would break the specification and I think it is a safer and more usable design.

@pellared pellared added the question Further information is requested label Apr 23, 2024
@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 23, 2024

No, I don't think we can close the issue. We are not compatible with the specification. The specification states the log records are mutable and that processors are executed in order. The processors after a mutation should receive the mutated record.

@pellared
Copy link
Member

The processors after a mutation should receive the mutated record.

Where is it defined?

What do you think about updating the specification?

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 23, 2024

Where is it defined?

In the definition of a read/write record.

@pellared
Copy link
Member

pellared commented Apr 23, 2024

I think it is "in spirit" of the specification. I would rather ask the Specification SIG if it is an acceptable design and clarify the specification.

@pellared
Copy link
Member

I created open-telemetry/opentelemetry-specification#4010

@pellared pellared added blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made and removed question Further information is requested labels Apr 24, 2024
@pellared
Copy link
Member

pellared commented May 31, 2024

It looks that the TC wants to have a possibility so that the mutated record will be passed to next registered processor without forcing the user to make a decorator.

See: https://github.com/open-telemetry/opentelemetry-specification/pull/4062/files#r1619504122

PS. There is a chance that making OnEmit(ctx context.Context, record *Record) error would not cause any performance overhead in "regular" use case.

I created #5448 that can help us comparing performance consequences between different Processor APIs such us:

  • OnEmit(ctx context.Context, record *Record) error
  • OnEmit(ctx context.Context, record Record) (Record, error)

@pellared
Copy link
Member

pellared added a commit that referenced this issue Jun 3, 2024
Towards #5219

Towards #5054

This benchmarks are supposed
- Validate the the `Processor` interface design from performance
perspective. E.g. they are used to check if a processor that is
modifying a log record is causing an additional heap allocations.
- Benchmark the processors supported by the SDK.

These are "almost-end-to-end" benchmarks (with noopExporter) so that it
checks the performance of the SDK log processing without the actual
exporting part.

```
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
BenchmarkProcessor/Simple-16         	 1990946	       644.6 ns/op	     417 B/op	       1 allocs/op
BenchmarkProcessor/Batch-16          	  835135	      1211 ns/op	     597 B/op	       0 allocs/op
BenchmarkProcessor/ModifyTimestampSimple-16         	 1782510	       644.3 ns/op	     417 B/op	       1 allocs/op
BenchmarkProcessor/ModifyTimestampBatch-16          	  945699	      1222 ns/op	     637 B/op	       0 allocs/op
BenchmarkProcessor/ModifyAttributesSimple-16        	 1570214	       717.6 ns/op	     465 B/op	       2 allocs/op
BenchmarkProcessor/ModifyAttributesBatch-16         	  768399	      1383 ns/op	     653 B/op	       1 allocs/op
```
pellared added a commit that referenced this issue Aug 1, 2024
## What

Change `Processor.OnEmit` methods to accept a record pointer.

## Why

Fixes #5219

This would be specification compliant according to discussions around
open-telemetry/opentelemetry-specification#4067

This is inline of how processors Go span processors works and how log
processors work in other languages.

If the performance (an additional heap allocation during log processing)
would occur to be a significant problem for some users, we have at few
possible solutions:

1. Utilize PGO which may also lead to decreasing heap allocations
(sources:
https://landontclipp.github.io/blog/2023/08/25/profile-guided-optimizations-in-go/#devirtualization,
https://andrewwphillips.github.io/blog/pgo.html). Currently it does not
but I expect it may change in future.
2. Improve the Go compilers escape analysis (related to previous point)
3. introduce new "chaining processor" which can be also desirable in
other languages

## Benchstat

`old` is from `main`.
`new` is from current branch.

`new-pgo` is from current branch with PGO optimization. I first run
benchmarks to generate a CPU profile using `go test -run=^$ -bench=.
-count=10 -cpuprofile default.pgo` and then I rerun the tests with PGO.
Currently, the number of heap allocations is the same.

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/log
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                                  │    old.txt    │                new.txt                │              new-pgo.txt              │
                                  │    sec/op     │    sec/op     vs base                 │    sec/op     vs base                 │
BatchProcessorOnEmit-16              402.7n ± 18%   382.0n ±  7%         ~ (p=0.247 n=10)   376.7n ± 14%         ~ (p=0.210 n=10)
Processor/Simple-16                  350.9n ±  9%   782.5n ±  6%  +123.00% (p=0.000 n=10)   755.1n ±  5%  +115.19% (p=0.000 n=10)
Processor/Batch-16                   1.333µ ± 15%   1.497µ ± 11%   +12.27% (p=0.000 n=10)   1.528µ ±  8%   +14.63% (p=0.000 n=10)
Processor/SetTimestampSimple-16      329.5n ± 15%   711.6n ±  4%  +115.93% (p=0.000 n=10)   721.9n ±  5%  +119.04% (p=0.000 n=10)
Processor/SetTimestampBatch-16       1.163µ ±  2%   1.524µ ±  3%   +31.03% (p=0.000 n=10)   1.461µ ±  5%   +25.57% (p=0.000 n=10)
Processor/AddAttributesSimple-16     408.7n ±  3%   810.1n ±  4%   +98.21% (p=0.000 n=10)   830.1n ±  4%  +103.11% (p=0.000 n=10)
Processor/AddAttributesBatch-16      1.270µ ±  2%   1.623µ ±  4%   +27.71% (p=0.000 n=10)   1.597µ ±  7%   +25.66% (p=0.000 n=10)
Processor/SetAttributesSimple-16     436.2n ± 10%   796.1n ±  3%   +82.50% (p=0.000 n=10)   817.6n ±  4%   +87.43% (p=0.000 n=10)
Processor/SetAttributesBatch-16      1.202µ ±  2%   1.552µ ±  2%   +29.06% (p=0.000 n=10)   1.659µ ± 11%   +37.96% (p=0.000 n=10)
LoggerNewRecord/5_attributes-16      366.6n ±  3%   363.7n ±  7%         ~ (p=0.952 n=10)   426.2n ±  7%   +16.27% (p=0.000 n=10)
LoggerNewRecord/10_attributes-16     1.711µ ±  2%   1.909µ ± 18%   +11.57% (p=0.000 n=10)   2.077µ ± 10%   +21.39% (p=0.000 n=10)
LoggerProviderLogger-16              650.1n ±  4%   690.1n ±  8%    +6.15% (p=0.019 n=10)   737.6n ± 13%   +13.47% (p=0.004 n=10)
WalkAttributes/1_attributes-16       5.264n ± 12%   5.510n ±  8%         ~ (p=0.812 n=10)   5.865n ±  5%   +11.41% (p=0.011 n=10)
WalkAttributes/10_attributes-16      5.440n ±  8%   5.881n ±  7%    +8.12% (p=0.004 n=10)   6.104n ±  7%   +12.21% (p=0.005 n=10)
WalkAttributes/100_attributes-16     5.403n ±  9%   5.894n ±  9%    +9.10% (p=0.029 n=10)   5.783n ±  6%         ~ (p=0.052 n=10)
WalkAttributes/1000_attributes-16    5.196n ±  4%   5.860n ±  8%   +12.79% (p=0.000 n=10)   5.981n ± 13%   +15.13% (p=0.002 n=10)
SetAddAttributes/SetAttributes-16    181.2n ± 14%   208.1n ± 12%   +14.85% (p=0.005 n=10)   209.9n ± 11%   +15.87% (p=0.007 n=10)
SetAddAttributes/AddAttributes-16    156.7n ± 14%   161.1n ± 16%         ~ (p=0.190 n=10)   165.5n ± 15%         ~ (p=0.315 n=10)
SimpleProcessorOnEmit-16            11.775n ± 10%   9.027n ± 17%   -23.33% (p=0.000 n=10)   9.389n ± 18%   -20.26% (p=0.002 n=10)
geomean                              169.1n         209.6n         +23.98%                  215.5n         +27.48%

                        │     old.txt     │               new.txt                │              new-pgo.txt              │
                        │       B/s       │     B/s       vs base                │      B/s       vs base                │
BatchProcessorOnEmit-16   1004.39Mi ± 15%   79.88Mi ± 7%  -92.05% (p=0.000 n=10)   81.06Mi ± 12%  -91.93% (p=0.000 n=10)

                                  │   old.txt    │                new.txt                 │               new-pgo.txt               │
                                  │     B/op     │    B/op      vs base                   │     B/op      vs base                   │
BatchProcessorOnEmit-16             0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
Processor/Simple-16                   0.0 ± 0%      417.0 ± 0%         ? (p=0.000 n=10)      417.0 ±  0%         ? (p=0.000 n=10)
Processor/Batch-16                  621.5 ± 2%     1057.5 ± 1%   +70.15% (p=0.000 n=10)     1064.5 ±  1%   +71.28% (p=0.000 n=10)
Processor/SetTimestampSimple-16       0.0 ± 0%      417.0 ± 0%         ? (p=0.000 n=10)      418.0 ±  0%         ? (p=0.000 n=10)
Processor/SetTimestampBatch-16      626.5 ± 3%     1049.5 ± 1%   +67.52% (p=0.000 n=10)     1057.5 ±  2%   +68.79% (p=0.000 n=10)
Processor/AddAttributesSimple-16      0.0 ± 0%      417.0 ± 0%         ? (p=0.000 n=10)      417.0 ±  0%         ? (p=0.000 n=10)
Processor/AddAttributesBatch-16     616.5 ± 3%     1053.0 ± 2%   +70.80% (p=0.000 n=10)     1048.5 ±  2%   +70.07% (p=0.000 n=10)
Processor/SetAttributesSimple-16    48.00 ± 0%     466.00 ± 0%  +870.83% (p=0.000 n=10)     466.00 ±  0%  +870.83% (p=0.000 n=10)
Processor/SetAttributesBatch-16     648.0 ± 3%     1089.5 ± 1%   +68.13% (p=0.000 n=10)     1087.5 ±  4%   +67.82% (p=0.000 n=10)
LoggerNewRecord/5_attributes-16     0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
LoggerNewRecord/10_attributes-16    610.0 ± 0%      610.0 ± 0%         ~ (p=1.000 n=10) ¹    610.0 ±  0%         ~ (p=1.000 n=10) ¹
LoggerProviderLogger-16             354.5 ± 6%      368.0 ± 7%         ~ (p=0.288 n=10)      391.0 ± 29%         ~ (p=0.239 n=10)
WalkAttributes/1_attributes-16      0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
WalkAttributes/10_attributes-16     0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
WalkAttributes/100_attributes-16    0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
WalkAttributes/1000_attributes-16   0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
SetAddAttributes/SetAttributes-16   48.00 ± 0%      48.00 ± 0%         ~ (p=1.000 n=10) ¹    48.00 ±  0%         ~ (p=1.000 n=10) ¹
SetAddAttributes/AddAttributes-16   0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
SimpleProcessorOnEmit-16            0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹    0.000 ±  0%         ~ (p=1.000 n=10) ¹
geomean                                        ²                ?                       ²                 ?                       ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                  │   old.txt    │                new.txt                │              new-pgo.txt              │
                                  │  allocs/op   │ allocs/op   vs base                   │ allocs/op   vs base                   │
BatchProcessorOnEmit-16             0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
Processor/Simple-16                 0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/Batch-16                  0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/SetTimestampSimple-16     0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/SetTimestampBatch-16      0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/AddAttributesSimple-16    0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/AddAttributesBatch-16     0.000 ± 0%     1.000 ± 0%         ? (p=0.000 n=10)     1.000 ± 0%         ? (p=0.000 n=10)
Processor/SetAttributesSimple-16    1.000 ± 0%     2.000 ± 0%  +100.00% (p=0.000 n=10)     2.000 ± 0%  +100.00% (p=0.000 n=10)
Processor/SetAttributesBatch-16     1.000 ± 0%     2.000 ± 0%  +100.00% (p=0.000 n=10)     2.000 ± 0%  +100.00% (p=0.000 n=10)
LoggerNewRecord/5_attributes-16     0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
LoggerNewRecord/10_attributes-16    4.000 ± 0%     4.000 ± 0%         ~ (p=1.000 n=10) ¹   4.000 ± 0%         ~ (p=1.000 n=10) ¹
LoggerProviderLogger-16             1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹   1.000 ± 0%         ~ (p=1.000 n=10) ¹
WalkAttributes/1_attributes-16      0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
WalkAttributes/10_attributes-16     0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
WalkAttributes/100_attributes-16    0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
WalkAttributes/1000_attributes-16   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
SetAddAttributes/SetAttributes-16   1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹   1.000 ± 0%         ~ (p=1.000 n=10) ¹
SetAddAttributes/AddAttributes-16   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
SimpleProcessorOnEmit-16            0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹   0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                                        ²               ?                       ²               ?                       ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
@MrAlias MrAlias added this to the v1.29.0 milestone Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Part of OpenTelemetry logs blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made bug Something isn't working pkg:SDK Related to an SDK package
Projects
Status: Done
2 participants