java.lang.RuntimeException: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. #96

Open
jay7x opened this issue Jun 12, 2020 · 6 comments
Labels: bug

Comments

jay7x commented Jun 12, 2020

Yet another funny exception I got today (this cluster is different from the one in #95)!
Kafka 2.4.0, kafka-backup built from commit f30b9ad.

[2020-06-12 04:07:31,663] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask:552)
org.apache.kafka.connect.errors.RetriableException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsetsForGroup(OffsetSink.java:60)
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsets(OffsetSink.java:44)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:75)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsetsForGroup(OffsetSink.java:58)
        ... 13 more
Caused by: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
[2020-06-12 04:07:35,521] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Attempt to heartbeat failed since coordinator kafka1.tld:9093 (id: 2147483646 rack: null) is either not started or not valid. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:979)
[2020-06-12 04:07:35,522] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Group coordinator kafka1.tld:9093 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:808)
[2020-06-12 04:07:35,524] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Discovered group coordinator kafka2.tld:9093 (id: 2147483645 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:756)
[2020-06-12 04:07:39,968] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. (org.apache.kafka.connect.runtime.WorkerSinkTask:559)
java.lang.RuntimeException: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:77)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.common.segment.SegmentWriter.append(SegmentWriter.java:72)
        at de.azapps.kafkabackup.common.partition.PartitionWriter.append(PartitionWriter.java:57)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:68)
        ... 11 more
[2020-06-12 04:07:41,021] INFO WorkerSinkTask{id=chrono_prod-backup-sink-0} Committing offsets synchronously using sequence number 2950: [**SKIPPED**]
[2020-06-12 04:07:41,063] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:179)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:561)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:77)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        ... 10 more
Caused by: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.common.segment.SegmentWriter.append(SegmentWriter.java:72)
        at de.azapps.kafkabackup.common.partition.PartitionWriter.append(PartitionWriter.java:57)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:68)
        ... 11 more
[2020-06-12 04:07:41,064] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:180)
[2020-06-12 04:07:41,080] INFO Stopped BackupSinkTask (de.azapps.kafkabackup.sink.BackupSinkTask:139)
[2020-06-12 04:07:41,082] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Revoke previously assigned partitions [**SKIPPED**]
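For context on what this exception means: judging from the message, SegmentWriter refuses to append a record whose offset is at or below the highest offset it has already persisted for that partition. A minimal, self-contained illustration of that kind of guard (my simplification for readers; class and field names here are made up, not the project's actual code):

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an append-only segment that refuses to re-write offsets,
// mirroring the behaviour behind "Trying to override a written record".
class ToySegmentWriter {
    private long lastWrittenOffset = -1;                     // highest Kafka offset persisted so far
    private final List<byte[]> segment = new ArrayList<>();  // stand-in for the segment file on disk

    void append(long kafkaOffset, byte[] value) {
        if (kafkaOffset <= lastWrittenOffset) {
            // Same class of check as SegmentWriter$SegmentException in kafka-backup
            throw new IllegalStateException("Trying to override a written record at offset " + kafkaOffset);
        }
        segment.add(value);
        lastWrittenOffset = kafkaOffset;
    }

    public static void main(String[] args) {
        ToySegmentWriter writer = new ToySegmentWriter();
        writer.append(41, "a".getBytes(StandardCharsets.UTF_8));
        writer.append(42, "b".getBytes(StandardCharsets.UTF_8));
        writer.append(42, "b".getBytes(StandardCharsets.UTF_8)); // a re-delivered record trips the guard
    }
}

So the exception itself only says "this offset was already written"; the interesting question is why the sink is handed the same offsets twice.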

jay7x commented Jun 12, 2020

Fortunately it resolved itself automagically after a restart! 🤞

jay7x commented Jun 18, 2020

JFYI, I hit this again. Monitoring helps to catch it early, but I still have to restart manually :(

itadventurer commented Jun 18, 2020

Hmm… how does monitoring help you to catch this early? Which metrics indicate a failure?

itadventurer reopened this Jun 18, 2020

jay7x commented Jun 19, 2020

It was just the consumer group lag of the backup sink. I haven't had a chance to inject jmx_exporter yet.
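One way to compute that lag outside of JMX, sketched with the plain Kafka clients (the group name and bootstrap address are taken from the logs above; security settings and error handling are omitted, so treat this as a starting point rather than a drop-in tool):

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Rough sketch: report the lag of the backup sink's consumer group by comparing
// its committed offsets with the current log end offsets.
public class BackupSinkLagCheck {
    public static void main(String[] args) throws Exception {
        String bootstrap = "kafka1.tld:9093";              // broker address from the logs above
        String group = "connect-chrono_prod-backup-sink";  // backup sink consumer group

        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);

        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (AdminClient admin = AdminClient.create(adminProps);
             KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {
            // Committed offsets of the backup sink's group vs. the partitions' current end offsets
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets(group).partitionsToOffsetAndMetadata().get();
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(committed.keySet());
            committed.forEach((tp, meta) ->
                    System.out.printf("%s lag=%d%n", tp, endOffsets.get(tp) - meta.offset()));
        }
    }
}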

jay7x commented Jun 21, 2020

Got it again :(

jay7x commented Jun 25, 2020

Today I caught it with a bit more info. It happens after a coordinator change in the Kafka cluster:

[2020-06-25 04:10:54,718] INFO [AdminClient clientId=adminclient-2] Node kafka2.tld:9093 (id: 2 rack: null) is no longer the Coordinator. Retrying with new coordinator. (org.apache.kafka.clients.admin.KafkaAdminClient:2648)
[2020-06-25 04:10:54,727] INFO [AdminClient clientId=adminclient-2] Node kafka2.tld:9093 (id: 2 rack: null) is no longer the Coordinator. Retrying with new coordinator. (org.apache.kafka.clients.admin.KafkaAdminClient:2648)
[2020-06-25 04:10:54,732] INFO [AdminClient clientId=adminclient-2] Node kafka2.tld:9093 (id: 2 rack: null) is no longer the Coordinator. Retrying with new coordinator. (org.apache.kafka.clients.admin.KafkaAdminClient:2648)
[2020-06-25 04:10:55,260] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask:552)
org.apache.kafka.connect.errors.RetriableException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsetsForGroup(OffsetSink.java:60)
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsets(OffsetSink.java:44)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:75)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
        at de.azapps.kafkabackup.common.offset.OffsetSink.syncOffsetsForGroup(OffsetSink.java:58)
        ... 13 more
Caused by: org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
[2020-06-25 04:10:58,202] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. (org.apache.kafka.connect.runtime.WorkerSinkTask:559)
java.lang.RuntimeException: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:77)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.common.segment.SegmentWriter.append(SegmentWriter.java:72)
        at de.azapps.kafkabackup.common.partition.PartitionWriter.append(PartitionWriter.java:57)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:68)
        ... 11 more
[2020-06-25 04:11:00,115] INFO WorkerSinkTask{id=chrono_prod-backup-sink-0} Committing offsets synchronously using sequence number 2935: {[***EDITED***]} (org.apache.kafka.connect.runtime.WorkerSinkTask:331)
[2020-06-25 04:11:00,126] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:179)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:561)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:77)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
        ... 10 more
Caused by: de.azapps.kafkabackup.common.segment.SegmentWriter$SegmentException: Trying to override a written record. There is something terribly wrong in your setup! Please check whether you are trying to override an existing backup
        at de.azapps.kafkabackup.common.segment.SegmentWriter.append(SegmentWriter.java:72)
        at de.azapps.kafkabackup.common.partition.PartitionWriter.append(PartitionWriter.java:57)
        at de.azapps.kafkabackup.sink.BackupSinkTask.put(BackupSinkTask.java:68)
        ... 11 more
[2020-06-25 04:11:00,126] ERROR WorkerSinkTask{id=chrono_prod-backup-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:180)
[2020-06-25 04:11:00,142] INFO Stopped BackupSinkTask (de.azapps.kafkabackup.sink.BackupSinkTask:139)
[2020-06-25 04:11:00,146] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Revoke previously assigned partitions [***EDITED***] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:286)
[2020-06-25 04:11:00,161] INFO [Consumer clientId=connector-consumer-chrono_prod-backup-sink-0, groupId=connect-chrono_prod-backup-sink] Member connector-consumer-chrono_prod-backup-sink-0-9204f74c-0cdd-4880-890d-088650147e2b sending LeaveGroup request to coordinator kafka1.tld:9093 (id: 2147483646 rack: null) due to the consumer is being closed (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:916)
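My reading of the sequence above (an interpretation from the stack traces, not something confirmed in the code): BackupSinkTask.put appends the batch to the segment files first (BackupSinkTask.java:68) and only afterwards syncs the consumer group offsets (BackupSinkTask.java:75). When that offset sync fails with a RetriableException because the group coordinator just moved, Connect retries put() with the same, already-persisted batch, and the second append trips the "Trying to override a written record" guard. A toy, self-contained sketch of that ordering (everything here is simplified and the names are placeholders):

import java.util.List;

// Toy reproduction of the suspected ordering problem: records are persisted
// before the offset sync, so a retriable sync failure makes Connect re-deliver
// a batch that has already been written to the segment.
class ToyBackupSinkTask {
    private long lastWrittenOffset = -1;
    private boolean coordinatorLoading = true;   // simulate one CoordinatorLoadInProgressException

    void put(List<Long> batchOffsets) {
        for (long offset : batchOffsets) {       // step 1: append to the segment (BackupSinkTask.java:68)
            if (offset <= lastWrittenOffset) {
                throw new IllegalStateException("Trying to override a written record at offset " + offset);
            }
            lastWrittenOffset = offset;
        }
        if (coordinatorLoading) {                // step 2: offset sync (BackupSinkTask.java:75)
            coordinatorLoading = false;
            throw new RuntimeException("RetriableException: the coordinator is loading");
        }
    }

    public static void main(String[] args) {
        ToyBackupSinkTask task = new ToyBackupSinkTask();
        List<Long> batch = List.of(100L, 101L, 102L);
        try {
            task.put(batch);                     // appends, then fails on the offset sync
        } catch (RuntimeException retriable) {
            task.put(batch);                     // Connect retries the same batch -> override exception
        }
    }
}

If that reading is right, the exception is the guard doing its job on a re-delivered batch rather than a sign of actual backup corruption.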

itadventurer added the bug label Jun 25, 2020