chain validation and fix command #198
"use_skip_indexes_if_final": "1", | ||
"optimize_move_to_prewhere_if_final": "1", |
just out of curiosity & to learn: why are we considering these two safe for the validation feature?
You mean the async settings? They should not change anything.
These are safe. We are operating on FINAL data, so we want these optimizations.
Walkthrough

This update introduces a new `validate` command.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI (validateCmd)
    participant RPC as RPC Client
    participant Storage
    participant ClickHouse
    participant Validation
    User->>CLI: Run `validate` with batch/fix args
    CLI->>RPC: Initialize RPC connection
    CLI->>Storage: Initialize storage connection
    CLI->>ClickHouse: Connect with optimized settings
    CLI->>Validation: Initialize Cursor (get last scanned/max block)
    loop For each batch in block range
        Validation->>ClickHouse: Remove duplicates in range
        Validation->>ClickHouse: Detect and fix gaps in range
        Validation->>ClickHouse: Fetch block data for range
        Validation->>Validation: Validate blocks (logs bloom, tx root, etc.)
        alt Invalid blocks found
            Validation->>RPC: Repoll invalid blocks
            Validation->>Storage: Stage and commit fixed blocks
        end
        Validation->>Validation: Update cursor with last validated block
    end
    CLI->>User: Output validation/fix results
```
Actionable comments posted: 7
🔭 Outside diff range comments (1)

internal/common/trace.go (1)

60-78: ⚠️ Potential issue: `BlockTimestamp` is never serialized

`TraceModel` includes a `BlockTimestamp` field (uint64) but `Serialize()` omits it. Downstream consumers will receive zero values.

```diff
-    BlockHash: t.BlockHash,
+    BlockHash:      t.BlockHash,
+    BlockTimestamp: uint64(t.BlockTimestamp.Unix()),
```

Please verify the desired unit (seconds vs milliseconds) and update tests accordingly.
♻️ Duplicate comments (2)

internal/validation/cursor.go (1)

37-50: Consider making the cursor file location configurable.

The current implementation writes cursor files to the working directory, which might be problematic in containerized environments with restricted filesystem access.

```diff
+// CursorFilePath is the directory where cursor files are stored
+var CursorFilePath string = "."
+
 func (c *Cursor) Update(blockNumber *big.Int) error {
-    cursorFile := fmt.Sprintf("validation_cursor_%s.json", c.ChainId.String())
+    cursorFile := fmt.Sprintf("%s/validation_cursor_%s.json", CursorFilePath, c.ChainId.String())
```

The same change would need to be applied to the `getLastScannedBlock` function.

internal/validation/validation.go (1)
163-166: `isContractCreation` misses the "empty string" case

A contract-creation tx coming from go-ethereum has `To == nil`, which your DB may serialise as an empty string. The current check only treats the all-zero address as creation:

    isContractCreation := tx.ToAddress == "0x0000…0000" && tx.Data != "0x"

Consider:

```diff
 isContractCreation := (tx.ToAddress == "" ||
-    tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
+    tx.ToAddress == "0x0000000000000000000000000000000000000000") &&
+    len(tx.Data) > 2 // guards against meaningless "0x"
```
🧹 Nitpick comments (5)

cmd/validate.go (2)

78-86: Harden TLS configuration

The explicit `tls.Config{}` inherits Go's default (TLS 1.2 for clients). For a CLI tool that only talks to a modern ClickHouse server you can safely raise the floor:

```diff
-    TLS: &tls.Config{},
+    TLS: &tls.Config{
+        MinVersion: tls.VersionTLS13,
+    },
```

Not critical, but it improves the security posture.
50-52: Off-by-one adjustment is opaque

`batchSize = batchSize - 1` relies on readers knowing the inclusive range semantics later. Consider renaming the variable or adding a short comment to avoid future errors:

    // range end is inclusive, adjust once so we fetch exactly N blocks
    batchSizeInclusive := new(big.Int).Sub(batchSize, big.NewInt(1))

internal/validation/validation.go (1)
46-54: Misleading debug message when printing big.Int values

`log.Debug().Msgf("Processing batch of blocks %d to %d", …)` prints the decimal representation of the pointers, not the block numbers, because `%d` formats a numeric value, not an object implementing `Stringer`. Prefer `%s` (or `%v`) to leverage `(*big.Int).String()`:

```diff
-log.Debug().Msgf("Processing batch of blocks %d to %d", invalidBlocks[i], invalidBlocks[end-1])
+log.Debug().Msgf("Processing batch of blocks %s to %s",
+    invalidBlocks[i].String(), invalidBlocks[end-1].String())
```
internal/validation/db.go (1)

120-124: Potential goroutine leak if one query errors early

Each goroutine writes exactly once to its channel, but if an early `return` occurs before the send, the receiver blocks forever. Although you currently `return` only after sending, future edits could break the guarantee. A safer pattern is to close the channel once the value is sent, or to use a `sync.WaitGroup`. Example with a WaitGroup:

    var (
        blocksResult blockResult
        …
    )
    wg.Add(3)
    go func() { defer wg.Done(); blocksResult.blocks, blocksResult.err = … }()
    …
    wg.Wait()
internal/validation/duplicates.go (1)

161-199: `removeDuplicateTransactions` may exceed ClickHouse parameter limits

The batched deletion is great, but each loop aggregates all hashes in one `IN` clause. If a batch nears 1000 items, ClickHouse can hit the default `max_query_size`. Consider:

- Lowering `batchSize` (e.g., 100).
- Or chunking `hashes` too: execute the same query multiple times with smaller slices.

This keeps memory and network utilisation predictable.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (11)

- cmd/root.go (1 hunks)
- cmd/validate.go (1 hunks)
- go.mod (5 hunks)
- internal/common/trace.go (3 hunks)
- internal/rpc/serializer.go (2 hunks)
- internal/storage/clickhouse.go (4 hunks)
- internal/validation/cursor.go (1 hunks)
- internal/validation/db.go (1 hunks)
- internal/validation/duplicates.go (1 hunks)
- internal/validation/gaps.go (1 hunks)
- internal/validation/validation.go (1 hunks)
🧰 Additional context used

🧬 Code Graph Analysis (1)

internal/validation/cursor.go (1)

- internal/storage/connector.go (1): `IStorage` (67-71)

🪛 ast-grep (0.38.1)

cmd/validate.go

[warning] 71-71: `MinVersion` is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General-purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported and insecure browsers (such as Internet Explorer 10) may it be necessary to enable TLS 1.0. Add `MinVersion: tls.VersionTLS13` to the TLS configuration to bump the minimum version to TLS 1.3.

Context: `tls.Config{}`
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm; [OWASP A03:2017]: Sensitive Data Exposure; [OWASP A02:2021]: Cryptographic Failures
References: https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)

cmd/validate.go

15-15: could not import github.com/thirdweb-dev/indexer/internal/validation:
    internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
    internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
(typecheck)

internal/validation/cursor.go

1-1: could not import github.com/thirdweb-dev/indexer/internal/validation:
    internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
    internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
(typecheck)
🪛 GitHub Check: Test
internal/validation/validation.go
[failure] 56-56:
not enough arguments in call to poller.Poll
internal/validation/gaps.go
[failure] 27-27:
not enough arguments in call to poller.Poll
🪛 GitHub Actions: Go Unit Tests
internal/validation/gaps.go
[error] 27-27: not enough arguments in call to poller.Poll
🔇 Additional comments (12)

internal/validation/cursor.go (3)

12-16: Well-structured cursor type for tracking validation progress.

The `Cursor` struct provides a good foundation for tracking blockchain validation progress, with appropriate fields for maintaining state across validation runs.

18-35: LGTM: good error handling and initialization logic.

The `InitCursor` function properly initializes a cursor by retrieving the last scanned block and ensuring it doesn't exceed the maximum block number. The null check on line 24 for `maxBlockNumber` provides good defensive programming.

52-72: Good error handling for file operations.

The implementation properly handles cases where the cursor file doesn't exist or contains invalid data by defaulting to block 0, allowing the validation to start from the beginning.
go.mod (3)

24-117: Numerous indirect dependency updates.

The PR includes many indirect dependency updates, including new CockroachDB packages, blockchain cryptography libraries, and Go standard library updates. These support the new validation functionality while keeping dependencies current.

7-10: ❓ Verification inconclusive: significant dependency update for the core Ethereum library.

The go-ethereum library has been updated from v1.14.8 to v1.15.11, a substantial version jump. Additionally, `holiman/uint256` has been moved from an indirect to a direct dependency, which aligns with the type changes in the serializer.
🏁 Script executed:

```bash
#!/bin/bash
# Check for breaking changes in the updated go-ethereum version
rg -A 5 -B 5 "github.com/ethereum/go-ethereum" --type go
```

Length of output: 7911

Review go-ethereum v1.15.11 & holiman/uint256 upgrade

The bump from v1.14.8 to v1.15.11 spans multiple core packages and could introduce breaking changes to RPC clients, common types, trie handling, etc. holiman/uint256 was also promoted to a direct dependency to match the updated serializers.

Please verify:

- All imports of `github.com/ethereum/go-ethereum` (shown below) still compile and behave as expected.
- No changed function signatures or removed APIs in v1.15.x affect your code paths.
- The bump in holiman/uint256 aligns with any type conversions in serializers.

Affected files (imports of go-ethereum packages):

- internal/validation/validation.go
- internal/rpc/rpc.go
- internal/rpc/params.go
- internal/rpc/batcher.go
- internal/storage/clickhouse.go
- internal/middleware/authorization.go
- internal/common/transaction.go, abi.go, log.go
- internal/handlers/transactions_handlers.go, logs_handlers.go
- internal/common/log_test.go, transaction_test.go

Actions:

- Review the go-ethereum v1.15.11 release notes for breaking changes.
- Run the full test suite (unit + integration) against a node with v1.15.x.
- Ensure all ABI/account/RPC calls still serialize/deserialize correctly.
3-3: ❌ Incorrect review comment: verify Go version compatibility.

The Go version has been updated to 1.23.0, which appears to be a future version since the current stable version at the time of this review is 1.22.x.

🌐 Web query: What is the latest stable version of Go available in May 2025?

💡 Result: As of May 22, 2025, the latest stable version of Go is 1.24.3, released on April 30, 2025 (go.dev). This release includes security fixes to the `os` package, as well as bug fixes to the runtime, compiler, linker, `go` command, and the `crypto/tls` and `os` packages (golang.es).

Citations:

- 1: https://go.dev/VERSION?utm_source=openai
- 2: https://golang.es/doc/devel/release?utm_source=openai

Incorrect Go version assumption in go.mod

The requirement `go 1.23.0` is not a future release. As of May 22, 2025 the latest stable Go is 1.24.3 (released April 30, 2025), and Go 1.23.0 was already available prior to that. Your `go.mod` entry is valid. If you need fixes or features from the newest release, you may optionally bump to:

```diff
- go 1.23.0
+ go 1.24
```

File: go.mod (line 3)

Likely an incorrect or invalid review comment.
cmd/root.go (1)

201-201: Good integration of the new validation command.

The new `validateCmd` is properly added to the root command, following the same pattern as existing commands.

internal/rpc/serializer.go (2)

382-383: Type optimization for gas values.

Changing from `big.Int` to `uint64` for gas values is a good optimization, as Ethereum gas values typically fit comfortably within the uint64 range, reducing memory usage and improving performance.

402-411: ✅ Verification successful: type change for trace addresses.

The type change from `uint64` to `int64` for trace addresses indicates a design decision to allow for potential negative indices, or to align with other systems' expectations.

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are corresponding changes in other files related to trace address types
rg -A 3 -B 3 "TraceAddress.*\[\]int64" --type go
```

Length of output: 1593

serializeTraceAddress type change is consistent

`serializeTraceAddress` now returns `[]int64`, which matches the `TraceAddress` field defined as `[]int64` in internal/common/trace.go. No further updates are required.

- internal/common/trace.go: `TraceAddress []int64` confirmed.

internal/storage/clickhouse.go (2)
375-377: Gas fields handled correctly after type change – good job

`trace.Gas` and `trace.GasUsed` are now inserted directly as `uint64`, matching the new struct definition and saving an unnecessary allocation/dereference. No further action needed.

936-938: FINAL modifier ensures consistent reads

Adding `FINAL` to the staging-data query is a good call – it prevents race conditions with background merges in ClickHouse. 👍

internal/common/trace.go (1)

16-24: Type migration looks consistent

Switching `TraceAddress` to `[]int64` and gas fields to `uint64` simplifies marshaling and DB inserts. Just ensure callers no longer rely on the signedness of `TraceAddress` (previously `uint64`).
```go
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
    log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData)
}
return nil
```
🛠️ Refactor suggestion

Errors ignored after staging-data cleanup

`DeleteStagingData` errors are logged but not propagated. If the cleanup fails, the validation job silently continues, leaving orphaned rows and possibly breaking idempotency. Consider returning the error to the caller so the outer command can decide whether to abort:

```diff
-if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
-    log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData)
-}
+if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
+    return fmt.Errorf("failed to delete staging data: %w", err)
+}
 return nil
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
    return fmt.Errorf("failed to delete staging data: %w", err)
}
return nil
```
🤖 Prompt for AI Agents

In internal/validation/gaps.go around lines 43 to 46, the error returned by DeleteStagingData is logged but not propagated, causing the validation job to continue silently on failure. Modify the code to return the error from DeleteStagingData instead of just logging it, so the caller can handle the failure appropriately and maintain idempotency.
```go
chainID, _ := uint256.FromBig(tx.ChainId)
gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas)
value, _ := uint256.FromBig(tx.Value)
blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas)
v, _ := uint256.FromBig(tx.V)
r, _ := uint256.FromBig(tx.R)
s, _ := uint256.FromBig(tx.S)
```
💡 Verification agent

❓ Verification inconclusive

Nil-pointer risk when converting big.Int → uint256

`uint256.FromBig` panics if the pointer is nil. Fields such as `MaxPriorityFeePerGas`, `MaxFeePerBlobGas`, `Value`, `R`, `S`, etc. can be NULL in the DB for older blocks or special transactions. Add pre-checks or default zero values to avoid runtime panics:

```go
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

…and use `safeUint256()` at each conversion point.

Also applies to: 332-338
Handle nil big.Int inputs to uint256.FromBig to prevent panics

The fields `tx.MaxPriorityFeePerGas`, `tx.MaxFeePerBlobGas`, `tx.Value`, `tx.R`, `tx.S`, etc. may be nil (e.g. legacy or special transactions). Calling `uint256.FromBig(nil)` will panic. Introduce a helper in internal/validation/validation.go:

```go
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

And replace the direct calls at both locations:

- Lines 286–294
- Lines 332–338

Example diff:

```diff
-chainID, _ := uint256.FromBig(tx.ChainId)
+chainID := safeUint256(tx.ChainId)
-gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
+gasTipCap := safeUint256(tx.MaxPriorityFeePerGas)
```

...and similarly for MaxFeePerGas, Value, MaxFeePerBlobGas, V, R, S.
🤖 Prompt for AI Agents
In internal/validation/validation.go around lines 286 to 294, the code calls
uint256.FromBig on fields like tx.MaxPriorityFeePerGas and tx.Value without
checking for nil, which can cause panics if these fields are nil. To fix this,
define a helper function safeUint256 that returns a zero uint256.Int if the
input *big.Int is nil, otherwise calls uint256.FromBig. Replace all direct calls
to uint256.FromBig with safeUint256 in this block and also in lines 332 to 338
to safely handle nil inputs and prevent runtime panics.
```go
query := fmt.Sprintf(`WITH
    to_be_inserted AS (
        SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
            data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign
        FROM default.logs FINAL
        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
    )
    INSERT INTO logs (
        chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
        data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign
    ) SELECT * from to_be_inserted
`, strings.Join(tuples, ","))

err := conn.Exec(context.Background(), query, chainId, blockNumbers)
if err != nil {
```
🛠️ Refactor suggestion

String-built `IN` clause risks gigantic queries & SQL injection

`fmt.Sprintf` embeds a comma-separated list of tuples directly into the SQL string. Drawbacks:

- Huge `duplicateLogs` sets can blow the server's max query length.
- Although values come from the DB, handcrafted strings bypass ClickHouse query-parameter sanitisation.

Prefer passing an array parameter to the driver:

```diff
-    WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
+    WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (?)
```

Then supply `[]tuple` (ClickHouse supports arrays of tuples) as the third parameter. If tuple arrays are awkward, batch the deletion with smaller, parameterised queries like you did for transactions.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 218 to 232, the SQL query uses
fmt.Sprintf to embed a comma-separated list of tuples directly into the IN
clause, which risks creating excessively large queries and potential SQL
injection issues. To fix this, refactor the query to use parameterized array
inputs for the tuples instead of string-building the IN clause. Pass the tuples
as an array parameter to the query execution, leveraging ClickHouse's support
for arrays of tuples, or alternatively, batch the inserts with smaller
parameterized queries to avoid oversized queries and ensure proper sanitization.
Actionable comments posted: 5
♻️ Duplicate comments (3)

internal/validation/validation.go (2)

286-295: ⚠️ Potential issue: handle nil big.Int inputs to uint256.FromBig to prevent panics

The code doesn't check for nil values before converting `big.Int` to `uint256.Int`, which could cause panics for older blocks or special transactions where some fields may be nil. Implement a helper function at the beginning of the file:

```go
// safeUint256 converts a *big.Int to *uint256.Int safely, handling nil values
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

Then replace all the conversions:

```diff
-chainID, _ := uint256.FromBig(tx.ChainId)
-gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
-gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas)
-value, _ := uint256.FromBig(tx.Value)
-blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas)
-v, _ := uint256.FromBig(tx.V)
-r, _ := uint256.FromBig(tx.R)
-s, _ := uint256.FromBig(tx.S)
+chainID := safeUint256(tx.ChainId)
+gasTipCap := safeUint256(tx.MaxPriorityFeePerGas)
+gasFeeCap := safeUint256(tx.MaxFeePerGas)
+value := safeUint256(tx.Value)
+blobFeeCap := safeUint256(tx.MaxFeePerBlobGas)
+v := safeUint256(tx.V)
+r := safeUint256(tx.R)
+s := safeUint256(tx.S)
```

Also apply the same pattern to lines 332-338.
56-58: ⚠️ Potential issue: check error return from poller.Poll

The `Poll` method likely returns an error that is not being checked. This could silently ignore failures in the polling process.

```diff
 poller := orchestrator.NewBoundlessPoller(rpcClient, s)
-poller.Poll(context.Background(), batch)
+if err := poller.Poll(context.Background(), batch); err != nil {
+    return fmt.Errorf("failed to poll blocks %v: %w", batch, err)
+}
 log.Debug().Msgf("Batch of invalid blocks polled: %d to %d", invalidBlocks[i], invalidBlocks[end-1])
```
internal/validation/duplicates.go (1)

213-229: 🛠️ Refactor suggestion: string-built `IN` clause risks gigantic queries & SQL injection

Building the `IN` clause directly with `fmt.Sprintf` could lead to oversized queries if there are many tuples, and potentially allows SQL injection vulnerabilities even though the values come from the database. Consider using ClickHouse array parameters instead:

```diff
-tuples := make([]string, 0, len(batch))
+txHashes := make([]string, 0, len(batch))
+logIndices := make([]uint64, 0, len(batch))
 for _, duplicateLog := range batch {
     blockNumbers = append(blockNumbers, duplicateLog.BlockNumber)
-    tuples = append(tuples, fmt.Sprintf("('%s', %d)", duplicateLog.TxHash, duplicateLog.LogIndex))
+    txHashes = append(txHashes, duplicateLog.TxHash)
+    logIndices = append(logIndices, duplicateLog.LogIndex)
 }

-query := fmt.Sprintf(`WITH
+query := `WITH
     to_be_inserted AS (
         SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
             data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign
         FROM default.logs FINAL
-        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
+        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN ZIP(?, ?)
     )
     INSERT INTO logs (
         chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
         data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign
     ) SELECT * from to_be_inserted
-`, strings.Join(tuples, ","))
+`

-err := conn.Exec(context.Background(), query, chainId, blockNumbers)
+err := conn.Exec(context.Background(), query, chainId, blockNumbers, txHashes, logIndices)
```
🧹 Nitpick comments (3)

cmd/validate.go (1)

19-26: Update the TBD command documentation

The command's `Short` and `Long` descriptions are set to "TBD", which should be updated with meaningful descriptions to help users understand the purpose and functionality of the validate command.

```diff
 var (
     validateCmd = &cobra.Command{
-        Use:   "validate",
-        Short: "TBD",
-        Long:  "TBD",
+        Use:   "validate [batchSize] [fixBatchSize]",
+        Short: "Validates and fixes blockchain data integrity",
+        Long:  "Validates blockchain data integrity by detecting and removing duplicates, filling gaps, and verifying block data. Takes optional batch size (default: 1000) and fix batch size (default: 0) arguments.",
         Run: func(cmd *cobra.Command, args []string) {
             RunValidate(cmd, args)
         },
     }
 )
```
internal/validation/validation.go (2)

164-166: Better contract creation detection

The contract creation detection logic relies on checking for the zero address string and non-empty data, which could be error-prone. Consider using a more robust approach:

```diff
-isContractCreation := tx.ToAddress == "0x0000000000000000000000000000000000000000" && tx.Data != "0x"
+// Contract creation transactions have an empty "to" field, but in the database they might be
+// stored with the zero address. Check for both empty address and zero address.
+isContractCreation := (tx.ToAddress == "" || tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
```
98-104: Add comments explaining transaction type checking

The code skips transaction root validation for transaction types > 4, but doesn't explain why. Add comments to clarify which transaction types are supported and why others are skipped:

```diff
 // TODO: remove this once we know how to validate all tx types
 for _, tx := range blockData.Transactions {
-    if tx.TransactionType > 4 { // Currently supported types are 0-4
+    // Currently supported transaction types:
+    //   0: Legacy
+    //   1: Access List (EIP-2930)
+    //   2: Dynamic Fee (EIP-1559)
+    //   3: Blob (EIP-4844)
+    //   4: Set Code (EIP-7702)
+    // Skip validation for newer, unsupported transaction types
+    if tx.TransactionType > 4 {
         log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType)
         return nil
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (11)

- cmd/root.go (1 hunks)
- cmd/validate.go (1 hunks)
- go.mod (5 hunks)
- internal/common/trace.go (3 hunks)
- internal/rpc/serializer.go (2 hunks)
- internal/storage/clickhouse.go (4 hunks)
- internal/validation/cursor.go (1 hunks)
- internal/validation/db.go (1 hunks)
- internal/validation/duplicates.go (1 hunks)
- internal/validation/gaps.go (1 hunks)
- internal/validation/validation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- cmd/root.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- go.mod
- internal/validation/db.go
🧰 Additional context used

🪛 ast-grep (0.38.1)

cmd/validate.go

[warning] 71-71: `MinVersion` is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General-purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported and insecure browsers (such as Internet Explorer 10) may it be necessary to enable TLS 1.0. Add `MinVersion: tls.VersionTLS13` to the TLS configuration to bump the minimum version to TLS 1.3.

Context: `tls.Config{}`
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm; [OWASP A03:2017]: Sensitive Data Exposure; [OWASP A02:2021]: Cryptographic Failures
References: https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)

internal/validation/duplicates.go

33-33: Error return value is not checked (errcheck)
44-44: Error return value is not checked (errcheck)
55-55: Error return value is not checked (errcheck)
```go
conn, err := clickhouse.Open(&clickhouse.Options{
    Addr:     []string{fmt.Sprintf("%s:%d", config.Cfg.Storage.Main.Clickhouse.Host, config.Cfg.Storage.Main.Clickhouse.Port)},
    Protocol: clickhouse.Native,
    TLS:      &tls.Config{},
    Auth: clickhouse.Auth{
        Username: config.Cfg.Storage.Main.Clickhouse.Username,
        Password: config.Cfg.Storage.Main.Clickhouse.Password,
    },
    Settings: func() clickhouse.Settings {
        settings := clickhouse.Settings{
            "do_not_merge_across_partitions_select_final": "1",
            "use_skip_indexes_if_final":                   "1",
            "optimize_move_to_prewhere_if_final":          "1",
            "async_insert":                                "1",
            "wait_for_async_insert":                       "1",
        }
        return settings
    }(),
})
```
Set TLS MinVersion for better security

The TLS configuration is missing a `MinVersion` setting, which could allow connections with insecure TLS protocol versions.

```diff
 TLS: &tls.Config{
+    MinVersion: tls.VersionTLS12,
 },
```

Ideally, use TLS 1.3 if your server supports it:

```diff
 TLS: &tls.Config{
+    MinVersion: tls.VersionTLS13,
 },
```
🤖 Prompt for AI Agents
In cmd/validate.go around lines 69 to 87, the TLS configuration for the
Clickhouse connection lacks a MinVersion setting, which may allow insecure TLS
versions. Update the tls.Config struct to include MinVersion set to
tls.VersionTLS13 to enforce using TLS 1.3 if supported by the server, enhancing
connection security.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error {
	chainId := rpcClient.GetChainID()
	err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to find and fix duplicates")
	}

	err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to find and fix gaps")
	}

	err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to validate and fix blocks")
	}

	log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock)
	return nil
}
🛠️ Refactor suggestion

Improve error handling in validateAndFixRange

The function calls log.Fatal(), which terminates the program immediately without allowing cleanup or graceful shutdown. Consider returning errors instead and handling them at a higher level.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error {
chainId := rpcClient.GetChainID()
err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix duplicates")
+ return fmt.Errorf("failed to find and fix duplicates: %w", err)
}
err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix gaps")
+ return fmt.Errorf("failed to find and fix gaps: %w", err)
}
err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to validate and fix blocks")
+ return fmt.Errorf("failed to validate and fix blocks: %w", err)
}
log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock)
return nil
}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cmd/validate.go around lines 115 to 133, the function validateAndFixRange
uses log.Fatal() on errors, which abruptly terminates the program. Modify the
function to return errors instead of calling log.Fatal(), allowing the caller to
handle errors gracefully and perform any necessary cleanup or shutdown
procedures.
removeDuplicateLogs(conn, chainId, duplicateLogs)
}
Handle error return value from removeDuplicateLogs

The error return value from removeDuplicateLogs is not checked, which could silently ignore failures in the duplicate log removal process.
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs)
-removeDuplicateLogs(conn, chainId, duplicateLogs)
+if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil {
+ return fmt.Errorf("failed to remove duplicate logs: %w", err)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs)
if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil {
	return fmt.Errorf("failed to remove duplicate logs: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 55 to 56, the call to
removeDuplicateLogs returns an error that is currently not checked. Modify the
code to capture the error returned by removeDuplicateLogs and handle it
appropriately, such as logging the error or returning it up the call stack, to
ensure failures in the duplicate log removal process are not silently ignored.
removeDuplicateTransactions(conn, chainId, duplicateTransactions)
}
Handle error return value from removeDuplicateTransactions

The error return value from removeDuplicateTransactions is not checked, which could silently ignore failures in the duplicate transaction removal process.
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions)
-removeDuplicateTransactions(conn, chainId, duplicateTransactions)
+if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil {
+ return fmt.Errorf("failed to remove duplicate transactions: %w", err)
+}
📝 Committable suggestion
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions)
if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil {
	return fmt.Errorf("failed to remove duplicate transactions: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 44 to 45, the call to
removeDuplicateTransactions returns an error that is currently ignored. Modify
the code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in removing
duplicate transactions are not silently ignored.
removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers)
}
Handle error return value from removeDuplicateBlocks

The error return value from removeDuplicateBlocks is not checked, which could silently ignore failures in the duplicate block removal process.
log.Debug().Msgf("Found %d duplicate blocks in range %v-%v: %v", len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers)
-removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers)
+if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil {
+ return fmt.Errorf("failed to remove duplicate blocks: %w", err)
+}
📝 Committable suggestion
log.Debug().Msgf(
	"Found %d duplicate blocks in range %v-%v: %v",
	len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers,
)
if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil {
	return fmt.Errorf("failed to remove duplicate blocks: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 33 to 34, the call to
removeDuplicateBlocks returns an error that is currently ignored. Modify the
code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in duplicate block
removal are not silently ignored.
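The three errcheck findings above share a single fix: capture the returned error and propagate it. A compilable sketch, with a hypothetical stand-in for the real ClickHouse delete helper:

```go
package main

import (
	"errors"
	"fmt"
)

// removeDuplicateBlocks is a stand-in for the real ClickHouse delete helper.
func removeDuplicateBlocks(blockNumbers []uint64) error {
	if len(blockNumbers) == 0 {
		return errors.New("no block numbers given")
	}
	return nil // deletion succeeded
}

// findAndRemoveDuplicates checks the helper's error instead of dropping it,
// so a failed deletion surfaces to the caller rather than passing silently.
func findAndRemoveDuplicates(blockNumbers []uint64) error {
	if err := removeDuplicateBlocks(blockNumbers); err != nil {
		return fmt.Errorf("failed to remove duplicate blocks: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(findAndRemoveDuplicates([]uint64{42})) // <nil>
	fmt.Println(findAndRemoveDuplicates(nil))
}
```

The same `if err := …; err != nil` shape applies verbatim to removeDuplicateTransactions and removeDuplicateLogs, which is why golangci-lint reports all three call sites under the one errcheck rule.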
TL;DR

Added a new validate command to the indexer that verifies and fixes data integrity issues in the blockchain data.

What changed?

Added a validate command to the CLI that performs data validation and repair:
- FindAndRemoveDuplicates: identifies and removes duplicate blocks, transactions, and logs
- FindAndFixGaps: detects and fills in missing blocks in a specified range
- ValidateAndFixBlocks: verifies block integrity by checking transaction counts, logs bloom filters, and transaction roots

How to test?

Run the validate command with optional batch size and fix batch size parameters:

Why make this change?

This validation tool helps maintain data integrity in the blockchain indexer by:
These validation capabilities are essential for ensuring the reliability and accuracy of the indexed blockchain data.
Summary by CodeRabbit
New Features
Bug Fixes
Dependency Updates