chain validation and fix command #198
"use_skip_indexes_if_final": "1", | ||
"optimize_move_to_prewhere_if_final": "1", |
just out of curiosity & to learn: why are we considering these two safe for the validation feature?
You mean the async settings? They should not change anything.
These are safe. We are operating on FINAL data, so we want these optimizations.
Walkthrough

This update introduces a new `validate` command.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI (validateCmd)
    participant RPC as RPC Client
    participant Storage
    participant ClickHouse
    participant Validation
    User->>CLI: Run `validate` with batch/fix args
    CLI->>RPC: Initialize RPC connection
    CLI->>Storage: Initialize storage connection
    CLI->>ClickHouse: Connect with optimized settings
    CLI->>Validation: Initialize Cursor (get last scanned/max block)
    loop For each batch in block range
        Validation->>ClickHouse: Remove duplicates in range
        Validation->>ClickHouse: Detect and fix gaps in range
        Validation->>ClickHouse: Fetch block data for range
        Validation->>Validation: Validate blocks (logs bloom, tx root, etc.)
        alt Invalid blocks found
            Validation->>RPC: Repoll invalid blocks
            Validation->>Storage: Stage and commit fixed blocks
        end
        Validation->>Validation: Update cursor with last validated block
    end
    CLI->>User: Output validation/fix results
```
Actionable comments posted: 7
🔭 Outside diff range comments (1)

internal/common/trace.go (1)

60-78: ⚠️ Potential issue: `BlockTimestamp` is never serialized

`TraceModel` includes a `BlockTimestamp` field (uint64) but `Serialize()` omits it. Downstream consumers will receive zero values.

```diff
-    BlockHash: t.BlockHash,
+    BlockHash:      t.BlockHash,
+    BlockTimestamp: uint64(t.BlockTimestamp.Unix()),
```

Please verify the desired unit (seconds vs milliseconds) and update tests accordingly.
♻️ Duplicate comments (2)

internal/validation/cursor.go (1)

37-50: Consider making the cursor file location configurable.

The current implementation writes cursor files to the working directory, which might be problematic in containerized environments with restricted filesystem access.

```diff
+// CursorFilePath is the directory where cursor files are stored
+var CursorFilePath string = "."
+
 func (c *Cursor) Update(blockNumber *big.Int) error {
-    cursorFile := fmt.Sprintf("validation_cursor_%s.json", c.ChainId.String())
+    cursorFile := fmt.Sprintf("%s/validation_cursor_%s.json", CursorFilePath, c.ChainId.String())
```

The same change would need to be applied to the `getLastScannedBlock` function.

internal/validation/validation.go (1)
163-166: `isContractCreation` misses the "empty string" case

A contract-creation tx coming from go-ethereum has `To == nil`, which your DB may serialise as an empty string. The current check only treats the all-zero address as creation:

    isContractCreation := tx.ToAddress == "0x0000…0000" && tx.Data != "0x"

Consider:

```diff
 isContractCreation := (tx.ToAddress == "" ||
-    tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
+    tx.ToAddress == "0x0000000000000000000000000000000000000000") &&
+    len(tx.Data) > 2 // guards against meaningless "0x"
```
🧹 Nitpick comments (5)

cmd/validate.go (2)

78-86: Harden TLS configuration

The explicit `tls.Config{}` inherits Go's default (TLS 1.2 for clients). For a CLI tool that only talks to a modern ClickHouse server you can safely raise the floor:

```diff
-    TLS: &tls.Config{},
+    TLS: &tls.Config{
+        MinVersion: tls.VersionTLS13,
+    },
```

Not critical, but it improves the security posture.
50-52: Off-by-one adjustment is opaque

`batchSize = batchSize - 1` relies on readers knowing the inclusive range semantics later. Consider renaming the variable or adding a short comment to avoid future errors:

    // range end is inclusive, adjust once so we fetch exactly N blocks
    batchSizeInclusive := new(big.Int).Sub(batchSize, big.NewInt(1))

internal/validation/validation.go (1)
46-54: Misleading debug message when printing big.Int values

`log.Debug().Msgf("Processing batch of blocks %d to %d", …)` prints the decimal representation of the pointers, not the block numbers, because `%d` formats a numeric value, not an object implementing `Stringer`. Prefer `%s` (or `%v`) to leverage `(*big.Int).String()`:

```diff
-log.Debug().Msgf("Processing batch of blocks %d to %d", invalidBlocks[i], invalidBlocks[end-1])
+log.Debug().Msgf("Processing batch of blocks %s to %s",
+    invalidBlocks[i].String(), invalidBlocks[end-1].String())
```
internal/validation/db.go (1)

120-124: Potential goroutine leak if one query errors early

Each goroutine writes exactly once to its channel, but if an early `return` occurs before the send, the receiver blocks forever. Although you currently `return` only after sending, future edits could break the guarantee. A safer pattern is to close the channel once the value is sent, or to use a `sync.WaitGroup`. Example with a WaitGroup:

    var (
        blocksResult blockResult
        …
    )
    wg.Add(3)
    go func() { defer wg.Done(); blocksResult.blocks, blocksResult.err = … }()
    …
    wg.Wait()
internal/validation/duplicates.go (1)

161-199: `removeDuplicateTransactions` may exceed ClickHouse parameter limits

The batched deletion is great, but each loop aggregates all hashes in one `IN` clause. If a batch nears 1000 items, ClickHouse can hit the default `max_query_size`. Consider:

- Lowering `batchSize` (e.g., 100).
- Or chunking `hashes` too: execute the same query multiple times with smaller slices.

This keeps memory and network utilisation predictable.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (11)

- cmd/root.go (1 hunks)
- cmd/validate.go (1 hunks)
- go.mod (5 hunks)
- internal/common/trace.go (3 hunks)
- internal/rpc/serializer.go (2 hunks)
- internal/storage/clickhouse.go (4 hunks)
- internal/validation/cursor.go (1 hunks)
- internal/validation/db.go (1 hunks)
- internal/validation/duplicates.go (1 hunks)
- internal/validation/gaps.go (1 hunks)
- internal/validation/validation.go (1 hunks)
🧰 Additional context used

🧬 Code Graph Analysis (1)

internal/validation/cursor.go (1)

- internal/storage/connector.go (1): `IStorage` (67-71)

🪛 ast-grep (0.38.1)

cmd/validate.go

[warning] 71-71: `MinVersion` is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General-purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported and insecure browsers (such as Internet Explorer 10) may it be necessary to enable TLS 1.0. Add `MinVersion: tls.VersionTLS13` to the TLS configuration to bump the minimum version to TLS 1.3.

Context: `tls.Config{}`
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm; [OWASP A03:2017]: Sensitive Data Exposure; [OWASP A02:2021]: Cryptographic Failures
References: https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)

cmd/validate.go

15-15: could not import github.com/thirdweb-dev/indexer/internal/validation:
    internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
    internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
(typecheck)

internal/validation/cursor.go

1-1: could not import github.com/thirdweb-dev/indexer/internal/validation:
    internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
    internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
        have ([]*big.Int)
        want (context.Context, []*big.Int)
(typecheck)
🪛 GitHub Check: Test
internal/validation/validation.go
[failure] 56-56:
not enough arguments in call to poller.Poll
internal/validation/gaps.go
[failure] 27-27:
not enough arguments in call to poller.Poll
🪛 GitHub Actions: Go Unit Tests
internal/validation/gaps.go
[error] 27-27: not enough arguments in call to poller.Poll
🔇 Additional comments (12)

internal/validation/cursor.go (3)

12-16: Well-structured cursor type for tracking validation progress.

The `Cursor` struct provides a good foundation for tracking blockchain validation progress, with appropriate fields for maintaining state across validation runs.

18-35: LGTM: good error handling and initialization logic.

The `InitCursor` function properly initializes a cursor by retrieving the last scanned block and ensuring it doesn't exceed the maximum block number. The null check on line 24 for `maxBlockNumber` provides good defensive programming.

52-72: Good error handling for file operations.

The implementation properly handles cases where the cursor file doesn't exist or contains invalid data by defaulting to block 0, allowing the validation to start from the beginning.
go.mod (3)

24-117: Numerous indirect dependency updates.

The PR includes many indirect dependency updates, including new CockroachDB packages, blockchain cryptography libraries, and Go standard library updates. These support the new validation functionality while keeping dependencies current.

7-10: ❓ Verification inconclusive: significant dependency update for the core Ethereum library.

The go-ethereum library has been updated from v1.14.8 to v1.15.11, a substantial version jump. Additionally, `holiman/uint256` has been moved from an indirect to a direct dependency, which aligns with the type changes in the serializer.
🏁 Script executed:

```bash
#!/bin/bash
# Check for breaking changes in the updated go-ethereum version
rg -A 5 -B 5 "github.com/ethereum/go-ethereum" --type go
```

Length of output: 7911

Review go-ethereum v1.15.11 & holiman/uint256 upgrade

The bump from v1.14.8 to v1.15.11 spans multiple core packages and could introduce breaking changes to RPC clients, common types, trie handling, etc. holiman/uint256 was also promoted to a direct dependency to match the updated serializers.

Please verify:

- All imports of `github.com/ethereum/go-ethereum` (shown below) still compile and behave as expected.
- No changed function signatures or removed APIs in v1.15.x affect your code paths.
- The bump in holiman/uint256 aligns with any type conversions in serializers.

Affected files (imports of go-ethereum packages):

- internal/validation/validation.go
- internal/rpc/rpc.go
- internal/rpc/params.go
- internal/rpc/batcher.go
- internal/storage/clickhouse.go
- internal/middleware/authorization.go
- internal/common/transaction.go, abi.go, log.go
- internal/handlers/transactions_handlers.go, logs_handlers.go
- internal/common/log_test.go, transaction_test.go

Actions:

- Review the go-ethereum v1.15.11 release notes for breaking changes.
- Run the full test suite (unit + integration) against a node with v1.15.x.
- Ensure all ABI/account/RPC calls still serialize/deserialize correctly.
3-3: ❌ Incorrect review comment: verify Go version compatibility.

The Go version has been updated to 1.23.0, which appears to be a future version since the current stable version at the time of this review is 1.22.x.

🌐 Web query: What is the latest stable version of Go available in May 2025?

💡 Result: As of May 22, 2025, the latest stable version of Go is 1.24.3, released on April 30, 2025 (go.dev). This release includes security fixes to the `os` package, as well as bug fixes to the runtime, compiler, linker, `go` command, and the `crypto/tls` and `os` packages (golang.es).

Citations:

- 1: https://go.dev/VERSION?utm_source=openai
- 2: https://golang.es/doc/devel/release?utm_source=openai

Incorrect Go version assumption in go.mod

The requirement `go 1.23.0` is not a future release. As of May 22, 2025 the latest stable Go is 1.24.3 (released April 30, 2025), and Go 1.23.0 was already available prior to that. Your `go.mod` entry is valid. If you need fixes or features from the newest release, you may optionally bump to:

```diff
- go 1.23.0
+ go 1.24
```

File: go.mod (line 3)

Likely an incorrect or invalid review comment.
cmd/root.go (1)

201-201: Good integration of the new validation command.

The new `validateCmd` is properly added to the root command, following the same pattern as existing commands.

internal/rpc/serializer.go (2)

382-383: Type optimization for gas values.

Changing from `big.Int` to `uint64` for gas values is a good optimization, as Ethereum gas values typically fit comfortably within the uint64 range, reducing memory usage and improving performance.

402-411: ✅ Verification successful: type change for trace addresses.

The type change from `uint64` to `int64` for trace addresses indicates a design decision to allow for potential negative indices, or to align with other systems' expectations.

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are corresponding changes in other files related to trace address types
rg -A 3 -B 3 "TraceAddress.*\[\]int64" --type go
```

Length of output: 1593

serializeTraceAddress type change is consistent

`serializeTraceAddress` now returns `[]int64`, which matches the `TraceAddress` field defined as `[]int64` in internal/common/trace.go. No further updates are required.

- internal/common/trace.go: `TraceAddress []int64` confirmed.

internal/storage/clickhouse.go (2)
375-377: Gas fields handled correctly after type change – good job

`trace.Gas` and `trace.GasUsed` are now inserted directly as `uint64`, matching the new struct definition and saving an unnecessary allocation/dereference. No further action needed.

936-938: FINAL modifier ensures consistent reads

Adding `FINAL` to the staging-data query is a good call – it prevents race conditions with background merges in ClickHouse. 👍

internal/common/trace.go (1)

16-24: Type migration looks consistent

Switching `TraceAddress` to `[]int64` and gas fields to `uint64` simplifies marshaling and DB inserts. Just ensure callers no longer rely on the signedness of `TraceAddress` (previously `uint64`).
```go
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
    log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData)
}
return nil
```
🛠️ Refactor suggestion

Errors ignored after staging-data cleanup

`DeleteStagingData` errors are logged but not propagated. If the cleanup fails, the validation job silently continues, leaving orphaned rows and possibly breaking idempotency. Consider returning the error to the caller so the outer command can decide whether to abort:

```diff
-if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
-    log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData)
-}
+if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
+    return fmt.Errorf("failed to delete staging data: %w", err)
+}
 return nil
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
    return fmt.Errorf("failed to delete staging data: %w", err)
}
return nil
```
🤖 Prompt for AI Agents

In internal/validation/gaps.go around lines 43 to 46, the error returned by DeleteStagingData is logged but not propagated, causing the validation job to continue silently on failure. Modify the code to return the error from DeleteStagingData instead of just logging it, so the caller can handle the failure appropriately and maintain idempotency.
```go
chainID, _ := uint256.FromBig(tx.ChainId)
gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas)
value, _ := uint256.FromBig(tx.Value)
blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas)
v, _ := uint256.FromBig(tx.V)
r, _ := uint256.FromBig(tx.R)
s, _ := uint256.FromBig(tx.S)
```
💡 Verification agent

❓ Verification inconclusive

Nil-pointer risk when converting big.Int → uint256

`uint256.FromBig` panics if the pointer is nil. Fields such as `MaxPriorityFeePerGas`, `MaxFeePerBlobGas`, `Value`, `R`, `S`, etc. can be NULL in the DB for older blocks or special transactions. Add pre-checks or default zero values to avoid runtime panics:

```go
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

…and use `safeUint256()` at each conversion point.

Also applies to: 332-338
Handle nil big.Int inputs to uint256.FromBig to prevent panics

The fields `tx.MaxPriorityFeePerGas`, `tx.MaxFeePerBlobGas`, `tx.Value`, `tx.R`, `tx.S`, etc. may be nil (e.g. legacy or special transactions). Calling `uint256.FromBig(nil)` will panic. Introduce a helper in internal/validation/validation.go:

```go
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

And replace the direct calls at both locations:

- Lines 286–294
- Lines 332–338

Example diff:

```diff
-chainID, _ := uint256.FromBig(tx.ChainId)
+chainID := safeUint256(tx.ChainId)
-gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
+gasTipCap := safeUint256(tx.MaxPriorityFeePerGas)
```

...and similarly for MaxFeePerGas, Value, MaxFeePerBlobGas, V, R, S.
🤖 Prompt for AI Agents
In internal/validation/validation.go around lines 286 to 294, the code calls
uint256.FromBig on fields like tx.MaxPriorityFeePerGas and tx.Value without
checking for nil, which can cause panics if these fields are nil. To fix this,
define a helper function safeUint256 that returns a zero uint256.Int if the
input *big.Int is nil, otherwise calls uint256.FromBig. Replace all direct calls
to uint256.FromBig with safeUint256 in this block and also in lines 332 to 338
to safely handle nil inputs and prevent runtime panics.
```go
query := fmt.Sprintf(`WITH
    to_be_inserted AS (
        SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
            data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign
        FROM default.logs FINAL
        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
    )
    INSERT INTO logs (
        chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
        data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign
    ) SELECT * from to_be_inserted
`, strings.Join(tuples, ","))

err := conn.Exec(context.Background(), query, chainId, blockNumbers)
if err != nil {
```
🛠️ Refactor suggestion

String-built `IN` clause risks gigantic queries & SQL injection

`fmt.Sprintf` embeds a comma-separated list of tuples directly into the SQL string. Drawbacks:

- Huge `duplicateLogs` sets can blow the server's max query length.
- Although values come from the DB, handcrafted strings bypass ClickHouse query-parameter sanitisation.

Prefer passing an array parameter to the driver:

```diff
-    WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
+    WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (?)
```

Then supply `[]tuple` (ClickHouse supports arrays of tuples) as the third parameter. If tuple arrays are awkward, batch the deletion with smaller, parameterised queries like you did for transactions.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 218 to 232, the SQL query uses
fmt.Sprintf to embed a comma-separated list of tuples directly into the IN
clause, which risks creating excessively large queries and potential SQL
injection issues. To fix this, refactor the query to use parameterized array
inputs for the tuples instead of string-building the IN clause. Pass the tuples
as an array parameter to the query execution, leveraging ClickHouse's support
for arrays of tuples, or alternatively, batch the inserts with smaller
parameterized queries to avoid oversized queries and ensure proper sanitization.
Actionable comments posted: 5
♻️ Duplicate comments (3)

internal/validation/validation.go (2)

286-295: ⚠️ Potential issue: handle nil big.Int inputs to uint256.FromBig to prevent panics

The code doesn't check for nil values before converting `big.Int` to `uint256.Int`, which could cause panics for older blocks or special transactions where some fields may be nil. Implement a helper function at the beginning of the file:

```go
// safeUint256 converts a *big.Int to *uint256.Int safely, handling nil values
func safeUint256(b *big.Int) *uint256.Int {
    if b == nil {
        return new(uint256.Int)
    }
    out, _ := uint256.FromBig(b)
    return out
}
```

Then replace all the conversions:

```diff
-chainID, _ := uint256.FromBig(tx.ChainId)
-gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
-gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas)
-value, _ := uint256.FromBig(tx.Value)
-blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas)
-v, _ := uint256.FromBig(tx.V)
-r, _ := uint256.FromBig(tx.R)
-s, _ := uint256.FromBig(tx.S)
+chainID := safeUint256(tx.ChainId)
+gasTipCap := safeUint256(tx.MaxPriorityFeePerGas)
+gasFeeCap := safeUint256(tx.MaxFeePerGas)
+value := safeUint256(tx.Value)
+blobFeeCap := safeUint256(tx.MaxFeePerBlobGas)
+v := safeUint256(tx.V)
+r := safeUint256(tx.R)
+s := safeUint256(tx.S)
```

Also apply the same pattern to lines 332-338.
56-58: ⚠️ Potential issue: check error return from poller.Poll

The `Poll` method likely returns an error that is not being checked. This could silently ignore failures in the polling process.

```diff
 poller := orchestrator.NewBoundlessPoller(rpcClient, s)
-poller.Poll(context.Background(), batch)
+if err := poller.Poll(context.Background(), batch); err != nil {
+    return fmt.Errorf("failed to poll blocks %v: %w", batch, err)
+}
 log.Debug().Msgf("Batch of invalid blocks polled: %d to %d", invalidBlocks[i], invalidBlocks[end-1])
```
internal/validation/duplicates.go (1)

213-229: 🛠️ Refactor suggestion: string-built `IN` clause risks gigantic queries & SQL injection

Building the `IN` clause directly with `fmt.Sprintf` could lead to oversized queries if there are many tuples, and potentially allows SQL injection vulnerabilities even though the values come from the database. Consider using ClickHouse array parameters instead:

```diff
-tuples := make([]string, 0, len(batch))
+txHashes := make([]string, 0, len(batch))
+logIndices := make([]uint64, 0, len(batch))
 for _, duplicateLog := range batch {
     blockNumbers = append(blockNumbers, duplicateLog.BlockNumber)
-    tuples = append(tuples, fmt.Sprintf("('%s', %d)", duplicateLog.TxHash, duplicateLog.LogIndex))
+    txHashes = append(txHashes, duplicateLog.TxHash)
+    logIndices = append(logIndices, duplicateLog.LogIndex)
 }

-query := fmt.Sprintf(`WITH
+query := `WITH
     to_be_inserted AS (
         SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
             data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign
         FROM default.logs FINAL
-        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
+        WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN ZIP(?, ?)
     )
     INSERT INTO logs (
         chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address,
         data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign
     ) SELECT * from to_be_inserted
-`, strings.Join(tuples, ","))
+`

-err := conn.Exec(context.Background(), query, chainId, blockNumbers)
+err := conn.Exec(context.Background(), query, chainId, blockNumbers, txHashes, logIndices)
```
🧹 Nitpick comments (3)

cmd/validate.go (1)

19-26: Update the TBD command documentation

The command's `Short` and `Long` descriptions are set to "TBD", which should be updated with meaningful descriptions to help users understand the purpose and functionality of the validate command.

```diff
 var (
     validateCmd = &cobra.Command{
-        Use:   "validate",
-        Short: "TBD",
-        Long:  "TBD",
+        Use:   "validate [batchSize] [fixBatchSize]",
+        Short: "Validates and fixes blockchain data integrity",
+        Long:  "Validates blockchain data integrity by detecting and removing duplicates, filling gaps, and verifying block data. Takes optional batch size (default: 1000) and fix batch size (default: 0) arguments.",
         Run: func(cmd *cobra.Command, args []string) {
             RunValidate(cmd, args)
         },
     }
 )
```
internal/validation/validation.go (2)

164-166: Better contract creation detection

The contract creation detection logic relies on checking for the zero address string and non-empty data, which could be error-prone. Consider using a more robust approach:

```diff
-isContractCreation := tx.ToAddress == "0x0000000000000000000000000000000000000000" && tx.Data != "0x"
+// Contract creation transactions have an empty "to" field, but in the database they might be
+// stored with the zero address. Check for both empty address and zero address.
+isContractCreation := (tx.ToAddress == "" || tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
```
98-104: Add comments explaining transaction type checking

The code skips transaction root validation for transaction types > 4, but doesn't explain why. Add comments to clarify which transaction types are supported and why others are skipped:

```diff
 // TODO: remove this once we know how to validate all tx types
 for _, tx := range blockData.Transactions {
-    if tx.TransactionType > 4 { // Currently supported types are 0-4
+    // Currently supported transaction types:
+    //   0: Legacy
+    //   1: Access List (EIP-2930)
+    //   2: Dynamic Fee (EIP-1559)
+    //   3: Blob (EIP-4844)
+    //   4: Set Code (EIP-7702)
+    // Skip validation for newer, unsupported transaction types
+    if tx.TransactionType > 4 {
         log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType)
         return nil
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (11)

- cmd/root.go (1 hunks)
- cmd/validate.go (1 hunks)
- go.mod (5 hunks)
- internal/common/trace.go (3 hunks)
- internal/rpc/serializer.go (2 hunks)
- internal/storage/clickhouse.go (4 hunks)
- internal/validation/cursor.go (1 hunks)
- internal/validation/db.go (1 hunks)
- internal/validation/duplicates.go (1 hunks)
- internal/validation/gaps.go (1 hunks)
- internal/validation/validation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- cmd/root.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- go.mod
- internal/validation/db.go
🧰 Additional context used

🪛 ast-grep (0.38.1)

cmd/validate.go

[warning] 71-71: `MinVersion` is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General-purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported and insecure browsers (such as Internet Explorer 10) may it be necessary to enable TLS 1.0. Add `MinVersion: tls.VersionTLS13` to the TLS configuration to bump the minimum version to TLS 1.3.

Context: `tls.Config{}`
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm; [OWASP A03:2017]: Sensitive Data Exposure; [OWASP A02:2021]: Cryptographic Failures
References: https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)

internal/validation/duplicates.go

33-33: Error return value is not checked (errcheck)
44-44: Error return value is not checked (errcheck)
55-55: Error return value is not checked (errcheck)
```go
conn, err := clickhouse.Open(&clickhouse.Options{
    Addr:     []string{fmt.Sprintf("%s:%d", config.Cfg.Storage.Main.Clickhouse.Host, config.Cfg.Storage.Main.Clickhouse.Port)},
    Protocol: clickhouse.Native,
    TLS:      &tls.Config{},
    Auth: clickhouse.Auth{
        Username: config.Cfg.Storage.Main.Clickhouse.Username,
        Password: config.Cfg.Storage.Main.Clickhouse.Password,
    },
    Settings: func() clickhouse.Settings {
        settings := clickhouse.Settings{
            "do_not_merge_across_partitions_select_final": "1",
            "use_skip_indexes_if_final":                   "1",
            "optimize_move_to_prewhere_if_final":          "1",
            "async_insert":                                "1",
            "wait_for_async_insert":                       "1",
        }
        return settings
    }(),
})
```
Set TLS MinVersion for better security

The TLS configuration is missing a `MinVersion` setting, which could allow connections with insecure TLS protocol versions.

```diff
 TLS: &tls.Config{
+    MinVersion: tls.VersionTLS12,
 },
```

Ideally, use TLS 1.3 if your server supports it:

```diff
 TLS: &tls.Config{
+    MinVersion: tls.VersionTLS13,
 },
```
🤖 Prompt for AI Agents
In cmd/validate.go around lines 69 to 87, the TLS configuration for the
Clickhouse connection lacks a MinVersion setting, which may allow insecure TLS
versions. Update the tls.Config struct to include MinVersion set to
tls.VersionTLS13 to enforce using TLS 1.3 if supported by the server, enhancing
connection security.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error {
	chainId := rpcClient.GetChainID()
	err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to find and fix duplicates")
	}

	err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to find and fix gaps")
	}

	err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize)
	if err != nil {
		log.Fatal().Err(err).Msg("Failed to validate and fix blocks")
	}

	log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock)
	return nil
}
🛠️ Refactor suggestion

Improve error handling in validateAndFixRange

The function calls log.Fatal(), which terminates the program immediately without allowing cleanup or graceful shutdown. Consider returning errors instead and handling them at a higher level.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error {
chainId := rpcClient.GetChainID()
err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix duplicates")
+ return fmt.Errorf("failed to find and fix duplicates: %w", err)
}
err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix gaps")
+ return fmt.Errorf("failed to find and fix gaps: %w", err)
}
err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to validate and fix blocks")
+ return fmt.Errorf("failed to validate and fix blocks: %w", err)
}
log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock)
return nil
}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cmd/validate.go around lines 115 to 133, the function validateAndFixRange
uses log.Fatal() on errors, which abruptly terminates the program. Modify the
function to return errors instead of calling log.Fatal(), allowing the caller to
handle errors gracefully and perform any necessary cleanup or shutdown
procedures.
removeDuplicateLogs(conn, chainId, duplicateLogs)
}
Handle error return value from removeDuplicateLogs

The error return value from removeDuplicateLogs is not checked, which could silently ignore failures in the duplicate log removal process.
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs)
-removeDuplicateLogs(conn, chainId, duplicateLogs)
+if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil {
+ return fmt.Errorf("failed to remove duplicate logs: %w", err)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs)
if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil {
	return fmt.Errorf("failed to remove duplicate logs: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 55 to 56, the call to
removeDuplicateLogs returns an error that is currently not checked. Modify the
code to capture the error returned by removeDuplicateLogs and handle it
appropriately, such as logging the error or returning it up the call stack, to
ensure failures in the duplicate log removal process are not silently ignored.
removeDuplicateTransactions(conn, chainId, duplicateTransactions)
}
Handle error return value from removeDuplicateTransactions

The error return value from removeDuplicateTransactions is not checked, which could silently ignore failures in the duplicate transaction removal process.
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions)
-removeDuplicateTransactions(conn, chainId, duplicateTransactions)
+if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil {
+ return fmt.Errorf("failed to remove duplicate transactions: %w", err)
+}
📝 Committable suggestion
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions)
if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil {
	return fmt.Errorf("failed to remove duplicate transactions: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 44 to 45, the call to
removeDuplicateTransactions returns an error that is currently ignored. Modify
the code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in removing
duplicate transactions are not silently ignored.
removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers)
}
Handle error return value from removeDuplicateBlocks

The error return value from removeDuplicateBlocks is not checked, which could silently ignore failures in the duplicate block removal process.
log.Debug().Msgf("Found %d duplicate blocks in range %v-%v: %v", len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers)
-removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers)
+if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil {
+ return fmt.Errorf("failed to remove duplicate blocks: %w", err)
+}
📝 Committable suggestion
log.Debug().Msgf(
	"Found %d duplicate blocks in range %v-%v: %v",
	len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers,
)
if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil {
	return fmt.Errorf("failed to remove duplicate blocks: %w", err)
}
}
🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 33 to 34, the call to
removeDuplicateBlocks returns an error that is currently ignored. Modify the
code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in duplicate block
removal are not silently ignored.
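The three errcheck findings above share a single fix: capture the returned error and propagate it. A compilable sketch, with a hypothetical stand-in for the real ClickHouse delete helper:

```go
package main

import (
	"errors"
	"fmt"
)

// removeDuplicateBlocks is a stand-in for the real ClickHouse delete helper.
func removeDuplicateBlocks(blockNumbers []uint64) error {
	if len(blockNumbers) == 0 {
		return errors.New("no block numbers given")
	}
	return nil // deletion succeeded
}

// findAndRemoveDuplicates checks the helper's error instead of dropping it,
// so a failed deletion surfaces to the caller rather than passing silently.
func findAndRemoveDuplicates(blockNumbers []uint64) error {
	if err := removeDuplicateBlocks(blockNumbers); err != nil {
		return fmt.Errorf("failed to remove duplicate blocks: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(findAndRemoveDuplicates([]uint64{42})) // <nil>
	fmt.Println(findAndRemoveDuplicates(nil))
}
```

The same `if err := …; err != nil` shape applies verbatim to removeDuplicateTransactions and removeDuplicateLogs, which is why golangci-lint reports all three call sites under the one errcheck rule.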
TL;DR

Added a new validate command to the indexer that verifies and fixes data integrity issues in the blockchain data.

What changed?

Added a validate command to the CLI that performs data validation and repair:
- FindAndRemoveDuplicates: identifies and removes duplicate blocks, transactions, and logs
- FindAndFixGaps: detects and fills in missing blocks in a specified range
- ValidateAndFixBlocks: verifies block integrity by checking transaction counts, logs bloom filters, and transaction roots

How to test?

Run the validate command with optional batch size and fix batch size parameters:

Why make this change?

This validation tool helps maintain data integrity in the blockchain indexer by:
These validation capabilities are essential for ensuring the reliability and accuracy of the indexed blockchain data.
Summary by CodeRabbit
New Features
Bug Fixes
Dependency Updates