Walkthrough

This pull request enhances transaction broadcasting by introducing improved sequence management and robust error handling. A new function for handling sequence mismatches is added and integrated within the transaction flow. The changes include a new NodeSelector for managing RPC nodes with health checks, per-node sequence tracking across multiple modules, and updated logging. Several timeouts for HTTP, RPC, and gRPC communications have been reduced, and fee calculations have been adjusted. Overall, the update refines node selection, error recovery, and transaction sequence handling across the system.
Sequence Diagram(s)

sequenceDiagram
participant C as Client
participant NS as NodeSelector
participant SM as SequenceManager
participant N as RPC Node
C->>NS: Request node selection
NS-->>C: Return node URL (healthy, least-used)
C->>SM: Get sequence for selected node
SM-->>C: Provide sequence value
C->>N: Send transaction with sequence
alt Transaction fails (sequence mismatch)
N-->>C: Error "sequence mismatch"
C->>SM: Call handleSequenceMismatch with error details
SM-->>C: Update expected sequence & increment sequence
C->>N: Resend transaction with updated sequence
N-->>C: Success response
else Transaction succeeds
N-->>C: Success response with transaction ID
end
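The recovery branch in the diagram depends on pulling the expected sequence number out of the node's error text. A minimal sketch of that parsing step, assuming the Cosmos SDK's "account sequence mismatch, expected N, got M" message format (the helper name is illustrative, not the project's actual function):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// extractExpectedSequence pulls the expected sequence number out of an
// "account sequence mismatch, expected N, got M" error message.
func extractExpectedSequence(errMsg string) (uint64, bool) {
	m := regexp.MustCompile(`expected (\d+)`).FindStringSubmatch(errMsg)
	if len(m) != 2 {
		return 0, false
	}
	seq, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return seq, true
}

func main() {
	seq, ok := extractExpectedSequence("account sequence mismatch, expected 42, got 40: incorrect account sequence")
	fmt.Println(seq, ok) // 42 true
}
```

On success the caller would store the parsed value in the sequence manager and retry the broadcast, as the diagram's alt branch shows.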
Warning: golangci-lint (1.62.2) failed to run. Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24).
Actionable comments posted: 2
🧹 Nitpick comments (13)
lib/peerdiscovery/discovery.go (1)
18-18: Consider the performance implication of reducing the timeout

Reducing DefaultTimeout from 5 seconds to 1 second will improve responsiveness, but it might impact the ability to discover peers in high-latency networks. Ensure this 80% decrease in timeout doesn't negatively affect peer discovery in real-world networks where response times could exceed 1 second.
Making the timeout configurable would be a more robust solution:
```diff
- DefaultTimeout = 1 * time.Second
+ // DefaultTimeout is the default timeout for RPC requests
+ DefaultTimeout = 1 * time.Second
+
+ // TimeoutEnvVar allows overriding the timeout via environment variable
+ TimeoutEnvVar = "PEER_DISCOVERY_TIMEOUT"
```

Then modify the New function:
```go
func New(initialEndpoints []string) *PeerDiscovery {
	ctx, cancel := context.WithCancel(context.Background())

	// Allow timeout to be configured via environment
	discoveryTimeout := DefaultTimeout
	if envTimeout := os.Getenv(TimeoutEnvVar); envTimeout != "" {
		if seconds, err := strconv.Atoi(envTimeout); err == nil && seconds > 0 {
			discoveryTimeout = time.Duration(seconds) * time.Second
			fmt.Printf("Using custom peer discovery timeout: %s\n", discoveryTimeout)
		}
	}

	return &PeerDiscovery{
		initialEndpoints: initialEndpoints,
		visitedNodes:     make(map[string]bool),
		openRPCEndpoints: make([]string, 0),
		semaphore:        make(chan struct{}, MaxConcurrentChecks),
		ctx:              ctx,
		cancel:           cancel,
		timeout:          discoveryTimeout,
	}
}
```

lib/chainregistry/cli.go (1)
372-372: Significantly reduced connection idle timeout

Reducing IdleConnTimeout from 30 seconds to 1 second will cause connections to be closed much faster when not in use. This could increase connection churn if endpoints are accessed frequently, as new connections will need to be established instead of reusing idle ones.
Consider using a more moderate timeout value, especially since this is part of an endpoint verification process where connection reuse might be beneficial:
```diff
- IdleConnTimeout: 1 * time.Second,
+ IdleConnTimeout: 5 * time.Second,
```

broadcast/client.go (2)
55-56: Consider making the 15-second timeout configurable.
This ensures the library can handle varying network conditions while preserving a sane default for typical usage.

```diff
- ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+ configuredTimeout := getBroadcastTimeout() // e.g., from config or environment
+ ctx, cancel := context.WithTimeout(context.Background(), configuredTimeout)
```
66-71: Caution with substring matching on errors.
Relying on partial string matches (e.g., "timed out", "connection refused") can be brittle. Consider unwrapping errors or using structured error handling to avoid false positives.broadcast/grpc.go (3)
42-43: Consider adopting structured logging
Using `fmt.Printf` for logs is workable, but adopting a structured logger (e.g., zap, logrus) can improve clarity and parseability.
60-60: Consider making the 15-second timeout configurable.
This ensures better adaptability for varied network conditions while preserving a sensible default.

```diff
- broadcastCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
+ timeout := getBroadcastTimeoutFromConfig()
+ broadcastCtx, cancel := context.WithTimeout(ctx, timeout)
```
84-85: Consider using structured logging for fee adjustment messages
In production environments, structured logs can facilitate more effective analysis than plain text.broadcast/rpc.go (3)
24-25: Consider making the 15-second timeout configurable.
This can help accommodate various network conditions while retaining a suitable default.

```diff
- ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+ confTimeout := readTimeoutFromConfig()
+ ctx, cancel := context.WithTimeout(context.Background(), confTimeout)
```
42-43: Consider structured logging
Replacing `fmt.Printf` with a structured logger can enhance traceability and debugging.
86-87: Fee adjustment logging
Consider structured logging for better observability and correlation in production.broadcast/broadcast.go (1)
122-175: Potential centralization of node initialization logic.
The node selection and sequence retrieval code is a sizable block. Reusing or refactoring it into a helper could reduce duplication. You might abstract repeated steps into a dedicated function for clarity and maintainability.
broadcast/utils.go (1)
400-410: Consider adding connection pooling configuration.

While the HTTP client configuration is generally good, you might want to add connection pooling parameters to better handle high transaction volumes.
```diff
 client := &http.Client{
 	Transport: &http.Transport{
 		MaxIdleConns:        100,
 		MaxIdleConnsPerHost: 20,
 		IdleConnTimeout:     60 * time.Second,
 		TLSHandshakeTimeout: 5 * time.Second,
 		DisableKeepAlives:   false,
+		MaxConnsPerHost:       50,              // Limit connections per host
+		ResponseHeaderTimeout: 5 * time.Second, // Timeout for response headers
+		ExpectContinueTimeout: 1 * time.Second, // Timeout for expect-continue protocol
 	},
 	Timeout: 5 * time.Second,
 }
```

lib/sequence.go (1)
123-131: Consider adding a retry mechanism for sequence extraction.

When extracting the sequence from error messages, there's no retry mechanism if the extraction fails. Consider adding a fallback approach when the sequence cannot be parsed from the error message.
```diff
 func (sm *SequenceManager) UpdateFromError(address, nodeURL, errMsg string) (uint64, bool) {
 	expectedSeq, err := ExtractExpectedSequence(errMsg)
 	if err != nil {
+		// Fallback: If we can't extract the sequence, try refreshing from chain
+		nodeConfig := types.Config{Nodes: types.Nodes{RPC: []string{nodeURL}}}
+		seq, _, fetchErr := GetAccountInfo(address, nodeConfig)
+		if fetchErr == nil {
+			sm.SetSequence(address, nodeURL, seq)
+			return seq, true
+		}
 		return 0, false
 	}
 	sm.SetSequence(address, nodeURL, expectedSeq)
 	return expectedSeq, true
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
- configurations/atone/logs/meteorite_viz_2025-02-27_20-03-39.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-11-06.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-12-20.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-20-05.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-23-12.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-31-59.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-35-02.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-44-22.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-49-40.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_20-53-24.log is excluded by !**/*.log
- configurations/atone/logs/meteorite_viz_2025-02-27_21-01-35.log is excluded by !**/*.log
📒 Files selected for processing (13)
- broadcast/broadcast.go (4 hunks)
- broadcast/client.go (2 hunks)
- broadcast/grpc.go (6 hunks)
- broadcast/rpc.go (6 hunks)
- broadcast/transaction.go (1 hunks)
- broadcast/utils.go (2 hunks)
- lib/chainregistry/cli.go (1 hunks)
- lib/lib.go (4 hunks)
- lib/peerdiscovery/discovery.go (1 hunks)
- lib/sequence.go (3 hunks)
- lib/visualizer/visualizer.go (1 hunks)
- main.go (4 hunks)
- modes/registry/registry.go (8 hunks)
🔇 Additional comments (62)
lib/visualizer/visualizer.go (1)
316-316: Increased context timeout for mempool data collection

Unlike other timeout reductions in this PR, this change increases the timeout from 2 to 5 seconds. This is a good adjustment, as retrieving mempool data can be time-consuming, especially for nodes with many pending transactions. This helps ensure the operation has enough time to complete.
broadcast/transaction.go (1)
604-604: Details
❓ Verification inconclusive
Major fee calculation change increases fees by 100x
Changing the divisor from 100000 to 1000 will result in transaction fees that are 100 times higher for the same gas limit and gas price. This is a significant change that could impact users' cost of transactions.
Verify this change is intentional and properly communicated to users. If this is intended to make fees more reasonable, consider adding a comment explaining the rationale for the new divisor value, and ensure fee estimation tools are updated accordingly.
🏁 Script executed:
```shell
#!/bin/bash
# Check for references to fee calculation in documentation and tests
# to verify if this change is properly documented and tested

echo "Searching for fee calculation references..."
rg -A 2 -B 2 "fee.*calculation|gas.*fee|transaction.*fee" --type md

echo "Searching for tests involving fee calculation..."
rg -A 5 -B 5 "TestFee|test.*fee|fee.*test" --type go

echo "Checking for fee-related configuration in sample configs..."
rg -A 3 -B 3 "fee|gas" --type json --type yaml
```

Length of output: 506
Critical Fee Calculation Change: Manual Verification Needed
The revised fee calculation at
`broadcast/transaction.go:604` now uses a divisor of 1000 instead of 100000, which dramatically increases transaction fees by 100× compared to previous behavior. Our automated search for fee-related documentation, tests, and configuration files did not reveal any references that either confirm this change or document its rationale.
- Location to verify: broadcast/transaction.go (line 604)

Please manually verify that this change is intentional and that its impact on user fees is clearly communicated. If the intent is indeed to adjust fees to be more reasonable, consider:
- Adding an in-code comment explaining the choice of divisor.
- Updating fee estimation tools and related documentation to reflect the new behavior.
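To make the magnitude of the flagged change concrete, a small illustration of the divisor's effect on the reviewed formula (the gas values are arbitrary examples, not the project's defaults):

```go
package main

import "fmt"

// fee mirrors the reviewed formula: feeAmount := int64(gasLimit) * gasPrice / divisor.
func fee(gasLimit uint64, gasPrice, divisor int64) int64 {
	return int64(gasLimit) * gasPrice / divisor
}

func main() {
	gasLimit, gasPrice := uint64(200000), int64(25)
	fmt.Println(fee(gasLimit, gasPrice, 100000)) // old divisor: 50
	fmt.Println(fee(gasLimit, gasPrice, 1000))   // new divisor: 5000, i.e. 100x higher
}
```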
broadcast/grpc.go (5)
26-30: Use of a shared sequence manager
The code properly obtains a shared sequence manager instance for per-node tracking. Good approach for multi-node concurrency.
32-39: Node-specific sequence usage
This ensures correct sequence usage across multiple nodes, preventing sequence mismatch errors.
116-121: Sequence mismatch handling
Appropriately captures the expected sequence and updates the local manager. Solid approach to ensure consistency.
147-152: Response-level sequence mismatch handling
Good job also checking for mismatches in the final response, ensuring complete coverage.
165-171: Sequence update on success
Incrementing the tracked sequence upon success keeps local state synchronized with the network.

broadcast/rpc.go (5)
28-30: Sequence manager usage
Well done retrieving the manager for per-node sequence tracking.
32-39: Node-specific sequence retrieval
This design avoids sequence collisions when broadcasting to different nodes.
60-60: Broadcast client creation
No issues found here; logic flows correctly.
114-119: Sequence mismatch handling
Excellent approach to updating the local manager with the newly expected sequence to stay synchronized.
152-158: Sequence update on success
Incrementing the sequence ensures consistency between the local state and remote node.lib/lib.go (5)
34-37: Be cautious regarding short idle and TLS handshake timeouts.
Although the code comments say “Increased,” setting these values to 1 second may be too small for certain network conditions and can lead to frequent reconnections or handshake failures. Please confirm that these very short timeouts are intentional and sufficiently tested under real-world or slower network scenarios.
159-159: Validate reduced HTTP request timeout.
Lowering the context from 30 seconds to 5 seconds speeds up error detection but may risk timeouts on slow networks. Please verify whether 5 seconds is acceptable for all usage contexts and doesn’t increase failed requests unnecessarily.
394-394: Verify shorter context timeout for balance checks.
Reducing the timeout from 10 seconds to 5 seconds might cause more aborted requests if the node or network is slow. Confirm that 5 seconds suffices for typical operations.
534-538: Ensure idle connection settings match intended performance.
A 1-second IdleConnTimeout is significantly low, which can hamper reusability of connections. Double-check these settings for busy or higher-latency environments.
544-544: Reevaluate 1-second overall request timeout.
Limiting all HTTP requests to 1 second may cause frequent timeouts. Confirm this is aligned with application needs, especially if remote nodes can exhibit transient delays.
broadcast/broadcast.go (11)
74-109: Check concurrency safety for sequence mismatch handling.
This new function properly extracts the expected sequence, updates the sequence manager, and retries broadcasting. However, if multiple goroutines target the same account simultaneously, confirm that `GetSequenceManager()` and its internal store are safely handling concurrent updates. Consider verifying concurrency via stress tests or audits of the sequence manager for possible data races.
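A data-race-free version of such a per-node store can be sketched as an RWMutex-guarded nested map; the type and method names mirror the review's terminology but are illustrative, not the project's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// SequenceManager guards the nested map address -> nodeURL -> sequence
// with a RWMutex so concurrent goroutines cannot race on updates.
type SequenceManager struct {
	mu        sync.RWMutex
	sequences map[string]map[string]uint64
}

func NewSequenceManager() *SequenceManager {
	return &SequenceManager{sequences: make(map[string]map[string]uint64)}
}

func (sm *SequenceManager) SetSequence(addr, node string, seq uint64) {
	sm.mu.Lock()
	defer sm.mu.Unlock()
	if sm.sequences[addr] == nil {
		sm.sequences[addr] = make(map[string]uint64)
	}
	sm.sequences[addr][node] = seq
}

func (sm *SequenceManager) IncrementSequence(addr, node string) uint64 {
	sm.mu.Lock()
	defer sm.mu.Unlock()
	if sm.sequences[addr] == nil {
		sm.sequences[addr] = make(map[string]uint64)
	}
	sm.sequences[addr][node]++
	return sm.sequences[addr][node]
}

func (sm *SequenceManager) GetSequence(addr, node string) uint64 {
	sm.mu.RLock()
	defer sm.mu.RUnlock()
	return sm.sequences[addr][node]
}

func main() {
	sm := NewSequenceManager()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sm.IncrementSequence("cosmos1abc", "https://rpc.example:26657")
		}()
	}
	wg.Wait()
	fmt.Println(sm.GetSequence("cosmos1abc", "https://rpc.example:26657")) // 100
}
```

Running the demo under `go run -race` is a cheap way to surface the data races the comment asks about.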
177-178: Looks good.
Logging the start of broadcasts with visual debugging is straightforward.
181-190: Node selection logic is clear.
This picks the next node in a round-robin or fallback approach.
220-222: Good handling of sequence mismatch.
Calling the new mismatch handler and updating node sequences is correct.
227-228: Usage recording appears valid.
Recording usage helps the NodeSelector’s balancing logic.
238-240: User feedback for successful transactions is helpful
Provides immediate clarity for logs.
245-246: Node usage metrics updated effectively.
This ensures usage weighting is accurate.
248-251: Incrementing node-specific sequence is correct.
Keeps local tracking consistent with on-chain state.
257-262: Correctly determines highest used sequence.
Makes the final updated sequence reflect the maximum across all nodes.
265-267: Clear node usage logging.
No issues spotted.
269-274: Completion logging and return of updated sequence is well-structured.
No problems found.

main.go (15)
320-330: Good addition of NodeSelector setup.
This helps distribute traffic across multiple healthy nodes.
563-568: Assigning nodes to a NodeSelector for transfers.
This integrates well with the multi-node approach.
569-574: Use of least-used node fallback is consistent.
Ensures effective load distribution with a safe fallback.
575-577: Clear logging of selected node.
Eases troubleshooting.
581-581: Assigning NodeURL for transaction parameters.
Straightforward addition.
595-595: Reduced timeout to 10 seconds.
Shorter contexts detect node issues sooner but be mindful of partial network delays.
600-601: Improved logs for better debug clarity.
No issues found.
613-613: Logs node in error message.
Enhances diagnostic info.
620-622: Sequence mismatch handling.
Updating and retrying the sequence is appropriate.
626-636: Switching to next node on non-sequence errors.
Reduces repeated failures on a single node.
641-641: Clear error log with node context.
No additional concerns.
649-651: Sequence mismatch fallback.
Mirrors the prior logic, looks consistent.
655-668: Retries on different node for errors is robust.
Ensures we don’t get stuck on a failing node.
673-675: Success log details.
Good to capture node used and txhash for auditing.
677-678: Tracks usage metrics again.
Consistent approach across code.

modes/registry/registry.go (6)
787-802: LGTM - Good implementation of node selection for load balancing.

The addition of node selection logic effectively distributes transactions across multiple nodes, improving system resilience and throughput. The fallback mechanism to the first node is a sensible design choice when selection fails.

804-807: LGTM - Improved error reporting with node context.

The enhanced error messages now include which node caused the sequence retrieval failure, making debugging easier.

850-854: LGTM - Node-specific sequence refresh.

The sequence refresh logic now correctly obtains the sequence from the specific node being used for the transaction, improving accuracy in a multi-node environment.

890-893: LGTM - Effective sequence mismatch handling.

The improved sequence mismatch handling now correctly updates the sequence for the specific node where the mismatch occurred.

910-921: LGTM - Smart node switching logic for retries.

The implementation now intelligently switches to a different node on retry attempts when errors occur, which helps avoid repeatedly hitting the same failing node.

877-882: LGTM - Node-specific sequence update.

Successfully updating the sequence in the manager for the specific node used in the transaction ensures proper sequence tracking across multiple nodes.
broadcast/utils.go (7)
147-155: LGTM - Well-structured NodeSelector implementation.

The NodeSelector struct is well-designed with appropriate fields for tracking nodes, their usage, health status, and synchronization mechanisms.
163-168: LGTM - Proper singleton pattern implementation.

Good use of the singleton pattern with sync.Once to ensure only one global NodeSelector instance is created.
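The pattern being praised can be sketched as follows; the NodeSelector fields here are placeholders, not the project's actual struct:

```go
package main

import (
	"fmt"
	"sync"
)

// NodeSelector holds placeholder fields for the singleton sketch.
type NodeSelector struct {
	mu    sync.Mutex
	nodes []string
	usage map[string]int
}

var (
	selectorOnce sync.Once
	selector     *NodeSelector
)

// GetNodeSelector lazily builds the single global instance; sync.Once
// guarantees the initializer runs exactly once even under concurrency.
func GetNodeSelector() *NodeSelector {
	selectorOnce.Do(func() {
		selector = &NodeSelector{usage: make(map[string]int)}
	})
	return selector
}

func main() {
	a := GetNodeSelector()
	b := GetNodeSelector()
	fmt.Println(a == b) // true: every caller sees the same instance
}
```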
171-231: LGTM - Comprehensive node initialization and health checking.

The SetNodes method properly initializes and updates the list of available nodes, performing health checks when needed and filtering out unhealthy nodes.
234-278: LGTM - Smart node selection with health prioritization.

The GetNextNode method intelligently selects nodes based on position while preferring healthy nodes, which helps maintain system reliability.
315-358: LGTM - Effective load balancing with least used node selection.

The GetLeastUsedNode method intelligently selects the node with the lowest usage count, prioritizing healthy nodes but falling back to any node if necessary.
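The selection policy described here — lowest usage among healthy nodes, with a fallback when none are healthy — can be sketched as below; the types are illustrative, not the project's implementation:

```go
package main

import "fmt"

type nodeInfo struct {
	url     string
	usage   int
	healthy bool
}

// leastUsedNode returns the healthy node with the lowest usage count,
// falling back to the least-used node overall when none are healthy.
func leastUsedNode(nodes []nodeInfo) (string, bool) {
	if len(nodes) == 0 {
		return "", false
	}
	best := -1
	for i, n := range nodes {
		if !n.healthy {
			continue
		}
		if best == -1 || n.usage < nodes[best].usage {
			best = i
		}
	}
	if best == -1 { // no healthy node: fall back to least used overall
		best = 0
		for i, n := range nodes {
			if n.usage < nodes[best].usage {
				best = i
			}
		}
	}
	return nodes[best].url, true
}

func main() {
	nodes := []nodeInfo{
		{"https://rpc-a.example", 5, true},
		{"https://rpc-b.example", 2, true},
		{"https://rpc-c.example", 0, false}, // least used but unhealthy
	}
	url, _ := leastUsedNode(nodes)
	fmt.Println(url) // https://rpc-b.example
}
```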
384-474: LGTM - Thorough node health checking implementation.

The CheckNodeHealth function performs comprehensive health verification through multiple checks:
- Testing the status endpoint
- Verifying the node is not still syncing
- Checking block height validity
- Validating mempool accessibility
This ensures only truly operational nodes are marked as healthy.
476-513: LGTM - Efficient concurrent health checking.

The FilterHealthyNodes function intelligently checks node health concurrently, improving efficiency while maintaining proper synchronization with mutexes.
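The concurrent-check-plus-mutex shape described here can be sketched as follows; the probe function stands in for a real /status RPC call, and the names are illustrative:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// filterHealthy runs one goroutine per node and collects passing nodes
// under a mutex, mirroring the WaitGroup-plus-mutex pattern in the review.
func filterHealthy(nodes []string, check func(string) bool) []string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		healthy []string
	)
	for _, n := range nodes {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			if check(url) {
				mu.Lock()
				healthy = append(healthy, url)
				mu.Unlock()
			}
		}(n)
	}
	wg.Wait()
	sort.Strings(healthy) // deterministic order for callers
	return healthy
}

func main() {
	nodes := []string{"https://a.example", "https://b.example", "https://c.example"}
	ok := filterHealthy(nodes, func(u string) bool { return u != "https://b.example" })
	fmt.Println(ok) // [https://a.example https://c.example]
}
```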
lib/sequence.go (6)
16-20: LGTM - Enhanced sequence management with per-node tracking.

The modification to store sequences in a nested map structure (address → node → sequence) is an excellent improvement that allows for properly tracking sequences across multiple nodes for the same address.
41-47: LGTM - Good helper method implementation.

The `createAddressEntryIfNotExists` helper method ensures consistency by properly initializing the nested map structure when needed, reducing code duplication and potential errors.
49-66: LGTM - Improved GetSequence with node-specific logic.

The updated GetSequence method correctly handles node-specific sequence retrieval, with proper initialization and fallback mechanisms.

97-107: LGTM - Node-specific sequence increment.

The IncrementSequence method now correctly increments sequences for specific node-address combinations.

110-119: LGTM - Node-specific sequence setting.

The SetSequence method has been properly updated to set sequences for specific node-address combinations.

159-209: LGTM - Comprehensive sequence prefetching across all nodes.

The PrefetchAllSequences method efficiently fetches sequences from all nodes for all accounts concurrently, with proper error handling and context cancellation support.
```go
if feeAmount < 1 && gasPrice > 0 {
	feeAmount = 1 // Ensure minimum fee for non-zero gas price
}
```
🛠️ Refactor suggestion
Missing validation for extreme gas price and gas limit values
The fee calculation lacks bounds checking for extreme values of gasPrice and gasLimit, which could lead to unexpected behavior with very high values.
Add validation for extreme gas values to avoid unintended fee calculations:
```diff
 feeAmount := int64(gasLimit) * gasPrice / 1000 // Scale to make fees reasonable
+
+// Prevent excessive fees from extreme gas values
+const maxReasonableFee = 1000000000 // Define a reasonable maximum fee
+if feeAmount > maxReasonableFee {
+	fmt.Printf("Warning: Calculated fee is very high: %d. Capping to %d\n",
+		feeAmount, maxReasonableFee)
+	feeAmount = maxReasonableFee
+}
+
 if feeAmount < 1 && gasPrice > 0 {
 	feeAmount = 1 // Ensure minimum fee for non-zero gas price
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
feeAmount := int64(gasLimit) * gasPrice / 1000 // Scale to make fees reasonable

// Prevent excessive fees from extreme gas values
const maxReasonableFee = 1000000000 // Define a reasonable maximum fee
if feeAmount > maxReasonableFee {
	fmt.Printf("Warning: Calculated fee is very high: %d. Capping to %d\n", feeAmount, maxReasonableFee)
	feeAmount = maxReasonableFee
}

if feeAmount < 1 && gasPrice > 0 {
	feeAmount = 1 // Ensure minimum fee for non-zero gas price
}
```
```diff
 // 4. Prepare sequences for all accounts that will send funds
-ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
```
🛠️ Refactor suggestion
Consider increasing the timeout for balance adjustments.
The context timeout has been reduced from 30 seconds to 2 seconds, which might be too short for balance adjustment operations that involve network calls. These operations often require querying account information and might take longer, especially on congested networks.
```diff
-ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
```

📝 Committable suggestion
```go
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
```