- 
                Notifications
    You must be signed in to change notification settings 
- Fork 320
feat: Optimize and fix filtering on toStartOfX primary key expressions #1265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 🦋 Changeset detectedLatest commit: 0646139 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
 Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR | 
| The latest updates on your projects. Learn more about Vercel for GitHub. 
 💡 Enable Vercel Agent with $100 free credit for automated AI reviews | 
| Pull Request ReviewThis PR introduces an optimization for filtering on  ✅ Strengths
 🔍 Code Quality ObservationsStrong Points:
 Minor Suggestions:1. Error handling in  The function logs an error but returns  const columnArgument = args[0];
if (columnArgument == null) {
  console.error(`Failed to parse column argument from ${expr}`);
  return undefined; // Explicit return
}2. Documentation in hook (packages/app/src/hooks/useOptimizedTimestampValueExpression.ts:63) The condition check on line 63 seems to exclude Metric sources from optimization. Consider adding a comment explaining why: // Metrics use a different schema structure that doesn't benefit from this optimization
if (source.kind === SourceKind.Metric)
  return source.timestampValueExpression;3. Test naming (packages/app/src/hooks/tests/useOptimizedTimestampValueExpression.test.tsx:120) The test description says "ungracefully" but it actually handles the case gracefully by returning  it('should return undefined when source is null', () => {4. Regex pattern robustness (packages/common-utils/src/renderChartConfig.ts:487) The regex  🎯 Performance ConsiderationsPositive: 
 Consider: 
 🔒 Security ConcernsNo major security issues identified. The code: 
 📊 Test CoverageExcellent coverage: 
 One suggestion: 📝 Additional Observations
 🎓 Code Style Alignment with CLAUDE.md✅ Follows TypeScript strict typing guidelines 💡 RecommendationLGTM with minor suggestions. This is a well-implemented optimization with excellent test coverage. The minor suggestions above are optional improvements that don't block merging. The PR successfully addresses the performance issue while maintaining code quality and backward compatibility. Great work! 🚀 | 
| E2E Test Results✅ All tests passed • 25 passed • 3 skipped • 224s 
 | 
2e4ca2e    to
    1ca6d15      
    Compare
  
    1ca6d15    to
    39bc709      
    Compare
  
    39bc709    to
    36275d6      
    Compare
  
    | Pull Request Review - HyperDX #1265SummaryThis PR implements an optimization for ClickHouse queries with primary keys that use toStartOfX functions. The changes automatically detect and utilize these primary key expressions to improve query performance, and fix filtering issues with time ranges. Strengths1. Excellent Test Coverage
 2. Well-Designed API
 3. Performance Optimization
 4. Backward Compatibility
 Issues and Concerns1. Potential SQL Injection Risk (HIGH PRIORITY)Location: packages/common-utils/src/renderChartConfig.ts:560, 566 The code directly interpolates the function name and arguments into SQL. While toStartOf.formattedRemainingArgs comes from parsing timestamp expressions, it is directly injected into SQL without sanitization. Recommendation: 
 2. Error Handling Could Be More SpecificLocation: packages/common-utils/src/renderChartConfig.ts:521-523 The catch block uses console.log instead of a proper logger and silently swallows all errors. This makes debugging difficult. Recommendation: Use a proper logger with appropriate level and include context about the failure. 3. Regex Pattern Could Be More RobustLocation: packages/common-utils/src/utils.ts:598 The pattern (toStartOf\w+) matches any word characters, which could match invalid function names like toStartOfXYZ123. Recommendation: Use an explicit allowlist of valid ClickHouse toStartOf functions. 4. Type Safety IssueLocation: packages/common-utils/src/renderChartConfig.ts:511-515 No type guard or validation that primary_key exists and is a string before using it. Recommendation: Add proper type checking before destructuring. 5. Magic Strings in LogicLocation: packages/common-utils/src/utils.ts:645-649 Hard-coded function names like toUnixTimestamp and toDateTime make the code fragile and difficult to maintain. Recommendation: Extract to constants or a configuration object. Testing Recommendations
 Security ReviewOverall Security Posture: MEDIUM RISK The main security concern is the direct string interpolation of parsed timestamp expressions into SQL queries. While the code uses the chSql template tag for parameterization in most places, the formattedRemainingArgs is directly inserted. Action Items: 
 ConclusionThis is a well-implemented feature with excellent test coverage that solves a real performance problem. The core logic is sound and the approach is pragmatic. However, there are security concerns around SQL injection that should be addressed before merging. The error handling and input validation could also be improved. Recommendation: Approve with required changes 
 The test coverage is exemplary and gives high confidence in the correctness of the implementation. | 
36275d6    to
    e983faa      
    Compare
  
    e983faa    to
    711c169      
    Compare
  
    711c169    to
    1e7f9bb      
    Compare
  
    1e7f9bb    to
    7fa8ec0      
    Compare
  
    7fa8ec0    to
    779f63c      
    Compare
  
    | } catch (e) { | ||
| console.log('Failed to optimize timestampValueExpression', e); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clicking through the app, I don't run into this error, and don't expect it to happen, but am adding this try catch here as an extra layer of protection against an error either in getTableMetadata or optimizeTimestampValueExpression, in which case we can always fall back to the raw timestampValueExpression used before
779f63c    to
    0646139      
    Compare
  
    | Code ReviewOverall: Well-structured PR with comprehensive test coverage. The implementation correctly optimizes timestamp filtering for ClickHouse primary keys. Critical Issues: None found Important Notes: 
 Optional Suggestions (non-blocking): 
 Recommendation: Approved - no blocking issues | 
Closes HDX-2576
It is a common optimization to have a primary key like
toStartOfDay(Timestamp), ..., Timestamp. This PR improves the experience when using such a primary key in the following ways:toStartOfDay(Timestamp)andTimestampin this case, instead of justTimestamp. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.toStartOfXfunction to the right-hand-side of timestamp comparisons. So when filtering using an expression liketoStartOfDay(Timestamp), the generated SQL will have the conditiontoStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>). This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute).With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.