Skip to content

Conversation

@honix
Copy link
Owner

@honix honix commented Dec 21, 2025

No description provided.

@claude
Copy link

claude bot commented Dec 21, 2025

Pull Request Review: TSX, CSV formats, agent can create view

Summary

This PR adds significant new functionality: CSV file support, runtime TSX compilation with React views, and collaborative editing via Yjs. The implementation is ambitious and well-structured, with +2124/-125 lines across 20 files. Overall quality is good, but there are several security concerns and architectural considerations that should be addressed.


Critical Security Concerns

1. Code Injection via new Function() (HIGH SEVERITY)

Location: frontend/src/components/views/ViewRuntime.tsx:92-94, 437-438

The use of new Function() to execute user-provided TSX code creates a serious code injection vulnerability. Any user with wiki edit access can execute arbitrary JavaScript in other users' browsers.

Risk: Steal credentials/tokens, make unauthorized API calls, execute XSS attacks, access localStorage/sessionStorage, modify the DOM maliciously

Recommendation:

  • Add server-side validation/sandboxing for TSX content before allowing execution
  • Implement strict Content Security Policy (CSP) headers
  • Consider using a safer alternative like isolated iframes with sandboxed srcdoc
  • Add user permission checks (only trusted users can create .tsx files)
  • Document this as a known security consideration in CLAUDE.md

2. CSV Injection Potential

Location: backend/storage/git_wiki.py:400-408, frontend/src/hooks/useCSV.ts:414-437

CSV parsing does not sanitize values that could contain formula injection attacks. Values starting with =, +, @, - could be interpreted as formulas in Excel/Sheets.

Recommendation:

  • Prefix potentially dangerous values with single quote
  • Add validation to reject/escape formula characters
  • Document CSV security considerations

3. Hardcoded WebSocket URLs

Location: Multiple files (useCSV.ts:160, usePage.ts:143)

WebSocket URLs are hardcoded to ws://localhost:8000/ws/collab which will not work in production and could leak data over unencrypted connections.

Recommendation:

  • Use environment variables or derive from window.location
  • Always use wss:// in production for encryption

Code Quality Issues

4. Missing Error Handling in Async Operations

Location: useCSV.ts:229-239, usePage.ts:210-223

Fetch operations in sync handlers lack proper error boundaries. No error notification to user when page loading fails.

5. Memory Leak Potential

Location: useCSV.ts:56-66, usePage.ts:46-54

Session maps (activeDataSessions, activeTextSessions) grow indefinitely and are never cleaned up automatically.

Recommendation:

  • Implement automatic cleanup after inactivity timeout
  • Add session TTL or LRU eviction
  • Provide explicit cleanup on page/branch navigation

6. Race Condition in Initial Load

Location: useCSV.ts:196-207, usePage.ts:178-189

The 500ms timeout is arbitrary and could cause race conditions. Use provider sync event completion rather than arbitrary timeout.

7. CSV Parser Limitations

Location: useCSV.ts:414-437

The custom CSV parser does not handle multi-line quoted fields properly, different line endings (CRLF vs LF), or edge cases like trailing commas.

Recommendation: Consider using a battle-tested library like papaparse or document limitations.

8. Type Safety Issues

Location: ViewRuntime.tsx:87-101

Loose typing around component compilation. Add runtime type validation for compiled components.


Performance Considerations

9. No Memoization in ViewRuntime

Location: ViewRuntime.tsx:264-487

The scope object is recreated on every render. Memoize the scope object with useMemo to prevent unnecessary recompilations.

10. Unbounded Component Cache

Location: ViewRuntime.tsx:22-27

Component cache has no size limits. Implement cache size limits or LRU eviction.

11. Debounce Configuration

Location: useCSV.ts:14, usePage.ts:14

2-second debounce may be too long for real-time collaboration feel. Make configurable or reduce to 500-1000ms.


Positive Observations

  1. Excellent Documentation: Template examples in backend/templates/agents/ are clear and helpful
  2. Consistent Patterns: useCSV and usePage follow similar patterns, good for maintainability
  3. Real-time Collaboration: Yjs integration is well-implemented with awareness features
  4. Error Boundaries: ViewErrorBoundary provides good user experience
  5. Git Integration: Multi-file-type support is cleanly integrated into existing GitWiki
  6. Component Reusability: useComponent hook enables composition of views
  7. Save Status Tracking: Aggregated save status across multiple useCSV calls is clever

Recommendations Priority

Must Fix Before Merge:

  1. Address code injection security (consider sandboxing or trusted-user-only)
  2. Fix hardcoded WebSocket URLs
  3. Add CSP headers for TSX execution

Should Fix Soon:
4. Improve error handling in async operations
5. Add memory leak prevention (session cleanup)
6. Fix race condition in initial load
7. Add basic test coverage

Nice to Have:
8. Memoize ViewRuntime scope
9. Use robust CSV parser library
10. Add comprehensive test suite
11. Document security model in CLAUDE.md


Overall Assessment

This is a well-architected feature with clean separation of concerns and good code organization. The real-time collaboration and runtime TSX compilation are impressive technical achievements.

However, the security implications are significant. The new Function() execution of user-provided code is a critical vulnerability that must be addressed before this can be safely deployed in any multi-user environment.

Recommendation: Request changes to address security concerns, then approve with minor fixes for quality issues.

Great work overall! The feature set is valuable and the implementation shows strong engineering skills. Just needs security hardening before production use.

@claude
Copy link

claude bot commented Dec 21, 2025

Code Review for PR #42: TSX, CSV formats, agent can create views

Summary

This is a substantial feature addition that introduces interactive data views and CSV editing capabilities to the wiki. The implementation includes TSX runtime compilation, collaborative CSV editing via Yjs, and template examples for agents. Overall, the code is well-structured with good separation of concerns, but there are several areas that need attention.

🔴 Critical Issues

1. Security: Arbitrary Code Execution (High Severity)

Location: frontend/src/components/views/ViewRuntime.tsx:437

The TSX runtime uses new Function() to execute user-provided code:

const fn = new Function(...scopeKeys, wrappedCode);
const result = fn(...scopeValues);

Risk: This allows arbitrary JavaScript execution, which is a severe XSS vulnerability. Malicious TSX files could:

  • Access localStorage/sessionStorage
  • Exfiltrate data to external servers
  • Modify the DOM outside the component
  • Access cookies and authentication tokens

Recommendation:

  • Implement a Content Security Policy (CSP) to restrict script execution
  • Add input validation and sanitization for TSX content
  • Consider using a sandboxed iframe for view execution
  • Add access control to restrict who can create/edit TSX files
  • Document the security model clearly

2. CSV Parsing Error Handling

Location: backend/storage/git_wiki.py:272-280

try:
    reader = csv.DictReader(io.StringIO(raw_content))
    rows = list(reader)
    base_result["headers"] = reader.fieldnames or []
    base_result["rows"] = rows
except Exception:
    # Fallback if CSV parsing fails
    base_result["headers"] = []
    base_result["rows"] = []

Issue: Silent exception swallowing hides actual errors from users and makes debugging difficult.

Recommendation:

except csv.Error as e:
    logger.warning(f"CSV parsing failed for {filepath}: {e}")
    base_result["headers"] = []
    base_result["rows"] = []
    base_result["parse_error"] = str(e)

3. Undefined Variable in git_wiki.py

Location: backend/storage/git_wiki.py:840

return {
    # ... other fields ...
    "order": order,  # ❌ order is not defined in this scope
    "parent_path": parent_path,
    "children": []
}

Impact: This will cause a runtime error when creating folders.

Fix: Remove the "order" field since automatic numbering was removed.

⚠️ High Priority Issues

4. Race Condition in useCSV Hook

Location: frontend/src/hooks/useCSV.ts:196-207

The initial load flag is set after a 500ms timeout, which is arbitrary and could cause race conditions:

setTimeout(() => {
    const session = activeDataSessions.get(pageId);
    if (session) {
        session.isInitialLoad = false;
        session.lastSavedContent = serializeDataToCSV(session.yArray);
    }
}, 500);

Recommendation: Use proper event-based synchronization instead of arbitrary timeouts.

5. Memory Leaks - Component Cache Never Cleaned

Location: frontend/src/components/views/ViewRuntime.tsx:22-27

The componentCache is a module-level Map that grows indefinitely:

const componentCache = new Map<string, {...}>();

Issue: Components are loaded but never removed, causing memory leaks in long-running sessions.

Recommendation: Implement cache eviction (LRU) or reference counting.

6. Hardcoded URLs

Multiple locations have hardcoded localhost:8000 URLs:

  • useCSV.ts:160
  • useCSV.ts:230
  • ViewRuntime.tsx:351

Recommendation: Use environment variables or a central configuration.

7. CSV Escaping Issues

Location: frontend/src/hooks/useCSV.ts:414-437

The CSV parser doesn't handle all edge cases (e.g., newlines in quoted fields, escaped quotes at field boundaries).

Recommendation: Use a well-tested CSV library like papaparse instead of rolling your own parser.

📝 Code Quality Issues

8. Inconsistent Error Handling

The codebase mixes different error handling patterns:

  • Silent catches in git_wiki.py CSV parsing
  • Verbose logging in ViewRuntime
  • Mixed use of try/catch vs error states

Recommendation: Establish consistent error handling patterns.

9. Type Safety Issues

Location: frontend/src/components/views/ViewRuntime.tsx:241-258

Table components accept any for children but should be more specific:

Table: ({ children, className = '' }: { children: React.ReactNode; className?: string })

This is correct, but should use a const type assertion for the components.

10. Missing Input Validation

Location: backend/storage/git_wiki.py:_get_file_type

No validation that the filepath is actually a Path object. Consider adding type guards.

11. No Automated Tests

This is a large feature with complex logic (CSV parsing, TSX compilation, collaborative editing) but I don't see test files for:

  • CSV parsing edge cases
  • TSX compilation errors
  • Collaborative editing conflicts
  • Template initialization

Recommendation: Add comprehensive test coverage, especially for:

  • CSV parsing with malformed input
  • TSX compilation with syntax errors
  • Collaborative editing race conditions
  • Git operations with templates

🎯 Performance Considerations

12. Inefficient Re-renders

Location: frontend/src/hooks/useCSV.ts:278

existingSession.yArray.observeDeep(updateRows);

This triggers on every deep change. For large datasets, this could cause performance issues.

Recommendation: Consider throttling or debouncing the updateRows callback.

13. No Pagination for Large CSVs

The CSV editor loads all rows into memory. This will cause performance issues with large datasets.

Recommendation: Implement virtual scrolling or pagination for CSV files with >1000 rows.

✅ Positive Aspects

  1. Good Architecture: Clean separation between backend (git_wiki) and frontend (hooks, components)
  2. Template System: The template initialization system is well-designed
  3. Collaborative Editing: Yjs integration for real-time collaboration is solid
  4. TypeScript Usage: Good use of types in the frontend
  5. Component Reusability: The useComponent hook is a clever pattern
  6. Documentation: Good inline comments and the data-views.md reference is helpful

🔧 Minor Issues

  1. Magic Numbers: SAVE_DEBOUNCE_MS = 2000 should be configurable
  2. Console.log Pollution: Several console.log statements should use proper logging
  3. Commented Code: Line 277 in ViewRuntime.tsx: void _loadedComponents; is confusing
  4. Inconsistent Naming: Mix of camelCase and snake_case in some areas

📋 Testing Recommendations

Before merging, please add tests for:

  1. Backend:

    • CSV parsing with various edge cases (quotes, commas, newlines)
    • Template initialization and overwriting
    • File type detection
    • Multi-file type support in search/glob
  2. Frontend:

    • TSX compilation errors
    • Component cache management
    • CSV serialization/deserialization
    • Collaborative editing conflicts
    • Save status aggregation

🚀 Deployment Checklist

Before deploying to production:

  • Fix the undefined order variable (Critical)
  • Address the code execution security concern
  • Add CSP headers
  • Replace hardcoded URLs with config
  • Add error logging for CSV parsing
  • Implement cache eviction for component cache
  • Add comprehensive tests
  • Document security model for TSX files
  • Add rate limiting for TSX compilation
  • Review and restrict TSX file creation permissions

Summary

This is a powerful feature that significantly extends the wiki's capabilities. The core architecture is sound, but there are critical security and reliability issues that must be addressed before merging. The arbitrary code execution via TSX compilation is the most serious concern and needs immediate attention.

Recommendation: Request Changes - Fix critical issues before merge.


Great work on the feature implementation! The template system and collaborative editing are particularly well done. With the security and reliability fixes, this will be a valuable addition to the project.

@claude
Copy link

claude bot commented Dec 21, 2025

PR Review: TSX, CSV Formats, Agent Can Create View

Summary

This PR adds support for interactive TSX views and CSV data files to the wiki system, enabling agents to create data-driven visualizations. The changes are substantial (2550 additions, 227 deletions across 51 files) and introduce a new runtime compilation system for React components.


Strengths

1. Architecture & Design

  • Clean separation of concerns: File type detection logic centralized in GitWiki._get_file_type()
  • Consistent extension handling: SUPPORTED_EXTENSIONS constant used throughout
  • Template system: Smart auto-initialization of example files from backend/templates/
  • Backwards compatible: Existing markdown workflow unchanged

2. Code Quality

  • Good documentation: agents/data-views.md provides clear examples and API reference
  • Consistent patterns: File type handling follows same patterns across list_pages, glob_pages, search_pages_regex
  • Type safety: Frontend uses proper TypeScript types (FileType, ViewFile)
  • Error handling: Try-catch blocks in CSV parsing and view compilation

3. User Experience

  • Rich examples: Kanban board, data table, and simple view examples show real-world usage
  • Auto-scrolling: StickToBottom integration in ViewRuntime console
  • Visual feedback: Loading states, error messages, file type badges in UI

⚠️ Issues & Concerns

CRITICAL: Security Vulnerabilities

1. Arbitrary Code Execution Risk 🔴

Location: frontend/src/components/views/ViewRuntime.tsx:82-95

Using new Function() to execute user-provided TSX content creates a major XSS vulnerability. Malicious code in TSX files can:

  • Access localStorage/sessionStorage
  • Make arbitrary network requests
  • Steal authentication tokens
  • Modify the DOM outside the sandbox
  • Execute arbitrary JavaScript

Recommendation:

  • Add a CSP (Content Security Policy) to restrict what compiled code can do
  • Run views in an isolated iframe with sandbox attributes
  • Use a proper sandboxing library
  • Add explicit trust boundaries - mark TSX files as requiring review before execution
  • Document security model clearly in agents/data-views.md

2. Path Traversal Risk 🟡

Location: backend/storage/git_wiki.py:108

No validation that rel_path stays within pages_dir. A malicious template could use ../ to write outside the wiki.

Recommendation: Resolve and validate path is within pages_dir before writing.

Performance Concerns

3. N+1 File Reads 🟡

Location: backend/storage/git_wiki.py:553-575 (list_pages)

_get_file_type is called for every file, even though it's a simple suffix check.

Recommendation: Extract suffix check inline to avoid function call overhead.

4. ViewRuntime Compilation on Every Render 🟠

Location: frontend/src/components/views/ViewRuntime.tsx:169-171

TSX gets recompiled every time content or csvData changes, even if content hasn't changed.

Recommendation: Memoize compiled function using useMemo.

Code Quality Issues

5. Inconsistent Error Handling 🟡

Silent failures in CSV parsing make debugging difficult. Errors aren't logged or reported to users.

Recommendation: Log errors and add an error field to return values.

6. Magic String Constants 🟡

String literals like 'csv', 'tsx', '.csv' repeated throughout codebase (~25 times).

Recommendation: Define constants for file types and extensions.

7. Missing Input Validation 🟡

Location: backend/storage/git_wiki.py:1420 (glob_pages)

No protection against expensive glob patterns like **/*/**/*/**/*/**.

Recommendation: Add pattern validation (e.g., max wildcard count).

8. Hardcoded Babel Configuration 🟡

Location: frontend/src/components/views/ViewRuntime.tsx:55-70

Babel config embedded in component makes it hard to update or test.

Recommendation: Extract to src/utils/tsxCompiler.ts for reusability and testing.

Testing Gaps 🔴

9. No Tests for New Features

PR adds 2550 lines of code with zero test files.

Critical missing tests:

  • CSV parsing edge cases (malformed CSV, empty files, special characters)
  • TSX compilation errors (syntax errors, runtime errors, infinite loops)
  • File type detection (_get_file_type)
  • Template initialization (ensure_templates)
  • useCSV hook (add/update/delete operations)
  • ViewRuntime error boundaries

Recommendation: Add test files for backend CSV/TSX support and frontend ViewRuntime component.

Documentation Issues

10. Incomplete Security Documentation 🟠

Location: backend/templates/agents/data-views.md

Missing:

  • Security model for TSX execution
  • Trust boundaries (who can create TSX files?)
  • Restrictions on what TSX code can do
  • Code review requirements for TSX files

11. Missing Migration Guide 🟡

No documentation for existing users on:

  • How templates get initialized
  • Whether existing wikis need migration
  • What happens to existing markdown files

🔍 Minor Issues

  1. Unused import: frontend/src/components/views/ViewRuntime.tsx:2 imports memo but never uses it
  2. Inconsistent naming: file_type (Python) vs fileType (TypeScript) - consider standardizing
  3. Magic number: limit: int = 50 in glob_pages - should be a constant
  4. No rate limiting: ViewRuntime could be abused to execute expensive code repeatedly
  5. Missing PropTypes: CSVEditor component lacks prop validation
  6. Console pollution: Multiple console.log statements in ViewRuntime should use proper logging

📋 Recommendations

Must Fix (Before Merge)

  1. ✅ Add iframe sandboxing or CSP for TSX execution
  2. ✅ Add path traversal protection in ensure_templates
  3. ✅ Add basic tests for CSV parsing and file type detection
  4. ✅ Document security model for TSX views

Should Fix (High Priority)

  1. Extract file type constants to avoid magic strings
  2. Add error logging for CSV parsing failures
  3. Memoize ViewRuntime compilation
  4. Add input validation for glob patterns

Nice to Have

  1. Extract Babel config to separate module
  2. Add comprehensive test suite
  3. Add migration guide documentation
  4. Consider rate limiting for view compilation

🎯 Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5) - Well-structured with good patterns, but needs security hardening

Security: ⭐⭐ (2/5) - Critical XSS risk from arbitrary code execution

Performance: ⭐⭐⭐ (3/5) - Generally good, but compilation could be optimized

Test Coverage: ⭐ (1/5) - No tests for major new features

Documentation: ⭐⭐⭐ (3/5) - Good examples, missing security docs

Verdict

This is an exciting feature that significantly expands the wiki's capabilities! The architecture is solid and the examples are excellent. However, the security concerns around TSX execution are blocking issues that must be addressed before merge.

Recommendation: Request changes for security hardening, then approve once addressed.


Great work on the feature implementation! The template system and examples are particularly well done. Let me know if you'd like help implementing any of the security fixes.

@honix honix marked this pull request as ready for review December 21, 2025 18:44
@honix honix merged commit 3fdbea4 into main Dec 21, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants