Skip to content

Fix newline handling to prevent mixed line endings#11

Merged
jmcpheron merged 1 commit intomainfrom
fix-newline-issue
Jul 31, 2025
Merged

Fix newline handling to prevent mixed line endings#11
jmcpheron merged 1 commit intomainfrom
fix-newline-issue

Conversation

@jmcpheron
Copy link
Owner

  • Add intelligent newline detection that preserves original file format
  • Create .gitattributes to enforce consistent line endings in git
  • Add --normalize-newlines option for explicit conversion to Unix style
  • Update file I/O to use binary mode to prevent platform-specific conversions
  • Add comprehensive tests for newline handling scenarios
  • Store newline style in extraction maps for consistency

This prevents the issue where editing extracted files on Windows would introduce CRLF line endings that pollute git history when rebuilt.

🤖 Generated with Claude Code

- Add intelligent newline detection that preserves original file format
- Create .gitattributes to enforce consistent line endings in git
- Add --normalize-newlines option for explicit conversion to Unix style
- Update file I/O to use binary mode to prevent platform-specific conversions
- Add comprehensive tests for newline handling scenarios
- Store newline style in extraction maps for consistency

This prevents the issue where editing extracted files on Windows would
introduce CRLF line endings that pollute git history when rebuilt.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Jul 30, 2025

Claude finished @jmcpheron's task —— View job


PR Review Complete ✅

I've thoroughly reviewed this pull request for newline handling improvements. Here's my comprehensive feedback:

Review Checklist:

  • Read and analyze all changed files
  • Evaluate code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Analyze test coverage completeness
  • Provide comprehensive feedback

Code Quality & Best Practices ⭐

Excellent implementation with clean, well-structured code:

  • Clear separation of concerns: Newline detection, file I/O, and normalization are properly separated
  • Consistent pattern across both scripts: The same detect_newline_style() function is used in both extract_literals.py and extract_virtual_domains.py
  • Good documentation: Functions have clear docstrings explaining their purpose
  • Binary file I/O: Correctly uses binary mode to prevent platform-specific conversions (extract_literals.py:72, extract_virtual_domains.py:49)
  • Proper encoding handling: Explicit UTF-8 encoding/decoding throughout

Potential Bugs/Issues ⚠️

Minor issues identified:

  1. Edge case in newline detection (extract_literals.py:33, extract_virtual_domains.py:33):

    lf_count = content.count('\n') - crlf_count  # Subtract CRLF occurrences

    This could result in negative counts if there are standalone \n characters that are part of \r\n sequences. However, the logic still works correctly due to the comparison approach.

  2. No validation for normalization conflicts: When --normalize-newlines is used, the original newline style detection is bypassed but still stored. This could be confusing - consider storing the actual style used.

Recommendations:

  • Add validation that newline counts don't go negative
  • Consider storing the effective newline style when normalization is applied

Performance Considerations 🚀

Well-optimized implementation:

  • Single file read: Both scripts read files once in binary mode to detect newlines (extract_literals.py:72-77)
  • Efficient newline counting: The detection algorithm is O(n) with the file content
  • Minimal memory overhead: No significant additional memory usage for newline tracking
  • Binary I/O prevents unnecessary conversions: Using 'rb'/'wb' modes avoids platform newline translation

No performance concerns - the implementation is efficient for the intended use case.

Security Concerns 🔒

No security issues identified:

  • No external dependencies: Only uses standard library modules
  • No network operations: All file operations are local
  • Proper file handling: Uses context managers (with statements) consistently
  • No shell execution: No subprocess calls or shell commands
  • Path handling: Uses pathlib.Path for safe path operations

The implementation follows security best practices.

Test Coverage 🧪

Comprehensive test suite (tests/test_newline_handling.py):

  • 337 lines of tests covering multiple scenarios
  • Edge cases: Empty content, mixed newlines, dominant style detection
  • Round-trip testing: Extract → Edit → Rebuild workflows
  • Platform scenarios: Windows developer → Unix server workflows
  • Normalization testing: Both preserve and normalize modes
  • Real-world scenarios: Mixed newline content handling

Excellent coverage including the critical cross-platform development scenario that motivated this fix.

Architecture Review 📐

Smart design decisions:

  1. Newline style storage: Adding newline_style to extraction maps ensures consistency (_extraction_map.json files)
  2. Opt-in normalization: The --normalize-newlines flag provides explicit control
  3. Backward compatibility: Existing extraction maps work (default to Unix style if missing)
  4. Git attributes: The .gitattributes file enforces consistent line endings in the repository

Overall Assessment 🎯

This is a high-quality pull request that addresses a real cross-platform development issue. The implementation is:

  • Robust: Handles edge cases and mixed content scenarios
  • Well-tested: Comprehensive test suite covering real-world usage
  • Maintainable: Clean, documented code following established patterns
  • Secure: No security concerns identified
  • Performant: Efficient implementation with minimal overhead

Recommendation: Approve - This PR successfully solves the mixed line ending problem while maintaining code quality and backward compatibility.


@jmcpheron jmcpheron merged commit a052863 into main Jul 31, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant