Stream processing for large file checksums (TypeScript) #369

@nikblanchet

Description

Summary

TypeScript's FileTracker.createSnapshot() loads each file into memory in its entirety to compute its checksum. This is inefficient for large files (>10 MB). The Python implementation already uses chunked reading; TypeScript should match.

Current Behavior

TypeScript (cli/src/utils/file-tracker.ts:53):

const fileBuffer = await fs.readFile(filepath);  // Loads entire file into memory
const hash = createHash('sha256');
hash.update(fileBuffer);

Python (already efficient):

with open(filepath, "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):  # 8 KiB at a time
        sha256_hash.update(chunk)

Proposed Enhancement

Use Node's streaming file API so checksums are computed incrementally over fixed-size chunks instead of from a single in-memory buffer.

Implementation

TypeScript:

import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

const hash = createHash('sha256');
const stream = createReadStream(filepath);

// Each chunk is a Buffer (64 KiB by default); the hash is updated
// incrementally, so memory use stays constant regardless of file size.
for await (const chunk of stream) {
  hash.update(chunk);
}

const checksum = hash.digest('hex');
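
One minimal way this could slot into FileTracker.createSnapshot() is to wrap the loop in a helper so stream errors (e.g. the file disappearing mid-read) reject the returned promise. computeChecksum is a hypothetical name for illustration, not an existing function in file-tracker.ts:

import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

// Hypothetical helper sketch; errors emitted by the stream reject the
// for await loop and propagate to the caller as a normal async failure.
async function computeChecksum(filepath: string): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of createReadStream(filepath)) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

If matching Python's 8 KiB chunk size matters, createReadStream also accepts a highWaterMark option, though the 64 KiB default is typically faster.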

Benefits

  • Constant memory usage regardless of file size (a quick sanity check follows this list)
  • Handles multi-GB files gracefully
  • Matches Python implementation
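
One rough way to sanity-check the constant-memory claim, assuming the hypothetical computeChecksum helper sketched above:

// Rough check only: RSS includes Buffer allocations outside the V8 heap,
// and numbers will vary with GC timing.
const before = process.memoryUsage().rss;
const checksum = await computeChecksum('some-large-file.bin');  // hypothetical multi-GB input
const delta = (process.memoryUsage().rss - before) / 1024 / 1024;
console.log(`checksum=${checksum}, rss delta=${delta.toFixed(1)} MiB`);

With the readFile-based version, the delta roughly tracks file size; with streaming it should stay roughly flat.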

Files

  • cli/src/utils/file-tracker.ts:53

Effort

~3 hours

Metadata

Labels

  • effort-medium (Medium effort: 2-4 hours)
  • enhancement (New feature or request)
  • impact-medium (Medium impact on users or system)
  • performance (Performance optimization issues)
  • post-mvp (Post-MVP feature, not needed for initial release)
  • typescript (TypeScript-specific issues)
