
optimized data gen for lower spec servers to avoid RAM issues #4

Open
epatel16 wants to merge 1 commit into main from optimize-data-gen

@epatel16 (Collaborator)

Optimized the data generation process for pretraining and finetuning data so that it runs on lower-spec servers without accumulating data in RAM.

@epatel16 requested a review from glchau on October 16, 2025 05:37

@glchau left a comment

Thanks for putting this together! I have a few questions about what happens if we run this in parallel, plus a few code clarifications.



- manifest_path = os.path.join(data_path, "manifest.tsv")
+ manifest_path = os.path.join(data_path, f"subject_manifests/{cfg.data_prep.subj}/manifest.tsv")

Seems fine, but is there a reason for this change?

total_words = sum(arr.shape[1] for arr in seeg_data)
n_electrodes, _, n_samples = seeg_data[0].shape

neural_data_path = "neural_data_memmap.dat"

  1. Could this overwrite the file of another process running at the same time? Maybe we can use a unique file name for each run?

  2. We may also want to delete this file after the run so these files don't accumulate on disk.
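A minimal sketch of how both suggestions could be handled, assuming each array in `seeg_data` has shape `(n_electrodes, n_words, n_samples)` (as the unpacking in the snippet above implies) and that float32 is an acceptable dtype. `tempfile.mkstemp` gives each run its own file name, and a `try`/`finally` deletes the file afterwards. The helper `stack_to_memmap` and its `consume` callback are hypothetical, not code from this PR.

```python
import os
import tempfile

import numpy as np


def stack_to_memmap(seeg_data, consume, dtype=np.float32, tmp_dir=None):
    """Concatenate arrays along axis 1 into a disk-backed np.memmap.

    Hypothetical helper: each array in `seeg_data` is assumed to have shape
    (n_electrodes, n_words_i, n_samples). `consume` receives the filled
    memmap and should return an in-memory result (a reduction or a copy),
    not the memmap itself, because the backing file is deleted afterwards.
    """
    total_words = sum(arr.shape[1] for arr in seeg_data)
    n_electrodes, _, n_samples = seeg_data[0].shape

    # mkstemp atomically creates a uniquely named file, so concurrent
    # runs never write to the same path.
    fd, neural_data_path = tempfile.mkstemp(
        prefix="neural_data_", suffix=".dat", dir=tmp_dir
    )
    os.close(fd)  # np.memmap opens the file itself
    out = None
    try:
        out = np.memmap(
            neural_data_path,
            dtype=dtype,
            mode="w+",
            shape=(n_electrodes, total_words, n_samples),
        )
        offset = 0
        for arr in seeg_data:
            n = arr.shape[1]
            # Copy one chunk at a time into its slice of the file.
            out[:, offset:offset + n, :] = arr
            offset += n
        out.flush()
        return consume(out)
    finally:
        # Drop the mapping before removing the scratch file, so the
        # delete also succeeds on platforms that lock mapped files.
        out = None
        os.remove(neural_data_path)
```

Passing the memmap to a callback instead of returning it keeps the file's lifetime explicit: once `consume` returns, the mapping is released and the scratch file is removed. `tmp_dir` can point at any disk with enough free space.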

raise RuntimeError("Task not found")

def write_trial_data_piecemeal(subject, brain_run, extracter, data_cfg_template_copy, cfg):
output_path = os.path.join(cfg.data_prep.output_directory, subject, brain_run)

Is this code necessary?
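For context on where a per-run `output_path` would matter: if trials are written out one file at a time, the `<output_directory>/<subject>/<brain_run>` path only needs to be built and created once per run. The sketch below assumes trials arrive as `(trial_id, array)` pairs from an iterable; `write_trials_piecemeal` and that input format are hypothetical and do not reflect the PR's `extracter` or `data_cfg_template_copy` arguments.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

import numpy as np


def write_trials_piecemeal(
    output_directory: str,
    subject: str,
    brain_run: str,
    trials: Iterable[Tuple[str, np.ndarray]],
) -> List[str]:
    """Write each trial to its own .npy file as soon as it is produced.

    Hypothetical sketch: if `trials` is a generator, only the trial being
    written needs to be in memory. Returns the paths that were written.
    """
    # Same directory layout as the line under review.
    output_path = os.path.join(output_directory, subject, brain_run)
    os.makedirs(output_path, exist_ok=True)

    written = []
    for trial_id, trial in trials:
        path = os.path.join(output_path, f"{trial_id}.npy")
        np.save(path, trial)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # A generator expression, so each trial is created lazily.
        trials = (
            (f"trial_{i}", np.zeros((4, 10), dtype=np.float32)) for i in range(3)
        )
        for p in write_trials_piecemeal(out_dir, "sub-01", "run-01", trials):
            print(os.path.relpath(p, out_dir))
```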
