optimized data gen for lower spec servers to avoid RAM issues #4
Open
glchau
requested changes
Oct 20, 2025
Collaborator
glchau
left a comment
Thanks for putting this together! I have a few questions about what happens if we run this in parallel, plus a few requests for code clarification.
| manifest_path = os.path.join(data_path, "manifest.tsv") | ||
| manifest_path = os.path.join(data_path, f"subject_manifests/{cfg.data_prep.subj}/manifest.tsv") |
Collaborator
Seems fine, but any reason for this change?
| total_words = sum(arr.shape[1] for arr in seeg_data) | ||
| n_electrodes, _, n_samples = seeg_data[0].shape | ||
| neural_data_path = "neural_data_memmap.dat" |
Collaborator
- Will this potentially overwrite the file of another ongoing process? Maybe we can use a unique name for each run?
- We may also want to delete this file after the run so it does not accumulate on disk.
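One possible way to address both points, as a rough sketch rather than a required implementation: generate the scratch file name per run and remove it when the run finishes. The helper name `scratch_memmap_path` and its parameters are hypothetical and not part of this PR; it only relies on the standard library.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_memmap_path(directory=None, prefix="neural_data_memmap_"):
    """Yield a file path unique to this run and delete the file on exit.

    Hypothetical helper for illustration; not part of the PR.
    """
    # mkstemp creates the file atomically under a unique name, so two
    # concurrent runs cannot end up sharing the same path.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".dat", dir=directory)
    os.close(fd)  # only the path is needed; the caller reopens the file
    try:
        yield path
    finally:
        # Remove the scratch file even if the run raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

Assuming the buffer is opened with `np.memmap(neural_data_path, mode="w+", ...)`, the existing code could run inside `with scratch_memmap_path(output_path) as neural_data_path:`. The memmap object should be flushed and released before the block exits so the file can be removed cleanly.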
| raise RuntimeError("Task not found") | ||
| def write_trial_data_piecemeal(subject, brain_run, extracter, data_cfg_template_copy, cfg): | ||
| output_path = os.path.join(cfg.data_prep.output_directory, subject, brain_run) |
Optimized the generation of pretraining and finetuning data so that it runs on lower-spec servers without RAM usage accumulating over the run.