https://zenodo.org/records/5055046
- Locally create a new virtual environment: `python3 -m venv <env_name>`
- Run `pip install -r requirements.txt` to install all dependencies.
- After installing new libraries, run `pip freeze > requirements.txt` to update the `requirements.txt` file.
This repo runs on the EEG dataset located at https://zenodo.org/records/5055046 [1]. In total, the dataset has ~13,000 epochs with 61 channels; an 'epoch' in this context is a segment (sample) of continuous EEG data. Data is loaded using the `draft_loader.py` program and saved as `all_epochs.pickle`.
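For illustration, a small round-trip showing the assumed layout of `all_epochs.pickle` (the single-array layout and dtype are assumptions, not confirmed from `draft_loader.py`):

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-in for the real epochs array: [epochs, channels, timepoints].
epochs = np.zeros((10, 61, 500), dtype=np.float32)

# Save and reload the way draft_loader.py is assumed to do it.
path = os.path.join(tempfile.mkdtemp(), "all_epochs.pickle")
with open(path, "wb") as f:
    pickle.dump(epochs, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.shape)  # (10, 61, 500)
```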
Data shape:
- $13410$ = number of epochs/samples
- $61$ = number of channels
- $500$ = number of timepoints ($2$ seconds of $250$ Hz data)
Data is normalised on a channel-by-channel basis per mini-batch (e.g. ~200 out of the ~13,000 samples).
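A minimal sketch of the per-channel, per-mini-batch normalisation (standardising over the batch and time dimensions is an assumption; the repo may normalise differently):

```python
import torch

# Synthetic mini-batch standing in for real EEG data: [batch_size, channels, timepoints].
batch = torch.randn(200, 61, 500)

# Normalise each channel independently within the mini-batch:
# mean/std are computed over the batch and time dimensions for each channel.
mean = batch.mean(dim=(0, 2), keepdim=True)  # shape [1, 61, 1]
std = batch.std(dim=(0, 2), keepdim=True)    # shape [1, 61, 1]
normalised = (batch - mean) / (std + 1e-8)
```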
The task is to predict the last few timepoints on the basis of the many preceding ones. This is a sequence-to-sequence regression task.
- Run `draft_loader.py` to generate `all_epochs.pickle`.
- Run `split_data.py` to rearrange the data to `[batch_size, timepoints=500, channels=61]` and split it into train and test sets in torch format (`train_loader.pth` and `test_loader.pth`).
- Embed the input with a linear layer.
- Pass it through the transformer.
- Revert to the output dimensions for comparison.
- Implement the loss function + training loop.
The split is 80% training and 20% test. The model is evaluated on how close its predictions are to the true values, as a regression problem.
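The loss function + loop step can be sketched with MSE loss (the optimiser, learning rate, and the tiny stand-in model are assumptions, not the repo's exact choices):

```python
import torch
import torch.nn as nn

# Tiny stand-in model so the loop is self-contained; the real model is the
# embedding + transformer + output projection described in this README.
model = nn.Linear(61, 61)
criterion = nn.MSELoss()  # sequence-to-sequence regression -> mean squared error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One synthetic batch in [batch_size, timepoints, channels] layout.
src = torch.randn(32, 450, 61)
tgt = torch.randn(32, 450, 61)

for epoch in range(3):
    optimizer.zero_grad()
    pred = model(src)            # predicted sequence
    loss = criterion(pred, tgt)  # compare prediction to target sequence
    loss.backward()
    optimizer.step()
```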
The problem is a sequence-to-sequence regression task in which the last 50 timepoints are predicted on the basis of the first 450. The dataset is split accordingly and saved as `train_loader.pth` and `test_loader.pth` with a batch size of 32.
Both sequences can then be size 450: the target sequence spans timepoints 50 to 500, while the input spans timepoints 0 to 450.
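The shifted 450-point windows can be sketched as plain slicing (the tensor name `data` is hypothetical):

```python
import torch

# Full epochs in [batch_size, timepoints, channels] layout.
data = torch.randn(32, 500, 61)

src = data[:, 0:450, :]   # input window: timepoints 0-450
tgt = data[:, 50:500, :]  # shifted window: timepoints 50-500
```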
61 is a prime number, so it is not divisible by the number of attention heads. The data is therefore passed through a linear layer that embeds the input into a hidden dimension, using `self.embedding = nn.Linear(input_channels, hidden_dim)`. This embeds the data into a nicer number that is divisible by the number of heads.
- `input_channels` = 61
- `timepoints` = c. 500
- `hidden_dim` = 64
- `num_heads` = 4
- `num_layers` = 4 (number of transformer layers)
- `output_dim` = 64
```python
# src: [batch_size, channels, timepoints] -> [timepoints, batch_size, channels]
src = src.permute(2, 0, 1)  # shape: [timepoints, batch_size, channels]
tgt = tgt.permute(2, 0, 1)  # shape: [timepoints, batch_size, channels]

transformer = nn.Transformer(d_model=hidden_dim, nhead=num_heads, num_encoder_layers=num_layers)
transformer_out = transformer(src, tgt)
```
Convert back from 64 to 61 channels and compare to the true values using `output = fc_out(transformer_out)`.
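Putting the pieces together, a minimal end-to-end sketch of the model described above (smaller synthetic sizes for speed; `nn.Transformer` defaults to the sequence-first `[timepoints, batch, features]` layout used here; this is a sketch, not the repo's exact implementation):

```python
import torch
import torch.nn as nn

input_channels, hidden_dim, num_heads, num_layers = 61, 64, 4, 4

embedding = nn.Linear(input_channels, hidden_dim)  # 61 -> 64, divisible by num_heads
transformer = nn.Transformer(
    d_model=hidden_dim,
    nhead=num_heads,
    num_encoder_layers=num_layers,
    num_decoder_layers=num_layers,
)
fc_out = nn.Linear(hidden_dim, input_channels)  # 64 -> back to 61 channels

# Small synthetic batch: [batch_size, channels, timepoints] (the real data has 450 timepoints).
src = torch.randn(2, 61, 100)
tgt = torch.randn(2, 61, 100)

# [batch, channels, time] -> [time, batch, channels], then embed to hidden_dim.
src = embedding(src.permute(2, 0, 1))
tgt = embedding(tgt.permute(2, 0, 1))

transformer_out = transformer(src, tgt)  # [time, batch, hidden_dim]
output = fc_out(transformer_out)         # [time, batch, channels]
print(output.shape)  # torch.Size([100, 2, 61])
```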
[1] Hinss, M. F., Darmet, L., Somon, B., Jahanpour, E., Lotte, F., Ladouce, S., & Roy, R. N. (2021). An EEG dataset for cross-session mental workload estimation: Passive BCI competition of the Neuroergonomics Conference 2021 (Version 2) [Data set]. Neuroergonomics Conference, Munich, Germany. Zenodo. https://doi.org/10.5281/zenodo.5055046