This repository was archived by the owner on Jul 22, 2024. It is now read-only.
Hello,
Thank you for the great work. I am studying federated learning in NLP. I tried to reproduce the results in the paper (mainly the LSTM on the Shakespeare dataset), but my results are very different from those reported. Could you help me check what I missed in my experiments?
(1) The Shakespeare data preprocessing is described in the paper as follows:
So I used the following command to preprocess the data:
./preprocess.sh -s niid --sf 1.0 -k 0 -t sample -tf 0.8 -k 10000
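As a sanity check on that step, here is a small sketch I used mentally (assuming the standard LEAF JSON layout with `users`, `num_samples`, and `user_data` keys; the file paths are hypothetical) to verify the train fraction is roughly 0.8:

```python
import json

def train_fraction(train_json, test_json):
    """Fraction of samples that ended up in the train split of a LEAF-style dataset."""
    with open(train_json) as f:
        train = json.load(f)
    with open(test_json) as f:
        test = json.load(f)
    n_train = sum(train["num_samples"])
    n_test = sum(test["num_samples"])
    return n_train / (n_train + n_test)
```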
(2) It is indicated in the paper that the experiments were done with a 1-layer LSTM.

However, reading the code, I believe this is equivalent to setting:
NUM_LAYERS=3
since the model has one input layer, one output layer, and one hidden LSTM layer (where the permutation invariance problem is addressed by FedMA).
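To make the layer counting concrete, here is a rough sketch of how I understand the parameter grouping (the dimensions below are my own assumptions, not values from the repo): the model decomposes into exactly three weight blocks, which is why I believe NUM_LAYERS=3, with FedMA matching one block per communication round.

```python
# Assumed dimensions for a next-character Shakespeare model (hypothetical values).
VOCAB, EMBED, HIDDEN = 80, 8, 256

# Parameter counts for the three blocks FedMA would match, one per round.
layers = {
    "input (embedding)": VOCAB * EMBED,
    # An LSTM has 4 gates, each with input-hidden and hidden-hidden
    # weight matrices plus two bias vectors.
    "hidden (LSTM)": 4 * (EMBED * HIDDEN + HIDDEN * HIDDEN + 2 * HIDDEN),
    "output (linear)": HIDDEN * VOCAB + VOCAB,
}

NUM_LAYERS = len(layers)
print(NUM_LAYERS)  # 3
```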
(3) It is noted in the paper that FedAvg and FedProx were trained for 33 communication rounds, while FedMA was trained for 11 communication rounds (because each round of FedMA requires 3 communication rounds, corresponding to the number of LSTM layers). I actually used 30 for FedAvg and FedProx and 10 for FedMA, like this:
For FedAvg
python language_main.py --mode=fedavg --comm-round=30
For FedProx
python language_main.py --mode=fedprox --comm-round=30
For FedMA
python language_main.py --mode=fedma --comm-round=10
(I do not think --comm-round has any effect in FedMA anyway, because the code performs only a single round of FedMA.)
Then I performed the rest of the FedMA communication rounds by running
python lstm_fedma_with_comm.py
(lstm_fedma_with_comm.py has 10 communication rounds hard-coded.)
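The round accounting I am assuming (a sketch of my reading, not code from the repo):

```python
NUM_LAYERS = 3          # embedding, LSTM, output
fedma_meta_rounds = 10  # hard-coded in lstm_fedma_with_comm.py, per my reading

# Each FedMA meta-round communicates once per layer, so the
# communication budget comparable to FedAvg/FedProx is:
equivalent_rounds = fedma_meta_rounds * NUM_LAYERS
print(equivalent_rounds)  # 30, matching --comm-round=30 for FedAvg/FedProx
```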
(4) The results do not seem aligned with those reported in the paper. FedProx got lower test accuracy than FedAvg, but FedMA also got lower accuracy than FedAvg.
For FedAvg

For FedProx

For FedMA

Result from the first step (language_main.py)
Result from the second step (lstm_fedma_with_comm.py)

Results from the paper

Actually, my FedAvg got substantially higher accuracy than in the paper: it reached 0.5 test accuracy, while none of the three approaches reaches that accuracy in the paper.
** I did not tune E (the number of local training epochs) and used the default value (5), but the results still do not align with what the paper reports for E=5.
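For reference, the E I mention is the number of local epochs each client runs between averaging steps. A toy sketch of that interaction (scalar "models" and a made-up learning rate for illustration, nothing from the repo):

```python
E = 5                      # local epochs per communication round (the default I used)
clients = [1.0, 2.0, 3.0]  # toy scalar "models", one per client

def local_train(w, epochs, lr=0.1):
    # Toy local objective: each epoch pulls the weight toward 0.
    for _ in range(epochs):
        w -= lr * w
    return w

# One FedAvg communication round: E local epochs, then plain averaging.
global_w = sum(local_train(w, E) for w in clients) / len(clients)
```

Larger E means more local drift before each averaging step, which is why the paper tunes it.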
Thank you in advance for your help.