Multi-node training #305

@LeoXinhaoLee

Description

Hi, thank you so much for releasing this great codebase!

I noticed that your LAION blog post says the pre-training of OpenLM 1B/7B ran on 128 or 256 A100s, so I'm wondering whether the current code supports multi-node training. The current training command seems to use only 4 GPUs on a single node.
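For reference, here is how I would expect a multi-node launch to look with `torchrun`. This is a minimal sketch assuming the training entry point is the `open_lm.main` module and that it takes the same flags as the single-node command; I have not verified either against the repo.

```bash
# Hedged sketch: multi-node launch via torchrun's c10d rendezvous.
# Assumptions (not checked against the repo): the entry point is the
# open_lm.main module, and it accepts the single-node training flags.
export MASTER_ADDR=node0.example.com   # hostname of the rank-0 node
export MASTER_PORT=29500

# Run the same command on every participating node; the c10d rendezvous
# backend assigns node and worker ranks automatically.
torchrun \
  --nnodes 2 \
  --nproc_per_node 8 \
  --rdzv_backend c10d \
  --rdzv_endpoint "$MASTER_ADDR:$MASTER_PORT" \
  -m open_lm.main \
  --model open_lm_1b
  # ...plus the remaining flags from the single-node training command
```

Is something along these lines supported, or is a SLURM/`sbatch` setup expected instead?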

Thank you very much!
