Hi Nakib,
Setting this error aside, I ran the Si example with a (50,150) mesh on four nodes, each with 28 Intel(R) Xeon(R) CPU E5-2680 cores, just as you did. However, the code stayed in the state shown below for more than six hours. Is this normal, or could it be caused by the errors mentioned above?
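To get a feel for the per-core workload in a run like this, the arithmetic can be sketched in Python. This is a minimal illustration under stated assumptions: that "(50,150)" denotes a 50×50×50 mesh and a 150×150×150 mesh, that one coarray image runs per physical core, and that points are split into contiguous blocks. The block split and the function names `total_images` and `block_ranges` are hypothetical; elphbolt's own internal distribution may differ.

```python
from typing import List, Tuple


def total_images(nodes: int, cores_per_node: int) -> int:
    """Number of parallel images, assuming one image per physical core."""
    return nodes * cores_per_node


def block_ranges(npts: int, nimages: int) -> List[Tuple[int, int]]:
    """Split 1-based points 1..npts into nimages contiguous blocks.

    Hypothetical block scheme for illustration only: block sizes differ by
    at most one point, and the first (npts % nimages) images get the extra
    point. If nimages > npts, trailing blocks are empty (end == start - 1).
    """
    base, extra = divmod(npts, nimages)
    ranges = []
    start = 1
    for image in range(nimages):
        size = base + (1 if image < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    nimg = total_images(nodes=4, cores_per_node=28)
    for n in (50, 150):
        npts = n**3  # assumed cubic mesh n x n x n
        sizes = [end - start + 1 for start, end in block_ranges(npts, nimg)]
        print(f"{n}^3 = {npts} points over {nimg} images: "
              f"{min(sizes)} to {max(sizes)} points per image")
```

With 4 × 28 = 112 images, the 125,000 points of a 50³ mesh split into blocks of 1,116 or 1,117 points per image, so a single image should not be stuck for hours purely because of load imbalance under this assumed scheme.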
Dear Ziwen,
Thanks for your kind words and your question. It is not normal for the code to be stuck at that stage. I have seen this before with builds with …
This way, …
Please report back if this solves your issue. Another point: I recommend staying up to date with the latest …
Best,
Dear Nakib,
Dear Nakib,
The issue I encountered was resolved by recompiling elphbolt. As you suggested, I ran additional tests with different (q, k) meshes. On a single node with 96 cores, all calculations completed successfully within a reasonable time. However, on multiple nodes, especially more than three, the code froze unexpectedly and then exited abnormally. This led me to suspect a parallelization issue.
To address this, I recompiled the latest version of elphbolt using GCC 12.3.0, OpenCoarrays 2.10.2, and FPM 0.9.0. In addition, I obtained a temporary storage directory from the cluster administrator that is better suited for I/O-intensive operations. Now,…
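Since the multi-node behavior changed with the toolchain, it can help to record the exact versions on each cluster when reporting a hang like this. The Python sketch below is a minimal helper under the assumption that the tools are on `PATH` as `gfortran`, `caf` (the OpenCoarrays wrapper), and `fpm`, and that each accepts `--version`; the function names are hypothetical and not part of elphbolt.

```python
import shutil
import subprocess
from typing import Dict, Optional, Sequence


def first_version_line(tool: str) -> Optional[str]:
    """Return the first line of `tool --version`, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None  # tool not found on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # could not execute, or the tool hung
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def toolchain_report(
    tools: Sequence[str] = ("gfortran", "caf", "fpm"),
) -> Dict[str, Optional[str]]:
    """Map each assumed tool name to its first version line (or None)."""
    return {tool: first_version_line(tool) for tool in tools}


if __name__ == "__main__":
    for tool, version in toolchain_report().items():
        print(f"{tool}: {version or 'not found'}")
```

Pasting this output alongside the node count and cores per node makes it easier to tell whether a freeze follows a particular compiler or OpenCoarrays version.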