Problem with cA2.60.32 on P=-10-1 irrep=B2 #30
Comments
For some reason this went through for two configurations this time:
$ ls resolved_-10-1_B2_*
resolved_-10-1_B2_2496.js resolved_-10-1_B2_5328.js |
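To see at a glance which configurations went through, the listing above could be turned into a small check. This is only a rough sketch: it assumes the output files follow the `resolved_<momentum>_<irrep>_<config>.js` naming visible in the `ls` output, and the function names `resolved_configs` and `missing_configs` are made up for illustration and are not part of the projection code.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the listing: resolved_<momentum>_<irrep>_<config>.js
CONFIG_SUFFIX = re.compile(r"_(\d+)\.js$")


def resolved_configs(directory, momentum, irrep):
    """Return the sorted configuration numbers that have a resolved output file."""
    configs = []
    for path in Path(directory).glob(f"resolved_{momentum}_{irrep}_*.js"):
        match = CONFIG_SUFFIX.search(path.name)
        if match:  # ignore files whose last field is not a plain number
            configs.append(int(match.group(1)))
    return sorted(configs)


def missing_configs(directory, momentum, irrep, expected):
    """Return the expected configuration numbers that have no output file yet."""
    done = set(resolved_configs(directory, momentum, irrep))
    return [config for config in expected if config not in done]
```

Calling `resolved_configs(".", "-10-1", "B2")` in the output directory would list the configurations that succeeded; the list of expected configurations for `missing_configs` has to come from wherever the ensemble's configuration numbers are kept.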
This is very strange. My first instinct would be to guess that it's related to having too many HDF5 files open at the same time (I could imagine that these are internally opened using |
I guess in the original description you mean |
I really don't get it either. And there are not too many HDF5 files open: I start a new R process for every configuration and every irrep. It just crashes. And since it worked on two configurations, there cannot be anything completely wrong with the program or the files. |
I meant globally. When there are O(30) projection jobs running, the number of memory-mapped files will be rather large, and this might be problematic for Lustre. What if you run a projection for a single config on QBIG? |
After all the projections were done, I did try that to see what the issue was. It seems that even with a single irrep in the whole cluster there is a problem. I will find out how the other ensembles fare with that, perhaps it is always this irrep or just that irrep on cA2.60.32. |
Hah, we figured it out in the end. @matfischer observed the same problem and it was solved by reinstalling rhdf5 :) |
not so fast, apparently... |
I am re-running the projections on cA2.60.32 and they work just fine for almost all irreps in every configuration. There is just one exception, namely P = (-1, 0, -1) in the B₂ irrep, and it fails for every configuration. It is always this output:

I have tried to restart these jobs, but that did not help either. We had some random segfaults before, but this failure is consistent. It seems to have something to do with the actual files, and it happens on all of the nodes that I have tried.
The only difference in input is the prescription file, and that does not differ from the other ensembles. And the ones related by a global rotation are just fine.
For the time being I will just skip that B₂ irrep at P² = 2, but it feels very peculiar and I still have no idea what is happening there.
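A skip like that could be handled with a small exclusion list in whatever script submits the jobs. The following is a minimal sketch, not the actual job script: it assumes jobs are described by a momentum triple and an irrep label, and the names `SKIP`, `momentum_squared` and `filter_jobs` are hypothetical. It excludes only the exact momentum P = (-1, 0, -1), since the momenta related by a global rotation work; skipping the whole P² = 2 shell would mean keying on `momentum_squared` instead.

```python
from typing import List, Tuple

Momentum = Tuple[int, int, int]
Job = Tuple[Momentum, str]

# Hypothetical exclusion list: the one combination that fails consistently.
SKIP = {((-1, 0, -1), "B2")}


def momentum_squared(p: Momentum) -> int:
    """P^2 for an integer momentum triple, e.g. (-1, 0, -1) gives 2."""
    return sum(component * component for component in p)


def filter_jobs(jobs: List[Job]) -> List[Job]:
    """Drop every (momentum, irrep) job that is on the exclusion list."""
    return [job for job in jobs if job not in SKIP]
```

Keeping the exclusion as explicit data makes it easy to remove again once the cause of the crash is understood.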