
Error using FastKCI #222


Open
harrydesmond opened this issue Feb 26, 2025 · 9 comments

Comments

@harrydesmond

When using the FastKCI method for an FCI search, I often obtain the following error:

Traceback (most recent call last):
 File "/mnt/users/hdesmond/Causality/run_cl_3.py", line 163, in <module>
   g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold, depth=depth, max_path_length=max_path_length, verbose=verbose, background_knowledge=background_kn
owledge)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/search/ConstraintBased/FCI.py", line 1077, in fci
   graph, sep_sets, test_results = fas(dataset, nodes, independence_test_method=independence_test_method, alpha=alpha,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FAS.py", line 115, in fas
   p = cg.ci_test(x, y, S)
       ^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/graph/GraphClass.py", line 58, in ci_test
   return self.test(i, j, S)
          ^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/cit.py", line 480, in __call__
   self.kci_ci.compute_pvalue(self.data[:, Xs], self.data[:, Ys], self.data[:, condition_set])[0]
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FastKCI/FastKCI.py", line 69, in compute_pvalue
   self.Z_proposal = Parallel(n_jobs=-1)(delayed(self.partition_data)() for i in range(self.J))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 2007, in __call__
   return output if self.return_generator else list(output)
                                               ^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1650, in _get_outputs
   yield from self._retrieve()
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1754, in _retrieve
   self._raise_error_fast()
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
   error_job.get_result(self.timeout)
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 745, in get_result
   return self._return_or_raise()
          ^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 763, in _return_or_raise
   raise self._result
ValueError: sum(pvals[:-1]) > 1.0

This only happens for some datasets (others work fine), and in the cases where FastKCI fails like this, KCI works fine. Any idea what this means or what to do about it? I have a very large, nonlinear dataset, so I really need to use FastKCI.

@kunwuz
Collaborator

kunwuz commented Feb 28, 2025

FastKCI is ongoing work by @OliverSchacht and @Biwei-Huang, so the current implementation might not be the final version. For very large nonlinear datasets, RCIT may also be worth trying.
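Some background on why RCIT scales: instead of the n x n kernel matrices that KCI builds, RCIT works with random Fourier feature approximations of the kernel, i.e. n x D feature matrices with D fixed. The sketch below is a minimal, NumPy-only illustration of that approximation for a Gaussian (RBF) kernel. It is not causal-learn's RCIT code; the function names, the bandwidth `sigma`, and the feature count are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(x, num_features=2000, sigma=1.0, seed=None):
    """Map rows of x (n x d) to features z (n x num_features) such that
    z @ z.T approximates the RBF kernel exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Frequencies come from the RBF kernel's spectral density, N(0, sigma^-2 I).
    w = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def rbf_kernel(x, sigma=1.0):
    """Exact n x n RBF Gram matrix, for comparison (the O(n^2) object KCI-style tests use)."""
    sq = np.sum(x ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
z = random_fourier_features(x, num_features=5000, seed=1)
err = np.abs(z @ z.T - rbf_kernel(x))
print(f"max abs error: {err.max():.3f}, mean abs error: {err.mean():.4f}")
```

The approximation error shrinks roughly like 1/sqrt(D), so memory and time grow with n * D rather than n^2.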

@harrydesmond
Author

Thanks for this. I'm trying RCIT now. On mock data resembling my real dataset, I only get good performance with an extremely small p-value threshold (1e-11 to 1e-14). Is this reasonable, or expected at all?

@kunwuz
Collaborator

kunwuz commented Feb 28, 2025

Aha, I see, thanks for reporting. Perhaps @OliverSchacht has more intuition on this?

@OliverSchacht
Contributor

Hi @harrydesmond,

Thanks for reporting this issue, and sorry for the belated response.
Concerning RCIT, I can't offer much insight.

Concerning FastKCI, as @kunwuz mentioned, it is ongoing work, so I would gladly look into this issue in more detail. It looks like an error occurs inside the parallelized step that partitions the data, so it will need some debugging to find out what is going wrong. Do you have code that reproduces this?

Thanks and best,

Oliver

@harrydesmond
Author

Strangely, I cannot reproduce it reliably, even when I fix the NumPy random seed. I have a script that produces the error every time. However, if I add one line that simply saves the data to a file before running FCI, the error does not occur and FCI runs fine. If I write another MWE script that loads the saved data with all settings the same, the error also does not occur. I'm very confused about how this is possible.

@OliverSchacht
Contributor

I see. I'm not sure whether, in the current version, a seed affects the RNG inside the joblib parallelization instances. So what you observe might well happen by chance, and thus only some runs fail. I have not encountered this error in my simulations yet, but I will check later whether there is a way to reproduce it. Unfortunately, this traceback does not tell us a lot about what's going wrong, but again, joblib is tricky to debug.
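On the seeding point: `np.random.seed(0)` sets NumPy's global RNG in the calling process, while the traceback shows the partitioning jobs being dispatched as `delayed(self.partition_data)()` with no per-job seed, so the draws inside each worker may not be tied to that seed. A common pattern for reproducible parallel sampling is to derive one child seed per job from a `numpy.random.SeedSequence` and give each job its own `Generator`. This is a minimal sketch under that assumption: it uses `concurrent.futures` threads as a stand-in for joblib, and `partition_once` / `run_partitions` are hypothetical names, not FastKCI functions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def partition_once(seed_seq, n_samples, n_groups):
    """Hypothetical stand-in for one partitioning job: assign each sample a
    random group label using a Generator seeded only by this job's seed."""
    rng = np.random.default_rng(seed_seq)
    return rng.integers(0, n_groups, size=n_samples)

def run_partitions(master_seed, n_jobs, n_samples=10, n_groups=3):
    # One statistically independent child seed per job, all derived from one master seed.
    children = np.random.SeedSequence(master_seed).spawn(n_jobs)
    with ThreadPoolExecutor() as pool:
        # map() yields results in submission order, however the jobs are scheduled.
        return list(pool.map(lambda s: partition_once(s, n_samples, n_groups), children))

first = run_partitions(0, n_jobs=4)
second = run_partitions(0, n_jobs=4)
print(all(np.array_equal(p, q) for p, q in zip(first, second)))  # True
```

With joblib, the same idea would read `Parallel(n_jobs=-1)(delayed(partition_once)(s, n, k) for s in children)`: the seed travels with each task, so results no longer depend on which worker process runs it.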

@harrydesmond
Author

Ah, it seems the FCI ran fine for much longer but didn't actually complete. Here's some code and data that I hope will let you reproduce it. Sometimes it throws the error within 10 seconds of starting; sometimes it takes an hour. I've been running it on 28 cores, in case that makes a difference.

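from causallearn.search.ConstraintBased.FCI import fci
import numpy as np

np.random.seed(0)  # seeds NumPy's global RNG in this process

indep_test_method = 'fastkci'  # with 'kci' instead, the error does not occur

pval_threshold = 0.01
depth = -1  # -1: no limit on the depth of the fast adjacency search
max_path_length = -1  # -1: no limit on the length of discriminating paths
verbose = True
background_knowledge = None

data = np.loadtxt("test_data.txt")  # the data file attached below

g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold,
               depth=depth, max_path_length=max_path_length, verbose=verbose,
               background_knowledge=background_knowledge)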

test_data.txt

@OliverSchacht
Contributor

Found it (I think): for numerical reasons, the weights of the multinomial used when partitioning the data sometimes did not sum to 1, and thus NumPy threw this error (see this related issue).
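The failure mode can be reproduced in isolation. NumPy's multinomial sampler checks that the first k-1 probabilities do not sum to more than 1 (up to a tiny tolerance), so weights that overshoot 1 even slightly raise a `ValueError`, while rescaling them by their total avoids it. The weight values and the `draw_partition` helper below are made up for illustration and are not taken from FastKCI.

```python
import numpy as np

def draw_partition(pvals, seed=0):
    """Draw one multinomial sample (a single group assignment) from pvals."""
    return np.random.default_rng(seed).multinomial(1, pvals)

# Hypothetical weights that should sum to 1 but overshoot slightly.
bad = np.array([0.5, 0.5 + 1e-6, 0.0])

try:
    draw_partition(bad)
except ValueError as err:
    print("ValueError:", err)  # the same kind of check as in the traceback

# Dividing by the total turns the weights into a valid probability vector again.
fixed = bad / bad.sum()
print(draw_partition(fixed))
```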

I added an explicit normalization step that should prevent this; however, in very rare cases I could imagine this step having numerical issues too, and then it might break (again). I ran your test data twice without any errors occurring.
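A minimal sketch of the kind of guard described above, assuming the partition weights arrive as a 1-D array of nonnegative floats. `normalize_weights` is a hypothetical helper, not the code in PR #228; clipping tiny negatives and falling back to uniform weights when the total is zero or non-finite are assumptions aimed at the "very rare cases" mentioned.

```python
import numpy as np

def normalize_weights(w):
    """Hypothetical helper: turn nonnegative weights into multinomial probabilities."""
    w = np.asarray(w, dtype=np.float64)
    w = np.clip(w, 0.0, None)  # drop tiny negative round-off (NaNs pass through)
    total = w.sum()
    if not np.isfinite(total) or total <= 0.0:
        # Degenerate weights (all zero, inf, or NaN): assume a uniform fallback.
        return np.full(w.shape, 1.0 / w.size)
    return w / total

p = normalize_weights([0.5, 0.5 + 1e-6, 0.0])
print(np.random.default_rng(0).multinomial(1, p))
```

The uniform fallback keeps sampling from crashing, but it silently changes the partition distribution, so raising a clearer error there would be a reasonable alternative.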

The fix is here, and I opened PR #228 too.

I would be very interested in your general feedback on FastKCI.

Best,
Oliver

@harrydesmond
Author

Great, yes that seems to have fixed it!

I've been impressed with FastKCI. I haven't done exhaustive tests, but from what I've seen it performs roughly as well as KCI in a fraction of the time when the dataset is large. No problems with it besides this issue.


3 participants