I'm wondering why the implementation of compose(int,int,int) in BDDImpl does not use an operation cache the way many of the other operations do. Without caching, the recursion revisits shared subgraphs on every path that reaches them, so the operation performs very poorly on large BDD instances. Is there a specific challenge related to multithreading, or was it just an oversight?
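To make concrete what I mean by caching, here is a minimal sketch on a toy BDD. Everything in it is hypothetical and written only for illustration: the class ToyBdd, its fields, and its method names are not taken from BDDImpl, and the string-keyed HashMap caches stand in for whatever fixed-size operation cache the library actually uses. It assumes the usual ordering where a smaller variable index is closer to the root.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy BDD (NOT the BDDImpl API). Node 0 is false, node 1 is true.
final class ToyBdd {
    private static final int TERMINAL_VAR = Integer.MAX_VALUE;
    private final List<int[]> nodes = new ArrayList<>(); // each entry: {var, low, high}
    private final Map<String, Integer> unique = new HashMap<>();
    private final Map<String, Integer> iteCache = new HashMap<>();
    private final Map<String, Integer> composeCache = new HashMap<>();
    int composeCalls = 0; // number of compose steps that missed the cache

    ToyBdd() {
        nodes.add(new int[] {TERMINAL_VAR, 0, 0}); // false
        nodes.add(new int[] {TERMINAL_VAR, 1, 1}); // true
    }

    int var(int n) { return nodes.get(n)[0]; }
    int low(int n) { return nodes.get(n)[1]; }
    int high(int n) { return nodes.get(n)[2]; }

    // Hash-consing constructor: keeps the diagram reduced and canonical.
    int mk(int v, int lo, int hi) {
        if (lo == hi) return lo;
        String key = v + "," + lo + "," + hi;
        Integer hit = unique.get(key);
        if (hit != null) return hit;
        nodes.add(new int[] {v, lo, hi});
        int id = nodes.size() - 1;
        unique.put(key, id);
        return id;
    }

    int ithVar(int v) { return mk(v, 0, 1); }

    // Cofactor of n with respect to variable v.
    private int cof(int n, int v, boolean positive) {
        if (var(n) != v) return n;
        return positive ? high(n) : low(n);
    }

    // Memoized if-then-else.
    int ite(int f, int g, int h) {
        if (f == 1) return g;
        if (f == 0) return h;
        if (g == h) return g;
        String key = f + "," + g + "," + h;
        Integer hit = iteCache.get(key);
        if (hit != null) return hit;
        int v = Math.min(var(f), Math.min(var(g), var(h)));
        int lo = ite(cof(f, v, false), cof(g, v, false), cof(h, v, false));
        int hi = ite(cof(f, v, true), cof(g, v, true), cof(h, v, true));
        int r = mk(v, lo, hi);
        iteCache.put(key, r);
        return r;
    }

    // Substitute g for variable v in f, memoized on the triple (f, g, v).
    int compose(int f, int g, int v) {
        if (var(f) > v) return f; // terminal, or v cannot occur below this node
        String key = f + "," + g + "," + v;
        Integer hit = composeCache.get(key);
        if (hit != null) return hit; // shared subgraph already processed
        composeCalls++;
        int r;
        if (var(f) == v) {
            r = ite(g, high(f), low(f));
        } else {
            int lo = compose(low(f), g, v);
            int hi = compose(high(f), g, v);
            // g may mention variables above var(f), so rebuild via ite rather than mk.
            r = ite(ithVar(var(f)), hi, lo);
        }
        composeCache.put(key, r);
        return r;
    }
}
```

With the cache in place, each node of f is processed at most once per (g, v) pair. A real implementation would also have to invalidate such a cache whenever nodes are garbage-collected or reordered, which may be part of the answer to my question.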