Parameter tuning for applying DECI to large graphs #60
Comments
Hi Fred, thanks for your detailed description of the problem. My suspicion is that when the graph is relatively large, the scale of the dagness penalty grows, and the updates for rho or alpha can blow up quickly; the optimization then focuses only on producing a DAG (which can be achieved trivially with the null graph) and ignores fitting the data. I would suggest to:
and see if the problem still persists. Note that the learned graph might not be a valid DAG in this case, but this will help you identify the source of the problem.
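To see why the penalty's scale matters, here is a small sketch of the NOTEARS-style dagness measure h(A) = tr(exp(A ∘ A)) − d that DECI-family methods penalize (illustrative only; causica's internal implementation may differ in details). For a fixed edge density, h grows very quickly with the number of nodes, which is why rho/alpha updates that are stable on small graphs can explode on 100-node ones:

```python
import numpy as np
from scipy.linalg import expm

def dagness(A):
    """NOTEARS-style penalty h(A) = tr(exp(A * A)) - d; zero iff A is a DAG."""
    d = A.shape[0]
    return np.trace(expm(A * A)) - d

rng = np.random.default_rng(0)
for d in (10, 50, 100):
    # random binary adjacency with ~10% edge density, no self-loops
    A = (rng.random((d, d)) < 0.1).astype(float)
    np.fill_diagonal(A, 0.0)
    print(d, dagness(A))  # h grows rapidly with d at fixed density
```

A strictly upper-triangular (i.e. acyclic) adjacency gives h ≈ 0, while any cycle makes h strictly positive, so the penalty's raw scale is entirely driven by graph size and density.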
I have encountered a similar issue when running the algorithm. I tested it with 20 nodes + 10k samples + batch size 100 + max epoch 1000. After a few epochs, alpha and rho increase drastically and I eventually get a 'nan, no real values found' error. I have tried setting safety_alpha=0.0, safety_rho=0.0 and prior_sparsity_lambda=0.1, but the problem persists. There are also cases where, at the end of training, a non-valid DAG is generated. Do you have any recommendations on how I can troubleshoot this? I'm running causica 0.2.0 since I'm using Python 3.9, and I followed the code in the csuite example. Also, is it possible to know the time complexity of the DECI algorithm? Thank you so much!
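The drastic alpha/rho growth described above is characteristic of augmented-Lagrangian schedules used by NOTEARS-style methods: if the dagness penalty h does not shrink by a fixed factor between checks, rho is multiplied by a constant, and alpha accumulates rho · h. The sketch below is schematic, with hypothetical parameter names and constants (not causica's actual code), but it shows how a stalled h escalates both quantities geometrically:

```python
def update_penalty(h_new, h_old, rho, alpha,
                   progress=0.65, rho_mult=10.0, rho_max=1e13):
    """Schematic augmented-Lagrangian update: if h did not shrink enough,
    multiply rho; always accumulate alpha += rho * h_new."""
    if h_new > progress * h_old:
        rho = min(rho * rho_mult, rho_max)
    alpha = alpha + rho * h_new
    return rho, alpha

rho, alpha, h = 1.0, 0.0, 5.0
for _ in range(6):  # h stalls completely -> rho grows 10x per check
    rho, alpha = update_penalty(h, h, rho, alpha)
print(rho, alpha)  # → 1000000.0 5555550.0
```

Six stalled checks already push rho to 1e6, which matches the observed blow-up. On time complexity: the dagness penalty involves a matrix exponential (or an equivalent power-series/spectral surrogate) on the d×d adjacency, which is O(d³) per evaluation, on top of the per-batch cost of the flow/noise model.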
Hello, First of all, thank you very much LaurantChao for your suggestions, and sorry for my late answer. Next, jiayang97, my issue seems a little different from yours (my trainings complete without any error message), but if you want to completely remove the dagness penalty like me, you also need to set the parameter
Hello,
Do you have some pieces of advice for tuning the parameters of the DECI method when applied to large graphs for Causal Discovery?
I have tried to apply the DECI method to datasets of simulated graphs with 10, 20, 50 and 100 nodes (with the number of edges equal to 1x or 4x the number of nodes) and different types of nonlinear SEMs (but all with Gaussian additive noise).
For all datasets, the training seems to be going correctly (the loss curves are decreasing correctly, and there are no numerical warnings), so DECI seems to be converging.
For all but the 100-node graph datasets (and some 50-node graphs) I obtain valid graph estimates, more or less correct depending on the situation.
For all 100-node graph datasets (and some 50-node graphs) I obtain invalid "empty" graphs (i.e. graphs whose adjacency matrix contains only 0 elements).
Could you please help me make DECI work for these 100-node graphs?
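The "empty graph" outcome above can be checked programmatically after training. The helper below is a hypothetical diagnostic (not part of causica's API) for detecting when a learned, possibly soft, adjacency matrix thresholds to the trivial null-graph solution:

```python
import numpy as np

def is_null_graph(adj, tol=0.5):
    """True if a (possibly soft) adjacency matrix thresholds to the empty
    graph, i.e. the degenerate solution where training collapsed to the
    trivial DAG with no edges."""
    return int((np.asarray(adj) > tol).sum()) == 0

# e.g. a learned 100-node adjacency of all zeros is the degenerate case
print(is_null_graph(np.zeros((100, 100))))  # → True
```

Tracking the thresholded edge count per epoch alongside the dagness penalty makes it easy to see whether edges were never learned or were learned and then pruned away when rho/alpha grew.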
Here is my setting (SEM types `gp`, `quadratic` and `mlp`): I am using the default parameters plus those specified in this example.
I have only changed the batch size from 1024 to 128 to better fit my datasets containing 3000 samples.
Thank you very much for your help,