
Parameter tuning for applying DECI to large graphs #60

Open
fred887 opened this issue Aug 14, 2023 · 3 comments

@fred887

fred887 commented Aug 14, 2023

Hello,

Do you have any advice on tuning the parameters of the DECI method when applying it to large graphs for causal discovery?

I have tried to apply the DECI method to datasets of simulated graphs with 10, 20, 50 and 100 nodes (with the number of edges equal to either 1x or 4x the number of nodes) and different types of nonlinear SEMs (but all with additive Gaussian noise).

For all datasets, training seems to go correctly (the loss curves decrease as expected and there are no numerical warnings), so DECI appears to converge.
For all but the 100-node graph datasets (and some 50-node graphs), I obtain valid graph estimates, more or less accurate depending on the situation.
For all 100-node graph datasets (and some 50-node graphs), I obtain invalid "empty" graphs (i.e. graphs whose adjacency matrix contains only zeros).
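(This "empty graph" failure mode is easy to detect programmatically; in the sketch below, `adj_matrix` is a hypothetical placeholder for the adjacency matrix extracted from a trained model:)

```python
import numpy as np

# adj_matrix stands in for the adjacency matrix read off a trained DECI
# model; here it is a placeholder all-zeros matrix for illustration.
adj_matrix = np.zeros((100, 100), dtype=int)

n_edges = np.count_nonzero(adj_matrix)
if n_edges == 0:
    print("Empty graph: the model collapsed to the trivial DAG.")
else:
    print(f"Learned graph has {n_edges} edges.")
```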

Could you please help me make DECI work for these 100-node graphs?

Here is my setting:

  1. Using the following snippet with the current gcastle package (v1.0.3), I have simulated several datasets of 3000 samples each, drawn from random graphs with 100 nodes (and 100 or 400 edges) and different nonlinear SEMs (gp, quadratic and mlp):
from castle.datasets import DAG, IIDSimulation

weighted_random_dag = DAG.erdos_renyi(n_nodes=n_nodes, n_edges=n_edges, weight_range=(0.5, 2.0), seed=seed)
dataset = IIDSimulation(W=weighted_random_dag, n=n, method=method_type, sem_type=sem_type)
true_dag, X = dataset.B, dataset.X
  2. I have adapted the source code from examples/multi_investment_sales_attribution.ipynb to process my own datasets,
    so I am using the default parameters plus those specified in that example.
    I only changed the batch size from 1024 to 128 to better fit my datasets of 3000 samples.
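(For readers without gcastle installed, the DAG simulation in step 1 can be approximated in plain numpy. `random_weighted_dag` below is a hypothetical stand-in for `DAG.erdos_renyi`, not the gcastle implementation: it samples edges only above the diagonal of a random node ordering, which guarantees acyclicity by construction.)

```python
import numpy as np

def random_weighted_dag(n_nodes, n_edges, weight_range=(0.5, 2.0), seed=0):
    """Sample a weighted Erdos-Renyi-style DAG adjacency matrix.

    Edges are placed only in the strict upper triangle of a fixed
    topological order, so the resulting graph is acyclic by construction.
    """
    rng = np.random.default_rng(seed)
    # All candidate edges i -> j with i < j respect the topological order.
    rows, cols = np.triu_indices(n_nodes, k=1)
    idx = rng.choice(len(rows), size=min(n_edges, len(rows)), replace=False)
    W = np.zeros((n_nodes, n_nodes))
    W[rows[idx], cols[idx]] = rng.uniform(*weight_range, size=len(idx))
    # Shuffle node labels so the topological order is not simply 0..n-1.
    perm = rng.permutation(n_nodes)
    return W[np.ix_(perm, perm)]

W = random_weighted_dag(100, 400, seed=42)
print(np.count_nonzero(W))  # → 400
```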

Thank you very much for your help,

@LaurantChao

Hi Fred,

Thanks for the detailed description of your question. My suspicion is that when the graph is relatively large, the scale of the dagness penalty can grow, and the updates for rho or alpha can blow up quickly; the optimization then focuses only on producing a DAG (which can be achieved trivially with the null graph) and ignores fitting the data. I would suggest:

  • Debug by removing the dagness penalty term during optimization, by setting safety_alpha and safety_rho to zero. For example, you could set the config to something like:
lightning_module = DECIModule(
    noise_dist=ContinuousNoiseDist.GAUSSIAN,
    prior_sparsity_lambda=1.0,
    init_rho=0.0,
    init_alpha=0.0,
    safety_alpha=0.0,
    safety_rho=0.0,
    auglag_config=AugLagLRConfig(lr_init_dict={"vardist": 1e-2, "icgnn": 3e-4, "noise_dist": 3e-3}),
)

and see if the problem persists. Note that the learned graph might not be a valid DAG in this case, but this will help you identify the source of the problem.

  • The other possibility is that the sparsity term is too large. You could try setting a smaller value, e.g., prior_sparsity_lambda = 0.01.
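(The collapse mechanism described above can be illustrated with the acyclicity penalty itself. The sketch below assumes a NOTEARS-style trace-exponential constraint h(A) = tr(exp(A ∘ A)) − d, which is representative of the kind of dagness term DECI penalizes, though not necessarily causica's exact formula: the null graph satisfies the constraint for free, so when rho blows up, collapsing to all zeros is the easiest way to minimize the penalty.)

```python
import numpy as np
from scipy.linalg import expm

def dagness(A):
    """NOTEARS-style acyclicity penalty h(A) = tr(exp(A ∘ A)) − d.

    h(A) is zero exactly when A is the adjacency matrix of a DAG, and
    grows with the number and weight of cycles; the augmented Lagrangian
    scheme drives it to zero via the rho/alpha updates.
    """
    d = A.shape[0]
    return float(np.trace(expm(A * A)) - d)

d = 100
null_graph = np.zeros((d, d))        # the trivial "empty" DAG
cyclic = np.zeros((d, d))
cyclic[0, 1] = cyclic[1, 0] = 1.0    # a single 2-cycle

print(dagness(null_graph))           # zero: the empty graph satisfies h for free
print(dagness(cyclic))               # strictly positive: the 2-cycle is penalized
```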

@jiayang97

jiayang97 commented Sep 27, 2023

I have encountered a similar issue when running the algorithm. I tested it on 20 nodes + 10k samples + batch size 100 + max epochs 1000. After a few epochs, alpha and rho increase drastically and I eventually get a 'nan no real values found' error. I tried setting safety_alpha=0.0, safety_rho=0.0 and prior_sparsity_lambda = 0.1, but the problem persists. There are also cases where, at the end of training, an invalid DAG is generated. Do you have any recommendations on how I can troubleshoot this?

I'm running causica 0.2.0 as I'm using Python 3.9, and I followed the code in the csuite example. Also, is it possible to know the time complexity of the DECI algorithm? Thank you so much!

@fred887
Copy link

fred887 commented Sep 28, 2023

Hello,

First of all, thank you very much LaurantChao for your suggestions, and sorry for my late answer.
I applied the modifications you proposed: by completely removing the dagness penalty, I could obtain non-empty graphs (though not valid DAGs, as expected). Changing the sparsity lambda had no impact on my initial problem.
So there is something in the dagness penalty term that I have to investigate more deeply.
(By the way, in my DECI version, the safety_rho and safety_alpha parameters belong to the AugLagLRConfig class...)

Next, jiayang97: my issue seems a little different from yours (my training runs finish without any error message), but if you want to completely remove the dagness penalty as I did, you also need to set the parameter init_rho to 0. I hope this helps.
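(Putting the parameter hints from this thread together, a configuration that fully disables the augmented Lagrangian dagness penalty might look like the untested sketch below. The exact location of safety_rho/safety_alpha — on DECIModule or on AugLagLRConfig — depends on the causica version, as noted in this thread; check your version's signatures before copying.)

```python
lightning_module = DECIModule(
    noise_dist=ContinuousNoiseDist.GAUSSIAN,
    prior_sparsity_lambda=1.0,
    init_rho=0.0,    # starting penalty weight: zero disables the quadratic term
    init_alpha=0.0,  # starting Lagrange multiplier: zero disables the linear term
    auglag_config=AugLagLRConfig(
        lr_init_dict={"vardist": 1e-2, "icgnn": 3e-4, "noise_dist": 3e-3},
        safety_rho=0.0,    # per this thread: caps rho growth (here, no growth)
        safety_alpha=0.0,  # per this thread: caps alpha growth
    ),
)
```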
