
Parameter tuning for applying DECI to large graphs #60

Open
fred887 opened this issue Aug 14, 2023 · 3 comments

@fred887

fred887 commented Aug 14, 2023

Hello,

Do you have any advice on tuning the parameters of the DECI method when applying it to large graphs for causal discovery?

I have tried to apply the DECI method to datasets of simulated graphs with 10, 20, 50 and 100 nodes (with the number of edges equal to either 1x or 4x the number of nodes) and different types of nonlinear SEMs (but all with additive Gaussian noise).

For all datasets, training seems to go correctly (the loss curves decrease as expected and there are no numerical warnings), so DECI appears to converge.
For all but the 100-node graph datasets (and some 50-node graphs), I obtain valid graph estimates, more or less accurate depending on the situation.
For all 100-node graph datasets (and some 50-node graphs), I obtain invalid "empty" graphs (i.e. graphs whose adjacency matrix contains only zeros).
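(This "empty graph" failure mode is easy to detect programmatically; in the sketch below, `adj_matrix` is a hypothetical placeholder for the adjacency matrix extracted from a trained model:)

```python
import numpy as np

# adj_matrix stands in for the adjacency matrix read off a trained DECI
# model; here it is a placeholder all-zeros matrix for illustration.
adj_matrix = np.zeros((100, 100), dtype=int)

n_edges = np.count_nonzero(adj_matrix)
if n_edges == 0:
    print("Empty graph: the model collapsed to the trivial DAG.")
else:
    print(f"Learned graph has {n_edges} edges.")
```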

Could you please help me make DECI work for these 100-node graphs?

Here is my setting:

  1. Using the following snippet with the current gcastle package (v1.0.3), I have simulated several datasets of 3000 samples each, drawn from random graphs with 100 nodes (and 100 or 400 edges) and different nonlinear SEMs (gp, quadratic and mlp):
from castle.datasets import DAG, IIDSimulation

weighted_random_dag = DAG.erdos_renyi(n_nodes=n_nodes, n_edges=n_edges, weight_range=(0.5, 2.0), seed=seed)
dataset = IIDSimulation(W=weighted_random_dag, n=n, method=method_type, sem_type=sem_type)
true_dag, X = dataset.B, dataset.X
  2. I have adapted the source code from examples/multi_investment_sales_attribution.ipynb to process my own datasets,
    so I am using the default parameters plus those specified in that example.
    I only changed the batch size from 1024 to 128 to better fit my datasets of 3000 samples.
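(For readers without gcastle installed, the DAG simulation in step 1 can be approximated in plain numpy. `random_weighted_dag` below is a hypothetical stand-in for `DAG.erdos_renyi`, not the gcastle implementation: it samples edges only above the diagonal of a random node ordering, which guarantees acyclicity by construction.)

```python
import numpy as np

def random_weighted_dag(n_nodes, n_edges, weight_range=(0.5, 2.0), seed=0):
    """Sample a weighted Erdos-Renyi-style DAG adjacency matrix.

    Edges are placed only in the strict upper triangle of a fixed
    topological order, so the resulting graph is acyclic by construction.
    """
    rng = np.random.default_rng(seed)
    # All candidate edges i -> j with i < j respect the topological order.
    rows, cols = np.triu_indices(n_nodes, k=1)
    idx = rng.choice(len(rows), size=min(n_edges, len(rows)), replace=False)
    W = np.zeros((n_nodes, n_nodes))
    W[rows[idx], cols[idx]] = rng.uniform(*weight_range, size=len(idx))
    # Shuffle node labels so the topological order is not simply 0..n-1.
    perm = rng.permutation(n_nodes)
    return W[np.ix_(perm, perm)]

W = random_weighted_dag(100, 400, seed=42)
print(np.count_nonzero(W))  # → 400
```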

Thank you very much for your help,

@LaurantChao

Hi Fred,

Thanks for the detailed description of your question. My suspicion is that when the graph is relatively large, the scale of the dagness penalty can grow, and the updates for rho or alpha can blow up quickly; the optimization then focuses only on producing a DAG (which can be achieved trivially with the null graph) and ignores fitting the data. I would suggest:

  • Debug by removing the dagness penalty term during optimization, by setting safety_alpha and safety_rho to zero. For example, you could set the config to something like:
lightning_module = DECIModule(
    noise_dist=ContinuousNoiseDist.GAUSSIAN,
    prior_sparsity_lambda=1.0,
    init_rho=0.0,
    init_alpha=0.0,
    safety_alpha=0.0,
    safety_rho=0.0,
    auglag_config=AugLagLRConfig(lr_init_dict={"vardist": 1e-2, "icgnn": 3e-4, "noise_dist": 3e-3}),
)

and see if the problem persists. Note that the learned graph might not be a valid DAG in this case, but this will help you identify the source of the problem.

  • The other possibility is that the sparsity term is too large. You could try setting a smaller value, e.g., prior_sparsity_lambda = 0.01.
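(The collapse mechanism described above can be illustrated with the acyclicity penalty itself. The sketch below assumes a NOTEARS-style trace-exponential constraint h(A) = tr(exp(A ∘ A)) − d, which is representative of the kind of dagness term DECI penalizes, though not necessarily causica's exact formula: the null graph satisfies the constraint for free, so when rho blows up, collapsing to all zeros is the easiest way to minimize the penalty.)

```python
import numpy as np
from scipy.linalg import expm

def dagness(A):
    """NOTEARS-style acyclicity penalty h(A) = tr(exp(A ∘ A)) − d.

    h(A) is zero exactly when A is the adjacency matrix of a DAG, and
    grows with the number and weight of cycles; the augmented Lagrangian
    scheme drives it to zero via the rho/alpha updates.
    """
    d = A.shape[0]
    return float(np.trace(expm(A * A)) - d)

d = 100
null_graph = np.zeros((d, d))        # the trivial "empty" DAG
cyclic = np.zeros((d, d))
cyclic[0, 1] = cyclic[1, 0] = 1.0    # a single 2-cycle

print(dagness(null_graph))           # zero: the empty graph satisfies h for free
print(dagness(cyclic))               # strictly positive: the 2-cycle is penalized
```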

@jiayang97

jiayang97 commented Sep 27, 2023

I have encountered a similar issue when running the algorithm. I tested it on 20 nodes + 10k samples + batch size 100 + max epochs 1000. After a few epochs, alpha and rho increase drastically and I eventually get a 'nan no real values found' error. I tried setting safety_alpha=0.0, safety_rho=0.0 and prior_sparsity_lambda = 0.1, but the problem persists. There are also cases where, at the end of training, an invalid DAG is generated. Do you have any recommendations on how I can troubleshoot this?

I'm running causica 0.2.0 as I'm using Python 3.9, and I followed the code in the csuite example. Also, is it possible to know the time complexity of the DECI algorithm? Thank you so much!

@fred887
Copy link

fred887 commented Sep 28, 2023

Hello,

First of all, thank you very much LaurantChao for your suggestions, and sorry for my late answer.
I applied the modifications you proposed: by completely removing the dagness penalty, I could obtain non-empty graphs (though not valid DAGs, as expected). Changing the sparsity lambda had no impact on my initial problem.
So there is something in the dagness penalty term that I have to investigate more deeply.
(By the way, in my DECI version, the safety_rho and safety_alpha parameters belong to the AugLagLRConfig class...)

Next, jiayang97: my issue seems a little different from yours (my training runs finish without any error message), but if you want to completely remove the dagness penalty as I did, you also need to set the parameter init_rho to 0. I hope this helps.
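(Putting the parameter hints from this thread together, a configuration that fully disables the augmented Lagrangian dagness penalty might look like the untested sketch below. The exact location of safety_rho/safety_alpha — on DECIModule or on AugLagLRConfig — depends on the causica version, as noted in this thread; check your version's signatures before copying.)

```python
lightning_module = DECIModule(
    noise_dist=ContinuousNoiseDist.GAUSSIAN,
    prior_sparsity_lambda=1.0,
    init_rho=0.0,    # starting penalty weight: zero disables the quadratic term
    init_alpha=0.0,  # starting Lagrange multiplier: zero disables the linear term
    auglag_config=AugLagLRConfig(
        lr_init_dict={"vardist": 1e-2, "icgnn": 3e-4, "noise_dist": 3e-3},
        safety_rho=0.0,    # per this thread: caps rho growth (here, no growth)
        safety_alpha=0.0,  # per this thread: caps alpha growth
    ),
)
```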
