-
Notifications
You must be signed in to change notification settings - Fork 938
Networkx Proposals For MixedEdgeGraph Class To Enable Causal Graphs
Networkx is the standard Python package for representing graphs, performing operations on graphs and performing algorithms on graphs. It also enables users to draw graphs. Networkx graphs only have one type of edge, either directed, or undirected.
Causality uses graphs with different types of edges in the same graph. For example:
- ADMG: a causal DAG with directed and bidirected edges.
- CPDAG: an equivalence class of causal graphs with directed and undirected edges.
- PAG: an equivalence class of causal graphs with directed edges, undirected edges and directed edges with circular endpoints.
We propose a MixedEdge
graph class that allows arbitrary combinations of existing Graph
or DiGraph
classes. This could of course be extended to support MultiMixedEdge
graph class in the future as well.
class MixedEdgeGraph:
def __init__(self, graphs: List, edge_types: List):
# do some error checking
...
self.graphs = graphs
self.edge_types = edge_types
def add_edge(self, u, v, edge_type):
if edge_type not in self.edge_types:
raise Error
edge_idx = edge_types.index(edge_type)
self.graphs[edge_idx].add_edge(u, v)
def has_edge(self, u, v, edge_type):
def remove_edge(self, u, v, edge_type):
def to_undirected(self):
# convert all graphs to undirected
@cached_property
def adj(self, edge_type='all'):
# return Adjacencies over all/any type of edge
This would serve as a "new" base class since it would not inherit from any of the existing networkx graphs, but rather serve as a generic API layer for allowing any arbitrary combination of graphs that represent different edge types, while providing users with highly efficient and robust data structures: e.g. AdjacencyView. Here are a few examples of usage in downstream packages. These explicit implementations of causal graphs ideally should be lightweight and extend the networkx-API layer to provide more nuanced functionality for causality users.
Although all graph classes could be represented using just a MixedEdgeGraph
instantiation, it is not desirable. For one, having a subclass implementation of common causal graphs would enable i) error checking of edge operations and graph initialization, ii) light-weight convenience API for common causal-graph operations (i.e. access to specific types of edges and causality-specific notions such as "c-components"), and iii) provide additional causality-specific graph operations (i.e. do-operation).
class ADMG(MixedEdgeGraph):
def __init__(self, directed_data, bidirected_data):
super().__init__([directed_data, bidirected_data], edge_types=['directed', 'bidirected'])
def do(self, u):
# apply do intervention and intervene on the graph
def soft_do(self, nodes):
# apply soft intervention on a set of nodes by adding a F-node for example
@property
def c_components(self):
return nx.connected_components(self.bidirected_edge_graph)
...
We could also subclass the MixedEdgeGraph
with equivalence classes, where we have again different types of edges.
class CPDAG(MixedEdgeGraph):
def __init__(self, directed_data, undirected_data):
super().__init__([directed_data, undirected_data], edge_types=['directed', 'undirected'])
# API layer for users to work with the specific types of edges
def add_undirected_edge(self, u, v):
super().add_edge(u, v, edge_type='undirected')
...
# We can also use the ADMG defined functions in the PAG as well
class PAG(MixedEdgeGraph, ADMG):
def __init__(self, directed_data, undirected_data, circular_data):
super().__init__([directed_data, undirected_data, circular_data], edge_types=['directed', 'undirected', 'circle'])
# API layer for users to work with the specific types of edges
def add_circle_endpoint(self, u, v):
super().add_edge(u, v, edge_type='circle')
def potential_parents(self, u):
# return the directed edge parents of u and circle edges pointing to u
...
# instantiate the PAG
pag = PAG(nx.DiGraph, nx.Graph, nx.DiGraph)
MixedEdge graphs are useful mainly in causality, so we propose similar to bipartite
submodule in networkx
to add a causal
submodule that has graph traversal algorithms for mixededge graphs. The most relevant one would be that of "m-separation", which generalizes "d-separation". Similar to bipartite in networkx, we can also have some "convention" that is used in networkx to represent the different types of edges we want to analyze in each algorithm. In this case, networkx has the convention that there is an edge_type named 'directed' and an edge type named 'bidirected' when running m_separated
.
def m_separated(G, x, y, z, directed_edge_name='directed', bidirected_edge_name='bidirected'):
# check that G is a mixed edge graph with the edge-graph names corresponding to
# 'directed' and 'bidirected'.
check_G(G)
# convert G to a DAG by converting all bidirected edges to unobserved confounders
dag_G = _convert_latent_to_unobserved_confounders(G)
# run d-separation
return d_separated(dag_G, x, y, z)
def _convert_latent_to_unobserved_confounders(G: ADMG) -> ADMG:
"""Convert all bidirected edges to unobserved confounders.
Parameters
----------
G : ADMG
A causal graph with bidirected edges.
Returns
-------
G_copy : ADMG
A networkx DiGraph that is a fully specified DAG with unobserved
variables added in place of bidirected edges.
"""
uc_label = "Unobserved Confounders"
G_copy = G.copy()
# for every bidirected edge, add a new node
for idx, latent_edge in enumerate(G.c_component_graph.edges):
G_copy.add_node(f"U{idx}", label=uc_label, observed="no")
# then add edges from the new UC to the nodes
G_copy.add_edge(f"U{idx}", latent_edge[0])
G_copy.add_edge(f"U{idx}", latent_edge[1])
# remove the actual bidirected edge
G_copy.remove_bidirected_edge(*latent_edge)
return G_copy
Other algorithms also can be proposed, such as discriminating paths, uncovered paths, etc. These are traversal algorithms that operate over specifically mixed edges.
def discriminating_path(G, directed_edge_name='directed', bidirected_edge_name='bidirected'):
# compute discriminating paths in the graph
def is_discriminating_path(G, path):
# check if the path is a discriminating path
def has_discriminating_path(G, u, v, w):
# check if there is a discriminating path along u, v, w
Note this is similar to other path algorithms, where networkx provides a function to i) enumerate all paths of a certain type, ii) check if there is a path, iii) check if a path exhibits certain properties. (e.g. https://networkx.org/documentation/stable/reference/algorithms/shortest_paths.html)
Similar to bipartite in networkx, we could have common basic functions to help causality-focused users:
With the addition of a new graph class, existing algorithms inside networkx should be modified to either i) raise an error if it doesn't support mixedEdgeGraphs, or ii) work with mixed edge graphs or iii) some documentation for how to work it with the existing codebase.
Update as of 07-25-22: I am inclined to say that the documentation for MixedEdgeGraph
should state that any algorithm within https://networkx.org/documentation/stable/reference/algorithms/approximation.html#module-networkx.algorithms is not guaranteed to work with MixedEdgeGraph
unless specified. Perhaps @dshultz from networkx can comment on this. By making this cutoff of what "will" work with MixedEdgeGraph
, we guarantee that users do not accidentally run something that doesn't work. Then we can slowly PR algorithms that make sense to add MixedEdgeGraph
functionality.
TBD
TBD
- topological_sort: document that you can run it using
topological_sort(mixed_graph.get_edge_graph('directed'))
. - is_directed_acyclic_graph: document that you can run it using
is_directed_acyclic_graph(mixed_graph.get_edge_graph('directed'))
.
Fundamental graphical algorithms, such as m-separation should go into networkx. These can possibly have generic usage for researchers who use mixed-edge graphs that are not necessarily "causal". Keeping in line with the fact that networkx provides generic path-algorithms for graphs, we believe that the causal
submodule can house many of the fundamental causal graph traversal algorithms. These include fundamental path algorithms for mixed-edge graphs, such as discriminating paths, possibly d-separating paths, uncovered potentially directed paths, potentially directed paths, etc.
This will need discussion with networkx. The exact separation of what goes where should be discussed.
PyWhy on the other hand would house more nuanced causal operations, that align more with causal-inference semantics. For example, structure learning.
See here for discussions on the proposal outlined here: https://github.com/py-why/dowhy/discussions/525