
Adding the ability to identify causal queries in a cyclic graph from interventional data #388

@emilydogbatsePNNL2025

Description


Motivation:

This is motivated by the need to identify causal effects in E. coli gene regulatory networks, where wild-type data alone is insufficient and single-perturbation experimental data is required as background. Specifically, we want to support two cases:

Single perturbation: causal queries identifiable from wild-type data $P(V)$
Double perturbation: causal queries not identifiable from wild-type data, but identifiable from single perturbation data $P_{do(J)}(V)$

Issue Description

This issue is documenting the requirement to fully support identification of causal effects $P(Y | do(W))$ in cyclic graphs using data collected under a prior interventional regime $P_{do(J)}(V)$, rather than purely observational data $P(V)$.

What exists now

The cyclic ID algorithm in cyclic_id.py on the feature branch called fix-line-23-cyclic-ID currently accepts a base_distribution parameter representing $P_{do(J)}(V)$. At the top level it:

  • Extracts the intervention $do(J)$ using _help_level_2_distribution()
  • Performs graph surgery by removing incoming edges to $J$ from the graph
  • Passes the intervened distribution down to initialize_district_distribution and initialize_component_distribution
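The surgery step in the second bullet can be illustrated on a plain ADMG. This is a minimal sketch, assuming a hypothetical MixedGraph dataclass in place of y0's NxMixedGraph. It cuts the directed edges into $J$, as described above, and also drops bidirected (confounding) edges incident to $J$, which is the usual convention for $do(J)$ in ADMGs. Note that the mapping table lists graph.remove_nodes_from(intervention_j), which instead deletes $J$ together with all of its edges.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MixedGraph:
    """Hypothetical ADMG stand-in (not y0's NxMixedGraph)."""

    directed: frozenset[tuple[str, str]]  # (parent, child) edges
    bidirected: frozenset[frozenset[str]] = frozenset()  # latent-confounder edges


def intervene(graph: MixedGraph, j: set[str]) -> MixedGraph:
    """Graph surgery for do(J): cut directed edges into J and bidirected edges touching J."""
    directed = frozenset((u, v) for u, v in graph.directed if v not in j)
    bidirected = frozenset(e for e in graph.bidirected if not (e & j))
    return MixedGraph(directed, bidirected)


# X -> Y -> X is a feedback loop; J -> X; J and Y share a latent confounder.
g = MixedGraph(
    directed=frozenset({("J", "X"), ("X", "Y"), ("Y", "X")}),
    bidirected=frozenset({frozenset({"J", "Y"})}),
)
cut = intervene(g, {"J"})  # no edge points into J, so only the J <-> Y confounder is removed
```

Intervening on X instead would remove J -> X and Y -> X, breaking the cycle while keeping X -> Y.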

Where the gap is:

The internal IDCD recursion does not fully account for $do(J)$. More specifically, Line 23 of the IDCD algorithm requires:
$R_A[S] \leftarrow P(S \mid Pred^G_<(S) \cap A, do(J \cup V \setminus A))$

In other words, the full interventional context $do(J \cup V\setminus A)$ must be passed through the recursive identification calls. Currently, the implementation performs graph surgery at the top level using _help_level_2_distribution() but does not propagate $J$ through the recursive calls that start at idcd().
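The two sets that Line 23 needs can be computed with plain set operations. This is a minimal sketch: line_23_context is a hypothetical helper, and it assumes $Pred^G_<(S)$ is every node ordered strictly before the first member of $S$ in the apt-order (a simplification; check it against the paper's definition). It also illustrates why the union matters: if nodes is taken from the graph after remove_nodes_from(intervention_j), then $J$ never appears in nodes - ancestral_closure and is silently lost.

```python
from __future__ import annotations


def line_23_context(
    nodes: set[str],
    ancestral_closure: set[str],
    scc: frozenset[str],
    apt_order: list[str],
    background_interventions: set[str] | None = None,
) -> tuple[set[str], set[str]]:
    """Return (conditioning set, intervention set) for R_A[S] on Line 23.

    Hypothetical helper. Assumes Pred_<(S) is every node ordered strictly
    before the first member of S in the apt-order.
    """
    background = background_interventions or set()
    first = min(apt_order.index(v) for v in scc)
    predecessors = set(apt_order[:first])
    conditioning = predecessors & ancestral_closure  # Pred_<(S) ∩ A
    intervention = (nodes - ancestral_closure) | background  # J ∪ (V \ A)
    return conditioning, intervention


# J was removed from the graph at the top level, so it is absent from `nodes`.
cond, interv = line_23_context(
    nodes={"W", "X", "Y", "Z"},
    ancestral_closure={"X", "Y", "Z"},
    scc=frozenset({"X", "Y"}),
    apt_order=["W", "Z", "X", "Y"],
    background_interventions={"J"},
)
```

Here cond is {"Z"} and interv is {"W", "J"}; without the union, interv would be only {"W"}.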

Code vs Paper Mapping Table

| Paper concept | Notation | Code location | File | Status |
|---|---|---|---|---|
| Background intervention | $do(J)$ | base_distribution parameter in cyclic_id() | cyclic_id.py | Exists |
| Graph surgery on $J$ | Remove incoming edges to $J$ | graph.remove_nodes_from(intervention_j) | cyclic_id.py | Exists (top level only) |
| Line 23: full interventional context | $R_A[S] \leftarrow P(S \mid Pred^G_<(S) \cap A, do(J \cup V\setminus A))$ | compute_scc_distributions() | cyclic_id.py | Gap: $J$ is not propagated through the recursion for this line |

Next steps to complete the fix:

  • Propagate $J$ through the recursion: Update the signatures of idcd(), identify_through_scc_decomposition(), and compute_scc_distributions() so that $do(J)$ is carried through the recursion. Right now, $J$ is extracted at the top level in cyclic_id() and passed via base_distribution to initialize_district_distribution and initialize_component_distribution, but it is not explicitly passed through idcd(), identify_through_scc_decomposition(), or compute_scc_distributions().

  • Adjust Line 23: In identify_through_scc_decomposition(), update the intervention_set calculation to take the union of the local context $(V\setminus A)$ and the background intervention $J$, which maps to the paper's $do(J \cup V\setminus A)$:

    ```python
    # Current (incomplete)
    intervention_set = nodes - ancestral_closure

    # Suggested fix
    intervention_set = (nodes - ancestral_closure) | background_interventions  # was intervention_j
    ```
  • Testing and validation: Add a test case using a confounded cyclic graph where identifiability is only achieved with background interventional data, and verify that

    ```python
    cyclic_id(graph, outcomes={Y}, interventions={W}, base_distribution=P[J](V))
    ```

    returns the same result as running the algorithm on the manually mutilated graph:

    ```python
    cyclic_id(graph_mutilated_by_J, outcomes={Y}, interventions={W})
    ```


Testing Strategy

To validate the fix, add automated pytest test cases to the existing cyclic_id test files.

The tests should cover the following scenarios:

  • Unidentifiable without background context, identifiable with it: This is the core case: a query that raises Unidentifiable from observational data alone but returns a valid estimand when base_distribution=P[J](V) is provided.

  • Identifiable without background context stays identifiable with it: This regression test verifies that supplying a base_distribution does not break queries that were already identifiable.

  • Equivalence with manual graph surgery: Verifies that cyclic_id with base_distribution=P[J](V) returns the same result as manually removing incoming edges to $J$ and running the cyclic ID algorithm on the mutilated graph.
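The equivalence check in the third scenario can be expressed as a small, reusable predicate. This is a toy sketch, not a test against y0: the graph is a hypothetical dict mapping each node to its parents, and the "estimand" returned by the toy identify functions is just the edge set the algorithm reasons over. It shows that the check passes for an implementation that honors $J$ and fails for one that drops it, which is the failure mode described in this issue.

```python
from __future__ import annotations

from typing import Callable

Graph = dict[str, frozenset[str]]  # hypothetical toy graph: node -> parents


def mutilate(graph: Graph, background: set[str]) -> Graph:
    """Cut every incoming edge of the background-intervened nodes J."""
    return {v: frozenset() if v in background else parents for v, parents in graph.items()}


def edges(graph: Graph) -> frozenset[tuple[str, str]]:
    return frozenset((p, v) for v, parents in graph.items() for p in parents)


def matches_manual_surgery(identify: Callable[..., object], graph: Graph, background: set[str]) -> bool:
    """Scenario 3: result with background J must equal result on the graph mutilated by J."""
    return identify(graph, background) == identify(mutilate(graph, background), None)


def toy_fixed(graph: Graph, background: set[str] | None) -> frozenset[tuple[str, str]]:
    # Honors J by reasoning over the mutilated graph.
    return edges(mutilate(graph, background or set()))


def toy_buggy(graph: Graph, background: set[str] | None) -> frozenset[tuple[str, str]]:
    # Ignores J, like a recursion that drops background_interventions.
    return edges(graph)


# J -> Y -> J is a cycle; W -> Y is an extra parent of Y.
cyclic_graph: Graph = {"J": frozenset({"Y"}), "Y": frozenset({"J", "W"}), "W": frozenset()}
```

In a real test, identify would wrap cyclic_id and compare estimands rather than edge sets.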


Proposed Fix

The fix requires changes to several functions in cyclic_id.py. The goal is to propagate background_interventions (representing $J$ from the paper) through the entire recursion chain:

cyclic_id() → idcd() → identify_through_scc_decomposition() → compute_scc_distributions() → identify_district_variables_cyclic()
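A toy version of this chain shows the propagation pattern: normalize None to an empty set once, then forward the same keyword argument at every call, including the recursive call back into idcd(). All *_sketch functions are hypothetical stand-ins with the real identification logic stripped out; the leaf simply reports which background interventions reached it.

```python
from __future__ import annotations


def cyclic_id_sketch(background: set[str], depth: int) -> frozenset[str]:
    # Top level: J is already known here (extracted from base_distribution in the issue).
    return idcd_sketch(depth, background_interventions=background)


def idcd_sketch(depth: int, *, background_interventions: set[str] | None = None) -> frozenset[str]:
    return scc_decomposition_sketch(depth, background_interventions=background_interventions)


def scc_decomposition_sketch(depth: int, *, background_interventions: set[str] | None = None) -> frozenset[str]:
    background = background_interventions or set()  # normalize None once, as in step 3
    if depth > 0:
        # Recursive call back into idcd: dropping this kwarg would silently reset J to None.
        return idcd_sketch(depth - 1, background_interventions=background)
    return compute_scc_sketch(background_interventions=background)


def compute_scc_sketch(*, background_interventions: set[str]) -> frozenset[str]:
    return district_sketch(background_interventions=background_interventions)


def district_sketch(*, background_interventions: set[str]) -> frozenset[str]:
    # Leaf (Line 23): report which background interventions actually arrived.
    return frozenset(background_interventions)
```

Calling cyclic_id_sketch({"J"}, depth=3) returns frozenset({"J"}): the context survives three levels of recursion.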

1. cyclic_id() - pass background_interventions into idcd()

intervention_j is already extracted, but it also needs to be passed into the recursive helper idcd().

```python
# Current
district_distributions[district_c] = idcd(
    graph=graph,
    outcomes=set(district_c),
    district=consolidated_district_of_c,
    distribution=initial_distribution,
)

# Proposed fix
district_distributions[district_c] = idcd(
    graph=graph,
    outcomes=set(district_c),
    district=consolidated_district_of_c,
    distribution=initial_distribution,
    background_interventions=intervention_j,
)
```

2. idcd() - add the background_interventions parameter to the signature and pass it down through the function.

Add background_interventions as a parameter and pass it through to identify_through_scc_decomposition():

```python
# Current
def idcd(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    district: set[Variable],
    *,
    distribution: Expression | None = None,
    _recursion_level: int = 0,
) -> Expression:
    ...  # body omitted

# Proposed fix
def idcd(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    district: set[Variable],
    *,
    distribution: Expression | None = None,
    background_interventions: set[Variable] | None = None,
    _recursion_level: int = 0,
) -> Expression:
    ...  # body omitted
```

And then update the call to identify_through_scc_decomposition() at the bottom of idcd():

```python
# Current
return identify_through_scc_decomposition(
    graph, outcomes, ancestral_closure, _recursion_level=_recursion_level
)

# Proposed fix
return identify_through_scc_decomposition(
    graph,
    outcomes,
    ancestral_closure,
    background_interventions=background_interventions,
    _recursion_level=_recursion_level,
)
```

3. identify_through_scc_decomposition() - add signature update and fix the intervention_set calculation

This is a core fix:

```python
# Current
def identify_through_scc_decomposition(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    ancestral_closure: set[Variable],
    *,
    _recursion_level: int = 0,
) -> Expression:
    ...  # body omitted

# Proposed fix
def identify_through_scc_decomposition(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    ancestral_closure: set[Variable],
    *,
    background_interventions: set[Variable] | None = None,
    _recursion_level: int = 0,
) -> Expression:
    ...  # body omitted
```

Update the intervention_set calculation to take the union with background_interventions:

```python
# Current
intervention_set: Annotated[set[Variable], InPaperAs("J")] = nodes - ancestral_closure

# Proposed fix
background_interventions = background_interventions or set()
intervention_set: Annotated[set[Variable], InPaperAs("J")] = (
    nodes - ancestral_closure
) | background_interventions
```

Pass both the intervention_set and background_interventions into compute_scc_distributions():

```python
# Current
scc_distributions = compute_scc_distributions(
    graph=graph,
    subgraph_a=ancestral_closure_subgraph,
    relevant_sccs=relevant_sccs,
    ancestral_closure=ancestral_closure,
    intervention_set=intervention_set,
)

# Proposed fix
scc_distributions = compute_scc_distributions(
    graph=graph,
    subgraph_a=ancestral_closure_subgraph,
    relevant_sccs=relevant_sccs,
    ancestral_closure=ancestral_closure,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)
```

And then finally update the recursive call to idcd() to pass background_interventions through:

```python
# Current
return idcd(
    graph=graph,
    outcomes=outcomes,
    district=consolidated_district,
    distribution=district_distribution,
    _recursion_level=_recursion_level + 1,
)

# Proposed fix
return idcd(
    graph=graph,
    outcomes=outcomes,
    district=consolidated_district,
    distribution=district_distribution,
    background_interventions=background_interventions,
    _recursion_level=_recursion_level + 1,
)
```

4. compute_scc_distributions() - use the currently ignored intervention_set and add background_interventions

This function accepts an intervention_set parameter but ignores it entirely: it is never passed to identify_district_variables_cyclic(). We should add background_interventions to the signature and pass both sets into identify_district_variables_cyclic():

```python
# Current
def compute_scc_distributions(
    graph: NxMixedGraph,
    subgraph_a: NxMixedGraph,
    relevant_sccs: list[frozenset[Variable]],
    ancestral_closure: set[Variable],
    intervention_set: set[Variable],
) -> dict[frozenset[Variable], Expression]:
    ...  # body omitted

# Proposed fix
def compute_scc_distributions(
    graph: NxMixedGraph,
    subgraph_a: NxMixedGraph,
    relevant_sccs: list[frozenset[Variable]],
    ancestral_closure: set[Variable],
    intervention_set: set[Variable],
    background_interventions: set[Variable] | None = None,
) -> dict[frozenset[Variable], Expression]:
    ...  # body omitted
```

Update the call to identify_district_variables_cyclic() inside the loop to actually pass intervention_set and background_interventions:

```python
# Current
result = identify_district_variables_cyclic(
    input_variables=scc,
    input_district=frozenset(consolidated_district),
    district_probability=initial_distribution,
    graph=graph,
    topo=full_apt_order,
)

# Proposed fix
result = identify_district_variables_cyclic(
    input_variables=scc,
    input_district=frozenset(consolidated_district),
    district_probability=initial_distribution,
    graph=graph,
    topo=full_apt_order,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)
```

5. identify_district_variables_cyclic() - add signature update and use the interventional context

This function corresponds to Line 23 of the cyclic ID algorithm, so the $do(J)$ context must now be added to its parameters:

```python
# Current
def identify_district_variables_cyclic(
    *,
    input_variables: frozenset[Variable],
    input_district: frozenset[Variable],
    district_probability: Expression,
    graph: NxMixedGraph,
    topo: list[Variable],
) -> Expression | None:
    ...  # body omitted

# Proposed fix
def identify_district_variables_cyclic(
    *,
    input_variables: frozenset[Variable],
    input_district: frozenset[Variable],
    district_probability: Expression,
    graph: NxMixedGraph,
    topo: list[Variable],
    intervention_set: set[Variable] | None = None,
    background_interventions: set[Variable] | None = None,
) -> Expression | None:
    ...  # body omitted
```

We also need to pass background_interventions through the recursive call inside the function:

```python
# Current
return identify_district_variables_cyclic(
    input_variables=input_variables,
    input_district=targeted_ancestral_set_subgraph_district,
    district_probability=targeted_district_probability,
    graph=graph,
    topo=topo,
)

# Proposed fix
return identify_district_variables_cyclic(
    input_variables=input_variables,
    input_district=targeted_ancestral_set_subgraph_district,
    district_probability=targeted_district_probability,
    graph=graph,
    topo=topo,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)
```
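Step 5 adds intervention_set and background_interventions to the signature but does not yet show how they enter the Line 23 term. As a hypothetical illustration only, the helper below renders $R_A[S] \leftarrow P(S \mid Pred^G_<(S) \cap A, do(J \cup V\setminus A))$ as a plain string from already-computed sets; line_23_term is not an existing function, and in y0 the result would be an Expression rather than a str.

```python
from __future__ import annotations


def line_23_term(scc: frozenset[str], conditioning: set[str], intervention_set: set[str]) -> str:
    r"""Render R_A[S] = P(S | Pred_<(S) ∩ A, do(J ∪ V \ A)) as a string (hypothetical helper)."""

    def fmt(variables: set[str] | frozenset[str]) -> str:
        return ", ".join(sorted(variables))  # sorted for a deterministic rendering

    given: list[str] = []
    if conditioning:
        given.append(fmt(conditioning))
    if intervention_set:
        given.append(f"do({fmt(intervention_set)})")
    suffix = f" | {', '.join(given)}" if given else ""
    return f"P({fmt(scc)}{suffix})"


term = line_23_term(frozenset({"X", "Y"}), {"Z"}, {"W", "J"})
```

With S = {X, Y}, conditioning set {Z}, and intervention set {J, W}, term is "P(X, Y | Z, do(J, W))". If background_interventions were dropped, J would be missing from the do(...) part.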
