Motivation:
This is motivated by the need to identify causal effects in E. coli gene regulatory networks where wild-type data alone is insufficient, and single perturbation experimental data is required as background. Specifically we want to support two cases:
- Single perturbation: causal queries identifiable from wild-type data $P(V)$
- Double perturbation: causal queries not identifiable from wild-type data, but identifiable from single perturbation data $P_{do(J)}(V)$
Issue Description
This issue documents the requirement to fully support identification of causal effects $P(Y \mid do(W))$ in cyclic graphs using data collected under a prior interventional regime $P_{do(J)}(V)$, rather than purely observational data $P(V)$.
What exists now
The cyclic ID algorithm in cyclic_id.py on the feature branch fix-line-23-cyclic-ID currently accepts a base_distribution parameter representing $P_{do(J)}(V)$. At the top level it:
- Extracts the intervention $do(J)$ using _help_level_2_distribution()
- Performs graph surgery by removing incoming edges to $J$ from the graph
- Passes the intervened distribution down to initialize_district_distribution() and initialize_component_distribution()
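The graph surgery step can be illustrated with a minimal, library-free sketch. It assumes a mixed graph stored as plain Python sets of directed and bidirected edges; the helper name remove_incoming_edges and the choice to also drop bidirected edges touching $J$ are illustrative assumptions, not the code in cyclic_id.py.

```python
Edge = tuple[str, str]


def remove_incoming_edges(
    directed: set[Edge],
    bidirected: set[frozenset[str]],
    targets: set[str],
) -> tuple[set[Edge], set[frozenset[str]]]:
    """Return the edge sets of the graph mutilated by do(targets).

    Hypothetical helper for illustration only.
    """
    # Drop every directed edge u -> v whose head v is intervened on.
    kept_directed = {(u, v) for (u, v) in directed if v not in targets}
    # Assumption: an intervention also severs latent confounding into the
    # intervened node, so bidirected edges touching a target are dropped.
    kept_bidirected = {pair for pair in bidirected if not (pair & targets)}
    return kept_directed, kept_bidirected


# Toy cyclic graph: X -> Y, Y -> Z, Z -> Y, with X <-> Z confounded.
directed = {("X", "Y"), ("Y", "Z"), ("Z", "Y")}
bidirected = {frozenset({"X", "Z"})}

# Intervene on Z: the edge Y -> Z and the confounding arc X <-> Z disappear.
new_directed, new_bidirected = remove_incoming_edges(directed, bidirected, {"Z"})
```

Outgoing edges of the intervened node (here Z -> Y) are kept, which is what distinguishes edge surgery from deleting the node outright.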
Where the gap is:
The internal IDCD recursion does not fully account for $do(J)$. More specifically, Line 23 of the IDCD algorithm requires:
$R_A[S] \leftarrow P(S \mid Pred^G_<(S) \cap A, do(J \cup V \setminus A))$
In other words, the full interventional context $do(J \cup V \setminus A)$ must be passed through the recursive identification calls. Right now, the implementation performs graph surgery at the top level using _help_level_2_distribution() but does not propagate $J$ through the recursive calls made from idcd().
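A small sketch of the set arithmetic behind Line 23 may help. It uses plain string sets in place of the real variable type, and the function name line_23_intervention_set is hypothetical; the only point is to show what is lost when $J$ is not unioned in.

```python
from __future__ import annotations


def line_23_intervention_set(
    nodes: set[str],
    ancestral_closure: set[str],
    background_interventions: set[str] | None = None,
) -> set[str]:
    """Compute the do(...) set J | (V - A) used on Line 23 (illustrative)."""
    background = background_interventions or set()
    return (nodes - ancestral_closure) | background


# Assumed setup: J1 is a background intervention that is no longer among
# the nodes seen by the recursion, so V - A alone cannot recover it.
nodes = {"W", "X", "Y"}
ancestral_closure = {"X", "Y"}

without_background = line_23_intervention_set(nodes, ancestral_closure)
with_background = line_23_intervention_set(nodes, ancestral_closure, {"J1"})
```

Here without_background contains only W, while with_background also carries J1, matching $do(J \cup V \setminus A)$.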
Code vs Paper Mapping Table
| Paper Concept | Notation | Code Location | File | Status |
| --- | --- | --- | --- | --- |
| Background intervention | $do(J)$ | base_distribution parameter in cyclic_id() | cyclic_id.py | Exists |
| Graph surgery on $J$ | Remove incoming edges to $J$ | graph.remove_nodes_from(intervention_j) | cyclic_id.py | Exists (top level only) |
| Line 23: Full interventional context | $R_A[S] \leftarrow P(S \mid Pred(S) \cap A, do(J \cup V \setminus A))$ | compute_scc_distributions() | cyclic_id.py | GAP: $J$ is not propagated through the recursion for this line |
Next steps:
- Propagate $J$ through the recursion: Update the signatures of idcd(), identify_through_scc_decomposition(), and compute_scc_distributions() so that $do(J)$ is carried through the recursion. Right now, $J$ is extracted at the top level in cyclic_id() and passed via base_distribution to initialize_district_distribution() and initialize_component_distribution(), but it is not passed explicitly through idcd(), identify_through_scc_decomposition(), and compute_scc_distributions().
- Adjust Line 23: In identify_through_scc_decomposition(), update the intervention_set calculation to union the local context $(V \setminus A)$ with the background intervention $J$, which maps to the paper's $do(J \cup V \setminus A)$:
    # Current (incomplete)
    intervention_set = nodes - ancestral_closure
    # Suggested fix
    intervention_set = (nodes - ancestral_closure) | background_interventions  # was intervention_j
- Testing and validation: Add a test case using a confounded cyclic graph where identifiability is only achieved with background interventional data, and verify that
    cyclic_id(graph, outcomes={Y}, interventions={W}, base_distribution=P[J](V))
  returns the same result as running on the manually mutilated graph:
    cyclic_id(graph_mutilated_by_J, outcomes={Y}, interventions={W})
Testing Strategy
To validate the fix, add automated pytest test cases to the existing cyclic_id test files.
The tests should cover the following scenarios:
- Unidentifiable without background context, identifiable with it: This is the core case. A query raises Unidentifiable from observational data alone but returns a valid estimand when base_distribution=P[do(J)](V) is provided.
- Identifiable queries stay identifiable: Adding a base_distribution must not break queries that were already identifiable. This serves as a regression test.
- Equivalence with manual graph surgery: cyclic_id with base_distribution=P[do(J)](V) returns the same result as manually removing incoming edges to J and running the cyclic ID algorithm on the mutilated graph.
Proposed Fix
The fix requires changes to several functions in cyclic_id.py. The goal is to propagate background_interventions (representing $J$ from the paper) through the full recursion chain:
cyclic_id() → idcd() → identify_through_scc_decomposition() → compute_scc_distributions() → identify_district_variables_cyclic()
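The propagation pattern itself can be shown with a toy recursion. Everything here is a stand-in: toy_idcd, its "drop the last node" rule, and the _trace parameter are hypothetical and only mimic the shape of the real call chain, so the sketch shows how a keyword-only argument survives every level, not how identification works.

```python
from __future__ import annotations


def toy_idcd(
    nodes: frozenset[str],
    *,
    background_interventions: set[str] | None = None,
    _recursion_level: int = 0,
    _trace: list[set[str]] | None = None,
) -> list[set[str]]:
    """Toy recursion that records the do(...) set used at every level."""
    trace = [] if _trace is None else _trace
    background = background_interventions or set()
    if len(nodes) <= 1:
        return trace
    # Stand-in for the ancestral closure: drop the alphabetically last node.
    ancestral_closure = nodes - {max(nodes)}
    # Local context (V - A) unioned with the background intervention J.
    trace.append(set(nodes - ancestral_closure) | background)
    return toy_idcd(
        ancestral_closure,
        # Forgetting this keyword is exactly the gap described above.
        background_interventions=background_interventions,
        _recursion_level=_recursion_level + 1,
        _trace=trace,
    )


trace = toy_idcd(frozenset({"A", "B", "C"}), background_interventions={"J"})
```

Every recorded set contains J. If the keyword were omitted from the recursive call, only the first level would see it.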
1. cyclic_id() - pass background_interventions into idcd()
intervention_j is already extracted, but it needs to be passed into the recursive helper idcd():
# Current
district_distributions[district_c] = idcd(
    graph=graph,
    outcomes=set(district_c),
    district=consolidated_district_of_c,
    distribution=initial_distribution,
)
# Proposed fix
district_distributions[district_c] = idcd(
    graph=graph,
    outcomes=set(district_c),
    district=consolidated_district_of_c,
    distribution=initial_distribution,
    background_interventions=intervention_j,
)
2. idcd() - add the background_interventions parameter to the signature and pass it down
Add background_interventions as a keyword-only parameter that defaults to None, so it can be passed through to identify_through_scc_decomposition():
# Current
def idcd(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    district: set[Variable],
    *,
    distribution: Expression | None = None,
    _recursion_level: int = 0,
) -> Expression:
# Proposed fix
def idcd(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    district: set[Variable],
    *,
    distribution: Expression | None = None,
    background_interventions: set[Variable] | None = None,
    _recursion_level: int = 0,
) -> Expression:
And then update the call to identify_through_scc_decomposition() at the bottom of idcd():
# Current
return identify_through_scc_decomposition(
    graph, outcomes, ancestral_closure, _recursion_level=_recursion_level
)
# Proposed fix
return identify_through_scc_decomposition(
    graph,
    outcomes,
    ancestral_closure,
    background_interventions=background_interventions,
    _recursion_level=_recursion_level,
)
3. identify_through_scc_decomposition() - update the signature and fix the intervention_set calculation
This is the core fix:
# Current
def identify_through_scc_decomposition(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    ancestral_closure: set[Variable],
    *,
    _recursion_level: int = 0,
) -> Expression:
# Proposed fix
def identify_through_scc_decomposition(
    graph: NxMixedGraph,
    outcomes: set[Variable],
    ancestral_closure: set[Variable],
    *,
    background_interventions: set[Variable] | None = None,
    _recursion_level: int = 0,
) -> Expression:
Update the intervention_set calculation to union with background_interventions:
# Current
intervention_set: Annotated[set[Variable], InPaperAs("J")] = nodes - ancestral_closure
# Proposed fix
background_interventions = background_interventions or set()
intervention_set: Annotated[set[Variable], InPaperAs("J")] = (nodes - ancestral_closure) | background_interventions
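The None default plus a normalization line is a common Python idiom for optional set arguments. The sketch below uses two hypothetical toy functions to show the pitfall it avoids; it is not taken from cyclic_id.py.

```python
from __future__ import annotations


def shared_default(extra: str, seen: set[str] = set()) -> set[str]:
    # Pitfall: the default set is created once and shared across calls.
    seen.add(extra)
    return seen


def fresh_default(extra: str, seen: set[str] | None = None) -> set[str]:
    # A None sentinel yields a new set on every call that omits `seen`.
    seen = set() if seen is None else seen
    seen.add(extra)
    return seen


# Copy the results so later calls cannot change what we recorded.
first_shared = set(shared_default("J1"))
second_shared = set(shared_default("J2"))  # still remembers "J1"
first_fresh = set(fresh_default("J1"))
second_fresh = set(fresh_default("J2"))
```

Writing `x or set()` instead of an explicit `is None` check also replaces an explicitly passed empty set with a new one, which is harmless when the set is only read, as in the union above.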
Pass both the intervention_set and background_interventions into compute_scc_distributions():
# Current
scc_distributions = compute_scc_distributions(
    graph=graph,
    subgraph_a=ancestral_closure_subgraph,
    relevant_sccs=relevant_sccs,
    ancestral_closure=ancestral_closure,
    intervention_set=intervention_set,
)
# Proposed fix
scc_distributions = compute_scc_distributions(
    graph=graph,
    subgraph_a=ancestral_closure_subgraph,
    relevant_sccs=relevant_sccs,
    ancestral_closure=ancestral_closure,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)
And then finally update the recursive call to idcd() to pass background_interventions through:
# Current
return idcd(
    graph=graph,
    outcomes=outcomes,
    district=consolidated_district,
    distribution=district_distribution,
    _recursion_level=_recursion_level + 1,
)
# Proposed fix
return idcd(
    graph=graph,
    outcomes=outcomes,
    district=consolidated_district,
    distribution=district_distribution,
    background_interventions=background_interventions,
    _recursion_level=_recursion_level + 1,
)
4. compute_scc_distributions() - use the currently unused intervention_set and add background_interventions
This function has an intervention_set parameter but never uses it; in particular, it is never passed to identify_district_variables_cyclic(). We should add background_interventions to the signature and pass both into identify_district_variables_cyclic():
# Current
def compute_scc_distributions(
    graph: NxMixedGraph,
    subgraph_a: NxMixedGraph,
    relevant_sccs: list[frozenset[Variable]],
    ancestral_closure: set[Variable],
    intervention_set: set[Variable],
) -> dict[frozenset[Variable], Expression]:
# Proposed fix
def compute_scc_distributions(
    graph: NxMixedGraph,
    subgraph_a: NxMixedGraph,
    relevant_sccs: list[frozenset[Variable]],
    ancestral_closure: set[Variable],
    intervention_set: set[Variable],
    background_interventions: set[Variable] | None = None,
) -> dict[frozenset[Variable], Expression]:
Update the call to identify_district_variables_cyclic() inside the loop to actually pass intervention_set and background_interventions:
# Current
result = identify_district_variables_cyclic(
    input_variables=scc,
    input_district=frozenset(consolidated_district),
    district_probability=initial_distribution,
    graph=graph,
    topo=full_apt_order,
)
# Proposed fix
result = identify_district_variables_cyclic(
    input_variables=scc,
    input_district=frozenset(consolidated_district),
    district_probability=initial_distribution,
    graph=graph,
    topo=full_apt_order,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)
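An accepted-but-ignored parameter like intervention_set is easy to reintroduce, so a lightweight guard may be worth having. The following sketch uses the standard-library ast module to list parameters that are never read inside a function body; unused_parameters and the toy source string are hypothetical and not part of the existing test suite.

```python
import ast
import textwrap


def unused_parameters(source: str) -> dict[str, set[str]]:
    """Map each function in `source` to the parameters its body never reads."""
    tree = ast.parse(textwrap.dedent(source))
    report: dict[str, set[str]] = {}
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        # Collect every name that is loaded anywhere inside the function body.
        used = {
            n.id
            for stmt in node.body
            for n in ast.walk(stmt)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
        }
        unused = params - used
        if unused:
            report[node.name] = unused
    return report


# Toy stand-ins: the first function ignores intervention_set, the second uses it.
TOY_SOURCE = '''
def compute(relevant_sccs, intervention_set):
    return [sorted(scc) for scc in relevant_sccs]

def compute_fixed(relevant_sccs, intervention_set):
    return [(sorted(scc), sorted(intervention_set)) for scc in relevant_sccs]
'''

report = unused_parameters(TOY_SOURCE)
```

In a real test, the source string could come from inspect.getsource() applied to the function under test; that wiring is left out here because it depends on the module layout.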
5. identify_district_variables_cyclic() - update the signature and use the interventional context
This function now has to accept the $do(J)$ context needed for Line 23 of the cyclic ID algorithm:
# Current
def identify_district_variables_cyclic(
    *,
    input_variables: frozenset[Variable],
    input_district: frozenset[Variable],
    district_probability: Expression,
    graph: NxMixedGraph,
    topo: list[Variable],
) -> Expression | None:
# Proposed fix
def identify_district_variables_cyclic(
    *,
    input_variables: frozenset[Variable],
    input_district: frozenset[Variable],
    district_probability: Expression,
    graph: NxMixedGraph,
    topo: list[Variable],
    intervention_set: set[Variable] | None = None,
    background_interventions: set[Variable] | None = None,
) -> Expression | None:
We also need to pass background_interventions through the recursive call inside the function:
# Current
return identify_district_variables_cyclic(
    input_variables=input_variables,
    input_district=targeted_ancestral_set_subgraph_district,
    district_probability=targeted_district_probability,
    graph=graph,
    topo=topo,
)
# Proposed fix
return identify_district_variables_cyclic(
    input_variables=input_variables,
    input_district=targeted_ancestral_set_subgraph_district,
    district_probability=targeted_district_probability,
    graph=graph,
    topo=topo,
    intervention_set=intervention_set,
    background_interventions=background_interventions,
)