Hc/fix impossible schemes by HenryCrosswell · Pull Request #43 · neuroinformatics-unit/OSCaR

HenryCrosswell · 2026-06-08T14:18:54Z

Before submitting a pull request (PR), please read the contributing guide.

Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?
Because the PyRAT database is populated manually, offspring genotypes may have been entered incorrectly.

What does this PR do?
This adds a function to standardise.py that checks whether each offspring's genotype is a possible outcome of its two parents' genotypes, based on Mendelian inheritance ratios.
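A minimal sketch of the single-locus check, assuming genotypes are recorded as wt, het or hom, and that multi-gene genotypes are underscore-joined per locus (the "_".join aggregation in standardise.py suggests this, but it is an assumption). Instead of reading ratios from BreedingScheme, whose interface isn't shown here, the sketch derives the possible offspring genotypes by combining the alleles each parent can pass on. The names possible_offspring and is_possible are hypothetical.

```python
from itertools import product

# Alleles each parental genotype can pass on, assuming one biallelic locus:
# "+" is the wild-type allele and "-" the modified allele.
ALLELES = {"wt": ("+",), "het": ("+", "-"), "hom": ("-",)}

# Number of modified alleles in the offspring -> genotype label
LABEL_BY_DOSE = {0: "wt", 1: "het", 2: "hom"}


def possible_offspring(father: str, mother: str) -> set:
    """Return every offspring genotype a single-locus cross can produce."""
    outcomes = set()
    for paternal, maternal in product(ALLELES[father], ALLELES[mother]):
        dose = int(paternal == "-") + int(maternal == "-")
        outcomes.add(LABEL_BY_DOSE[dose])
    return outcomes


def is_possible(father: str, mother: str, offspring: str) -> bool:
    """Check each underscore-separated locus independently (assumed format)."""
    loci = [genotype.split("_") for genotype in (father, mother, offspring)]
    if len({len(parts) for parts in loci}) != 1:
        raise ValueError("genotypes have different numbers of loci")
    return all(
        child in possible_offspring(f, m) for f, m, child in zip(*loci)
    )
```

For example, possible_offspring("hom", "hom") gives {"hom"}, so is_possible("hom", "hom", "wt") is False. Mendelian ratios also describe how often each outcome occurs, but only the set of outcomes with non-zero probability matters for flagging impossible schemes.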

References

closes issue #10

How has this PR been tested?

Added a new test to test_standardise, along with new test data: a raw file containing impossible breeding schemes, and a standardised file used to assert that those schemes have been removed.

Is this a breaking change?

No

Does this PR require an update to the documentation?

Not yet.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

K-Meech

Thanks @HenryCrosswell - this is looking great! Your additions are filtering the impossible schemes really nicely, which will stop any incorrect data making it to the later processing stages.

I've put a few comments below - most are minor wording suggestions, but the main one is we need to make sure un-genotyped individuals can pass through the standardise process without erroring.

K-Meech · 2026-06-12T10:23:21Z

        pooch_data_path("standardised-data-forbidden-genotypes.csv")
    )

+    with pytest.raises(TypeError):


Wrapping the test in with pytest.raises(TypeError) lets it pass even though an error is thrown during standardise_pyrat_csv, so pd.testing.assert_frame_equal is never reached. This means the test no longer checks that genotypes are standardised correctly.

You'll need to remove the with pytest.raises(TypeError) and instead add a fix in _remove_impossible_breeding_schemes. Otherwise, at the moment, any un-genotyped individuals that enter standardise_pyrat_csv will cause the processing to stop early (we need these un-genotyped individuals later, in the historical stats part; see issue #8).
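To illustrate why the assertion is skipped, here is a minimal, self-contained sketch. The join_genotypes helper is hypothetical, standing in for any step that raises on missing data: once the TypeError fires inside the with block, every later line in that block, including the assertion, is skipped, so the test passes without checking anything.

```python
import pytest


def join_genotypes(values):
    # Hypothetical stand-in for a processing step: str.join raises TypeError
    # if any item is not a string (e.g. a missing value such as None or NaN).
    return "_".join(values)


reached_assertion = []


def test_masked_by_raises():
    with pytest.raises(TypeError):
        result = join_genotypes(["wt", None])  # TypeError is raised here...
        reached_assertion.append(True)  # ...so this line never runs...
        assert result == "deliberately wrong"  # ...and neither does this check
```

Calling test_masked_by_raises() passes, yet reached_assertion stays empty, which shows that the wrong expected value was never compared.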

K-Meech · 2026-06-12T13:59:51Z

+    genotype_father = standardised_df_row["genotype_father"]
+    genotype_mother = standardised_df_row["genotype_mother"]
+    genotype_offspring = standardised_df_row["genotype_offspring"]
+


To keep rows with un-genotyped offspring, you could add something like this here:

    # If the offspring are un-genotyped, we keep the row as there is no way
    # of checking if the breeding scheme was valid
    if pd.isna(genotype_offspring):
        return False

Hopefully this should make the test_standardise_genotypes test pass. However, I see there is one remaining impossible breeding scheme in the test files that I missed: in both pyrat-data-forbidden-genotypes.csv and standardised-data-forbidden-genotypes.csv, you can change genotype_mother for ID-011 to wt_het on your GIN branch.

K-Meech · 2026-06-12T14:06:54Z

    ].agg("_".join, axis=1)
+
+
+def _remove_impossible_breeding_schemes(


As this function doesn't directly remove the impossible breeding schemes, but rather flags them with True / False, I'd maybe rename it? Something like:

Suggested change

def _remove_impossible_breeding_schemes(

def _is_impossible_breeding_scheme(

K-Meech · 2026-06-12T14:07:39Z

+    Parameters
+    ----------
+    standardised_df_row : pd.Series
+        rom from standardised_dataframe (pd.DataFrame): standardised PyRAT df


Suggested change

rom from standardised_dataframe (pd.DataFrame): standardised PyRAT df

row from standardised_dataframe (pd.DataFrame): standardised PyRAT df

K-Meech · 2026-06-12T14:10:59Z

+) -> bool:
+    """Retrieves parent genotypes and pulls the mendalian ratios from
+    BreedingScheme.
+    Compares offspring to these ratios, removing those which are not possible.
+    e.g. hom x hom parents cannot produce wt x wt offspring.


Maybe modify this so it's clear that this particular function isn't removing them?

Suggested change

) -> bool:
    """Retrieves parent genotypes and pulls the mendalian ratios from
    BreedingScheme.
    Compares offspring to these ratios, removing those which are not possible.
    e.g. hom x hom parents cannot produce wt x wt offspring.

) -> bool:
    """
    Checks whether the given row contains an impossible breeding scheme.
    Retrieves parent genotypes and pulls the Mendelian ratios from
    BreedingScheme. Compares offspring to these ratios, returning True for
    those which are not possible.
    e.g. hom x hom parents cannot produce wt offspring.

K-Meech · 2026-06-12T14:14:37Z

+    to_ignore = standard_csv.apply(_remove_impossible_breeding_schemes, axis=1)
+    standard_csv = standard_csv[~to_ignore]


Mostly personal preference, but maybe change the variable name here to make it super clear what is being ignored.

Suggested change

    to_ignore = standard_csv.apply(_remove_impossible_breeding_schemes, axis=1)
    standard_csv = standard_csv[~to_ignore]

    impossible_breeding_schemes = standard_csv.apply(_is_impossible_breeding_scheme, axis=1)
    standard_csv = standard_csv[~impossible_breeding_schemes]
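A small, self-contained sketch of the apply / ~mask pattern on made-up data. The rule inside _is_impossible_breeding_scheme here is deliberately minimal and hypothetical: it only knows that hom x hom parents must produce hom offspring, and it keeps un-genotyped rows.

```python
import pandas as pd

# Toy data (made up for illustration); ID-003 is un-genotyped.
standard_csv = pd.DataFrame(
    {
        "genotype_father": ["hom", "het", "wt"],
        "genotype_mother": ["hom", "het", "wt"],
        "genotype_offspring": ["wt", "hom", None],
    },
    index=["ID-001", "ID-002", "ID-003"],
)


def _is_impossible_breeding_scheme(row: pd.Series) -> bool:
    # Hypothetical, minimal rule: only checks hom x hom crosses.
    if pd.isna(row["genotype_offspring"]):
        return False  # un-genotyped rows are kept
    both_hom = row["genotype_father"] == "hom" and row["genotype_mother"] == "hom"
    return both_hom and row["genotype_offspring"] != "hom"


# apply(..., axis=1) calls the function once per row and returns a boolean Series
impossible_breeding_schemes = standard_csv.apply(_is_impossible_breeding_scheme, axis=1)
# ~ inverts the mask, so only rows that are not impossible are kept
standard_csv = standard_csv[~impossible_breeding_schemes]
```

Only ID-001 (hom x hom parents with wt offspring) is dropped. The function should return a real bool for every row: if it ever returned None, the mask would no longer be a clean boolean Series and ~ could fail or misbehave.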

Henry Crosswell and others added 5 commits June 3, 2026 14:53

Added a function for removing impossible breeding schemes

e37fba9

Updating branch to match main

4426557

added typeerror catch within class to fix standardised genotypes test

a856977

added pooch hashes for forbidden schemes

71ff347

added test function for row deletion of impossible scheme

2cfe8c8

HenryCrosswell requested a review from a team June 8, 2026 14:18

HenryCrosswell self-assigned this Jun 8, 2026

updated pooch path to remove master

ddef457

K-Meech requested changes Jun 12, 2026

HenryCrosswell wants to merge 6 commits into main from hc/fix_impossible_schemes
