Hc/fix impossible schemes #43
base: main
Changes from all commits
e37fba9
4426557
a856977
71ff347
2cfe8c8
ddef457
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,7 +3,7 @@ | |||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| import pandas as pd | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| from oscar.breeding_scheme import Genotype | ||||||||||||||||||||||||||||
| from oscar.breeding_scheme import BreedingScheme, Genotype | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| class Identifier(Enum): | ||||||||||||||||||||||||||||
|
|
@@ -94,6 +94,9 @@ def standardise_pyrat_csv( | |||||||||||||||||||||||||||
| id_offspring_col = standard_csv.pop("ID_offspring") | ||||||||||||||||||||||||||||
| standard_csv.insert(0, "ID_offspring", id_offspring_col) | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| to_ignore = standard_csv.apply(_remove_impossible_breeding_schemes, axis=1) | ||||||||||||||||||||||||||||
| standard_csv = standard_csv[~to_ignore] | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| return standard_csv | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
@@ -384,3 +387,41 @@ def _make_combined_genotype_column_for_identifier( | |||||||||||||||||||||||||||
| line_data.loc[genotyped_rows, new_col_name] = pivoted_mutations.loc[ | ||||||||||||||||||||||||||||
| genotyped_rows, unique_mutations | ||||||||||||||||||||||||||||
| ].agg("_".join, axis=1) | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| def _remove_impossible_breeding_schemes( | ||||||||||||||||||||||||||||
|
Collaborator
This function doesn't directly remove the impossible breeding schemes; it identifies them with True / False. I'd maybe rename it? Something like:
Suggested change
|
||||||||||||||||||||||||||||
| standardised_df_row: pd.Series, | ||||||||||||||||||||||||||||
| ) -> bool: | ||||||||||||||||||||||||||||
| """Retrieves parent genotypes and pulls the Mendelian ratios from | ||||||||||||||||||||||||||||
| BreedingScheme. | ||||||||||||||||||||||||||||
| Compares offspring to these ratios, removing those which are not possible. | ||||||||||||||||||||||||||||
| e.g. hom x hom parents cannot produce wt x wt offspring. | ||||||||||||||||||||||||||||
|
Comment on lines
+394
to
+398
Collaborator
Maybe modify this so it's clear that this particular function isn't removing them?
Suggested change
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Parameters | ||||||||||||||||||||||||||||
| ---------- | ||||||||||||||||||||||||||||
| standardised_df_row : pd.Series | ||||||||||||||||||||||||||||
| row from standardised_dataframe (pd.DataFrame): standardised PyRAT df | ||||||||||||||||||||||||||||
|
Collaborator
Suggested change
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Returns | ||||||||||||||||||||||||||||
| ------- | ||||||||||||||||||||||||||||
| bool | ||||||||||||||||||||||||||||
| bool of whether or not that row contains an impossible breeding scheme | ||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| pop = False | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| genotype_father = standardised_df_row["genotype_father"] | ||||||||||||||||||||||||||||
| genotype_mother = standardised_df_row["genotype_mother"] | ||||||||||||||||||||||||||||
| genotype_offspring = standardised_df_row["genotype_offspring"] | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
Collaborator
To keep rows with un-genotyped offspring, you could add something like this here: Hopefully this should make the |
||||||||||||||||||||||||||||
| typed_offspring = Genotype.from_string(genotype_offspring) | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| scheme = BreedingScheme(genotype_father, genotype_mother) | ||||||||||||||||||||||||||||
| ratio = scheme.mendelian_ratio() | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| if typed_offspring not in ratio: | ||||||||||||||||||||||||||||
| pop = True | ||||||||||||||||||||||||||||
| elif ratio[typed_offspring] == 0: | ||||||||||||||||||||||||||||
| pop = True | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| return pop | ||||||||||||||||||||||||||||
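The review comments above ask for two things: a name that says the helper identifies rather than removes rows, and a way to keep un-genotyped offspring. A minimal standalone sketch of both, not the PR's actual code: `mendelian_ratio` is a hypothetical single-locus stand-in for `BreedingScheme.mendelian_ratio()`, the name `is_impossible_breeding_scheme` is only an illustration, and genotypes are assumed to be the plain strings "wt", "het" and "hom".

```python
import math
from collections import Counter
from itertools import product

# Hypothetical stand-in for oscar's BreedingScheme: single-locus genotypes
# only, each written as "wt", "het" or "hom".
ALLELES = {"wt": ("+", "+"), "het": ("+", "-"), "hom": ("-", "-")}
NAMES = {("+", "+"): "wt", ("+", "-"): "het", ("-", "-"): "hom"}


def mendelian_ratio(father: str, mother: str) -> dict:
    """Expected offspring genotype fractions for a single-locus cross."""
    # Each parent passes on one of its two alleles with equal probability.
    counts = Counter(
        NAMES[tuple(sorted((a, b)))]
        for a, b in product(ALLELES[father], ALLELES[mother])
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def is_impossible_breeding_scheme(father, mother, offspring) -> bool:
    """Return True if the offspring genotype cannot arise from the parents."""
    # Un-genotyped offspring (None or NaN) cannot be checked, so keep them.
    if offspring is None or (
        isinstance(offspring, float) and math.isnan(offspring)
    ):
        return False
    ratio = mendelian_ratio(father, mother)
    # A missing key and a zero fraction both mean "cannot occur".
    return ratio.get(offspring, 0) == 0
```

Returning the comparison directly replaces the `pop` flag and the two-branch `if`/`elif`. Whether the real `mendelian_ratio()` result supports `.get` is an assumption here.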
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -62,6 +62,27 @@ def test_standardise_genotypes(): | |
| pooch_data_path("standardised-data-forbidden-genotypes.csv") | ||
| ) | ||
|
|
||
| with pytest.raises(TypeError): | ||
|
Collaborator
By adding You'll need to remove the |
||
| standard_csv = standardise_pyrat_csv(pyrat_csv) | ||
| pd.testing.assert_frame_equal( | ||
| standard_csv.reset_index(drop=True), | ||
| expected_csv.reset_index(drop=True), | ||
| ) | ||
|
|
||
|
|
||
| def test_remove_impossible_breeding_schemes(): | ||
| """ | ||
| Test that impossible breeding schemes are removed from raw data. | ||
| (e.g. hom x hom parents cannot make wt offspring) | ||
| """ | ||
|
|
||
| pyrat_csv = pd.read_csv( | ||
| pooch_data_path("pyrat-data-forbidden-schemes.csv") | ||
| ) | ||
| expected_csv = pd.read_csv( | ||
| pooch_data_path("standardised-data-forbidden-schemes.csv") | ||
| ) | ||
|
|
||
| standard_csv = standardise_pyrat_csv(pyrat_csv) | ||
| pd.testing.assert_frame_equal( | ||
| standard_csv.reset_index(drop=True), | ||
|
|
||
Mostly personal preference, but maybe change the variable name here to make it super clear what is being ignored.
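As a rough illustration of the naming point, presumably aimed at the `to_ignore` mask in `standardise_pyrat_csv`, the sketch below filters a toy frame with a boolean mask whose name states what `True` means. The predicate `is_impossible`, the mask name `has_impossible_scheme` and the sample data are all hypothetical; only the column names come from the diff.

```python
import pandas as pd


def is_impossible(row: pd.Series) -> bool:
    """Toy predicate (hypothetical): hom x hom parents cannot give wt offspring."""
    return (
        row["genotype_father"] == "hom"
        and row["genotype_mother"] == "hom"
        and row["genotype_offspring"] == "wt"
    )


# Made-up sample data using the column names seen in the diff.
standard_csv = pd.DataFrame(
    {
        "ID_offspring": [1, 2, 3],
        "genotype_father": ["hom", "hom", "het"],
        "genotype_mother": ["hom", "hom", "het"],
        "genotype_offspring": ["hom", "wt", "wt"],
    }
)

# One boolean per row; the variable name says what True means.
has_impossible_scheme = standard_csv.apply(is_impossible, axis=1)

# Keep only the rows whose breeding scheme is possible.
filtered_csv = standard_csv[~has_impossible_scheme]
```

With a name like this, the filtering line reads as "keep rows that do not have an impossible scheme", which is the intent the reviewer wanted to make explicit.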