Commit a2bad58

bump version from 0.2.13 -> 0.3.0
2 parents 4cfea73 + 801dab0

7 files changed

Lines changed: 242 additions & 61 deletions

README.md

Lines changed: 71 additions & 6 deletions
````diff
@@ -8,8 +8,8 @@
 
 ## TL;DR
 
-[ReadTheDocs](https://screenpro2.readthedocs.io) |
-[PyPI](https://pypi.org/project/ScreenPro2)
+[**ReadTheDocs**](https://screenpro2.readthedocs.io) |
+[**PyPI**](https://pypi.org/project/ScreenPro2)
 
 ScreenPro2 enables perform flexible analysis on high-content CRISPR screening datasets. It has functionalities to process data from diverse CRISPR screen platforms and is designed to be modular to enable easy extension to custom CRISPR screen platforms or other commonly used platforms in addition to the ones currently implemented.
 
@@ -39,6 +39,12 @@ pip install git+https://github.com/ArcInstitute/ScreenPro2.git
 ```
 
 ## Usage
+First, import the ScreenPro2 package:
+
+```python
+import screenpro as scp
+```
+
 Data analysis for CRISPR screens with NGS readouts can be broken down into three main steps:
 
 - [Step 1: FASTQ to counts](#step-1-fastq-to-counts)
@@ -47,11 +53,70 @@ Data analysis for CRISPR screens with NGS readouts can be broken down into three
 
 ### Step 1: FASTQ to counts
 
-Since version 0.2.7, ScreenPro2 has a built-in method to process FASTQ files and generate counts. This method is implemented in the `ngs` module
-and relvent submodules. A minor novelty here has enabled processing single, dual, or multiple sgRNA CRISPR screens. Also, this approach can retain
-recombination events which can occur in dual or higher order sgRNA CRISPR screens.
+ScreenPro2 has a built-in method to process FASTQ files and generate counts.
+This method is implemented in the `ngs` module and relvent submodules.
+A minor novelty here has enabled processing single, dual, or multiple sgRNA
+CRISPR screens. Also, this approach can retain recombination events which can
+occur in dual or higher order sgRNA CRISPR screens.
+
+Currently, `Counter` class from the `ngs` module can process FASTQ files and generate counts for standard
+CRISPR screens with [single](#dcas9-crisprai-single-sgrna-screens) or [dual](#crispri-dual-sgrna-screens)
+guide design.
+
+Here is a draft code to process FASTQ files and generate counts for an experiment with [CRISPRi-dual-sgRNA-screens](#crispri-dual-sgrna-screens):
+
+```python
+# Initialize the Counter object
+counter = scp.Counter(cas_type = 'cas9', library_type = 'single_guide_design')
+
+# Load the reference library
+counter.load_library("<path-to-CRISPR-library-table>", sep = '\t', verbose = True, index_col=None)
+
+# Define the samples
+samples = []
+## `samples` is a list of sample ids in the experiment.
+## Each sample id should match the sample name in the FASTQ files, i.e. <sample_id>.fastq.gz
+
+# Process the FASTQ files and generate counts
+counter.get_counts_matrix(
+fastq_dir = '<path-to-fastq-directory>',
+samples = samples,
+verbose = True
+)
+```
+
+Here is a draft code to process FASTQ files and generate counts for an experiment with [CRISPRi-dual-sgRNA-screens](#crispri-dual-sgrna-screens):
 
-There is no example code for this step yet, but a command line interface (CLI) will be available soon.
+
+```python
+# Initialize the Counter object
+counter = scp.Counter(cas_type = 'dCas9', library_type = 'dual_guide_design')
+
+# Load the reference library
+counter.load_library("<path-to-CRISPR-library-table>", sep = '\t', verbose = True, index_col=None)
+
+# Define the samples
+samples = []
+## `samples` is a list of sample ids in the experiment.
+## Each sample id should match the sample name in the FASTQ files, i.e. <sample_id>_R[1,2].fastq.gz
+
+# Process the FASTQ files and generate counts
+counter.get_counts_matrix(
+fastq_dir = '<path-to-fastq-directory>',
+samples = samples,
+verbose = True
+)
+```
+
+After this, you have `.counts_mat` calculated in the `Counter` object.
+
+___
+
+To proceed, you need to create an `AnnData` object from the counts matrix and metadata. You can use the following code to create an `AnnData` object:
+
+```python
+adata = counter.build_counts_anndata()
+```
 
 ### Step 2: Phenotype calculation
 
````
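The README examples leave `samples = []` for the reader to fill in and only state the naming rule: each sample id must match its FASTQ file name (`<sample_id>.fastq.gz` for single-guide, `<sample_id>_R[1,2].fastq.gz` for dual-guide). A minimal sketch that builds that list from a directory listing, assuming `_R[1,2]` means one `_R1` and one `_R2` file per sample; `sample_ids_from_names` and `discover_samples` are hypothetical helpers, not part of ScreenPro2.

```python
import os


def sample_ids_from_names(file_names, dual_guide=False):
    """Derive sample ids from FASTQ file names (hypothetical helper).

    Assumed naming, following the comments in the README examples:
      single guide: <sample_id>.fastq.gz
      dual guide:   <sample_id>_R1.fastq.gz plus <sample_id>_R2.fastq.gz
    """
    names = set(file_names)
    if not dual_guide:
        suffix = ".fastq.gz"
        return sorted(name[: -len(suffix)] for name in names if name.endswith(suffix))

    r1, r2 = "_R1.fastq.gz", "_R2.fastq.gz"
    # Keep a sample only when both read files are present.
    return sorted(
        name[: -len(r1)]
        for name in names
        if name.endswith(r1) and name[: -len(r1)] + r2 in names
    )


def discover_samples(fastq_dir, dual_guide=False):
    """List the sample ids found in `fastq_dir` (hypothetical helper)."""
    return sample_ids_from_names(os.listdir(fastq_dir), dual_guide=dual_guide)
```

Under those assumptions, `samples = discover_samples('<path-to-fastq-directory>', dual_guide=True)` could stand in for the empty list before `counter.get_counts_matrix(...)` is called. In single-guide mode the sketch treats every `*.fastq.gz` file as a sample, so stray paired files in the same directory would be picked up too.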

docs/source/history.rst

Lines changed: 4 additions & 3 deletions
```diff
@@ -2,14 +2,15 @@
 History
 =======
 
-0.3.0 (coming soon)
+0.4.0 (coming soon)
 ~~~~~~~~~~~~~~~~~~~
 * add command line interface
 
-0.2.11 (May 2024)
+0.2.11 - 0.3.0 (Apr 2024 - May 2024)
 ~~~~~~~~~~~~~~~~~
-* introduce `counter` module
+* introduce `Counter` class as wrapper for `ngs` module
 * improve core functionalities for CLI
+* major bug fixes
 
 0.2.7 - 0.2.10 (Mar 2024 - Apr 2024)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

screenpro/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,6 +6,6 @@
 from .ngs import Counter
 from .assays import PooledScreens, GImaps
 
-__version__ = "0.2.13"
+__version__ = "0.3.0"
 __author__ = "Abe Arab"
 __email__ = 'abea@arcinstitute.org' # "abarbiology@gmail.com"
```

screenpro/load.py

Lines changed: 42 additions & 7 deletions
```diff
@@ -8,27 +8,54 @@
 from .utils import check_protospacer_length, trim_protospacer
 
 
-def load_cas9_sgRNA_library(library_path, library_type, sep='\t', index_col=0, protospacer_length=19, verbose=True):
+def load_cas9_sgRNA_library(library_path, library_type, sep='\t', index_col=0, protospacer_length=19, verbose=True, **args):
 '''Load Cas9 sgRNA library table for single or dual guide design.
 '''
 library = pd.read_csv(
 library_path,
 sep=sep,
 index_col=index_col,
+**args
 )
 
 ## Evaluate library table and reformat columns for downstream analysis
 # I would like to name the target column 'target' if it is named 'gene'!
-if 'gene' in library.columns:
-# rename gene column to target
-library = library.rename(columns={'gene': 'target'})
 
 if library_type == "single_guide_design":
-eval_columns = ['target', 'sgID', 'protospacer']
+eval_columns = ['target', 'sgID', 'protospacer', 'sequence']
+
+# reformating columns as needed
+if 'gene' in library.columns:
+# rename gene column to target
+library = library.rename(columns={'gene': 'target'})
+if 'sequence' in library.columns and 'protospacer' not in library.columns:
+library.rename(columns={'sequence': 'protospacer'}, inplace=True)
+if 'sgId' in library.columns:
+library.rename(columns={'sgId': 'sgID'}, inplace=True)
 
 # Upper case protospacer sequences
 library['protospacer'] = library['protospacer'].str.upper()
 
+protospacer_col = 'protospacer'
+in_length = check_protospacer_length(library, 'protospacer')
+if in_length == protospacer_length:
+pass
+elif in_length > protospacer_length:
+if verbose: print(f"Trimming protospacer sequences in '{protospacer_col}' column.")
+library = trim_protospacer(
+library, protospacer_col,
+'5prime',
+in_length - protospacer_length
+)
+
+elif in_length < protospacer_length:
+raise ValueError(
+f"Input protospacer length for '{protospacer_col}' is less than {protospacer_length}"
+)
+
+# write `sequence` column as `protospacer` (after trimming)
+library['sequence'] = library['protospacer']
+
 for col in eval_columns:
 if col not in library.columns:
 raise ValueError(f"Column '{col}' not found in library table.")
@@ -43,6 +70,11 @@ def load_cas9_sgRNA_library(library_path, library_type, sep='\t', index_col=0, p
 'sequence'
 ]
 
+# reformating columns as needed
+if 'gene' in library.columns:
+# rename gene column to target
+library = library.rename(columns={'gene': 'target'})
+
 # Upper case protospacer sequences
 library['protospacer_A'] = library['protospacer_A'].str.upper()
 library['protospacer_B'] = library['protospacer_B'].str.upper()
@@ -62,10 +94,10 @@ def load_cas9_sgRNA_library(library_path, library_type, sep='\t', index_col=0, p
 
 elif in_length < protospacer_length:
 raise ValueError(
-f"Input protospacer length for '{protospacer_col}'is less than {protospacer_length}"
+f"Input protospacer length for '{protospacer_col}' is less than {protospacer_length}"
 )
 
-# if 'sequence' not in library.columns:
+# write `sequence` column as `protospacer_A;protospacer_B` (after trimming)
 library['sequence'] = library['protospacer_A'] + ';' + library['protospacer_B']
 
 for col in eval_columns:
@@ -74,6 +106,9 @@ def load_cas9_sgRNA_library(library_path, library_type, sep='\t', index_col=0, p
 
 library = library[eval_columns]
 
+else:
+raise ValueError(f"Invalid library type: {library_type}. Please choose 'single_guide_design' or 'dual_guide_design'.")
+
 if verbose: print("Library table successfully loaded.")
 
 return library
```
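In `load.py`, the single-guide branch now renames `gene` to `target`, `sgId` to `sgID`, and `sequence` to `protospacer` (only when no `protospacer` column exists), upper-cases the protospacers, trims longer ones from the 5' end down to `protospacer_length` (19 by default), rejects shorter ones, and then writes the result back to `sequence`. The dual-guide branch builds `sequence` as `protospacer_A;protospacer_B`, and any other `library_type` now raises. Below is a dependency-free sketch of that flow over a list of row dicts (the real function works on a pandas DataFrame). The function names are hypothetical, and two behaviors are assumptions: that `check_protospacer_length` reports one shared length, and that `trim_protospacer(..., '5prime', n)` drops the first `n` bases.

```python
VALID_TYPES = ("single_guide_design", "dual_guide_design")
SINGLE_RENAMES = {"gene": "target", "sgId": "sgID"}
SINGLE_REQUIRED = ("target", "sgID", "protospacer", "sequence")


def normalize_single_guide_rows(rows, protospacer_length=19):
    """Hypothetical sketch of the single-guide branch on a list of row dicts."""
    out = []
    for row in rows:
        # Rename `gene` -> `target` and `sgId` -> `sgID` (builds a new dict).
        row = {SINGLE_RENAMES.get(k, k): v for k, v in row.items()}
        # Treat `sequence` as the protospacer only if no `protospacer` column exists.
        if "sequence" in row and "protospacer" not in row:
            row["protospacer"] = row.pop("sequence")
        row["protospacer"] = row["protospacer"].upper()
        out.append(row)

    # Assumption: every protospacer shares one length (stand-in for check_protospacer_length).
    lengths = {len(r["protospacer"]) for r in out}
    if len(lengths) != 1:
        raise ValueError(f"Expected one protospacer length, got {sorted(lengths)}")
    in_length = lengths.pop()

    if in_length > protospacer_length:
        n = in_length - protospacer_length
        for r in out:
            # Assumption: '5prime' trimming drops the first n bases.
            r["protospacer"] = r["protospacer"][n:]
    elif in_length < protospacer_length:
        raise ValueError(
            f"Input protospacer length for 'protospacer' is less than {protospacer_length}"
        )

    for r in out:
        # Overwrite `sequence` with the trimmed protospacer, as the new code does.
        r["sequence"] = r["protospacer"]
        for col in SINGLE_REQUIRED:
            if col not in r:
                raise ValueError(f"Column '{col}' not found in library table.")
    return out


def dual_guide_key(protospacer_a, protospacer_b):
    """Dual-guide `sequence` key: upper-cased A and B joined by ';' (inputs assumed already trimmed)."""
    return protospacer_a.upper() + ";" + protospacer_b.upper()


def check_library_type(library_type):
    """Reject anything other than the two supported designs, like the new `else` branch."""
    if library_type not in VALID_TYPES:
        raise ValueError(
            f"Invalid library type: {library_type}. "
            "Please choose 'single_guide_design' or 'dual_guide_design'."
        )
```

Rewriting `sequence` after trimming keeps the library key identical to the trimmed protospacer, which is what the counting step joins on.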

screenpro/ngs/cas9.py

Lines changed: 8 additions & 7 deletions
```diff
@@ -16,15 +16,15 @@ def fastq_to_count_single_guide(
 
 if trim5p_start and trim5p_length:
 sql_cmd = f"""
-SELECT substr(f.sequence, {trim5p_start}, {trim5p_length}) AS sequence, COUNT(*) as count
+SELECT substr(f.sequence, {trim5p_start}, {trim5p_length}) AS protospacer, COUNT(*) as count
 FROM fastq_scan('{fastq_file_path}') f
-GROUP BY sequence
+GROUP BY protospacer
 """
 else:
 sql_cmd = f"""
-SELECT f.sequence AS sequence, COUNT(*) as count
+SELECT f.sequence AS protospacer, COUNT(*) as count
 FROM fastq_scan('{fastq_file_path}') f
-GROUP BY sequence
+GROUP BY protospacer
 """
 
 df_count = session.sql(sql_cmd).to_polars()
@@ -91,13 +91,14 @@ def map_to_library_single_guide(df_count, library, return_type='all', verbose=Fa
 # get counts for given input
 res = df_count.clone() #cheap deepcopy/clone
 res = res.sort('count', descending=True)
+
 res = res.with_columns(
-pl.col("sequence").alias("sequence"),
+pl.col("protospacer").alias("sequence"),
 )
 
 res_map = pl.DataFrame(library).join(
-res, on="sequence", how="left"
-)
+res, on="sequence", how="left"
+)
 
 if return_type == 'unmapped' or return_type == 'all':
 res_unmap = res.join(
```
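In `cas9.py`, the counting query now names its grouped column `protospacer` (either the whole read or the `substr` window set by `trim5p_start` and `trim5p_length`), and `map_to_library_single_guide` aliases it to `sequence` before left-joining onto the library. Below is a dependency-free sketch of the same count-then-join idea, using `collections.Counter` in place of the SQL session and polars. The function names are hypothetical, SQL `substr` is assumed to be 1-based as in standard SQL, and treating "unmapped" as reads that match no library sequence is an assumption, since the diff stops at `res_unmap = res.join(`.

```python
from collections import Counter


def count_protospacers(reads, trim5p_start=None, trim5p_length=None):
    """Hypothetical stand-in for the GROUP BY query: reads per protospacer.

    When both trim arguments are set, each read is cut to the window that
    SQL `substr(sequence, trim5p_start, trim5p_length)` would return,
    assuming a 1-based start position as in standard SQL.
    """
    if trim5p_start and trim5p_length:
        begin = trim5p_start - 1
        reads = (read[begin:begin + trim5p_length] for read in reads)
    return Counter(reads)


def map_counts_to_library(counts, library_sequences):
    """Left-join read counts onto library `sequence` keys (hypothetical helper).

    Every library sequence is kept; one with no reads gets None, the way a
    left join leaves a null `count`. Reads whose protospacer matches no
    library sequence are returned separately, most frequent first.
    """
    mapped = {seq: counts.get(seq) for seq in library_sequences}
    unmapped = {seq: n for seq, n in counts.most_common() if seq not in mapped}
    return mapped, unmapped
```

Joining on the library side keeps every guide in the output even when it received no reads, which is the behavior `how="left"` gives in the polars call.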
