This repository contains the computational workflows and downstream analysis notebooks for the analysis of alternative polyadenylation (APA) isoforms in subcellular compartments.
The repository is optimized for running BOTH the workflows and the analysis in Jupyter notebooks on an HPC cluster.
On the sciCORE HPC, running a Jupyter notebook on a compute node is conveniently enabled by the OnDemand service.
We utilize a hybrid approach: Snakemake for robust, scalable data processing on HPC clusters (sciCORE), and Jupyter Notebooks for interactive downstream analysis and visualization.
Currently, we have reanalyzed the bulk RNA-seq data from the study *System-wide analysis of RNA and protein subcellular localization dynamics*. We ran basic processing, alignment, and quantification of gene expression using a custom-prepared .gtf file and the featureCounts utility. We then quantified relative usage of polyadenylation sites (PASs) with a PAQR2 workflow, using a modified version of the workflow from the paper *Leveraging multi-omics data to infer regulators of mRNA 3' end processing in glioblastoma*. As input for PAQR2, we used the human PolyASite Atlas v3.0 filtered at the 62% stringency level (see the paper for an explanation of why that particular threshold is optimal).
We further focused on the NFYA gene and alternative polyadenylation at its terminal exon, in a collaborative project with the group of Prof. Dr. Paolo Gandellini.
## Action plan

In general, different sub-projects related to subcellular localization of APA isoforms will correspond to different Jupyter notebooks. For now, only the NFYA project code and data are present.
.
├── NFYA_project.ipynb # a Jupyter notebook dedicated to NFYA project, includes analysis and workflow configuration
├── APA_localization.template.env # Template for required environment variables/paths
└── WF/ # Snakemake Workflow Engine
├── Snakefile-prepare-faster # Pipeline Step 1: RNA-seq data processing (alignment, FastQC)
├── Snakefile-quantification-faster # Pipeline Step 2: Quantification of gene expression with FeatureCounts, separating .bam files by chromosomes for efficiency, preparation of coverages for PAQR quantification
├── Snakefile-PAQR-quantify # Pipeline Step 3: Running final PAQR quantification to obtain PAS-vs-sample count matrix
├── config.template.yaml # Template configuration for Snakemake parameters
├── envs/ # Conda environments isolated for specific Snakemake rules
├── profile/ # SLURM execution profile for the HPC
└── scripts/ # Python and R scripts utilized by both Snakemake and Jupyter
To ensure strict reproducibility and security, this project uses .env files to manage all absolute paths (data directories, genome annotations, etc.). Do not hardcode paths into the Python or Snakemake files.
Clone this repository into your local user space ($HOME):
git clone https://github.com/zavolanlab/APA_localization.git
cd APA_localization

You must map the project to your local HPC paths. First, copy the template, rename it, and fill in your absolute paths, for example:
cp APA_localization.template.env APA_localization.scicore.env
# Open the .env file and edit the "Base Directories" section to match your system

Recommended if you are a Zavolan group member on sciCORE: move APA_localization.scicore.env to the project GROUP folder and symlink it into your local repository directory:

ln -s <a file with specified sciCORE paths> APA_localization.scicore.env

This way APA_localization.scicore.env will be automatically accessible to group members but will not be tracked by git.
(Note: *.env files are ignored by git to protect private cluster paths, except for the APA_localization.template.env file.)
(APA_localization.scicore.env already exists in the GROUP folder of the project on sciCORE. Look for the README there.)
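As an illustration, a filled-in "Base Directories" section of the .env file might look like the sketch below. The variable names and paths are hypothetical placeholders; use the ones actually defined in APA_localization.template.env.

```shell
# Base Directories (hypothetical names -- follow the template file)
APA_BASE_DIR="/scicore/projects/GROUP/APA_localization"
GENOME_DIR="/scicore/data/annotations/hg38"
RESULTS_DIR="${APA_BASE_DIR}/results"
```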
Analysis in the notebook is largely based on functions from the zavolab_pyutils repository. Follow the instructions in that repo under "Developer Setup from source, with conda environment". Use the created conda environment "zavolab_pyutils" to execute the Jupyter notebook.
When in the APA_localization directory, run:
nbstripout --install

This will automatically strip cell outputs from the Jupyter notebooks before they are committed and pushed to GitHub. Otherwise there is a risk of exposing your HPC cluster paths publicly.
Configuration of the workflows (i.e., creation of the input .tsv with the sample specification and the .yaml config) is done inside the Jupyter notebook.
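For illustration, a minimal sketch of the kind of files the notebook generates. The column names and config keys below are hypothetical placeholders, not the exact schema expected by the workflows (see config.template.yaml for the real keys):

```python
import csv
import pathlib
import tempfile

# Hypothetical sample sheet; real column names come from the notebook.
samples = [
    {"sample": "nuclear_rep1", "fastq": "/path/to/nuc_rep1.fastq.gz", "fraction": "nuclear"},
    {"sample": "cytosol_rep1", "fastq": "/path/to/cyt_rep1.fastq.gz", "fraction": "cytosolic"},
]

workdir = pathlib.Path(tempfile.mkdtemp())

# Tab-separated sample specification for Snakemake
with open(workdir / "samples.tsv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(samples[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(samples)

# Plain-text YAML config (written as text here to avoid a PyYAML dependency)
(workdir / "config.yaml").write_text(
    f"samples: {workdir / 'samples.tsv'}\n"
    "annotation_gtf: /path/to/custom.gtf\n"
)
```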
The heavy lifting is divided into (currently) three separate Snakemake workflows located in the WF/ directory.
The Bash commands for launching them are also prepared inside the Jupyter notebook; copy them into the command line and execute them.
On an HPC cluster like sciCORE, the workflows should be launched from a login node; Snakemake then automatically submits jobs to the compute nodes.
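For orientation, a launch from a login node might look like the sketch below. This is an illustrative example, not the exact command generated by the notebook; flag behavior depends on your Snakemake version and the provided SLURM profile.

```shell
cd ~/APA_localization/WF
# Dry run first to check the job graph, then launch for real
snakemake -s Snakefile-prepare-faster --configfile config.yaml --profile profile -n
snakemake -s Snakefile-prepare-faster --configfile config.yaml --profile profile
```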
Once the Snakemake workflows are complete, all results are routed to the shared group directories defined in your .env file.
Use respective sections of the Jupyter Notebook to analyze the outputs.
The notebook automatically loads your .env paths using python-dotenv, allowing it to dynamically locate all workflow results, figures, and metadata regardless of where you cloned this repository.
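For readers unfamiliar with python-dotenv, the stdlib-only sketch below illustrates what loading a .env file amounts to. The notebook itself uses python-dotenv's `load_dotenv()`; `APA_BASE_DIR` is a hypothetical variable name for illustration.

```python
import os
import pathlib

def load_env(path):
    """Simplified stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines, skipping blanks and comments."""
    for line in pathlib.Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Example .env file with a hypothetical base-directory variable
pathlib.Path("demo.env").write_text('APA_BASE_DIR="/scicore/projects/apa"\n')
load_env("demo.env")

# The notebook can then build result paths relative to the configured base
results_dir = pathlib.Path(os.environ["APA_BASE_DIR"]) / "results"
```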