giannkas/tipwas-bionets


tipwas-bionets

This repository associates transposable insertion polymorphisms (TIPs) with genes to build subnetworks within a biological network. It is based on the same concept as the gwas-bionets repositories, but this time it focuses on TIPs instead of SNPs and on a different organism. While gwas-bionets and gwas-bionets2 have been tested on human datasets, the current repository is specifically tailored to work with A. thaliana for now.

Software requirements

This section explains how to install all the network-based methods and how to set up the environment for running them.

Ideally, create a folder in your home directory to store all software. For example:

mkdir ~/bin

Install Java (required for installing Nextflow)

Create a "java" folder in the software directory and navigate to it.

mkdir ~/bin/java
cd ~/bin/java

Download the x64 version of the Java JDK as a tar.gz file from the Oracle website to your machine and decompress it. This example uses version 24, but you can check for the latest release.

wget https://download.oracle.com/java/24/latest/jdk-24_linux-x64_bin.tar.gz
tar xzfv jdk-24_linux-x64_bin.tar.gz 
rm jdk-24_linux-x64_bin.tar.gz

Add the bin directory of the unpacked JDK to the $PATH variable to make Java executable, and set $JAVA_HOME to the JDK root directory. Ideally, add both lines to ~/.bashrc so that you do not have to repeat them on each server connection or reboot, e.g.

export PATH=/home/username/bin/java/jdk-24.0.1/bin:$PATH
export JAVA_HOME=/home/username/bin/java/jdk-24.0.1
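
To make these settings persistent without duplicating lines when the step is repeated, something like the following could be used. It is only a sketch: the function name add_java_to_rc and its arguments are hypothetical, and the JDK folder must match the version you actually unpacked.

```shell
# Append the Java variables to a shell startup file, unless they are already there.
# $1: JDK root directory, e.g. "$HOME/bin/java/jdk-24.0.1"
# $2: startup file to edit (defaults to ~/.bashrc)
add_java_to_rc() {
    jdk_dir="$1"
    rc_file="${2:-$HOME/.bashrc}"
    # Skip the append when this exact JAVA_HOME setting is already present.
    if ! grep -qF "JAVA_HOME=$jdk_dir" "$rc_file" 2>/dev/null; then
        {
            printf 'export JAVA_HOME=%s\n' "$jdk_dir"
            # Written literally, so PATH is expanded when the file is sourced.
            printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
        } >> "$rc_file"
    fi
}
```

For example, call add_java_to_rc "$HOME/bin/java/jdk-24.0.1" once; calling it again leaves the file unchanged.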

You may need to source the .bashrc file before checking the installation:

source ~/.bashrc

Test the installation:

java -version

You should see something like:

java version "24.0.1" 2025-04-15
Java(TM) SE Runtime Environment (build 24.0.1+9-30)
Java HotSpot(TM) 64-Bit Server VM (build 24.0.1+9-30, mixed mode, sharing)

Install Nextflow

The people at Seqera provide better explanations and documentation about Nextflow than I could offer. Please follow the installation steps there.

Install R and some packages (required for the methods)

If you don't have R installed on your machine, proceed as follows (use superuser credentials if they are needed to install software), or check the instructions from The Comprehensive R Archive Network:

wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
apt-get -y install --no-install-recommends r-base r-base-dev

Set up the general CRAN repo.

echo 'local({
    r <- getOption("repos")
    r["CRAN"] = "https://cloud.r-project.org/"
    options(repos = r)
  })' >> /etc/R/Rprofile.site

Install Bioconductor (BiocManager), twilight and BioNet (the latter contains the files needed to use the Heinz method).

R -e "install.packages('BiocManager')"
R -e "BiocManager::install('BioNet')"
R -e "BiocManager::install('twilight')"

Install the R packages tidyverse, cowplot, igraph, gprofiler2 and foreach:

R -e "install.packages(c('tidyverse', 'cowplot', 'igraph', 'gprofiler2', 'foreach'))" 

Install LEAN

At the time of writing this README, LEANR has been removed from the CRAN repository, so a manual installation from the CRAN archive is recommended.

wget https://cran.r-project.org/src/contrib/Archive/LEANR/LEANR_1.4.9.tar.gz

Then install it from the R console:

install.packages("path/to/LEANR_1.4.9.tar.gz", repos = NULL, type = "source")

Install dmGWAS

The pipeline uses dmGWAS v3.0 (EW_dmGWAS), released on October 4, 2014. Refer to this page for more information. Download it as follows:

wget https://bioinfo.uth.edu/dmGWAS/dmGWAS_3.0.tar.gz

Then install it from the R console:

install.packages("path/to/dmGWAS_3.0.tar.gz", repos = NULL, type = "source")

Install SigMod

You can install SigMod (version 2) from this website: Strongly Interconnected Gene MODule. It consists of R scripts, so it is enough to pass its location through the sigmod_path parameter when calling the bionets.nf script, e.g. --sigmod_path="~/bin/SigMod_v2".

Change to the software directory:

cd ~/bin

Download the zip file and decompress it (there is no need to create a folder first, since the archive already contains one with the code and the manual):

wget https://github.com/YuanlongLiu/SigMod/raw/20c561876d87a0faca632a6b93882fcffd719b17/SigMod_v2.zip
unzip SigMod_v2.zip

Rename the folder (the path passed to --sigmod_path must then point to the new name):

mv SigMod_v2 sigmod

Install Heinz

You have already installed it when installing the BioNet package from Bioconductor :-)

After that, we are all set!

Main Scripts

  1. This script takes a typical GWAS summary statistics file from PLINK and a GFF annotation file, aggregates TIPs and computes gene P-values. Since no raw data is used here, the k and data_samp parameters are always set to 1.

bionets_construction_tips.sh

Use

The pipeline needs two main inputs.

Annotation file

This file is commonly in GFF format. For example, using data from A. thaliana:

##gff-3								
Chr1	Araport11	mRNA	6773302	6774523	.	-	.	ID=AT1G19570.1;Name=AT1G19570.1;Note=dehydroascorbate reductase;curator_summary=Encodes a member of the dehydroascorbate reductase gene family.;conf_class=2;symbol=DHAR1;full_name=dehydroascorbate reductase;computational_description=dehydroascorbate reductase;(source=Araport11);conf_rating=****;gene=3688014,UniProt=Q9FWR4
Chr1	Araport11	gene	6773302	6774523	.	-	.	ID=AT1G19570;locus_type=protein_coding;Name=AT1G19570;Note=dehydroascorbate reductase;curator_summary=Encodes a member of the dehydroascorbate reductase gene family..;symbol=DHAR1;full_name=dehydroascorbate reductase;computational_description=dehydroascorbate reductase;(source=Araport11);locus=2013119;locus_type=protein_coding
Chr5	Araport11	mRNA	5736630	5741500	.	-	.	ID=AT5G17420.1;Name=AT5G17420.1;Note=Cellulose synthase family protein;curator_summary=Encodes a xylem-specific cellulose synthase that is phosphorylated on one or more serine residues (on either S185 or one of S180 or S181). IRX3 is required for secondary cell wall biosynthesis.;conf_class=2;symbol=IRX3;full_name=IRREGULAR XYLEM 3;computational_description=Cellulose synthase family protein;(source=Araport11);conf_rating=****;gene=2178934,UniProt=Q9SWW6
Chr5	Araport11	gene	5736630	5741500	.	-	.	ID=AT5G17420;Name=AT5G17420;locus_type=protein_coding;Note=Cellulose synthase family protein;curator_summary=Encodes a xylem-specific cellulose synthase that is phosphorylated on one or more serine residues (on either S185 or one of S180 or S181). 

The fields used by the pipeline can be extracted with the following command in a Linux-like terminal:

awk '($3 == "gene") {print $1, $4, $5, $9}' original_file.gff | awk -F'[;ID=]' '{print $1, $4}' | awk '{print $1, $2, $3, $4}' OFS='\t' > output_file.gff

The output file will look like this:

Chr1	6773302	6774523	AT1G19570
Chr5	5736630	5741500	AT5G17420
Chr2	10676451	10676573	AT2G25095
Chr4	15415418	15415521	AT4G31877
Chr1	3631	5899	AT1G01010

The file has no header. Column 1 is the chromosome, columns 2 and 3 are the start and end positions of the gene, and column 4 is the gene ID. Use this output file as the annotation input.
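
The extraction command shown earlier splits on whitespace and on the characters I, D, = and ;, so it relies on the ID attribute coming first in column 9 and on chromosome names that contain none of those characters. As a hedged alternative, here is a minimal sketch that splits on tabs and searches for the ID attribute explicitly. The function name extract_genes is hypothetical and not part of the repository.

```shell
# Extract chromosome, start, end and gene ID from a GFF file (tab-separated output).
# $1: path to the original GFF annotation file
extract_genes() {
    awk -F'\t' -v OFS='\t' '
        $3 == "gene" {
            # Take the value of the ID attribute in column 9, up to the next ";".
            if (match($9, /(^|;)ID=[^;]+/)) {
                id = substr($9, RSTART, RLENGTH)
                sub(/^;?ID=/, "", id)
                print $1, $4, $5, id
            }
        }' "$1"
}
```

A call such as extract_genes original_file.gff > output_file.gff should give the same four-column table on files like the example, while also tolerating attribute columns in which ID is not the first entry.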

TIP gwas file

This file keeps the format of a PLINK output file, for example:

 CHR             SNP         BP    NMISS       BETA         SE         R2        T            P 
   1     fixed.TIP     194454       51      92.78      351.3   0.001421   0.2641       0.7928 
   1     fixed.TIP     309216       51     -188.7      233.9    0.01311  -0.8069       0.4237 
   1     fixed.TIP     350928       51      336.1      201.5    0.05372    1.668       0.1017 
   1     fixed.TIP     370172       51      336.1      201.5    0.05372    1.668       0.1017 
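
To illustrate how the two inputs relate, the sketch below lists the TIPs whose position falls inside a gene from the four-column annotation table. It is not the aggregation performed by bionets_construction_tips.sh, whose details are not described here. The function name tips_in_genes is hypothetical, and the sketch assumes that chromosome 1 in the PLINK file corresponds to Chr1 in the annotation.

```shell
# List TIPs that fall inside gene boundaries.
# $1: gene table (chromosome, start, end, gene ID; no header)
# $2: PLINK-style file whose header contains the CHR, BP and P columns
tips_in_genes() {
    awk -v OFS='\t' '
        # First file: store the gene coordinates row by row.
        NR == FNR { chr[NR] = $1; start[NR] = $2 + 0; stop[NR] = $3 + 0; gene[NR] = $4; n = NR; next }
        # Header of the second file: locate the columns by name.
        FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            c = "Chr" $(col["CHR"])   # assumption: "1" in PLINK means "Chr1" in the GFF
            bp = $(col["BP"]) + 0
            # Print gene ID, chromosome, position and P-value for every overlap.
            for (g = 1; g <= n; g++)
                if (chr[g] == c && bp >= start[g] && bp <= stop[g])
                    print gene[g], c, bp, $(col["P"])
        }' "$1" "$2"
}
```

For instance, tips_in_genes output_file.gff tips.assoc prints one line per TIP and gene pair, where tips.assoc stands for your own PLINK file. The linear scan over genes keeps the example short; a sorted or indexed lookup would be preferable for large annotations.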
