This repository associates transposon insertion polymorphisms (TIPs) with genes to build subnetworks within a biological network. It is based on the same concept as the gwas-bionets repositories, but it focuses on TIPs instead of SNPs and on a different organism. While gwas-bionets and gwas-bionets2 have been tested on human datasets, this repository is currently tailored to work with A. thaliana.
I explain how to install all the network-based methods and how to set up the environment for running them.
Ideally, create a folder in your home directory to store all software. For example:
mkdir ~/bin
Install Java (required by Nextflow)
Create a "java" folder in the software directory and navigate to it.
mkdir ~/bin/java
cd ~/bin/java
Download the x64 version of the Java JDK as a tar.gz file from the Oracle website to your machine and decompress it. This example uses version 24, but you can check for the latest release.
wget https://download.oracle.com/java/24/latest/jdk-24_linux-x64_bin.tar.gz
tar xzfv jdk-24_linux-x64_bin.tar.gz
rm jdk-24_linux-x64_bin.tar.gz
Add the bin directory of this folder to the $PATH system variable to make Java executable. Also export the $JAVA_HOME variable, pointing to the JDK root directory. Ideally, add these lines to ~/.bashrc to avoid repeating the process on each server connection or reboot, e.g.
export PATH=/home/username/bin/java/jdk-24.0.1/bin:$PATH
export JAVA_HOME=/home/username/bin/java/jdk-24.0.1
You may need to source the .bashrc file before checking the installation, so type:
source ~/.bashrc
Test the installation:
java -version
You should see something like:
java version "24.0.1" 2025-04-15
Java(TM) SE Runtime Environment (build 24.0.1+9-30)
Java HotSpot(TM) 64-Bit Server VM (build 24.0.1+9-30, mixed mode, sharing)
Install Nextflow
The people at Seqera provide better explanations and documentation about Nextflow than I could offer. Please follow the installation steps there.
Install R and some packages (required for the methods)
If you don't have R installed on your machine, proceed as follows (use your superuser credentials if they are needed to install software), or check the instructions from The Comprehensive R Archive Network:
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
apt-get -y install --no-install-recommends r-base r-base-dev
Set up the general CRAN repo.
echo 'local({
r <- getOption("repos")
r["CRAN"] = "https://cloud.r-project.org/"
options(repos = r)
})' >> /etc/R/Rprofile.site
Install Bioconductor (BiocManager), twilight and BioNet (the latter contains the necessary files to use the Heinz method).
R -e "install.packages('BiocManager')"
R -e "BiocManager::install('BioNet')"
R -e "BiocManager::install('twilight')"
Install the R packages tidyverse, cowplot, igraph, gprofiler2 and foreach:
R -e "install.packages(c('tidyverse', 'cowplot', 'igraph', 'gprofiler2', 'foreach'))"
Install LEAN
As of the time of writing this README, LEANR has been removed from the CRAN repository. Therefore, a manual installation is recommended.
wget https://cran.r-project.org/src/contrib/Archive/LEANR/LEANR_1.4.9.tar.gz
Then install it from the R console:
install.packages("path/to/LEANR_1.4.9.tar.gz", repos = NULL, type = "source")
Install dmGWAS
The pipeline uses dmGWAS v3.0, or EW_dmGWAS, released on October 4, 2014. Refer to this page for more information. Download it as follows:
wget https://bioinfo.uth.edu/dmGWAS/dmGWAS_3.0.tar.gz
Then install it from the R console:
install.packages("path/to/dmGWAS_3.0.tar.gz", repos = NULL, type = "source")
Install SigMod
You can install SigMod (version 2) from this website: Strongly Interconnected Gene MODule. Since it consists of R scripts, it suffices to set the sigmod_path parameter when calling the bionets.nf script, e.g. --sigmod_path="~/bin/SigMod_v2".
Change to the software directory:
cd ~/bin
Download the zip file and decompress it (there is no need to create a folder, since the archive already contains one with the code and manual):
wget https://github.com/YuanlongLiu/SigMod/raw/20c561876d87a0faca632a6b93882fcffd719b17/SigMod_v2.zip
unzip SigMod_v2.zip
Rename the folder:
mv SigMod_v2 sigmod
Install Heinz
You have already installed it when installing the BioNet package from Bioconductor :-)
After that, we are all set!
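Before running the pipeline, it can help to confirm that the command-line tools installed above are visible on your PATH. The following is a minimal sketch and not part of this repository: the function name check_tools is hypothetical, and it only checks that each executable can be found, not that its version is suitable.

```shell
# Hypothetical helper (not part of this repository): report which of the
# given commands can or cannot be found on PATH.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool"
        else
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$ct_missing"
}

# Example call for the tools installed in this section.
check_tools java nextflow R || echo "Some tools are missing; revisit the steps above."
```

If a tool is reported as missing right after installation, sourcing ~/.bashrc again or opening a new terminal is usually the first thing to try.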
- This script works with a typical GWAS summary statistics file from PLINK and a gff annotation file to aggregate TIPs and find gene P-values. Since we do not use raw data at all here, the k and data_samp parameters are always set to 1.
bionets_construction_tips.sh
There are two main inputs for the entire pipeline to work.
The annotation file is commonly in gff format, e.g. using data from A. thaliana:
##gff-3
Chr1 Araport11 mRNA 6773302 6774523 . - . ID=AT1G19570.1;Name=AT1G19570.1;Note=dehydroascorbate reductase;curator_summary=Encodes a member of the dehydroascorbate reductase gene family.;conf_class=2;symbol=DHAR1;full_name=dehydroascorbate reductase;computational_description=dehydroascorbate reductase;(source=Araport11);conf_rating=****;gene=3688014,UniProt=Q9FWR4
Chr1 Araport11 gene 6773302 6774523 . - . ID=AT1G19570;locus_type=protein_coding;Name=AT1G19570;Note=dehydroascorbate reductase;curator_summary=Encodes a member of the dehydroascorbate reductase gene family..;symbol=DHAR1;full_name=dehydroascorbate reductase;computational_description=dehydroascorbate reductase;(source=Araport11);locus=2013119;locus_type=protein_coding
Chr5 Araport11 mRNA 5736630 5741500 . - . ID=AT5G17420.1;Name=AT5G17420.1;Note=Cellulose synthase family protein;curator_summary=Encodes a xylem-specific cellulose synthase that is phosphorylated on one or more serine residues (on either S185 or one of S180 or S181). IRX3 is required for secondary cell wall biosynthesis.;conf_class=2;symbol=IRX3;full_name=IRREGULAR XYLEM 3;computational_description=Cellulose synthase family protein;(source=Araport11);conf_rating=****;gene=2178934,UniProt=Q9SWW6
Chr5 Araport11 gene 5736630 5741500 . - . ID=AT5G17420;Name=AT5G17420;locus_type=protein_coding;Note=Cellulose synthase family protein;curator_summary=Encodes a xylem-specific cellulose synthase that is phosphorylated on one or more serine residues (on either S185 or one of S180 or S181).
The fields we need can be extracted with the following command in a Linux-like terminal:
awk '($3 == "gene") {print $1, $4, $5, $9}' original_file.gff | awk -F'[;ID=]' '{print $1, $4}' | awk '{print $1, $2, $3, $4}' OFS='\t' > output_file.gff
The output file will look like this:
Chr1 6773302 6774523 AT1G19570
Chr5 5736630 5741500 AT5G17420
Chr2 10676451 10676573 AT2G25095
Chr4 15415418 15415521 AT4G31877
Chr1 3631 5899 AT1G01010
It has no header: column 1 is the chromosome, the next two columns are the start and end positions of the gene, and the last column is the gene ID. Use this output file as the annotation file input.
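The separator [;ID=] in the awk command above is a character class, so it splits on each of the characters ;, I, D and = individually. That works for the example lines, but it depends on the chromosome name not containing a capital I or D and on ID being the first attribute. As a hedged alternative, the sketch below reads the ID attribute from column 9 explicitly. It assumes a tab-delimited GFF3 file; the function name extract_genes is hypothetical, and the file names are the same placeholders used above.

```shell
# Hypothetical helper: print chromosome, start, end and gene ID (tab-separated)
# for every "gene" feature of a tab-delimited GFF3 file given as $1.
extract_genes() {
    awk -F'\t' -v OFS='\t' '
        $0 !~ /^#/ && $3 == "gene" {
            id = ""
            n = split($9, attrs, ";")        # attributes are ";"-separated
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) {     # keep the value of the ID attribute
                    id = substr(attrs[i], 4)
                    break
                }
            }
            if (id != "") print $1, $4, $5, id
        }
    ' "$1"
}

# Usage (same placeholder file names as above):
# extract_genes original_file.gff > output_file.gff
```

Gene features without an ID attribute are skipped rather than printed with an empty last column.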
The summary statistics file keeps the format of a PLINK output file, e.g.
CHR SNP BP NMISS BETA SE R2 T P
1 fixed.TIP 194454 51 92.78 351.3 0.001421 0.2641 0.7928
1 fixed.TIP 309216 51 -188.7 233.9 0.01311 -0.8069 0.4237
1 fixed.TIP 350928 51 336.1 201.5 0.05372 1.668 0.1017
1 fixed.TIP 370172 51 336.1 201.5 0.05372 1.668 0.1017
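To make the relationship between the two inputs concrete, the sketch below assigns each TIP to every gene whose start and end positions contain it and keeps the smallest P-value per gene. This is an illustration only and is not the aggregation performed by bionets_construction_tips.sh, which may differ, for example in the window around each gene or in how P-values are combined. It assumes the four-column annotation file described above, a header line with CHR, BP and P columns in the summary statistics file, and that "Chr1" in the annotation corresponds to "1" in the CHR column. The function name min_p_per_gene is hypothetical.

```shell
# Illustration only: NOT the method used by bionets_construction_tips.sh.
# $1: annotation file (chr, start, end, gene ID; no header)
# $2: PLINK-style summary statistics file with CHR, BP and P columns
min_p_per_gene() {
    awk '
        NR == FNR {                          # first file: gene coordinates
            chr = $1
            sub(/^Chr/, "", chr)             # assumption: "Chr1" matches CHR "1"
            n++
            gchr[n] = chr; gstart[n] = $2; gend[n] = $3; gid[n] = $4
            next
        }
        FNR == 1 {                           # header: locate columns by name
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            bp = $(col["BP"]) + 0
            p  = $(col["P"])                 # keep the original text of P
            for (g = 1; g <= n; g++) {
                if ($(col["CHR"]) == gchr[g] && bp >= gstart[g] + 0 && bp <= gend[g] + 0) {
                    if (!(gid[g] in best) || p + 0 < best[gid[g]] + 0) best[gid[g]] = p
                }
            }
        }
        END { for (id in best) print id "\t" best[id] }
    ' "$1" "$2"
}

# Usage (output order is unspecified, so sort if needed):
# min_p_per_gene output_file.gff summary_statistics_file | sort
```

Genes with no TIP inside their coordinates do not appear in the output. The loop over all genes for every TIP is quadratic, which is acceptable for a sketch but slow on genome-wide data.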