Using Leafcutter with Windows Subsystem for Linux (WSL)
edited by Jesse G. Meyer (jgmeyerucsd) from Anil Chalisey's parseR wiki
- Installing the Linux subsystem for Windows
- Build environment
- Installing Anaconda
- Installing R and RStudio
- Installing key R packages
- Installing LeafCutter
- Convert .BAM to .junc
- Differential intron excision analysis
- Visualize results with leafviz
- Host Results with rsconnect and shinyapps.io
- Using X-windows
With the introduction of the Windows Subsystem for Linux (WSL) in Windows 10, Windows is now a viable option for bioinformatic analysis, with no need for virtual machines, Docker or Cygwin. It's early days, but I have found it possible to switch entirely from a Linux computer to a Windows 10 computer for my bioinformatics analyses.
This guide explains how to set up a Windows 10 computer to use WSL for leafcutter analysis.
This guide assumes you have already mapped your reads into .BAM files using RNA-STAR. You also need a .gtf file with your genome annotation if you want to use the leafviz shiny app.
- Go to 'Settings > Update & Security > For developers' and turn on 'Developer mode'
- Go to 'Control panel' > 'Programs and features' > 'Turn Windows features on and off' and then tick the 'Windows subsystem for Linux' box and then allow the machine to restart.
- Once restarted open a command prompt and type 'bash' - the linux subsystem will download and then guide you through setting up a username and password.
On the latest version of Windows 10, the Linux subsystem installed will be Ubuntu 16.04 (xenial), which is the release the CRAN repository line later in this guide refers to.
Once installed, the bash terminal may be started by opening a command prompt (press Win + R on the keyboard, type cmd and press Enter or click OK), typing bash at the command prompt and pressing Enter. Once the terminal is open, the system should be updated using the following commands (remembering to provide your password when asked and to type y when it asks if you wish to continue):
sudo apt-get update
sudo apt-get upgrade
The following steps should all be performed within the bash terminal. This means that, unless specified otherwise, all the steps here will also work on a native Linux/Unix-based operating system. As a side note, my usual preference is to avoid amending the executable path; instead I make symbolic links to binaries within a directory that is already in the path. My preferred directory for this purpose is /usr/local/bin, but if you do not have root access on your Linux system, you should make a directory called bin within your home directory and make the links there. To make this directory, use the following commands:
cd ~
mkdir bin
Ensure there is a working build environment using the following command:
sudo apt-get install gcc make build-essential gfortran
Anaconda is a Python (and R) distribution specifically developed for data science and may be installed using the instructions below. While we could also simply use the default Python distribution from the Ubuntu repositories, Anaconda comes with Intel's MKL and thus provides a substantial performance boost (not to mention its conda package manager). It may be installed as follows:
wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh
bash Anaconda2-4.4.0-Linux-x86_64.sh
During installation, accept the license agreement and allow the install location to be prepended to your .bashrc. Once installed, update conda and anaconda.
conda update conda
conda update anaconda
Finally, add the bioconda channel and install software. This is much easier than installing the tools separately, as it also installs all the dependencies (for example, the latest version of Java). The tools I install here are those necessary for my bioinformatics pathways and the packages I have developed:
conda config --add channels bioconda
conda install -c bioconda samtools bedtools fastqc sambamba MACS2 subread
# these need to be added to the executable path. Anaconda asks you whether
# this should be done during its installation. However, the path created
# by anaconda is only accessible from within R by specifying the entire
# path. To make it easier to access the programs from R I create symbolic
# links as described later below.
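The symbolic links mentioned in the comments above are not spelled out elsewhere in this guide, so here is a minimal sketch of how they might be created. The helper name link_tool is hypothetical, and the ~/anaconda2/bin location in the usage comment is an assumption based on the Anaconda2 installer's default; adjust it to wherever conda actually placed the binaries.

```shell
# Hypothetical helper: symlink one installed binary into a bin directory
# (defaults to ~/bin, the directory created earlier in this guide).
link_tool() {
    tool_path="$1"                  # full path to the real binary
    bin_dir="${2:-$HOME/bin}"       # directory that should hold the link
    if [ ! -x "$tool_path" ]; then
        echo "link_tool: $tool_path not found or not executable" >&2
        return 1
    fi
    mkdir -p "$bin_dir"
    # -s makes the link symbolic; -f replaces a stale link of the same name
    ln -sf "$tool_path" "$bin_dir/$(basename "$tool_path")"
}

# Example (assumed install location, adjust as needed):
#   link_tool ~/anaconda2/bin/samtools
```

With root access, the second argument could be /usr/local/bin instead, with the function body run under sudo.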
If using the WSL-based approach, there is no absolute requirement to install R or RStudio within WSL, as the workflow is to use R within Windows 10 and then to call Linux programs as needed. Installation in Windows 10 is straightforward: simply download the executables, double click and follow the instructions. I recommend using the Microsoft R Open (MRO) version of R, which is super-charged with the Intel Math Kernel Library for multi-threading.
For those using Linux, instructions are below. R and RStudio can usually only be installed if the user has root privileges. If you do not have root privileges, speak to your administrator.
The r-base package available via apt-get is usually out of date, so R is best installed directly from CRAN or MRAN.
To install R directly from CRAN, first add the repository to the sources list, then add R to the Ubuntu keyring, and then install R-base:
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev
To install the MRO (Microsoft R Open, multithreaded) version of R use the following commands, replacing 3.5.1 with whichever version is the latest:
wget https://mran.blob.core.windows.net/install/mro/3.5.1/microsoft-r-open-3.5.1.tar.gz
tar -xvzf microsoft-r-open-3.5.1.tar.gz
cd microsoft-r-open
sudo ./install.sh
cd ..
# -r is needed because the glob also matches the unpacked directory
rm -r microsoft-r-open*
It is also possible to install R using Anaconda. This may be a solution if one does not have root privileges. The caveat, however, is that R packages must be installed in a non-standard way via conda channels, as the normal install.packages() route (see below) throws an error. Also, on some systems, I have found that this results in errors in rendering fonts in plots.
To install R via Anaconda:
conda install -c r r-essentials
To install RStudio:
sudo apt-get update
sudo apt-get install gdebi-core gfortran libgdal-dev libgeos-dev libpng-dev
sudo apt-get install libjpeg62-dev libjpeg8-dev libcairo-dev libssl-dev
wget https://download1.rstudio.org/rstudio-1.0.143-amd64.deb
sudo gdebi -n rstudio-1.0.143-amd64.deb
rm rstudio-1.0.143-amd64.deb
To install RStudio via Anaconda:
conda install -c r rstudio
This first step is only required on Linux. It installs some key Linux packages on which subsequent R packages depend. On most systems, these packages will already be installed. If they are not, root privilege is required.
sudo apt-get update
sudo apt-get install build-essential libx11-dev
sudo apt-get install libcurl4-openssl-dev libxml2 libxml2-dev libncurses5-dev zlib1g-dev curl
In Windows 10 the Rtools package must be installed, which can be downloaded from CRAN.
To perform the subsequent steps in Linux, open up an R terminal by typing R at the command line. In Windows 10, open up R or Microsoft R Open from the start menu. Then type the following commands. After these processes have finished running, the R terminal may be closed and we return to the bash terminal.
install.packages(c("tidyverse", "devtools", "rmarkdown", "knitr", "data.table",
"ggthemes"))
source("http://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")
biocLite(c("rtracklayer", "limma", "DESeq2", "edgeR", "ComplexHeatmap", "goseq",
"Rsamtools"))
# once complete, close the R terminal
q()
If R has been installed via Anaconda, then the steps for installing these packages are as follows:
conda install -c r r-tidyverse r-devtools r-rmarkdown r-knitr r-data.table
conda install -c ncil r-ggthemes=3.3.0
conda install -c bioconda bioconductor-limma bioconductor-deseq2 bioconductor-edger
conda install -c bioconda bioconductor-complexheatmap bioconductor-rsamtools
conda install -c bioconda bioconductor-goseq bioconductor-rtracklayer
LeafCutter is the package we will use for the RNA-seq analysis [add citation]. The prerequisites for LeafCutter were installed in previous steps:
- samtools should be available on your PATH
- Python 2.7 (earlier versions may be OK)
- R (version 3.3.3, earlier versions may be OK)
To download the code (you'll need this for the leafcutter scripts), clone the davidaknowles/leafcutter repository from GitHub into your home directory; the commands in the rest of this guide assume it lives at ~/leafcutter.
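Before building the R package, it may be worth confirming that the prerequisites listed above are actually visible from the bash terminal. The following is only a sketch; check_tools is a hypothetical helper name, and it tests for presence on the PATH, not for specific versions.

```shell
# Report which of the named commands can be found on the PATH.
# Returns 0 if all were found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the LeafCutter prerequisites
#   check_tools samtools python R
```

If anything is reported as missing, revisit the Anaconda or R installation steps (or the symbolic links) before continuing.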
You need to add a symlink from tar to gtar, because gtar does not exist by default (see https://github.com/r-dbi/RPostgres/issues/110):
cd ~
ln -s /bin/tar /bin/gtar
or, if that fails with a permission error, with sudo:
cd ~
sudo ln -s /bin/tar /bin/gtar
To compile the R package that performs the differential splicing analysis and makes junction plots, we recommend installing with devtools (this should install the required R package dependencies for you). Start R by typing R at the command line and pressing Enter, then run:
if (!require("devtools")) install.packages("devtools", repos='http://cran.us.r-project.org')
devtools::install_github("davidaknowles/leafcutter/leafcutter")
Added leafcutter/scripts to PATH:
export PATH=$PATH:~/leafcutter/scripts/
Made this file, convertbam.sh, in the home directory:
nano ~/convertbam.sh ## start nano text editor
# loop over the BAM files in the current directory
for bamfile in *.bam
do
echo "Converting $bamfile to $bamfile.junc"
sh ~/leafcutter/scripts/bam2junc.sh "$bamfile" "$bamfile.junc"
echo "$bamfile.junc" >> test_juncfiles.txt
done
### then type control-x and save the file
Changed directory to windows mount point where my bam files were:
cd /mnt/d/20180718_RNAseq/DataProcessed/BAM/
Then ran the convertbam.sh
sh ~/convertbam.sh
It should start printing that it is reading files, followed by "# valid, # problematic spliced reads". It takes about 10 minutes per BAM file (my files each had ~35 million 125 bp paired-end reads).
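Because the conversion runs for a long time, a quick check that every BAM file ended up with a junction file can save trouble later. This is a minimal sketch, assuming the .bam.junc naming used by convertbam.sh; missing_juncs is a hypothetical helper name.

```shell
# List every .bam file in a directory that has no non-empty matching
# .bam.junc file next to it.
missing_juncs() {
    bam_dir="${1:-.}"                      # default: current directory
    for bamfile in "$bam_dir"/*.bam; do
        [ -e "$bamfile" ] || continue      # glob matched no BAM files
        if [ ! -s "$bamfile.junc" ]; then
            echo "$bamfile"
        fi
    done
}

# Example: run from the BAM directory; no output means nothing is missing
#   missing_juncs .
```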
Define intron clusters using leafcutter_cluster.py:
python ~/leafcutter/clustering/leafcutter_cluster.py -j rsp7_juncfiles.txt -m 50 -o testNvsRSP -l 500000
Check the output file with:
zcat testNvsRSP_perind_numers.counts.gz | more
A. Create exon_file from gtf:
~/leafcutter/scripts/gtf_to_exons.R ce10.gtf.gz ce10.exons.txt.gz
B. create groups file, only 2 groups allowed right now:
RNA_STAR_ctrl_1.bam N2
RNA_STAR_ctrl_2.bam N2
RNA_STAR_ctrl_3.bam N2
RNA_STAR_ctrl_4.bam N2
RNA_STAR_rsp7_1.bam rsp7
RNA_STAR_rsp7_2.bam rsp7
RNA_STAR_rsp7_3.bam rsp7
RNA_STAR_rsp7_4.bam rsp7
C. run R ds script
~/leafcutter/scripts/leafcutter_ds.R -i 4 --num_threads 4 --exon_file=ce10.exons.rmempty.txt.gz testNvsRSP_perind_numers.counts.gz groups_file1.txt
D. plot splice junctions
~/leafcutter/scripts/ds_plots.R -e ce10.exons.rmempty.txt.gz testNvsRSP_perind_numers.counts.gz groups_file1.txt leafcutter_ds_cluster_significance.txt -f 0.05
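The groups file in step B can be typed by hand, but with many samples it may be easier to generate. The sketch below assumes, as in this dataset, that control file names contain "ctrl" and everything else belongs to the treatment group; make_groups is a hypothetical helper, not part of LeafCutter.

```shell
# Hypothetical helper: print one "file group" line per BAM file name.
# Names containing "ctrl" get the control label; all others get the
# treatment label.
make_groups() {
    ctrl_label="$1"
    treat_label="$2"
    shift 2
    for bam in "$@"; do
        case "$bam" in
            *ctrl*) echo "$bam $ctrl_label" ;;
            *)      echo "$bam $treat_label" ;;
        esac
    done
}

# Example: write the groups file for the rsp7 comparison
#   make_groups N2 rsp7 RNA_STAR_ctrl_*.bam RNA_STAR_rsp7_*.bam > groups_file1.txt
```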
Then again for another dataset. Counts:
python ~/leafcutter/clustering/leafcutter_cluster.py -j glod4_juncfiles.txt -m 50 -o testNvsGLOD4 -l 500000
B. create groups file, only 2 groups allowed right now:
RNA_STAR_ctrl_1.bam N2
RNA_STAR_ctrl_2.bam N2
RNA_STAR_ctrl_3.bam N2
RNA_STAR_ctrl_4.bam N2
RNA_STAR_glod4_1.bam GLOD4
RNA_STAR_glod4_2.bam GLOD4
RNA_STAR_glod4_3.bam GLOD4
RNA_STAR_glod4_4.bam GLOD4
C. run R ds script
~/leafcutter/scripts/leafcutter_ds.R -i 4 --num_threads 4 --exon_file=ce10.exons.rmempty.txt.gz testNvsGLOD4_perind_numers.counts.gz groups_file2.txt
D. plot splice junctions
~/leafcutter/scripts/ds_plots.R -e ce10.exons.rmempty.txt.gz testNvsGLOD4_perind_numers.counts.gz groups_file2.txt leafcutter_ds_cluster_significance.txt -f 0.05
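Since both comparisons follow the same steps, the commands can be generated from the three things that change: the output prefix, the junction file list and the groups file. The sketch below only prints the commands, using the same flags and the same exon file as above, so they can be reviewed before running; print_ds_commands is a hypothetical helper name.

```shell
# Hypothetical helper: print (not run) the clustering, differential
# splicing and plotting commands for one comparison.
print_ds_commands() {
    prefix="$1"       # output prefix, e.g. testNvsRSP
    juncfiles="$2"    # text file listing the .junc files
    groups="$3"       # groups file for this comparison
    exons="ce10.exons.rmempty.txt.gz"
    counts="${prefix}_perind_numers.counts.gz"
    echo "python ~/leafcutter/clustering/leafcutter_cluster.py -j $juncfiles -m 50 -o $prefix -l 500000"
    echo "~/leafcutter/scripts/leafcutter_ds.R -i 4 --num_threads 4 --exon_file=$exons $counts $groups"
    echo "~/leafcutter/scripts/ds_plots.R -e $exons $counts $groups leafcutter_ds_cluster_significance.txt -f 0.05"
}

# Example:
#   print_ds_commands testNvsGLOD4 glod4_juncfiles.txt groups_file2.txt
```

Note that, as written above, both comparisons refer to the same leafcutter_ds_cluster_significance.txt file, so it is probably wise to move or rename the first set of results before running the second.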
You can now run your shiny app according to the instructions from http://davidaknowles.github.io/leafcutter/articles/Visualization.html, with the exception that you need to use X-windows (see the bottom section).
Generate annotation files:
~/leafcutter/leafviz/gtf2leafcutter.pl -o ce10 ce10.gtf.gz
Prepare the .Rdata file:
~/leafcutter/leafviz/prepare_results.R --meta_data_file groups_file1.txt --code leafcutter testNvsRSP_perind_numers.counts.gz rsp7_cluster_significance.txt rsp7_effect_sizes.txt ce10
Run the app:
cd ~/leafcutter/leafviz/
./run_leafviz.R /mnt/d/20180718_RNAseq/DataProcessed/BAM/leafviz.Rdata
Go to shinyapps.io to set up your account: http://docs.rstudio.com/shinyapps.io/getting-started.html#deploying-applications
Go to the command line in WSL and start R, then type:
install.packages('rsconnect')
Put in your secret key from shinyapps.io:
rsconnect::setAccountInfo(name='[your username]', token='[your token]', secret='[insert your key]')
Then copy the .Rdata file you created into the leafcutter/leafviz/ directory:
cp /mnt/d/RNAseq/myresults.Rdata ~/leafcutter/leafviz/
and change the commented-out line near the top of server.R that starts with "load" so that it loads your file by its local path (the file must now be in that directory):
nano ~/leafcutter/leafviz/server.R
load("myresults.Rdata")
Then type control+x to exit, type y when asked whether to save the modified buffer, and hit Enter to confirm the file name.
Next edit your run_leafviz.R file using
nano ~/leafcutter/leafviz/run_leafviz.R
At the top, above the "library(shiny, quietly=TRUE)" line, add the following line:
library(rsconnect)
Use the cursor to scroll all the way to the bottom of the file and add the # symbol before the "print", "load", and "shiny::runApp()" statements. Then add the line:
rsconnect::deployApp()
Type control+X again to exit, and this time change the file name to "deploy_leafviz.R" when saving.
Then, from the ~/leafcutter/leafviz/ directory, type:
./deploy_leafviz.R
It should print several statements with "Application successfully deployed" near the end. Go back to your web browser and shinyapps.io account and see if the app is deployed with your data.
WSL does not natively support Graphical User Interfaces (GUIs) such as RStudio, but there is a work-around so that these programs can be used. The simplest way I have found is to install MobaXterm, which has native X-forwarding and does not require any additional configuration. Other terminal emulators with X-forwarding also exist (e.g. ConEmu). If using such a terminal, running RStudio is as simple as typing rstudio at the prompt.
If you wish to stick with the standard WSL terminal, you first need to download and install an X server such as Xming or VcXsrv on Windows. Once launched, this will run in the background and provide a fully functioning X-Windows system. You just need to tell programs that launch from the bash shell where to send their display by setting the DISPLAY variable:
nano ~/.bashrc
The above command will open .bashrc in nano; scroll to the end of the file and write:
export DISPLAY=:0.0
Save the modified file by pressing CTRL+X and answering Y when asked if you want to save the file. Close and restart the console window or source the modified file using the command:
source ~/.bashrc
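If you would rather not edit .bashrc by hand, the same line can be appended from the command line. The sketch below adds it only when it is not already there, so running it twice does no harm; append_once is a hypothetical helper name.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    touch "$file"    # make sure the file exists before searching it
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -- "$line" "$file" || echo "$line" >> "$file"
}

# Example: set the display for the X server, then reload the settings
#   append_once 'export DISPLAY=:0.0' ~/.bashrc
#   source ~/.bashrc
```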
Now, open the display server by launching XLaunch from the Windows start menu. Choose "One large window" or "One large window without titlebar" and set the "display number" to 0. Leave other settings as default and finish the configuration. Once set up, running a GUI-based program is as simple as starting XLaunch alongside WSL and then executing the program.