This R package implements methods for finding in silico matched random controls (MRCs) for integration sites obtained via restriction enzymes.
Details of the biological setting, descriptions of experimental methods, links to other software, and examples of research applications can be found at the Bushman Lab Web Site.
In brief, retroviruses do their work by integrating a construct into a host's genome. The location of such an integration site is sometimes found by cleaving both the integrated DNA (at a known site) and the host DNA at a restriction enzyme recognition site. The resulting fragment is processed (steps omitted here), and the host DNA of the fragment is sequenced and mapped to the host genome to determine its location. This process depends on the position of the restriction recognition site relative to the integration site, and it favors recovery of fragments that are long enough to yield mappable sequence and short enough for processing (via PCR amplification) to work efficiently. This favoritism generates a bias that needs to be accounted for in analysis. One method for dealing with the bias is to generate matched random controls that share it, so that comparisons of integration sites to controls are not confounded by that bias.
The `restrSiteUtils` R package provides methods for handling genomic
sequence data to map the locations of restriction sites, to find the
distance from an integration site to its nearest restriction site (in
the proper direction), and to draw genomic sites whose distances to the
comparable restriction site match those of an integration site.
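
The matching idea can be illustrated with a minimal base-R sketch. It does not use `restrSiteUtils` or its API. The function names `dist_to_next_site` and `draw_matched_control`, the toy chromosome, and the simplification to a single strand with "downstream" as the proper direction are all assumptions made for illustration.

```r
# Toy illustration of matched random controls (MRCs) on one chromosome.
# Hypothetical helper names and simulated data; not the restrSiteUtils API.
set.seed(1)
chrom_len   <- 100000L
restr_sites <- sort(sample.int(chrom_len, 200L))  # simulated restriction sites

# Distance from each position to the first restriction site at or
# downstream of it (NA if no such site exists).
dist_to_next_site <- function(pos, sites) {
  idx <- findInterval(pos - 1L, sites) + 1L  # index of first site >= pos
  ifelse(idx > length(sites), NA_integer_,
         sites[pmin(idx, length(sites))] - pos)
}

# Draw one control position whose distance to its next restriction site
# equals d. A site is eligible only if the gap back to the previous site
# exceeds d, so that no other site lies between control and chosen site.
draw_matched_control <- function(d, sites) {
  if (is.na(d)) return(NA_integer_)
  gaps     <- diff(c(0L, sites))
  eligible <- sites[gaps > d]
  if (length(eligible) == 0L) return(NA_integer_)
  # Index explicitly: sample() on a length-one vector samples from 1:x.
  eligible[sample.int(length(eligible), 1L)] - d
}

# Simulated "integration sites", kept upstream of the last restriction site.
int_sites <- sample.int(max(restr_sites), 5L)
d_obs     <- dist_to_next_site(int_sites, restr_sites)
controls  <- vapply(d_obs, draw_matched_control, integer(1),
                    sites = restr_sites)

# Each control shares its integration site's distance to the next site.
stopifnot(identical(dist_to_next_site(controls, restr_sites), d_obs))
```

Because each control is drawn conditional on the observed distance, the controls carry the same fragment-length bias as the integration sites, which is the property the package's methods are designed to provide on real genomes.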
File | Description |
---|---|
`README.md` | Background |
`examples/mrc-example.pdf` | Finished Example |
`examples/mrc-example.Rnw` | Example knitr Document |
`examples/mrc-example.R` | R code |
`examples/sites.df.csv` | Integration Site Dataset |
`restrSiteUtils_1.2.*.tar.gz` | R Package Source Code |
An example is presented of the use of the `restrSiteUtils` package to
create a mapping of restriction sites and use that mapping to find
matched random controls. It is generated by a `knitr` document. Users
of the package are encouraged to run the document to verify that their
computer meets the prerequisites and see a simple instance of the
application of MRCs to data analysis.
The current release of R is almost certainly needed, so update or install R as appropriate.
Several Bioconductor Packages must be installed:
- GenomicRanges
- BSgenome
- GenomeInfoDb
- Biostrings
- IRanges
- S4Vectors
- BSgenome.Hsapiens.UCSC.hg18
Installing these packages will likely install some other packages on which they depend. Follow the instructions at
http://www.bioconductor.org/install/
and then run `installed.packages()` in R to see what packages were
installed. Add any packages listed above that were not installed. If
you have an existing installation, be sure it is up to date by
following the instructions on the installation page.
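
One way to check which of the listed packages are still missing is sketched below. The variable names `required` and `missing_pkgs` are illustrative, and the installation hint in the comment assumes a current Bioconductor setup that uses the `BiocManager` package; defer to the installation page if it says otherwise.

```r
# Bioconductor packages listed above.
required <- c("GenomicRanges", "BSgenome", "GenomeInfoDb", "Biostrings",
              "IRanges", "S4Vectors", "BSgenome.Hsapiens.UCSC.hg18")

# Compare against the packages R can currently find.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0L) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # Assumption: with BiocManager available, these could be added with
  #   BiocManager::install(missing_pkgs)
} else {
  message("All listed packages are installed.")
}
```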
Once that has been done, install `restrSiteUtils`. To do this,
download the `restrSiteUtils_x.y.z.tar.gz` file (where `x.y.z`
indicates the version, like `1.2.3`). On a Unix-alike system, this is
most easily done by running a command like
`R CMD INSTALL restrSiteUtils_x.y.z.tar.gz`.
The `install.packages` function
can be used in an R session on any system. See the help pages for
`INSTALL` and `install.packages` for additional information, such as
installing to a user directory.
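
From within an R session, installing from the downloaded source file might look like the sketch below. The file name is the same `x.y.z` placeholder used above and must be replaced with the actual version; the `file.exists()` guard is only there so the snippet fails gently.

```r
# Placeholder file name: substitute the real version for x.y.z.
pkg_file <- "restrSiteUtils_x.y.z.tar.gz"

if (file.exists(pkg_file)) {
  # repos = NULL tells install.packages() to treat pkg_file as a local file;
  # add lib = "<directory>" to install into a user library.
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("File not found: ", pkg_file)
}
```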
The `knitr` package was used to process `mrc-example.Rnw`. The ability
to convert `*.tex` files to `*.pdf` files is needed to render the
pdf. The R code can be run without creating a document, either by downloading
`mrc-example.R` or by using `purl("mrc-example.Rnw")` to extract the code. The
code can be run in an R session or by executing
`R CMD BATCH mrc-example.R`
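
A minimal sketch of the extraction step, assuming `knitr` is installed and `mrc-example.Rnw` is in the working directory (the guard simply skips the step otherwise):

```r
rnw <- "mrc-example.Rnw"

if (requireNamespace("knitr", quietly = TRUE) && file.exists(rnw)) {
  # purl() extracts the code chunks into an R script and returns its path.
  r_file <- knitr::purl(rnw)
  message("Extracted code to: ", r_file)
  # The script can then be run in the session, for example with
  #   source(r_file, echo = TRUE)
} else {
  message("knitr or ", rnw, " is not available; nothing extracted.")
}
```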
To run the example, the file `sites.df.csv` must be placed in the same
directory as `mrc-example.Rnw` (or `mrc-example.R` if that is to be
run). If a local directory is used for installing packages, that must
be specified in `.libPaths(...)`. If not (or if environment variables
are used to specify the library directory), then the `.libPaths(...)` line
in the example should be removed.
With the `knitr` R package installed,
`R CMD Sweave mrc-example.Rnw`
will run the example and create `mrc-example.tex`, upon which a
PDF-enabled version of LaTeX can be run to produce `mrc-example.pdf`.
With or without `knitr`,
`R CMD BATCH mrc-example.R`
will run the code. Either command will install the package
`restrEnz.Hsapiens.UCSC.hg18.RENZ.6.CUTTER`.
This document and the software it contains are copyrighted as of 2015 by Charles C. Berry and offered for use under the GPL-3 license; see http://www.gnu.org/licenses/ for details on that license.
The development of code and this document was supported by the National Institutes of Health (2R01 AI052845 and 5R01 AI082020).
The `sites.df.csv` file contains data from
Schroeder, Astrid RW, et al. "HIV-1 integration in the human genome favors active genes and local hotspots." Cell 110.4 (2002): 521-529.