Methods for finding matched random controls (in silico) for integration sites obtained via restriction enzymes

chasberry/integration-site-MRCs

Integration Site MRCs

This repository provides an R package implementing methods for finding in silico matched random controls (MRCs) for integration sites obtained via restriction enzymes.

Details of the biological setting, descriptions of experimental methods, links to other software, and examples of research applications can be found at the Bushman Lab Web Site.

In brief, retroviruses do their work by integrating a construct into a host's genome. The location of such an integration site is sometimes found by cleaving both the integrated DNA (at a known site) and the host DNA at a restriction enzyme recognition site. The resulting fragment is processed (steps omitted here), and the host DNA of the fragment is sequenced and mapped to the host genome to determine its location. This process depends on the position of the restriction recognition site relative to the integration site, and it favors recovery of fragments that are long enough to yield mappable sequence and short enough for processing (via PCR amplification) to work efficiently. This favoritism generates a bias that needs to be accounted for in analysis. One method for dealing with the bias is to generate matched random controls that share it, so that comparisons of integration sites to controls are not confounded by that bias.

The restrSiteUtils R package provides methods for handling genomic sequence data to map the locations of restriction sites, to find the distance from an integration site to its nearest restriction site (in the proper direction), and to draw genomic sites whose distances to the comparable restriction site match those of an integration site.
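The matching idea described above can be illustrated with a toy sketch in base R. It does not use or reproduce the restrSiteUtils functions: the coordinates, the helper names `dist_to_next_site` and `draw_matched_controls`, and the choice of "increasing coordinate" as the proper direction are all assumptions made for illustration, and strand and multiple chromosomes are ignored.

```r
# Toy sketch in base R; none of this is the restrSiteUtils API.
set.seed(1)

# Hypothetical restriction-site coordinates on one toy chromosome.
restr_sites <- c(120, 480, 950, 1600, 2300, 3100)

# Distance from each (integer) position to the first restriction site at or
# beyond it. Treating increasing coordinates as the proper direction is an
# assumption of this sketch.
dist_to_next_site <- function(pos, sites) {
  sites <- sort(sites)
  idx <- findInterval(pos - 1, sites) + 1   # index of the first site >= pos
  out <- rep(NA_real_, length(pos))         # NA when no site lies beyond pos
  ok <- idx <= length(sites)
  out[ok] <- sites[idx[ok]] - pos[ok]
  out
}

# Draw n positions whose distance to their next restriction site equals d.
draw_matched_controls <- function(d, sites, n = 3) {
  sites <- sort(sites)
  candidates <- sites - d
  prev_site <- c(0, head(sites, -1))
  # Keep a candidate only if no other restriction site lies between it and
  # the site it was derived from.
  candidates <- candidates[candidates > prev_site]
  candidates[sample.int(length(candidates), n, replace = TRUE)]
}

integration_sites <- c(400, 1500)                        # hypothetical sites
d <- dist_to_next_site(integration_sites, restr_sites)   # 80 and 100
controls <- lapply(d, draw_matched_controls, sites = restr_sites)
```

Each element of `controls` holds three random positions that sit at the same distance from their next restriction site as the corresponding integration site, so the controls share the fragment-length bias of the real sites.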

Files Provided

| File | Description |
|------|-------------|
| `README.md` | Background |
| `examples/mrc-example.pdf` | Finished Example |
| `examples/mrc-example.Rnw` | Example knitr Document |
| `examples/mrc-example.R` | R code |
| `examples/sites.df.csv` | Integration Site Dataset |
| `restrSiteUtils_1.2.*.tar.gz` | R Package Source Code |

The examples Directory

The example shows how to use the restrSiteUtils package to create a mapping of restriction sites and then use that mapping to find matched random controls. It is generated by a knitr document. Users of the package are encouraged to run the document to verify that their computer meets the prerequisites and to see a simple instance of the application of MRCs to data analysis.

Prerequisites

The current release of R is almost certainly needed, so update or install R as appropriate.

Several Bioconductor packages must be installed:

  • GenomicRanges
  • BSgenome
  • GenomeInfoDb
  • Biostrings
  • IRanges
  • S4Vectors
  • BSgenome.Hsapiens.UCSC.hg18

Installing these packages will likely install some other packages on which they depend. Follow the instructions at

http://www.bioconductor.org/install/

and then run installed.packages() in R to see which packages were installed. Add any packages listed above that were not installed. If you have an existing installation, be sure it is up to date by following the instructions on the installation page.
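The check described above can be sketched in a few lines of R. The vector name `required` and the variable `missing_pkgs` are illustrative, and the commented-out `BiocManager::install()` call assumes the BiocManager installer that Bioconductor currently documents; defer to the installation page if its instructions differ.

```r
# Packages listed in the prerequisites above.
required <- c("GenomicRanges", "BSgenome", "GenomeInfoDb", "Biostrings",
              "IRanges", "S4Vectors", "BSgenome.Hsapiens.UCSC.hg18")

# Compare against what is already installed in the current library paths.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installer call (needs network access), left commented out:
  # BiocManager::install(missing_pkgs)
} else {
  message("All listed prerequisites are installed.")
}
```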

Once that has been done, install restrSiteUtils. To do this, download the restrSiteUtils_x.y.z.tar.gz file (where x.y.z indicates the version, like 1.2.3). On a Unix-alike system, installation is most easily done by running a command like

R CMD INSTALL restrSiteUtils_x.y.z.tar.gz

Alternatively, the install.packages function can be used in an R session on any system. See the help pages for INSTALL and install.packages for additional information, such as how to install to a user directory.
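As a minimal sketch of the in-session route, a source tarball can be installed with `install.packages()` by setting `repos = NULL`. The file name below keeps the `x.y.z` placeholder and must be replaced with the name of the file actually downloaded; the variable name `tarball` is illustrative.

```r
# Placeholder name: substitute the real version for x.y.z.
tarball <- "restrSiteUtils_x.y.z.tar.gz"

if (file.exists(tarball)) {
  # repos = NULL tells R to install from the local file rather than a repository.
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("File not found in ", getwd(), ": ", tarball)
}
```

Passing a `lib` argument to `install.packages()` is one way to install to a user directory.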

The knitr package was used to process mrc-example.Rnw. The ability to convert `*.tex` files to `*.pdf` files is needed to render the pdf. The R code can be run without creating a document, either by downloading mrc-example.R or by using purl("mrc-example.Rnw") to extract the code from the knitr document. The code can be run in an R session or by executing

R CMD BATCH mrc-example.R
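For readers who prefer to extract the code themselves, a sketch of the `purl` step follows. It assumes knitr is installed and that mrc-example.Rnw is in the working directory; the guard variables `has_knitr` and `has_rnw` are illustrative.

```r
has_knitr <- requireNamespace("knitr", quietly = TRUE)
has_rnw <- file.exists("mrc-example.Rnw")

if (has_knitr && has_rnw) {
  # Writes the R code chunks of the knitr document to mrc-example.R.
  knitr::purl("mrc-example.Rnw", output = "mrc-example.R")
} else {
  message("Need the knitr package and mrc-example.Rnw in ", getwd())
}
```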

Running the Example

To run the example, the file sites.df.csv must be placed in the same directory as mrc-example.Rnw (or mrc-example.R if that is to be run). If a local directory is used for installing packages, it must be specified in the .libPaths(...) call in the example. If not (or if environment variables are used to specify the library directory), then that line should be removed.

With the knitr R package installed,

R CMD Sweave mrc-example.Rnw

will run the example and create mrc-example.tex, on which a PDF-enabled version of LaTeX can be run to produce mrc-example.pdf.
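The same two steps can also be run from inside an R session. This is a sketch of an alternative, not the route the example was built with: it assumes knitr and a working LaTeX installation are available, and it relies on `knitr::knit2pdf()`, which knits the document and then compiles the resulting `.tex` file. The variable name `rnw` is illustrative.

```r
rnw <- "mrc-example.Rnw"

if (requireNamespace("knitr", quietly = TRUE) && file.exists(rnw)) {
  # Knit to mrc-example.tex, then compile it to mrc-example.pdf.
  knitr::knit2pdf(rnw)
} else {
  message("Need the knitr package and ", rnw, " in ", getwd())
}
```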

With or without knitr,

R CMD BATCH mrc-example.R

will run the code. Either command will install the package restrEnz.Hsapiens.UCSC.hg18.RENZ.6.CUTTER.

License and Acknowledgement

This document and the software it contains are copyright 2015 by Charles C. Berry and are offered for use under the GPL-3 license; see http://www.gnu.org/licenses/ for details on that license.

The development of code and this document was supported by the National Institutes of Health (2R01 AI052845 and 5R01 AI082020).

The sites.df.csv file contains data from

Schroeder, Astrid RW, et al. "HIV-1 integration in the human genome favors active genes and local hotspots." Cell 110.4 (2002): 521-529.
