Construct a model to predict protein-protein interactions using RNA-seq data by identifying co-expressed gene groups and integrating with proteomics data.
A vast amount of public RNA sequencing (RNA-seq) data is freely available online. Because RNA-seq is generally much cheaper than mass spectrometry, it is attractive to predict the behavior of proteins directly from RNA-seq data where possible.
The purpose of this project is to construct a model and pipeline that identifies co-expressed groups of genes at the mRNA level under particular conditions, and integrates these groups with proteomics data to learn protein-protein interactions (PPIs) from mRNA co-expression and other relevant features.
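One common way to define co-expressed gene groups is to compute pairwise Pearson correlations between genes across samples, link every pair whose correlation meets a threshold, and treat each connected component of the resulting graph as a group. The sketch below illustrates that idea only; the function name `coexpression_groups`, the 0.8 threshold, and the toy expression matrix are hypothetical assumptions, not part of this project's code.

```python
import numpy as np


def coexpression_groups(expr, gene_names, threshold=0.8):
    """Hypothetical helper: group genes with strongly correlated expression.

    expr: array of shape (n_genes, n_samples), e.g. log-transformed TPM values.
    Returns one sorted list of gene names per connected component of the
    graph that links every gene pair with Pearson r >= threshold.
    """
    # np.corrcoef treats each row as a variable, giving a gene-by-gene matrix.
    corr = np.corrcoef(expr)
    n = corr.shape[0]
    adjacent = corr >= threshold

    # Depth-first search over the thresholded correlation graph.
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        stack, component = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(gene_names[i])
            for j in np.flatnonzero(adjacent[i]):
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(component))
    return groups


# Toy data: GENE_A and GENE_B rise together, GENE_C moves the other way.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],
    [5.0, 3.0, 4.0, 1.0, 2.0],
])
print(coexpression_groups(expr, ["GENE_A", "GENE_B", "GENE_C"]))
# → [['GENE_A', 'GENE_B'], ['GENE_C']]
```

A hard correlation threshold is the simplest choice; approaches such as WGCNA-style soft thresholding or hierarchical clustering are common alternatives when groups should be less sensitive to a single cutoff.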
Participants are encouraged to use any public sequencing data they find suitable, along with any quantitative or computational approaches. You may find it helpful to use GTEx data, as well as data from STRING and BioGRID.
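As a hedged sketch of how co-expression could be linked to known interactions, the snippet below labels each gene pair as positive if it appears in a reference interaction set and then measures, with ROC AUC, how well Pearson correlation ranks known interactors above other pairs. In practice the reference set might be derived from STRING or BioGRID; here it is a hard-coded toy set, and the helper names `pair_features` and `roc_auc` are hypothetical. A real model would add further features and a trained classifier.

```python
import itertools

import numpy as np


def pair_features(expr, genes):
    """Hypothetical helper: return (pairs, correlations) for every unordered gene pair."""
    corr = np.corrcoef(expr)  # rows are genes, columns are samples
    pairs, scores = [], []
    for i, j in itertools.combinations(range(len(genes)), 2):
        pairs.append(frozenset((genes[i], genes[j])))
        scores.append(corr[i, j])
    return pairs, np.array(scores)


def roc_auc(scores, labels):
    """ROC AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [4.8, 3.1, 4.2, 0.9, 2.2],
])
# Hypothetical reference interactions (a stand-in for STRING/BioGRID pairs).
known = {frozenset(("GENE_A", "GENE_B")), frozenset(("GENE_C", "GENE_D"))}

pairs, scores = pair_features(expr, genes)
labels = np.array([p in known for p in pairs])
print(roc_auc(scores, labels))
# → 1.0
```

Storing pairs as `frozenset` objects makes the lookup order-independent, so an interaction listed as B–A in a reference file still matches the A–B pair computed from expression data.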
Provide instructions on how to install and set up the project, such as installing dependencies and preparing the environment.
# Example command to install dependencies (Python)
pip install project-dependencies
# Example command to install dependencies (R)
install.packages("project-dependencies")
Provide a basic usage example or minimal code snippet that demonstrates how to use the project.
# Example usage (Python)
import my_project
demo = my_project.example_function()
print(demo)
# Example usage (R)
library(demoProject)
demo <- example_function()
print(demo)
Add detailed information and examples on how to use the project, covering its major features and functions.
# More usage examples (Python)
import my_project
demo = my_project.advanced_function(parameter1='value1')
print(demo)
# More usage examples (R)
library(demoProject)
demo <- advanced_function(parameter1 = "value1")
print(demo)
Contributions are welcome! If you'd like to contribute, please open an issue or submit a pull request. See the contribution guidelines for more information.
If you have any issues or need help, please open an issue or contact the project maintainers.
This project is licensed under the MIT License.
- Clone the repository using either SSH or HTTPS:
git clone git@github.com:hackbio-ca/ppi-prediction-from-rna-seq.git
git clone https://github.com/hackbio-ca/ppi-prediction-from-rna-seq.git
- Make a branch and implement your changes:
git checkout -b your-name
Use git status to check that you're on the correct branch. Do not implement any features on the main branch.
git add filename
git commit -m "Changes to filename"
git push origin your-name
- Make a pull request. Navigate to your branch on the GitHub repository page and click Compare & pull request. Make sure you are comparing the main and your-name branches.
- The pull request will be reviewed and, hopefully, merged. At this point, GitHub will tell you that the your-name branch can be safely deleted.
- Navigate back to your local main branch, update it, and then clean up:
git checkout main
git pull origin main
git branch -d your-name
- Whenever you implement a new feature, simply repeat from step 2.
| Publication | Cell line |
|---|---|
| Johnson et al. (2021) [1] | HEK293T |
| Johnson et al. (2021) [1] | Jurkat |
| Johnson et al. (2021) [1] | HUVEC |
| Huttlin et al. (2021) [2] | HEK293T |
| Huttlin et al. (2021) [2] | HCT116 |
| Göös et al. (2022) [3] | HEK293 |
| Khoroshkin et al. (2024) [4] | K562 |
| Banks et al. (2014) [5] | HEK293 |
Footnotes
[1] Johnson, K. L. et al. Revealing protein-protein interactions at the transcriptome scale by sequencing. Molecular Cell 81, 3877 (2021).
[2] Huttlin, E. L. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 184, 11 (2021).
[3] Göös, H. et al. Human transcription factor protein interaction networks. Nature Communications 13 (2022).
[4] Khoroshkin, M. et al. Systematic identification of post-transcriptional regulatory modules. Nature Communications 15 (2024).
[5] Banks, C. et al. Controlling for gene expression changes in transcription factor protein networks. Molecular & Cellular Proteomics 13 (2014).