ResearchMate

PubMed-based collaborator recommendation system: it suggests potential collaborators within the biomedical scientific community, based on the network of past collaborations and the research topics of each author.

This is my final graduation project for the Data Science Retreat ML bootcamp. The goal was to create a preliminary recommendation system that finds well-matched pairs of authors as candidates for future collaboration, based on the network of past collaborations and the research topics of each author.

Due to time and resource constraints, I opted to work on a limited dataset of authors (500), all of whom have at some point published a paper with Jennifer Doudna ('Doudna JA'), a renowned genetic engineering scientist and Nobel Prize recipient. The papers and their metadata were collected from PubMed. Whenever a paper had no keywords describing its topic, I generated them from the abstract and title using the GPT-3 API (paid). Then, based on each author's contribution to each paper, I derived the main research topics of each author. In parallel, I built a graph of their past collaborations and generated node embeddings from it. Combining the word embeddings from the topics with the node embeddings from the past collaborations, I created a pairwise similarity matrix of the authors. This serves as an indirect indication of future collaborations (the more similar two authors are, the more likely it is that they will, and should, collaborate). Finally, I used Flask to deploy it for showcasing purposes.
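The combination and similarity steps described above can be sketched roughly as follows. This is only an illustration, not the code from the notebooks: the README does not say how the two embeddings are combined, so the sketch assumes each one is L2-normalized and then concatenated. The function names (`combine_embeddings`, `cosine_similarity_matrix`, `recommend`), the placeholder author names, and the toy vectors are all hypothetical.

```python
import numpy as np


def l2_normalize(matrix):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)


def combine_embeddings(topic_emb, node_emb):
    """Concatenate row-normalized topic and node embeddings.

    Concatenation is an assumption made for this sketch; the original
    project may combine the two embeddings differently.
    """
    return np.hstack([l2_normalize(topic_emb), l2_normalize(node_emb)])


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity between all rows."""
    unit = l2_normalize(embeddings)
    return unit @ unit.T


def recommend(author, authors, similarity, top_k=3):
    """Return the top_k most similar authors, excluding the author itself."""
    idx = authors.index(author)
    scores = similarity[idx].copy()
    scores[idx] = -np.inf  # never recommend an author to themselves
    best = np.argsort(scores)[::-1][:top_k]
    return [(authors[i], float(scores[i])) for i in best]


# Toy data: four placeholder authors with made-up embeddings.
authors = ["Author A", "Author B", "Author C", "Author D"]
topic_emb = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
node_emb = np.array([
    [1.0, 0.0],
    [0.8, 0.2],
    [0.0, 1.0],
    [0.5, 0.5],
])

combined = combine_embeddings(topic_emb, node_emb)
similarity = cosine_similarity_matrix(combined)
recommendations = recommend("Author A", authors, similarity, top_k=2)
print(recommendations)
```

Normalizing each embedding before concatenation keeps either source from dominating the similarity score just because its vectors happen to have larger magnitudes. Whether that weighting is appropriate for the real data is a design choice this sketch does not settle.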

Disclaimer: This tool should be used as a starting point for exploring potential research partners, rather than as a definitive guide for selecting collaborators. The main goal was to familiarize myself with various ML techniques, not to build a complete guide. The recommendations might be biased towards more established researchers with a larger number of publications and collaborations, which could inadvertently overlook early-career researchers or researchers from underrepresented groups. I plan to introduce fairness and diversity metrics in the future to mitigate such biases and avoid homophily. This is an initial approach, which I hope to develop into a stronger recommendation system that takes more parameters into account.

Repository files:
1_0_Authors_Selection.ipynb
2_0_Keywords_Extraction.ipynb
4_0_Node_Embeddings_757authors.ipynb
5_0_Combined_Embeddings.ipynb
6_0_Cosine_Similarity.ipynb
README.md
app.py
slides.pdf
