Family Threshold #307

ShuyiF · 2024-12-19T17:10:42Z

Hi!

Thank you for developing this efficient tool!

I am using PPanGGOLiN to build a pangenome. Is there a parameter for the protein family sequence identity threshold that I can tune, and what is its default value? Compared with other pangenome tools, PPanGGOLiN gives me a higher number of gene families, so I suspect this may be caused by a higher family threshold in PPanGGOLiN.

Looking forward to your reply!

Thank you!

JeanMainguy · 2025-01-07T15:40:56Z

Hi,

Thank you for your kind words and for using PPanGGOLiN!

Yes, there are parameters you can adjust related to the protein family clustering threshold. In both the ppanggolin all command and the ppanggolin cluster command (if you're running a step-by-step analysis), you’ll find these options:

  --coverage COVERAGE   Minimal coverage of the alignment for two proteins to be in the same cluster. Default: 0.8  
  --identity IDENTITY   Minimal identity percentage for two proteins to be in the same cluster. Default: 0.8

By default, both the identity and coverage thresholds are set to 0.8, meaning 80% identity and 80% coverage. These values control the clustering process, which is performed using MMseqs2.
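
As a rough sketch of how these options can be passed, the snippet below builds a ppanggolin all call with a looser identity threshold. The input list name (genomes.gbff.list), the output directory (pangenome_id50), and the value 0.5 are placeholders chosen for illustration, not recommendations; only --identity and --coverage come from the help text above. The command is only executed if ppanggolin is installed and the input list exists; otherwise it is printed.

```shell
# Placeholder names and values: adjust them to your own data.
ANNO_LIST="genomes.gbff.list"   # hypothetical list of annotated genomes
OUT_DIR="pangenome_id50"        # hypothetical output directory
IDENTITY=0.5                    # illustrative value; the default is 0.8
COVERAGE=0.8                    # same as the default

# Assemble the command so it can be inspected before running.
CMD="ppanggolin all --anno $ANNO_LIST --output $OUT_DIR --identity $IDENTITY --coverage $COVERAGE"

# Run only when the tool and the input list are both available.
if command -v ppanggolin >/dev/null 2>&1 && [ -f "$ANNO_LIST" ]; then
    $CMD
else
    echo "Would run: $CMD"
fi
```

The same two options apply to ppanggolin cluster in a step-by-step analysis. A lower identity threshold would generally be expected to merge more proteins into the same family, but the effect on your dataset is something to check rather than assume.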

On top of that, PPanGGOLiN includes a defragmentation step after the initial clustering. This step helps to reassign fragmented genes to existing families, preventing them from being clustered alone. Check out the documentation for more detail on this step: https://ppanggolin.readthedocs.io/en/latest/user/PangenomeAnalyses/pangenomeAnalyses.html#defragmentation

The higher number of families may simply be due to differences in clustering strategies between the tools. By the way, which tool did you compare PPanGGOLiN with? Perhaps @ggautreau has additional insight on this topic.

Best
