A question on COG annotation: fewer CDSs seem to be assigned to COGs compared to NCBI CD-Search #350

ilnamkang · 2024-12-05T07:05:09Z

Hi,

Thanks for a great tool!

I compared Bakta's COG annotations with NCBI CD-Search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) results for a complete bacterial genome with 1,360 CDSs.

NCBI CD-Search assigned COGs to ~1,200 CDSs, while Bakta assigned COGs to only ~150 CDSs.
(Note: I simply ran grep on Bakta's TSV, GFF3, and JSON output files to count the CDSs with a COG annotation.)
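grep counts matching lines, and the number of lines a single CDS occupies may differ between the TSV, GFF3, and JSON outputs, so a per-CDS count from one file is a more robust cross-check. A minimal sketch, assuming Bakta's TSV layout (a header line starting with `#Sequence Id`, a `Type` column, and a comma-separated `DbXrefs` column holding entries such as `COG:COG0593, COG:L`); please verify both assumptions against your own output. `count_cog_cds` is a hypothetical helper, not part of Bakta.

```python
def count_cog_cds(tsv_text: str) -> int:
    """Count CDS rows whose DbXrefs column holds at least one COG ID.

    Assumptions (check against your Bakta version): comment lines start
    with '#', the header line starts with '#Sequence Id', and DbXrefs is a
    comma-separated list like 'COG:COG0593, COG:L' (COG ID plus category).
    """
    header = None
    count = 0
    for line in tsv_text.splitlines():
        if line.startswith("#Sequence Id"):
            header = line.lstrip("#").split("\t")
            continue
        if not line.strip() or line.startswith("#") or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("Type", "").lower() != "cds":
            continue
        xrefs = [x.strip() for x in row.get("DbXrefs", "").split(",")]
        # Count each CDS once, even if it carries both a COG ID and a category.
        if any(x.startswith("COG:COG") for x in xrefs):
            count += 1
    return count
```

Usage: `count_cog_cds(open("genome.tsv").read())`, where `genome.tsv` stands in for your Bakta TSV file.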

Is this difference usual and/or expected?

Thanks.

oschwengers · 2024-12-05T09:29:25Z

Hi and thanks a lot for asking.

Yes, this is to some extent expected, for two reasons:

1. NCBI's CD-Search service uses PSSMs to search against your query protein sequences. In general, this is a more sensitive search than Bakta's default (DIAMOND fast).
2. Thanks to your question, I realized that there is a COG 2024 update, which adds ~140 COG clusters compared to the 2020 release that Bakta uses.
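Why a profile (PSSM) search tends to be more sensitive than a pairwise search can be shown with a toy example: a PSSM built from a family alignment scores each residue by how often it occurs at that position across the family, so a divergent member can still score well even if it differs from any single reference sequence. This is a minimal sketch under simplifying assumptions (ungapped alignment, uniform background, add-one pseudocounts); it is not how CD-Search (RPS-BLAST) or DIAMOND work internally, since real tools use gapped alignment, sequence weighting, smarter pseudocounts, and E-value statistics. `build_pssm` and `best_hit` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (log2) from an ungapped alignment of equal-length sequences.

    Assumes a uniform background frequency and a flat pseudocount per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, query):
    """Slide the PSSM along the query without gaps; return (best_score, offset).

    Non-standard residues score 0; the first offset wins on ties.
    """
    width = len(pssm)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(query) - width + 1):
        score = sum(pssm[i].get(query[offset + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset
```

Usage: with `pssm = build_pssm(["MKVL", "MRVL", "MKIL", "MKVI"])`, the divergent query `"GGMRIIGG"` still gets its highest, positive score at offset 2, because R, I, and I each occur at their positions somewhere in the family.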

So, for the next Bakta database update, we will definitely use the new 2024 COG release, which should add a couple of good annotations. We will also think about how to improve sensitivity during the pre-annotation of Bakta's database.

Until then, you could try Bakta's new feature for user-provided HMM models (if you have any).

ilnamkang · 2024-12-05T10:50:16Z

Thank you for the detailed explanation.

In my humble opinion, if you incorporate eggNOG into Bakta in the future (#325), using the COG annotations provided by the eggNOG pipeline might be one option to improve COG annotation quality.
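To compare against eggNOG-based COG assignments, one could extract the COG category per protein from eggNOG-mapper output. A minimal sketch, assuming the eggNOG-mapper v2 `.emapper.annotations` layout (`##` comment lines, a header line starting with `#query`, a `COG_category` column, and `-` for missing values); these column names are assumptions to check against your eggNOG-mapper version, and `cog_categories_from_emapper` is a hypothetical helper, not part of Bakta or eggNOG-mapper.

```python
def cog_categories_from_emapper(annotations_text):
    """Map query ID -> COG category string from eggNOG-mapper annotations.

    Assumes eggNOG-mapper v2 layout: '##' lines are comments, the header
    line starts with '#query', and missing values are written as '-'.
    A category may hold several letters (e.g. 'KT').
    """
    header = None
    result = {}
    for line in annotations_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue
        if line.startswith("#query"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        category = row.get("COG_category", "-")
        # Skip proteins without an assigned COG category.
        if category and category != "-":
            result[row["query"]] = category
    return result
```

The number of annotated proteins is then simply `len(cog_categories_from_emapper(text))`, which can be set against a Bakta-based count for the same genome.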

Thanks.

oschwengers added the question label Dec 5, 2024

ilnamkang closed this as completed Dec 5, 2024
