Finalize the CAMI Binning Benchmark Workflow #27

Open
7 tasks
paulzierep opened this issue Jan 16, 2025 · 4 comments

@paulzierep
Contributor

paulzierep commented Jan 16, 2025

  • Update Tools
  • Update DBs
  • Add MaxBin2
  • Clean GTDB2NCBI-TaxID Sub-workflow (IWC). Output should be a table (bin2taxid-map) containing Bin ID, GTDB name, NCBI-Name, NCBI-TaxID. This should include Archaea and Bacteria. In fact, the output of GTDB summary should be extended with these columns. This is also useful for the main workflow.
  • Clean Biobox-Add-TaxID tool. A simple tool that uses the created table to update the biobox with another column. The tool should be able to work with bin2taxid-map, but also with contig2taxid (which could be generated with kraken2).
  • Taxonomic binning should be optional.

Optional:

  • CAT2NCBI-TaxID workflow, generating an identical output as GTDB2NCBI-TaxID Sub-workflow (bin2taxid-map). Compliant with the Biobox-Add-TaxID tool.
paulzierep changed the title from "Finalize the CAMI Binning Workflow" to "Finalize the CAMI Binning Benchmark Workflow" on Jan 16, 2025
@SantaMcCloud
Collaborator

Clean GTDB2NCBI-TaxID Sub-workflow (IWC). Output should be a table (bin2taxid-map) containing Bin ID, GTDB name, NCBI-Name, NCBI-TaxID. This should include Archaea and Bacteria. In fact, the output of GTDB summary should be extended with these columns. This is also useful for the main workflow.

I can rework my script so that it modifies the input GTDB summary file(s) and also produces the bin2taxid-map file. Just to be clear: should modifying the summary and creating the bin2taxid-map file happen together with modifying the biobox file, or should the script do only one thing at a time?

Clean Biobox-Add-TaxID tool. A simple tool that uses the created table to update the biobox with another column. The tool should be able to work with bin2taxid-map, but also with contig2taxid (which could be generated with kraken2).

I can also change my script to work like this, or should there be a new script/tool to make this possible?

@paulzierep
Contributor Author

Clean GTDB2NCBI-TaxID Sub-workflow (IWC). Output should be a table (bin2taxid-map) containing Bin ID, GTDB name, NCBI-Name, NCBI-TaxID. This should include Archaea and Bacteria. In fact, the output of GTDB summary should be extended with these columns. This is also useful for the main workflow.

I can rework my script so that it modifies the input GTDB summary file(s) and also produces the bin2taxid-map file. Just to be clear: should modifying the summary and creating the bin2taxid-map file happen together with modifying the biobox file, or should the script do only one thing at a time?

I would prefer to have one workflow that generates the bin2taxid-map file. I think the tools you added, NCBI-GTDB map and Name2taxid, are enough. You just need to collapse the Archaea and Bacteria files of the GTDB output collection (https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fnml%2Fcollapse_collections%2Fcollapse_dataset%2F5.1.0&version=latest), then combine the outputs of NCBI-GTDB map and Name2taxid with the summary file. https://usegalaxy.eu/?tool_id=Paste1&version=latest should work for this.

Modifying the biobox file should be an additional step with one tool as described below.
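
For illustration, the collapse-and-combine steps described above could be sketched in plain Python as follows. This is only a rough sketch outside Galaxy, not the actual workflow: it assumes the GTDB-Tk summaries are TSV files with `user_genome` and `classification` columns, and it replaces the Paste step with dictionary lookups. The function names, the output column names, and the toy lineages and mappings are all made up for the example.

```python
import csv
import io


def collapse(summary_texts):
    """Concatenate several TSV summaries into one list of rows (header read once per file)."""
    rows = []
    for text in summary_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        rows.extend(reader)
    return rows


def build_bin2taxid(rows, gtdb2ncbi, name2taxid):
    """Return rows of [bin ID, GTDB name, NCBI name, NCBI TaxID]; unknown values stay empty."""
    table = []
    for row in rows:
        gtdb_name = row["classification"]
        ncbi_name = gtdb2ncbi.get(gtdb_name, "")
        taxid = name2taxid.get(ncbi_name, "")
        table.append([row["user_genome"], gtdb_name, ncbi_name, taxid])
    return table


def to_tsv(table):
    """Serialize the table as TSV with a header line (column names are hypothetical)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["bin_id", "gtdb_name", "ncbi_name", "ncbi_taxid"])
    writer.writerows(table)
    return out.getvalue()


# Toy inputs: illustrative lineage strings, not checked against a specific GTDB release.
BAC_LINEAGE = "d__Bacteria;g__Escherichia;s__Escherichia coli"
AR_LINEAGE = "d__Archaea;g__Methanobrevibacter;s__Methanobrevibacter smithii"
BAC_SUMMARY = "user_genome\tclassification\nbin_1\t" + BAC_LINEAGE + "\n"
AR_SUMMARY = "user_genome\tclassification\nbin_2\t" + AR_LINEAGE + "\n"
GTDB2NCBI = {BAC_LINEAGE: "Escherichia coli", AR_LINEAGE: "Methanobrevibacter smithii"}
NAME2TAXID = {"Escherichia coli": "562", "Methanobrevibacter smithii": "2173"}

if __name__ == "__main__":
    rows = collapse([BAC_SUMMARY, AR_SUMMARY])
    print(to_tsv(build_bin2taxid(rows, GTDB2NCBI, NAME2TAXID)), end="")
```

Bins without a match keep empty NCBI-Name and NCBI-TaxID cells instead of being dropped, so the table still has one row per bin.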

Clean Biobox-Add-TaxID tool. A simple tool that uses the created table to update the biobox with another column. The tool should be able to work with bin2taxid-map, but also with contig2taxid (which could be generated with kraken2).

I can also change my script to work like this, or should there be a new script/tool to make this possible?

You can change your script to use only the bin2taxid-map and the biobox as input. But then I would prefer that the updated tool excludes the previous options; it was too custom and complicated. It can also be an additional tool, up to you.
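
To make the intended behaviour concrete, here is a minimal sketch of such a tool, not the actual biobox_add_taxid code. It assumes the biobox binning file has `@` header lines, a `@@` column line containing `SEQUENCEID` and `BINID`, tab-separated data rows, and no existing TAXID column. It also assumes that the mapping table holds the ID in its first column and the TaxID in its last column, which would cover both a four-column bin2taxid-map and a two-column contig2taxid. All function and parameter names are hypothetical.

```python
def load_map(text, has_header=True):
    """Map the first column (bin or contig ID) to the last column (TaxID)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]
    return {fields[0]: fields[-1] for fields in (line.split("\t") for line in lines)}


def add_taxid(biobox_text, id2taxid, key="BINID"):
    """Append a TAXID column, looked up via the column named by `key`."""
    out = []
    key_index = None
    for line in biobox_text.splitlines():
        if line.startswith("@@"):
            # Column definition line: find the lookup column and extend it.
            key_index = line[2:].split("\t").index(key)
            out.append(line + "\tTAXID")
        elif not line or line.startswith("@") or line.startswith("#"):
            # Header, comment, and blank lines pass through unchanged.
            out.append(line)
        else:
            if key_index is None:
                raise ValueError("data row found before the @@ column line")
            fields = line.split("\t")
            out.append(line + "\t" + id2taxid.get(fields[key_index], ""))
    return "\n".join(out) + "\n"


# Toy biobox file and mapping table for illustration only.
BIOBOX = (
    "@Version:0.9.1\n@SampleID:sample_0\n\n"
    "@@SEQUENCEID\tBINID\ncontig_1\tbin_1\ncontig_2\tbin_2\n"
)
BIN2TAXID = (
    "bin_id\tgtdb_name\tncbi_name\tncbi_taxid\n"
    "bin_1\tx\tEscherichia coli\t562\n"
    "bin_2\ty\tMethanobrevibacter smithii\t2173\n"
)

if __name__ == "__main__":
    print(add_taxid(BIOBOX, load_map(BIN2TAXID)), end="")
```

With a contig2taxid table, the same function would be called with `key="SEQUENCEID"`, so one code path serves both inputs.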

@SantaMcCloud
Collaborator

SantaMcCloud commented Jan 20, 2025

I would prefer to have one workflow that generates the bin2taxid-map file. I think the tools you added, NCBI-GTDB map and Name2taxid, are enough. You just need to collapse the Archaea and Bacteria files of the GTDB output collection (https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fnml%2Fcollapse_collections%2Fcollapse_dataset%2F5.1.0&version=latest), then combine the outputs of NCBI-GTDB map and Name2taxid with the summary file. https://usegalaxy.eu/?tool_id=Paste1&version=latest should work for this.

Modifying the biobox file should be an additional step with one tool as described below.

Here is a sub-workflow that generates a binid2taxid file and modifies the GTDB-Tk summary file:
Workflow: https://usegalaxy.eu/u/santinof/w/gtdb2ncbi-taxid-sub-workflow
Example history: https://usegalaxy.eu/u/santinof/h/gtdb2ncbi-taxid-sub-workflow-example

You can change your script to use only the bin2taxid-map and the biobox as input. But then I would prefer that the updated tool excludes the previous options; it was too custom and complicated. It can also be an additional tool, up to you.

I changed my script to work with either the kraken2 output or the output of the sub-workflow above:
https://github.com/SantaMcCloud/biobox_add_taxid

@SantaMcCloud
Collaborator

SantaMcCloud commented Jan 25, 2025

Update Tools

Current list of all tools used in the Workflow with their version:

| Name | Current Galaxy Version | Available Version | Wrapper updated | Notes |
|---|---|---|---|---|
| Bowtie2 | 2.5.3 | 2.5.4 | | Has an open PR, but it seems that Galaxy has to be adjusted a bit before this update can be made |
| Fairy | 0.5.7 | 0.5.7 | - | |
| CAMI AMBER | 2.0.7 | 2.0.7 | - | |
| CONCOCT | 1.1.0 | 1.1.0 | - | |
| MaxBin2 | 2.2.7 | ? | ? | The page linked in IUC is missing |
| DAS Tool | 1.1.7 | 1.1.7 | - | |
| MetaBAT2 | 2.15 (updated to 2.17) | 2.17 | | Is merged |
| SemiBin | 2.0.2 | 2.1.0 | | Has an open PR and somebody is working on it. The last commit was pushed around 2 weeks ago and it still fails. Depending on how fast this should be done, I can finish this PR |
| GTDB-Tk | 2.4.0 | 2.4.0 | - | |
| NCBI-GTDB map | 0.1.9 | 0.1.9 | - | |
| Name2Taxid | 0.18.0 | 0.18.0 | - | |
| Biobox_add_taxid | 0.6 | 1.0.0 | | My tool will be changed once approved by Paul |

Update DBs

Current list of all DBs used in the Workflow with their version, plus the DB from Kraken2:

| Name | Current Galaxy Version | Available Version | DM updated if needed | Notes |
|---|---|---|---|---|
| Kraken2 | 2024-09-04 | 2024-12-28 | | The newest version was merged. Any admin can download the newest DBs once Galaxy is updated |
| NCBI | 2024-06-05 | 2025-01-25 | - | The DM always pulls the newest version, so any admin can do this update anytime |
| SemiBin2 | ? | ? | ? | I think it is the latest version, but it is hard to tell since I cannot access the table and there is no date in the name |
| GTDB-Tk | 220 | 220 | - | |
| NCBI-GTDB map | 220 | 220 | - | |
