Fix error when using an external clustering #278
The documentation does not discuss the case where three columns are provided as "family_id gene_id fragmentation". Is this intentional, with the format kept only to support the old layout, or do we want users to be able to provide external clusterings that carry fragmentation information but no family representative?
@@ -447,11 +498,19 @@ def read_clustering(pangenome: Pangenome, families_tsv_path: Path, infer_singlet
:param disable_bar: Allow to disable progress bar
"""
check_pangenome_former_clustering(pangenome, force)
check_pangenome_info(pangenome, need_annotations=True, need_gene_sequences=True, disable_bar=disable_bar)

if pangenome.status["geneSequences"] == "No":
Since we're in a "read external cluster" case, shouldn't we skip loading gene sequences even if some are present?
This is used to add the protein sequences of representative genes at the end.
I believe they are used when the context and fasta commands are applied to the pangenome.
Since pangenomes built with internal clustering already contain these sequences, I think it is better to also include them in pangenomes built with external clustering whenever possible, so that both types of pangenome behave consistently.
<path>/ppanggolin/cluster/cluster.py:440: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
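This warning typically appears when pandas infers column types chunk by chunk and finds, for example, family identifiers that look numeric in some rows and textual in others. A minimal sketch of one way to avoid it, assuming the cluster file is tab-separated, has no header, and is read with `pandas.read_csv`. The helper name `read_cluster_table` and the sample data are hypothetical and are not taken from the PPanGGOLiN code:

```python
import io

import pandas as pd


def read_cluster_table(handle):
    """Read a headerless, tab-separated cluster file with every column as str.

    Hypothetical helper for illustration only.
    """
    # dtype=str turns off type inference, so a column holding both "1" and
    # "famA" is read uniformly as text and no DtypeWarning is raised.
    return pd.read_csv(handle, sep="\t", header=None, dtype=str)


# Hypothetical two-column input: family identifier, gene identifier.
sample = io.StringIO("1\tgene_1\nfamA\tgene_2\n")
df = read_cluster_table(sample)
```

Reading identifiers as strings also keeps values such as `001` from being silently converted to the integer 1.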
I updated the documentation to clarify that we also accept cluster files made of three columns: family, gene, and fragmentation.
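For readers unfamiliar with the layout, here is a rough sketch of how a two- or three-column cluster file could be read. It is not the project's parser: the function name `parse_cluster_lines` is hypothetical, and the convention that the value "F" in the third column marks a fragmented gene is an assumption made for this example.

```python
import csv
import io


def parse_cluster_lines(handle):
    """Yield (family_id, gene_id, is_fragment) from a tab-separated file.

    Hypothetical helper for illustration only.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) == 2:
            family_id, gene_id = row
            fragment_flag = ""
        elif len(row) == 3:
            family_id, gene_id, fragment_flag = row
        else:
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}: {row}")
        # Assumption: "F" marks a fragmented gene; any other value means complete.
        yield family_id, gene_id, fragment_flag == "F"


# Hypothetical input mixing an empty flag, an "F" flag, and a two-column row.
sample = io.StringIO("fam1\tgene_1\t\nfam1\tgene_2\tF\nfam2\tgene_3\n")
records = list(parse_cluster_lines(sample))
```

Rejecting rows with an unexpected column count up front makes a malformed file fail early with a clear message rather than partway through the clustering step.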
This PR addresses a few issues that users encountered when running the cluster command with an external cluster file.
Fixes made: