VariantDatasetCombiner documentation out of date? #14759

Closed
Maarten-vd-Sande opened this issue Nov 27, 2024 · 0 comments · May be fixed by #14760
Labels
needs-triage A brand new issue that needs triaging.

Maarten-vd-Sande commented Nov 27, 2024

What happened?

When running the code example for the VariantDatasetCombiner, I can successfully execute `combiner.run()`. However, the last line of the example is no longer correct: `vds = hl.read_vds('gs://bucket/dataset.vds')`. I think this should be changed to a `hl.vds.read_vds` call.

https://hail.is/docs/0.2/vds/hail.vds.combiner.VariantDatasetCombiner.html

**Ignore everything below: my GVCFs were not correctly formatted!**

Nevertheless, when I replace it, I get the error:

```
annotate_globals: keyword argument 'ref_block_max_length': expected expression of type any, found NoneType: None
```

I'm not sure whether this is related to the fact that I'm working with a non-human genome.

This is the script I run:

```python
import hail as hl

# chrom_lengths (dict: contig name -> length) and dm6_ref (the dm6
# hl.ReferenceGenome) are defined earlier in my script; not shown here.

intervals = []
interval_size = 10_000_000

# loop over each chromosome and its length
for chrom, length in chrom_lengths.items():
    # create intervals in steps of 10,000,000 bases
    for start in range(1, length + 1, interval_size):
        end = min(start + interval_size - 1, length)
        interval = hl.Interval(
            hl.Locus(chrom, start, reference_genome=dm6_ref),
            hl.Locus(chrom, end, reference_genome=dm6_ref),
            # hl.Interval excludes its end point by default; include it so
            # the last base of each window is not dropped
            includes_end=True,
        )
        hl.eval(interval)  # evaluates the interval; not required by the combiner
        intervals.append(interval)

gvcfs = [
    'vcfs/0.g.vcf.gz',
    'vcfs/1.g.vcf.gz',
]

combiner = hl.vds.new_combiner(
    output_path='combined_vcfs.vds',
    temp_path='/tmp',
    gvcf_paths=gvcfs,
    reference_genome=dm6_ref,
    use_exome_default_intervals=False,
    use_genome_default_intervals=False,
    intervals=intervals,
)

combiner.run()  # <-- still works

vds = hl.vds.read_vds(  # <-- doesn't work!
    'combined_vcfs.vds',
)
```
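The interval loop in the script tiles each chromosome into consecutive, non-overlapping windows, computing 1-based, end-inclusive `(start, end)` pairs. A minimal pure-Python sketch of just that arithmetic (no Hail calls; the helper name `tile_chromosome` and the chromosome names and lengths are hypothetical, for illustration only) makes the window boundaries easy to check:

```python
def tile_chromosome(length: int, interval_size: int = 10_000_000) -> list[tuple[int, int]]:
    """Split positions 1..length into inclusive (start, end) windows of at most interval_size bases."""
    windows = []
    for start in range(1, length + 1, interval_size):
        # the last window is clipped to the chromosome length
        end = min(start + interval_size - 1, length)
        windows.append((start, end))
    return windows


# Hypothetical chromosome lengths, for illustration only.
chrom_lengths = {"chrA": 23_500_000, "chrB": 1_300_000}

tiles = {chrom: tile_chromosome(length) for chrom, length in chrom_lengths.items()}
for chrom, windows in tiles.items():
    print(chrom, windows)
```

Each window starts one base after the previous one ends, so the pairs cover every position exactly once. Whether the end position is actually included once these pairs are turned into Hail intervals depends on the `includes_end` argument of `hl.Interval`, which defaults to `False`.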

I'm not sure whether I'm misunderstanding how to work with a non-human reference genome and its intervals.

Version

version 0.2.133-4c60fddb171a

Relevant log output

No response
