Char fields added as Unicode not string #268

Closed
jeromekelleher opened this issue Jul 5, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jeromekelleher
Contributor

Per the spec, Char fields should have dtype "|S1", but we are currently outputting "<U1", e.g.:

cat field_type_combos.vcf.vcz/variant_IC1/.zarray 
{
    "chunks": [
        10000
    ],
    "compressor": {
        "blocksize": 0,
        "clevel": 7,
        "cname": "zstd",
        "id": "blosc",
        "shuffle": 0
    },
    "dimension_separator": "/",
    "dtype": "<U1",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [
        208
    ],
    "zarr_format": 2
}
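
For anyone wanting to check their own output, here is a minimal sketch that reads the "dtype" entry from the .zarray JSON and reports whether it is a byte string or a Unicode type. It is not part of bio2zarr: the helper name `dtype_kind` is hypothetical, and it assumes a simple NumPy-style type string laid out as byte-order character, kind character, then width (as in "|S1" or "<U1").

```python
import json


def dtype_kind(zarray_text):
    """Return (kind, width) for the dtype in Zarr v2 array metadata.

    Hypothetical helper for illustration only. Assumes a simple type
    string such as "|S1" or "<U1": byte order, kind, width.
    """
    dtype = json.loads(zarray_text)["dtype"]
    return dtype[1], int(dtype[2:])


# Trimmed copy of the metadata shown above.
metadata = '{"dtype": "<U1", "shape": [208], "zarr_format": 2}'

kind, width = dtype_kind(metadata)
if kind == "S":
    print(f"byte string, {width} byte(s) per element")
elif kind == "U":
    print(f"Unicode string, {width} character(s) per element")
else:
    print(f"other dtype kind: {kind}")
```

To run it against real output, pass in the contents of the .zarray file instead of the trimmed string.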
@jeromekelleher jeromekelleher added the bug label Jul 5, 2024
jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Jul 9, 2024
Change U1 to S1 as per spec

Closes sgkit-dev#268
@jeromekelleher
Contributor Author

Ah yes - it's not actually clear we want to do this: sgkit-dev/vcf-zarr-spec#14

I think it would be simpler if we standardised on UTF-8 going forward, so I'm going to close this as a "wontfix".

@jeromekelleher jeromekelleher closed this as not planned Jul 9, 2024