Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set automatically coerce_shape for xarray_image #104

Open
acocac opened this issue Jun 9, 2021 · 3 comments
Open

set automatically coerce_shape for xarray_image #104

acocac opened this issue Jun 9, 2021 · 3 comments

Comments

@acocac
Copy link

acocac commented Jun 9, 2021

I have a local directory with GeoTIFF files with different shapes. I've explored the coerce_shape parameter to define manually a certain shape. I'm wondered if there's a workaround to coerce all images according to the largest shape in the directory instead of defining it manually. The following lines show how I define the catalog:

[...]
sources:
  test1:
    driver: xarray_image
    args:
      urlpath: '{{ CATALOG_DIR }}/data/*.tif'
      coerce_shape: [400,400]
[...]
@martindurant
Copy link
Member

Intake is not currently able to automatically investigate a set of data sources to derive a value for using in further data sources.

Two possible future routes that could implement the idea:

  • a new catalog type that has the ability to introspect a set of data prescriptions and update those prescriptions dynamically
  • extend the transforms idea to accept multiple data sources and go from there

@acocac
Copy link
Author

acocac commented Jun 9, 2021

Thanks @martindurant for pointing the possible future routes. Both are valid for me.

When you say a new catalog able to instrospect a set of data, do you have any specific example?

It would be great to implement a lazy operation to retrieve image size e.g. PIL's Image.open (see here). However, I am not sure how effective this operation might result for a catalog with million of images.

@martindurant
Copy link
Member

When you say a new catalog able to instrospect a set of data

Not really, this would be a new model. Catalogues have access to their child data sources of course, but it is not the normal pattern to try to access their internal metadata. As you say, this might be expensive. There are, however, lazy catalogues, where entries (the objects that make sources) are only created on request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants