OSError: Unable to synchronously open file (file signature not found) #96

Open
1 task done
ashbate opened this issue Aug 3, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@ashbate

ashbate commented Aug 3, 2024

Bug Report

Description

Calling bm_extract() or bm_raster() fails to generate any data.

Reproducibility

  • The bug is reproducible.

Steps to Reproduce

Calling either function in a Jupyter notebook produces the error below. I tried both on my own computer and on Google Colab. It looks like an OSError raised by h5py.


OSError                                   Traceback (most recent call last)
Cell In[16], line 2
      1 # f.close()
----> 2 ntl_r = bm_raster(
      3     continental_us,
      4     product_id="VNP46A2",
      5     date_range="2023-01-01",
      6     bearer=bearer,
      7     variable="Gap_Filled_DNB_BRDF-Corrected_NTL",
      8 )

File /opt/anaconda3/lib/python3.11/site-packages/pydantic/validate_call_decorator.py:60, in validate_call.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
     58 @functools.wraps(function)
     59 def wrapper_function(*args, **kwargs):
---> 60     return validate_call_wrapper(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/pydantic/_internal/_validate_call.py:96, in ValidateCallWrapper.__call__(self, *args, **kwargs)
     95 def __call__(self, *args: Any, **kwargs: Any) -> Any:
---> 96     res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
     97     if self.__return_pydantic_validator__:
     98         return self.__return_pydantic_validator__(res)

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:355, in bm_raster(gdf, product_id, date_range, bearer, variable, drop_values_by_quality_flag, check_all_tiles_exist, output_directory, output_skip_if_exists)
    351 filenames = _pivot_paths_by_date(pathnames).get(date)
    353 try:
    354     # Open each GeoTIFF file as a DataArray and store in a list
--> 355     da = [
    356         rioxarray.open_rasterio(
    357             h5_to_geotiff(
    358                 f,
    359                 variable=variable,
    360                 drop_values_by_quality_flag=drop_values_by_quality_flag,
    361                 output_directory=d,
    362             ),
    363         )
    364         for f in filenames
    365     ]
    366     ds = merge_arrays(da)
    367     clipped_dataset = ds.rio.clip(
    368         gdf.geometry.apply(mapping), gdf.crs, drop=True
    369     )

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:357, in <listcomp>(.0)
    351 filenames = _pivot_paths_by_date(pathnames).get(date)
    353 try:
    354     # Open each GeoTIFF file as a DataArray and store in a list
    355     da = [
    356         rioxarray.open_rasterio(
--> 357             h5_to_geotiff(
    358                 f,
    359                 variable=variable,
    360                 drop_values_by_quality_flag=drop_values_by_quality_flag,
    361                 output_directory=d,
    362             ),
    363         )
    364         for f in filenames
    365     ]
    366     ds = merge_arrays(da)
    367     clipped_dataset = ds.rio.clip(
    368         gdf.geometry.apply(mapping), gdf.crs, drop=True
    369     )

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:177, in h5_to_geotiff(f, variable, drop_values_by_quality_flag, output_directory)
    174 if variable is None:
    175     variable = VARIABLE_DEFAULT.get(product_id)
--> 177 with h5py.File(f, "r") as h5_data:
    178     attrs = h5_data.attrs
    179     data_field_key = "HDFEOS/GRIDS/VNP_Grid_DNB/Data Fields"

File /opt/anaconda3/lib/python3.11/site-packages/h5py/_hl/files.py:567, in File.__init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, alignment_threshold, alignment_interval, meta_block_size, **kwds)
    558     fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0,
    559                      locking, page_buf_size, min_meta_keep, min_raw_keep,
    560                      alignment_threshold=alignment_threshold,
    561                      alignment_interval=alignment_interval,
    562                      meta_block_size=meta_block_size,
    563                      **kwds)
    564     fcpl = make_fcpl(track_order=track_order, fs_strategy=fs_strategy,
    565                      fs_persist=fs_persist, fs_threshold=fs_threshold,
    566                      fs_page_size=fs_page_size)
--> 567     fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
    569 if isinstance(libver, tuple):
    570     self._libver = libver

File /opt/anaconda3/lib/python3.11/site-packages/h5py/_hl/files.py:231, in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    229     if swmr and swmr_support:
    230         flags |= h5f.ACC_SWMR_READ
--> 231     fid = h5f.open(name, flags, fapl=fapl)
    232 elif mode == 'r+':
    233     fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5f.pyx:106, in h5py.h5f.open()

OSError: Unable to open file (file signature not found)

Environment

  • Operating System: macOS, Google Colab
  • Browser: Google Chrome
  • Application Version/Commit: 2024.8.1

Additional Context

Possible Fix

Initially I moved the project folder to the desktop in case it was a read/write permission issue. It worked on the first run, then the error came back.

@ashbate ashbate added the bug Something isn't working label Aug 3, 2024
@ashbate
Author

ashbate commented Aug 3, 2024

It is partially solved. After numerous re-runs I realized that the error is not raised for the same .h5 file on each run. I tried extracting data for a small country covered by 6 tiles, which worked (I am getting this error for a 17-tile territory). I kept re-running my code for an hour or so, and eventually it worked.

Maybe the problem is in the httpx timeout parameters; I am not really sure at this point.

@Skerre

Skerre commented Aug 20, 2024

I have the same issue

@ashbate
Author

ashbate commented Aug 22, 2024

> I have the same issue

Hi,

I had this issue when I was in Istanbul as well. I think it is somehow connected to internet speed, as if there were a sweet spot at which the files download properly. I worked around it by using VPNs to control my download speed. Although I am now in Italy, I still get this error on 8 out of 10 runs.

I am considering copying this library locally and adjusting the h5py read timeout values.

@Skerre

Skerre commented Aug 23, 2024

@ashbate Hi, thanks for your message. Indeed, I am in Istanbul, but I use different networks to test this tool. Some of them are genuinely high-speed and go through UNDP (enhanced and unrestricted, so to speak). I will do some more experimentation today and see if I can download another area of interest. The files get downloaded, but they are 0 KB afterwards. Something in the subsequent step seems to go wrong.
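
Since "file signature not found" means h5py did not see the HDF5 magic bytes at the start of a file, one way to narrow this down is to scan the download folder for files that are empty or do not begin with that signature. The following is a minimal sketch using only the standard library; the function name `find_bad_h5_files` is made up for illustration, and it assumes the tiles carry no HDF5 user block, so the signature sits at offset 0.

```python
from pathlib import Path

# Every HDF5 file begins with this 8-byte signature
# (at offset 0, assuming the file has no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_bad_h5_files(directory):
    """Return (path, reason) pairs for .h5 files that are empty or lack the signature."""
    bad = []
    for path in sorted(Path(directory).glob("*.h5")):
        # A 0-byte file is what an interrupted or rejected download leaves behind.
        if path.stat().st_size == 0:
            bad.append((path, "empty file"))
            continue
        with open(path, "rb") as f:
            header = f.read(len(HDF5_SIGNATURE))
        # Anything else (for example an HTML error page) fails this comparison.
        if header != HDF5_SIGNATURE:
            bad.append((path, "missing HDF5 signature"))
    return bad
```

Running `find_bad_h5_files("your directory")` after a failed call lists the tiles worth deleting and downloading again. A file that is non-empty but lacks the signature may be an error page saved in place of the data, which would point to the token rather than the network.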

@ashbate
Author

ashbate commented Sep 4, 2024

@Skerre Hi, I solved it. Basically, I downloaded the library's files and ran it locally, adding
timeout = httpx.Timeout(15, read=None)
to the top of download.py and updating this method:


def _download_file(
    self,
    name: str,
    skip_if_exists: bool = True,
):
    """Download NASA Black Marble file

    Parameters
    ----------
    name: str
         NASA Black Marble filename

    Returns
    -------
    filename: pathlib.Path
        Filename of downloaded data file
    """
    url = f"{self.URL}{name}"
    name = name.split("/")[-1]

    if not (filename := Path(self.directory, name)).exists() or not skip_if_exists:
        with open(filename, "wb+") as f:
            # `timeout` is the httpx.Timeout(15, read=None) defined at the top of download.py
            with httpx.stream(
                "GET",
                url,
                headers={"Authorization": f"Bearer {self.bearer}"},
                timeout=timeout,
            ) as response:
                total = int(response.headers["Content-Length"])
                with tqdm(
                    total=total,
                    unit="B",
                    unit_scale=True,
                    leave=None,
                ) as pbar:
                    pbar.set_description(f"Downloading {name}...")
                    for chunk in response.iter_raw():
                        f.write(chunk)
                        pbar.update(len(chunk))

Hope it works for you too. If not, try tweaking the timeout value.
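
One thing to note about the method above: it opens the destination file before the request is made, so a failed or interrupted request can leave a 0-byte file behind, and `skip_if_exists` would then treat that file as already downloaded. A possible hardening, sketched below under assumptions, is to write to a temporary ".part" file and rename it only after the byte count checks out. `save_stream` is a hypothetical helper, not part of blackmarble; `chunks` would be `response.iter_raw()` and `expected_size` the parsed `Content-Length` header, ideally after calling `response.raise_for_status()`.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def save_stream(
    chunks: Iterable[bytes],
    destination: Path,
    expected_size: Optional[int] = None,
) -> Path:
    """Write chunks to `destination` via a temporary file; fail on empty or short data."""
    destination = Path(destination)
    partial = destination.with_name(destination.name + ".part")
    written = 0
    try:
        with open(partial, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        # Reject empty downloads and downloads shorter or longer than announced.
        if written == 0 or (expected_size is not None and written != expected_size):
            raise IOError(
                f"incomplete download: got {written} bytes, expected {expected_size}"
            )
        # The final name only ever appears once the file is complete.
        os.replace(partial, destination)
    except BaseException:
        # Never leave a truncated file behind for a later run to pick up.
        partial.unlink(missing_ok=True)
        raise
    return destination
```

With this approach a failed download raises immediately instead of surfacing later as an h5py error, and no partial file is left under the final filename.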

@Skerre

Skerre commented Sep 5, 2024

@ashbate Dear ashbate,

Thank you for the effort you put into resolving this. However, I tried your solution and it has not worked so far. I added the timeout at the beginning of the script (I also tried other places, inside the functions) and literally replaced my _download function with yours. The timeout does not seem to take effect, and I run into the same error as before.

image

image

@Skerre

Skerre commented Sep 5, 2024

@ashbate I can share more details on what I am doing if you want

@ashbate
Author

ashbate commented Sep 7, 2024

@Skerre Hi Skerre, can you check your LinkedIn please?

@koichisato-dev

@ashbate I had the same issue and partially solved it with your modification. In my case, I also passed download options to bm_extract, such as 'output_directory' and 'output_skip_if_exists=False'. This consumes some storage, but it works completely. I am sharing this for anyone who faces the same issue.

@Abdulazizbek

Abdulazizbek commented Nov 27, 2024

I'm facing the same issue. I have followed the modifications from @ashbate and @koichisato-dev, but the issue is still not solved in my case.

However, I found another possible cause of the problem on their getting-started guide page (2nd step):

Please be aware that the “Affiliation” information on your Earthdata profile is mandatory. Without this information, the NASA Earthdata token will be invalid, which may result in an error (i.e., OSError: Unable to synchronously open file (file signature not found))

image

I have checked my profile information and bearer token status according to the guideline above (how can I check whether this bearer token is working?), but the problem is still not solved.
If anyone has dealt with this problem (OSError: Unable to synchronously open file), a guide to overcoming it would be very helpful.

Thank you in advance.
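
On the question of checking the token: one partial check is to read its expiry date. The sketch below assumes the Earthdata bearer token is a JSON Web Token (three dot-separated base64url parts with an `exp` claim); that is an assumption about the token format, and `token_expiry` is a hypothetical helper. It decodes the payload without verifying the signature, and it cannot tell whether the profile's "Affiliation" field is filled in.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token: str) -> datetime:
    """Return the time in a JWT's `exp` claim as a UTC datetime (signature not verified)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token does not look like a JWT (expected three dot-separated parts)")
    payload = parts[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Comparing `token_expiry(bearer)` with `datetime.now(timezone.utc)` shows whether the token has expired. A token that has not expired can still be rejected for other reasons, so this only rules out one cause.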

@koichisato-dev

koichisato-dev commented Nov 28, 2024

@Abdulazizbek I cannot see your current code, so I am not sure why the error is occurring, but I suspect you might not have set up your .env file correctly or loaded your token properly. Could you try the following code?

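[setting in key.env]
BLACK_MARBLE_BEARER_TOKEN = your token

[Python]
import os

import pandas as pd
from blackmarble.extract import bm_extract
from dotenv import load_dotenv

# Load the token; the variable name must match the one used in key.env
load_dotenv('your_path/key.env')
bearer = os.getenv('BLACK_MARBLE_BEARER_TOKEN')

if not bearer:
    raise ValueError('No bearer token found in environment variables')

# gdf is the GeoDataFrame of your region of interest
ntl_monthly_mean = bm_extract(gdf,
                              product_id="VNP46A3",
                              date_range=pd.date_range('2014-01-01', '2024-08-01', freq='MS'),
                              output_directory='your directory',
                              output_skip_if_exists=False,
                              variable='AllAngle_Composite_Snow_Free',
                              bearer=bearer)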

@Abdulazizbek

Abdulazizbek commented Dec 3, 2024

@koichisato-dev Thank you for your explanation. I ran the code you provided and my token is working properly (the expiry date is fine). However, I am still getting errors like the ones below:
Screenshot from 2024-12-03
Screenshot from 2024-12-03_1
Screenshot from 2024-12-03_2

I'm not sure what is going wrong. If someone has run the code successfully, sharing the source code would be helpful.
Thank you in advance.

@koichisato-dev

koichisato-dev commented Dec 4, 2024

@Abdulazizbek I see that your error message is now 'No such file or directory', so I guess it changed from 'OSError: Unable to synchronously open file' to 'No such file or directory'. Is my guess correct? I just want to clarify your situation.

@Abdulazizbek

@koichisato-dev Yes sure

@koichisato-dev

@Abdulazizbek If that's the case, the 'output_directory' path in bm_extract is incorrect. Please ensure you specify the correct path.
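
A 'No such file or directory' error of this kind typically means the folder does not exist yet or a relative path is being resolved from an unexpected working directory. A small sketch that rules out both before the call is shown below; `ensure_output_directory` is a hypothetical helper, not part of blackmarble.

```python
from pathlib import Path


def ensure_output_directory(path) -> Path:
    """Create the directory (and any missing parents) and return its absolute path."""
    directory = Path(path).expanduser().resolve()
    # exist_ok=True makes this safe to call on every run.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

The returned path can then be passed as `output_directory=` in the bm_extract call, so the location no longer depends on where the notebook was started.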

@Abdulazizbek

Abdulazizbek commented Dec 4, 2024

@koichisato-dev Although it took a long time to overcome the issue, I have double-checked every parameter and confirmed that the images are saved in 'output_directory'.

@koichisato-dev

@Abdulazizbek Okay, great! Then continue with the tutorial.

@msoltadeo

msoltadeo commented Jan 9, 2025

Hi! I am having the same issue. This code was working a couple of months ago and has since stopped working.
What I notice is that the files are being downloaded with 0 bytes. When I download the files manually, they download correctly.
I tried timeout = httpx.Timeout(15, read=None), but it did not work.
Any ideas?

5 participants