This project processes High-Resolution Rapid Refresh (HRRR) weather data and associated forecast reports for machine learning purposes.
The project consists of two main scripts that handle:
- Downloading HRRR weather data files
- Creating metadata files that pair weather data with corresponding forecast discussions
The first script downloads HRRR weather data files from NOAA's public dataset.
Key features:
- Downloads HRRR grib2 files from NOAA's S3 bucket
- Configurable date range and time intervals
- Supports different forecast hours and dataset types (pressure, natural, surface)
- Automatically creates storage directory if it doesn't exist
Configuration options:
start_date = "20180101-01" # Format: YYYYMMDD-HH
end_date = "20180102-01"
fhours = [0] # Forecast hours to download
dt = 24 # Download frequency in hours
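As a rough illustration of how these options could drive the download loop, the sketch below expands the date range into one remote URL and one local file name per cycle and forecast hour. It is not the project's actual code: the function name `hrrr_urls`, the bucket URL, the `conus` key layout, the product mapping, and the inclusive end date are all assumptions to check against NOAA's archive before use.

```python
from datetime import datetime, timedelta

# Assumption: bucket and key layout of NOAA's public HRRR archive on S3.
BASE_URL = "https://noaa-hrrr-bdp-pds.s3.amazonaws.com"

# Assumption: mapping from dataset type to the HRRR product code in file names.
PRODUCTS = {"pressure": "wrfprs", "natural": "wrfnat", "surface": "wrfsfc"}


def hrrr_urls(start_date, end_date, fhours, dt, dataset="natural"):
    """Return (url, local_name) pairs for every cycle from start to end, inclusive."""
    fmt = "%Y%m%d-%H"  # matches the YYYYMMDD-HH format used in the configuration
    current = datetime.strptime(start_date, fmt)
    end = datetime.strptime(end_date, fmt)
    product = PRODUCTS[dataset]
    pairs = []
    while current <= end:
        day = current.strftime("%Y%m%d")
        hour = current.strftime("%H")
        for fh in fhours:
            remote = f"hrrr.t{hour}z.{product}f{fh:02d}.grib2"
            url = f"{BASE_URL}/hrrr.{day}/conus/{remote}"
            # The local name carries the date so files from different days do not collide.
            local = f"hrrr.{day}.t{hour}z.{product}f{fh:02d}.grib2"
            pairs.append((url, local))
        current += timedelta(hours=dt)  # step by the download frequency
    return pairs


if __name__ == "__main__":
    for url, local in hrrr_urls("20180101-01", "20180102-01", [0], 24):
        print(local, "<-", url)
```

With the configuration shown above, and assuming the end date is inclusive, this produces two entries: the 01Z cycles on 2018-01-01 and 2018-01-02.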
The second script creates a metadata CSV file that pairs each HRRR weather data file with its corresponding forecast discussion.
Key features:
- Matches HRRR grib2 files with their corresponding caption files
- Extracts forecast discussions from CSV files
- Creates a metadata file suitable for machine learning training
- Output format: CSV with columns file_name and text
Usage example:
image_directory = "hrrr" # Directory containing HRRR files
caption_directory = "csv_reports" # Directory containing forecast discussions
output_metadata_file = "metadata.csv"
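The pairing step could look roughly like the sketch below, which matches each grib2 file to the caption CSV for the same date and writes the two-column metadata file. The function name `build_metadata` is hypothetical, and the layout of the caption CSVs is not described here, so the header row with a `text` column is purely an assumption; adjust `text_column` to the real files.

```python
import csv
import os
import re


def build_metadata(image_directory, caption_directory, output_metadata_file,
                   text_column="text"):
    """Pair each grib2 file with the discussion from the same date's caption CSV."""
    rows = []
    for name in sorted(os.listdir(image_directory)):
        # Pull the YYYYMMDD date out of names like hrrr.20180101.t01z.wrfnatf00.grib2
        match = re.match(r"hrrr\.(\d{8})\..*\.grib2$", name)
        if not match:
            continue
        caption_path = os.path.join(caption_directory, match.group(1) + ".csv")
        if not os.path.exists(caption_path):
            continue  # no forecast discussion for this date, so skip the file
        with open(caption_path, newline="", encoding="utf-8") as f:
            # Assumption: the caption CSV has a header row with a `text` column.
            texts = [r[text_column] for r in csv.DictReader(f) if r.get(text_column)]
        if texts:
            rows.append({"file_name": name, "text": " ".join(texts)})

    with open(output_metadata_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_metadata(image_directory, caption_directory, output_metadata_file)` with the values above would write `metadata.csv` and return the rows it wrote. Files without a matching caption are skipped rather than written with empty text, which is a design choice of this sketch.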
Example file names:
- HRRR files: hrrr.20180101.t01z.wrfnatf00.grib2
- Caption files: 20180101.csv
- Output metadata: metadata.csv
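The file names above encode the date, cycle hour, product, and forecast hour, so a name can be split into fields and mapped to its caption file. The following is a minimal sketch under that reading of the example name; `HrrrName` and `parse_hrrr_name` are hypothetical helpers, not part of the project.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example name hrrr.20180101.t01z.wrfnatf00.grib2
HRRR_NAME = re.compile(
    r"^hrrr\.(?P<date>\d{8})\.t(?P<cycle>\d{2})z\."
    r"(?P<product>wrf[a-z]+)f(?P<fhour>\d{2})\.grib2$"
)


class HrrrName(NamedTuple):
    date: str          # YYYYMMDD
    cycle: int         # model cycle hour (UTC)
    product: str       # e.g. wrfnat
    fhour: int         # forecast hour
    caption_file: str  # caption CSV expected for the same date


def parse_hrrr_name(name: str) -> Optional[HrrrName]:
    """Return the parsed fields, or None if the name does not match the pattern."""
    m = HRRR_NAME.match(name)
    if m is None:
        return None
    return HrrrName(
        date=m["date"],
        cycle=int(m["cycle"]),
        product=m["product"],
        fhour=int(m["fhour"]),
        caption_file=m["date"] + ".csv",
    )
```

For example, `parse_hrrr_name("hrrr.20180101.t01z.wrfnatf00.grib2").caption_file` gives `"20180101.csv"`, while a non-HRRR name such as `metadata.csv` returns `None`.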