diff --git a/.gitignore b/.gitignore index 1c782ba..7c04a60 100644 --- a/.gitignore +++ b/.gitignore @@ -72,4 +72,8 @@ original/ *.incorepw # examples -*examples \ No newline at end of file +*examples +tests/pyincore_data/shapefiles/ + +# local test files +tests/pyincore_data/nsi_tesy.py \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index e5dc827..beaf150 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,7 +5,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/) and this project adheres to [Semantic Versioning](http://semver.org/). ## [Unreleased] +- Utils for NSI data manipulation [#49](https://github.com/IN-CORE/pyincore-data/issues/49) - Documentation container tagging error by github action [#90](https://github.com/IN-CORE/pyincore/issues/90) +- Methods for getting FIPS-related information [#94](https://github.com/IN-CORE/pyincore/issues/94) ## [0.7.0] - 2024-10-23 diff --git a/environment.yaml b/environment.yaml index 6e6d29b..0a160cb 100644 --- a/environment.yaml +++ b/environment.yaml @@ -14,3 +14,12 @@ dependencies: - ipyleaflet>=0.16.0 - branca>=0.3.1 - pytest>=3.9.0 + - pyincore>=1.8.0 + - sqlalchemy>=1.4.42 + - geojson>=2.5.0 + - python-dotenv>=0.19.0 + - beautifulsoup4>=4.12.2 + - pycodestyle>=2.10.0 + - requests>=2.31.0 + - setuptools>=65.5.0 + - fiona>=1.9.5 diff --git a/pyincore_data/__init__.py b/pyincore_data/__init__.py index c717819..bb11f29 100644 --- a/pyincore_data/__init__.py +++ b/pyincore_data/__init__.py @@ -6,6 +6,7 @@ from pyincore_data.censusutil import CensusUtil from pyincore_data.censusviz import CensusViz +from pyincore_data.nsiparser import NsiParser from pyincore_data.utils.datautil import DataUtil import pyincore_data.globals diff --git a/pyincore_data/config.py b/pyincore_data/config.py new file mode 100644 index 0000000..c2d38f3 --- /dev/null +++ b/pyincore_data/config.py @@ -0,0 +1,30 @@ +# Copyright (c) 2025 University of Illinois and
others. All rights reserved. +# +# This program and the accompanying materials are made available under the +# terms of the Mozilla Public License v2.0 which accompanies this distribution, +# and is available at https://www.mozilla.org/en-US/MPL/2.0/ + +# config file +import os +from dotenv import load_dotenv + +# Load .env file +load_dotenv() + + +class Config: + """ + Lists all configuration settings required for NSI data download and PostgreSQL upload. + """ + # database parameters + DB_URL = os.getenv('DB_URL', 'localhost') + DB_PORT = os.getenv('DB_PORT', '5432') + DB_NAME = os.getenv('DB_NAME') + DB_USERNAME = os.getenv('DB_USERNAME') + DB_PASSWORD = os.getenv('DB_PASSWORD') + + # NSI parameters + NSI_URL_STATE = os.getenv('NSI_URL_STATE', 'https://nsi.sec.usace.army.mil/downloads/nsi_2022/') + NSI_PREFIX = os.getenv('NSI_PREFIX', 'nsi_2022_') + NSI_URL_FIPS = os.getenv('NSI_URL_FIPS', 'https://nsi.sec.usace.army.mil/nsiapi/structures?fips=') + NSI_URL_FIPS_INTERNAL = os.getenv('NSI_URL_FIPS_INTERNAL', 'https://nsi.sec.usace.army.mil/internal/nsiapi/structures?fips=') diff --git a/pyincore_data/globals.py b/pyincore_data/globals.py index 2ec00c9..67fac4a 100644 --- a/pyincore_data/globals.py +++ b/pyincore_data/globals.py @@ -17,3 +17,18 @@ ) logging_config.fileConfig(LOGGING_CONFIG) LOGGER = logging.getLogger("pyincore-data") + +STATE_FIPS_CODES = { + "Alabama": "01", "Alaska": "02", "Arizona": "04", "Arkansas": "05", "California": "06", + "Colorado": "08", "Connecticut": "09", "Delaware": "10", "Florida": "12", "Georgia": "13", + "Hawaii": "15", "Idaho": "16", "Illinois": "17", "Indiana": "18", "Iowa": "19", + "Kansas": "20", "Kentucky": "21", "Louisiana": "22", "Maine": "23", "Maryland": "24", + "Massachusetts": "25", "Michigan": "26", "Minnesota": "27", "Mississippi": "28", "Missouri": "29", + "Montana": "30", "Nebraska": "31", "Nevada": "32", "New Hampshire": "33", "New Jersey": "34", + "New Mexico": "35", "New York": "36", "North 
Carolina": "37", "North Dakota": "38", "Ohio": "39", + "Oklahoma": "40", "Oregon": "41", "Pennsylvania": "42", "Rhode Island": "44", "South Carolina": "45", + "South Dakota": "46", "Tennessee": "47", "Texas": "48", "Utah": "49", "Vermont": "50", + "Virginia": "51", "Washington": "53", "West Virginia": "54", "Wisconsin": "55", "Wyoming": "56" +} + +COUNTY_FIPS_BASE_URL = "https://api.census.gov/data/2020/acs/acs5" diff --git a/pyincore_data/nsiparser.py b/pyincore_data/nsiparser.py new file mode 100644 index 0000000..f9165b9 --- /dev/null +++ b/pyincore_data/nsiparser.py @@ -0,0 +1,152 @@ +# Copyright (c) 2025 University of Illinois and others. All rights reserved. +# +# This program and the accompanying materials are made available under the +# terms of the Mozilla Public License v2.0 which accompanies this distribution, +# and is available at https://www.mozilla.org/en-US/MPL/2.0/ + +import pandas as pd +import geopandas as gpd +import requests + +from pyincore_data.utils.datautil import DataUtil +from pyincore_data import globals as pyincore_globals + +# Static mapping of state names to FIPS codes (since the API doesn't directly return them in this case) +STATE_FIPS_CODES = pyincore_globals.STATE_FIPS_CODES + + +class NsiParser: + @staticmethod + def create_nsi_gdf_by_county_fips(in_fips): + """ + Creates a GeoDataFrame from NSI data for a single county FIPS code. + + Args: + in_fips (str): A county FIPS code (e.g., '29001'). + + Returns: + gpd.GeoDataFrame: A GeoDataFrame containing data for the provided FIPS code. + """ + # get feature collection from NSI API + gdf = DataUtil.get_features_by_fips(in_fips) + + return gdf + + @staticmethod + def create_nsi_gdf_by_counties_fips_list(fips_list): + """ + Creates a merged GeoDataFrame by fetching and combining NSI data for a list of county FIPS codes. + + Args: + fips_list (list): A list of county FIPS codes (e.g., ['15005', '29001']). 
+ + Returns: + gpd.GeoDataFrame: A merged GeoDataFrame containing data for all provided FIPS codes. + """ + # initialize an empty GeoDataFrame + merged_gdf = gpd.GeoDataFrame() + + for fips in fips_list: + print(f"Processing FIPS: {fips}") + gdf = DataUtil.get_features_by_fips(fips) + + if gdf is not None and not gdf.empty: + merged_gdf = gpd.GeoDataFrame(pd.concat([merged_gdf, gdf], ignore_index=True)) + + # ensure CRS consistency in the merged GeoDataFrame + if not merged_gdf.empty: + merged_gdf = merged_gdf.set_crs(epsg=4326) + + return merged_gdf + + @staticmethod + def get_county_fips_by_state(state_name): + """ + Fetches all county FIPS codes for a given state using the US Census Bureau API. + + Args: + state_name (str): Full state name (e.g., "Illinois"). + + Returns: + list: A list of dictionaries containing county names and their FIPS codes. + """ + # Normalize state name to title case for matching + state_name_normalized = state_name.title() + + # Validate the state name and get the state FIPS code + state_fips = STATE_FIPS_CODES.get(state_name_normalized) + if not state_fips: + raise ValueError(f"State '{state_name}' not found. 
Please check the spelling.") + + # Census API URL for county-level data + county_fips_url = f"{pyincore_globals.COUNTY_FIPS_BASE_URL}?get=NAME&for=county:*&in=state:{state_fips}" + response = requests.get(county_fips_url) + + if response.status_code != 200: + raise ValueError(f"Error fetching counties for state '{state_name}': {response.status_code}") + + try: + counties_data = response.json() + except ValueError: + raise ValueError("Failed to parse JSON response for counties.") + + # Ensure counties_data is valid + if not isinstance(counties_data, list) or len(counties_data) < 2: + raise ValueError("Unexpected data format for county FIPS codes.") + + # Extract county names and FIPS codes + county_list = [ + {"county": row[0], "fips": f"{state_fips}{row[2]}"} + for row in counties_data[1:] # Skip the header + ] + + return county_list + + @staticmethod + def get_county_fips_only_list_by_state(state_name): + """ + Fetches a list of FIPS codes for all counties in a given state. + + Args: + state_name (str): Full state name (e.g., "Illinois"). + + Returns: + list: A list of FIPS codes (strings) for all counties in the state. + """ + try: + counties = NsiParser.get_county_fips_by_state(state_name) + fips_list = [county['fips'] for county in counties] + return fips_list + except ValueError as e: + print("Error:", e) + return [] + + @staticmethod + def get_fips_by_state_and_county(state_name, county_name): + """ + Fetches the FIPS code for a specific county in a given state. + + Args: + state_name (str): Full state name (e.g., "Illinois"). + county_name (str): Full county name (e.g., "Champaign"). + + Returns: + str: The FIPS code for the specified county. + None: If the state or county is not found. 
+ """ + try: + # fetch all counties and their FIPS codes for the state + counties = NsiParser.get_county_fips_by_state(state_name) + + # find the county by name + for county in counties: + county_name_cleaned = county['county'].split(',')[0].replace(" County", "").strip().lower() + if county_name_cleaned == county_name.lower(): + return county['fips'] + + # if no match is found + print(f"County '{county_name}' not found in state '{state_name}'.") + return None + except ValueError as e: + print("Error:", e) + return None diff --git a/pyincore_data/utils/__init__.py b/pyincore_data/utils/__init__.py index e69de29..071c939 100644 --- a/pyincore_data/utils/__init__.py +++ b/pyincore_data/utils/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) 2025 University of Illinois and others. All rights reserved. +# +# This program and the accompanying materials are made available under the +# terms of the Mozilla Public License v2.0 which accompanies this distribution, +# and is available at https://www.mozilla.org/en-US/MPL/2.0/ + +from pyincore_data.utils.datautil import DataUtil \ No newline at end of file diff --git a/pyincore_data/utils/datautil.py b/pyincore_data/utils/datautil.py index 55053e2..b2aee41 100644 --- a/pyincore_data/utils/datautil.py +++ b/pyincore_data/utils/datautil.py @@ -4,6 +4,17 @@ # terms of the Mozilla Public License v2.0 which accompanies this distribution, # and is available at https://www.mozilla.org/en-US/MPL/2.0/ +import fiona +import requests +import uuid +import os +import geopandas as gpd +import sqlalchemy + +from sqlalchemy import create_engine +from geojson import FeatureCollection +from pyincore_data.config import Config + class DataUtil: @staticmethod @@ -51,3 +62,180 @@ def convert_dislocation_pd_to_csv(in_pd, save_columns, programname, savefile): # Save cen_blockgroup dataframe with save_column variables to csv named savefile print("CSV data file saved to: " + programname + "/" + savefile + ".csv") in_pd[save_columns].to_csv(programname + 
"/" + savefile + ".csv", index=False) + + @staticmethod + def get_features_by_fips(state_county_fips): + """ + Downloads a GeoJSON feature collection from the NSI endpoint using the provided county FIPS code + and returns it as a GeoDataFrame with additional columns for FIPS, state FIPS, and county FIPS. + + Args: + state_county_fips (str): The combined state and county FIPS code (e.g., '15005'). + + Returns: + gpd.GeoDataFrame: A GeoDataFrame containing the features with additional columns. + """ + print("Requesting data for " + str(state_county_fips) + " from NSI endpoint") + json_url = Config.NSI_URL_FIPS + str(state_county_fips) + result = requests.get(json_url) + result.raise_for_status() + result_json = result.json() + + collection = FeatureCollection(result_json['features']) + + gdf = gpd.GeoDataFrame.from_features(collection['features']) + gdf = gdf.set_crs(epsg=4326) + + gdf = DataUtil.add_columns_to_gdf(gdf, state_county_fips) + + return gdf + + @staticmethod + def download_nsi_data_state_file(state_fips): + """ + Downloads a zipped GeoPackage file for the given state FIPS code from the NSI endpoint. + + Args: + state_fips (str): The state FIPS code (e.g., '29' for Missouri). + + Returns: + None + """ + file_name = Config.NSI_PREFIX + str(state_fips) + ".gpkg.zip" + file_url = Config.NSI_URL_STATE + file_name  # NSI_URL_STATE already ends with '/' + print("Downloading NSI data for the state: " + str(state_fips)) + r = requests.get(file_url, stream=True) + + if r.status_code != 200: + r.raise_for_status() + + else: + download_filename = os.path.join("data", file_name) + + with open(download_filename, "wb") as zipfile: + for chunk in r.iter_content(chunk_size=1024): + if chunk: + zipfile.write(chunk) + print("Finished downloading NSI data for the state: " + str(state_fips)) + + @staticmethod + def read_geopkg_to_gdf(infile): + """ + Reads a GeoPackage file and converts it into a GeoDataFrame. + + Args: + infile (str): Path to the GeoPackage file. 
+ + Returns: + gpd.GeoDataFrame: A GeoDataFrame containing data from the last layer of the GeoPackage file. + """ + print("Reading GeoPackage") + gpkgpd = None + for layername in fiona.listlayers(infile): + gpkgpd = gpd.read_file(infile, layer=layername, crs='EPSG:4326') + + return gpkgpd + + @staticmethod + def add_guid_to_gdf(gdf): + """ + Adds a globally unique identifier (GUID) column to the GeoDataFrame. + + Args: + gdf (gpd.GeoDataFrame): Input GeoDataFrame. + + Returns: + gpd.GeoDataFrame: GeoDataFrame with a new 'guid' column. + """ + print("Creating GUID column") + gdf['guid'] = [str(uuid.uuid4()) for _ in range(len(gdf))] + + return gdf + + @staticmethod + def add_columns_to_gdf(gdf, fips): + """ + Adds a GUID column and FIPS-related columns (FIPS, state FIPS, and county FIPS) to the GeoDataFrame. + + Args: + gdf (gpd.GeoDataFrame): Input GeoDataFrame. + fips (str): Combined state and county FIPS code. + + Returns: + gpd.GeoDataFrame: GeoDataFrame with new GUID and FIPS-related columns. + """ + print("Creating GUID and FIPS-related columns") + statefips = fips[:2] + countyfips = fips[2:] + for i, row in gdf.iterrows(): + guid_val = str(uuid.uuid4()) + gdf.at[i, 'guid'] = guid_val + gdf.at[i, 'fips'] = fips + gdf.at[i, 'statefips'] = statefips + gdf.at[i, 'countyfips'] = countyfips + + return gdf + + @staticmethod + def gdf_to_geopkg(gdf, outfile): + """ + Saves a GeoDataFrame as a GeoPackage file. + + Args: + gdf (gpd.GeoDataFrame): Input GeoDataFrame. + outfile (str): Path to the output GeoPackage file. + + Returns: + None + """ + print("Creating output GeoPackage") + gdf.to_file(outfile, driver="GPKG") + + @staticmethod + def upload_postgres_from_gpkg(infile): + """ + Reads data from a GeoPackage file and uploads it to a PostgreSQL database. + + Args: + infile (str): Path to the GeoPackage file. 
+ + Returns: + None + """ + gpkgpd = None + for layername in fiona.listlayers(infile): + gpkgpd = gpd.read_file(infile, layer=layername, crs='EPSG:4326') + + DataUtil.upload_postgres_gdf(gpkgpd) + + @staticmethod + def upload_postgres_gdf(gdf): + """ + Uploads a GeoDataFrame to a PostgreSQL database. + + Args: + gdf (gpd.GeoDataFrame): Input GeoDataFrame. + + Returns: + bool: True if upload is successful, False otherwise. + """ + try: + db_connection_url = "postgresql://%s:%s@%s:%s/%s" % \ + (Config.DB_USERNAME, Config.DB_PASSWORD, Config.DB_URL, Config.DB_PORT, Config.DB_NAME) + con = create_engine(db_connection_url) + + print('Dropping ' + str(gdf.geometry.isna().sum()) + ' nulls.') + gdf = gdf.dropna(subset=['geometry']) + + print('Uploading GeoDataFrame to database') + gdf.to_postgis("nsi_raw", con, index=False, if_exists='replace') + + con.dispose() + + print('Upload to database completed.') + + return True + + except sqlalchemy.exc.OperationalError: + print("Error in connecting to the database server") + return False \ No newline at end of file diff --git a/recipes/meta.yaml b/recipes/meta.yaml index f2c6a18..cc238b8 100644 --- a/recipes/meta.yaml +++ b/recipes/meta.yaml @@ -39,6 +39,14 @@ requirements: - geopandas>=0.14.0 - ipyleaflet>=0.16.0 - branca>=0.3.0 + - sqlalchemy>=1.4.42 + - geojson>=2.5.0 + - python-dotenv>=0.19.0 + - beautifulsoup4>=4.12.2 + - pycodestyle>=2.10.0 + - requests>=2.31.0 + - setuptools>=65.5.0 + - fiona>=1.9.5 test: # Python imports diff --git a/requirements.txt b/requirements.txt index 2811c44..e669a20 100644 --- a/requirements.txt +++ b/requirements.txt @@ -6,3 +6,11 @@ geopandas>=0.14.0 pytest>=3.9.0 ipyleaflet>=0.16.0 branca>=0.3.1 +sqlalchemy>=1.4.42 +geojson>=2.5.0 +python-dotenv>=0.19.0 +beautifulsoup4>=4.12.2 +pycodestyle>=2.10.0 +requests>=2.31.0 +setuptools>=65.5.0 +fiona>=1.9.5 \ No newline at end of file diff --git a/tests/__init__.py b/tests/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git 
a/tests/pyincore_data/__init__.py b/tests/pyincore_data/__init__.py deleted file mode 100644 index d3c0a1e..0000000 --- a/tests/pyincore_data/__init__.py +++ /dev/null @@ -1,5 +0,0 @@ -# Copyright (c) 2021 University of Illinois and others. All rights reserved. -# -# This program and the accompanying materials are made available under the -# terms of the Mozilla Public License v2.0 which accompanies this distribution, -# and is available at https://www.mozilla.org/en-US/MPL/2.0/ diff --git a/tests/pyincore_data/test_nsi.py b/tests/pyincore_data/test_nsi.py new file mode 100644 index 0000000..44c6fae --- /dev/null +++ b/tests/pyincore_data/test_nsi.py @@ -0,0 +1,45 @@ +# Copyright (c) 2025 University of Illinois and others. All rights reserved. +# +# This program and the accompanying materials are made available under the +# terms of the Mozilla Public License v2.0 which accompanies this distribution, +# and is available at https://www.mozilla.org/en-US/MPL/2.0/ + +import pytest + +from pyincore_data.nsiparser import NsiParser + + +def test_create_nsi_gdf_by_county_fips(): + fips = '15005' + gdf = NsiParser.create_nsi_gdf_by_county_fips(fips) + + assert gdf.shape[0] > 0 + + +def test_create_nsi_gdf_by_counties_fips_list(): + fips_list = ['15005', '29001', '01001'] + merged_gdf = NsiParser.create_nsi_gdf_by_counties_fips_list(fips_list) + + assert merged_gdf.shape[0] > 0 + + +def test_get_county_fips_by_state(): + state = 'illinois' + fips_list = NsiParser.get_county_fips_by_state(state) + + assert len(fips_list) > 0 + + +def test_get_county_fips_only_list_by_state(): + state = 'illinois' + fips_list = NsiParser.get_county_fips_only_list_by_state(state) + + assert len(fips_list) > 0 + + +def test_get_fips_by_state_and_county(): + state = 'illinois' + county = 'champaign' + fips = NsiParser.get_fips_by_state_and_county(state, county) + + assert fips == '17019'
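Reviewer note: a quick offline sketch of how `NsiParser.get_county_fips_by_state` assembles five-digit county FIPS codes from a Census ACS5 response. The `counties_data` rows below are illustrative sample data, not a live API call; the comprehension mirrors the one in the patch.

```python
# A Census ACS5 "get=NAME&for=county:*&in=state:17" response is a list of
# rows: [NAME, state FIPS, county FIPS], with a header row first.
counties_data = [
    ["NAME", "state", "county"],
    ["Champaign County, Illinois", "17", "019"],
    ["Adams County, Illinois", "17", "001"],
]

state_fips = "17"  # Illinois, as looked up in STATE_FIPS_CODES

# Skip the header and concatenate state FIPS + county FIPS, as in
# get_county_fips_by_state.
county_list = [
    {"county": row[0], "fips": f"{state_fips}{row[2]}"}
    for row in counties_data[1:]
]

print(county_list[0]["fips"])  # 17019
```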
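The county lookup in `get_fips_by_state_and_county` depends on normalizing the Census `NAME` field before comparing. A minimal sketch of that normalization (same string operations as the patch):

```python
# Census NAME values look like "Champaign County, Illinois": drop the
# state after the comma, strip the " County" suffix, and lowercase for a
# case-insensitive match against the user-supplied county name.
name = "Champaign County, Illinois"
cleaned = name.split(',')[0].replace(" County", "").strip().lower()
print(cleaned)  # champaign
```

One caveat worth noting in review: only the literal `" County"` suffix is stripped, so Louisiana parishes and Alaska boroughs (e.g., "Orleans Parish, Louisiana") would not match under this rule.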
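For completeness, the shape of the connection URL that `upload_postgres_gdf` builds from the `Config` values loaded out of `.env`. The credentials below are hypothetical placeholders, not values from the patch:

```python
# upload_postgres_gdf formats a SQLAlchemy/psycopg2-style PostgreSQL URL
# from Config.DB_USERNAME, DB_PASSWORD, DB_URL, DB_PORT, and DB_NAME.
db_params = {
    "user": "incore", "password": "secret",  # hypothetical values
    "host": "localhost", "port": "5432", "db": "nsi",
}
db_connection_url = "postgresql://%s:%s@%s:%s/%s" % (
    db_params["user"], db_params["password"],
    db_params["host"], db_params["port"], db_params["db"],
)
print(db_connection_url)  # postgresql://incore:secret@localhost:5432/nsi
```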