From f670ffbc5ebbd71b3fa3a8b8e5d5e01a2230df39 Mon Sep 17 00:00:00 2001 From: David Altare Date: Wed, 27 Mar 2024 19:22:27 -0700 Subject: [PATCH] add links to github repository / issues --- .../execute-results/html.json | 2 +- 01_document/_quarto.yml | 20 +++++++++++++++++-- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/01_document/_freeze/example_census_race_ethnicity_calculation/execute-results/html.json b/01_document/_freeze/example_census_race_ethnicity_calculation/execute-results/html.json index 17bc8c9..e3b75ac 100644 --- a/01_document/_freeze/example_census_race_ethnicity_calculation/execute-results/html.json +++ b/01_document/_freeze/example_census_race_ethnicity_calculation/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "8fbec6f467c41ac9c50945a1375a0e5e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Estimating Demographics of Custom Spatial Features\"\nsubtitle: \"Accessing U.S. Census Bureau Data & Calculating Weighted Averages with Areal- and Population-Weighted Interpolation\"\nnumber-sections: true\ntoc: true\ntoc-depth: 4\nformat:\n html:\n self-contained: false\nbibliography: references.bib\n---\n\n```{=html}\n \n\n```\n\n## Background {#sec-background}\n\n::: callout-note\nFor comments, suggestions, corrections, or questions on anything below, contact [david.altare\\@waterboards.ca.gov](mailto:david.altare@waterboards.ca.gov), or [open an issue](https://github.com/daltare/example-census-race-ethnicity-calculation/issues) on github.\n:::\n\n::: callout-warning\nThis is a draft / work in progress – some parts are still under development, and existing parts may change.\n:::\n\nThis document provides an example of how to use tools available from the [R programming language](https://www.R-project.org/) [@R] to estimate characteristics of any given *target* spatial area(s) (e.g., neighborhoods, project boundaries, water supplier service areas, etc.) based on data from a *source* dataset containing the characteristic data of interest (e.g., census data, CalEnvrioScreen scores, etc.), especially when the boundaries of the *source* and *target* areas overlap but don't necessarily align with each other. It also provides some brief background on the various types of data available from the U.S Census Bureau, and links to a few places to find more in-depth information.\n\nThis particular example estimates demographic characteristics of community water systems in the Sacramento County area (the *target* dataset). It uses the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package [@tidycensus] to access selected demographic data from the U.S. Census Bureau (the *source* dataset) for census units whose spatial extent covers those water systems' service areas, then uses the [`sf`](https://r-spatial.github.io/sf/) package [@sf] package (for working with spatial data) and the [`tidyverse`](https://www.tidyverse.org/) collection of packages [@tidyverse] (for general data cleaning and transformation) to estimate some demographic characteristics of each water system based on that census data. It also uses the [`areal`](https://chris-prener.github.io/areal/) R package [@areal] to check some of the results, and as general guidance on the principles and techniques for implementing areal interpolation.\n\nThis example is just intended to be a simplified demonstration of a possible workflow. For a real analysis, additional steps and considerations – that may not be covered here – may be needed to deal with data inconsistencies (e.g., missing or incomplete data), required level of precision and acceptable assumptions (e.g. more fine-grained datasets or more sophisticated techniques could be used to estimate/model population distributions), or other project-specific issues that might arise.\n\n## Setup {#sec-setup}\n\nThe code block below loads required packages for this analysis, and sets some user-defined options and defaults. If they aren't already installed on your computer, you can install them with the R command `install.packages('package-name')` (and replace `package-name` with the name of the package you want to install).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# packages ----\nlibrary(tidycensus)\nlibrary(tigris)\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(areal)\nlibrary(janitor)\nlibrary(here)\nlibrary(units)\n# library(Polychrome)\nlibrary(knitr)\nlibrary(kableExtra)\nlibrary(tmap)\nlibrary(patchwork)\nlibrary(scales)\nlibrary(digest)\nlibrary(mapview)\nlibrary(biscale)\nlibrary(cowplot)\nlibrary(glue)\nlibrary(ggtext)\n\n# conflicts ----\nlibrary(conflicted)\nconflicts_prefer(dplyr::filter)\n\n# options ----\noptions(scipen = 999) # turn off scientific notation\noptions(tigris_use_cache = TRUE) # use data caching for tigris\n\n# reference system ----\ncrs_projected <- 3310 # set a common projected coordinate reference system to be used throughout this analysis - see: https://epsg.io/3310\n```\n:::\n\n\n## Census Data Overview {#sec-census-overview}\n\nThis section provides some brief background on the various types of data available from the U.S. Census Bureau (a later section - @sec-census-access - demonstrates how to retrieve data from the U.S. Census Bureau using the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package). Most of the information covered here comes from the book [Analyzing US Census Data: Methods, Maps, and Models in R](https://walker-data.com/census-r/index.html), which is a great source of information if you'd like more detail about any of the topics below [@walker2023].\n\n::: callout-note\nIf you're already familiar with Census data and want to skip this overview, go directly to the next section: @sec-system-boundaries\n:::\n\nDifferent census products/surveys contain data on different variables, at different geographic scales, over varying periods of time, and with varying levels of certainty. Therefore, there are a number of judgement calls to make when determining which type of census data to use for an analysis – e.g., which data product to use (Decennial Census or American Community Survey), which geographic scale to use (e.g., Block, Block Group, Tract, etc.), what time frame to use, which variables to assess, etc.\n\nMore detailed information about U.S. Census Bureau's data products and other topics mentioned below is available [here](https://walker-data.com/census-r/the-united-states-census-and-the-r-programming-language.html#the-united-states-census-and-the-r-programming-language).\n\n### Census Unit Geography / Hierarchy {#sec-census-hierarchy}\n\nPublicly available datasets from the U.S Census Bureau generally consist of individual survey responses aggregated to defined census units (e.g., census tracts) that cover varying geographic scales. Some of these units are nested and can be neatly aggregated (e.g., each census tract is composed of a collection of block groups, and each block group is composed of a collection of blocks), while other census units are outside this hierarchy (e.g., Zip Code Tabulation Areas don't coincide with any other census unit). @fig-census-hierarchies shows the relationship of all of the various census units.\n\nCommonly used census statistical units like tracts and block groups have target population size ranges, and can be adjusted every 10 years (with the decennial census) based on population changes. For example, all ACS 5-year datasets prior to 2020 use the 2010 boundaries for tracts, block groups, and blocks, and all ACS 5-year datasets from [2020 onward](https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2020/geography-changes.html) (presumably through 2029) use the 2020 boundaries for those units. [Census tracts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_13) are generally around 4,000 people, with a range from about 1,200 to 8,000, and [block groups](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_4) generally contain 600 to 3,000 people. [Blocks](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_5) are the smallest census units, and are \"areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads\". For example, a census block may be \"a city block bounded on all sides by streets\", while \"blocks in suburban and rural areas may be larger, more irregular in shape, and bounded by a variety of features, such as roads, streams, and transmission lines\".\n\n::: callout-caution\nCensus boundaries can change over time. Commonly used statistical units like tracts, block groups, and blocks tend to be revised every 10 years (with the decennial census), so it's important to use a census boundary dataset that matches the version of the census demographic data you're retrieving; otherwise, the demographic data may not match geographic areas in your boundary dataset. In some cases, a census unit that exists in a given year of the census data may not exist at all in a different year's dataset, because census units can be split or merged when boundaries are revised.\n\nFor more information, see [here](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_geography_handbook_2020_ch02.pdf) or [here](https://www.census.gov/programs-surveys/acs/geography-acs/geography-boundaries-by-year.html) or [here](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_6) or [here](https://www.census.gov/data/academy/data-gems/2021/compare-2020-census-and-2010-census-redistricting-data.html).\n:::\n\nFor a list of the different geographic units available for each of the different census products/surveys (see @sec-census-datasets) that can be accessed via the `tidycensus` package, go [here](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus).\n\n![Census Unit Hierarchies](https://walker-data.com/census-r/img/screenshots/census-hierarchies.png){#fig-census-hierarchies}\n\n### Census Datasets / Surveys {#sec-census-datasets}\n\nThe Decennial Census is conducted every 10 years, and is intended to provide a complete count of the US population and assist with political redistricting. As a result, it collects a relatively limited set of basic demographic data, but (should) provide a high degree of precision (i.e., in general it should provide exact counts). It is available for geographic units down to the census block (the smallest census unit available – see @sec-census-hierarchy). For information about existing and planned future releases of 2020 census data products, go [here](https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/release/about-2020-data-products.html).\n\nThe American Community Survey (ACS) provides a much larger array of demographic information than the Decennial Census, and is updated more frequently. The ACS is based on a sample of the population (rather than a count of the entire population, as in the Decennial Census), so it represents estimated values rather than precise counts; therefore, each data point is available as an estimate (typically labeled with an \"E\" in census variable codes, which are discussed in @sec-census-variables ) along with an associated margin of error (typically labeled with \"M\" or \"MOE\" in census variable codes) around its estimated value.\n\nThe ACS is available in two formats. The 5-year ACS is a rolling average of 5 years of data (e.g., the 2021 5-year ACS dataset is an average of the ACS data from 2017 through 2021), and is generally available for geographic units down to the census block group (though some 5-year ACS data may only be available at less granular levels). The 1-year ACS provides data for a single year, and is only available for geographies with population greater than 65,000 (e.g., large cities and counties). Therefore, only the 5-year ACS will be useful for any analysis at a relatively fine scale (e.g., anything that requires data at or more detailed than the census tract level, or any analysis that considers smaller counties/cities – by definition, census tracts always contain significantly fewer than 65,000 people).\n\nIn addition to the Decennial Census and ACS data, a number of other census data products/surveys are also available. For example, see the `censusapi` R package ([here](https://github.com/hrecht/censusapi) or [here](https://www.hrecht.com/censusapi/index.html)) for access to over 300 census API endpoints. For historical census data, see the discussion [here](https://walker-data.com/census-r/other-census-and-government-data-resources.html?q=API%20endpoint#other-census-and-government-data-resources) on using NHGIS, IPUMS, and the `ipumsr` package.\n\n### Census Variables / Codes {#sec-census-variables}\n\nEach census product collects data for many different demographic variables, and each variable is generally associated with an identifier code. In order to access census data programmatically, you often need to know the code associated with each variable of interest. When determining which variables to use, you need to consider what census product contains those variables (see @sec-census-datasets) and how they differ in terms of time frame, precision, spatial granularity (see @sec-census-hierarchy), etc.\n\nThe `tidycensus` package offers a convenient generic way to search for variables across different census products using the `load_variables()` function, as described [here](https://walker-data.com/tidycensus/articles/basic-usage.html#searching-for-variables).\n\nThe following websites may also be helpful for exploring the various census data products and finding the variable names and codes they contain:\n\n- Census Reporter (for ACS data): (especially )\n\n- Census Bureau's list of variable codes, e.g.:\n\n - 2020 Census codes: \n\n - 2022 ACS 5 year codes: \n\n- Census Bureau's data interface (for Decennial Census and ACS, and other census datasets): \n\n- National Historical Geographic Information System (NHGIS) (for ACS data and historical decennial Census data): \n\n## Target Data Boundaries (Water Systems) {#sec-system-boundaries}\n\nIn this section, we'll get the service area boundaries for Community Water Systems within the Sacramento County area. This will serve as the *target* dataset – i.e., the set of areas which we'll be estimating the characteristics of – and will also be used to specify the geographic areas of the census data we want to retrieve. We'll also get a dataset of county boundaries which overlap the water service areas in this study, which can also help with specifying what census data to access and/or be used to make maps and visualizations.\n\n### Read Water System Data\n\nIn this case, we'll get the water system dataset from a shapefile that's saved locally, then transform that dataset into a common coordinate reference system for mapping and analysis (which is defined above in the variable `crs_projected`).\n\nThis water system dataset comes from the [California Drinking Water System Area Boundaries dataset](https://gispublic.waterboards.ca.gov/portal/home/item.html?id=fbba842bf134497c9d611ad506ec48cc). For this example, the dataset has been pre-filtered for systems within Sacramento County (by selecting records where the `COUNTY` field is \"SACRAMENTO\") and for Community Water Systems (by selecting records where the `STATE_CLAS` field is \"COMMUNITY\"). Some un-needed fields have also been dropped, remaining fields have been re-orderd.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_systems_sac <- st_read(here('02_data_input', \n 'water_supplier_boundaries_sac', \n 'System_Area_Boundary_Layer_Sac.shp')) %>% \n st_transform(crs_projected) # transform to common coordinate system\n```\n:::\n\n\nWe can use the `glimpse` function (below) to take get a sense of what type of information is available in the water system dataset and how it's structured.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_systems_sac)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 12\n$ WATER_SY_1 \"HOOD WATER MAINTENCE DIST [SWS]\", \"MC CLELLAN MHP\", \"MAGNO…\n$ WATER_SYST \"CA3400101\", \"CA3400179\", \"CA3400130\", \"CA3400135\", \"CA3400…\n$ GLOBALID \"{36268DB3-9DB2-4305-A85A-2C3A85F20F34}\", \"{E3BF3C3E-D516-4…\n$ BOUNDARY_T \"Water Service Area\", \"Water Service Area\", \"Water Service …\n$ OWNER_TYPE \"L\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\",…\n$ COUNTY \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMENTO\", \"SA…\n$ REGULATING \"LPA64 - SACRAMENTO COUNTY\", \"LPA64 - SACRAMENTO COUNTY\", \"…\n$ FEDERAL_CL \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUN…\n$ STATE_CLAS \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUN…\n$ SERVICE_CO 82, 199, 34, 64, 128, 83, 28, 50, 164, 5684, 14798, 115, 33…\n$ POPULATION 100, 700, 40, 150, 256, 150, 32, 100, 350, 18005, 44928, 20…\n$ geometry MULTIPOLYGON (((-132703 403..., MULTIPOLYGON (…\n```\n\n\n:::\n:::\n\n\nNote that this dataset already includes a `POPULATION` variable that indicates the population served by each water system, which we renamed to `water_system_population_reported` above (note: I'm not exactly how the data in this variable is derived). However, for this analysis we'll be making our own estimate of the population within each system's service area based on U.S. Census Bureau data and the spatial representation of the system boundaries. Given the uncertainty in how the reported population data was derived (including potential temporal differences), the population estimates produced here will likely will not exactly match the reported population data; but, the reported population data may serve as a useful check to make sure our estimates are reasonable.\n\nTo make the water system data easier to work with, we can make some more descriptive field names (note that while it's redundant, we're using the prefix `water_system_` for all field names to distinguish data types when joining this data with other datasets later).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_systems_sac <- water_systems_sac %>% \n rename(water_system_name = WATER_SY_1, \n water_system_number = WATER_SYST,\n water_system_id\t= GLOBALID,\n water_system_boundary_type = BOUNDARY_T,\n water_system_owner_type\t= OWNER_TYPE,\n water_system_county\t= COUNTY,\n water_system_regulating_agency = REGULATING,\n water_system_federal_class = FEDERAL_CL,\n water_system_state_class = STATE_CLAS,\n water_system_service_connections = SERVICE_CO,\n water_system_population_reported = POPULATION)\n```\n:::\n\n\nHere's a view of the structure of the revised dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_systems_sac)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 12\n$ water_system_name \"HOOD WATER MAINTENCE DIST [SWS]\", \"M…\n$ water_system_number \"CA3400101\", \"CA3400179\", \"CA3400130\"…\n$ water_system_id \"{36268DB3-9DB2-4305-A85A-2C3A85F20F3…\n$ water_system_boundary_type \"Water Service Area\", \"Water Service …\n$ water_system_owner_type \"L\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P…\n$ water_system_county \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMEN…\n$ water_system_regulating_agency \"LPA64 - SACRAMENTO COUNTY\", \"LPA64 -…\n$ water_system_federal_class \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\"…\n$ water_system_state_class \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\"…\n$ water_system_service_connections 82, 199, 34, 64, 128, 83, 28, 50, 164…\n$ water_system_population_reported 100, 700, 40, 150, 256, 150, 32, 100,…\n$ geometry MULTIPOLYGON (((-132703 …\n```\n\n\n:::\n:::\n\n\n#### Alternative Data Retrieval Method\n\nReading in data from a shapefile is shown above because it's likely one of the more common ways that users will access their *target* boundary data. However, depending on the dataset, there may be other ways to access the data. For example, the code chunk below demonstrates an alternative – using the [`arcgislayers`](https://r.esri.com/arcgislayers/index.html) package [@arcgislayers] – that connects directly to the source dataset (to retrieve the most recent version) and applies the filters needed to reproduce the dataset in the `System_Area_Boundary_Layer_Sac.shp` file. Also, note that storing data in formats other than the common shapefile format – such as the geopackage format – can have some advantages (for example, see [here](http://switchfromshapefile.org/)).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# load arcgislayers package (see: https://r.esri.com/arcgislayers/index.html)\ninstall.packages('pak') # only needed if the pak package is not already installed\npak::pkg_install(\"R-ArcGIS/arcgislayers\", dependencies = TRUE)\n\nlibrary(arcgislayers)\n\n# define link to data source\nurl_feature <- 'https://gispublic.waterboards.ca.gov/portalserver/rest/services/Drinking_Water/California_Drinking_water_system_numberem_Area_Boundaries/FeatureServer/0'\n\n# connect to data source\nwater_system_numberems_feature_layer <- arc_open(url_feature)\n\n# download and filter data from source\nwater_system_numberems_sac <- arc_select(\n water_system_numberems_feature_layer,\n # apply filters\n where = \"COUNTY = 'SACRAMENTO' AND STATE_CLASSIFICATION = 'COMMUNITY'\",\n # select fields\n fields = c('WATER_SYSTEM_NAME', 'WATER_SYSTEM_NUMBER', 'GLOBALID',\n 'BOUNDARY_TYPE', 'OWNER_TYPE_CODE', 'COUNTY',\n 'REGULATING_AGENCY', 'FEDERAL_CLASSIFICATION', 'STATE_CLASSIFICATION',\n 'SERVICE_CONNECTIONS', 'POPULATION')) %>%\n # transform to common coordinate system\n st_transform(crs_projected) %>%\n # rename fields to match names from the shapefile (which automatically truncates field names)\n rename(water_system_name = WATER_SYSTEM_NAME,\n water_system_number = WATER_SYSTEM_NUMBER,\n water_system_id = GLOBALID,\n water_system_boundary_type = BOUNDARY_TYPE,\n water_system_owner_type = OWNER_TYPE_CODE,\n water_system_county = COUNTY,\n water_system_regulating_agency = REGULATING_AGENCY,\n water_system_federal_class = FEDERAL_CLASSIFICATION,\n water_system_state_class = STATE_CLASSIFICATION,\n water_system_service_connections = SERVICE_CONNECTIONS,\n water_system_population_reported = POPULATION)\n```\n:::\n\n\n### Get County Boundaries {#sec-county-boundaries}\n\nWhen accessing census data using the `tidycensus` R package as shown below (in @sec-census-access), it's often useful (though not strictly required) to know which counties overlap the target dataset (note that, even though the dataset is filtered for systems in Sacramento county, there are some systems whose boundaries extend into neighboring counties). County boundaries may also be useful for making maps in later stages of the analysis. We can get a dataset of county boundaries in California from the [TIGER dataset](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html), which can be accessed with R using the [`tigris`](https://github.com/walkerke/tigris) R package [@tigris].\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncounties_ca <- counties(state = 'CA', \n cb = TRUE) %>% # simplified\n st_transform(crs_projected) # transform to common coordinate system\n```\n:::\n\n\nThen, we can get a list of counties that overlap with the boundaries of the Sacramento area community water systems obtained above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncounties_overlap <- counties_ca %>% \n st_filter(water_systems_sac, \n .predicate = st_overlaps)\n\ncounties_list <- counties_overlap %>% pull(NAME)\n```\n:::\n\n\nThe counties in the `counties_list` variable are: San Joaquin, Yolo, Placer, Sacramento.\n\n### Plot Target Data\n\n@fig-suppliers-counties shows the water systems and county boundaries in an interactive map.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmapview(counties_overlap, \n alpha.regions = 0, \n zcol = 'NAME', \n layer.name = 'County', \n legend = FALSE) + \n mapview(water_systems_sac, \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE)\n```\n\n::: {#fig-suppliers-counties .cell-output-display}\n\n```{=html}\n
\n\n```\n\n\nSelected water systems (with county boundaries for reference).\n:::\n:::\n\n\n## Accessing Census Data {#sec-census-access}\n\nThe following sections demonstrate how to retrieve census data from the Decennial Census and the ACS using the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package.\n\nIn order to use the `tidycensus` R package, you'll need to obtain a personal API key from the US Census Bureau (which is free and available to anyone) by signing up here: . Once you have your API key, you'll need to register it in R by entering the command `census_api_key(key = \"YOUR API KEY\", install = TRUE)` in the console. Note that the `install = TRUE` argument means that the key is saved for all future R sessions, so you'll only need to run that command once on your computer (rather than including it in your scripts). Alternatively, you could save your key to an environment variable and retrieve it using `Sys.getenv()`. Either way will help you avoid the possibility of entering your API key into any scripts that could be shared publicly.\n\n::: callout-caution\nBecause the boundaries of census units (e.g., tracts, block groups, blocks, etc) can change over time, it's important to make sure that the version (year) of the census data you're retrieving matches the version of the census boundary dataset you're using. The methods shown below retrieve the census boundary dataset together with the census demographic data, which ensures that this won't be a potential problem. However, if you use a different workflow that retrieves the geographic boundaries and demographic data via separate processes, you should ensure that the versions are consistent.\n:::\n\n### Decennial Census {#sec-census-access-decennial}\n\nThis section retrieves census data from the Decennial Census, using the `get_decennial` function from the `tidycensus` package. As of this writing, the most recent version of the decennial census data available is from 2020, and we can set that as a variable below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# set year\ndecennial_year <- 2020\n```\n:::\n\n\nNext, we can define the list of demographic variables we'd like to retrieve tabular data for, by saving the census variables we want in the `census_vars_decennial` object (see @sec-census-variables for more information about how to discover variables of interest and find their associated codes). Note that here we're providing descriptive names associated with each variable code, which makes the data easier to work with later, but isn't strictly necessary (i.e., you could just supply the variable codes alone).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# define variables to pull from the decennial census\ncensus_vars_decennial <- c(\n 'population_hispanic_or_latino_count' = 'P2_002N', # Total Hispanic or Latino\n 'population_white_count' = 'P2_005N', # White (Not Hispanic or Latino)\n 'population_black_or_african_american_count' = 'P2_006N', # Black or African American (Not Hispanic or Latino)\n 'population_native_american_or_alaska_native_count' = 'P2_007N', # American Indian and Alaska Native (Not Hispanic or Latino)\n 'population_asian_count' = 'P2_008N', # Asian (Not Hispanic or Latino)\n 'population_pacific_islander_count' = 'P2_009N', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)\n 'population_other_count' = 'P2_010N', # Some other race (Not Hispanic or Latino)\n 'population_multiple_count' = 'P2_011N', # Two or more races (Not Hispanic or Latino)\n 'population_total_count' = 'P2_001N'\n)\n```\n:::\n\n\nThen, we can create an object that we can use to filter our request to the census API so that it will only return census units that overlap with our target areas (the object will be passed to the `filter_by` argument of the `get_decennial` function below). Note that this isn't strictly necessary (you could also apply the filter after making the API request), but may helpful to speed the query and reduce memory usage, especially in the case of large queries.\n\n::: callout-note\nAt the time of this writing, the `filter_by` argument of the tidycensus `get_decennial` function is fairly new, and not yet included in the official documentation.\n\nAlso, the `filter_by` argument is optional, and only appears to accept a simple features (sf) object with a single row / feature (e.g., a single water system), and will not accept an sf object with multiple rows / features. The process below attempts to work around this constraint by joining all of the selected water systems into a single multi-part polygon (i.e., an sf object with a single row). However, if you only want to retrieve data for census units that overlap a single target area (e.g., a single water system), you can skip this step.\n:::\n\n\n::: {.cell}\n\n```{#lst-filter_obj .r .cell-code lst-cap=\"Create object for filtering the API query\"}\nwater_system_numberems_filter <- water_systems_sac %>% \n st_union() %>% \n st_as_sf()\n```\n:::\n\n\nFinally, we can make the data request, using the `get_decennial` function, which accepts several arguments that specify exactly what data to return.\n\nFor this example we're getting data at the 'Block' level (with the `geography = 'block'` argument) for the demographic variables defined above in the `census_vars_decennial` object (which is passed to the `variables` argument). As noted above, block-level data is the most granular level of spatial data available, and should provide the best results when estimating demographics for areas whose boundaries don't align with census unit boundaries. However, depending on the use case, it may require too much time and computational resources to use the most granular spatial data, and may not be necessary to obtain a reasonable estimate. Also, keep in mind that block-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like block groups or tracts).\n\nIn addition to the tabular data associated with the demographic variables in our list, we'll also get the spatial data – i.e., the boundaries of the census blocks – by setting the `geometry = TRUE` argument. When we do this, the tabular demographic data is pre-joined to the spatial data, so the API request returns a single dataset with both the spatial and attribute (demographic) data combined.\n\n::: callout-note\nThe `tidycensus` package generally returns the Census Bureau's [cartographic boundary shapefiles](https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html) by default (as opposed to the [core TIGER/Line shapefiles](https://www.census.gov/geo/maps-data/data/tiger-line.html), which is the default format returned by the `tigris` R package). The default cartographic boundary shapefiles are pre-clipped to the US coastline, and are smaller/faster to process (alternatively you can use `cb = FALSE` to get the core TIGER/Line data) (see [here](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html#better-cartography-with-spatial-overlay)). So the default spatial data returned by `tidycensus` may be somewhat different than the default spatial data returned by the `tigris` package, but in general I find it's best to use the default `tidycensus` spatial data.\n:::\n\n::: callout-warning\nAt the block level, it appears that `tidycensus` only returns the more detailed core TIGER/Line shapefiles (i.e., they are identical to the default block-level geographic data returned by `tigris`). In some cases, that can create minor inconsistencies when working with both blocks and block groups and using the default geographies.\n:::\n\nWe also narrow down the search parameters geographically by specifying the state (with `state = 'CA'`) and counties (`county = counties_list`) we're seeking data for.\n\n::: callout-note\nSupplying a list of counties may not be strictly necessary, especially in cases where you supply the optional `filter_by` argument. However, especially when working with granular data like blocks, supplying the county argument seems to greatly speed the API request.\n:::\n\nAlso, while by default the `tidycensus` package returns data in long/tidy format, we're getting the data in wide format for this example (by specifying `output = 'wide'`) because it'll be easier to work with for the interpolation method described below to estimate demographics for non-census geographies.\n\n\n::: {.cell}\n\n```{#lst-get_decennial .r .cell-code lst-cap=\"Retrieve decennial census data\"}\n# get census data\ncensus_data_decennial <- get_decennial(geography = 'block', # can be 'block', 'block group', 'tract', 'county', etc.\n state = 'CA', \n county = counties_list,\n filter_by = water_system_numberems_filter,\n year = decennial_year,\n variables = census_vars_decennial,\n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) # convert to common coordinate system\n```\n:::\n\n\nThe output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the population of each racial/ethnic group is reported in a separate column. Here's a view of the contents and structure of the Decennial Census data that's returned:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_decennial)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 17,745\nColumns: 12\n$ GEOID \"060670019003011\", \"…\n$ NAME \"Block 3011, Block G…\n$ population_hispanic_or_latino_count 4, 6, 8, 11, 1, 14, …\n$ population_white_count 20, 4, 167, 70, 86, …\n$ population_black_or_african_american_count 2, 2, 0, 8, 9, 18, 0…\n$ population_native_american_or_alaska_native_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_asian_count 19, 5, 2, 1, 23, 8, …\n$ population_pacific_islander_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_other_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_multiple_count 8, 3, 4, 10, 5, 10, …\n$ population_total_count 53, 20, 181, 100, 12…\n$ geometry POLYGON ((-1…\n```\n\n\n:::\n:::\n\n\n### American Community Survey (ACS) {#sec-census-access-acs}\n\nTo get data from the ACS, you can use the `get_acs()` function, which is very similar to the `get_decennial()` function used above. As of this writing, the most recent version of the 5-year ACS data available is the 2018-2022 ACS, and we can set that as a variable below (which makes it easier to update this document in future years).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# set year\nacs_year <- 2022\n```\n:::\n\n\nHowever, since the ACS data contains data on a much broader set of socio-economic metrics, the requested data includes a greatly expanded list of variables, defined in the `census_vars_acs` object (see @sec-census-variables for more information about how to discover variables of interest and find their associated codes). As above, we can provide descriptive names associated with each variable code, which makes the data easier to work with later, but isn't strictly necessary (i.e., you could just supply the variable codes alone).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# define variables to pull from the ACS\ncensus_vars_acs <- c(\n # --- population variables ---\n 'population_total_count' = 'B01003_001',\n 'population_hispanic_or_latino_count' = 'B03002_012', # Total Hispanic or Latino\n 'population_white_count' = 'B03002_003', # White (Not Hispanic or Latino)\n 'population_black_or_african_american_count' = 'B03002_004', # Black or African American (Not Hispanic or Latino)\n 'population_native_american_or_alaska_native_count' = 'B03002_005', # American Indian and Alaska Native (Not Hispanic or Latino)\n 'population_asian_count' = 'B03002_006', # Asian (Not Hispanic or Latino)\n 'population_pacific_islander_count' = 'B03002_007', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)\n 'population_other_count' = 'B03002_008', # Some other race (Not Hispanic or Latino)\n 'population_multiple_count' = 'B03002_009', # Two or more races (Not Hispanic or Latino)\n \n # --- poverty variables ---\n 'poverty_total_assessed_count' = 'B17021_001', # also available from 'B17020_001' (at the tract level only). Total population for whom poverty status is determined. Poverty status was determined for all people except institutionalized people, people in military group quarters, people in college dormitories, and unrelated individuals under 15 years old. These groups were excluded from the numerator and denominator when calculating poverty rates.\n 'poverty_below_level_count' = 'B17021_002', # also available from 'B17020_002' (at the tract level only). Population whose income in the past 12 months is below federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.\n 'poverty_above_level_count' = 'B17021_019', # also available from 'B17020_010' (at the tract level only). Population whose income in the past 12 months is at or above federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.\n \n # --- household variables ---\n 'households_count' = 'B19001_001', # also available from variable 'B19053_001'. A household includes all the people who occupy a housing unit - a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied. People not living in households are classified as living in group quarters.\n 'average_household_size' = 'B25010_001', # A measure obtained by dividing the number of people living in occupied housing units by the total number of occupied housing units. This measure is rounded to the nearest hundredth.\n \n # --- household income variables ---\n 'median_household_income' = 'B19013_001', # also available from 'B19019_001' (at the tract level only). Income in the past 12 months is the sum of wage or salary income; net self-employment income; interest, dividends, or net rental or royalty income or income from estates and trusts; Social Security or Railroad Retirement income; Supplemental Security Income (SSI); public assistance or welfare payments; retirement, survivor, or disability pensions; and all other income.\n 'households_income_below_10k_count' = 'B19001_002', # count of households with income below $10,000 \n 'households_income_10k_15k_count' = 'B19001_003', # count of households with income $10,000 to $15,000 \n 'households_income_15k_20k_count' = 'B19001_004', \n 'households_income_20k_25k_count' = 'B19001_005', \n 'households_income_25k_30k_count' = 'B19001_006', \n 'households_income_30k_35k_count' = 'B19001_007', \n 'households_income_35k_40k_count' = 'B19001_008', \n 'households_income_40k_45k_count' = 'B19001_009', \n 'households_income_45k_50k_count' = 'B19001_010', \n 'households_income_50k_60k_count' = 'B19001_011', \n 'households_income_60k_75k_count' = 'B19001_012', \n 'households_income_75k_100k_count' = 'B19001_013', \n 'households_income_100k_125k_count' = 'B19001_014', \n 'households_income_125k_150k_count' = 'B19001_015', \n 'households_income_150k_200k_count' = 'B19001_016',\n 'households_income_above_200k_count' = 'B19001_017', # count of households with income above $200,000\n\n # --- housing costs variables (% of household income) ---\n # Housing Costs as a Percentage of Household Income in the past 12 months - NOTE: THIS TABLE IS NEW FOR THE 2022 ACS, AND WON'T BE AVAILABLE FOR PREVIOUS YEARS - Table B25140 shows the count of households paying more than 30% of their income towards housing costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented). The table also shows the number of households paying more than 50% of their income toward housing costs.\n # 'households_count' = 'B25140_001', \n 'households_mortgage_total_count' = 'B25140_002',\n 'households_mortgage_housing_costs_over30pct_count' = 'B25140_003',\n 'households_mortgage_housing_costs_over50pct_count' = 'B25140_004',\n 'households_no_mortgage_total_count' = 'B25140_006',\n 'households_no_mortgage_housing_costs_over30pct_count' = 'B25140_007',\n 'households_no_mortgage_housing_costs_over50pct_count' = 'B25140_008',\n 'households_rent_total_count' = 'B25140_010',\n 'households_rent_housing_costs_over30pct_count' = 'B25140_011',\n 'households_rent_housing_costs_over50pct_count' = 'B25140_012',\n \n # --- other income / economic variables ---\n 'per_capita_income' = 'B19301_001' # note: per capita income by race (at block group level) available in table B19301I\n)\n```\n:::\n\n\nFinally, we can make the data request, using the `get_acs` function, which is very similar to the `get_decennial` function described above ( @sec-census-access-decennial). However, for this example we're getting data at the 'Block Group' level (with the `geography = 'block group'` argument), which is the most granular level of spatial data available for ACS data. But, keep in mind that block group-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like tracts). Note that the `water_system_numberems_filter` object supplied to the `filter_by` argument was created above in @lst-filter_obj.\n\n\n::: {.cell}\n\n```{#lst-get_acs .r .cell-code lst-cap=\"Retrieve ACS data\"}\n# get census data\ncensus_data_acs <- get_acs(geography = 'block group',\n state = 'CA', \n county = counties_list,\n filter_by = water_system_numberems_filter,\n year = acs_year,\n survey = 'acs5',\n variables = census_vars_acs, \n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) # convert to common coordinate system\n```\n:::\n\n\nAs above, the output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the each demographic variable is reported in a separate column. Here's a view of the contents and structure of the 2022 5-year ACS data that's returned (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_acs[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 1,054\nColumns: 21\n$ GEOID \"060670081451\", \"06…\n$ NAME \"Block Group 1; Cen…\n$ population_total_countE 1768, 1881, 1098, 2…\n$ population_total_countM 520, 585, 395, 583,…\n$ population_hispanic_or_latino_countE 38, 327, 376, 782, …\n$ population_hispanic_or_latino_countM 59, 298, 280, 315, …\n$ population_white_countE 1627, 1337, 293, 18…\n$ population_white_countM 521, 475, 191, 460,…\n$ population_black_or_african_american_countE 0, 1, 272, 26, 351,…\n$ population_black_or_african_american_countM 13, 3, 251, 38, 334…\n$ population_native_american_or_alaska_native_countE 41, 0, 0, 26, 0, 0,…\n$ population_native_american_or_alaska_native_countM 58, 13, 13, 42, 13,…\n$ population_asian_countE 45, 0, 105, 58, 144…\n$ population_asian_countM 71, 13, 116, 66, 18…\n$ population_pacific_islander_countE 0, 98, 0, 0, 27, 13…\n$ population_pacific_islander_countM 13, 98, 13, 13, 50,…\n$ population_other_countE 0, 0, 39, 0, 0, 0, …\n$ population_other_countM 13, 13, 63, 13, 13,…\n$ population_multiple_countE 17, 118, 13, 39, 15…\n$ population_multiple_countM 27, 125, 20, 57, 25…\n$ geometry POLYGON ((-…\n```\n\n\n:::\n:::\n\n\nNote that the dataset that's returned includes fields corresponding to Margin of Error (MOE) for each variable we've requested (these are the fields that end with two digits and an M – e.g., \"001M\"), since, as noted above in @sec-census-datasets , the ACS is based on a sample of the population and reports estimated values.\n\n::: callout-tip\nIt is possible to calculate MOEs for derived estimates – e.g., when aggregating groups of census units – and in many cases it may be worthwhile to do that to provide extra context to the data. However, it may be difficult to do for more complex aggregations, such as the areal interpolation shown below. For guidance on how calculate MOEs for some types of derived estimates, see [this document](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf).\n\n`tidycensus` also has functions for calculating derives margins of error based on Census-supplied formulas, including [`moe_sum()`](https://walker-data.com/tidycensus/reference/moe_sum.html), [`moe_product()`](https://walker-data.com/tidycensus/reference/moe_product.html), [`moe_ratio()`](https://walker-data.com/tidycensus/reference/moe_ratio.html), and [`moe_prop()`](https://walker-data.com/tidycensus/reference/moe_prop.html).\n:::\n\nBecause we won't be incorporating those MOEs into the analysis below, we can drop them for this example, then clean up the field names.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# drop MOE fields\ncensus_data_acs <- census_data_acs %>% \n select(-matches('M$')) # the $ specifies \"ends with\"\n\n# clean names\nnames(census_data_acs) <- names(census_data_acs) %>% \n str_remove('E$') %>% # remove 'E' (estimate) from field names\n str_replace('NAM', 'NAME') # add 'E' back to NAME field\n```\n:::\n\n\nHere's a view of the contents and structure of the revised 2022 5-year ACS dataset (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_acs[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 1,054\nColumns: 21\n$ GEOID \"060670081451\", \"060…\n$ NAME \"Block Group 1; Cens…\n$ population_total_count 1768, 1881, 1098, 27…\n$ population_hispanic_or_latino_count 38, 327, 376, 782, 3…\n$ population_white_count 1627, 1337, 293, 181…\n$ population_black_or_african_american_count 0, 1, 272, 26, 351, …\n$ population_native_american_or_alaska_native_count 41, 0, 0, 26, 0, 0, …\n$ population_asian_count 45, 0, 105, 58, 144,…\n$ population_pacific_islander_count 0, 98, 0, 0, 27, 13,…\n$ population_other_count 0, 0, 39, 0, 0, 0, 0…\n$ population_multiple_count 17, 118, 13, 39, 15,…\n$ poverty_total_assessed_count 1768, 1847, 1098, 27…\n$ poverty_below_level_count 101, 328, 272, 116, …\n$ poverty_above_level_count 1667, 1519, 826, 263…\n$ households_count 680, 718, 405, 905, …\n$ average_household_size 2.59, 2.62, 2.71, 2.…\n$ median_household_income 123500, 66768, 56216…\n$ households_income_below_10k_count 18, 47, 10, 22, 6, 1…\n$ households_income_10k_15k_count 0, 0, 24, 0, 15, 231…\n$ households_income_15k_20k_count 0, 13, 18, 0, 51, 12…\n$ geometry POLYGON ((-1…\n```\n\n\n:::\n:::\n\n\nFor further analysis, we may want to get the statewide data as a baseline for comparison (this could also be done for other scales, like the county level). We can use a similar process to get that data and clean/format it to match the more detailed data obtained above. Note that in this case we're also using the 5-year ACS (even though the 1-year ACS is also available at the statewide level, and would provide more up-to-date data) so that the statewide data will be directly comparable to the block group level data obtained above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_acs_state <- get_acs(geography = 'state',\n state = 'CA', \n year = acs_year,\n survey = 'acs5',\n variables = census_vars_acs, \n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) %>% # convert to common coordinate system\n select(-matches('M$')) %>% # the $ specifies \"ends with\"\n # clean names (note this is a little different than the way we renamed fields above, either works)\n rename_with(.fn = ~ str_remove(., # remove 'E' (estimate) from field names\n pattern = 'E$')) %>% \n rename_with(.fn = ~ str_replace(., # add 'E' back to NAME field\n pattern = 'NAM', \n replacement = 'NAME'))\n```\n:::\n\n\n### Plot Census & Supplier Data {#sec-census-plot}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsystem_plot <- 'SACRAMENTO SUBURBAN WATER DISTRICT'\n```\n:::\n\n\n@fig-suppliers-census-map shows the 2022 5-year ACS census units that overlap with one of the water systems (Sacramento Suburban Water District) that we'll compute demographics for below (plotting the census units that overlap all systems tends to be slow in this format).\n\n::: {#fig-suppliers-census-map}\n\n::: {.cell}\n\n```{.r .cell-code}\n# label: fig-suppliers-census-map\n# fig-cap: \"Water system (filled polygon) and boundaries of census units (light blue) that will be used to estimate water system demographics.\"\n\nmapview(water_systems_sac %>% \n filter(water_system_name == system_plot), \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE) +\n mapview(census_data_acs %>% \n st_filter(water_systems_sac %>% \n filter(water_system_name == system_plot)), \n alpha.regions = 0, \n color = 'cyan', \n lwd = 1.3, label = 'NAME', \n layer.name = 'ACS Data', \n legend = FALSE) # zcol = 'NAME'\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\nWater system Sacramento Suburban Water District (filled polygon) and boundaries of census units (light blue) that will be used to estimate water system demographics.\n:::\n\n## Compute Water System Demographics {#sec-estimate-demographics}\n\nNow we can perform the calculations to estimate demographic characteristics for our *target* areas (water system service boundaries in the Sacramento County area) from our *source* demographic dataset (the census data we obtained above). For this example, we'll use the 2022 5-year ACS data that we retrieved above (which is saved in the `census_data_acs` variable) as our source of demographic data, and we'll estimate the following for each water system's service area:\n\n- Population of each racial/ethnic group (using the racial/ethnic categories defined in the census dataset), and each racial/ethnic group's portion of the total service area population\n- Socio-economic variables like poverty rate, median household income, income distributions, and per capita income\n\nThere are multiple ways this estimation can be done. For this example, we'll employ a three step strategy:\n\n1. Estimate values for count-based variables (typically referred to as 'extensive' data types) – e.g., total population, popultion by race/ethnicity, population above / below poverty rate, households by income bracket – for overlapping census unit, using areal interpolation. This is essentially an area weighted average, which estimates how much of each *source* unit's (census unit) count applies to the *target* area (a given water service area), based on the portion of its area that overlaps that target area – for more information about the process, see [this documentation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html) from the `areal` R package. For example, for a census unit that partially overlaps a service area, only a fraction of its count for a given variable will be applied to that service area; for a census unit that completely overlaps a service area, the full count for that variable will be applied to the service area.\n\n The major simplifying assumption of this approach is that the population or count-based variable of interest are evenly distributed within each unit in the *source* data. For example, in this case we're assuming that population (including the total population and the population of each racial/ethic group), households of each income bracket, populations above / below the poverty rate, etc. are evenly distributed within each census block group.\n\n::: callout-tip\nWhile this section uses the block group-level count data from the 5-year ACS, there may be cases where it could be useful or necessary to use more granular block-level population data from the decennial census to estimate population densities and distributions within larger census units, like block groups and tracts. This could especially be the case when estimating characteristics for small areas in rural environments. See @sec-detailed-pop-estimates and/or @sec-small-area-estimates for more information.\n:::\n\n2. Using the estimated count data (populations, households, etc), compute weighted values for variables that describe those populations, using the associated count data as a weighting factor (e.g., population-weighted values for population based data, or household-weighted values for household-based data) – these variables are typically referred to as 'intensive' data types.\n\n::: callout-tip\nAlthough it's possible to use areal interpolation to aggregate these variables as well, the multi-step approach described here can be useful because we know (from the population / household count data) that population densities differ between census units. Since we have a reasonable estimate of the count data (population, households, etc) within each census unit, using a population or household weighted average likely will yield more accurate results than a simple area-weighted average for these variables. For example, for per capita income, we can use the estimated population counts to produce a population weighted average per capita income (rather than an area weighted average per capita income, which is likely less meaningful as it over-weights large census areas with lower population densities). Areal interpolation may be more useful for cases where we generally have no other information about how density varies between the source polygons (unless significantly more effort is invested, such as looking at aerial imagery data)\n:::\n\n3. Aggregate interpolated values at the water system level.\n\n### Prepare Census Data\n\nNote that we already transformed the 2022 5-year ACS dataset into the common projected coordinate reference system used for this example immediately after we downloaded the data using the `get_acs()` function (see @lst-get_acs). This allows us to work with the water system data and the census data together in a common coordinate system.\n\nBefore calculating demographics for the *target* areas, we can do a bit of additional transformation to prepare the census data if needed. For example, we can combine the 'other' and 'multiple' racial/ethnic groupings into one 'other or multiple' racial/ethnic group.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## combine other and multiple\ncensus_data_acs <- census_data_acs %>% \n mutate('population_other_or_multiple_count' = population_other_count + population_multiple_count, \n .after = population_pacific_islander_count) %>% \n select(-c(population_other_count, population_multiple_count))\n```\n:::\n\n\nWe can also calculate the poverty rate for each census unit (which may be useful for presenting results later).\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_acs <- census_data_acs %>% \n mutate(poverty_rate_pct_calc_census_unit = case_when(\n poverty_total_assessed_count == 0 ~ 0,\n .default = 100 * poverty_below_level_count / poverty_total_assessed_count\n ), \n .after = poverty_above_level_count)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# We can also drop census units with zero population, since they won't contribute anything to our calculations.\n\n## drop census units with zero population\n# census_data_acs <- census_data_acs %>% \n# filter(population_total > 0)\n```\n:::\n\n\n### Interpolation Step 1: Areal Interpolation (for Count Variables) {#sec-areal-interp}\n\nThere are a couple of ways to implement the areal interpolation method. The example below 'manually' implements the process using functions from the `sf` package, for reasons described below. However, note that there are R packages which make it possible to perform areal interpolation with a single function - for example, the `sf` package's [`st_interpolate_aw`](https://r-spatial.github.io/sf/reference/interpolate_aw.html) function and the [`areal`](https://chris-prener.github.io/areal/) package's [`aw_interpolate`](https://chris-prener.github.io/areal/reference/aw_interpolate.html) function. This example uses a more 'manual' approach because this makes it possible to use the multi-step process described above, and also produces useful intermediate calculated data for mapping and visualization. However, we can use the single-function approach to double check our implementation of the areal interpolation approach for the count data (see @sec-check-areal-interp).\n\nFirst, we clip the census data to the water system boundaries:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_clip <- census_data_acs %>% \n mutate(census_unit_area = st_area(.)) %>% \n st_intersection(water_systems_sac) %>% \n mutate(clipped_area = st_area(.)) %>% \n mutate(areal_weight_factor = drop_units(clipped_area / census_unit_area))\n```\n:::\n\n\n@fig-map-clipped-polygons shows a plot of the census units clipped to the Sacramento Suburban Water District water system, along with the original/complete census units. Note that you can toggle layers on and off (and change their order of appearance) using the layers button in the upper left part of the map (below the zoom buttons).\n\n::: {#fig-map-clipped-polygons}\n\n::: {.cell}\n\n```{.r .cell-code}\nmapview(water_systems_sac %>% \n filter(water_system_name == system_plot), \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE) + \n mapview(census_data_acs %>% \n st_filter(water_systems_sac %>% \n filter(water_system_name == system_plot)), \n alpha.regions = 0.15, \n col.regions = 'grey', \n color = 'black', \n lwd = 1, \n label = 'NAME', \n layer.name = 'ACS Data Full', \n legend = FALSE) +\n mapview(census_data_clip %>% \n filter(water_system_name == system_plot),\n alpha.regions = 0, \n color = 'cyan', \n lwd = 1.3, \n label = 'NAME', \n layer.name = 'ACS Data Clipped', \n legend = FALSE)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\nWater system Sacramento Suburban Water District (filled polygon), boundaries of overalpping census units (grey), and clipped portions of census units (light blue) that will be used to estimate water system demographics.\n:::\n\nNext, we can compute the area-weighted counts for the portions of census units that overlap each water system boundary:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate <- census_data_clip %>% \n mutate(\n across(\n .cols = ends_with('_count'),\n .fns = ~ .x * areal_weight_factor\n )) \n```\n:::\n\n\nAs noted above, it's also possible to use pre-built functions from several R packages to perform areal interpolation in a single step. Since we're using a three-step process, which also implements population weighted averaging for some variables, we're not using those functions directly in this example. However, they can be a useful check to validate our computed count data, but only after we aggregate our data at the system level – see @sec-check-areal-interp for more details.\n\n### Interpolation Step 2: Compute Population Weighted Values (Intensive Variables) {#sec-pop-interp}\n\nCompute population weighted values\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate <- census_data_interpolate %>% \n mutate(average_household_size_weighted = average_household_size * households_count,\n median_household_income_weighted = median_household_income * households_count,\n per_capita_income_weighted = per_capita_income * population_total_count)\n```\n:::\n\n\n::: callout-caution\nTo calculate an aggregated value for a variable like median household income, which depends on the distribution of the underling data, it may be worth considering whether a weighed average value is an appropriate measure. In some cases, it may be more appropriate to use the counts in each income bracket to estimate a median income, and/or present the income distribution rather than a single value.\n\nFor a discussion of the problem and a proposed solution, see [this document](https://www.documentcloud.org/documents/6165014-How-to-Recalculate-a-Median.html#document/p1).\n:::\n\n### Interpolation Step 3: Aggregate by Water System\n\nNext, we need to combine the weighted values calculated above to produce the estimates for each water system, and can also use those combined values to compute some additional metrics for each system (like rates, income distributions, etc.).\n\n#### Combine Results by Water System\n\nFirst, combine the results by summing all of the count-based variables (derived from areal interpolation), and calculating weighted averages for all variables computed in step 2 above. Note that we have to first calculate the denominator for each variable calculated with population weighted interpolation, because some of those variables contain missing values for records where the denominator is present (and if we don't remove the missing values, we get an `NA` for any water system that contains a block group with a missing value for that variable). For example, there are block groups where the median household income is missing, but the total household count is available for that block group – in that case, the weighted average should not include the households in that block group in the denominator; otherwise, the true value will be underestimated.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- census_data_interpolate %>% \n group_by(water_system_name) %>% \n mutate(\n average_household_size_denominator = if_else(is.na(average_household_size), 0, households_count),\n median_household_income_denominator = if_else(is.na(median_household_income), 0, households_count),\n per_capita_income_denominator = if_else(is.na(per_capita_income), 0, population_total_count)\n ) %>% \n summarize(\n across(\n .cols = ends_with('_count'),\n .fns = ~ sum(.x)\n ),\n average_household_size_hh_weighted = \n sum(average_household_size_weighted, na.rm = TRUE) / \n sum(average_household_size_denominator),\n median_household_income_hh_weighted = \n sum(median_household_income_weighted, na.rm = TRUE) /\n sum(median_household_income_denominator),\n per_capita_income_pop_weighted = \n sum(per_capita_income_weighted, na.rm = TRUE) / \n sum(per_capita_income_denominator)\n ) %>% \n ungroup()\n```\n:::\n\n\n#### Check - Count Variables Estimated with Areal Interpolation {#sec-check-areal-interp}\n\nAs noted above, it's also possible to use pre-built functions for areal interpolation. This section demonstrates those functions and uses them as a check of our computed count data.\n\nFrom the `sf` package, we can use the `st_interpolate_aw` function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# NOTE: it's only necessary to check the estimated values for one variable - \n# this just checks the total estimated population\n\n# sf package\ncheck_sf <- st_interpolate_aw(x = census_data_acs %>% \n select(population_total_count),\n to = water_systems_sac,\n extensive = TRUE) %>% \n bind_cols(water_systems_sac %>% st_drop_geometry)\n\n# check - should be TRUE if results are equivalent\nall(check_sf %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5) ==\n water_system_demographics %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\nFrom the `areal` package, we can use the `aw_interpolate` function. Note that there are some settings that you may need to modify in the `aw_interpolate` function depending on the type of analysis you're doing. In particular, for more information about the `weight` argument – which can be either `sum` or `total` – see [this section of the documentation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html#calculating-weights-for-extensive-interpolations). For more information about extensive versus intensive interpolations, see [this section of the documenation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html#extensive-and-intensive-interpolations) (as noted above, the method applied here avoids using areal interpolation to calculate intensive variables, because area may not be a good metric for determining how to weight those variables, considering that we can estimate associated counts for populations/households/etc.).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# NOTE: it's only necessary to check the estimated values for one variable - \n# this just checks the total estimated population\n\n# areal package\ncheck_areal <- aw_interpolate(water_systems_sac,\n tid = water_system_name,\n source = census_data_acs,\n sid = GEOID,\n weight = 'total',\n extensive = c('population_total_count'))\n\n# check - should be TRUE if results are equivalent\nall(check_areal %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5) ==\n water_system_demographics %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\n#### Clean & Format Summarized Water System Demographic Data {#sec-results-clean}\n\nWe could stop here, and save the dataset containing the results to an output file (done below - see @sec-results-save). But, it may be useful to do some additional computations and re-formatting before saving the dataset. For example, in this case it may be useful to calculate the racial/ethnic breakdown of each system's population as percentages of the total population (in addition to the total counts computed above), and calculate other rates / distributions.\n\nFirst we can add fields with each racial/ethnic group's estimated percent of the total population within each water system's service area:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>%\n mutate(\n across(\n .cols = starts_with('population_'),\n .fns = ~ round(.x / population_total_count * 100, 2),\n .names = \"{str_replace(.col, '_count', '_percent')}\"\n ),\n .after = population_other_or_multiple_count) %>% \n select(-population_total_percent) # this always equals 1, not needed\n```\n:::\n\n\nWe can also calculate the estimated poverty rate for each water system's service area.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(poverty_rate_percent = case_when(\n poverty_total_assessed_count == 0 ~ 0,\n .default = 100 * poverty_below_level_count / poverty_total_assessed_count\n ), \n .after = poverty_above_level_count)\n```\n:::\n\n\nAnd compute income brackets in 25k increments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(households_income_0_25k_count = \n households_income_below_10k_count + \n households_income_10k_15k_count + \n households_income_15k_20k_count +\n households_income_20k_25k_count,\n households_income_25k_50k_count =\n households_income_25k_30k_count + \n households_income_30k_35k_count +\n households_income_35k_40k_count +\n households_income_40k_45k_count +\n households_income_45k_50k_count,\n households_income_50k_75k_count =\n households_income_50k_60k_count +\n households_income_60k_75k_count,\n .after = households_income_above_200k_count\n ) # note - above 75k is already in 25k increments\n```\n:::\n\n\nAnd compute income brackets in 50k increments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(households_income_0_50k_count = \n households_income_0_25k_count + \n households_income_25k_50k_count,\n households_income_50k_100k_count =\n households_income_50k_75k_count +\n households_income_75k_100k_count,\n households_income_100k_150k_count =\n households_income_100k_125k_count +\n households_income_125k_150k_count,\n .after = households_income_50k_75k_count\n ) # above 150k is already in 50k increments\n```\n:::\n\n\nAnd compute grouped median household income:\n\n\n::: {.cell}\n\n:::\n\n\nAnd compute \\# and % of households below income thresholds:\n\n\n::: {.cell}\n\n:::\n\n\nAnd, compute the portion of households paying more than 30% / 50% of their income toward housing costs:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# portion of households paying more than 30% / 50% of income on housing\nwater_system_demographics <- water_system_demographics %>%\n mutate(households_all_housing_costs_over30pct_percent = \n 100 * (households_mortgage_housing_costs_over30pct_count + \n households_no_mortgage_housing_costs_over30pct_count +\n households_rent_housing_costs_over30pct_count) / \n households_count, \n .after = households_rent_housing_costs_over50pct_count) %>% \n mutate(households_all_housing_costs_over50pct_percent = \n 100 * (households_mortgage_housing_costs_over50pct_count + \n households_no_mortgage_housing_costs_over50pct_count +\n households_rent_housing_costs_over50pct_count) / \n households_count,\n .after = households_all_housing_costs_over30pct_percent)\n```\n:::\n\n\nFinally, we can round the estimated values to appropriate levels of precision:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>%\n mutate(\n across(\n .cols = ends_with('_count'),\n .fns = ~ round(.x, 0)\n )) %>%\n mutate(\n across(\n .cols = ends_with('_percent'),\n .fns = ~ round(.x, 2)\n ))\n```\n:::\n\n\nWe now have a dataset with the selected metrics from the census data (*source* data) estimated for each of the water system service areas (*target* geographic features). Here's a view of the contents and structure of the re-formatted dataset (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_system_demographics[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 21\n$ water_system_name \"B & W RESORT MARI…\n$ population_total_count 0, 22603, 33120, 1…\n$ population_hispanic_or_latino_count 0, 10939, 5245, 34…\n$ population_white_count 0, 3504, 19456, 23…\n$ population_black_or_african_american_count 0, 2663, 3199, 197…\n$ population_native_american_or_alaska_native_count 0, 121, 113, 70, 0…\n$ population_asian_count 0, 4075, 2947, 108…\n$ population_pacific_islander_count 0, 240, 77, 59, 0,…\n$ population_other_or_multiple_count 0, 1060, 2082, 110…\n$ population_hispanic_or_latino_percent 41.43, 48.40, 15.8…\n$ population_white_percent 52.47, 15.50, 58.7…\n$ population_black_or_african_american_percent 0.00, 11.78, 9.66,…\n$ population_native_american_or_alaska_native_percent 0.00, 0.54, 0.34, …\n$ population_asian_percent 4.55, 18.03, 8.90,…\n$ population_pacific_islander_percent 0.00, 1.06, 0.23, …\n$ population_other_or_multiple_percent 1.56, 4.69, 6.29, …\n$ poverty_total_assessed_count 0, 22556, 33034, 1…\n$ poverty_below_level_count 0, 6010, 3389, 313…\n$ poverty_above_level_count 0, 16546, 29645, 6…\n$ poverty_rate_percent 22.60, 26.64, 10.2…\n$ geometry POLYGON ((…\n```\n\n\n:::\n:::\n\n\n@tbl-water-sys-demographics-rev provides a complete view of the cleaned and re-formatted dataset. These results are saved locally in tabular and spatial format in @sec-results-save.\n\n\n::: {#tbl-water-sys-demographics-rev .cell .tbl-cap-location-top tbl-cap='Water System Demographics'}\n\n```{.r .cell-code}\nwater_system_demographics %>%\n kable(caption = 'A Caption') %>%\n scroll_box(height = \"400px\")\n```\n\n::: {.cell-output-display}\n`````{=html}\n
\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
A Caption
water_system_name population_total_count population_hispanic_or_latino_count population_white_count population_black_or_african_american_count population_native_american_or_alaska_native_count population_asian_count population_pacific_islander_count population_other_or_multiple_count population_hispanic_or_latino_percent population_white_percent population_black_or_african_american_percent population_native_american_or_alaska_native_percent population_asian_percent population_pacific_islander_percent population_other_or_multiple_percent poverty_total_assessed_count poverty_below_level_count poverty_above_level_count poverty_rate_percent households_count households_income_below_10k_count households_income_10k_15k_count households_income_15k_20k_count households_income_20k_25k_count households_income_25k_30k_count households_income_30k_35k_count households_income_35k_40k_count households_income_40k_45k_count households_income_45k_50k_count households_income_50k_60k_count households_income_60k_75k_count households_income_75k_100k_count households_income_100k_125k_count households_income_125k_150k_count households_income_150k_200k_count households_income_above_200k_count households_income_0_25k_count households_income_25k_50k_count households_income_50k_75k_count households_income_0_50k_count households_income_50k_100k_count households_income_100k_150k_count households_mortgage_total_count households_mortgage_housing_costs_over30pct_count households_mortgage_housing_costs_over50pct_count households_no_mortgage_total_count households_no_mortgage_housing_costs_over30pct_count households_no_mortgage_housing_costs_over50pct_count households_rent_total_count households_rent_housing_costs_over30pct_count households_rent_housing_costs_over50pct_count households_all_housing_costs_over30pct_percent households_all_housing_costs_over50pct_percent average_household_size_hh_weighted median_household_income_hh_weighted per_capita_income_pop_weighted geometry
B & W RESORT MARINA 0 0 0 0 0 0 0 0 41.43 52.47 0.00 0.00 4.55 0.00 1.56 0 0 0 22.60 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40.53 21.84 2.030000 51977.00 40522.00 POLYGON ((-138282.2 13643.2...
CAL AM FRUITRIDGE VISTA 22603 10939 3504 2663 121 4075 240 1060 48.40 15.50 11.78 0.54 18.03 1.06 4.69 22556 6010 16546 26.64 6900 354 339 521 263 367 302 359 355 565 692 876 784 459 235 287 141 1477 1948 1569 3425 2352 694 1620 745 345 1236 95 58 4044 2131 1059 43.06 21.18 3.257806 53040.44 20519.57 POLYGON ((-127001.4 54266.8...
CALAM - ANTELOPE 33120 5245 19456 3199 113 2947 77 2082 15.84 58.74 9.66 0.34 8.90 0.23 6.29 33034 3389 29645 10.26 10529 315 184 101 122 116 469 248 368 449 737 1077 1669 1501 1077 1158 937 723 1650 1814 2373 3483 2578 5544 1861 621 1747 184 106 3238 1678 649 35.36 13.07 3.134530 93741.55 34660.44 POLYGON ((-120906.3 77326.5...
CALAM - ARDEN 10112 3433 2392 1977 70 1082 59 1100 33.95 23.65 19.55 0.69 10.70 0.58 10.87 10034 3130 6904 31.19 3823 201 259 239 167 319 190 142 236 207 440 394 535 228 148 62 58 866 1093 834 1959 1368 376 265 84 46 133 8 3 3426 2124 1170 57.97 31.87 2.623643 49624.62 22770.82 POLYGON ((-123052 64046.06,...
CALAM - ISLETON 34 14 17 0 0 2 0 1 42.06 51.14 0.00 0.00 4.55 0.00 2.25 34 7 27 20.89 16 1 1 0 1 1 0 1 1 0 2 1 1 3 1 0 1 4 3 3 6 4 4 6 4 1 7 2 2 4 1 1 39.45 20.68 2.078994 57361.76 40672.21 POLYGON ((-138730.9 17272.8...
CALAM - LINCOLN OAKS 42916 9056 26529 1486 143 2706 288 2708 21.10 61.82 3.46 0.33 6.31 0.67 6.31 42823 4074 38749 9.51 15621 740 375 308 622 488 616 585 629 645 1035 1641 2442 1889 1272 1555 778 2046 2964 2675 5010 5118 3161 7390 2671 919 3332 503 298 4900 2523 1302 36.46 16.13 2.730281 82035.52 33728.94 POLYGON ((-117495.2 73240.4...
CALAM - PARKWAY 58635 18665 8921 6965 21 19228 1386 3449 31.83 15.21 11.88 0.04 32.79 2.36 5.88 58434 9804 48630 16.78 17667 1081 753 514 713 694 640 713 700 727 1145 1918 2490 1634 1532 1546 865 3061 3475 3064 6536 5554 3166 7163 2719 1049 3418 647 383 7086 3517 1917 38.96 18.96 3.284608 72938.51 26938.14 POLYGON ((-124522.5 52428.5...
CALAM - SUBURBAN ROSEMONT 57897 13791 25062 7725 91 6905 380 3942 23.82 43.29 13.34 0.16 11.93 0.66 6.81 57661 8374 49287 14.52 21045 1156 612 472 744 653 568 582 874 628 1289 2508 3438 2595 1594 1671 1661 2985 3305 3797 6290 7235 4189 8262 2262 730 3425 439 271 9358 4521 2320 34.31 15.78 2.726937 81229.87 34497.37 POLYGON ((-119360.4 58937.6...
CALAM - WALNUT GROVE 12 5 5 0 0 1 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 12 2 10 15.75 5 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 1 1 2 2 2 0 2 0 0 1 0 0 2 1 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-131705.3 26403.6...
CALIFORNIA STATE FAIR 532 78 262 91 0 48 0 52 14.68 49.25 17.13 0.00 9.10 0.00 9.85 526 152 374 28.89 285 65 13 8 5 9 14 2 0 23 29 30 35 21 11 17 3 91 48 59 140 93 32 0 0 0 0 0 0 285 177 95 62.11 33.45 1.820000 52886.00 33141.00 POLYGON ((-125611.2 65287.3...
CARMICHAEL WATER DISTRICT 39253 6192 25026 2230 68 3326 295 2116 15.78 63.76 5.68 0.17 8.47 0.75 5.39 38700 5000 33700 12.92 15937 570 534 513 472 398 607 522 684 541 996 1595 1782 1724 1200 1678 2122 2088 2751 2591 4839 4373 2924 5256 1399 669 3147 358 177 7534 4056 2068 36.48 18.29 2.405914 96967.64 46901.80 POLYGON ((-117711 65208.06,...
CITRUS HEIGHTS WATER DISTRICT 68912 12380 48148 2092 162 2875 71 3186 17.96 69.87 3.04 0.23 4.17 0.10 4.62 68581 6961 61620 10.15 25633 1012 569 446 769 665 867 841 723 1165 1875 3057 3954 2744 2332 2533 2080 2796 4261 4932 7057 8886 5075 10344 3553 1380 4293 554 286 10996 5759 2620 38.49 16.72 2.653808 82960.78 37323.17 POLYGON ((-114405.5 72735.6...
CITY OF SACRAMENTO MAIN 516189 151211 159508 62060 1249 98585 9242 34334 29.29 30.90 12.02 0.24 19.10 1.79 6.65 508800 77003 431797 15.13 194000 9540 9401 6217 6407 5804 6255 6278 6139 6729 13349 17396 26982 20453 15080 17439 20531 31564 31205 30745 62769 57728 35533 67435 21769 8217 29857 3476 1805 96708 47510 24524 37.50 17.81 2.609594 84694.02 39105.61 POLYGON ((-133314 51929.51,...
DEL PASO MANOR COUNTY WATER DI 5592 687 3967 390 15 119 31 382 12.28 70.95 6.97 0.26 2.13 0.56 6.84 5592 621 4971 11.10 2222 170 45 54 66 21 51 66 237 40 158 278 166 171 120 347 231 336 416 436 752 601 291 922 326 189 572 112 68 729 509 114 42.59 16.67 2.516895 90374.38 40254.83 POLYGON ((-120068.3 65980.9...
DELTA CROSSING MHP 0 0 0 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 0 0 0 17.42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 POLYGON ((-132498.4 40410.2...
EAST WALNUT GROVE [SWS] 3 2 2 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 3 1 3 15.75 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-132506.3 25966.4...
EDGEWATER MOBILE HOME PARK 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-153562.3 7972.28...
EL DORADO MOBILE HOME PARK 139 84 11 15 0 19 0 11 60.26 7.80 10.48 0.00 13.27 0.00 8.19 139 60 79 43.12 48 6 10 0 4 6 1 0 8 1 7 0 1 0 4 0 1 19 15 8 34 9 4 3 0 0 10 5 5 35 17 10 46.70 31.09 2.710000 29468.00 17394.00 POLYGON ((-124341.2 53660.1...
EL DORADO WEST MHP 148 89 12 16 0 20 0 12 60.26 7.80 10.48 0.00 13.27 0.00 8.19 147 63 84 43.12 51 6 10 0 4 6 1 0 8 2 8 0 1 0 5 0 1 20 16 8 37 9 5 3 0 0 10 6 6 38 18 10 46.70 31.09 2.710000 29468.00 17394.00 POLYGON ((-124532.3 53662.9...
ELEVEN OAKS MOBILE HOME COMMUNITY 233 45 94 56 0 37 0 1 19.27 40.19 24.01 0.00 15.91 0.00 0.62 233 87 146 37.48 71 7 2 3 6 10 2 1 1 3 1 13 17 3 0 3 0 17 17 15 34 32 3 8 3 1 21 1 1 42 29 23 46.85 35.36 3.280000 60521.00 18213.00 POLYGON ((-119819.8 71950.9...
ELK GROVE WATER SERVICE 42647 7656 19550 3209 70 8939 388 2835 17.95 45.84 7.53 0.16 20.96 0.91 6.65 42258 3264 38994 7.72 13239 430 202 253 224 328 102 345 292 245 667 1117 1441 1470 1386 1907 2832 1108 1311 1784 2420 3225 2856 7552 1903 628 2861 283 113 2826 1595 864 28.55 12.12 3.179068 122771.00 43429.03 POLYGON ((-118730.1 42496.7...
FAIR OAKS WATER DISTRICT 36003 4655 27050 708 94 1372 12 2113 12.93 75.13 1.97 0.26 3.81 0.03 5.87 35775 2852 32923 7.97 14233 546 332 113 229 208 391 206 469 293 804 1064 2214 1447 1568 1875 2474 1220 1568 1868 2788 4082 3016 7090 1872 845 3092 261 108 4051 1844 768 27.94 12.09 2.480217 107985.74 54435.01 POLYGON ((-112317.5 69577.6...
FLORIN COUNTY WATER DISTRICT 9951 2963 1548 1394 7 2743 866 430 29.78 15.56 14.01 0.07 27.56 8.70 4.32 9835 1285 8550 13.06 2755 84 125 53 154 103 46 86 176 224 258 223 432 297 215 143 137 417 635 481 1051 913 512 981 426 90 675 49 28 1100 476 260 34.48 13.70 3.573005 67048.12 24517.64 POLYGON ((-122791.9 52602.2...
FOLSOM STATE PRISON 3536 1257 652 1390 57 70 34 77 35.55 18.43 39.31 1.60 1.97 0.96 2.17 29 1 28 2.20 23 0 0 0 0 0 0 0 0 0 0 0 0 4 4 12 1 0 0 0 0 1 8 3 1 0 0 0 0 19 0 0 4.67 0.53 2.726311 161047.22 2271.22 POLYGON ((-99838.11 75350.0...
FOLSOM, CITY OF - ASHLAND 3845 318 2934 43 1 125 1 423 8.26 76.32 1.12 0.03 3.26 0.02 10.99 3780 143 3637 3.79 1800 44 17 104 43 34 209 103 74 43 43 158 248 132 80 123 345 208 463 201 670 449 212 594 164 90 847 368 82 358 196 74 40.42 13.70 2.087286 76810.17 56773.97 POLYGON ((-102605.9 74922.1...
FOLSOM, CITY OF - MAIN 62462 8433 35222 1693 105 12934 177 3897 13.50 56.39 2.71 0.17 20.71 0.28 6.24 62115 3405 58710 5.48 22409 807 218 390 477 418 283 329 373 451 670 1181 2255 2382 1747 4083 6344 1892 1855 1851 3747 4106 4129 11491 2728 1179 3590 237 146 7328 3010 1321 26.66 11.81 2.769356 141856.37 58469.35 POLYGON ((-101870.6 66094.5...
FREEPORT MARINA 3 2 1 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 3 1 3 17.42 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 POLYGON ((-130970.7 50553.3...
GALT, CITY OF 21490 9314 9952 520 22 872 20 789 43.34 46.31 2.42 0.10 4.06 0.09 3.67 21341 1404 19937 6.58 6988 139 168 243 210 141 342 161 347 152 550 687 807 1096 504 789 650 761 1143 1237 1904 2044 1601 3724 907 523 1454 109 44 1809 906 414 27.52 14.05 3.048249 90632.93 33685.54 MULTIPOLYGON (((-113921.6 2...
GOLDEN STATE WATER CO - ARDEN WATER SERV 6556 1706 2887 322 0 888 11 742 26.02 44.04 4.91 0.00 13.54 0.16 11.32 6453 1626 4828 25.19 2173 19 82 19 141 53 173 34 179 37 139 351 319 132 172 141 183 262 476 490 738 809 303 728 239 123 131 0 0 1315 599 335 38.56 21.09 2.897716 66579.36 30417.36 POLYGON ((-121143.9 63698.4...
GOLDEN STATE WATER CO. - CORDOVA 48115 9009 26042 3982 229 6050 188 2615 18.72 54.13 8.28 0.48 12.57 0.39 5.43 47835 4408 43427 9.21 18022 509 482 310 496 480 437 389 469 598 1276 1692 2653 2565 1671 1948 2047 1796 2374 2968 4170 5621 4236 7380 2174 836 3506 364 201 7137 2744 1410 29.31 13.58 2.650717 96697.06 42695.41 POLYGON ((-112985.4 62375.3...
HAPPY HARBOR (SWS) 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-139842 11256.85,...
HOLIDAY MOBILE VILLAGE 46 18 7 3 0 15 0 3 38.66 15.12 7.10 0.00 32.49 0.00 6.64 46 10 36 22.33 16 2 1 0 1 0 1 5 1 0 0 2 2 1 0 0 0 4 7 2 11 4 1 2 0 0 2 1 1 12 6 4 44.88 28.55 2.860000 38491.00 16707.00 POLYGON ((-123874.7 52485.3...
HOOD WATER MAINTENCE DIST [SWS] 1 1 0 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 1 0 1 17.42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 MULTIPOLYGON (((-132506 403...
IMPERIAL MANOR MOBILEHOME COMMUNITY 209 52 129 1 0 6 0 21 24.93 61.63 0.45 0.00 2.93 0.00 10.05 209 45 164 21.48 124 4 26 18 3 0 16 7 5 6 1 4 29 0 0 0 6 51 34 5 84 34 0 9 0 0 89 37 34 27 27 22 50.97 45.07 1.680363 31831.84 32878.17 POLYGON ((-115390.2 74250.3...
KORTHS PIRATES LAIR 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-137314.9 10213.1...
LAGUNA DEL SOL INC 24 5 18 0 0 0 0 0 21.55 75.20 0.00 0.67 1.46 0.00 1.12 24 2 22 6.40 9 0 1 1 0 0 0 0 0 0 0 0 2 0 0 1 2 2 1 0 3 2 1 5 2 2 3 0 0 2 0 0 23.37 23.37 2.640000 95227.00 50793.00 POLYGON ((-104662.2 49197.3...
LAGUNA VILLAGE RV PARK 20 3 2 1 0 11 2 2 12.79 8.48 7.28 0.00 52.62 8.38 10.45 20 2 18 11.79 7 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 2 2 1 3 1 0 1 0 0 3 1 0 32.52 12.26 3.030000 84332.00 32668.00 POLYGON ((-122461.8 48066.6...
LINCOLN CHAN-HOME RANCH 4 2 2 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 4 1 3 15.75 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-136788.6 36526.1...
LOCKE WATER WORKS CO [SWS] 1 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 1 0 1 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-131952.8 27176.6...
MAGNOLIA MUTUAL WATER 1 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 1 0 1 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-137022.9 36118.9...
MC CLELLAN MHP 269 52 108 65 0 43 0 2 19.27 40.19 24.01 0.00 15.91 0.00 0.62 269 101 168 37.48 82 8 2 3 7 11 2 2 1 3 1 15 20 3 0 3 0 20 19 17 39 36 3 9 4 2 25 1 1 48 34 27 46.85 35.36 3.280000 60521.00 18213.00 POLYGON ((-119814.9 72169.0...
OLYMPIA MOBILODGE 290 70 81 18 0 101 16 3 24.12 28.03 6.30 0.00 34.95 5.53 1.08 290 68 222 23.43 114 11 0 6 10 9 3 13 0 0 10 19 8 3 12 5 5 28 25 29 53 36 14 31 22 10 51 12 10 33 9 7 37.35 23.74 2.510000 53786.00 29451.00 POLYGON ((-123342.4 53061.6...
ORANGE VALE WATER COMPANY 17387 2658 12308 241 181 633 86 1281 15.28 70.79 1.39 1.04 3.64 0.49 7.37 17288 1904 15384 11.01 6595 389 111 61 94 226 58 274 120 181 372 752 990 901 626 678 766 655 858 1123 1512 2113 1526 3246 1021 453 1686 315 185 1663 693 305 30.77 14.29 2.608348 92693.71 42509.89 POLYGON ((-108131.3 74330.4...
PLANTATION MOBILE HOME PARK 10 4 1 1 0 3 0 1 38.66 15.12 7.10 0.00 32.49 0.00 6.64 10 2 7 22.33 3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 2 1 0 1 0 0 0 0 0 2 1 1 44.88 28.55 2.860000 38491.00 16707.00 POLYGON ((-124180.4 53321.5...
RANCHO MARINA 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-138041.4 11320.9...
RANCHO MURIETA COMMUNITY SERVI 3239 661 2157 120 7 188 0 106 20.42 66.59 3.71 0.21 5.80 0.00 3.26 3239 199 3040 6.13 1402 59 42 0 6 5 18 74 27 75 44 81 88 118 204 241 319 108 199 125 307 213 323 1029 205 103 270 63 57 103 41 40 22.02 14.30 2.307704 144993.81 66451.34 POLYGON ((-92457.85 52674.7...
RIO COSUMNES CORRECTIONAL CENTER [SWS] 22 6 8 4 1 1 0 2 25.74 37.49 16.82 2.97 4.50 1.81 10.66 4 0 4 0.00 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 23.75 0.00 3.450000 115897.00 11095.00 POLYGON ((-124032.5 32206.2...
RIO LINDA/ELVERTA COMMUNITY WATER DIST 11831 2585 7595 337 17 765 21 512 21.85 64.19 2.85 0.14 6.46 0.18 4.33 11829 1619 10210 13.69 3762 177 156 67 169 56 113 116 114 118 173 297 607 492 431 416 259 569 518 470 1087 1077 922 1918 573 157 773 114 47 1070 519 340 32.07 14.49 3.123012 83603.04 33734.49 POLYGON ((-126609.8 73568.2...
RIVER'S EDGE MARINA & RESORT 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-141102.2 11867.3...
SAC CITY MOBILE HOME COMMUNITY LP 229 82 17 7 0 123 0 0 35.66 7.50 3.27 0.00 53.57 0.00 0.00 229 110 119 48.14 89 11 16 9 10 8 0 0 4 2 7 1 13 4 4 0 0 46 14 8 60 21 8 4 2 2 15 2 0 71 41 30 48.95 35.43 2.530000 22380.00 16689.00 POLYGON ((-124544.3 56147.0...
SACRAMENTO SUBURBAN WATER DISTRICT 193126 43047 97872 17684 834 20602 624 12464 22.29 50.68 9.16 0.43 10.67 0.32 6.45 190984 33399 157585 17.49 72505 3817 3001 3069 2884 3205 3100 3337 2893 2342 5541 6792 10037 6480 4342 5488 6177 12771 14878 12333 27649 22370 10822 23467 7204 2837 12037 2087 1160 37001 21072 10274 41.88 19.68 2.635471 73746.51 35321.18 MULTIPOLYGON (((-122206.9 6...
SAN JUAN WATER DISTRICT 30122 3409 21349 831 287 2762 17 1467 11.32 70.87 2.76 0.95 9.17 0.06 4.87 30014 1718 28297 5.72 10750 389 168 100 275 128 160 111 133 127 472 684 984 854 876 1032 4256 932 658 1156 1591 2141 1730 6210 1754 724 2883 528 357 1658 726 339 27.98 13.21 2.783858 160696.10 72978.42 POLYGON ((-104526.8 73044.7...
SCWA - ARDEN PARK VISTA 8086 990 6016 270 12 396 8 395 12.24 74.40 3.33 0.15 4.90 0.10 4.88 8038 523 7515 6.51 3303 79 36 48 77 65 38 18 49 162 139 187 253 465 208 416 1065 241 330 326 571 579 673 1823 520 112 673 76 23 807 384 225 29.69 10.90 2.424845 139081.65 84548.46 POLYGON ((-120985.4 62883.8...
SCWA - LAGUNA/VINEYARD 145495 27502 38496 16568 246 50411 2220 10052 18.90 26.46 11.39 0.17 34.65 1.53 6.91 145198 14710 130489 10.13 45137 1692 666 742 878 839 1336 850 788 752 2363 3198 6037 5323 5057 6578 8038 3978 4565 5561 8543 11598 10380 24581 7232 2916 7878 861 471 12677 6368 3337 32.04 14.90 3.207447 114494.03 41415.71 MULTIPOLYGON (((-126550 404...
SCWA MATHER-SUNRISE 18249 2708 8114 1553 23 4507 164 1180 14.84 44.47 8.51 0.12 24.70 0.90 6.47 18211 1005 17206 5.52 5503 228 35 97 57 68 39 12 20 36 189 320 533 645 755 1003 1469 416 174 509 590 1042 1399 3756 881 266 855 60 43 893 318 167 22.89 8.66 3.296327 147818.01 47448.37 MULTIPOLYGON (((-112526.7 5...
SEQUOIA WATER ASSOC 0 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 0 0 0 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-136929.5 36128.1...
SOUTHWEST TRACT W M D [SWS] 174 29 42 24 3 75 1 0 16.58 24.48 13.69 1.55 43.11 0.60 0.00 174 38 136 21.83 57 1 2 7 0 7 0 0 10 12 3 2 5 0 1 2 4 10 29 6 39 10 1 3 1 0 8 0 0 45 29 7 52.53 12.40 3.040000 45671.00 36348.00 MULTIPOLYGON (((-125843.6 5...
SPINDRIFT MARINA 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-139920.3 11468.3...
TOKAY PARK WATER CO 652 214 134 37 0 239 0 28 32.80 20.55 5.61 0.00 36.69 0.00 4.35 652 113 539 17.29 173 2 2 3 21 0 0 13 13 10 18 27 36 14 4 10 0 27 36 45 64 81 18 81 38 11 44 0 0 48 32 12 40.57 13.58 3.757973 62802.24 19400.05 POLYGON ((-122824.8 54197.9...
TUNNEL TRAILER PARK 0 0 0 0 0 0 0 0 49.74 34.94 0.00 0.00 4.65 0.00 10.67 0 0 0 0.00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20.30 0.00 2.950000 153092.00 42507.00 POLYGON ((-136160.9 24171.2...
VIEIRA'S RESORT, INC 4 2 2 0 0 0 0 0 41.43 52.47 0.00 0.00 4.55 0.00 1.56 4 1 3 22.60 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 40.53 21.84 2.030000 51977.00 40522.00 POLYGON ((-143780.4 18567.4...
WESTERNER MOBILE HOME PARK 32 6 6 9 0 10 0 1 17.59 17.62 28.31 0.55 31.36 0.00 4.57 31 7 24 23.76 10 1 0 0 0 1 0 0 1 0 2 1 1 2 0 1 0 1 2 3 3 4 2 4 2 1 1 0 0 5 3 2 56.87 29.49 3.160000 59296.00 23437.00 POLYGON ((-122657.2 48977.8...
\n\n`````\n:::\n:::\n\n\n
\n\n#### Clean & Format Intermediate (Clipped) Data\n\nFor visualization and exploration, it'll also be useful to apply some additional formatting to the clipped block-group data used in intermediate parts of the interpolation process above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# portion of households paying more than 30% / 50% of income on housing\ncensus_data_interpolate <- census_data_interpolate %>% \n mutate(households_all_housing_costs_over30pct_percent = \n 100 * (households_mortgage_housing_costs_over30pct_count + \n households_no_mortgage_housing_costs_over30pct_count + \n households_rent_housing_costs_over30pct_count) / households_count, \n .after = households_rent_housing_costs_over50pct_count) %>% \n mutate(households_all_housing_costs_over50pct_percent = \n 100 * (households_mortgage_housing_costs_over50pct_count + \n households_no_mortgage_housing_costs_over50pct_count + \n households_rent_housing_costs_over50pct_count) / households_count,\n .after = households_all_housing_costs_over30pct_percent)\n\n# drop water system data except name (water_system_name)\ncensus_data_interpolate <- census_data_interpolate %>% \n select(-census_unit_area, -clipped_area) %>% \n select(-c(water_system_number:water_system_population_reported)) %>% \n select(-c(average_household_size_weighted:per_capita_income_weighted)) %>% \n relocate(water_system_name, .after = NAME) %>% \n relocate(areal_weight_factor, .after = water_system_name)\n```\n:::\n\n\n#### Transform Results to Long Format {#sec-results-transform-long}\n\nFor further analysis and exploration / visualization of the results, it will help to convert the results from wide to long format, and edit the group names so that they can be used as titles.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# pivot from wide to long format\nwater_system_demographics_long <- water_system_demographics %>% \n # select(water_system_name, starts_with('percent_')) %>% # select only the fields with percentages, and the water system name/id\n # convert to long format\n # st_drop_geometry() %>% \n pivot_longer(cols = !c(water_system_name, geometry), \n names_to = 'variable', \n values_to = 'value') %>% \n relocate(geometry, .after = last_col())\n\n# clean variable names and add grouping fields (type, group_type)\nwater_system_demographics_long <- water_system_demographics_long %>% \n mutate(variable = variable %>% \n # str_remove_all(pattern = 'percent_') %>% \n str_replace_all(pattern = '_', replacement = ' ') %>% \n str_replace_all(pattern = ' or ', replacement = ' / ') %>% \n str_to_title(.) %>%\n str_remove_all(pattern = ' / Alaska Native')) %>% \n mutate(variable_type = case_when(\n str_detect(variable, pattern = 'Count') ~ 'Count',\n str_detect(variable, pattern = 'Percent') ~ 'Percent',\n str_detect(variable, pattern = 'Pop Weighted') ~ 'Pop Weighted',\n str_detect(variable, pattern = 'Hh Weighted') ~ 'Hh Weighted',\n .default = NA), \n .after = variable) %>% \n mutate(variable_group_type = case_when(\n str_detect(variable, pattern ='Population') ~ 'Population',\n str_detect(variable, pattern = 'Households') ~ 'Households',\n str_detect(variable, pattern = 'Average Household Size Hh Weighted') ~ 'Household Weighted', \n str_detect(variable, pattern = 'Median Household Income Hh Weighted') ~ 'Household Weighted',\n str_detect(variable, pattern = 'Per Capita Income Pop Weighted') ~ 'Population Weighted',\n str_detect(variable, pattern = 'Poverty') ~ 'Population'),\n .after = variable_type) %>% \n mutate(variable = case_when(\n str_detect(variable, pattern = 'Households Count') ~ 'Households Total',\n .default = str_remove_all(variable, pattern = 'Households'))) %>% \n mutate(variable = case_when(\n str_detect(variable, 'Population Total Count') ~ 'Population Total',\n .default = str_remove_all(variable, 'Population'))) %>%\n mutate(variable = str_remove_all(variable, \n pattern = 'Count')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = 'Percent')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = ' Hh Weighted')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = ' Pop Weighted')) %>% \n mutate(variable = str_replace_all(variable, \n pattern = 'Over30pct', \n replacement = 'Over 30% Income')) %>% \n mutate(variable = str_replace_all(variable, \n pattern = 'Over50pct', \n replacement = 'Over 50% Income')) %>% \n mutate(variable = str_trim(variable)) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'k ',\n replacement = 'k-')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = '0 ',\n replacement = '0-')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Black-',\n replacement = 'Black ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Mortgage ',\n replacement = 'Mortgage - ')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Rent ',\n replacement = 'Rent - ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'All ',\n replacement = 'All Households - ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Households Total',\n replacement = 'Total Households')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Population Total',\n replacement = 'Total Population')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Poverty ',\n replacement = 'Poverty - ')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Poverty - Rate',\n replacement = 'Poverty Rate'))\n```\n:::\n\n\nHere's a view of the structure of the reformatted data:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_system_demographics_long)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 3,472\nColumns: 6\n$ water_system_name \"B & W RESORT MARINA\", \"B & W RESORT MARINA\", \"B &…\n$ variable \"Total Population\", \"Hispanic / Latino\", \"White\", …\n$ variable_type \"Count\", \"Count\", \"Count\", \"Count\", \"Count\", \"Coun…\n$ variable_group_type \"Population\", \"Population\", \"Population\", \"Populat…\n$ value 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 41…\n$ geometry POLYGON ((-138282.2 13643.2..., POLYGON ((…\n```\n\n\n:::\n:::\n\n\nWe can also do the same for the clipped block-group data used in intermediate parts of the interpolation process above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate_long <- census_data_interpolate %>% \n pivot_longer(cols = !c(GEOID, NAME, water_system_name, \n areal_weight_factor, geometry), \n names_to = 'variable', \n values_to = 'value') %>% \n relocate(geometry, .after = last_col())\n```\n:::\n\n\n### Save Results {#sec-results-save}\n\nOnce we've finished the computations and verified the outputs look reasonable, we can save the results to output files so they can be re-used and shared. The results can be saved in tabular (e.g., csv, excel) and/or spatial (e.g., shapefile, geopackage) formats, which may be helpful for different use cases. Note that you may need to think about exactly what variables to include in the output file(s) and how to format the output datasets (e.g., wide versus long format).\n\nThe chunk of code below (which is hidden by default), just tests to see whether any of the datasets to be saved have been changed since the previous version was saved. In general this is probably not needed for a typical workflow and can be ignored for most use cases – it is just used here to make rendering of this document a little more efficient.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# compute hash for datasets to be saved (i.e., a unique identifier for each dataset), and compare against previous versions\n\n## define file that stores hash (unique identifier for dataset)\nhash_file <- here('03_data_results',\n '_dataset_hash.csv')\n\n## compute hashes (unique identifier for datasets)\nhash_current <- digest(object = water_system_demographics,\n algo = 'md5')\nhash_current_long <- digest(object = water_system_demographics_long,\n algo = 'md5')\nhash_interpolate <- digest(object = census_data_interpolate,\n algo = 'md5')\nhash_interpolate_long <- digest(object = census_data_interpolate_long, \n algo = 'md5')\nhash_table_current <- tibble(\n dataset = c('water_system_demographics', \n 'water_system_demographics_long',\n 'census_data_interpolate',\n 'census_data_interpolate_long'),\n hash = c(hash_current, \n hash_current_long,\n hash_interpolate,\n hash_interpolate_long))\n\n## get the previous hashes from file (if it exists), else create a new file to store the hashes\nif (file.exists(hash_file)) {\n hash_table_previous <- read_csv(file = hash_file)\n} else {\n file.create(hash_file)\n hash_table_previous <- tibble(\n dataset = c('water_system_demographics', \n 'water_system_demographics_long',\n 'census_data_interpolate',\n 'census_data_interpolate_long'),\n hash = c('missing', \n 'missing',\n 'missing', \n 'missing'))\n}\n\n## if new hash is different from previous hash, set flag to update the output file (i.e., write a new version of the file)\nfile_update <- !identical(hash_table_current %>% \n filter(dataset == 'water_system_demographics') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'water_system_demographics') %>% \n pull(hash))\nfile_update_long <- !identical(hash_table_current %>% \n filter(dataset == 'water_system_demographics_long') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'water_system_demographics_long') %>% \n pull(hash))\nfile_update_interpolate <- !identical(hash_table_current %>% \n filter(dataset == 'census_data_interpolate') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'census_data_interpolate') %>% \n pull(hash))\nfile_update_interpolate_long <- !identical(hash_table_current %>% \n filter(dataset == 'census_data_interpolate_long') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'census_data_interpolate_long') %>% \n pull(hash))\n\n## write current hashes to file (for comparison with future versions)\nwrite_csv(x = hash_table_current,\n file = hash_file,\n append = FALSE)\n```\n:::\n\n\n#### Tabular Dataset {#sec-results-save-tabular}\n\nThe code below saves the tabular results to a csv file, in both the 'wide' and 'long' formats:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# wide\nif (file_update == TRUE) {\n write_csv(water_system_demographics %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'water_system_demographics_sac.csv'))\n}\n\n# long\nif (file_update_long == TRUE) {\n write_csv(water_system_demographics_long %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'water_system_demographics_sac_long.csv'))\n}\n```\n:::\n\n\nAnd we can save the intermediate data from the interpolation process (i.e., data for clipped block groups):\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# wide\nif (file_update_interpolate == TRUE) {\n write_csv(census_data_interpolate %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'intermediate_interpolation_data.csv'))\n}\n\n# long\nif (file_update_interpolate_long == TRUE) {\n write_csv(census_data_interpolate_long %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'intermediate_interpolation_data_long.csv'))\n}\n```\n:::\n\n\n#### Spatial Dataset {#sec-results-save-spatial}\n\nTo save the output in a geospatial format, it may be best to save the data in a wide format, so that all of the attribute data for each *target* area (water system) is in a single row along with its spatial data (i.e. the system boundary information) (saving in long format may create a very large file). The code below saves the results – in wide format – to a geopackage file, which is a spatial file format that is similar to a shapefile.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (file_update == TRUE) {\n st_write(water_system_demographics,\n here('03_data_results',\n 'water_system_demographics_sac.gpkg'),\n append = FALSE)\n}\n\nif (file_update_interpolate == TRUE) {\n st_write(census_data_interpolate,\n here('03_data_results',\n 'intermediate_interpolation_data.gpkg'),\n append = FALSE)\n}\n```\n:::\n\n\n## Explore and Visualize Results {#sec-results-explore}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nFor simplicity, this section will focus on presenting estimated demographics for some of the largest water suppliers in the Sacramento county region (results for small water systems may not be very accurate and should be used with some caution - see @sec-check-pop-estimated-reported and @sec-small-area-estimates for more investigation of the results for small systems).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select systems to plot\n\n## number of systems\nn_systems <- 20\n\n## get list of selected systems\nsystems_top_n <- water_system_demographics %>% \n slice_max(population_total_count, n = n_systems) %>% \n pull(water_system_name)\n```\n:::\n\n\n### Race / Ethnicity\n\n\\[placeholder\\]\n\n- percent by group (bar)\n\n- dot-density\n\n- mapview (non-white)\n\n### Income Distributions\n\n\\[placeholder\\]\n\n- income brackets (50k) (bar)\n\n- median household income by block group (dots)\n\n- dot-density (below threshold value)\n\n- mapview\n\n### Poverty Rates\n\n\\[placeholder\\]\n\n- dot plot\n\n- mapview\n\n- side-by-side map\n\n### Income & Relative Housing Costs\n\nThe [biscale](https://chris-prener.github.io/biscale/) R package [@biscale] can be used to create maps that show how two metrics vary together spatially (bivariate choropleth maps).\n\n@fig-bivariate-all shows the relationship between estimated income and relative housing costs for the top 20 systems by estimated population in Sacramento County.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# Table B25140 - Housing Costs as a Percentage of Household Income in the past 12 months.\n# Shows the count of households paying more than 30% or 50% of their income towards housing costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented).\n\n# set defaults\nbiscale_pal <- 'BlueOr' # 'GrPink' # 'DkViolet2'\nbiscale_dim <- 3\n\n# create classes\nbiscale_data <- bi_class(water_system_demographics %>% \n filter(water_system_name %in% systems_top_n) %>% \n filter(!is.na(median_household_income_hh_weighted)), \n x = households_all_housing_costs_over30pct_percent, \n y = median_household_income_hh_weighted, \n style = \"quantile\", \n dim = biscale_dim)\n\n# create map\nbiscale_map <- ggplot() +\n geom_sf(data = biscale_data, \n mapping = aes(fill = bi_class), \n color = \"white\", \n size = 0.1, \n show.legend = FALSE) +\n bi_scale_fill(pal = biscale_pal, \n dim = biscale_dim) + \n labs(\n title = \"Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \\nand Estimated Median Household Income in Sacramento Water Systems\",\n subtitle = glue(\"Top {n_systems} Water Systems by Population\"),\n caption = glue(\"Data estimated from {acs_year} 5-year ACS Block Groups\")\n # title = \"Estimated Housing Cost as % of Household Income and \\nEstimated Median Household Income in Sacramento Water Systems\", \n # caption = \"% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \\nIncome shows median household income (yellow = missing)\"\n ) +\n # labs(\n # title = \"Housing Cost1 and Income2 in Sacramento Water Systems\",\n # caption = \"1% of households paying more than 30% of their income towards housing costs
2Median household income (yellow = missing)\",\n # subtitle = glue(\"Top {n_systems} systems by population\")\n # ) +\n # add missing polygons back in\n geom_sf(data = water_system_demographics %>% \n filter(water_system_name %in% systems_top_n) %>% \n filter(is.na(median_household_income_hh_weighted)),\n color = \"white\",\n fill = 'gold'\n ) +\n geom_sf(data = counties_ca %>% filter(NAME == 'Sacramento'), \n color = 'grey',\n fill = NA) +\n bi_theme() + \n theme(plot.title = element_text(size=12), # element_markdown(size=12)\n plot.subtitle = element_text(size=10),\n plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))\n\n# create legend\nbiscale_legend <- bi_legend(pal = biscale_pal,\n dim = biscale_dim,\n xlab = \"% Housing Costs \",\n ylab = \"Income \",\n size = 8)\n\n# construct map\nbiscale_plot <- ggdraw() +\n draw_plot(biscale_map, 0, 0, 1, 1) +\n draw_plot(biscale_legend, 0.1, .65, 0.2, 0.2)\n\nbiscale_plot\n```\n\n::: {.cell-output-display}\n![](example_census_race_ethnicity_calculation_files/figure-html/fig-bivariate-all-1.png){#fig-bivariate-all width=768}\n:::\n:::\n\n\n@fig-bivariate-system shows the same variables (relative housing costs and income) for the portions block groups overlapping Sacramento Suburban Water District – this illustrates the data underlying the interpolation process.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# set defaults\nbiscale_pal_system <- 'BlueOr' # 'GrPink' # 'DkViolet2'\nbiscale_dim_system <- 3\n\n# create classes\nbiscale_data_system <- bi_class(census_data_interpolate %>% \n filter(water_system_name == system_plot) %>% \n filter(!is.na(median_household_income)), \n x = households_all_housing_costs_over30pct_percent, \n y = median_household_income, \n style = \"quantile\", \n dim = biscale_dim_system)\n# create map\nbiscale_map_system <- ggplot() +\n geom_sf(data = biscale_data_system , \n mapping = aes(fill = bi_class), \n color = \"white\", \n size = 0.1, \n show.legend = FALSE) +\n bi_scale_fill(pal = biscale_pal_system, \n dim = biscale_dim_system) + \n labs(\n title = glue(\"Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \\nand Estimated Median Household Income in {str_to_title(system_plot)}\"),\n # subtitle = glue(\"\"),\n caption = glue(\"Data from {acs_year} 5-year ACS Block Groups (Yellow = Missing Data)\")#,\n # title = glue(\"Housing Cost and Income \\nin {str_to_title(system_plot)}\"), \n # caption = \"% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \\nIncome shows median household income (yellow = missing)\"#,\n ) +\n # add the missing polygons back in\n geom_sf(data = census_data_interpolate %>% \n filter(water_system_name == system_plot) %>% \n filter(is.na(median_household_income)),\n color = \"white\",\n fill = 'gold'\n ) +\n bi_theme() + \n theme(plot.title = element_text(size=12), # element_markdown(size=12)\n plot.subtitle = element_text(size=10),\n plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))\n\n# create legend\nbiscale_legend <- bi_legend(pal = biscale_pal_system,\n dim = biscale_dim_system,\n xlab = \"% Housing Costs \",\n ylab = \"Income \",\n size = 8)\n\n# construct map\nbiscale_plot_system <- ggdraw() +\n draw_plot(biscale_map_system, 0, 0, 1, 1) +\n draw_plot(biscale_legend, 0.1, .55, 0.2, 0.2)\n\nbiscale_plot_system\n```\n\n::: {.cell-output-display}\n![](example_census_race_ethnicity_calculation_files/figure-html/fig-bivariate-system-1.png){#fig-bivariate-system width=768}\n:::\n:::\n\n\n## Check - Estimated vs Reported Population Estimates {#sec-check-pop-estimated-reported}\n\n\\[TO DO: Create map\\]\n\nBased on the map above, it's apparent that it will be difficult to obtain reasonable estimates for some suppliers, such as the suppliers with very small service areas in the southern portion of the county where the block groups are very large (and the supplier's service are is only a small fraction of the total area of the block group). These issues are explored further in @sec-small-area-estimates.\n\nNote that there are a number of reasons why the estimated population values are likely to differ from the population numbers in the water system dataset (e.g., the depicted boundaries may not be correct or exact, the supplier may have used different methods to count/estimate the population they serve, the time frames for the estimates may be different, etc.). But, there may also be some cases where the numbers differ significantly – depending on the actual analysis being performed, this may mean that further work is needed for certain areas, or could mean that this method may not be sufficient and different methods are needed.\n\nAs a check, we can add a column to the interpolated dataset (which we'll call `population_percent_difference`) to compute the difference between the estimated total population (in the `population_total` field) and the total population listed in the `water_system_population_reported` field (the reported value from the water system dataset, which is named `POPULATION` in the water supplier dataset).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics_check <- water_system_demographics %>% \n left_join(water_systems_sac %>% \n st_drop_geometry() %>% \n select(water_system_name, water_system_population_reported),\n by = 'water_system_name')\n\nwater_system_demographics_check <- water_system_demographics_check %>%\n mutate(population_percent_difference =\n round(100 * (population_total_count - water_system_population_reported) / water_system_population_reported, \n 2), \n .after = water_system_population_reported)\n```\n:::\n\n\nFor water systems with a small population and/or service area, the estimated demographics may not match the reported population numbers in the water system dataset very well. You can see this in @tbl-pop-est-small by comparing the `population_reported` field, which contains the total population values from the water supplier dataset, with the `population_estimated` field, which contains the total population estimated from the census data; the difference between the two is summarized in the `population_percent_difference` field. This probably indicates that, for small areas, some adjustments and/or further analysis may be needed, and the preliminary estimated values should be treated with some caution/skepticism.\n\nNote: See @sec-small-area-estimates below for some more investigation into water systems whose estimated population is at or near zero.\n\n\n::: {#tbl-pop-est-small .cell tbl-cap='10 Smallest Water Systems by Population'}\n\n```{.r .cell-code}\nwater_system_demographics_check %>%\n arrange(water_system_population_reported) %>%\n slice(1:10) %>%\n select(water_system_name, \n population_reported = water_system_population_reported, \n population_estimated = population_total_count, \n population_percent_difference) %>%\n st_drop_geometry() %>%\n kable()\n```\n\n::: {.cell-output-display}\n\n\n|water_system_name | population_reported| population_estimated| population_percent_difference|\n|:---------------------------|-------------------:|--------------------:|-----------------------------:|\n|DELTA CROSSING MHP | 30| 0| -100.00|\n|LAGUNA VILLAGE RV PARK | 32| 20| -37.50|\n|LINCOLN CHAN-HOME RANCH | 33| 4| -87.88|\n|EDGEWATER MOBILE HOME PARK | 40| 0| -100.00|\n|MAGNOLIA MUTUAL WATER | 40| 1| -97.50|\n|FREEPORT MARINA | 42| 3| -92.86|\n|PLANTATION MOBILE HOME PARK | 44| 10| -77.27|\n|TUNNEL TRAILER PARK | 44| 0| -100.00|\n|SEQUOIA WATER ASSOC | 54| 0| -100.00|\n|HAPPY HARBOR (SWS) | 60| 0| -100.00|\n\n\n:::\n:::\n\n\nBut for larger water systems, the estimated population values seem to be more in line with the population numbers in the original dataset. You can see this in @tbl-pop-est-large by, as above, comparing the `population_reported` field, which contains the total population values from the water supplier dataset, with the `population_estimated` field, which contains the total population estimated from the census data; the difference between the two is summarized in the `population_percent_difference` field.\n\n\n::: {#tbl-pop-est-large .cell tbl-cap='10 Largest Water Systems by Population'}\n\n```{.r .cell-code}\nwater_system_demographics_check %>%\n arrange(desc(water_system_population_reported)) %>%\n slice(1:10) %>%\n select(water_system_name, \n population_reported = water_system_population_reported, \n population_estimated = population_total_count, \n population_percent_difference) %>%\n st_drop_geometry() %>%\n kable()\n```\n\n::: {.cell-output-display}\n\n\n|water_system_name | population_reported| population_estimated| population_percent_difference|\n|:----------------------------------|-------------------:|--------------------:|-----------------------------:|\n|CITY OF SACRAMENTO MAIN | 510931| 516189| 1.03|\n|SACRAMENTO SUBURBAN WATER DISTRICT | 184385| 193126| 4.74|\n|SCWA - LAGUNA/VINEYARD | 172666| 145495| -15.74|\n|FOLSOM, CITY OF - MAIN | 68122| 62462| -8.31|\n|CITRUS HEIGHTS WATER DISTRICT | 65911| 68912| 4.55|\n|CALAM - SUBURBAN ROSEMONT | 53563| 57897| 8.09|\n|CALAM - PARKWAY | 48738| 58635| 20.31|\n|CALAM - LINCOLN OAKS | 47487| 42916| -9.63|\n|GOLDEN STATE WATER CO. - CORDOVA | 44928| 48115| 7.09|\n|ELK GROVE WATER SERVICE | 42540| 42647| 0.25|\n\n\n:::\n:::\n\n\n## Considerations for Detailed Population Estimates {#sec-detailed-pop-estimates}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nIf you're primarily only interested in population estimates (possibly including population by race/ethnicity, age, gender, etc.) and need an estimate that's as geographically accurate as possible, it may make more sense to use the block-level population data from the decennial census rather than block group level population data from the ACS. However, since the decennial census only occurs once every 10 years, those estimates won't reflect recent population changes (and will get especially less accurate as we get farther from the last decennial census). But keep in mind that even the 5-year ACS is an average that encompasses previous years' estimates, so it's not necessarily temporally precise either.\n\nIt's also possible to use the block-level decennial population data as a weighing factor for ACS population data (to allocate the population within block-group level ACS data).\n\n\\[TO DO: add example\\]\n\n## Considerations for Small / Rural Area Estimates {#sec-small-area-estimates}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nFor some water systems, the estimated population using the areal interpolation above (@sec-areal-interp) was at or near zero, and it may be useful to look at an example to see what's going on with one of those cases.\n\n(because the water system may encompass only a small portion of one or a few census units, and the entire census unit(s) may not be representative of the small portion(s)), especially those in rural environments (where population densities are lower, population centers tend to be spread out, and census units tend to be larger).\n\n\\[TO DO: insert map\\]\n\nFrom the map above \\[TO DO: insert map\\], you can see that the service area reported for some systems are very small, only covering a small fraction of a single census unit, resulting in a population estimate that is very low. In these cases, it could be that the system area was drawn incorrectly (i.e., maybe it doesn't really depict the entire service area), in which case the reported service area should be revised. Or, it's possible that the population within the given census unit is very un-evenly distributed and instead there's a relatively high density population cluster in the depicted service area, in which case a more sophisticated method than an area-weighted average should be used (e.g., maybe consider the density of buildings, roads, and/or other features associated with inhabited areas).\n\n## Alternative Computation Methods {#sec-alternative-methods}\n\n::: callout-warning\nThis section is in progress.\n:::\n\n### Population Weighted Interpolation {#sec-alternative-interpolate_pw}\n\nThe `tidycensus` package also has a function for population weighted interpolation, [`interpolate_pw`](https://walker-data.com/tidycensus/reference/interpolate_pw.html), but it uses a somewhat different methodology than the population weighted interpolation procedure applied above in @sec-pop-interp.\n\nNote that some water systems may not get an estimated value using this method, even if `NA` values are removed from the source data first (TO DO: check whether this depends on which type of boundary dataset is used - i.e., tigris with cb = FALSE or TRUE).\n\nUsing original `census_data_acs` variable gives multiple `NA`s - it looks like those are small areas:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresults_interpolate_pw <- interpolate_pw(from = census_data_acs %>%\n filter(!is.na(population_total_count)) %>% # population_total_count median_household_income\n select(population_total_count),\n to = water_systems_sac,\n to_id = 'water_system_name',\n extensive = TRUE, # use FALSE for median_household_income\n weights = census_data_decennial,\n # weight_placement = 'surface',\n weight_column = 'population_total_count') %>%\n # rename(median_household_income_interpolate_pw = median_household_income) # rename results field\n rename(population_total_count_interpolate_pw = population_total_count) %>% \n mutate(population_total_count_interpolate_pw = round(population_total_count_interpolate_pw, 0))\n\n# sum(is.na(results_interpolate_pw$median_household_income_interpolate_pw))\nsum(is.na(results_interpolate_pw$population_total_count_interpolate_pw))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 17\n```\n\n\n:::\n:::\n\n\nUsing detailed block group geometry - looks like the same results?:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# # get detailed block group geometry\n# block_groups_detailed <- block_groups(state = 'CA',\n# county = 'Sacramento',\n# cb = FALSE,\n# year = acs_year) %>%\n# st_transform(crs_projected)\n# block_groups_detailed <- block_groups_detailed %>%\n# st_filter(water_systems_sac) %>%\n# select(GEOID)\n# \n# block_groups_detailed <- block_groups_detailed %>%\n# left_join(census_data_acs %>%\n# st_drop_geometry() %>%\n# select(GEOID, population_total_count, median_household_income),\n# by = 'GEOID')\n# \n# # interpolate\n# results_interpolate_pw_detailed <- interpolate_pw(from = block_groups_detailed %>%\n# filter(!is.na(population_total_count)) %>% # population_total_count median_household_income\n# select(population_total_count),\n# to = water_systems_sac,\n# to_id = 'water_system_name',\n# extensive = TRUE, # use FALSE for median_household_income\n# weights = census_data_decennial,\n# weight_placement = 'surface',\n# weight_column = 'population_total_count') %>%\n# # rename(median_household_income_interpolate_pw = median_household_income) # rename results field\n# rename(population_total_count_interpolate_pw = population_total_count)\n# \n# # sum(is.na(results_interpolate_pw_detailed$median_household_income_interpolate_pw))\n# sum(is.na(results_interpolate_pw_detailed$population_total_count_interpolate_pw))\n```\n:::\n\n\nCompare results using `interpolate_pw` to reported population counts:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresults_interpolate_pw <- results_interpolate_pw %>%\n left_join(water_systems_sac %>%\n st_drop_geometry() %>%\n select(water_system_service_connections, water_system_population_reported, water_system_name),\n by = 'water_system_name') %>% \n relocate(water_system_population_reported, .after = population_total_count_interpolate_pw)\n```\n:::\n\n\nView results:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(results_interpolate_pw)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 5\n$ water_system_name \"HOOD WATER MAINTENCE DIST [SWS]…\n$ population_total_count_interpolate_pw 74, 412, 96, NA, 1031, NA, NA, 1…\n$ water_system_population_reported 100, 700, 40, 150, 256, 150, 32,…\n$ water_system_service_connections 82, 199, 34, 64, 128, 83, 28, 50…\n$ geometry MULTIPOLYGON (((-13…\n```\n\n\n:::\n:::\n\n\n## Working with Other Source Datasets {#sec-other-sources}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nIn addition to using census data, it's possible to use other types of *source* datasets to compute characteristics of custom *target* areas like water systems. The process is generally likely to be similar to the processes shown above using census data, but each source dataset may require unique considerations (e.g., to handle missing values, uncertain boundaries, etc.).\n\n### CalEnviroScreen {#sec-calenviroscreen}\n\n\\[TO DO: example computation of weighted average CES scores\\]\n\nNotes to consider:\n\n- Some census tracts are missing CES scores (overall and/or for certain indicators), and have to deal with those missing values somehow\n\n- CES 4.0 is tract-level data, and uses 2010 census boundaries (so boundaries won't match current ACS or decennial boundaries)\n\n- CES 4.0 boundaries are simplified, and boundaries between tracts are not consistent – for some types of analysis (especially when looking at point data - e.g., facilities), it may be better to use the original TIGER dataset (available from either the `tidycensus` or `tigris` R packages)\n\n
\n", + "markdown": "---\ntitle: \"Estimating Demographics of Custom Spatial Features\"\nsubtitle: \"Accessing U.S. Census Bureau Data & Calculating Weighted Averages with Areal- and Population-Weighted Interpolation\"\nnumber-sections: true\ntoc: true\ntoc-depth: 4\nformat:\n html:\n self-contained: false\nbibliography: references.bib\n---\n\n```{=html}\n \n\n```\n\n## Background {#sec-background}\n\n::: callout-note\nFor comments, suggestions, corrections, or questions on anything below, contact [david.altare\\@waterboards.ca.gov](mailto:david.altare@waterboards.ca.gov), or [open an issue](https://github.com/daltare/example-census-race-ethnicity-calculation/issues) on github.\n:::\n\n::: callout-warning\nThis is a draft / work in progress – some parts are still under development, and existing parts may change.\n:::\n\nThis document provides an example of how to use tools available from the [R programming language](https://www.R-project.org/) [@R] to estimate characteristics of any given *target* spatial area(s) (e.g., neighborhoods, project boundaries, water supplier service areas, etc.) based on data from a *source* dataset containing the characteristic data of interest (e.g., census data, CalEnvrioScreen scores, etc.), especially when the boundaries of the *source* and *target* areas overlap but don't necessarily align with each other. It also provides some brief background on the various types of data available from the U.S Census Bureau, and links to a few places to find more in-depth information.\n\nThis particular example estimates demographic characteristics of community water systems in the Sacramento County area (the *target* dataset). It uses the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package [@tidycensus] to access selected demographic data from the U.S. Census Bureau (the *source* dataset) for census units whose spatial extent covers those water systems' service areas, then uses the [`sf`](https://r-spatial.github.io/sf/) package [@sf] package (for working with spatial data) and the [`tidyverse`](https://www.tidyverse.org/) collection of packages [@tidyverse] (for general data cleaning and transformation) to estimate some demographic characteristics of each water system based on that census data. It also uses the [`areal`](https://chris-prener.github.io/areal/) R package [@areal] to check some of the results, and as general guidance on the principles and techniques for implementing areal interpolation.\n\nThis example is just intended to be a simplified demonstration of a possible workflow. For a real analysis, additional steps and considerations – that may not be covered here – may be needed to deal with data inconsistencies (e.g., missing or incomplete data), required level of precision and acceptable assumptions (e.g. more fine-grained datasets or more sophisticated techniques could be used to estimate/model population distributions), or other project-specific issues that might arise.\n\n## Setup {#sec-setup}\n\nThe code block below loads required packages for this analysis, and sets some user-defined options and defaults. If they aren't already installed on your computer, you can install them with the R command `install.packages('package-name')` (and replace `package-name` with the name of the package you want to install).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# packages ----\nlibrary(tidycensus)\nlibrary(tigris)\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(areal)\nlibrary(janitor)\nlibrary(here)\nlibrary(units)\n# library(Polychrome)\nlibrary(knitr)\nlibrary(kableExtra)\nlibrary(tmap)\nlibrary(patchwork)\nlibrary(scales)\nlibrary(digest)\nlibrary(mapview)\nlibrary(biscale)\nlibrary(cowplot)\nlibrary(glue)\nlibrary(ggtext)\n\n# conflicts ----\nlibrary(conflicted)\nconflicts_prefer(dplyr::filter)\n\n# options ----\noptions(scipen = 999) # turn off scientific notation\noptions(tigris_use_cache = TRUE) # use data caching for tigris\n\n# reference system ----\ncrs_projected <- 3310 # set a common projected coordinate reference system to be used throughout this analysis - see: https://epsg.io/3310\n```\n:::\n\n\n## Census Data Overview {#sec-census-overview}\n\nThis section provides some brief background on the various types of data available from the U.S. Census Bureau (a later section - @sec-census-access - demonstrates how to retrieve data from the U.S. Census Bureau using the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package). Most of the information covered here comes from the book [Analyzing US Census Data: Methods, Maps, and Models in R](https://walker-data.com/census-r/index.html), which is a great source of information if you'd like more detail about any of the topics below [@walker2023].\n\n::: callout-note\nIf you're already familiar with Census data and want to skip this overview, go directly to the next section: @sec-system-boundaries\n:::\n\nDifferent census products/surveys contain data on different variables, at different geographic scales, over varying periods of time, and with varying levels of certainty. Therefore, there are a number of judgement calls to make when determining which type of census data to use for an analysis – e.g., which data product to use (Decennial Census or American Community Survey), which geographic scale to use (e.g., Block, Block Group, Tract, etc.), what time frame to use, which variables to assess, etc.\n\nMore detailed information about U.S. Census Bureau's data products and other topics mentioned below is available [here](https://walker-data.com/census-r/the-united-states-census-and-the-r-programming-language.html#the-united-states-census-and-the-r-programming-language).\n\n### Census Unit Geography / Hierarchy {#sec-census-hierarchy}\n\nPublicly available datasets from the U.S Census Bureau generally consist of individual survey responses aggregated to defined census units (e.g., census tracts) that cover varying geographic scales. Some of these units are nested and can be neatly aggregated (e.g., each census tract is composed of a collection of block groups, and each block group is composed of a collection of blocks), while other census units are outside this hierarchy (e.g., Zip Code Tabulation Areas don't coincide with any other census unit). @fig-census-hierarchies shows the relationship of all of the various census units.\n\nCommonly used census statistical units like tracts and block groups have target population size ranges, and can be adjusted every 10 years (with the decennial census) based on population changes. For example, all ACS 5-year datasets prior to 2020 use the 2010 boundaries for tracts, block groups, and blocks, and all ACS 5-year datasets from [2020 onward](https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2020/geography-changes.html) (presumably through 2029) use the 2020 boundaries for those units. [Census tracts](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_13) are generally around 4,000 people, with a range from about 1,200 to 8,000, and [block groups](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_4) generally contain 600 to 3,000 people. [Blocks](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_5) are the smallest census units, and are \"areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads\". For example, a census block may be \"a city block bounded on all sides by streets\", while \"blocks in suburban and rural areas may be larger, more irregular in shape, and bounded by a variety of features, such as roads, streams, and transmission lines\".\n\n::: callout-caution\nCensus boundaries can change over time. Commonly used statistical units like tracts, block groups, and blocks tend to be revised every 10 years (with the decennial census), so it's important to use a census boundary dataset that matches the version of the census demographic data you're retrieving; otherwise, the demographic data may not match geographic areas in your boundary dataset. In some cases, a census unit that exists in a given year of the census data may not exist at all in a different year's dataset, because census units can be split or merged when boundaries are revised.\n\nFor more information, see [here](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_geography_handbook_2020_ch02.pdf) or [here](https://www.census.gov/programs-surveys/acs/geography-acs/geography-boundaries-by-year.html) or [here](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_6) or [here](https://www.census.gov/data/academy/data-gems/2021/compare-2020-census-and-2010-census-redistricting-data.html).\n:::\n\nFor a list of the different geographic units available for each of the different census products/surveys (see @sec-census-datasets) that can be accessed via the `tidycensus` package, go [here](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus).\n\n![Census Unit Hierarchies](https://walker-data.com/census-r/img/screenshots/census-hierarchies.png){#fig-census-hierarchies}\n\n### Census Datasets / Surveys {#sec-census-datasets}\n\nThe Decennial Census is conducted every 10 years, and is intended to provide a complete count of the US population and assist with political redistricting. As a result, it collects a relatively limited set of basic demographic data, but (should) provide a high degree of precision (i.e., in general it should provide exact counts). It is available for geographic units down to the census block (the smallest census unit available – see @sec-census-hierarchy). For information about existing and planned future releases of 2020 census data products, go [here](https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/release/about-2020-data-products.html).\n\nThe American Community Survey (ACS) provides a much larger array of demographic information than the Decennial Census, and is updated more frequently. The ACS is based on a sample of the population (rather than a count of the entire population, as in the Decennial Census), so it represents estimated values rather than precise counts; therefore, each data point is available as an estimate (typically labeled with an \"E\" in census variable codes, which are discussed in @sec-census-variables ) along with an associated margin of error (typically labeled with \"M\" or \"MOE\" in census variable codes) around its estimated value.\n\nThe ACS is available in two formats. The 5-year ACS is a rolling average of 5 years of data (e.g., the 2021 5-year ACS dataset is an average of the ACS data from 2017 through 2021), and is generally available for geographic units down to the census block group (though some 5-year ACS data may only be available at less granular levels). The 1-year ACS provides data for a single year, and is only available for geographies with population greater than 65,000 (e.g., large cities and counties). Therefore, only the 5-year ACS will be useful for any analysis at a relatively fine scale (e.g., anything that requires data at or more detailed than the census tract level, or any analysis that considers smaller counties/cities – by definition, census tracts always contain significantly fewer than 65,000 people).\n\nIn addition to the Decennial Census and ACS data, a number of other census data products/surveys are also available. For example, see the `censusapi` R package ([here](https://github.com/hrecht/censusapi) or [here](https://www.hrecht.com/censusapi/index.html)) for access to over 300 census API endpoints. For historical census data, see the discussion [here](https://walker-data.com/census-r/other-census-and-government-data-resources.html?q=API%20endpoint#other-census-and-government-data-resources) on using NHGIS, IPUMS, and the `ipumsr` package.\n\n### Census Variables / Codes {#sec-census-variables}\n\nEach census product collects data for many different demographic variables, and each variable is generally associated with an identifier code. In order to access census data programmatically, you often need to know the code associated with each variable of interest. When determining which variables to use, you need to consider what census product contains those variables (see @sec-census-datasets) and how they differ in terms of time frame, precision, spatial granularity (see @sec-census-hierarchy), etc.\n\nThe `tidycensus` package offers a convenient generic way to search for variables across different census products using the `load_variables()` function, as described [here](https://walker-data.com/tidycensus/articles/basic-usage.html#searching-for-variables).\n\nThe following websites may also be helpful for exploring the various census data products and finding the variable names and codes they contain:\n\n- Census Reporter (for ACS data): (especially )\n\n- Census Bureau's list of variable codes, e.g.:\n\n - 2020 Census codes: \n\n - 2022 ACS 5 year codes: \n\n- Census Bureau's data interface (for Decennial Census and ACS, and other census datasets): \n\n- National Historical Geographic Information System (NHGIS) (for ACS data and historical decennial Census data): \n\n## Target Data Boundaries (Water Systems) {#sec-system-boundaries}\n\nIn this section, we'll get the service area boundaries for Community Water Systems within the Sacramento County area. This will serve as the *target* dataset – i.e., the set of areas which we'll be estimating the characteristics of – and will also be used to specify the geographic areas of the census data we want to retrieve. We'll also get a dataset of county boundaries which overlap the water service areas in this study, which can also help with specifying what census data to access and/or be used to make maps and visualizations.\n\n### Read Water System Data\n\nIn this case, we'll get the water system dataset from a shapefile that's saved locally, then transform that dataset into a common coordinate reference system for mapping and analysis (which is defined above in the variable `crs_projected`).\n\nThis water system dataset comes from the [California Drinking Water System Area Boundaries dataset](https://gispublic.waterboards.ca.gov/portal/home/item.html?id=fbba842bf134497c9d611ad506ec48cc). For this example, the dataset has been pre-filtered for systems within Sacramento County (by selecting records where the `COUNTY` field is \"SACRAMENTO\") and for Community Water Systems (by selecting records where the `STATE_CLAS` field is \"COMMUNITY\"). Some un-needed fields have also been dropped, remaining fields have been re-orderd.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_systems_sac <- st_read(here('02_data_input', \n 'water_supplier_boundaries_sac', \n 'System_Area_Boundary_Layer_Sac.shp')) %>% \n st_transform(crs_projected) # transform to common coordinate system\n```\n:::\n\n\nWe can use the `glimpse` function (below) to take get a sense of what type of information is available in the water system dataset and how it's structured.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_systems_sac)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 12\n$ WATER_SY_1 \"HOOD WATER MAINTENCE DIST [SWS]\", \"MC CLELLAN MHP\", \"MAGNO…\n$ WATER_SYST \"CA3400101\", \"CA3400179\", \"CA3400130\", \"CA3400135\", \"CA3400…\n$ GLOBALID \"{36268DB3-9DB2-4305-A85A-2C3A85F20F34}\", \"{E3BF3C3E-D516-4…\n$ BOUNDARY_T \"Water Service Area\", \"Water Service Area\", \"Water Service …\n$ OWNER_TYPE \"L\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\",…\n$ COUNTY \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMENTO\", \"SA…\n$ REGULATING \"LPA64 - SACRAMENTO COUNTY\", \"LPA64 - SACRAMENTO COUNTY\", \"…\n$ FEDERAL_CL \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUN…\n$ STATE_CLAS \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\", \"COMMUN…\n$ SERVICE_CO 82, 199, 34, 64, 128, 83, 28, 50, 164, 5684, 14798, 115, 33…\n$ POPULATION 100, 700, 40, 150, 256, 150, 32, 100, 350, 18005, 44928, 20…\n$ geometry MULTIPOLYGON (((-132703 403..., MULTIPOLYGON (…\n```\n\n\n:::\n:::\n\n\nNote that this dataset already includes a `POPULATION` variable that indicates the population served by each water system, which we renamed to `water_system_population_reported` above (note: I'm not exactly how the data in this variable is derived). However, for this analysis we'll be making our own estimate of the population within each system's service area based on U.S. Census Bureau data and the spatial representation of the system boundaries. Given the uncertainty in how the reported population data was derived (including potential temporal differences), the population estimates produced here will likely will not exactly match the reported population data; but, the reported population data may serve as a useful check to make sure our estimates are reasonable.\n\nTo make the water system data easier to work with, we can make some more descriptive field names (note that while it's redundant, we're using the prefix `water_system_` for all field names to distinguish data types when joining this data with other datasets later).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_systems_sac <- water_systems_sac %>% \n rename(water_system_name = WATER_SY_1, \n water_system_number = WATER_SYST,\n water_system_id\t= GLOBALID,\n water_system_boundary_type = BOUNDARY_T,\n water_system_owner_type\t= OWNER_TYPE,\n water_system_county\t= COUNTY,\n water_system_regulating_agency = REGULATING,\n water_system_federal_class = FEDERAL_CL,\n water_system_state_class = STATE_CLAS,\n water_system_service_connections = SERVICE_CO,\n water_system_population_reported = POPULATION)\n```\n:::\n\n\nHere's a view of the structure of the revised dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_systems_sac)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 12\n$ water_system_name \"HOOD WATER MAINTENCE DIST [SWS]\", \"M…\n$ water_system_number \"CA3400101\", \"CA3400179\", \"CA3400130\"…\n$ water_system_id \"{36268DB3-9DB2-4305-A85A-2C3A85F20F3…\n$ water_system_boundary_type \"Water Service Area\", \"Water Service …\n$ water_system_owner_type \"L\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P\", \"P…\n$ water_system_county \"SACRAMENTO\", \"SACRAMENTO\", \"SACRAMEN…\n$ water_system_regulating_agency \"LPA64 - SACRAMENTO COUNTY\", \"LPA64 -…\n$ water_system_federal_class \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\"…\n$ water_system_state_class \"COMMUNITY\", \"COMMUNITY\", \"COMMUNITY\"…\n$ water_system_service_connections 82, 199, 34, 64, 128, 83, 28, 50, 164…\n$ water_system_population_reported 100, 700, 40, 150, 256, 150, 32, 100,…\n$ geometry MULTIPOLYGON (((-132703 …\n```\n\n\n:::\n:::\n\n\n#### Alternative Data Retrieval Method\n\nReading in data from a shapefile is shown above because it's likely one of the more common ways that users will access their *target* boundary data. However, depending on the dataset, there may be other ways to access the data. For example, the code chunk below demonstrates an alternative – using the [`arcgislayers`](https://r.esri.com/arcgislayers/index.html) package [@arcgislayers] – that connects directly to the source dataset (to retrieve the most recent version) and applies the filters needed to reproduce the dataset in the `System_Area_Boundary_Layer_Sac.shp` file. Also, note that storing data in formats other than the common shapefile format – such as the geopackage format – can have some advantages (for example, see [here](http://switchfromshapefile.org/)).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# load arcgislayers package (see: https://r.esri.com/arcgislayers/index.html)\ninstall.packages('pak') # only needed if the pak package is not already installed\npak::pkg_install(\"R-ArcGIS/arcgislayers\", dependencies = TRUE)\n\nlibrary(arcgislayers)\n\n# define link to data source\nurl_feature <- 'https://gispublic.waterboards.ca.gov/portalserver/rest/services/Drinking_Water/California_Drinking_water_system_numberem_Area_Boundaries/FeatureServer/0'\n\n# connect to data source\nwater_system_numberems_feature_layer <- arc_open(url_feature)\n\n# download and filter data from source\nwater_system_numberems_sac <- arc_select(\n water_system_numberems_feature_layer,\n # apply filters\n where = \"COUNTY = 'SACRAMENTO' AND STATE_CLASSIFICATION = 'COMMUNITY'\",\n # select fields\n fields = c('WATER_SYSTEM_NAME', 'WATER_SYSTEM_NUMBER', 'GLOBALID',\n 'BOUNDARY_TYPE', 'OWNER_TYPE_CODE', 'COUNTY',\n 'REGULATING_AGENCY', 'FEDERAL_CLASSIFICATION', 'STATE_CLASSIFICATION',\n 'SERVICE_CONNECTIONS', 'POPULATION')) %>%\n # transform to common coordinate system\n st_transform(crs_projected) %>%\n # rename fields to match names from the shapefile (which automatically truncates field names)\n rename(water_system_name = WATER_SYSTEM_NAME,\n water_system_number = WATER_SYSTEM_NUMBER,\n water_system_id = GLOBALID,\n water_system_boundary_type = BOUNDARY_TYPE,\n water_system_owner_type = OWNER_TYPE_CODE,\n water_system_county = COUNTY,\n water_system_regulating_agency = REGULATING_AGENCY,\n water_system_federal_class = FEDERAL_CLASSIFICATION,\n water_system_state_class = STATE_CLASSIFICATION,\n water_system_service_connections = SERVICE_CONNECTIONS,\n water_system_population_reported = POPULATION)\n```\n:::\n\n\n### Get County Boundaries {#sec-county-boundaries}\n\nWhen accessing census data using the `tidycensus` R package as shown below (in @sec-census-access), it's often useful (though not strictly required) to know which counties overlap the target dataset (note that, even though the dataset is filtered for systems in Sacramento county, there are some systems whose boundaries extend into neighboring counties). County boundaries may also be useful for making maps in later stages of the analysis. We can get a dataset of county boundaries in California from the [TIGER dataset](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html), which can be accessed with R using the [`tigris`](https://github.com/walkerke/tigris) R package [@tigris].\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncounties_ca <- counties(state = 'CA', \n cb = TRUE) %>% # simplified\n st_transform(crs_projected) # transform to common coordinate system\n```\n:::\n\n\nThen, we can get a list of counties that overlap with the boundaries of the Sacramento area community water systems obtained above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncounties_overlap <- counties_ca %>% \n st_filter(water_systems_sac, \n .predicate = st_overlaps)\n\ncounties_list <- counties_overlap %>% pull(NAME)\n```\n:::\n\n\nThe counties in the `counties_list` variable are: San Joaquin, Yolo, Placer, Sacramento.\n\n### Plot Target Data\n\n@fig-suppliers-counties shows the water systems and county boundaries in an interactive map.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmapview(counties_overlap, \n alpha.regions = 0, \n zcol = 'NAME', \n layer.name = 'County', \n legend = FALSE) + \n mapview(water_systems_sac, \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE)\n```\n\n::: {#fig-suppliers-counties .cell-output-display}\n\n```{=html}\n
\n\n```\n\n\nSelected water systems (with county boundaries for reference).\n:::\n:::\n\n\n## Accessing Census Data {#sec-census-access}\n\nThe following sections demonstrate how to retrieve census data from the Decennial Census and the ACS using the [`tidycensus`](https://walker-data.com/tidycensus/index.html) R package.\n\nIn order to use the `tidycensus` R package, you'll need to obtain a personal API key from the US Census Bureau (which is free and available to anyone) by signing up here: . Once you have your API key, you'll need to register it in R by entering the command `census_api_key(key = \"YOUR API KEY\", install = TRUE)` in the console. Note that the `install = TRUE` argument means that the key is saved for all future R sessions, so you'll only need to run that command once on your computer (rather than including it in your scripts). Alternatively, you could save your key to an environment variable and retrieve it using `Sys.getenv()`. Either way will help you avoid the possibility of entering your API key into any scripts that could be shared publicly.\n\n::: callout-caution\nBecause the boundaries of census units (e.g., tracts, block groups, blocks, etc) can change over time, it's important to make sure that the version (year) of the census data you're retrieving matches the version of the census boundary dataset you're using. The methods shown below retrieve the census boundary dataset together with the census demographic data, which ensures that this won't be a potential problem. However, if you use a different workflow that retrieves the geographic boundaries and demographic data via separate processes, you should ensure that the versions are consistent.\n:::\n\n### Decennial Census {#sec-census-access-decennial}\n\nThis section retrieves census data from the Decennial Census, using the `get_decennial` function from the `tidycensus` package. As of this writing, the most recent version of the decennial census data available is from 2020, and we can set that as a variable below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# set year\ndecennial_year <- 2020\n```\n:::\n\n\nNext, we can define the list of demographic variables we'd like to retrieve tabular data for, by saving the census variables we want in the `census_vars_decennial` object (see @sec-census-variables for more information about how to discover variables of interest and find their associated codes). Note that here we're providing descriptive names associated with each variable code, which makes the data easier to work with later, but isn't strictly necessary (i.e., you could just supply the variable codes alone).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# define variables to pull from the decennial census\ncensus_vars_decennial <- c(\n 'population_hispanic_or_latino_count' = 'P2_002N', # Total Hispanic or Latino\n 'population_white_count' = 'P2_005N', # White (Not Hispanic or Latino)\n 'population_black_or_african_american_count' = 'P2_006N', # Black or African American (Not Hispanic or Latino)\n 'population_native_american_or_alaska_native_count' = 'P2_007N', # American Indian and Alaska Native (Not Hispanic or Latino)\n 'population_asian_count' = 'P2_008N', # Asian (Not Hispanic or Latino)\n 'population_pacific_islander_count' = 'P2_009N', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)\n 'population_other_count' = 'P2_010N', # Some other race (Not Hispanic or Latino)\n 'population_multiple_count' = 'P2_011N', # Two or more races (Not Hispanic or Latino)\n 'population_total_count' = 'P2_001N'\n)\n```\n:::\n\n\nThen, we can create an object that we can use to filter our request to the census API so that it will only return census units that overlap with our target areas (the object will be passed to the `filter_by` argument of the `get_decennial` function below). Note that this isn't strictly necessary (you could also apply the filter after making the API request), but may helpful to speed the query and reduce memory usage, especially in the case of large queries.\n\n::: callout-note\nAt the time of this writing, the `filter_by` argument of the tidycensus `get_decennial` function is fairly new, and not yet included in the official documentation.\n\nAlso, the `filter_by` argument is optional, and only appears to accept a simple features (sf) object with a single row / feature (e.g., a single water system), and will not accept an sf object with multiple rows / features. The process below attempts to work around this constraint by joining all of the selected water systems into a single multi-part polygon (i.e., an sf object with a single row). However, if you only want to retrieve data for census units that overlap a single target area (e.g., a single water system), you can skip this step.\n:::\n\n\n::: {.cell}\n\n```{#lst-filter_obj .r .cell-code lst-cap=\"Create object for filtering the API query\"}\nwater_system_numberems_filter <- water_systems_sac %>% \n st_union() %>% \n st_as_sf()\n```\n:::\n\n\nFinally, we can make the data request, using the `get_decennial` function, which accepts several arguments that specify exactly what data to return.\n\nFor this example we're getting data at the 'Block' level (with the `geography = 'block'` argument) for the demographic variables defined above in the `census_vars_decennial` object (which is passed to the `variables` argument). As noted above, block-level data is the most granular level of spatial data available, and should provide the best results when estimating demographics for areas whose boundaries don't align with census unit boundaries. However, depending on the use case, it may require too much time and computational resources to use the most granular spatial data, and may not be necessary to obtain a reasonable estimate. Also, keep in mind that block-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like block groups or tracts).\n\nIn addition to the tabular data associated with the demographic variables in our list, we'll also get the spatial data – i.e., the boundaries of the census blocks – by setting the `geometry = TRUE` argument. When we do this, the tabular demographic data is pre-joined to the spatial data, so the API request returns a single dataset with both the spatial and attribute (demographic) data combined.\n\n::: callout-note\nThe `tidycensus` package generally returns the Census Bureau's [cartographic boundary shapefiles](https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html) by default (as opposed to the [core TIGER/Line shapefiles](https://www.census.gov/geo/maps-data/data/tiger-line.html), which is the default format returned by the `tigris` R package). The default cartographic boundary shapefiles are pre-clipped to the US coastline, and are smaller/faster to process (alternatively you can use `cb = FALSE` to get the core TIGER/Line data) (see [here](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html#better-cartography-with-spatial-overlay)). So the default spatial data returned by `tidycensus` may be somewhat different than the default spatial data returned by the `tigris` package, but in general I find it's best to use the default `tidycensus` spatial data.\n:::\n\n::: callout-warning\nAt the block level, it appears that `tidycensus` only returns the more detailed core TIGER/Line shapefiles (i.e., they are identical to the default block-level geographic data returned by `tigris`). In some cases, that can create minor inconsistencies when working with both blocks and block groups and using the default geographies.\n:::\n\nWe also narrow down the search parameters geographically by specifying the state (with `state = 'CA'`) and counties (`county = counties_list`) we're seeking data for.\n\n::: callout-note\nSupplying a list of counties may not be strictly necessary, especially in cases where you supply the optional `filter_by` argument. However, especially when working with granular data like blocks, supplying the county argument seems to greatly speed the API request.\n:::\n\nAlso, while by default the `tidycensus` package returns data in long/tidy format, we're getting the data in wide format for this example (by specifying `output = 'wide'`) because it'll be easier to work with for the interpolation method described below to estimate demographics for non-census geographies.\n\n\n::: {.cell}\n\n```{#lst-get_decennial .r .cell-code lst-cap=\"Retrieve decennial census data\"}\n# get census data\ncensus_data_decennial <- get_decennial(geography = 'block', # can be 'block', 'block group', 'tract', 'county', etc.\n state = 'CA', \n county = counties_list,\n filter_by = water_system_numberems_filter,\n year = decennial_year,\n variables = census_vars_decennial,\n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) # convert to common coordinate system\n```\n:::\n\n\nThe output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the population of each racial/ethnic group is reported in a separate column. Here's a view of the contents and structure of the Decennial Census data that's returned:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_decennial)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 17,745\nColumns: 12\n$ GEOID \"060670019003011\", \"…\n$ NAME \"Block 3011, Block G…\n$ population_hispanic_or_latino_count 4, 6, 8, 11, 1, 14, …\n$ population_white_count 20, 4, 167, 70, 86, …\n$ population_black_or_african_american_count 2, 2, 0, 8, 9, 18, 0…\n$ population_native_american_or_alaska_native_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_asian_count 19, 5, 2, 1, 23, 8, …\n$ population_pacific_islander_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_other_count 0, 0, 0, 0, 0, 0, 0,…\n$ population_multiple_count 8, 3, 4, 10, 5, 10, …\n$ population_total_count 53, 20, 181, 100, 12…\n$ geometry POLYGON ((-1…\n```\n\n\n:::\n:::\n\n\n### American Community Survey (ACS) {#sec-census-access-acs}\n\nTo get data from the ACS, you can use the `get_acs()` function, which is very similar to the `get_decennial()` function used above. As of this writing, the most recent version of the 5-year ACS data available is the 2018-2022 ACS, and we can set that as a variable below (which makes it easier to update this document in future years).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# set year\nacs_year <- 2022\n```\n:::\n\n\nHowever, since the ACS data contains data on a much broader set of socio-economic metrics, the requested data includes a greatly expanded list of variables, defined in the `census_vars_acs` object (see @sec-census-variables for more information about how to discover variables of interest and find their associated codes). As above, we can provide descriptive names associated with each variable code, which makes the data easier to work with later, but isn't strictly necessary (i.e., you could just supply the variable codes alone).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# define variables to pull from the ACS\ncensus_vars_acs <- c(\n # --- population variables ---\n 'population_total_count' = 'B01003_001',\n 'population_hispanic_or_latino_count' = 'B03002_012', # Total Hispanic or Latino\n 'population_white_count' = 'B03002_003', # White (Not Hispanic or Latino)\n 'population_black_or_african_american_count' = 'B03002_004', # Black or African American (Not Hispanic or Latino)\n 'population_native_american_or_alaska_native_count' = 'B03002_005', # American Indian and Alaska Native (Not Hispanic or Latino)\n 'population_asian_count' = 'B03002_006', # Asian (Not Hispanic or Latino)\n 'population_pacific_islander_count' = 'B03002_007', # Native Hawaiian and Other Pacific Islander (Not Hispanic or Latino)\n 'population_other_count' = 'B03002_008', # Some other race (Not Hispanic or Latino)\n 'population_multiple_count' = 'B03002_009', # Two or more races (Not Hispanic or Latino)\n \n # --- poverty variables ---\n 'poverty_total_assessed_count' = 'B17021_001', # also available from 'B17020_001' (at the tract level only). Total population for whom poverty status is determined. Poverty status was determined for all people except institutionalized people, people in military group quarters, people in college dormitories, and unrelated individuals under 15 years old. These groups were excluded from the numerator and denominator when calculating poverty rates.\n 'poverty_below_level_count' = 'B17021_002', # also available from 'B17020_002' (at the tract level only). Population whose income in the past 12 months is below federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.\n 'poverty_above_level_count' = 'B17021_019', # also available from 'B17020_010' (at the tract level only). Population whose income in the past 12 months is at or above federal poverty level. A family and every individual in it are considered to be in poverty if the family's total income is less than the dollar value of a threshold that varies depending upon size of family, number of children, & age of householder (for 1- & 2- person households). Income is the sum of wage/salary income; net self-employment income; interest/dividends/net rental/royalty income/income from estates & trusts; Social Security/Railroad Retirement income; Supplemental Security Income (SSI); public assistance/welfare payments; retirement/survivor/disability pensions; & all other income.\n \n # --- household variables ---\n 'households_count' = 'B19001_001', # also available from variable 'B19053_001'. A household includes all the people who occupy a housing unit - a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied. People not living in households are classified as living in group quarters.\n 'average_household_size' = 'B25010_001', # A measure obtained by dividing the number of people living in occupied housing units by the total number of occupied housing units. This measure is rounded to the nearest hundredth.\n \n # --- household income variables ---\n 'median_household_income' = 'B19013_001', # also available from 'B19019_001' (at the tract level only). Income in the past 12 months is the sum of wage or salary income; net self-employment income; interest, dividends, or net rental or royalty income or income from estates and trusts; Social Security or Railroad Retirement income; Supplemental Security Income (SSI); public assistance or welfare payments; retirement, survivor, or disability pensions; and all other income.\n 'households_income_below_10k_count' = 'B19001_002', # count of households with income below $10,000 \n 'households_income_10k_15k_count' = 'B19001_003', # count of households with income $10,000 to $15,000 \n 'households_income_15k_20k_count' = 'B19001_004', \n 'households_income_20k_25k_count' = 'B19001_005', \n 'households_income_25k_30k_count' = 'B19001_006', \n 'households_income_30k_35k_count' = 'B19001_007', \n 'households_income_35k_40k_count' = 'B19001_008', \n 'households_income_40k_45k_count' = 'B19001_009', \n 'households_income_45k_50k_count' = 'B19001_010', \n 'households_income_50k_60k_count' = 'B19001_011', \n 'households_income_60k_75k_count' = 'B19001_012', \n 'households_income_75k_100k_count' = 'B19001_013', \n 'households_income_100k_125k_count' = 'B19001_014', \n 'households_income_125k_150k_count' = 'B19001_015', \n 'households_income_150k_200k_count' = 'B19001_016',\n 'households_income_above_200k_count' = 'B19001_017', # count of households with income above $200,000\n\n # --- housing costs variables (% of household income) ---\n # Housing Costs as a Percentage of Household Income in the past 12 months - NOTE: THIS TABLE IS NEW FOR THE 2022 ACS, AND WON'T BE AVAILABLE FOR PREVIOUS YEARS - Table B25140 shows the count of households paying more than 30% of their income towards housing costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented). The table also shows the number of households paying more than 50% of their income toward housing costs.\n # 'households_count' = 'B25140_001', \n 'households_mortgage_total_count' = 'B25140_002',\n 'households_mortgage_housing_costs_over30pct_count' = 'B25140_003',\n 'households_mortgage_housing_costs_over50pct_count' = 'B25140_004',\n 'households_no_mortgage_total_count' = 'B25140_006',\n 'households_no_mortgage_housing_costs_over30pct_count' = 'B25140_007',\n 'households_no_mortgage_housing_costs_over50pct_count' = 'B25140_008',\n 'households_rent_total_count' = 'B25140_010',\n 'households_rent_housing_costs_over30pct_count' = 'B25140_011',\n 'households_rent_housing_costs_over50pct_count' = 'B25140_012',\n \n # --- other income / economic variables ---\n 'per_capita_income' = 'B19301_001' # note: per capita income by race (at block group level) available in table B19301I\n)\n```\n:::\n\n\nFinally, we can make the data request, using the `get_acs` function, which is very similar to the `get_decennial` function described above ( @sec-census-access-decennial). However, for this example we're getting data at the 'Block Group' level (with the `geography = 'block group'` argument), which is the most granular level of spatial data available for ACS data. But, keep in mind that block group-level data may not be available for all variables, and some variables may only be available at less granular spatial scales (like tracts). Note that the `water_system_numberems_filter` object supplied to the `filter_by` argument was created above in @lst-filter_obj.\n\n\n::: {.cell}\n\n```{#lst-get_acs .r .cell-code lst-cap=\"Retrieve ACS data\"}\n# get census data\ncensus_data_acs <- get_acs(geography = 'block group',\n state = 'CA', \n county = counties_list,\n filter_by = water_system_numberems_filter,\n year = acs_year,\n survey = 'acs5',\n variables = census_vars_acs, \n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) # convert to common coordinate system\n```\n:::\n\n\nAs above, the output is an sf object (i.e., a dataframe-like object that also includes spatial data), in wide format, where each row represents a census unit, and the each demographic variable is reported in a separate column. Here's a view of the contents and structure of the 2022 5-year ACS data that's returned (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_acs[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 1,054\nColumns: 21\n$ GEOID \"060670081451\", \"06…\n$ NAME \"Block Group 1; Cen…\n$ population_total_countE 1768, 1881, 1098, 2…\n$ population_total_countM 520, 585, 395, 583,…\n$ population_hispanic_or_latino_countE 38, 327, 376, 782, …\n$ population_hispanic_or_latino_countM 59, 298, 280, 315, …\n$ population_white_countE 1627, 1337, 293, 18…\n$ population_white_countM 521, 475, 191, 460,…\n$ population_black_or_african_american_countE 0, 1, 272, 26, 351,…\n$ population_black_or_african_american_countM 13, 3, 251, 38, 334…\n$ population_native_american_or_alaska_native_countE 41, 0, 0, 26, 0, 0,…\n$ population_native_american_or_alaska_native_countM 58, 13, 13, 42, 13,…\n$ population_asian_countE 45, 0, 105, 58, 144…\n$ population_asian_countM 71, 13, 116, 66, 18…\n$ population_pacific_islander_countE 0, 98, 0, 0, 27, 13…\n$ population_pacific_islander_countM 13, 98, 13, 13, 50,…\n$ population_other_countE 0, 0, 39, 0, 0, 0, …\n$ population_other_countM 13, 13, 63, 13, 13,…\n$ population_multiple_countE 17, 118, 13, 39, 15…\n$ population_multiple_countM 27, 125, 20, 57, 25…\n$ geometry POLYGON ((-…\n```\n\n\n:::\n:::\n\n\nNote that the dataset that's returned includes fields corresponding to Margin of Error (MOE) for each variable we've requested (these are the fields that end with two digits and an M – e.g., \"001M\"), since, as noted above in @sec-census-datasets , the ACS is based on a sample of the population and reports estimated values.\n\n::: callout-tip\nIt is possible to calculate MOEs for derived estimates – e.g., when aggregating groups of census units – and in many cases it may be worthwhile to do that to provide extra context to the data. However, it may be difficult to do for more complex aggregations, such as the areal interpolation shown below. For guidance on how calculate MOEs for some types of derived estimates, see [this document](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf).\n\n`tidycensus` also has functions for calculating derives margins of error based on Census-supplied formulas, including [`moe_sum()`](https://walker-data.com/tidycensus/reference/moe_sum.html), [`moe_product()`](https://walker-data.com/tidycensus/reference/moe_product.html), [`moe_ratio()`](https://walker-data.com/tidycensus/reference/moe_ratio.html), and [`moe_prop()`](https://walker-data.com/tidycensus/reference/moe_prop.html).\n:::\n\nBecause we won't be incorporating those MOEs into the analysis below, we can drop them for this example, then clean up the field names.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# drop MOE fields\ncensus_data_acs <- census_data_acs %>% \n select(-matches('M$')) # the $ specifies \"ends with\"\n\n# clean names\nnames(census_data_acs) <- names(census_data_acs) %>% \n str_remove('E$') %>% # remove 'E' (estimate) from field names\n str_replace('NAM', 'NAME') # add 'E' back to NAME field\n```\n:::\n\n\nHere's a view of the contents and structure of the revised 2022 5-year ACS dataset (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(census_data_acs[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 1,054\nColumns: 21\n$ GEOID \"060670081451\", \"060…\n$ NAME \"Block Group 1; Cens…\n$ population_total_count 1768, 1881, 1098, 27…\n$ population_hispanic_or_latino_count 38, 327, 376, 782, 3…\n$ population_white_count 1627, 1337, 293, 181…\n$ population_black_or_african_american_count 0, 1, 272, 26, 351, …\n$ population_native_american_or_alaska_native_count 41, 0, 0, 26, 0, 0, …\n$ population_asian_count 45, 0, 105, 58, 144,…\n$ population_pacific_islander_count 0, 98, 0, 0, 27, 13,…\n$ population_other_count 0, 0, 39, 0, 0, 0, 0…\n$ population_multiple_count 17, 118, 13, 39, 15,…\n$ poverty_total_assessed_count 1768, 1847, 1098, 27…\n$ poverty_below_level_count 101, 328, 272, 116, …\n$ poverty_above_level_count 1667, 1519, 826, 263…\n$ households_count 680, 718, 405, 905, …\n$ average_household_size 2.59, 2.62, 2.71, 2.…\n$ median_household_income 123500, 66768, 56216…\n$ households_income_below_10k_count 18, 47, 10, 22, 6, 1…\n$ households_income_10k_15k_count 0, 0, 24, 0, 15, 231…\n$ households_income_15k_20k_count 0, 13, 18, 0, 51, 12…\n$ geometry POLYGON ((-1…\n```\n\n\n:::\n:::\n\n\nFor further analysis, we may want to get the statewide data as a baseline for comparison (this could also be done for other scales, like the county level). We can use a similar process to get that data and clean/format it to match the more detailed data obtained above. Note that in this case we're also using the 5-year ACS (even though the 1-year ACS is also available at the statewide level, and would provide more up-to-date data) so that the statewide data will be directly comparable to the block group level data obtained above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_acs_state <- get_acs(geography = 'state',\n state = 'CA', \n year = acs_year,\n survey = 'acs5',\n variables = census_vars_acs, \n output = 'wide', # can be 'wide' or 'tidy'\n geometry = TRUE,\n cache_table = TRUE) %>% \n st_transform(crs_projected) %>% # convert to common coordinate system\n select(-matches('M$')) %>% # the $ specifies \"ends with\"\n # clean names (note this is a little different than the way we renamed fields above, either works)\n rename_with(.fn = ~ str_remove(., # remove 'E' (estimate) from field names\n pattern = 'E$')) %>% \n rename_with(.fn = ~ str_replace(., # add 'E' back to NAME field\n pattern = 'NAM', \n replacement = 'NAME'))\n```\n:::\n\n\n### Plot Census & Supplier Data {#sec-census-plot}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsystem_plot <- 'SACRAMENTO SUBURBAN WATER DISTRICT'\n```\n:::\n\n\n@fig-suppliers-census-map shows the 2022 5-year ACS census units that overlap with one of the water systems (Sacramento Suburban Water District) that we'll compute demographics for below (plotting the census units that overlap all systems tends to be slow in this format).\n\n::: {#fig-suppliers-census-map}\n\n::: {.cell}\n\n```{.r .cell-code}\n# label: fig-suppliers-census-map\n# fig-cap: \"Water system (filled polygon) and boundaries of census units (light blue) that will be used to estimate water system demographics.\"\n\nmapview(water_systems_sac %>% \n filter(water_system_name == system_plot), \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE) +\n mapview(census_data_acs %>% \n st_filter(water_systems_sac %>% \n filter(water_system_name == system_plot)), \n alpha.regions = 0, \n color = 'cyan', \n lwd = 1.3, label = 'NAME', \n layer.name = 'ACS Data', \n legend = FALSE) # zcol = 'NAME'\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\nWater system Sacramento Suburban Water District (filled polygon) and boundaries of census units (light blue) that will be used to estimate water system demographics.\n:::\n\n## Compute Water System Demographics {#sec-estimate-demographics}\n\nNow we can perform the calculations to estimate demographic characteristics for our *target* areas (water system service boundaries in the Sacramento County area) from our *source* demographic dataset (the census data we obtained above). For this example, we'll use the 2022 5-year ACS data that we retrieved above (which is saved in the `census_data_acs` variable) as our source of demographic data, and we'll estimate the following for each water system's service area:\n\n- Population of each racial/ethnic group (using the racial/ethnic categories defined in the census dataset), and each racial/ethnic group's portion of the total service area population\n- Socio-economic variables like poverty rate, median household income, income distributions, and per capita income\n\nThere are multiple ways this estimation can be done. For this example, we'll employ a three step strategy:\n\n1. Estimate values for count-based variables (typically referred to as 'extensive' data types) – e.g., total population, popultion by race/ethnicity, population above / below poverty rate, households by income bracket – for overlapping census unit, using areal interpolation. This is essentially an area weighted average, which estimates how much of each *source* unit's (census unit) count applies to the *target* area (a given water service area), based on the portion of its area that overlaps that target area – for more information about the process, see [this documentation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html) from the `areal` R package. For example, for a census unit that partially overlaps a service area, only a fraction of its count for a given variable will be applied to that service area; for a census unit that completely overlaps a service area, the full count for that variable will be applied to the service area.\n\n The major simplifying assumption of this approach is that the population or count-based variable of interest are evenly distributed within each unit in the *source* data. For example, in this case we're assuming that population (including the total population and the population of each racial/ethic group), households of each income bracket, populations above / below the poverty rate, etc. are evenly distributed within each census block group.\n\n::: callout-tip\nWhile this section uses the block group-level count data from the 5-year ACS, there may be cases where it could be useful or necessary to use more granular block-level population data from the decennial census to estimate population densities and distributions within larger census units, like block groups and tracts. This could especially be the case when estimating characteristics for small areas in rural environments. See @sec-detailed-pop-estimates and/or @sec-small-area-estimates for more information.\n:::\n\n2. Using the estimated count data (populations, households, etc), compute weighted values for variables that describe those populations, using the associated count data as a weighting factor (e.g., population-weighted values for population based data, or household-weighted values for household-based data) – these variables are typically referred to as 'intensive' data types.\n\n::: callout-tip\nAlthough it's possible to use areal interpolation to aggregate these variables as well, the multi-step approach described here can be useful because we know (from the population / household count data) that population densities differ between census units. Since we have a reasonable estimate of the count data (population, households, etc) within each census unit, using a population or household weighted average likely will yield more accurate results than a simple area-weighted average for these variables. For example, for per capita income, we can use the estimated population counts to produce a population weighted average per capita income (rather than an area weighted average per capita income, which is likely less meaningful as it over-weights large census areas with lower population densities). Areal interpolation may be more useful for cases where we generally have no other information about how density varies between the source polygons (unless significantly more effort is invested, such as looking at aerial imagery data)\n:::\n\n3. Aggregate interpolated values at the water system level.\n\n### Prepare Census Data\n\nNote that we already transformed the 2022 5-year ACS dataset into the common projected coordinate reference system used for this example immediately after we downloaded the data using the `get_acs()` function (see @lst-get_acs). This allows us to work with the water system data and the census data together in a common coordinate system.\n\nBefore calculating demographics for the *target* areas, we can do a bit of additional transformation to prepare the census data if needed. For example, we can combine the 'other' and 'multiple' racial/ethnic groupings into one 'other or multiple' racial/ethnic group.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## combine other and multiple\ncensus_data_acs <- census_data_acs %>% \n mutate('population_other_or_multiple_count' = population_other_count + population_multiple_count, \n .after = population_pacific_islander_count) %>% \n select(-c(population_other_count, population_multiple_count))\n```\n:::\n\n\nWe can also calculate the poverty rate for each census unit (which may be useful for presenting results later).\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_acs <- census_data_acs %>% \n mutate(poverty_rate_pct_calc_census_unit = case_when(\n poverty_total_assessed_count == 0 ~ 0,\n .default = 100 * poverty_below_level_count / poverty_total_assessed_count\n ), \n .after = poverty_above_level_count)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# We can also drop census units with zero population, since they won't contribute anything to our calculations.\n\n## drop census units with zero population\n# census_data_acs <- census_data_acs %>% \n# filter(population_total > 0)\n```\n:::\n\n\n### Interpolation Step 1: Areal Interpolation (for Count Variables) {#sec-areal-interp}\n\nThere are a couple of ways to implement the areal interpolation method. The example below 'manually' implements the process using functions from the `sf` package, for reasons described below. However, note that there are R packages which make it possible to perform areal interpolation with a single function - for example, the `sf` package's [`st_interpolate_aw`](https://r-spatial.github.io/sf/reference/interpolate_aw.html) function and the [`areal`](https://chris-prener.github.io/areal/) package's [`aw_interpolate`](https://chris-prener.github.io/areal/reference/aw_interpolate.html) function. This example uses a more 'manual' approach because this makes it possible to use the multi-step process described above, and also produces useful intermediate calculated data for mapping and visualization. However, we can use the single-function approach to double check our implementation of the areal interpolation approach for the count data (see @sec-check-areal-interp).\n\nFirst, we clip the census data to the water system boundaries:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_clip <- census_data_acs %>% \n mutate(census_unit_area = st_area(.)) %>% \n st_intersection(water_systems_sac) %>% \n mutate(clipped_area = st_area(.)) %>% \n mutate(areal_weight_factor = drop_units(clipped_area / census_unit_area))\n```\n:::\n\n\n@fig-map-clipped-polygons shows a plot of the census units clipped to the Sacramento Suburban Water District water system, along with the original/complete census units. Note that you can toggle layers on and off (and change their order of appearance) using the layers button in the upper left part of the map (below the zoom buttons).\n\n::: {#fig-map-clipped-polygons}\n\n::: {.cell}\n\n```{.r .cell-code}\nmapview(water_systems_sac %>% \n filter(water_system_name == system_plot), \n zcol = 'water_system_name', \n layer.name = 'Water System', \n legend = FALSE) + \n mapview(census_data_acs %>% \n st_filter(water_systems_sac %>% \n filter(water_system_name == system_plot)), \n alpha.regions = 0.15, \n col.regions = 'grey', \n color = 'black', \n lwd = 1, \n label = 'NAME', \n layer.name = 'ACS Data Full', \n legend = FALSE) +\n mapview(census_data_clip %>% \n filter(water_system_name == system_plot),\n alpha.regions = 0, \n color = 'cyan', \n lwd = 1.3, \n label = 'NAME', \n layer.name = 'ACS Data Clipped', \n legend = FALSE)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\nWater system Sacramento Suburban Water District (filled polygon), boundaries of overalpping census units (grey), and clipped portions of census units (light blue) that will be used to estimate water system demographics.\n:::\n\nNext, we can compute the area-weighted counts for the portions of census units that overlap each water system boundary:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate <- census_data_clip %>% \n mutate(\n across(\n .cols = ends_with('_count'),\n .fns = ~ .x * areal_weight_factor\n )) \n```\n:::\n\n\nAs noted above, it's also possible to use pre-built functions from several R packages to perform areal interpolation in a single step. Since we're using a three-step process, which also implements population weighted averaging for some variables, we're not using those functions directly in this example. However, they can be a useful check to validate our computed count data, but only after we aggregate our data at the system level – see @sec-check-areal-interp for more details.\n\n### Interpolation Step 2: Compute Population Weighted Values (Intensive Variables) {#sec-pop-interp}\n\nCompute population weighted values\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate <- census_data_interpolate %>% \n mutate(average_household_size_weighted = average_household_size * households_count,\n median_household_income_weighted = median_household_income * households_count,\n per_capita_income_weighted = per_capita_income * population_total_count)\n```\n:::\n\n\n::: callout-caution\nTo calculate an aggregated value for a variable like median household income, which depends on the distribution of the underling data, it may be worth considering whether a weighed average value is an appropriate measure. In some cases, it may be more appropriate to use the counts in each income bracket to estimate a median income, and/or present the income distribution rather than a single value.\n\nFor a discussion of the problem and a proposed solution, see [this document](https://www.documentcloud.org/documents/6165014-How-to-Recalculate-a-Median.html#document/p1).\n:::\n\n### Interpolation Step 3: Aggregate by Water System\n\nNext, we need to combine the weighted values calculated above to produce the estimates for each water system, and can also use those combined values to compute some additional metrics for each system (like rates, income distributions, etc.).\n\n#### Combine Results by Water System\n\nFirst, combine the results by summing all of the count-based variables (derived from areal interpolation), and calculating weighted averages for all variables computed in step 2 above. Note that we have to first calculate the denominator for each variable calculated with population weighted interpolation, because some of those variables contain missing values for records where the denominator is present (and if we don't remove the missing values, we get an `NA` for any water system that contains a block group with a missing value for that variable). For example, there are block groups where the median household income is missing, but the total household count is available for that block group – in that case, the weighted average should not include the households in that block group in the denominator; otherwise, the true value will be underestimated.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- census_data_interpolate %>% \n group_by(water_system_name) %>% \n mutate(\n average_household_size_denominator = if_else(is.na(average_household_size), 0, households_count),\n median_household_income_denominator = if_else(is.na(median_household_income), 0, households_count),\n per_capita_income_denominator = if_else(is.na(per_capita_income), 0, population_total_count)\n ) %>% \n summarize(\n across(\n .cols = ends_with('_count'),\n .fns = ~ sum(.x)\n ),\n average_household_size_hh_weighted = \n sum(average_household_size_weighted, na.rm = TRUE) / \n sum(average_household_size_denominator),\n median_household_income_hh_weighted = \n sum(median_household_income_weighted, na.rm = TRUE) /\n sum(median_household_income_denominator),\n per_capita_income_pop_weighted = \n sum(per_capita_income_weighted, na.rm = TRUE) / \n sum(per_capita_income_denominator)\n ) %>% \n ungroup()\n```\n:::\n\n\n#### Check - Count Variables Estimated with Areal Interpolation {#sec-check-areal-interp}\n\nAs noted above, it's also possible to use pre-built functions for areal interpolation. This section demonstrates those functions and uses them as a check of our computed count data.\n\nFrom the `sf` package, we can use the `st_interpolate_aw` function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# NOTE: it's only necessary to check the estimated values for one variable - \n# this just checks the total estimated population\n\n# sf package\ncheck_sf <- st_interpolate_aw(x = census_data_acs %>% \n select(population_total_count),\n to = water_systems_sac,\n extensive = TRUE) %>% \n bind_cols(water_systems_sac %>% st_drop_geometry)\n\n# check - should be TRUE if results are equivalent\nall(check_sf %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5) ==\n water_system_demographics %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\nFrom the `areal` package, we can use the `aw_interpolate` function. Note that there are some settings that you may need to modify in the `aw_interpolate` function depending on the type of analysis you're doing. In particular, for more information about the `weight` argument – which can be either `sum` or `total` – see [this section of the documentation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html#calculating-weights-for-extensive-interpolations). For more information about extensive versus intensive interpolations, see [this section of the documenation](https://chris-prener.github.io/areal/articles/areal-weighted-interpolation.html#extensive-and-intensive-interpolations) (as noted above, the method applied here avoids using areal interpolation to calculate intensive variables, because area may not be a good metric for determining how to weight those variables, considering that we can estimate associated counts for populations/households/etc.).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# NOTE: it's only necessary to check the estimated values for one variable - \n# this just checks the total estimated population\n\n# areal package\ncheck_areal <- aw_interpolate(water_systems_sac,\n tid = water_system_name,\n source = census_data_acs,\n sid = GEOID,\n weight = 'total',\n extensive = c('population_total_count'))\n\n# check - should be TRUE if results are equivalent\nall(check_areal %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5) ==\n water_system_demographics %>% \n arrange(water_system_name) %>% \n pull(population_total_count) %>% \n round(5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\n#### Clean & Format Summarized Water System Demographic Data {#sec-results-clean}\n\nWe could stop here, and save the dataset containing the results to an output file (done below - see @sec-results-save). But, it may be useful to do some additional computations and re-formatting before saving the dataset. For example, in this case it may be useful to calculate the racial/ethnic breakdown of each system's population as percentages of the total population (in addition to the total counts computed above), and calculate other rates / distributions.\n\nFirst we can add fields with each racial/ethnic group's estimated percent of the total population within each water system's service area:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>%\n mutate(\n across(\n .cols = starts_with('population_'),\n .fns = ~ round(.x / population_total_count * 100, 2),\n .names = \"{str_replace(.col, '_count', '_percent')}\"\n ),\n .after = population_other_or_multiple_count) %>% \n select(-population_total_percent) # this always equals 1, not needed\n```\n:::\n\n\nWe can also calculate the estimated poverty rate for each water system's service area.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(poverty_rate_percent = case_when(\n poverty_total_assessed_count == 0 ~ 0,\n .default = 100 * poverty_below_level_count / poverty_total_assessed_count\n ), \n .after = poverty_above_level_count)\n```\n:::\n\n\nAnd compute income brackets in 25k increments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(households_income_0_25k_count = \n households_income_below_10k_count + \n households_income_10k_15k_count + \n households_income_15k_20k_count +\n households_income_20k_25k_count,\n households_income_25k_50k_count =\n households_income_25k_30k_count + \n households_income_30k_35k_count +\n households_income_35k_40k_count +\n households_income_40k_45k_count +\n households_income_45k_50k_count,\n households_income_50k_75k_count =\n households_income_50k_60k_count +\n households_income_60k_75k_count,\n .after = households_income_above_200k_count\n ) # note - above 75k is already in 25k increments\n```\n:::\n\n\nAnd compute income brackets in 50k increments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>% \n mutate(households_income_0_50k_count = \n households_income_0_25k_count + \n households_income_25k_50k_count,\n households_income_50k_100k_count =\n households_income_50k_75k_count +\n households_income_75k_100k_count,\n households_income_100k_150k_count =\n households_income_100k_125k_count +\n households_income_125k_150k_count,\n .after = households_income_50k_75k_count\n ) # above 150k is already in 50k increments\n```\n:::\n\n\nAnd compute grouped median household income:\n\n\n::: {.cell}\n\n:::\n\n\nAnd compute \\# and % of households below income thresholds:\n\n\n::: {.cell}\n\n:::\n\n\nAnd, compute the portion of households paying more than 30% / 50% of their income toward housing costs:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# portion of households paying more than 30% / 50% of income on housing\nwater_system_demographics <- water_system_demographics %>%\n mutate(households_all_housing_costs_over30pct_percent = \n 100 * (households_mortgage_housing_costs_over30pct_count + \n households_no_mortgage_housing_costs_over30pct_count +\n households_rent_housing_costs_over30pct_count) / \n households_count, \n .after = households_rent_housing_costs_over50pct_count) %>% \n mutate(households_all_housing_costs_over50pct_percent = \n 100 * (households_mortgage_housing_costs_over50pct_count + \n households_no_mortgage_housing_costs_over50pct_count +\n households_rent_housing_costs_over50pct_count) / \n households_count,\n .after = households_all_housing_costs_over30pct_percent)\n```\n:::\n\n\nFinally, we can round the estimated values to appropriate levels of precision:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics <- water_system_demographics %>%\n mutate(\n across(\n .cols = ends_with('_count'),\n .fns = ~ round(.x, 0)\n )) %>%\n mutate(\n across(\n .cols = ends_with('_percent'),\n .fns = ~ round(.x, 2)\n ))\n```\n:::\n\n\nWe now have a dataset with the selected metrics from the census data (*source* data) estimated for each of the water system service areas (*target* geographic features). Here's a view of the contents and structure of the re-formatted dataset (only the first few fields are shown):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_system_demographics[,1:20])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 21\n$ water_system_name \"B & W RESORT MARI…\n$ population_total_count 0, 22603, 33120, 1…\n$ population_hispanic_or_latino_count 0, 10939, 5245, 34…\n$ population_white_count 0, 3504, 19456, 23…\n$ population_black_or_african_american_count 0, 2663, 3199, 197…\n$ population_native_american_or_alaska_native_count 0, 121, 113, 70, 0…\n$ population_asian_count 0, 4075, 2947, 108…\n$ population_pacific_islander_count 0, 240, 77, 59, 0,…\n$ population_other_or_multiple_count 0, 1060, 2082, 110…\n$ population_hispanic_or_latino_percent 41.43, 48.40, 15.8…\n$ population_white_percent 52.47, 15.50, 58.7…\n$ population_black_or_african_american_percent 0.00, 11.78, 9.66,…\n$ population_native_american_or_alaska_native_percent 0.00, 0.54, 0.34, …\n$ population_asian_percent 4.55, 18.03, 8.90,…\n$ population_pacific_islander_percent 0.00, 1.06, 0.23, …\n$ population_other_or_multiple_percent 1.56, 4.69, 6.29, …\n$ poverty_total_assessed_count 0, 22556, 33034, 1…\n$ poverty_below_level_count 0, 6010, 3389, 313…\n$ poverty_above_level_count 0, 16546, 29645, 6…\n$ poverty_rate_percent 22.60, 26.64, 10.2…\n$ geometry POLYGON ((…\n```\n\n\n:::\n:::\n\n\n@tbl-water-sys-demographics-rev provides a complete view of the cleaned and re-formatted dataset. These results are saved locally in tabular and spatial format in @sec-results-save.\n\n\n::: {#tbl-water-sys-demographics-rev .cell .tbl-cap-location-top tbl-cap='Water System Demographics'}\n\n```{.r .cell-code}\nwater_system_demographics %>%\n kable(caption = 'A Caption') %>%\n scroll_box(height = \"400px\")\n```\n\n::: {.cell-output-display}\n`````{=html}\n
\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
A Caption
water_system_name population_total_count population_hispanic_or_latino_count population_white_count population_black_or_african_american_count population_native_american_or_alaska_native_count population_asian_count population_pacific_islander_count population_other_or_multiple_count population_hispanic_or_latino_percent population_white_percent population_black_or_african_american_percent population_native_american_or_alaska_native_percent population_asian_percent population_pacific_islander_percent population_other_or_multiple_percent poverty_total_assessed_count poverty_below_level_count poverty_above_level_count poverty_rate_percent households_count households_income_below_10k_count households_income_10k_15k_count households_income_15k_20k_count households_income_20k_25k_count households_income_25k_30k_count households_income_30k_35k_count households_income_35k_40k_count households_income_40k_45k_count households_income_45k_50k_count households_income_50k_60k_count households_income_60k_75k_count households_income_75k_100k_count households_income_100k_125k_count households_income_125k_150k_count households_income_150k_200k_count households_income_above_200k_count households_income_0_25k_count households_income_25k_50k_count households_income_50k_75k_count households_income_0_50k_count households_income_50k_100k_count households_income_100k_150k_count households_mortgage_total_count households_mortgage_housing_costs_over30pct_count households_mortgage_housing_costs_over50pct_count households_no_mortgage_total_count households_no_mortgage_housing_costs_over30pct_count households_no_mortgage_housing_costs_over50pct_count households_rent_total_count households_rent_housing_costs_over30pct_count households_rent_housing_costs_over50pct_count households_all_housing_costs_over30pct_percent households_all_housing_costs_over50pct_percent average_household_size_hh_weighted median_household_income_hh_weighted per_capita_income_pop_weighted geometry
B & W RESORT MARINA 0 0 0 0 0 0 0 0 41.43 52.47 0.00 0.00 4.55 0.00 1.56 0 0 0 22.60 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40.53 21.84 2.030000 51977.00 40522.00 POLYGON ((-138282.2 13643.2...
CAL AM FRUITRIDGE VISTA 22603 10939 3504 2663 121 4075 240 1060 48.40 15.50 11.78 0.54 18.03 1.06 4.69 22556 6010 16546 26.64 6900 354 339 521 263 367 302 359 355 565 692 876 784 459 235 287 141 1477 1948 1569 3425 2352 694 1620 745 345 1236 95 58 4044 2131 1059 43.06 21.18 3.257806 53040.44 20519.57 POLYGON ((-127001.4 54266.8...
CALAM - ANTELOPE 33120 5245 19456 3199 113 2947 77 2082 15.84 58.74 9.66 0.34 8.90 0.23 6.29 33034 3389 29645 10.26 10529 315 184 101 122 116 469 248 368 449 737 1077 1669 1501 1077 1158 937 723 1650 1814 2373 3483 2578 5544 1861 621 1747 184 106 3238 1678 649 35.36 13.07 3.134530 93741.55 34660.44 POLYGON ((-120906.3 77326.5...
CALAM - ARDEN 10112 3433 2392 1977 70 1082 59 1100 33.95 23.65 19.55 0.69 10.70 0.58 10.87 10034 3130 6904 31.19 3823 201 259 239 167 319 190 142 236 207 440 394 535 228 148 62 58 866 1093 834 1959 1368 376 265 84 46 133 8 3 3426 2124 1170 57.97 31.87 2.623643 49624.62 22770.82 POLYGON ((-123052 64046.06,...
CALAM - ISLETON 34 14 17 0 0 2 0 1 42.06 51.14 0.00 0.00 4.55 0.00 2.25 34 7 27 20.89 16 1 1 0 1 1 0 1 1 0 2 1 1 3 1 0 1 4 3 3 6 4 4 6 4 1 7 2 2 4 1 1 39.45 20.68 2.078994 57361.76 40672.21 POLYGON ((-138730.9 17272.8...
CALAM - LINCOLN OAKS 42916 9056 26529 1486 143 2706 288 2708 21.10 61.82 3.46 0.33 6.31 0.67 6.31 42823 4074 38749 9.51 15621 740 375 308 622 488 616 585 629 645 1035 1641 2442 1889 1272 1555 778 2046 2964 2675 5010 5118 3161 7390 2671 919 3332 503 298 4900 2523 1302 36.46 16.13 2.730281 82035.52 33728.94 POLYGON ((-117495.2 73240.4...
CALAM - PARKWAY 58635 18665 8921 6965 21 19228 1386 3449 31.83 15.21 11.88 0.04 32.79 2.36 5.88 58434 9804 48630 16.78 17667 1081 753 514 713 694 640 713 700 727 1145 1918 2490 1634 1532 1546 865 3061 3475 3064 6536 5554 3166 7163 2719 1049 3418 647 383 7086 3517 1917 38.96 18.96 3.284608 72938.51 26938.14 POLYGON ((-124522.5 52428.5...
CALAM - SUBURBAN ROSEMONT 57897 13791 25062 7725 91 6905 380 3942 23.82 43.29 13.34 0.16 11.93 0.66 6.81 57661 8374 49287 14.52 21045 1156 612 472 744 653 568 582 874 628 1289 2508 3438 2595 1594 1671 1661 2985 3305 3797 6290 7235 4189 8262 2262 730 3425 439 271 9358 4521 2320 34.31 15.78 2.726937 81229.87 34497.37 POLYGON ((-119360.4 58937.6...
CALAM - WALNUT GROVE 12 5 5 0 0 1 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 12 2 10 15.75 5 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 1 1 2 2 2 0 2 0 0 1 0 0 2 1 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-131705.3 26403.6...
CALIFORNIA STATE FAIR 532 78 262 91 0 48 0 52 14.68 49.25 17.13 0.00 9.10 0.00 9.85 526 152 374 28.89 285 65 13 8 5 9 14 2 0 23 29 30 35 21 11 17 3 91 48 59 140 93 32 0 0 0 0 0 0 285 177 95 62.11 33.45 1.820000 52886.00 33141.00 POLYGON ((-125611.2 65287.3...
CARMICHAEL WATER DISTRICT 39253 6192 25026 2230 68 3326 295 2116 15.78 63.76 5.68 0.17 8.47 0.75 5.39 38700 5000 33700 12.92 15937 570 534 513 472 398 607 522 684 541 996 1595 1782 1724 1200 1678 2122 2088 2751 2591 4839 4373 2924 5256 1399 669 3147 358 177 7534 4056 2068 36.48 18.29 2.405914 96967.64 46901.80 POLYGON ((-117711 65208.06,...
CITRUS HEIGHTS WATER DISTRICT 68912 12380 48148 2092 162 2875 71 3186 17.96 69.87 3.04 0.23 4.17 0.10 4.62 68581 6961 61620 10.15 25633 1012 569 446 769 665 867 841 723 1165 1875 3057 3954 2744 2332 2533 2080 2796 4261 4932 7057 8886 5075 10344 3553 1380 4293 554 286 10996 5759 2620 38.49 16.72 2.653808 82960.78 37323.17 POLYGON ((-114405.5 72735.6...
CITY OF SACRAMENTO MAIN 516189 151211 159508 62060 1249 98585 9242 34334 29.29 30.90 12.02 0.24 19.10 1.79 6.65 508800 77003 431797 15.13 194000 9540 9401 6217 6407 5804 6255 6278 6139 6729 13349 17396 26982 20453 15080 17439 20531 31564 31205 30745 62769 57728 35533 67435 21769 8217 29857 3476 1805 96708 47510 24524 37.50 17.81 2.609594 84694.02 39105.61 POLYGON ((-133314 51929.51,...
DEL PASO MANOR COUNTY WATER DI 5592 687 3967 390 15 119 31 382 12.28 70.95 6.97 0.26 2.13 0.56 6.84 5592 621 4971 11.10 2222 170 45 54 66 21 51 66 237 40 158 278 166 171 120 347 231 336 416 436 752 601 291 922 326 189 572 112 68 729 509 114 42.59 16.67 2.516895 90374.38 40254.83 POLYGON ((-120068.3 65980.9...
DELTA CROSSING MHP 0 0 0 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 0 0 0 17.42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 POLYGON ((-132498.4 40410.2...
EAST WALNUT GROVE [SWS] 3 2 2 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 3 1 3 15.75 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-132506.3 25966.4...
EDGEWATER MOBILE HOME PARK 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-153562.3 7972.28...
EL DORADO MOBILE HOME PARK 139 84 11 15 0 19 0 11 60.26 7.80 10.48 0.00 13.27 0.00 8.19 139 60 79 43.12 48 6 10 0 4 6 1 0 8 1 7 0 1 0 4 0 1 19 15 8 34 9 4 3 0 0 10 5 5 35 17 10 46.70 31.09 2.710000 29468.00 17394.00 POLYGON ((-124341.2 53660.1...
EL DORADO WEST MHP 148 89 12 16 0 20 0 12 60.26 7.80 10.48 0.00 13.27 0.00 8.19 147 63 84 43.12 51 6 10 0 4 6 1 0 8 2 8 0 1 0 5 0 1 20 16 8 37 9 5 3 0 0 10 6 6 38 18 10 46.70 31.09 2.710000 29468.00 17394.00 POLYGON ((-124532.3 53662.9...
ELEVEN OAKS MOBILE HOME COMMUNITY 233 45 94 56 0 37 0 1 19.27 40.19 24.01 0.00 15.91 0.00 0.62 233 87 146 37.48 71 7 2 3 6 10 2 1 1 3 1 13 17 3 0 3 0 17 17 15 34 32 3 8 3 1 21 1 1 42 29 23 46.85 35.36 3.280000 60521.00 18213.00 POLYGON ((-119819.8 71950.9...
ELK GROVE WATER SERVICE 42647 7656 19550 3209 70 8939 388 2835 17.95 45.84 7.53 0.16 20.96 0.91 6.65 42258 3264 38994 7.72 13239 430 202 253 224 328 102 345 292 245 667 1117 1441 1470 1386 1907 2832 1108 1311 1784 2420 3225 2856 7552 1903 628 2861 283 113 2826 1595 864 28.55 12.12 3.179068 122771.00 43429.03 POLYGON ((-118730.1 42496.7...
FAIR OAKS WATER DISTRICT 36003 4655 27050 708 94 1372 12 2113 12.93 75.13 1.97 0.26 3.81 0.03 5.87 35775 2852 32923 7.97 14233 546 332 113 229 208 391 206 469 293 804 1064 2214 1447 1568 1875 2474 1220 1568 1868 2788 4082 3016 7090 1872 845 3092 261 108 4051 1844 768 27.94 12.09 2.480217 107985.74 54435.01 POLYGON ((-112317.5 69577.6...
FLORIN COUNTY WATER DISTRICT 9951 2963 1548 1394 7 2743 866 430 29.78 15.56 14.01 0.07 27.56 8.70 4.32 9835 1285 8550 13.06 2755 84 125 53 154 103 46 86 176 224 258 223 432 297 215 143 137 417 635 481 1051 913 512 981 426 90 675 49 28 1100 476 260 34.48 13.70 3.573005 67048.12 24517.64 POLYGON ((-122791.9 52602.2...
FOLSOM STATE PRISON 3536 1257 652 1390 57 70 34 77 35.55 18.43 39.31 1.60 1.97 0.96 2.17 29 1 28 2.20 23 0 0 0 0 0 0 0 0 0 0 0 0 4 4 12 1 0 0 0 0 1 8 3 1 0 0 0 0 19 0 0 4.67 0.53 2.726311 161047.22 2271.22 POLYGON ((-99838.11 75350.0...
FOLSOM, CITY OF - ASHLAND 3845 318 2934 43 1 125 1 423 8.26 76.32 1.12 0.03 3.26 0.02 10.99 3780 143 3637 3.79 1800 44 17 104 43 34 209 103 74 43 43 158 248 132 80 123 345 208 463 201 670 449 212 594 164 90 847 368 82 358 196 74 40.42 13.70 2.087286 76810.17 56773.97 POLYGON ((-102605.9 74922.1...
FOLSOM, CITY OF - MAIN 62462 8433 35222 1693 105 12934 177 3897 13.50 56.39 2.71 0.17 20.71 0.28 6.24 62115 3405 58710 5.48 22409 807 218 390 477 418 283 329 373 451 670 1181 2255 2382 1747 4083 6344 1892 1855 1851 3747 4106 4129 11491 2728 1179 3590 237 146 7328 3010 1321 26.66 11.81 2.769356 141856.37 58469.35 POLYGON ((-101870.6 66094.5...
FREEPORT MARINA 3 2 1 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 3 1 3 17.42 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 POLYGON ((-130970.7 50553.3...
GALT, CITY OF 21490 9314 9952 520 22 872 20 789 43.34 46.31 2.42 0.10 4.06 0.09 3.67 21341 1404 19937 6.58 6988 139 168 243 210 141 342 161 347 152 550 687 807 1096 504 789 650 761 1143 1237 1904 2044 1601 3724 907 523 1454 109 44 1809 906 414 27.52 14.05 3.048249 90632.93 33685.54 MULTIPOLYGON (((-113921.6 2...
GOLDEN STATE WATER CO - ARDEN WATER SERV 6556 1706 2887 322 0 888 11 742 26.02 44.04 4.91 0.00 13.54 0.16 11.32 6453 1626 4828 25.19 2173 19 82 19 141 53 173 34 179 37 139 351 319 132 172 141 183 262 476 490 738 809 303 728 239 123 131 0 0 1315 599 335 38.56 21.09 2.897716 66579.36 30417.36 POLYGON ((-121143.9 63698.4...
GOLDEN STATE WATER CO. - CORDOVA 48115 9009 26042 3982 229 6050 188 2615 18.72 54.13 8.28 0.48 12.57 0.39 5.43 47835 4408 43427 9.21 18022 509 482 310 496 480 437 389 469 598 1276 1692 2653 2565 1671 1948 2047 1796 2374 2968 4170 5621 4236 7380 2174 836 3506 364 201 7137 2744 1410 29.31 13.58 2.650717 96697.06 42695.41 POLYGON ((-112985.4 62375.3...
HAPPY HARBOR (SWS) 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-139842 11256.85,...
HOLIDAY MOBILE VILLAGE 46 18 7 3 0 15 0 3 38.66 15.12 7.10 0.00 32.49 0.00 6.64 46 10 36 22.33 16 2 1 0 1 0 1 5 1 0 0 2 2 1 0 0 0 4 7 2 11 4 1 2 0 0 2 1 1 12 6 4 44.88 28.55 2.860000 38491.00 16707.00 POLYGON ((-123874.7 52485.3...
HOOD WATER MAINTENCE DIST [SWS] 1 1 0 0 0 0 0 0 69.19 28.71 0.00 0.00 0.00 0.00 2.10 1 0 1 17.42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45.66 25.57 2.550000 56250.00 23510.00 MULTIPOLYGON (((-132506 403...
IMPERIAL MANOR MOBILEHOME COMMUNITY 209 52 129 1 0 6 0 21 24.93 61.63 0.45 0.00 2.93 0.00 10.05 209 45 164 21.48 124 4 26 18 3 0 16 7 5 6 1 4 29 0 0 0 6 51 34 5 84 34 0 9 0 0 89 37 34 27 27 22 50.97 45.07 1.680363 31831.84 32878.17 POLYGON ((-115390.2 74250.3...
KORTHS PIRATES LAIR 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-137314.9 10213.1...
LAGUNA DEL SOL INC 24 5 18 0 0 0 0 0 21.55 75.20 0.00 0.67 1.46 0.00 1.12 24 2 22 6.40 9 0 1 1 0 0 0 0 0 0 0 0 2 0 0 1 2 2 1 0 3 2 1 5 2 2 3 0 0 2 0 0 23.37 23.37 2.640000 95227.00 50793.00 POLYGON ((-104662.2 49197.3...
LAGUNA VILLAGE RV PARK 20 3 2 1 0 11 2 2 12.79 8.48 7.28 0.00 52.62 8.38 10.45 20 2 18 11.79 7 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 2 2 1 3 1 0 1 0 0 3 1 0 32.52 12.26 3.030000 84332.00 32668.00 POLYGON ((-122461.8 48066.6...
LINCOLN CHAN-HOME RANCH 4 2 2 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 4 1 3 15.75 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-136788.6 36526.1...
LOCKE WATER WORKS CO [SWS] 1 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 1 0 1 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-131952.8 27176.6...
MAGNOLIA MUTUAL WATER 1 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 1 0 1 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-137022.9 36118.9...
MC CLELLAN MHP 269 52 108 65 0 43 0 2 19.27 40.19 24.01 0.00 15.91 0.00 0.62 269 101 168 37.48 82 8 2 3 7 11 2 2 1 3 1 15 20 3 0 3 0 20 19 17 39 36 3 9 4 2 25 1 1 48 34 27 46.85 35.36 3.280000 60521.00 18213.00 POLYGON ((-119814.9 72169.0...
OLYMPIA MOBILODGE 290 70 81 18 0 101 16 3 24.12 28.03 6.30 0.00 34.95 5.53 1.08 290 68 222 23.43 114 11 0 6 10 9 3 13 0 0 10 19 8 3 12 5 5 28 25 29 53 36 14 31 22 10 51 12 10 33 9 7 37.35 23.74 2.510000 53786.00 29451.00 POLYGON ((-123342.4 53061.6...
ORANGE VALE WATER COMPANY 17387 2658 12308 241 181 633 86 1281 15.28 70.79 1.39 1.04 3.64 0.49 7.37 17288 1904 15384 11.01 6595 389 111 61 94 226 58 274 120 181 372 752 990 901 626 678 766 655 858 1123 1512 2113 1526 3246 1021 453 1686 315 185 1663 693 305 30.77 14.29 2.608348 92693.71 42509.89 POLYGON ((-108131.3 74330.4...
PLANTATION MOBILE HOME PARK 10 4 1 1 0 3 0 1 38.66 15.12 7.10 0.00 32.49 0.00 6.64 10 2 7 22.33 3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 2 1 0 1 0 0 0 0 0 2 1 1 44.88 28.55 2.860000 38491.00 16707.00 POLYGON ((-124180.4 53321.5...
RANCHO MARINA 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-138041.4 11320.9...
RANCHO MURIETA COMMUNITY SERVI 3239 661 2157 120 7 188 0 106 20.42 66.59 3.71 0.21 5.80 0.00 3.26 3239 199 3040 6.13 1402 59 42 0 6 5 18 74 27 75 44 81 88 118 204 241 319 108 199 125 307 213 323 1029 205 103 270 63 57 103 41 40 22.02 14.30 2.307704 144993.81 66451.34 POLYGON ((-92457.85 52674.7...
RIO COSUMNES CORRECTIONAL CENTER [SWS] 22 6 8 4 1 1 0 2 25.74 37.49 16.82 2.97 4.50 1.81 10.66 4 0 4 0.00 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 23.75 0.00 3.450000 115897.00 11095.00 POLYGON ((-124032.5 32206.2...
RIO LINDA/ELVERTA COMMUNITY WATER DIST 11831 2585 7595 337 17 765 21 512 21.85 64.19 2.85 0.14 6.46 0.18 4.33 11829 1619 10210 13.69 3762 177 156 67 169 56 113 116 114 118 173 297 607 492 431 416 259 569 518 470 1087 1077 922 1918 573 157 773 114 47 1070 519 340 32.07 14.49 3.123012 83603.04 33734.49 POLYGON ((-126609.8 73568.2...
RIVER'S EDGE MARINA & RESORT 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-141102.2 11867.3...
SAC CITY MOBILE HOME COMMUNITY LP 229 82 17 7 0 123 0 0 35.66 7.50 3.27 0.00 53.57 0.00 0.00 229 110 119 48.14 89 11 16 9 10 8 0 0 4 2 7 1 13 4 4 0 0 46 14 8 60 21 8 4 2 2 15 2 0 71 41 30 48.95 35.43 2.530000 22380.00 16689.00 POLYGON ((-124544.3 56147.0...
SACRAMENTO SUBURBAN WATER DISTRICT 193126 43047 97872 17684 834 20602 624 12464 22.29 50.68 9.16 0.43 10.67 0.32 6.45 190984 33399 157585 17.49 72505 3817 3001 3069 2884 3205 3100 3337 2893 2342 5541 6792 10037 6480 4342 5488 6177 12771 14878 12333 27649 22370 10822 23467 7204 2837 12037 2087 1160 37001 21072 10274 41.88 19.68 2.635471 73746.51 35321.18 MULTIPOLYGON (((-122206.9 6...
SAN JUAN WATER DISTRICT 30122 3409 21349 831 287 2762 17 1467 11.32 70.87 2.76 0.95 9.17 0.06 4.87 30014 1718 28297 5.72 10750 389 168 100 275 128 160 111 133 127 472 684 984 854 876 1032 4256 932 658 1156 1591 2141 1730 6210 1754 724 2883 528 357 1658 726 339 27.98 13.21 2.783858 160696.10 72978.42 POLYGON ((-104526.8 73044.7...
SCWA - ARDEN PARK VISTA 8086 990 6016 270 12 396 8 395 12.24 74.40 3.33 0.15 4.90 0.10 4.88 8038 523 7515 6.51 3303 79 36 48 77 65 38 18 49 162 139 187 253 465 208 416 1065 241 330 326 571 579 673 1823 520 112 673 76 23 807 384 225 29.69 10.90 2.424845 139081.65 84548.46 POLYGON ((-120985.4 62883.8...
SCWA - LAGUNA/VINEYARD 145495 27502 38496 16568 246 50411 2220 10052 18.90 26.46 11.39 0.17 34.65 1.53 6.91 145198 14710 130489 10.13 45137 1692 666 742 878 839 1336 850 788 752 2363 3198 6037 5323 5057 6578 8038 3978 4565 5561 8543 11598 10380 24581 7232 2916 7878 861 471 12677 6368 3337 32.04 14.90 3.207447 114494.03 41415.71 MULTIPOLYGON (((-126550 404...
SCWA MATHER-SUNRISE 18249 2708 8114 1553 23 4507 164 1180 14.84 44.47 8.51 0.12 24.70 0.90 6.47 18211 1005 17206 5.52 5503 228 35 97 57 68 39 12 20 36 189 320 533 645 755 1003 1469 416 174 509 590 1042 1399 3756 881 266 855 60 43 893 318 167 22.89 8.66 3.296327 147818.01 47448.37 MULTIPOLYGON (((-112526.7 5...
SEQUOIA WATER ASSOC 0 0 0 0 0 0 0 0 44.60 45.84 0.00 0.00 5.93 0.00 3.63 0 0 0 15.75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24.49 14.65 2.490000 68248.00 38950.00 POLYGON ((-136929.5 36128.1...
SOUTHWEST TRACT W M D [SWS] 174 29 42 24 3 75 1 0 16.58 24.48 13.69 1.55 43.11 0.60 0.00 174 38 136 21.83 57 1 2 7 0 7 0 0 10 12 3 2 5 0 1 2 4 10 29 6 39 10 1 3 1 0 8 0 0 45 29 7 52.53 12.40 3.040000 45671.00 36348.00 MULTIPOLYGON (((-125843.6 5...
SPINDRIFT MARINA 0 0 0 0 0 0 0 0 3.90 89.23 3.23 0.00 0.00 0.00 3.63 0 0 0 35.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28.02 23.19 1.790000 38125.00 33103.00 POLYGON ((-139920.3 11468.3...
TOKAY PARK WATER CO 652 214 134 37 0 239 0 28 32.80 20.55 5.61 0.00 36.69 0.00 4.35 652 113 539 17.29 173 2 2 3 21 0 0 13 13 10 18 27 36 14 4 10 0 27 36 45 64 81 18 81 38 11 44 0 0 48 32 12 40.57 13.58 3.757973 62802.24 19400.05 POLYGON ((-122824.8 54197.9...
TUNNEL TRAILER PARK 0 0 0 0 0 0 0 0 49.74 34.94 0.00 0.00 4.65 0.00 10.67 0 0 0 0.00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20.30 0.00 2.950000 153092.00 42507.00 POLYGON ((-136160.9 24171.2...
VIEIRA'S RESORT, INC 4 2 2 0 0 0 0 0 41.43 52.47 0.00 0.00 4.55 0.00 1.56 4 1 3 22.60 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 40.53 21.84 2.030000 51977.00 40522.00 POLYGON ((-143780.4 18567.4...
WESTERNER MOBILE HOME PARK 32 6 6 9 0 10 0 1 17.59 17.62 28.31 0.55 31.36 0.00 4.57 31 7 24 23.76 10 1 0 0 0 1 0 0 1 0 2 1 1 2 0 1 0 1 2 3 3 4 2 4 2 1 1 0 0 5 3 2 56.87 29.49 3.160000 59296.00 23437.00 POLYGON ((-122657.2 48977.8...
\n\n`````\n:::\n:::\n\n\n
\n\n#### Clean & Format Intermediate (Clipped) Data\n\nFor visualization and exploration, it'll also be useful to apply some additional formatting to the clipped block-group data used in intermediate parts of the interpolation process above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# portion of households paying more than 30% / 50% of income on housing\ncensus_data_interpolate <- census_data_interpolate %>% \n mutate(households_all_housing_costs_over30pct_percent = \n 100 * (households_mortgage_housing_costs_over30pct_count + \n households_no_mortgage_housing_costs_over30pct_count + \n households_rent_housing_costs_over30pct_count) / households_count, \n .after = households_rent_housing_costs_over50pct_count) %>% \n mutate(households_all_housing_costs_over50pct_percent = \n 100 * (households_mortgage_housing_costs_over50pct_count + \n households_no_mortgage_housing_costs_over50pct_count + \n households_rent_housing_costs_over50pct_count) / households_count,\n .after = households_all_housing_costs_over30pct_percent)\n\n# drop water system data except name (water_system_name)\ncensus_data_interpolate <- census_data_interpolate %>% \n select(-census_unit_area, -clipped_area) %>% \n select(-c(water_system_number:water_system_population_reported)) %>% \n select(-c(average_household_size_weighted:per_capita_income_weighted)) %>% \n relocate(water_system_name, .after = NAME) %>% \n relocate(areal_weight_factor, .after = water_system_name)\n```\n:::\n\n\n#### Transform Results to Long Format {#sec-results-transform-long}\n\nFor further analysis and exploration / visualization of the results, it will help to convert the results from wide to long format, and edit the group names so that they can be used as titles.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# pivot from wide to long format\nwater_system_demographics_long <- water_system_demographics %>% \n # select(water_system_name, starts_with('percent_')) %>% # select only the fields with percentages, and the water system name/id\n # convert to long format\n # st_drop_geometry() %>% \n pivot_longer(cols = !c(water_system_name, geometry), \n names_to = 'variable', \n values_to = 'value') %>% \n relocate(geometry, .after = last_col())\n\n# clean variable names and add grouping fields (type, group_type)\nwater_system_demographics_long <- water_system_demographics_long %>% \n mutate(variable = variable %>% \n # str_remove_all(pattern = 'percent_') %>% \n str_replace_all(pattern = '_', replacement = ' ') %>% \n str_replace_all(pattern = ' or ', replacement = ' / ') %>% \n str_to_title(.) %>%\n str_remove_all(pattern = ' / Alaska Native')) %>% \n mutate(variable_type = case_when(\n str_detect(variable, pattern = 'Count') ~ 'Count',\n str_detect(variable, pattern = 'Percent') ~ 'Percent',\n str_detect(variable, pattern = 'Pop Weighted') ~ 'Pop Weighted',\n str_detect(variable, pattern = 'Hh Weighted') ~ 'Hh Weighted',\n .default = NA), \n .after = variable) %>% \n mutate(variable_group_type = case_when(\n str_detect(variable, pattern ='Population') ~ 'Population',\n str_detect(variable, pattern = 'Households') ~ 'Households',\n str_detect(variable, pattern = 'Average Household Size Hh Weighted') ~ 'Household Weighted', \n str_detect(variable, pattern = 'Median Household Income Hh Weighted') ~ 'Household Weighted',\n str_detect(variable, pattern = 'Per Capita Income Pop Weighted') ~ 'Population Weighted',\n str_detect(variable, pattern = 'Poverty') ~ 'Population'),\n .after = variable_type) %>% \n mutate(variable = case_when(\n str_detect(variable, pattern = 'Households Count') ~ 'Households Total',\n .default = str_remove_all(variable, pattern = 'Households'))) %>% \n mutate(variable = case_when(\n str_detect(variable, 'Population Total Count') ~ 'Population Total',\n .default = str_remove_all(variable, 'Population'))) %>%\n mutate(variable = str_remove_all(variable, \n pattern = 'Count')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = 'Percent')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = ' Hh Weighted')) %>% \n mutate(variable = str_remove_all(variable, \n pattern = ' Pop Weighted')) %>% \n mutate(variable = str_replace_all(variable, \n pattern = 'Over30pct', \n replacement = 'Over 30% Income')) %>% \n mutate(variable = str_replace_all(variable, \n pattern = 'Over50pct', \n replacement = 'Over 50% Income')) %>% \n mutate(variable = str_trim(variable)) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'k ',\n replacement = 'k-')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = '0 ',\n replacement = '0-')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Black-',\n replacement = 'Black ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Mortgage ',\n replacement = 'Mortgage - ')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Rent ',\n replacement = 'Rent - ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'All ',\n replacement = 'All Households - ')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Households Total',\n replacement = 'Total Households')) %>% \n mutate(variable = str_replace_all(variable,\n pattern = 'Population Total',\n replacement = 'Total Population')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Poverty ',\n replacement = 'Poverty - ')) %>%\n mutate(variable = str_replace_all(variable,\n pattern = 'Poverty - Rate',\n replacement = 'Poverty Rate'))\n```\n:::\n\n\nHere's a view of the structure of the reformatted data:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(water_system_demographics_long)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 3,472\nColumns: 6\n$ water_system_name \"B & W RESORT MARINA\", \"B & W RESORT MARINA\", \"B &…\n$ variable \"Total Population\", \"Hispanic / Latino\", \"White\", …\n$ variable_type \"Count\", \"Count\", \"Count\", \"Count\", \"Count\", \"Coun…\n$ variable_group_type \"Population\", \"Population\", \"Population\", \"Populat…\n$ value 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 41…\n$ geometry POLYGON ((-138282.2 13643.2..., POLYGON ((…\n```\n\n\n:::\n:::\n\n\nWe can also do the same for the clipped block-group data used in intermediate parts of the interpolation process above.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncensus_data_interpolate_long <- census_data_interpolate %>% \n pivot_longer(cols = !c(GEOID, NAME, water_system_name, \n areal_weight_factor, geometry), \n names_to = 'variable', \n values_to = 'value') %>% \n relocate(geometry, .after = last_col())\n```\n:::\n\n\n### Save Results {#sec-results-save}\n\nOnce we've finished the computations and verified the outputs look reasonable, we can save the results to output files so they can be re-used and shared. The results can be saved in tabular (e.g., csv, excel) and/or spatial (e.g., shapefile, geopackage) formats, which may be helpful for different use cases. Note that you may need to think about exactly what variables to include in the output file(s) and how to format the output datasets (e.g., wide versus long format).\n\nThe chunk of code below (which is hidden by default), just tests to see whether any of the datasets to be saved have been changed since the previous version was saved. In general this is probably not needed for a typical workflow and can be ignored for most use cases – it is just used here to make rendering of this document a little more efficient.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# compute hash for datasets to be saved (i.e., a unique identifier for each dataset), and compare against previous versions\n\n## define file that stores hash (unique identifier for dataset)\nhash_file <- here('03_data_results',\n '_dataset_hash.csv')\n\n## compute hashes (unique identifier for datasets)\nhash_current <- digest(object = water_system_demographics,\n algo = 'md5')\nhash_current_long <- digest(object = water_system_demographics_long,\n algo = 'md5')\nhash_interpolate <- digest(object = census_data_interpolate,\n algo = 'md5')\nhash_interpolate_long <- digest(object = census_data_interpolate_long, \n algo = 'md5')\nhash_table_current <- tibble(\n dataset = c('water_system_demographics', \n 'water_system_demographics_long',\n 'census_data_interpolate',\n 'census_data_interpolate_long'),\n hash = c(hash_current, \n hash_current_long,\n hash_interpolate,\n hash_interpolate_long))\n\n## get the previous hashes from file (if it exists), else create a new file to store the hashes\nif (file.exists(hash_file)) {\n hash_table_previous <- read_csv(file = hash_file)\n} else {\n file.create(hash_file)\n hash_table_previous <- tibble(\n dataset = c('water_system_demographics', \n 'water_system_demographics_long',\n 'census_data_interpolate',\n 'census_data_interpolate_long'),\n hash = c('missing', \n 'missing',\n 'missing', \n 'missing'))\n}\n\n## if new hash is different from previous hash, set flag to update the output file (i.e., write a new version of the file)\nfile_update <- !identical(hash_table_current %>% \n filter(dataset == 'water_system_demographics') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'water_system_demographics') %>% \n pull(hash))\nfile_update_long <- !identical(hash_table_current %>% \n filter(dataset == 'water_system_demographics_long') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'water_system_demographics_long') %>% \n pull(hash))\nfile_update_interpolate <- !identical(hash_table_current %>% \n filter(dataset == 'census_data_interpolate') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'census_data_interpolate') %>% \n pull(hash))\nfile_update_interpolate_long <- !identical(hash_table_current %>% \n filter(dataset == 'census_data_interpolate_long') %>% \n pull(hash),\n hash_table_previous %>% \n filter(dataset == 'census_data_interpolate_long') %>% \n pull(hash))\n\n## write current hashes to file (for comparison with future versions)\nwrite_csv(x = hash_table_current,\n file = hash_file,\n append = FALSE)\n```\n:::\n\n\n#### Tabular Dataset {#sec-results-save-tabular}\n\nThe code below saves the tabular results to a csv file, in both the 'wide' and 'long' formats:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# wide\nif (file_update == TRUE) {\n write_csv(water_system_demographics %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'water_system_demographics_sac.csv'))\n}\n\n# long\nif (file_update_long == TRUE) {\n write_csv(water_system_demographics_long %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'water_system_demographics_sac_long.csv'))\n}\n```\n:::\n\n\nAnd we can save the intermediate data from the interpolation process (i.e., data for clipped block groups):\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# wide\nif (file_update_interpolate == TRUE) {\n write_csv(census_data_interpolate %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'intermediate_interpolation_data.csv'))\n}\n\n# long\nif (file_update_interpolate_long == TRUE) {\n write_csv(census_data_interpolate_long %>%\n st_drop_geometry(), # drop the spatial data since this is a tabular format\n file = here('03_data_results',\n 'intermediate_interpolation_data_long.csv'))\n}\n```\n:::\n\n\n#### Spatial Dataset {#sec-results-save-spatial}\n\nTo save the output in a geospatial format, it may be best to save the data in a wide format, so that all of the attribute data for each *target* area (water system) is in a single row along with its spatial data (i.e. the system boundary information) (saving in long format may create a very large file). The code below saves the results – in wide format – to a geopackage file, which is a spatial file format that is similar to a shapefile.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (file_update == TRUE) {\n st_write(water_system_demographics,\n here('03_data_results',\n 'water_system_demographics_sac.gpkg'),\n append = FALSE)\n}\n\nif (file_update_interpolate == TRUE) {\n st_write(census_data_interpolate,\n here('03_data_results',\n 'intermediate_interpolation_data.gpkg'),\n append = FALSE)\n}\n```\n:::\n\n\n## Explore and Visualize Results {#sec-results-explore}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nFor simplicity, this section will focus on presenting estimated demographics for some of the largest water suppliers in the Sacramento county region (results for small water systems may not be very accurate and should be used with some caution - see @sec-check-pop-estimated-reported and @sec-small-area-estimates for more investigation of the results for small systems).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select systems to plot\n\n## number of systems\nn_systems <- 20\n\n## get list of selected systems\nsystems_top_n <- water_system_demographics %>% \n slice_max(population_total_count, n = n_systems) %>% \n pull(water_system_name)\n```\n:::\n\n\n### Race / Ethnicity\n\n\\[placeholder\\]\n\n- percent by group (bar)\n\n- dot-density\n\n- mapview (non-white)\n\n### Income Distributions\n\n\\[placeholder\\]\n\n- income brackets (50k) (bar)\n\n- median household income by block group (dots)\n\n- dot-density (below threshold value)\n\n- mapview\n\n### Poverty Rates\n\n\\[placeholder\\]\n\n- dot plot\n\n- mapview\n\n- side-by-side map\n\n### Income & Relative Housing Costs\n\nThe [biscale](https://chris-prener.github.io/biscale/) R package [@biscale] can be used to create maps that show how two metrics vary together spatially (bivariate choropleth maps).\n\n@fig-bivariate-all shows the relationship between estimated income and relative housing costs for the top 20 systems by estimated population in Sacramento County.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# Table B25140 - Housing Costs as a Percentage of Household Income in the past 12 months.\n# Shows the count of households paying more than 30% or 50% of their income towards housing costs broken out by three tenure categories (owned with a mortgage, owned without a mortgage, and rented).\n\n# set defaults\nbiscale_pal <- 'BlueOr' # 'GrPink' # 'DkViolet2'\nbiscale_dim <- 3\n\n# create classes\nbiscale_data <- bi_class(water_system_demographics %>% \n filter(water_system_name %in% systems_top_n) %>% \n filter(!is.na(median_household_income_hh_weighted)), \n x = households_all_housing_costs_over30pct_percent, \n y = median_household_income_hh_weighted, \n style = \"quantile\", \n dim = biscale_dim)\n\n# create map\nbiscale_map <- ggplot() +\n geom_sf(data = biscale_data, \n mapping = aes(fill = bi_class), \n color = \"white\", \n size = 0.1, \n show.legend = FALSE) +\n bi_scale_fill(pal = biscale_pal, \n dim = biscale_dim) + \n labs(\n title = \"Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \\nand Estimated Median Household Income in Sacramento Water Systems\",\n subtitle = glue(\"Top {n_systems} Water Systems by Population\"),\n caption = glue(\"Data estimated from {acs_year} 5-year ACS Block Groups\")\n # title = \"Estimated Housing Cost as % of Household Income and \\nEstimated Median Household Income in Sacramento Water Systems\", \n # caption = \"% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \\nIncome shows median household income (yellow = missing)\"\n ) +\n # labs(\n # title = \"Housing Cost1 and Income2 in Sacramento Water Systems\",\n # caption = \"1% of households paying more than 30% of their income towards housing costs
2Median household income (yellow = missing)\",\n # subtitle = glue(\"Top {n_systems} systems by population\")\n # ) +\n # add missing polygons back in\n geom_sf(data = water_system_demographics %>% \n filter(water_system_name %in% systems_top_n) %>% \n filter(is.na(median_household_income_hh_weighted)),\n color = \"white\",\n fill = 'gold'\n ) +\n geom_sf(data = counties_ca %>% filter(NAME == 'Sacramento'), \n color = 'grey',\n fill = NA) +\n bi_theme() + \n theme(plot.title = element_text(size=12), # element_markdown(size=12)\n plot.subtitle = element_text(size=10),\n plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))\n\n# create legend\nbiscale_legend <- bi_legend(pal = biscale_pal,\n dim = biscale_dim,\n xlab = \"% Housing Costs \",\n ylab = \"Income \",\n size = 8)\n\n# construct map\nbiscale_plot <- ggdraw() +\n draw_plot(biscale_map, 0, 0, 1, 1) +\n draw_plot(biscale_legend, 0.1, .65, 0.2, 0.2)\n\nbiscale_plot\n```\n\n::: {.cell-output-display}\n![](example_census_race_ethnicity_calculation_files/figure-html/fig-bivariate-all-1.png){#fig-bivariate-all width=768}\n:::\n:::\n\n\n@fig-bivariate-system shows the same variables (relative housing costs and income) for the portions block groups overlapping Sacramento Suburban Water District – this illustrates the data underlying the interpolation process.\n\n\n::: {.cell}\n\n```{.r .cell-code code-fold=\"true\"}\n# set defaults\nbiscale_pal_system <- 'BlueOr' # 'GrPink' # 'DkViolet2'\nbiscale_dim_system <- 3\n\n# create classes\nbiscale_data_system <- bi_class(census_data_interpolate %>% \n filter(water_system_name == system_plot) %>% \n filter(!is.na(median_household_income)), \n x = households_all_housing_costs_over30pct_percent, \n y = median_household_income, \n style = \"quantile\", \n dim = biscale_dim_system)\n# create map\nbiscale_map_system <- ggplot() +\n geom_sf(data = biscale_data_system , \n mapping = aes(fill = bi_class), \n color = \"white\", \n size = 0.1, \n show.legend = FALSE) +\n bi_scale_fill(pal = biscale_pal_system, \n dim = biscale_dim_system) + \n labs(\n title = glue(\"Estimated % of Households Paying More Than 30% of Income Towards Housing Costs \\nand Estimated Median Household Income in {str_to_title(system_plot)}\"),\n # subtitle = glue(\"\"),\n caption = glue(\"Data from {acs_year} 5-year ACS Block Groups (Yellow = Missing Data)\")#,\n # title = glue(\"Housing Cost and Income \\nin {str_to_title(system_plot)}\"), \n # caption = \"% Housing cost shows the percent of households paying more than 30% of their income towards housing costs \\nIncome shows median household income (yellow = missing)\"#,\n ) +\n # add the missing polygons back in\n geom_sf(data = census_data_interpolate %>% \n filter(water_system_name == system_plot) %>% \n filter(is.na(median_household_income)),\n color = \"white\",\n fill = 'gold'\n ) +\n bi_theme() + \n theme(plot.title = element_text(size=12), # element_markdown(size=12)\n plot.subtitle = element_text(size=10),\n plot.caption = element_text(size=8, hjust = 1)) # element_markdown(size=8, hjust = 1))\n\n# create legend\nbiscale_legend <- bi_legend(pal = biscale_pal_system,\n dim = biscale_dim_system,\n xlab = \"% Housing Costs \",\n ylab = \"Income \",\n size = 8)\n\n# construct map\nbiscale_plot_system <- ggdraw() +\n draw_plot(biscale_map_system, 0, 0, 1, 1) +\n draw_plot(biscale_legend, 0.1, .55, 0.2, 0.2)\n\nbiscale_plot_system\n```\n\n::: {.cell-output-display}\n![](example_census_race_ethnicity_calculation_files/figure-html/fig-bivariate-system-1.png){#fig-bivariate-system width=768}\n:::\n:::\n\n\n## Check - Estimated vs Reported Population Estimates {#sec-check-pop-estimated-reported}\n\n\\[TO DO: Create map\\]\n\nBased on the map above, it's apparent that it will be difficult to obtain reasonable estimates for some suppliers, such as the suppliers with very small service areas in the southern portion of the county where the block groups are very large (and the supplier's service are is only a small fraction of the total area of the block group). These issues are explored further in @sec-small-area-estimates.\n\nNote that there are a number of reasons why the estimated population values are likely to differ from the population numbers in the water system dataset (e.g., the depicted boundaries may not be correct or exact, the supplier may have used different methods to count/estimate the population they serve, the time frames for the estimates may be different, etc.). But, there may also be some cases where the numbers differ significantly – depending on the actual analysis being performed, this may mean that further work is needed for certain areas, or could mean that this method may not be sufficient and different methods are needed.\n\nAs a check, we can add a column to the interpolated dataset (which we'll call `population_percent_difference`) to compute the difference between the estimated total population (in the `population_total` field) and the total population listed in the `water_system_population_reported` field (the reported value from the water system dataset, which is named `POPULATION` in the water supplier dataset).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwater_system_demographics_check <- water_system_demographics %>% \n left_join(water_systems_sac %>% \n st_drop_geometry() %>% \n select(water_system_name, water_system_population_reported),\n by = 'water_system_name')\n\nwater_system_demographics_check <- water_system_demographics_check %>%\n mutate(population_percent_difference =\n round(100 * (population_total_count - water_system_population_reported) / water_system_population_reported, \n 2), \n .after = water_system_population_reported)\n```\n:::\n\n\nFor water systems with a small population and/or service area, the estimated demographics may not match the reported population numbers in the water system dataset very well. You can see this in @tbl-pop-est-small by comparing the `population_reported` field, which contains the total population values from the water supplier dataset, with the `population_estimated` field, which contains the total population estimated from the census data; the difference between the two is summarized in the `population_percent_difference` field. This probably indicates that, for small areas, some adjustments and/or further analysis may be needed, and the preliminary estimated values should be treated with some caution/skepticism.\n\nNote: See @sec-small-area-estimates below for some more investigation into water systems whose estimated population is at or near zero.\n\n\n::: {#tbl-pop-est-small .cell tbl-cap='10 Smallest Water Systems by Population'}\n\n```{.r .cell-code}\nwater_system_demographics_check %>%\n arrange(water_system_population_reported) %>%\n slice(1:10) %>%\n select(water_system_name, \n population_reported = water_system_population_reported, \n population_estimated = population_total_count, \n population_percent_difference) %>%\n st_drop_geometry() %>%\n kable()\n```\n\n::: {.cell-output-display}\n\n\n|water_system_name | population_reported| population_estimated| population_percent_difference|\n|:---------------------------|-------------------:|--------------------:|-----------------------------:|\n|DELTA CROSSING MHP | 30| 0| -100.00|\n|LAGUNA VILLAGE RV PARK | 32| 20| -37.50|\n|LINCOLN CHAN-HOME RANCH | 33| 4| -87.88|\n|EDGEWATER MOBILE HOME PARK | 40| 0| -100.00|\n|MAGNOLIA MUTUAL WATER | 40| 1| -97.50|\n|FREEPORT MARINA | 42| 3| -92.86|\n|PLANTATION MOBILE HOME PARK | 44| 10| -77.27|\n|TUNNEL TRAILER PARK | 44| 0| -100.00|\n|SEQUOIA WATER ASSOC | 54| 0| -100.00|\n|HAPPY HARBOR (SWS) | 60| 0| -100.00|\n\n\n:::\n:::\n\n\nBut for larger water systems, the estimated population values seem to be more in line with the population numbers in the original dataset. You can see this in @tbl-pop-est-large by, as above, comparing the `population_reported` field, which contains the total population values from the water supplier dataset, with the `population_estimated` field, which contains the total population estimated from the census data; the difference between the two is summarized in the `population_percent_difference` field.\n\n\n::: {#tbl-pop-est-large .cell tbl-cap='10 Largest Water Systems by Population'}\n\n```{.r .cell-code}\nwater_system_demographics_check %>%\n arrange(desc(water_system_population_reported)) %>%\n slice(1:10) %>%\n select(water_system_name, \n population_reported = water_system_population_reported, \n population_estimated = population_total_count, \n population_percent_difference) %>%\n st_drop_geometry() %>%\n kable()\n```\n\n::: {.cell-output-display}\n\n\n|water_system_name | population_reported| population_estimated| population_percent_difference|\n|:----------------------------------|-------------------:|--------------------:|-----------------------------:|\n|CITY OF SACRAMENTO MAIN | 510931| 516189| 1.03|\n|SACRAMENTO SUBURBAN WATER DISTRICT | 184385| 193126| 4.74|\n|SCWA - LAGUNA/VINEYARD | 172666| 145495| -15.74|\n|FOLSOM, CITY OF - MAIN | 68122| 62462| -8.31|\n|CITRUS HEIGHTS WATER DISTRICT | 65911| 68912| 4.55|\n|CALAM - SUBURBAN ROSEMONT | 53563| 57897| 8.09|\n|CALAM - PARKWAY | 48738| 58635| 20.31|\n|CALAM - LINCOLN OAKS | 47487| 42916| -9.63|\n|GOLDEN STATE WATER CO. - CORDOVA | 44928| 48115| 7.09|\n|ELK GROVE WATER SERVICE | 42540| 42647| 0.25|\n\n\n:::\n:::\n\n\n## Considerations for Detailed Population Estimates {#sec-detailed-pop-estimates}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nIf you're primarily only interested in population estimates (possibly including population by race/ethnicity, age, gender, etc.) and need an estimate that's as geographically accurate as possible, it may make more sense to use the block-level population data from the decennial census rather than block group level population data from the ACS. However, since the decennial census only occurs once every 10 years, those estimates won't reflect recent population changes (and will get especially less accurate as we get farther from the last decennial census). But keep in mind that even the 5-year ACS is an average that encompasses previous years' estimates, so it's not necessarily temporally precise either.\n\nIt's also possible to use the block-level decennial population data as a weighing factor for ACS population data (to allocate the population within block-group level ACS data).\n\n\\[TO DO: add example\\]\n\n## Considerations for Small / Rural Area Estimates {#sec-small-area-estimates}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nFor some water systems, the estimated population using the areal interpolation above (@sec-areal-interp) was at or near zero, and it may be useful to look at an example to see what's going on with one of those cases.\n\n(because the water system may encompass only a small portion of one or a few census units, and the entire census unit(s) may not be representative of the small portion(s)), especially those in rural environments (where population densities are lower, population centers tend to be spread out, and census units tend to be larger).\n\n\\[TO DO: insert map\\]\n\nFrom the map above \\[TO DO: insert map\\], you can see that the service area reported for some systems are very small, only covering a small fraction of a single census unit, resulting in a population estimate that is very low. In these cases, it could be that the system area was drawn incorrectly (i.e., maybe it doesn't really depict the entire service area), in which case the reported service area should be revised. Or, it's possible that the population within the given census unit is very un-evenly distributed and instead there's a relatively high density population cluster in the depicted service area, in which case a more sophisticated method than an area-weighted average should be used (e.g., maybe consider the density of buildings, roads, and/or other features associated with inhabited areas).\n\n## Alternative Computation Methods {#sec-alternative-methods}\n\n::: callout-warning\nThis section is in progress.\n:::\n\n### Population Weighted Interpolation {#sec-alternative-interpolate_pw}\n\nThe `tidycensus` package also has a function for population weighted interpolation, [`interpolate_pw`](https://walker-data.com/tidycensus/reference/interpolate_pw.html), but it uses a somewhat different methodology than the population weighted interpolation procedure applied above in @sec-pop-interp.\n\nNote that some water systems may not get an estimated value using this method, even if `NA` values are removed from the source data first (TO DO: check whether this depends on which type of boundary dataset is used - i.e., tigris with cb = FALSE or TRUE).\n\nUsing original `census_data_acs` variable gives multiple `NA`s - it looks like those are small areas:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresults_interpolate_pw <- interpolate_pw(from = census_data_acs %>%\n filter(!is.na(population_total_count)) %>% # population_total_count median_household_income\n select(population_total_count),\n to = water_systems_sac,\n to_id = 'water_system_name',\n extensive = TRUE, # use FALSE for median_household_income\n weights = census_data_decennial,\n # weight_placement = 'surface',\n weight_column = 'population_total_count') %>%\n # rename(median_household_income_interpolate_pw = median_household_income) # rename results field\n rename(population_total_count_interpolate_pw = population_total_count) %>% \n mutate(population_total_count_interpolate_pw = round(population_total_count_interpolate_pw, 0))\n\n# sum(is.na(results_interpolate_pw$median_household_income_interpolate_pw))\nsum(is.na(results_interpolate_pw$population_total_count_interpolate_pw))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 17\n```\n\n\n:::\n:::\n\n\nUsing detailed block group geometry - looks like the same results?:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# # get detailed block group geometry\n# block_groups_detailed <- block_groups(state = 'CA',\n# county = 'Sacramento',\n# cb = FALSE,\n# year = acs_year) %>%\n# st_transform(crs_projected)\n# block_groups_detailed <- block_groups_detailed %>%\n# st_filter(water_systems_sac) %>%\n# select(GEOID)\n# \n# block_groups_detailed <- block_groups_detailed %>%\n# left_join(census_data_acs %>%\n# st_drop_geometry() %>%\n# select(GEOID, population_total_count, median_household_income),\n# by = 'GEOID')\n# \n# # interpolate\n# results_interpolate_pw_detailed <- interpolate_pw(from = block_groups_detailed %>%\n# filter(!is.na(population_total_count)) %>% # population_total_count median_household_income\n# select(population_total_count),\n# to = water_systems_sac,\n# to_id = 'water_system_name',\n# extensive = TRUE, # use FALSE for median_household_income\n# weights = census_data_decennial,\n# weight_placement = 'surface',\n# weight_column = 'population_total_count') %>%\n# # rename(median_household_income_interpolate_pw = median_household_income) # rename results field\n# rename(population_total_count_interpolate_pw = population_total_count)\n# \n# # sum(is.na(results_interpolate_pw_detailed$median_household_income_interpolate_pw))\n# sum(is.na(results_interpolate_pw_detailed$population_total_count_interpolate_pw))\n```\n:::\n\n\nCompare results using `interpolate_pw` to reported population counts:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresults_interpolate_pw <- results_interpolate_pw %>%\n left_join(water_systems_sac %>%\n st_drop_geometry() %>%\n select(water_system_service_connections, water_system_population_reported, water_system_name),\n by = 'water_system_name') %>% \n relocate(water_system_population_reported, .after = population_total_count_interpolate_pw)\n```\n:::\n\n\nView results:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nglimpse(results_interpolate_pw)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 62\nColumns: 5\n$ water_system_name \"HOOD WATER MAINTENCE DIST [SWS]…\n$ population_total_count_interpolate_pw 74, 412, 96, NA, 1031, NA, NA, 1…\n$ water_system_population_reported 100, 700, 40, 150, 256, 150, 32,…\n$ water_system_service_connections 82, 199, 34, 64, 128, 83, 28, 50…\n$ geometry MULTIPOLYGON (((-13…\n```\n\n\n:::\n:::\n\n\n## Working with Other Source Datasets {#sec-other-sources}\n\n::: callout-warning\nThis section is in progress.\n:::\n\nIn addition to using census data, it's possible to use other types of *source* datasets to compute characteristics of custom *target* areas like water systems. The process is generally likely to be similar to the processes shown above using census data, but each source dataset may require unique considerations (e.g., to handle missing values, uncertain boundaries, etc.).\n\n### CalEnviroScreen {#sec-calenviroscreen}\n\n\\[TO DO: example computation of weighted average CES scores\\]\n\nNotes to consider:\n\n- Some census tracts are missing CES scores (overall and/or for certain indicators), and have to deal with those missing values somehow\n\n- CES 4.0 is tract-level data, and uses 2010 census boundaries (so boundaries won't match current ACS or decennial boundaries)\n\n- CES 4.0 boundaries are simplified, and boundaries between tracts are not consistent – for some types of analysis (especially when looking at point data - e.g., facilities), it may be better to use the original TIGER dataset (available from either the `tidycensus` or `tigris` R packages)\n\n
\n", "supporting": [ "example_census_race_ethnicity_calculation_files" ], diff --git a/01_document/_quarto.yml b/01_document/_quarto.yml index 85b60d8..6f4dd54 100644 --- a/01_document/_quarto.yml +++ b/01_document/_quarto.yml @@ -1,10 +1,26 @@ project: type: website - # output-dir: docs +website: + site-url: "https://daltare.github.io/example-census-race-ethnicity-calculation/example_census_race_ethnicity_calculation.html" + repo-url: "https://github.com/daltare/example-census-race-ethnicity-calculation" + issue-url: "https://github.com/daltare/example-census-race-ethnicity-calculation/issues" + repo-subdir: "01_document" + repo-link-target: "_blank" + repo-actions: [issue, source] + navbar: + search: true + left: + - icon: github + href: https://github.com/daltare/example-census-race-ethnicity-calculation + aria-label: GitHub + # sidebar: + # search: true + # tools: + # - icon: github + # href: https://github.com/daltare/example-census-race-ethnicity-calculation format: html: theme: - # - default light: flatly dark: darkly execute: