
Issue #125 A Guide to Updating Neighbourhood Records in Who's on First (part 1)

Stephen Epps edited this page Aug 9, 2016 · 1 revision

#Updating Who's on First Neighbourhood Records (part 1)

The Who's on First (WOF) project is not pretending to be the authority of truth, but rather a home for data from various sources and a project that we hope generates discussion. The WOF gazetteer houses data for many geographies, including counties (which parent cities) and cities (which parent microhoods, neighbourhoods, and macrohoods). Neighbourhoods are the places where we live and work in cities; the more neighbourhood data, the better.

San Francisco, CA Photo Credit: Travis Wise, Flickr

Our Audience: Those who care about neighbourhoods and have a few hours to a few days to spend diving into QGIS.

Your Skills/Experience: This tutorial was written for intermediate to advanced GIS users. It tries to explain the editing process in detail, but some information may not be immediately clear to a beginner.

Why are neighbourhoods important to mapping?

  • Labelling. When we use a mapping application, we want neighbourhoods to be labeled on the map (in the right place and at the right zoom).
  • Ability to search. Neighbourhoods should be searchable (with the ability to know how large the feature is and fit it into view).
  • Browsing venues. We should be able to browse venues by neighbourhood.

It is important to understand where [placetypes](https://github.com/whosonfirst/whosonfirst-placetypes) fall within the WOF hierarchy. Neighbourhoods, macrohoods, and microhoods are described below:

  • Neighbourhoods: A centralized community within a larger city, town, or borough.
  • Macrohoods: A grouping of neighbourhoods within a larger city, town, or borough.
  • Microhoods: A geographically localized sub-section of a neighbourhood.

This tutorial addresses the issues we found with San Francisco's neighbourhoods and describes the workflow we took in fixing them. In the end, we hope you can follow along and edit neighbourhoods for your own city!

What are your options when updating neighbourhoods?

  • Notice just one neighbourhood that needs cleaning up? File a GitHub issue, or email us. [See something, say something.]
  • Notice an open dataset of administrative data? Open an issue in our WOF repository.
  • Still interested but confused about all this GitHub and WOF-isms? We can send you a "starter kit" that includes your requested neighbourhoods. Email us.
  • Follow the instructions below and send us a pull request.

Key terms:

Check out these key terms if you have any questions about the vocabulary used in this guide.

##The Issue

We received a report in the whosonfirst-data repository that the shape of San Francisco’s Golden Gate Park neighbourhood was both too small and extended into adjacent neighbourhoods. While researching online sources for a better shape, we noticed that most adjacent neighbourhood shapes in WOF could also be improved to align better with the road network and local expectations. After researching neighbourhood shapes online, we downloaded neighbourhood shapes for San Francisco from SF OpenData, a city data website, to compare with the neighbourhood records in our WOF repository. We then filed a new issue to handle all neighbourhood updates for San Francisco.

Before we begin, it is important to understand where WOF neighbourhood records came from and how the geometries were generated.

Typically, Who’s On First sources Quattroshapes geometries for most neighbourhoods globally. However, many neighbourhoods in the United States, including San Francisco, source their default geometry from Zetashapes. The Zetashapes project follows the same basic principles as Quattroshapes, but builds shapes up from Census 2010 features and can draw shapes that are too big, too small, or just plain weird. We’ve seen problems with shapes extending into the water and far out into neighboring rural areas. This technique is responsible for the issues that we are correcting in San Francisco.

Drawing neighbourhood shapes is a tricky business. Strangers generally agree on what a neighbourhood is named and its rough shape, but even good friends can argue vehemently about where one neighbourhood ends and another begins, and about whether there are hard edges between neighbourhoods or whether they should overlap. Recognizing this, Who’s On First allows multiple alternate geometries for a place, but for practical reasons we need to set just one shape as the default geometry.

To clean up our neighbourhood geometries, we needed to take five steps:

  • #1 Review WOF records for your locality. Get a sense of where they could be improved and where they are acceptable, focusing on names and shapes.
  • #2 Research other neighbourhood sources on the internet.
  • #3 Download authoritative neighbourhood geometries from a reliable, open license data source for your locality.
  • #4 Reconcile WOF records with authoritative neighbourhood geometries. Things to think about: Is the number of neighbourhood records similar? Are the shapes similar or better? Could you draw your own shapes?
  • #5 Update records, either through a clean import or by creating a new hybrid using the best of both (and your local knowledge).

##1: Review Who's on First records for your locality

In this section, you will:

  • Use the git clone command in your terminal to clone the necessary repositories
  • Collect a .geojson file for all neighbourhoods in your locality (city)
  • Add your .geojson file to a QGIS document for review

If you'd like to bypass this step, Mapzen is happy to send you a "starter kit" with a .geojson file that includes WOF neighbourhoods records for your locality (please email us).

However, if you do want to build from source...

Note: Windows users should ensure they have PowerShell 3.0 before beginning any GitHub work from the terminal. A PowerShell 3.0 download can be found here. Additionally, all users should ensure they have setuptools for Python by downloading Python 2.7 (or a more current version) and GDAL 2.1 by downloading QGIS 2.14 (or a more current version).

  • Run git clone on the WOF Data repository, WOF Properties repository, and WOF Utils repository. Note the PATH of those repos, and update it in the script snippet below.
  • Run the install script in whosonfirst-utils repository.
  • Open the wof-csv-to-feature-collection.py script in your Utils repo. Update line 63 with your local filepath.
  • Then, do the following...

Entering the following string in the terminal from the whosonfirst-utils repository's scripts folder allows us to collect San Francisco's neighbourhoods as a single .geojson file:

python wof-csv-to-feature-collection -p /usr/local/mapzen/whosonfirst-data/data -c /usr/local/mapzen/whosonfirst-data/meta/wof-neighbourhood-latest.csv --aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json -o ~/Desktop/SFNeighbourhoods.geojson --slim --slim-template external_editor -f 85922583


Image: Data collection script in the terminal.

Note: The trailing number 85922583 at the end of this command is the WOF ID for San Francisco. When running the command, make sure to replace that ID with the ID of the locality whose neighbourhood geometries you need. An ID is a unique identifier for a record in WOF, and each record has one. To find the ID for your city, search the Spelunker and copy the ID.

Voilà! We have a .geojson of WOF neighbourhood records in San Francisco!

To better understand what we're requesting of our command, here is a breakdown of exactly what is included:

  • python Used to invoke python
  • wof-csv-to-feature-collection The python script that collects WOF records.
  • -p /PATH/whosonfirst-data/data sets the path to the local copy of all the WOF data. You will need to update this PATH depending on where you checked the file out.
  • -c /PATH/whosonfirst-data/meta/wof-neighbourhood-latest.csv The metafile for the placetype you are interested in - neighbourhood. You will need to update this PATH depending on where you checked the file out.
  • --aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json Pulling in various aliases for attribute fields for your output file.
  • -o ~/Desktop/SFNeighbourhoods.geojson Your output file.
  • --slim Option parser to limit property export to subset (roughly those in the CSV file) and reduce file size.
  • --slim-template Option parser to trim key names to fit Esri Shapefile format (10 character length limit).
  • external_editor Return only necessary attribute fields for neighbourhood edits.
  • -f 85922583 ID of the locality you need neighbourhood records for, found by searching our Spelunker.
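If you expect to run the command for more than one locality, it may help to assemble it from its parts. The sketch below is only an illustration: the helper name `build_collect_command` and its parameters are our own invention, and it uses only the flags listed in the breakdown above. It assumes both repositories sit wherever you cloned them.

```python
import os


def build_collect_command(data_repo, properties_repo, output_path, locality_id):
    """Return the collection command from this guide as a list of arguments.

    data_repo and properties_repo are the local paths of the
    whosonfirst-data and whosonfirst-properties repositories.
    """
    return [
        "python", "wof-csv-to-feature-collection",
        "-p", os.path.join(data_repo, "data"),
        "-c", os.path.join(data_repo, "meta", "wof-neighbourhood-latest.csv"),
        "--aliases", os.path.join(properties_repo, "aliases", "property_aliases.json"),
        "-o", output_path,
        "--slim",
        "--slim-template", "external_editor",
        "-f", str(locality_id),
    ]


if __name__ == "__main__":
    command = build_collect_command(
        "/usr/local/mapzen/whosonfirst-data",
        "/usr/local/mapzen/whosonfirst-properties",
        "~/Desktop/SFNeighbourhoods.geojson",
        85922583,  # WOF ID for San Francisco, as used in this guide
    )
    # Print the command so it can be pasted into a terminal that is
    # already in the whosonfirst-utils scripts folder.
    print(" ".join(command))
```

Swapping the last argument for another locality's WOF ID (and changing the output path) is all that should be needed to reuse it. Paths containing spaces would need quoting before pasting.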

From the WOF repository for San Francisco, a total of 156 records for neighbourhoods were collected. QGIS was used to preview Who’s On First neighbourhood shapes (below).
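Before opening the file in QGIS, a quick sanity check on the export can be done with Python's standard library. This is a minimal sketch, assuming the output is a standard GeoJSON FeatureCollection; the function names and the two-feature sample are made up for illustration and are not part of the WOF tooling.

```python
import json
from collections import Counter


def load_feature_collection(path):
    """Read a GeoJSON FeatureCollection from disk."""
    with open(path) as handle:
        return json.load(handle)


def summarize(collection):
    """Return the feature count and a tally of geometry types."""
    features = collection.get("features", [])
    types = Counter(
        (feature.get("geometry") or {}).get("type", "none")
        for feature in features
    )
    return len(features), dict(types)


# Made-up sample standing in for the real export.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example polygon neighbourhood"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "Example point-only neighbourhood"},
            "geometry": {"type": "Point", "coordinates": [0.5, 0.5]},
        },
    ],
}

count, geometry_types = summarize(sample)
print(count, geometry_types)
```

On the real file you would call `summarize(load_feature_collection(path))` with the path to your export. The geometry tally is a quick way to spot records that only have a point.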


Image: San Francisco neighbourhood records in comparison to the San Francisco Bay Area.

You might notice the general shape of San Francisco in these images, but it's tough to make out because many neighbourhood geometries extend into San Francisco Bay. Additionally, many of these WOF neighbourhood shapes cross into what most people would consider a different neighbourhood, and, in two cases, include areas in different counties. The good news? The majority of these neighbourhood records contain usable information in their WOF attribute fields.


Image: San Francisco neighbourhood records from Who's on First projected in QGIS.

In some cities, we have detailed polygon shapes for most, but not all neighbourhoods. For a handful, we only know the name and the approximate point representing the label centroid. We need to establish a concordance between all WOF records, points and polygons - please add polygons! (Step 4)

##2: Research other neighbourhood sources

In this section, you will:

  • Find new data to import
  • Verify open license

Because Who’s On First is liberally licensed open data, we must be selective about adding new data. We either need to find a new source that is open data with a CC-BY or CC-0 license that allows commercial and derivative works or create new shapes based on local knowledge and by cross-referencing multiple sources. Ideally, this new source should be an improvement over what Who’s On First already knows about the place!

The City and County of San Francisco hosts various neighbourhood-related shapefiles through its OpenData portal, so we had a few options to choose from.

Just because your locality hosts a neighbourhood dataset does not mean its neighbourhood geometries are usable. For example, city planning departments often group neighbourhoods together for planning purposes; you can start with these geometries, but they should be double-checked before import. For instance, if a shape is named Name 1 - Name 2 - Name 3 (e.g. Mission-Potrero-SoMa), it should probably be split into three polygons before import, one for each neighbourhood.

Remember - don't blindly trust an authoritative set of neighbourhood shapes. Review a few other neighbourhood sources to compare names, attributes, shape detail, and coverage. Ensuring that you have an accurate set of neighbourhood shapes and adequate attributes will save time when reconciling the data with existing Who's on First records.

We did not choose the planning department shapes, as those were too coarse and used more for statistical groupings (there weren’t as many neighbourhoods as we had already, and their shapes were way too big, more like macrohoods). The Mayor’s Office geometries were built to match local expectations and there was a similar number of places to what was in Who’s On First. Their colloquial shapes matched up with what we, as San Francisco locals, thought they should look like.

Once we verified the data was provided through an open license, we created a new source, sfgov, in our sources repository to give credit to the original author. This dataset was then downloaded to our desktop and added to a new QGIS document to compare with the existing shapes in WOF. Lucky for us, the SF OpenData dataset already uses WGS84 coordinates and does not need to be reprojected.
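If you are not sure whether a downloaded dataset is already in WGS84, a rough check is to confirm that every coordinate falls inside valid longitude and latitude ranges. The sketch below is a heuristic of our own, not a QGIS or WOF tool: it assumes the data has been saved as GeoJSON, and the function names and sample coordinates are illustrative only. Projected data, which usually has coordinates in the thousands or millions, would fail the check; data that passes may still need a closer look in QGIS.

```python
def iter_positions(coords):
    """Yield every [x, y, ...] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            for position in iter_positions(part):
                yield position


def looks_like_lon_lat(geometry):
    """True if every position fits longitude/latitude ranges in degrees."""
    return all(
        -180.0 <= pos[0] <= 180.0 and -90.0 <= pos[1] <= 90.0
        for pos in iter_positions(geometry["coordinates"])
    )


# Illustrative geometries: one in degrees, one in made-up projected units.
in_degrees = {
    "type": "Polygon",
    "coordinates": [[[-122.45, 37.77], [-122.44, 37.77],
                     [-122.44, 37.78], [-122.45, 37.77]]],
}
projected = {"type": "Point", "coordinates": [6000000.0, 2100000.0]}

print(looks_like_lon_lat(in_degrees), looks_like_lon_lat(projected))  # True False
```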

##3: Download authoritative neighbourhood geometries

In this section, you will:

  • Download data
  • Open your data
  • Begin comparison with existing Who's on First records

Image: Only the SF OpenData neighbourhood data projected in QGIS.

Once the data source was added to our source repository, the data was downloaded and placed into a new QGIS document to compare to the geometries of WOF records (above). You can see the clean, non-overlapping geometries in the SF OpenData, unlike our existing WOF geometries (below).


Image: A comparison of SF OpenData neighbourhoods (blue borders) and WOF records (multi-colored), projected in QGIS.

Now that we have WOF data and data provided by the City of San Francisco, we can begin reconciling the two datasets. We will join the two datasets based on a common attribute; in this case the wof:name field from the WOF data was joined to the SF OpenData's name field. The join tool in QGIS can be found by navigating to the properties of the WOF .geojson layer and clicking the "Join" option (below).


Image: Join Properties tool in QGIS.

In an ideal world, all WOF records would join cleanly to SF OpenData records, but that was not the case. This join method worked for the most part, but because the spellings are not identical between the two attribute tables, the join needs to be verified and improved by hand. For example, QGIS's join tool did not join a value of Haight Ashbury to a value of Haight-Ashbury or a value of Mission District to a value of Mission. As described below, it's not a matter of which name field is "more correct", but a matter of importing additional names from your authoritative source while preserving the existing wof:name in an eng_x_variant name field.
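Some of these near-misses can be caught before the hand check by normalizing names on both sides of the join. The following is a minimal sketch of that idea, not what QGIS does internally: the function names are our own, and the wof:id values in the example are placeholders rather than real IDs.

```python
import re


def normalize_name(name):
    """Lowercase a name and collapse hyphens and whitespace to single spaces."""
    return re.sub(r"[\s\-]+", " ", name).strip().lower()


def join_by_name(wof_records, source_names):
    """Map each source name to a wof:id when the normalized names agree.

    wof_records is a dict of wof:name -> wof:id; unmatched names map to None.
    """
    lookup = {normalize_name(name): wof_id for name, wof_id in wof_records.items()}
    return {name: lookup.get(normalize_name(name)) for name in source_names}


# Placeholder IDs for illustration only.
wof_records = {"Haight Ashbury": 1, "Mission District": 2, "Alamo Square": 3}
source_names = ["Haight-Ashbury", "Mission", "Alamo Square", "Bret Harte"]

for name, wof_id in join_by_name(wof_records, source_names).items():
    print(name, wof_id)
```

Normalizing punctuation pairs Haight Ashbury with Haight-Ashbury, but Mission District and Mission still do not match, so a manual review of the leftovers remains necessary.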

Alternatively, we could perform this join based on location instead of an attribute field. QGIS has functionality to perform a spatial join (some documentation here), which would be helpful if our WOF geometries were geographically similar to our administrative data. However, because many of our WOF geometries in San Francisco overlap more than one SF OpenData geometry, a spatial join would often be ambiguous, and an attribute join is more likely to give us matching records between the two datasets (generally, neighbourhood names are unique within a city). If you are unsure which join is best for your locality, give them both a try and compare the results.

| Who's on First | SF OpenData | Note |
| --- | --- | --- |
| Alamo Square | Alamo Square | in both, great! |
| Anza Vista | Anza Vista | in both, great! |
| Baja Noe | | no match, no alternate name spelling, WOF only |
| | Bret Harte | no WOF record, let's research. |
| Haight Ashbury | | no name match, but does have alternate name: Haight-Ashbury |
| Cathedral Hill | Cathedral Hill | in both, great! |

Image: Comparison of WOF and SF OpenData name attributes.

This method assigned wof:id values to each SF OpenData record that joined to a WOF record. After comparing, 96 of 117 SF OpenData records were assigned a wof:id. The remaining 21 records did not join on the name field, so we will have to reconcile them by hand, adding the wof:id manually whenever possible.


Image: SF OpenData attribute table after joining datasets. Records with NULL values need to be imported by hand.

Examples of neighbourhoods that WOF did not have at the time of import are Bret Harte and Candlestick Point SRA (meaning these records will need a new WOF ID). Since these neighbourhoods were not in the WOF database, we should consider importing them as new neighbourhood records.

##4: Reconciling Who's on First records with authoritative neighbourhood geometries

In this section, you will:

  • Verify data from your authoritative source
  • Develop an action plan to modify data

Remember - not all of the existing neighbourhood records matched to an SF OpenData geometry (96 new records were given existing IDs, but there were 156 existing neighbourhood records). This raises the question: what should happen to the 60 leftover neighbourhood records?
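The bookkeeping behind these numbers is simple enough to check in a few lines. This sketch uses only the counts reported in this guide and assumes each joined SF OpenData record corresponds to exactly one WOF record; the variable names are our own.

```python
wof_records = 156      # existing WOF neighbourhood records for San Francisco
source_records = 117   # SF OpenData neighbourhood records
joined = 96            # SF OpenData records that received an existing wof:id

leftover_wof = wof_records - joined        # WOF records with no SF OpenData match
unjoined_source = source_records - joined  # SF OpenData records still lacking a wof:id

print(leftover_wof, unjoined_source)  # 60 21
```

Both remainders need attention: the first group is handled by the options below, and the second is either matched by hand or imported as new records.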

There are three options for the 60 leftover records:

  • Deprecate the record, as it was never a valid neighbourhood to begin with. Research can't verify this name or shape. Sometimes a neighbourhood falls out of common usage, or the record was simply an error.
  • Downgrade the record to a microhood and give it a parented_by value for the neighbourhood it falls within. People still use this name, but only the residents of those few city blocks.
  • Upgrade the neighbourhood records to a macrohood (see: Sunset, Richmond, Downtown).

###Modifying SF OpenData

Before importing the city-provided geometries, it is important to ensure the new neighbourhood boundaries will work in Who’s On First. While we can easily import the new neighbourhood geometries raw from our source (SF OpenData), we should "trust but verify" our data before the import.

The majority of geometries in the SF OpenData source were imported as-is, though two neighbourhood records were edited prior to import (Rincon Hill and Financial District South). Using our local knowledge and opinions, we adjusted these neighbourhood boundaries slightly.

###Modifying WOF Data

What are our options for records that are only in WOF (not in SF OpenData)?

Once we had retrieved all existing WOF neighbourhood shapes in San Francisco, we added them to a QGIS document (below) and gave them a new numerical field titled "status". The status values were color-coded to display the following options for each of the WOF neighbourhood records:

  • 1 - Neighbourhood. Valid neighbourhood record. Both datasets agree this is a neighbourhood, probably needs a new shape and name.
  • 2 - Reclassify up to macrohood. Neighbourhood records that will become macrohood records.
  • 3 - Reclassify down to microhoods. Neighbourhood records that will be reclassified as microhood records. WOF records not in SF OpenData that are smaller than the surrounding neighbourhood should change placetypes to microhood.
  • 4 - Reclassify down to microhoods... maybe. Needs more investigation. Probably change placetype to a microhood.
  • 5 - Deprecate neighbourhood. Invalid record, should be "deprecated" and "superseded" in WOF. Everyone agrees this neighbourhood is an error.

Image: Developing an action plan in QGIS by assigning records' status in QGIS, reviewing WOF record matches with new SF OpenData source. Colors represent status value.
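If you export the attribute table and want to keep the status codes straight outside of QGIS, they can be written down as a small enumeration. This is only a sketch of the five values listed above; the class and member names are our own labels, and the list of values in the example is made up.

```python
from collections import Counter
from enum import IntEnum


class Status(IntEnum):
    """The five status values used in the action plan."""
    NEIGHBOURHOOD = 1      # valid neighbourhood record
    TO_MACROHOOD = 2       # reclassify up to macrohood
    TO_MICROHOOD = 3       # reclassify down to microhood
    MAYBE_MICROHOOD = 4    # probably a microhood, needs more investigation
    DEPRECATE = 5          # invalid record, deprecate and supersede


def tally_statuses(status_values):
    """Count how many records carry each status code."""
    return Counter(Status(value) for value in status_values)


# Made-up status column for illustration.
tally = tally_statuses([1, 1, 3, 5, 4, 1])
for status in Status:
    print(status.name, tally[status])
```

Because `Status(value)` raises a `ValueError` for anything outside 1 to 5, a typo in the status column shows up immediately.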

When we update WOF neighbourhoods to default to a new geometry, we also need to preserve the earlier Zetashapes geometry as an alt-geometry in WOF. An alt-geometry is a dedicated WOF record that only contains source information and a geometry - check out an example of an alt-geometry here. Alt-geometries use the same wof:id as the record's main geometry, but append -alt-"source".
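The naming rule can be expressed in one line. The sketch below follows the pattern described above; the function name is hypothetical, the lowercase source label "zetashapes" is an assumption, and the ID is simply the San Francisco ID used earlier in this guide.

```python
def alt_geometry_name(wof_id, source):
    """Build an alt-geometry identifier: the wof:id followed by -alt-<source>."""
    return "{}-alt-{}".format(wof_id, source)


print(alt_geometry_name(85922583, "zetashapes"))  # 85922583-alt-zetashapes
```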

Because we're mixing data from different sources, we should also modify the shapes so they are more consistent with each other regardless of the source. We'll revisit this in part 5.

Congratulations! You have just finished collecting new neighbourhood shapes for WOF! In the next part of this tutorial, we'll prepare the data for import. But first, a well-deserved break.

To finalize your work and prepare data for import, check out part two!