
Multiple values in one field

Paula Zermoglio edited this page Apr 10, 2017 · 5 revisions

How to deal with multiple values in one Darwin Core field

Often we have multiple values that could go into a single field, under the same Darwin Core term.

Some examples of this would be:

  • More than one state-province or county (e.g., “Queensland/New South Wales”, state-provinces in Australia; “Pondera;Toole”, counties in Montana, USA).
  • More than one sex value (e.g., when the occurrence refers to more than one individual, or a lot, we can have things like “1 female, 2 males”).
  • More than one life stage (e.g., when the occurrence refers to more than one individual, or a lot, we can have things like “adult and juvenile”).
  • More than one measurement (e.g., “total length = 140cm, snout-vent length = 125cm”).

In cases such as those, we face a problem when deciding how to use the Darwin Core terms, or fields:

  • How do we use the corresponding fields?
      • Do we capture only one value?
      • Do we capture all values in those fields?
      • If so, how? Should we follow any particular format?

These questions are not trivial, and the answers are not simple, nor are they fixed and homogeneous for every field, as we will see.

What’s published out there

When we look at the published datasets, the most common way in which multiple values are encoded in the Darwin Core fields is by separating them with a comma ‘ , ’. Other less common options found are ‘ ; ’, ‘ or ’, and sometimes ‘ and ’. However, the fact that it is usually done this way does not make it a good idea.
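To see how often these separators show up in a dataset of your own, a small check like the one below can flag values for review. It is only a sketch: the function name `find_separators` and the pattern table are illustrative assumptions, not part of Darwin Core or of any library.

```python
import re

# Separators mentioned above as occurring in published datasets.
# This table and the function name are illustrative assumptions.
CANDIDATE_SEPARATORS = {
    ",": re.compile(r"\s*,\s*"),
    ";": re.compile(r"\s*;\s*"),
    "or": re.compile(r"\s+or\s+", re.IGNORECASE),
    "and": re.compile(r"\s+and\s+", re.IGNORECASE),
}


def find_separators(value: str) -> list[str]:
    """Return the candidate separators that occur in a field value."""
    return [
        name
        for name, pattern in CANDIDATE_SEPARATORS.items()
        if pattern.search(value)
    ]


print(find_separators("Pondera, Toole"))      # [',']
print(find_separators("adult and juvenile"))  # ['and']
print(find_separators("Montana"))             # []
```

A hit only means the value deserves a second look. A single legitimate name can contain one of these separators (“Trinidad and Tobago”, for instance), so the check cannot decide by itself whether a field really holds multiple values.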

The county example:

Let’s take an example for the county field. And let’s suppose our verbatim geographic data captured in a label is:

“US, Montana, Pondera-Toole county, along the Interstate 15”

Commonly, in the published datasets we would find the following:

dwc:country: United States

dwc:stateProvince: Montana

dwc:county: Pondera, Toole                     ----> note that these are two distinct counties

dwc:municipality:                              ----> usually this field would be found left empty

dwc:locality: Along the Interstate 15

What are the downsides of how data is currently published?

Note that if we record the county in the way described above, that is, if the field is populated with multiple values, there can be ambiguity when one looks at the dataset. For instance, by looking only at the county field, a user could wonder whether “Pondera, Toole” means that the location was in one or the other of the counties listed, or in both (and therefore necessarily along their shared border). Sometimes this ambiguity is resolved by capturing further information in the locality, verbatimLocality, or locationRemarks fields. However, if one looks at the county value alone, the ambiguity cannot be resolved. Is there, then, a way to make it unambiguous?

What does the Darwin Core standard have to say about it?

Let’s take a look at the Darwin Core definition of county:

The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department, etc.) in which the Location occurs.

This definition is silent when it comes to having multiple counties, except that it uses the singular “administrative region”.

So, if we are to be faithful to the intent of the county field, that is, to capture a standard full name for one administrative level below that in stateProvince, multiple counties should not go there.

Ok, understood, but… Then... what do we do with the county information?

In this case, if we were to strictly follow the Darwin Core standard, we should probably capture the county information in the locality field, prepended to whatever else was already there.

In our example case, a best practice would then be to populate the fields as follows:

dwc:country: United States

dwc:stateProvince: Montana

dwc:county:                                                     ----> this field would be left empty

dwc:municipality:                                               ----> this field would be left empty

dwc:locality: Pondera-Toole county, Along the Interstate 15

However, the solution proposed for the county field does not hold for every term.
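For the county case specifically, the rule discussed above could be sketched as follows. This is a minimal illustration under stated assumptions: the function `apply_county`, its parameters, and the use of a plain dictionary keyed by Darwin Core term names are all hypothetical choices made for this example, and the list of county names is assumed to have been worked out by a person beforehand.

```python
def apply_county(record: dict, verbatim_county: str, county_names: list[str]) -> dict:
    """Return a copy of the record with the county information placed.

    Hypothetical helper: with exactly one county, the name goes into
    'county'. Otherwise 'county' is left empty and the verbatim county
    text is placed in front of whatever is already in 'locality'.
    """
    updated = dict(record)
    if len(county_names) == 1:
        updated["county"] = county_names[0]
    else:
        updated["county"] = ""
        parts = [verbatim_county, record.get("locality", "")]
        updated["locality"] = ", ".join(part for part in parts if part)
    return updated


record = {
    "country": "United States",
    "stateProvince": "Montana",
    "county": "",
    "municipality": "",
    "locality": "Along the Interstate 15",
}

result = apply_county(record, "Pondera-Toole county", ["Pondera", "Toole"])
print(repr(result["county"]))  # ''
print(result["locality"])      # Pondera-Toole county, Along the Interstate 15
```

The original record is not modified, so the verbatim values remain available for comparison with the result.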

Other examples:

Let’s take a look at what the Darwin Core standard has to say in some other examples, then.

Here is a list of some separators that Darwin Core recommends or suggests for populating different terms when there are multiple values:

a. Separate values with “ | ”. This is explicitly recommended as best practice in Darwin Core for some fields.

Examples: dwc:higherGeography, dwc:typeStatus, dwc:identifiedBy, dwc:recordedBy, dwc:preparations, etc.

b. Separate values with “ , ”. This is not actually recommended by the standard, but cases can be found in the examples that Darwin Core provides with some term definitions.

Example: dwc:sex (e.g., “8 males, 4 females”).

c. Separate values with “ ; ”. This is not actually recommended by the standard, but cases can be found in the examples that Darwin Core provides with some term definitions.

Example: dwc:samplingEffort (e.g., “10 observer-hours; 10 km by foot; 30 km by car”).

d. No separation. This is not actually recommended by the standard, but cases can be found in the examples that Darwin Core provides with some term definitions.

Example: dwc:lifeStage (e.g., “2 adults 4 juveniles”).

e. Use JSON format. This is explicitly recommended as best practice in Darwin Core for some fields.

Example: dwc:dynamicProperties (e.g., "{"heightInMeters":1.5}", "{"tragusLengthInMeters":0.014, "weightInGrams":120}").
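The two explicitly recommended options in the list above, (a) and (e), are easy to handle in code. The sketch below shows one possible way to do it with the Python standard library. The helper names are assumptions made for this example, and the collector names in the usage lines are made up.

```python
import json

PIPE = " | "  # separator recommended for terms such as dwc:recordedBy


def join_values(values: list[str]) -> str:
    """Join several values into one pipe-separated field value."""
    return PIPE.join(v.strip() for v in values if v.strip())


def split_values(field: str) -> list[str]:
    """Split a pipe-separated field value back into its parts."""
    return [part.strip() for part in field.split("|") if part.strip()]


def encode_dynamic_properties(properties: dict) -> str:
    """Serialize key/value pairs as JSON for dwc:dynamicProperties."""
    return json.dumps(properties)


def decode_dynamic_properties(field: str) -> dict:
    """Parse a dwc:dynamicProperties value; an empty field gives {}."""
    return json.loads(field) if field.strip() else {}


# Made-up collector names, for illustration only.
recorded_by = join_values(["A. Collector", "B. Collector"])
print(recorded_by)                # A. Collector | B. Collector
print(split_values(recorded_by))  # ['A. Collector', 'B. Collector']

props = encode_dynamic_properties({"tragusLengthInMeters": 0.014, "weightInGrams": 120})
print(props)  # {"tragusLengthInMeters": 0.014, "weightInGrams": 120}
print(decode_dynamic_properties(props)["weightInGrams"])  # 120
```

One limitation of this sketch: `split_values` assumes that no individual value contains a “|” character of its own.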

CONCLUSIONS:

And so…? What do we do…??

Well… the answer is, in general, term-specific.

So, best practice would be to:

1. Follow, first, the strict definitions of the Darwin Core terms.

2. Follow the best practices recommended by the standard.

3. Be thoughtful and consistent.

4. Please participate with your questions and comments about this mess! :)
