Raw values in occurrences have information from processed values #841

Closed
charvolant opened this issue Oct 18, 2023 · 5 comments

@charvolant
Contributor

charvolant commented Oct 18, 2023

See for example https://api.ala.org.au/occurrences/occurrences/da76bbe0-0539-4051-bf08-9080a9f12775

This record has an invalid name match caused by misprocessing and difficulty parsing the supplied name. It also shows a separate error: the derived subspecies is inserted into the raw data. This seems to come from the service rather than the Solr index.

The originally supplied data is

{
  "id": "NSW316781",
  "coreRowType": "http://rs.tdwg.org/dwc/terms/Occurrence",
  "coreTerms": {
    "http://rs.tdwg.org/dwc/terms/disposition": "in collection",
    "http://rs.tdwg.org/dwc/terms/preparations": "sheet",
    "http://rs.tdwg.org/dwc/terms/country": "AUSTRALIA",
    "http://rs.tdwg.org/dwc/terms/habitat": "In a patch of Eucalyptus nitens regrowth.",
    "http://rs.tdwg.org/dwc/terms/collectionCode": "NSW",
    "http://rs.tdwg.org/dwc/terms/taxonRank": "species",
    "http://rs.tdwg.org/dwc/terms/verbatimCoordinateSystem": "Degrees Minutes",
    "http://rs.tdwg.org/dwc/terms/recordNumber": "461",
    "http://rs.tdwg.org/dwc/terms/locality": "headwaters of Bonang River, 2.5 km W of Gunmark [Goonmirk ?] Road & Errinundra Rd junction along Errinundra Rd, N of road (low side)",
    "http://rs.tdwg.org/dwc/terms/verbatimLatitude": "37 18 S",
    "http://rs.tdwg.org/dwc/terms/basisOfRecord": "PreservedSpecimen",
    "http://rs.tdwg.org/dwc/terms/family": "Myrtaceae",
    "http://purl.org/dc/terms/modified": "2022-01-17T20:41:58",
    "http://rs.tdwg.org/dwc/terms/decimalLatitude": "-37.30",
    "http://rs.tdwg.org/dwc/terms/scientificName": "Eucalyptus `hammonds rd'",
    "http://rs.tdwg.org/dwc/terms/recordedBy": "Chesterfield, E.A.",
    "http://rs.tdwg.org/dwc/terms/stateProvince": "Victoria",
    "http://rs.tdwg.org/dwc/terms/genus": "Eucalyptus",
    "http://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters": "10000",
    "http://rs.tdwg.org/dwc/terms/specificEpithet": "`hammonds rd'",
    "http://rs.tdwg.org/dwc/terms/occurrenceID": "NSW:NSW:NSW316781",
    "http://rs.tdwg.org/dwc/terms/eventDate": "1984-05-22",
    "http://rs.tdwg.org/dwc/terms/verbatimTaxonRank": "species",
    "http://rs.tdwg.org/dwc/terms/verbatimLongitude": "148 48 E",
    "http://rs.tdwg.org/dwc/terms/nomenclaturalCode": "ICN",
    "http://rs.tdwg.org/dwc/terms/catalogNumber": "NSW316781",
    "http://rs.tdwg.org/dwc/terms/establishmentMeans": "native",
    "http://rs.tdwg.org/dwc/terms/occurrenceRemarks": "Initially mistaken for Eucalyptus nitens with basal bark very similar to that species, bark on bole with greenish tinge of E. viminalis. This species is reputed (F. Morris - Overseer Orbost district) to cover 30- 40 ha on a flat in the Delegate River, compartment 501, block 3, where it grows with E. radiata.",
    "http://rs.tdwg.org/dwc/terms/reproductiveCondition": "buds|fruits",
    "http://rs.tdwg.org/dwc/terms/decimalLongitude": "148.80",
    "http://rs.tdwg.org/dwc/terms/institutionCode": "NSW",
    "http://rs.tdwg.org/dwc/terms/verbatimCoordinates": "37 18 S, 148 48 E",
    "http://rs.tdwg.org/dwc/terms/occurrenceStatus": "present"
  },
  "extensions": {
    "http://data.ggbn.org/schemas/ggbn/terms/Loan": [],
    "http://rs.gbif.org/terms/1.0/Multimedia": [],
    "http://rs.tdwg.org/dwc/terms/ResourceRelationship": []
  }
}

The information in the Solr index is

{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 14,
    "params": {
      "q": "id:\"da76bbe0-0539-4051-bf08-9080a9f12775\"",
      "q.op": "OR"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 7.3651896,
    "numFoundExact": true,
    "docs": [
      {
        "id": "da76bbe0-0539-4051-bf08-9080a9f12775",
        "country": "Australia",
        "raw_eventDate": "1984-05-22",
        "raw_locality": "headwaters of Bonang River, 2.5 km W of Gunmark [Goonmirk ?] Road & Errinundra Rd junction along Errinundra Rd, N of road (low side)",
        "habitat": "In a patch of Eucalyptus nitens regrowth.",
        "point-0.02": "-37.3,148.8",
        "point-0.01": "-37.3,148.8",
        "scientificName": "Eucalyptus pauciflora subsp. debeuzevillei",
        "matchType": "canonicalMatch",
        "lat_long": "-37.3,148.8",
        "geohash": "-37.3,148.8",
        "location": "-37.3,148.8",
        "quad": "-37.3,148.8",
        "packedQuad": "-37.3,148.8",
        "establishmentMeans": "native",
        "raw_stateConservation": "Critically Endangered",
        "type": "PhysicalObject",
        "raw_family": "Myrtaceae",
        "phylumID": "https://id.biodiversity.org.au/taxon/apni/51414458",
        "familyID": "https://id.biodiversity.org.au/taxon/apni/51376810",
        "occurrenceStatus": "PRESENT",
        "catalogNumber": "NSW316781",
        "basisOfRecord": "PRESERVED_SPECIMEN",
        "raw_scientificName": "Eucalyptus `hammonds rd'",
        "taxonConceptID": "https://id.biodiversity.org.au/node/apni/2896227",
        "point-0.1": "-37.3,148.8",
        "modified": "2022-01-17T20:41:58",
        "raw_modified": "2022-01-17T20:41:58",
        "raw_establishmentMeans": "native",
        "reproductiveCondition": "buds|fruits",
        "order": "Myrtales",
        "dataResourceName": "NSW AVH feed",
        "recordNumber": "461",
        "raw_basisOfRecord": "PreservedSpecimen",
        "locality": "headwaters of Bonang River, 2.5 km W of Gunmark [Goonmirk ?] Road & Errinundra Rd junction along Errinundra Rd, N of road (low side)",
        "raw_taxonRank": "species",
        "stateProvince": "Victoria",
        "speciesID": "https://id.biodiversity.org.au/node/apni/2897845",
        "collectionCode": "NSW",
        "point-1": "-37,149",
        "occurrenceID": "NSW:NSW:NSW316781",
        "point-0.0001": "-37.3,148.8",
        "raw_recordedBy": "Chesterfield, E.A.",
        "verbatimLatitude": "37 18 S",
        "license": "CC-BY 4.0 (Int)",
        "dataResourceUid": "dr15861",
        "genus": "Eucalyptus",
        "biome": "TERRESTRIAL",
        "subspecies": "Eucalyptus pauciflora subsp. debeuzevillei",
        "common_name_and_lsid": "Jounama Snow Gum|Eucalyptus pauciflora subsp. debeuzevillei|https://id.biodiversity.org.au/node/apni/2896227|Jounama Snow Gum|Plantae|Myrtaceae",
        "scientificNameAuthorship": "(Maiden) L.A.S.Johnson & Blaxell",
        "taxonRank": "subspecies",
        "raw_coordinateUncertaintyInMeters": "10000",
        "genusID": "https://id.biodiversity.org.au/taxon/apni/51360942",
        "collectionName": "National Herbarium of New South Wales",
        "raw_preparations": "sheet",
        "nameType": "SCIENTIFIC",
        "vernacularName": "Jounama Snow Gum",
        "provenance": "Published dataset",
        "raw_decimalLatitude": "-37.30",
        "institutionCode": "NSW",
        "countryCode": "AU",
        "verbatimLongitude": "148 48 E",
        "class": "Equisetopsida",
        "raw_country": "AUSTRALIA",
        "collectionUid": "co54",
        "raw_genus": "Eucalyptus",
        "nomenclaturalCode": "ICN",
        "raw_decimalLongitude": "148.80",
        "orderID": "https://id.biodiversity.org.au/taxon/apni/51376809",
        "names_and_lsid": "Eucalyptus pauciflora subsp. debeuzevillei|https://id.biodiversity.org.au/node/apni/2896227|Jounama Snow Gum|Plantae|Myrtaceae",
        "point-0.001": "-37.3,148.8",
        "verbatimCoordinateSystem": "Degrees Minutes",
        "geodeticDatum": "EPSG:4326",
        "kingdom": "Plantae",
        "specificEpithet": "`hammonds rd'",
        "raw_occurrenceStatus": "present",
        "classID": "https://id.biodiversity.org.au/taxon/apni/51414457",
        "dataProviderUid": "dp36",
        "disposition": "in collection",
        "phylum": "Charophyta",
        "datePrecision": "DAY",
        "raw_stateProvince": "Victoria",
        "species": "Eucalyptus pauciflora",
        "institutionUid": "in50",
        "dataProviderName": "Australia's Virtual Herbarium",
        "verbatimCoordinates": "37 18 S, 148 48 E",
        "institutionName": "The Royal Botanic Gardens & Domain Trust",
        "subspeciesID": "https://id.biodiversity.org.au/node/apni/2896227",
        "occurrenceRemarks": "Initially mistaken for Eucalyptus nitens with basal bark very similar to that species, bark on bole with greenish tinge of E. viminalis. This species is reputed (F. Morris - Overseer Orbost district) to cover 30- 40 ha on a flat in the Delegate River, compartment 501, block 3, where it grows with E. radiata.",
        "stateConservation": "Critically Endangered",
        "family": "Myrtaceae",
        "kingdomID": "https://id.biodiversity.org.au/taxon/apni/51414459",
        "verbatimTaxonRank": "species",
        "cl10936": "Outer Regional Australia",
        "cl110944": "Remote and Natural Area - Schedule 6, National Parks Act",
        "cl10933": "GIPPSLAND",
        "cl410927": "88020",
        "cl10935": "VICTORIA EXC. MELBOURNE",
        "cl10934": "EAST GIPPSLAND",
        "cl927": "Victoria (including Coastal Waters)",
        "cl310927": "2.4899539112931600000000",
        "cl1058": "South East Coast (Victoria)",
        "cl10930": "East Gippsland",
        "cl1059": "SNOWY RIVER",
        "cl111033": "National Park",
        "cl990": "Atlas of Life in the Coastal Wilderness",
        "cl210927": "79080",
        "cl10903": "Nature Conservation Reserve",
        "cl10900": "Non-Indigenous, Native forest",
        "cl10944": "Brodribb",
        "cl10943": "LATROBE - GIPPSLAND",
        "cl10902": "Eucalypt Tall Open",
        "cl10946": "East Gippsland",
        "cl2013": "Victoria",
        "cl916": "East Gippsland",
        "cl959": "East Gippsland (S)",
        "cl10942": "GIPPSLAND - EAST",
        "cl10941": "ORBOST",
        "cl1048": "South Eastern Highlands",
        "cl1049": "Kybeyan-Gourock",
        "cl11033": "Errinundra",
        "cl510927": "2.7714433898839600000000",
        "cl1918": "Primarily Vegetated Natural & Semi-Natural Terrestrial Vegetation Woody Trees Closed",
        "cl110928": "0.0006297303771610000000",
        "cl23": "E. Gippsland - Orbost",
        "cl22": "Victoria",
        "cl110923": "EAST GIPPSLAND",
        "cl110922": "Legislative Council",
        "cl20": "South East Corner",
        "cl110927": "0.0607374948771430000000",
        "cl110925": "VIC",
        "cl620": "Eucalyptus tall open forest",
        "cl2125": "Eucalypt Tall Open Forests",
        "cl2124": "Eucalyptus (+/- tall) open forest with a dense broad-leaved and/or tree-fern understorey (wet sclerophyll)",
        "cl2049": "GER Great Eastern Ranges Initiative",
        "cl10929": "REST OF VIC.",
        "cl10925": "VICTORIA",
        "cl932": "Australia",
        "cl10928": "20",
        "cl10927": "1929",
        "cl10922": "EASTERN VICTORIA",
        "cl10921": "GIPPSLAND",
        "cl10923": "EAST GIPPSLAND SHIRE",
        "cl1068": "GER National Corridor",
        "cl10000": "Eucalypt Tall Open",
        "cl617": "Eucalypt tall open forests",
        "decimalLongitude": 148.8,
        "decimalLatitude": -37.3,
        "distanceFromExpertDistribution": -1,
        "coordinateUncertaintyInMeters": 10000,
        "el790": 48,
        "el891": 0.89,
        "el890": 15.5,
        "el893": 1196,
        "el870": 9.6,
        "el892": 1.42,
        "el674": 731,
        "el894": 21.9,
        "el872": 15,
        "el875": 15.5,
        "el874": 10.3,
        "el876": 5.2,
        "el879": 22.8,
        "el878": 243,
        "el882": 17,
        "el881": 14.8,
        "el862": 22.3,
        "el883": 0.46,
        "el886": 351,
        "el863": 326,
        "el888": 10.2,
        "el866": 30,
        "el865": 1,
        "el887": 42,
        "el867": 0.5,
        "el889": 243,
        "el10978": 10.55,
        "outlierLayerCount": 0,
        "taxonRankID": 8000,
        "decade": 1980,
        "month": 5,
        "year": 1984,
        "lft": 567705,
        "day": 22,
        "rgt": 567705,
        "firstLoadedDate": "2021-06-26T06:00:59.158Z",
        "lastLoadDate": "2023-10-03T23:08:57.626Z",
        "lastProcessedDate": "2023-10-04T01:43:47.685Z",
        "occurrenceYear": [
          "1984-01-01T00:00:00Z"
        ],
        "occurrence_year": [
          "1984-01-01T00:00:00Z"
        ],
        "eventDate": "1984-05-22T00:00:00Z",
        "isInCluster": false,
        "spatiallyValid": true,
        "defaultValuesUsed": true,
        "preparations": [
          "sheet"
        ],
        "recordedBy": [
          "Chesterfield, E.A."
        ],
        "geospatialIssues": [
          "GEODETIC_DATUM_ASSUMED_WGS84",
          "MISSING_GEODETICDATUM",
          "MISSING_GEOREFERENCE_DATE",
          "MISSING_GEOREFERENCEDBY",
          "MISSING_GEOREFERENCEPROTOCOL",
          "MISSING_GEOREFERENCESOURCES",
          "MISSING_GEOREFERENCEVERIFICATIONSTATUS"
        ],
        "speciesSubgroup": [
          "Dicots",
          "Flowering plants"
        ],
        "speciesListUid": [
          "dr655"
        ],
        "speciesGroup": [
          "Plants",
          "Angiosperms",
          "Dicots"
        ],
        "dataHubUid": [
          "dh9"
        ],
        "assertions": [
          "GEODETIC_DATUM_ASSUMED_WGS84",
          "MISSING_GEODETICDATUM",
          "MISSING_GEOREFERENCE_DATE",
          "MISSING_GEOREFERENCEDBY",
          "MISSING_GEOREFERENCEPROTOCOL",
          "MISSING_GEOREFERENCESOURCES",
          "MISSING_GEOREFERENCEVERIFICATIONSTATUS"
        ],
        "contentTypes": [
          "point occurrence data"
        ],
        "_root_": "da76bbe0-0539-4051-bf08-9080a9f12775"
      }
    ]
  }
}

There is no raw_subspecies in the Solr document.
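
A quick way to check this is to list which fields of interest have no raw_-prefixed counterpart. The following is a minimal sketch over a hand-copied subset of the Solr document above; the helper name `missing_raw_fields` is hypothetical and only illustrates the check.

```python
def missing_raw_fields(doc, fields):
    """Return the fields present in doc that have no raw_<field> counterpart."""
    return [f for f in fields if f in doc and f"raw_{f}" not in doc]


# Subset of the Solr document shown above (values copied verbatim).
solr_doc = {
    "scientificName": "Eucalyptus pauciflora subsp. debeuzevillei",
    "raw_scientificName": "Eucalyptus `hammonds rd'",
    "taxonRank": "subspecies",
    "raw_taxonRank": "species",
    "subspecies": "Eucalyptus pauciflora subsp. debeuzevillei",
    "subspeciesID": "https://id.biodiversity.org.au/node/apni/2896227",
}

fields_of_interest = ["scientificName", "taxonRank", "subspecies", "subspeciesID"]
print(missing_raw_fields(solr_doc, fields_of_interest))
# → ['subspecies', 'subspeciesID']
```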

The data returned by the API call, with assertions removed for brevity, is

{
  "raw": {
    "rowKey": "da76bbe0-0539-4051-bf08-9080a9f12775",
    "uuid": "da76bbe0-0539-4051-bf08-9080a9f12775",
    "occurrence": {
      "establishmentMeans": "native",
      "catalogNumber": "NSW316781",
      "occurrenceStatus": "present",
      "basisOfRecord": "PreservedSpecimen",
      "modified": "2022-01-17T20:41:58",
      "reproductiveCondition": "buds|fruits",
      "recordNumber": "461",
      "collectionCode": "NSW",
      "occurrenceID": "NSW:NSW:NSW316781",
      "preparations": "sheet",
      "institutionCode": "NSW",
      "disposition": "in collection",
      "recordedBy": "Chesterfield, E.A.",
      "occurrenceRemarks": "Initially mistaken for Eucalyptus nitens with basal bark very similar to that species, bark on bole with greenish tinge of E. viminalis. This species is reputed (F. Morris - Overseer Orbost district) to cover 30- 40 ha on a flat in the Delegate River, compartment 501, block 3, where it grows with E. radiata.",
      "stateConservation": "Critically Endangered"
    },
    "classification": {
      "scientificName": "Eucalyptus `hammonds rd'",
      "genus": "Eucalyptus",
      "subspecies": "Eucalyptus pauciflora subsp. debeuzevillei",
      "taxonRank": "species",
      "nomenclaturalCode": "ICN",
      "specificEpithet": "`hammonds rd'",
      "subspeciesID": "https://id.biodiversity.org.au/node/apni/2896227",
      "family": "Myrtaceae",
      "verbatimTaxonRank": "species"
    },
    "location": {
      "country": "AUSTRALIA",
      "habitat": "In a patch of Eucalyptus nitens regrowth.",
      "decimalLatitude": "-37.30",
      "terrestrial": true,
      "locality": "headwaters of Bonang River, 2.5 km W of Gunmark [Goonmirk ?] Road & Errinundra Rd junction along Errinundra Rd, N of road (low side)",
      "decimalLongitude": "148.80",
      "stateProvince": "Victoria",
      "verbatimLatitude": "37 18 S",
      "coordinateUncertaintyInMeters": "10000",
      "marine": false,
      "verbatimLongitude": "148 48 E",
      "verbatimCoordinateSystem": "Degrees Minutes"
    },
    "event": {
      "eventDate": "1984-05-22"
    },
    "attribution": {
      "dataResourceUid": "dr15861",
      "dataHubUid": [
        "dh9"
      ]
    },
    "identification": {},
    "measurement": {},
    "assertions": [
      "GEODETIC_DATUM_ASSUMED_WGS84",
      "MISSING_GEODETICDATUM",
      "MISSING_GEOREFERENCE_DATE",
      "MISSING_GEOREFERENCEDBY",
      "MISSING_GEOREFERENCEPROTOCOL",
      "MISSING_GEOREFERENCESOURCES",
      "MISSING_GEOREFERENCEVERIFICATIONSTATUS"
    ],
    "miscProperties": {},
    "queryAssertions": {},
    "defaultValuesUsed": true,
    "spatiallyValid": true,
    "geospatiallyKosher": true,
    "taxonomicallyKosher": "",
    "deleted": false,
    "firstLoaded": "2021-06-26T06:00:59.158Z",
    "dateDeleted": "",
    "lastModifiedTime": "2023-10-03T23:08:57.626Z"
  },
  "processed": {
    "rowKey": "da76bbe0-0539-4051-bf08-9080a9f12775",
    "uuid": "da76bbe0-0539-4051-bf08-9080a9f12775",
    "occurrence": {
      "establishmentMeans": "native",
      "occurrenceStatus": "PRESENT",
      "basisOfRecord": "PRESERVED_SPECIMEN",
      "modified": "2022-01-17T20:41:58",
      "recordedBy": [
        "Chesterfield, E.A."
      ],
      "stateConservation": "Critically Endangered"
    },
    "classification": {
      "scientificName": "Eucalyptus pauciflora subsp. debeuzevillei",
      "matchType": "canonicalMatch",
      "phylumID": "https://id.biodiversity.org.au/taxon/apni/51414458",
      "familyID": "https://id.biodiversity.org.au/taxon/apni/51376810",
      "taxonConceptID": "https://id.biodiversity.org.au/node/apni/2896227",
      "order": "Myrtales",
      "taxonRankID": 8000,
      "speciesID": "https://id.biodiversity.org.au/node/apni/2897845",
      "genus": "Eucalyptus",
      "left": 567705,
      "scientificNameAuthorship": "(Maiden) L.A.S.Johnson & Blaxell",
      "taxonRank": "subspecies",
      "genusID": "https://id.biodiversity.org.au/taxon/apni/51360942",
      "nameType": "SCIENTIFIC",
      "vernacularName": "Jounama Snow Gum",
      "orderID": "https://id.biodiversity.org.au/taxon/apni/51376809",
      "right": 567705,
      "kingdom": "Plantae",
      "classID": "https://id.biodiversity.org.au/taxon/apni/51414457",
      "phylum": "Charophyta",
      "classs": "Equisetopsida",
      "species": "Eucalyptus pauciflora",
      "family": "Myrtaceae",
      "kingdomID": "https://id.biodiversity.org.au/taxon/apni/51414459"
    },
    "location": {
      "country": "Australia",
      "decimalLatitude": -37.3,
      "terrestrial": true,
      "locality": "headwaters of Bonang River, 2.5 km W of Gunmark [Goonmirk ?] Road & Errinundra Rd junction along Errinundra Rd, N of road (low side)",
      "decimalLongitude": 148.8,
      "stateProvince": "Victoria",
      "biome": "TERRESTRIAL",
      "coordinateUncertaintyInMeters": 10000,
      "marine": false,
      "countryCode": "AU",
      "geodeticDatum": "EPSG:4326",
      "verbatimCoordinates": "37 18 S, 148 48 E"
    },
    "event": {
      "year": 1984,
      "month": 5,
      "datePrecision": "DAY",
      "day": 22,
      "eventDate": "1984-05-22"
    },
    "attribution": {
      "dataResourceName": "NSW AVH feed",
      "collectionName": "National Herbarium of New South Wales",
      "license": "CC-BY 4.0 (Int)",
      "dataProviderUid": "dp36",
      "provenance": "Published dataset",
      "dataResourceUid": "dr15861",
      "institutionUid": "in50",
      "dataProviderName": "Australia's Virtual Herbarium",
      "institutionName": "The Royal Botanic Gardens & Domain Trust",
      "collectionUid": "co54"
    },
    "identification": {},
    "measurement": {},
    "miscProperties": {},
    "queryAssertions": {},
    "geospatiallyKosher": true,
    "taxonomicallyKosher": "",
    "deleted": false,
    "dateDeleted": "",
    "lastModifiedTime": "2023-10-04T01:43:47.685Z",
    "el": {
      "el790": 48,
      "el891": 0.89,
      "el890": 15.5,
      "el893": 1196,
      "el870": 9.6,
      "el892": 1.42,
      "el674": 731,
      "el894": 21.9,
      "el872": 15,
      "el875": 15.5,
      "el874": 10.3,
      "el876": 5.2,
      "el879": 22.8,
      "el878": 243,
      "el882": 17,
      "el881": 14.8,
      "el862": 22.3,
      "el883": 0.46,
      "el886": 351,
      "el863": 326,
      "el888": 10.2,
      "el866": 30,
      "el865": 1,
      "el887": 42,
      "el867": 0.5,
      "el889": 243,
      "el10978": 10.55
    },
    "cl": {
      "cl10936": "Outer Regional Australia",
      "cl110944": "Remote and Natural Area - Schedule 6, National Parks Act",
      "cl10933": "GIPPSLAND",
      "cl410927": "88020",
      "cl10935": "VICTORIA EXC. MELBOURNE",
      "cl10934": "EAST GIPPSLAND",
      "cl927": "Victoria (including Coastal Waters)",
      "cl310927": "2.4899539112931600000000",
      "cl1058": "South East Coast (Victoria)",
      "cl10930": "East Gippsland",
      "cl1059": "SNOWY RIVER",
      "cl111033": "National Park",
      "cl990": "Atlas of Life in the Coastal Wilderness",
      "cl210927": "79080",
      "cl10903": "Nature Conservation Reserve",
      "cl10900": "Non-Indigenous, Native forest",
      "cl10944": "Brodribb",
      "cl10943": "LATROBE - GIPPSLAND",
      "cl10902": "Eucalypt Tall Open",
      "cl10946": "East Gippsland",
      "cl2013": "Victoria",
      "cl916": "East Gippsland",
      "cl959": "East Gippsland (S)",
      "cl10942": "GIPPSLAND - EAST",
      "cl10941": "ORBOST",
      "cl1048": "South Eastern Highlands",
      "cl1049": "Kybeyan-Gourock",
      "cl11033": "Errinundra",
      "cl510927": "2.7714433898839600000000",
      "cl1918": "Primarily Vegetated Natural & Semi-Natural Terrestrial Vegetation Woody Trees Closed",
      "cl110928": "0.0006297303771610000000",
      "cl23": "E. Gippsland - Orbost",
      "cl22": "Victoria",
      "cl110923": "EAST GIPPSLAND",
      "cl110922": "Legislative Council",
      "cl20": "South East Corner",
      "cl110927": "0.0607374948771430000000",
      "cl110925": "VIC",
      "cl620": "Eucalyptus tall open forest",
      "cl2125": "Eucalypt Tall Open Forests",
      "cl2124": "Eucalyptus (+/- tall) open forest with a dense broad-leaved and/or tree-fern understorey (wet sclerophyll)",
      "cl2049": "GER Great Eastern Ranges Initiative",
      "cl10929": "REST OF VIC.",
      "cl10925": "VICTORIA",
      "cl932": "Australia",
      "cl10928": "20",
      "cl10927": "1929",
      "cl10922": "EASTERN VICTORIA",
      "cl10921": "GIPPSLAND",
      "cl10923": "EAST GIPPSLAND SHIRE",
      "cl1068": "GER National Corridor",
      "cl10000": "Eucalypt Tall Open",
      "cl617": "Eucalypt tall open forests"
    }
  },
...

raw.classification.subspecies and raw.classification.subspeciesID contain values not in the original data.
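
The comparison can be reproduced by matching the keys of the raw section against the local names of the supplied Darwin Core terms. This is a rough sketch using subsets of the data above; `local_name` and `unsupplied_raw_fields` are hypothetical helpers, and the matching rule (last path segment of the term URI) is an assumption.

```python
def local_name(term_uri):
    """Strip the namespace from a term URI, e.g. .../terms/genus -> genus."""
    return term_uri.rstrip("/").rsplit("/", 1)[-1]


def unsupplied_raw_fields(core_terms, raw_section):
    """Return raw-section keys that match no supplied term (assumed matching rule)."""
    supplied = {local_name(uri) for uri in core_terms}
    return sorted(key for key in raw_section if key not in supplied)


# Taxonomic terms from the originally supplied data.
core_terms = {
    "http://rs.tdwg.org/dwc/terms/scientificName": "Eucalyptus `hammonds rd'",
    "http://rs.tdwg.org/dwc/terms/genus": "Eucalyptus",
    "http://rs.tdwg.org/dwc/terms/taxonRank": "species",
    "http://rs.tdwg.org/dwc/terms/nomenclaturalCode": "ICN",
    "http://rs.tdwg.org/dwc/terms/specificEpithet": "`hammonds rd'",
    "http://rs.tdwg.org/dwc/terms/family": "Myrtaceae",
    "http://rs.tdwg.org/dwc/terms/verbatimTaxonRank": "species",
}

# raw.classification as returned by the API call.
raw_classification = {
    "scientificName": "Eucalyptus `hammonds rd'",
    "genus": "Eucalyptus",
    "subspecies": "Eucalyptus pauciflora subsp. debeuzevillei",
    "taxonRank": "species",
    "nomenclaturalCode": "ICN",
    "specificEpithet": "`hammonds rd'",
    "subspeciesID": "https://id.biodiversity.org.au/node/apni/2896227",
    "family": "Myrtaceae",
    "verbatimTaxonRank": "species",
}

print(unsupplied_raw_fields(core_terms, raw_classification))
# → ['subspecies', 'subspeciesID']
```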

@adam-collins
Contributor

It has been fine like this for a long time and I do worry such a change will break something in biocache-hubs.

  • subspecies and subspeciesID are only 2 of the fields that need to be changed.
  • Review everything in raw: assertions, rowKey, uuid, firstLoaded, attribution, etc.
  • Review everything in processed: attribution, location, modified, etc.
  • Update biocache-hubs for all of these changes

Not to mention that when a raw field is absent, the non-raw field is intentionally used in the raw section.
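
That fallback would be consistent with the symptom reported here: the Solr document has no raw_subspecies, so the processed subspecies appears under raw. A minimal sketch of such a rule follows, assuming the raw_ prefix convention seen in the Solr document; `raw_value` and its `fallback` flag are hypothetical names, not the biocache-service code.

```python
def raw_value(doc, field, fallback=True):
    """Pick the value to show in the raw section for one field.

    Assumption: raw values are stored under raw_<field>; when that key is
    absent, the non-raw value is used unless fallback is disabled.
    """
    raw_key = f"raw_{field}"
    if raw_key in doc:
        return doc[raw_key]
    return doc.get(field) if fallback else None


# Subset of the Solr document from the issue description.
doc = {
    "taxonRank": "subspecies",
    "raw_taxonRank": "species",
    "subspecies": "Eucalyptus pauciflora subsp. debeuzevillei",
}

print(raw_value(doc, "taxonRank"))                   # species
print(raw_value(doc, "subspecies"))                  # falls back to the processed value
print(raw_value(doc, "subspecies", fallback=False))  # None
```

Disabling the fallback for selected fields only, as in the last call, is one way to change subspecies and subspeciesID without touching the other fields that rely on it.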

It would be more useful to deprecate this output format and introduce a versioned format consistent with the download format, i.e. one capable of listing all fields in a flat structure that can reference index/fields for further information.

Of course, we could ignore the other inconsistencies and just fix these 2 fields, then do all of this again the next time someone raises an issue about any of them.

@adam-collins
Contributor

Only moving subspecies and subspeciesID. See pull request #864.

adam-collins self-assigned this Dec 1, 2023
adam-collins added this to the 3.4.0 milestone Dec 4, 2023
@peggynewman

Nefarious processed subspecies content is no longer appearing in raw: https://biocache-ws-test.ala.org.au/ws/occurrence/da76bbe0-0539-4051-bf08-9080a9f12775
It would be nice to review all of the raw/processed data fields and we are likely to hit on this at some stage soon. Happy to leave it at this. @nielsklazenga with the Darwin Core Compliance work, we should add that we want to review what comprises raw/processed values.

@peggynewman

@adam-collins just looking at this again, in test this record has a subspecies value appearing in the UI that's not in the data:
data: https://biocache-ws-test.ala.org.au/ws/occurrences/129c76a1-26ec-4eb7-b608-dd3d7e28ff96
UI: https://biocache-test.ala.org.au/occurrences/129c76a1-26ec-4eb7-b608-dd3d7e28ff96


@adam-collins
Contributor

Moved the new issue to #903.
