Incremental Loading P2 - 1: Determine new resources #233

cjohns-scottlogic · 2025-01-16T12:27:11Z

As part of phase 2 of incremental loading, it's necessary to determine whether any NEW resources have been downloaded, that is, ones that haven't been seen before. To determine this, the existing log.csv can be scanned to get all the known resources, and the collector can check each fetched resource against this set.

The result of this should be a new file from the collector, probably in the 'var' directory, which can be used by the next stage of incremental loading. If no new resources are downloaded, the file will be created, but will be empty.

If the existing log.csv cannot be read, or any other error occurs, this file will not be created (and any existing one should be removed) to signal that this information is not available. In this case, incremental loading will not be available.

Tech Approach

Update the collector to read the existing log.csv and get the currently known resources. If log.csv is unavailable or unreadable, print a diagnostic message but let the collector continue. In this instance, nothing else will be done in terms of determining new resources.
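A minimal sketch of this step, for discussion only. It assumes log.csv has a header row with a `resource` column and that rows without a resource value (for example failed fetches) can be skipped; the function name `load_known_resources` is hypothetical and not part of the existing collector.

```python
import csv
import sys
from pathlib import Path
from typing import Optional, Set


def load_known_resources(log_path: Path) -> Optional[Set[str]]:
    """Return the set of resources listed in log.csv, or None if it is unreadable."""
    try:
        with open(log_path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Assumption: the log has a "resource" column.
            if reader.fieldnames is None or "resource" not in reader.fieldnames:
                raise ValueError("no 'resource' column found")
            # Skip rows where the resource value is empty or missing.
            return {row["resource"] for row in reader if row["resource"]}
    except (OSError, ValueError, csv.Error) as e:
        # Diagnostic only: the collector carries on without this information.
        print(f"warning: cannot read {log_path}: {e}", file=sys.stderr)
        return None
```

Returning `None` rather than an empty set keeps "log unavailable" distinct from "log readable but no resources known yet", which matters for deciding whether the output file should exist.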

Keep a list of resources downloaded that are not in the set of known resources.

At the end of the collector run, save this list to a suitable file in var (possibly var/collection-name/new-resources.csv?). If this file cannot be created, report a diagnostic warning but continue as usual.
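The last two steps could look roughly like the sketch below. The function names and the file layout (one resource per line, no header, so that "no new resources" yields a genuinely empty file) are assumptions for illustration; the issue does not fix the format. Passing `None` stands for the case where the known resources could not be determined, in which case any stale file is removed.

```python
import csv
import sys
from pathlib import Path
from typing import Iterable, List, Optional, Set


def find_new_resources(fetched: Iterable[str], known: Set[str]) -> List[str]:
    """Return fetched resources not in `known`, in fetch order, without duplicates."""
    seen = set(known)
    new = []
    for resource in fetched:
        if resource not in seen:
            seen.add(resource)
            new.append(resource)
    return new


def save_new_resources(out_path: Path, new_resources: Optional[List[str]]) -> bool:
    """Write the new resources to out_path; leave no file behind on failure."""
    try:
        if new_resources is None:
            # Information unavailable: remove any file left by an earlier run.
            out_path.unlink(missing_ok=True)
            return False
        out_path.parent.mkdir(parents=True, exist_ok=True)
        # Opening for write creates the file even when the list is empty.
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for resource in new_resources:
                writer.writerow([resource])
        return True
    except OSError as e:
        # Diagnostic only: the collector continues as usual.
        print(f"warning: cannot write {out_path}: {e}", file=sys.stderr)
        try:
            out_path.unlink(missing_ok=True)
        except OSError:
            pass
        return False
```

With this shape, the next stage can treat a missing file as "incremental loading not available" and an empty file as "nothing new to load".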

Acceptance Criteria

Code has appropriate tests.

The collector will generate a file listing the new resources. If none are new, the file will still be created, but will be empty.

If log.csv isn't available or readable, or if the output file cannot be created, then the collector will run as usual but output diagnostic messages.

cjohns-scottlogic added this to Infrastructure Jan 14, 2025

cjohns-scottlogic converted this from a draft issue Jan 16, 2025
