As part of phase 2 of incremental loading, the collector needs to determine whether any new resources have been downloaded, that is, ones that haven't been seen before. To do this, the existing log.csv can be scanned to build the set of known resources, and the collector can check each fetched resource against this set.
The result of this should be a new file from the collector, probably in the 'var' directory, which can be used by the next stage of incremental loading. If no new resources are downloaded, the file will be created, but will be empty.
If the existing log.csv cannot be read, or any other error occurs, this file will not be created, and any existing copy should be removed, to signal that this information is not available. In this case, incremental loading will not be available.
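The three states described above (file missing, file empty, file listing resources) could be interpreted by the next stage roughly as follows. This is only a sketch: the function name `read_new_resources` is hypothetical, and the one-resource-per-line layout with no header row is an assumption, chosen so that "no new resources" really is an empty file.

```python
from pathlib import Path


def read_new_resources(path):
    """Interpret the collector's output file (hypothetical helper).

    Returns None if the file is missing or unreadable (information not
    available, so incremental loading should not be attempted), an empty
    list if nothing new was downloaded, or the list of new resources.
    Assumes one resource per line and no header row.
    """
    try:
        text = Path(path).read_text()
    except (OSError, UnicodeDecodeError):
        return None
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A caller would need to distinguish `None` from `[]` explicitly (for example with `is None`), because both are falsy in Python but mean different things here.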
Tech Approach
Update the collector to read the existing log.csv and build the set of currently known resources. If log.csv is unavailable or unreadable, print a diagnostic message but let the collector continue. In that case, no further work is done to determine new resources.
Keep a list of resources downloaded that are not in the set of known resources.
At the end of the collector run, save this list to a suitable file in var (possibly var/collection-name/new-resources.csv?). If this file cannot be created, report a diagnostic warning but continue as usual.
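The steps above might look something like the following. This is a minimal sketch, not the collector's actual code: the function names are hypothetical, log.csv is assumed to have a column named `resource`, and the output is assumed to be one resource per line with no header row.

```python
import csv
import sys
from pathlib import Path


def load_known_resources(log_path):
    """Return the set of resources listed in log.csv, or None if unreadable."""
    try:
        with open(log_path, newline="") as f:
            reader = csv.DictReader(f)
            # Assumption: the log has a "resource" column.
            if "resource" not in (reader.fieldnames or []):
                print(f"warning: no 'resource' column in {log_path}", file=sys.stderr)
                return None
            return {row["resource"] for row in reader if row.get("resource")}
    except (OSError, csv.Error, UnicodeDecodeError) as e:
        print(f"warning: cannot read {log_path}: {e}", file=sys.stderr)
        return None


def write_new_resources(fetched, known, out_path):
    """Write fetched resources not in `known` to out_path.

    Returns the list written, or None if the information is unavailable.
    """
    out_path = Path(out_path)
    if known is None:
        # Known set unavailable: remove any stale file to signal this.
        try:
            out_path.unlink(missing_ok=True)  # Python 3.8+
        except OSError as e:
            print(f"warning: cannot remove {out_path}: {e}", file=sys.stderr)
        return None

    seen = set(known)
    new = []
    for resource in fetched:
        if resource not in seen:
            seen.add(resource)  # avoid listing the same new resource twice
            new.append(resource)

    try:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w", newline="") as f:
            for resource in new:
                f.write(resource + "\n")  # empty file when nothing is new
    except OSError as e:
        print(f"warning: cannot write {out_path}: {e}", file=sys.stderr)
        return None
    return new
```

Both functions only print a warning on failure and return `None`, so the collector itself can carry on as usual.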
Acceptance Criteria
Code has appropriate tests.
The collector will generate a new file listing the new resources. If none are new, the file will be created but left empty.
If log.csv isn't available or readable, or if the output file cannot be created, the collector will run as usual but output diagnostic messages.