Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add object tagging - Moodle 310 #619

Merged
merged 22 commits into from
Dec 6, 2024

Conversation

matthewhilton
Copy link
Contributor

@matthewhilton matthewhilton commented Jul 22, 2024

Summary
Allows objectfs to send tags for an object to external stores. Initially only S3 is supported. location and environment tags are added.

For technical reasons, mimetype is also stored but this is stored in metadata instead of the tags (and thus is only for new objects, migration of metadata is not feasible)

The use case for this is to be able to setup conditional lifecycle rules to send orphaned files to deep archive and eventually delete them.

See #11 (comment) for the rationale behind using S3 tags instead of other methods such as file prefixes or separate buckets.

How it works
See TAGGING.md

Cherry picks

Related issues

@matthewhilton
Copy link
Contributor Author

matthewhilton commented Jul 22, 2024

I've been pondering the idea of removing the intermediary table from the equation here.

Originally the idea is we cache the tag values in an intermediary table, so the values can be easily queried on the moodle side for reporting. This allows you to have 'expensive' tags i.e. those that are computationally expensive to calculate, but also be able to query them very easily using the DB for reporting.

However if we want to only have 'cheap' tags, we might be able to get away with just aggregating their values in the report data table (i.e every time report is run, it recalculates the values and sums the counts together), and then instead of querying db for the cached tags, it would simply re-calculate it the moment the object is about to get updated.

The downside of this is the report generating gets a tad bit more complex and heavy because we need to constantly re-calculate the tag values (which can come from various random places in the db) as opposed to a single tag cache intermediary table.

Another downside is we wouldn't then be able to track how many have actually been tagged in the external store, whereas with an intermediary table we at least know if its in there, its value is 99% of the time the same in the external store.

I'm leaning towards the simpler solution which is just to have them in the intermediary table.

@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 4 times, most recently from bcd9fa2 to 9327509 Compare July 29, 2024 05:38
@matthewhilton matthewhilton changed the base branch from MOODLE_310_STABLE to fix-codestandards July 29, 2024 05:39
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 3 times, most recently from f5928ea to 9c7e9bd Compare July 29, 2024 22:06
@matthewhilton
Copy link
Contributor Author

matthewhilton commented Jul 29, 2024

This is mostly ready to go, will give it a test for these scenarios:

Manual testing

  • New site install - enable tagging and confirm tags are uploaded when objects are replicated
  • Upgraded site with existing object - enable tagging and confirm tags are uploaded via migration and also when new objects are uploaded
  • Site without overwrite - confirm cannot overwrite objects and tags remain intact
  • Site with overwrite - confirm can overwrite and tags are overwritten and updated
  • Tagging enabled without permissions in S3 - Confirm exceptions are thrown correctly, and check API class alerts issue
  • Object report - generates object report with correct tag value counts.

@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 3 times, most recently from b009b8c to b2d42f1 Compare July 31, 2024 21:57
Base automatically changed from fix-codestandards to MOODLE_310_STABLE August 1, 2024 02:19
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 4 times, most recently from 36e1894 to ee3f161 Compare August 2, 2024 01:21
@matthewhilton matthewhilton marked this pull request as ready for review August 2, 2024 01:27
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch from ee3f161 to 4517840 Compare August 4, 2024 22:52
@matthewhilton matthewhilton changed the title Add object tagging Add object tagging - Moodle 310 Aug 4, 2024
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch from 4517840 to 9920ae5 Compare August 5, 2024 01:26
Copy link

@bwalkerl bwalkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @matthewhilton this is looking great from the code perspective. I've pointed out a few things I've noticed, but these are mostly just some extra considerations and discussion points. I've focused more on reviewing the code and haven't tested this end to end myself - so I'm partly relying on your test results for that.

My only real concern here is the perform at scale, mostly when back filling with the adhoc task, but this is an optional feature so I think it should be fine to roll out and test as needed.

classes/local/store/object_file_system.php Outdated Show resolved Hide resolved
classes/local/store/object_file_system.php Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
classes/local/store/s3/file_system.php Outdated Show resolved Hide resolved
classes/local/store/s3/client.php Outdated Show resolved Hide resolved
classes/local/tag/tag_manager.php Outdated Show resolved Hide resolved
db/install.xml Outdated Show resolved Hide resolved
classes/local/report/objectfs_report.php Show resolved Hide resolved
TAGGING.md Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
@matthewhilton
Copy link
Contributor Author

I've not addressed all the comments yet, but just an fyi pushed an update to add a bit more visibility to the migration progress, now there is a nice table in the admin settings (for a lack of a better place to put it), let me know if you see any tweaks that should be made here, or if there are better styling options available (I couldn't find any):

image

Additionally pushed a CI fix, since moodle plugin ci looks to have updated on my local and was throwing errors where it wasn't before. Note the Github CI is failing due to an upstream nvm issue which i've been tracking but is not fixed yet (so it will stay red for now)

Copy link

@bwalkerl bwalkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matthewhilton - feels like you went above and beyond with the changes so far. I've left a couple of new comments on these.

settings.php Outdated Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
classes/local/store/object_file_system.php Outdated Show resolved Hide resolved
classes/task/update_object_tags.php Show resolved Hide resolved
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 2 times, most recently from d50151c to d0f301b Compare August 20, 2024 04:56
@matthewhilton
Copy link
Contributor Author

Just marking as draft temporarily, will be adding some features we discussed internally and then will re-open it.

@matthewhilton matthewhilton mentioned this pull request Oct 3, 2024
@matthewhilton matthewhilton marked this pull request as draft November 14, 2024 00:25
@matthewhilton matthewhilton marked this pull request as ready for review November 21, 2024 01:11
@matthewhilton
Copy link
Contributor Author

Merging this one as it has been thoroughly reviewed and tested. Will squash into a single commit and cherry pick over to the other stables.

@matthewhilton matthewhilton merged commit 847c111 into MOODLE_310_STABLE Dec 6, 2024
18 checks passed
@matthewhilton matthewhilton deleted the wip-add-object-tagging branch December 6, 2024 00:27
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
matthewhilton added a commit that referenced this pull request Dec 6, 2024
* feat: add object tagging

* test: fix unit test count checking

* tagging: don't wait for object lock

* tagging: improve migration controls and progress visibility

* feat: move tagging status reports to check api

* feat: display header with status report

* refactor: integrate tagpushedtime into single update query

* refactor: store tags against object id instead of hash

* chore: organise tagging lang strings

* bugfix: fix mysql query compatibility

* tagging: move mimetype to metadata, add location/orphan tag source

* tagging: check environment config length

* settings: use admin_setting_check if available

* report: remove object size from tag count report

* tagging: ignore if cannot get lock

* ci: small fixups

* tagging: switch  to admin setting for tagging environment

* refactor: get object tag sync status count details separately

* refactor: tweak defaults and add tagging adhoc task spawn limit

* tests: reset static file storage before tests

* bugfix: fix test

* ci: fixup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants