[FEATURE] argilla: add support to distribution #5187
…5013)

# Description

This PR is the first one related to the task distribution feature. It adds the following changes:

* Added `distribution` JSON column to `datasets` table:
  * This column is non-nullable, so a value is always required when a dataset is created.
  * By default, old datasets will have the value `{"strategy": "overlap", "min_submitted": 1}`.
* Added `distribution` attribute to `DatasetCreate` schema:
  * `None` is not a valid value.
  * If no value is specified for this attribute, `DatasetOverlapDistributionCreate` with `min_submitted` set to `1` is used.
  * `DatasetOverlapDistributionCreate` only allows values greater than or equal to `1` for the `min_submitted` attribute.
* The context `create_dataset` function now receives a dictionary instead of a `DatasetCreate` schema.
* Moved dataset creation validations to a new `DatasetCreateValidator` class.

Updating the `distribution` attribute of existing datasets will be done in a different issue.

Closes #5005

**Type of change**

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing functionality)
- [ ] Improvement (change adding some improvement to an existing functionality)
- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And ideally, reference `tests`)

- [x] Adding new tests and passing old ones.
- [x] Check that migration works as expected with old datasets and SQLite.
- [x] Check that migration works as expected with old datasets and PostgreSQL.

**Checklist**

- [ ] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <[email protected]>
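The validation rules listed in this commit message can be sketched in plain Python. This is a minimal illustration, not the server's actual schema code: `OverlapDistribution` and `parse_distribution` are hypothetical names, and the assumption that `min_submitted` must be given explicitly inside a provided `distribution` is mine. Only the default value `{"strategy": "overlap", "min_submitted": 1}`, the rejection of `None`, and the `>= 1` constraint come from the description.

```python
from dataclasses import asdict, dataclass
from typing import Any

# Sentinel used to tell "attribute not provided" apart from an explicit None.
_MISSING = object()


@dataclass(frozen=True)
class OverlapDistribution:
    """Hypothetical stand-in for the overlap distribution settings."""

    min_submitted: int = 1
    strategy: str = "overlap"

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.min_submitted, bool) or not isinstance(self.min_submitted, int):
            raise ValueError("min_submitted must be an integer")
        if self.min_submitted < 1:
            raise ValueError("min_submitted must be greater than or equal to 1")


def parse_distribution(value: Any = _MISSING) -> OverlapDistribution:
    """Apply the rules from the description to a raw `distribution` value."""
    if value is _MISSING:
        # No value specified: fall back to overlap with min_submitted = 1.
        return OverlapDistribution()
    if value is None:
        raise ValueError("distribution cannot be None")
    if not isinstance(value, dict) or value.get("strategy") != "overlap":
        raise ValueError("only the 'overlap' strategy is handled in this sketch")
    if "min_submitted" not in value:
        # Assumption of this sketch: the field is required when distribution is given.
        raise ValueError("min_submitted is required")
    return OverlapDistribution(min_submitted=value["min_submitted"])


if __name__ == "__main__":
    print(asdict(parse_distribution()))
    print(asdict(parse_distribution({"strategy": "overlap", "min_submitted": 3})))
```

Using a sentinel instead of `None` as the default keeps "not specified" and "explicitly `None`" distinguishable, which is what the description requires.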
…nly (#5148)

# Description

Change the `responses_submitted` relationship to avoid conflicts with the existing `responses` relationship and to silence a warning that SQLAlchemy was reporting.

Refs #5000

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

- [x] The warning is no longer shown.
- [x] Tests are passing.

**Checklist**

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)
# Description

This PR changes the endpoints that return the dataset progress and the current user metrics as follows:

## `GET /datasets/:dataset_id/progress`

I have changed the endpoint to support the new business logic behind the distribution task. It now responds with only `completed` and `pending` record counts and uses `total` as the sum of the two.

Old response without distribution task:

```json
{
  "total": 8,
  "submitted": 2,
  "discarded": 2,
  "conflicting": 1,
  "pending": 3
}
```

New response with the changes from this PR supporting distribution task:

* The `completed` attribute will have the count of all the records with status `completed` for the dataset.
* The `pending` attribute will have the count of all the records with status `pending` for the dataset.
* The `total` attribute will have the sum of the `completed` and `pending` attributes.

```json
{
  "total": 5,
  "completed": 2,
  "pending": 3
}
```

@damianpumar some changes are required on the frontend to support this new endpoint structure.

## `GET /me/datasets/:dataset_id/metrics`

Old response without distribution task:

```json
{
  "records": { "count": 7 },
  "responses": { "count": 4, "submitted": 1, "discarded": 2, "draft": 1 }
}
```

New response with the changes from this PR supporting distribution task:

* The `records` section has been removed because it is no longer necessary.
* The `count` attribute of the `responses` section has been renamed to `total`.
* A `pending` attribute has been added to the `responses` section.

```json
{
  "responses": { "total": 7, "submitted": 1, "discarded": 2, "draft": 1, "pending": 3 }
}
```

The logic behind these attributes is the following:

* `total` is the sum of the `submitted`, `discarded`, `draft` and `pending` attribute values.
* `submitted` is the count of all responses belonging to the current user in the specified dataset with `submitted` status.
* `discarded` is the count of all responses belonging to the current user in the specified dataset with `discarded` status.
* `draft` is the count of all responses belonging to the current user in the specified dataset with `draft` status.
* `pending` is the count of all records with `pending` status in the dataset that have no responses belonging to the current user.

@damianpumar some changes are required on the frontend to support this new endpoint structure as well.

Closes #5139

**Type of change**

- Breaking change (fix or feature that would cause existing functionality to not work as expected)

**How Has This Been Tested**

- [x] Modifying existing tests.
- [x] Running test suite with SQLite and PostgreSQL.

**Checklist**

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: Damián Pumar <[email protected]>
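The arithmetic behind both response bodies can be sketched as follows. This is only an illustration of the counting rules stated in the commit message: `dataset_progress` and `user_metrics` are hypothetical helper names operating on plain lists of status strings, not the server's query code.

```python
from collections import Counter
from typing import Dict, Iterable


def dataset_progress(record_statuses: Iterable[str]) -> Dict[str, int]:
    """Build the progress payload from the status of every record in a dataset."""
    counts = Counter(record_statuses)
    completed = counts["completed"]
    pending = counts["pending"]
    # `total` is defined as the sum of the two record statuses.
    return {"total": completed + pending, "completed": completed, "pending": pending}


def user_metrics(user_response_statuses: Iterable[str], pending_without_user_response: int) -> Dict[str, Dict[str, int]]:
    """Build the metrics payload for one user.

    `pending_without_user_response` is the number of pending records in the
    dataset that have no response from this user.
    """
    counts = Counter(user_response_statuses)
    submitted = counts["submitted"]
    discarded = counts["discarded"]
    draft = counts["draft"]
    total = submitted + discarded + draft + pending_without_user_response
    return {
        "responses": {
            "total": total,
            "submitted": submitted,
            "discarded": discarded,
            "draft": draft,
            "pending": pending_without_user_response,
        }
    }


if __name__ == "__main__":
    # Reproduces the example payloads from the description.
    print(dataset_progress(["completed"] * 2 + ["pending"] * 3))
    print(user_metrics(["submitted", "discarded", "discarded", "draft"], 3))
```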
…notated datasets (#5171)

# Description

This PR changes the current validator used when updating the distribution task so that the distribution task settings can be updated for datasets whose records have no responses at all.

cc @nataliaElv

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)
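The rule described in this commit (the distribution settings may be updated only while no record in the dataset has a response) can be sketched as a small check. The function names and the per-record response counts used as input are assumptions for illustration; the real validator works against the database.

```python
from typing import Iterable


def can_update_distribution(response_counts_per_record: Iterable[int]) -> bool:
    """Return True when no record in the dataset has any response."""
    return all(count == 0 for count in response_counts_per_record)


def validate_distribution_update(response_counts_per_record: Iterable[int]) -> None:
    """Raise if the dataset already contains at least one response."""
    if not can_update_distribution(response_counts_per_record):
        raise ValueError("distribution settings cannot be updated for a dataset that has responses")


if __name__ == "__main__":
    validate_distribution_update([0, 0, 0])      # records exist, none answered: allowed
    print(can_update_distribution([0, 2, 0]))    # one record has responses: not allowed
```

Note that a dataset with records but no responses passes the check, which is the relaxation this commit introduces.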
…/argilla/add-record-status-property
Co-authored-by: José Francisco Calvo <[email protected]>
# Description

After investigating timeouts for PostgreSQL, I found that timeouts have no effect on the errors raised when a SERIALIZABLE transaction is rolled back because of a concurrent update. So the only way to support concurrent updates with PostgreSQL and SERIALIZABLE transactions is to capture the errors and retry the transaction.

This code has the following changes:

* Start using the `backoff` library to retry any of the CRUD context functions that update responses and record statuses, using SERIALIZABLE database sessions.
  * This change has the side effect of working with PostgreSQL and SQLite at the same time.
  * I have set a fixed maximum of 15 seconds for retrying with exponential backoff.
* I have moved search engine updates outside of the transaction block.
  * This should mitigate errors in high-concurrency scenarios for PostgreSQL and SQLite.
  * For SQLite we have the additional setting to set a timeout if necessary.
* I have changed the `DEFAULT_DATABASE_SQLITE_TIMEOUT` value to `5` seconds so the backoff logic will handle possible locked-database errors with SQLite.

Refs #5000

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

- [x] Manually testing with PostgreSQL and SQLite, running benchmarks using 20 concurrent requests.
- [x] Running test suite for PostgreSQL and SQLite.

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)
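The retry pattern described in this commit can be sketched with the standard library alone. This is a hand-rolled stand-in, not the `backoff` library the commit actually uses, and `SerializationError`, `retry_with_backoff` and the delay parameters are hypothetical; only the idea of retrying a failed transaction with exponential backoff for at most 15 seconds comes from the description.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class SerializationError(Exception):
    """Hypothetical error standing in for a rolled-back SERIALIZABLE transaction."""


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    retry_on: Tuple[Type[BaseException], ...] = (SerializationError,),
    max_time: float = 15.0,
    base_delay: float = 0.05,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Run `operation`, retrying on `retry_on` with jittered exponential backoff."""
    start = clock()
    attempt = 0
    while True:
        try:
            # The whole transaction must be re-run on each attempt.
            return operation()
        except retry_on:
            # Full jitter: wait a random time up to the exponential bound.
            delay = random.uniform(0, base_delay * factor**attempt)
            if clock() - start + delay > max_time:
                raise  # Time budget exhausted: surface the last error.
            sleep(delay)
            attempt += 1


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_update() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise SerializationError("concurrent update")
        return "committed"

    print(retry_with_backoff(flaky_update), "after", calls["n"], "attempts")
```

Side effects that cannot be rolled back, such as search engine updates, should stay outside `operation`, which matches the commit's decision to move them out of the transaction block.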
…/argilla/add-support-to-distribution
- [x] Update progress bar styles
- [x] Show two decimals in the progress bar of the dataset list
- [x] Remove donut chart and replace with small cards
- [x] Replace my progress bar with team progress bar
- [x] Show submitted info when panel is collapsed

---------

Co-authored-by: Damián Pumar <[email protected]>
Co-authored-by: David Berenstein <[email protected]>
This reverts commit 9dba7ef.
# Description

This PR reviews and improves the ES mapping definition for responses by storing responses as a list of user responses. This change brings some improvements:

- It scales better as the number of annotators increases (the index mapping remains the same).
- It simplifies queries that do not filter by user.
- It supports computing distributions on response values (aggregations can be built on top of question response values).

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: Francisco Aranda <[email protected]>
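To illustrate why a list of user responses keeps the document shape stable, here is a small in-memory sketch. The field names (`user_id`, `status`, `values`) and both helper functions are assumptions made for this example; they are not the actual index mapping, and the counting is done in Python rather than with search engine aggregations.

```python
from collections import Counter
from typing import Any, Dict, List


def responses_as_list(responses_by_user: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert a per-user keyed structure into a flat list of user responses.

    With a keyed structure every new annotator adds a new field; with a list,
    the shape of each entry is the same no matter how many annotators exist.
    """
    return [
        {"user_id": user_id, **response}
        for user_id, response in sorted(responses_by_user.items())
    ]


def value_distribution(responses: List[Dict[str, Any]], question: str) -> Counter:
    """Count response values for one question across all users."""
    return Counter(
        response["values"][question]
        for response in responses
        if question in response.get("values", {})
    )


if __name__ == "__main__":
    keyed = {
        "user-a": {"status": "submitted", "values": {"label": "positive"}},
        "user-b": {"status": "submitted", "values": {"label": "negative"}},
        "user-c": {"status": "draft", "values": {"label": "positive"}},
    }
    flat = responses_as_list(keyed)
    print(value_distribution(flat, "label"))
```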
…/argilla/add-support-to-distribution
for more information, see https://pre-commit.ci
# Description

Closes #5229

**Type of change**

- Documentation update

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sara Han <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5187 +/- ##
===========================================
- Coverage 91.54% 90.41% -1.13%
===========================================
Files 135 137 +2
Lines 5865 5749 -116
===========================================
- Hits 5369 5198 -171
- Misses 496 551 +55
Flags with carried forward coverage won't be shown.
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-5187-ki24f765kq-no.a.run.app
Description
This PR adds support for configuring the task distribution strategy when creating or updating datasets.
Datasets can be created with a specific task distribution setup, and the task distribution of an existing dataset can be updated as long as the dataset has no user responses.
Closes #5033
Closes #5034
Refs: #5246
Type of change
How Has This Been Tested
Checklist