Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT / BUGFIX] argilla: manage responses for deleted users #5070

Closed

Conversation

frascuchon
Copy link
Member

Description

This PR adds support to SDK to work with responses from deleted users. In those cases, the user_id is None. The current SDK will set the DELETED_USER.id value, which can be used to identify responses from deleted users.

Notes:

Even if we're defining a DELETE_USER constant, the responses are using the user_id. So, not sure if this user constant makes more sense if we use user instances on responses instead of their IDs.

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • New feature (non-breaking change which adds functionality)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • Test A
  • Test B

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@frascuchon frascuchon requested review from jfcalvo and burtenshaw June 20, 2024 10:18
Copy link
Contributor

@burtenshaw burtenshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One high level question: how are we dealing wiht the situation where a record contains responses from multiple deleted users.

For example, when we try to export them here, they will be combined into one.

@frascuchon
Copy link
Member Author

One high level question: how are we dealing wiht the situation where a record contains responses from multiple deleted users.

For example, when we try to export them here, they will be combined into one.

Thanks @burtenshaw. Yes, this solution may not work for those cases. I'm thinking more about the export format and I think we should store data in a more lightweight, maybe we don't need to store responses with users if the SDK has mechanisms to assign responses to users in an easier way. This can also help to create "portables" dataset snapshots (moving between argilla servers)

@burtenshaw
Copy link
Contributor

burtenshaw commented Jun 21, 2024

One high level question: how are we dealing wiht the situation where a record contains responses from multiple deleted users.
For example, when we try to export them here, they will be combined into one.

Thanks @burtenshaw. Yes, this solution may not work for those cases. I'm thinking more about the export format and I think we should store data in a more lightweight, maybe we don't need to store responses with users if the SDK has mechanisms to assign responses to users in an easier way. This can also help to create "portables" dataset snapshots (moving between argilla servers)

Ok. It makes sense to move that to a new PR on export mapping, where we also deal with this.

How should we solve this finish PR? One thing I thought of was switching from a nested dictionaries to arrays so that we export:

{
    "responses": [ 0,1,1 ],
    "response_user_ids": [ "uuid","uuid","uuid"],
    "...":"..."
}

That way, in this situation the response_user_ids could be non-unique and response values are usable.

@frascuchon
Copy link
Member Author

After changes included in:

Maybe the simplest solution would be to delete responses on cascade until we decide the approach when deleting users. cc @burtenshaw @jfcalvo

@frascuchon frascuchon requested a review from burtenshaw June 21, 2024 14:57
@jfcalvo
Copy link
Member

jfcalvo commented Jun 21, 2024

Maybe the simplest solution would be to delete responses on cascade until we decide the approach when deleting users.

@frascuchon I agree with that.

# Pull Request Template
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

Since the `user_id` is a nullable column, the search engine must tackle
responses for those scenarios. This PR changes the search engine and
skips responses when the user_id is None.

This can be tackled in a better way in the future once we have a clearer
view of how to deal with deleted users.


Refs: #5070 

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
@frascuchon frascuchon added this to the v2.0.0 milestone Jun 25, 2024
@burtenshaw
Copy link
Contributor

burtenshaw commented Jun 25, 2024

Maybe the simplest solution would be to delete responses on cascade until we decide the approach when deleting users.

@frascuchon It's not ideal but I can see why it makes sense. What's the status of this PR?

@frascuchon
Copy link
Member Author

Finally, this PR won't apply. See #5126

@frascuchon frascuchon closed this Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] [SDK] [v2] StopIteration error when retrieving records with existing responses.
3 participants