Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT][REFACTOR] SDK: allow to remove settings #5584

Merged
merged 16 commits into from
Nov 6, 2024

Conversation

frascuchon
Copy link
Member

@frascuchon frascuchon commented Oct 8, 2024

Description

This PR allows users to change all the settings before creating the dataset, and metadata and vector, when the dataset is created.

I included the remove method instead of a new settings.add(override=True) since this change requires a lot of internal refactor with the current design.

An example of how to use could be to change inferred settings from hub before create the final dataset

settings = rg.Settings.from_hub("google/frames-benchmark")

settings.fields.remove("answer")

for field in settings.fields:
    if field.name.startswith("wiki"):
        settings.fields.remove(field)
        settings.metadata.add(rg.TermsMetadataProperty(field.name))

settings.questions.add(rg.TextQuestion(name="answer", title="Answer"))

dataset = rg.Dataset.from_hub("google/frames-benchmark", settings=settings)

Or adding new metadata or vector settings when the dataset is created:

dataset = client.dataset("my-dataset")

dataset.metadata.add([rg.TermsMetadata(name="split")])
dataset.update() # this line sends the change to the argilla server

Updated

You can remove a property by name:

settings.fields.remove("text")

or by property instance

for field in settings.fields:
   if field.name.startwith("wiki"):
      settings.fields.remove(field)

And you can override existing properties to change the property type, using the new settings.add method:

settings = Settings.from_hub(...)

# change some settings definitions before create the dataset
settings.add(rg.TermsMetadata("wikipedia_1")) # this will remove the existing wikipedia_1 property and will create a new terms metadata one

Type of change

  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@frascuchon frascuchon requested a review from burtenshaw October 8, 2024 08:37
@burtenshaw
Copy link
Contributor

Just so I understand, there is still a singular way to remove settings? And this is just an example of iteration:

for field in settings.fields:
    if field.name.startswith("wiki"):
        settings.fields.remove(field)
        settings.metadata.add(rg.TermsMetadataProperty(field.name))

for example, fields can be deleted by their name:

settings.fields.remove("wiki_field")

@burtenshaw
Copy link
Contributor

I included the remove method instead of a new settings.add(override=True) since this change requires a lot of internal refactor with the current design.

I get that this requires a refactor, but I think that with two steps, it makes the feature highly niche and expert.

Since we're working completely locally now, can we not add the settings.add(override=True) on top of this implementation, so that the add method round robins all attributes and tries to delete before adding the new attribute.

# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

This PR implements a method to override settings properties in the
settings resource:

```python
settings = rg.Settings.from_hub("google/frames-benchmark")

settings.add(rg.TextQuestion(name="answer", title="Answer"))

dataset = rg.Dataset.from_hub("google/frames-benchmark", settings=settings, name=f"frames-benchmark-{uuid4()}")
```

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@frascuchon
Copy link
Member Author

Just so I understand, there is still a singular way to remove settings? And this is just an example of iteration:

Yes, sorry. I didn't include it in the example. You can also remove the setting by name:

settings.fields.remove("text")

is a valid statement for this PR

@burtenshaw
Copy link
Contributor

@frascuchon Should we put all the add and remove options in the description and the get feedback on the API?

@burtenshaw
Copy link
Contributor

@frascuchon We haven't got feedback on this. I think we should move forward with the current implementation. Or ask again.

@frascuchon
Copy link
Member Author

@frascuchon We haven't got feedback on this. I think we should move forward with the current implementation. Or ask again.

I will add some related tests and documents. I vote to merge it.

@frascuchon
Copy link
Member Author

frascuchon commented Oct 17, 2024

@frascuchon We haven't got feedback on this. I think we should move forward with the current implementation. Or ask again.

I will add some related tests and documents. I vote to merge it.

Done @burtenshaw. Here are the comments:

  • tests (I changed the raised exception to SettingsError to align the rest of the code)
  • docs

@frascuchon frascuchon force-pushed the refactor/allow-remove-settings branch from a407b25 to c2de471 Compare October 18, 2024 11:16
@frascuchon
Copy link
Member Author

@burtenshaw take a look at the latest changes, please

@burtenshaw
Copy link
Contributor

@frascuchon This looks good to me.

sorry for the delay.

@frascuchon frascuchon merged commit af2033c into develop Nov 6, 2024
7 checks passed
@frascuchon frascuchon deleted the refactor/allow-remove-settings branch November 6, 2024 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants