Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: token classification tutorial #5183

Merged

Conversation

sdiazlor
Copy link
Contributor

@sdiazlor sdiazlor commented Jul 9, 2024

Pull Request Template

Closes #5173

Type of change

  • Documentation update

How Has This Been Tested

Checklist

  • I added relevant documentation

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sdiazlor
Copy link
Contributor Author

sdiazlor commented Jul 9, 2024

@davidberenstein1957 After querying a dataset, the best way to iterate over the retrieved records is using .to_list. I think it's useful to change it in the how-to guides instead of the current list(). WDYT?

Copy link

github-actions bot commented Jul 9, 2024

Docs for this PR have been deployed hidden from versioning: https://argilla-io.github.io/argilla/docs_update-admonitions-token-tutorial

@davidberenstein1957
Copy link
Member

davidberenstein1957 commented Jul 9, 2024

@sdiazlor I see there is something wrong with the doc workflow. I will check what is going wrong here. I think it might be the GH actions extension we are using because the message seems to have been printed correctly within the actions workflow.

@davidberenstein1957
Copy link
Member

@sdiazlor, thanks for handling this. I think it will offer a good basis for people to get started.

Some high level comments, will potentially leave some more in the notebook :)

  • not 100% sure what you mean by list vs to_list. Ideally, we should unify communication on how we intend the API to be used. In that case the Records.to_list would be the preferred way and we should try to use this in the rest of the guides too.
  • I think we should try to format cells in the notebook to adhere to line width etc, this can be done by right-clicking on a notebook and then selecting "format". Perhaps you can add ipynb files to the ruff pre.commit formatter?
  • I think we can remove "Task" from the naming of both the tutorials.

@sdiazlor
Copy link
Contributor Author

sdiazlor commented Jul 9, 2024

@davidberenstein1957 Thanks for your comments!

I meant that when doing this, it is difficult to iterate over the records and retrieve the needed data(fields, suggestions, responses, etc.).
filtered_records = list(dataset.records(status_filter))

Instead, as in the tutorials, it's more helpful to use.
filtered_records = dataset.records(status_filter).to_list(flatten=True)

But your answer solved the doubt, so I'll update it.

Thank you for the formatting reminder, ipynb files were not included.

@davidberenstein1957 davidberenstein1957 linked an issue Jul 9, 2024 that may be closed by this pull request
@sdiazlor
Copy link
Contributor Author

@davidberenstein1957 I applied the changes. However, it doesn't let me update the length too much, especially for the scripts due to pre-commit. We should update the length in pyproject.toml from 120 to the default 88.

@davidberenstein1957
Copy link
Member

davidberenstein1957 commented Jul 11, 2024

@sdiazlor, feel free to change it to line-width=88 in another PR :)

#5211

argilla/mkdocs.yml Outdated Show resolved Hide resolved
argilla/docs/tutorials/index.md Outdated Show resolved Hide resolved
@davidberenstein1957 davidberenstein1957 merged commit dd54b3e into main Jul 11, 2024
7 checks passed
@davidberenstein1957 davidberenstein1957 deleted the docs/5173-docs-token-classification-tutorial branch July 11, 2024 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOCS] Token classification tutorial
2 participants