diff --git a/argilla/docs/tutorials/image_classification.ipynb b/argilla/docs/tutorials/image_classification.ipynb
index 58dc0faf37..a48e66af99 100644
--- a/argilla/docs/tutorials/image_classification.ipynb
+++ b/argilla/docs/tutorials/image_classification.ipynb
@@ -211,7 +211,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the `ylecun/mnist` dataset from [the Hugging Face Hub](https://huggingface.co/datasets/ylecun/mnist). Specifically, we will use the `train` split and get `100` examples. \n",
+    "Even though we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the `ylecun/mnist` dataset from [the Hugging Face Hub](https://huggingface.co/datasets/ylecun/mnist). Specifically, we will use `100` examples. Because we are dealing with a potentially large image dataset, we will set `streaming=True` to avoid loading the entire dataset into memory and iterate over the data to lazily load it.\n",
     "\n",
     "!!! tip\n",
     "    When working with Hugging Face dataset you can set `Image(decode=False)` so that we can get [public image URLs](https://huggingface.co/docs/datasets/en/image_load#local-files), however, this depends on the dataset."
diff --git a/argilla/docs/tutorials/image_preference.ipynb b/argilla/docs/tutorials/image_preference.ipynb
index 0577861421..a814d7dcc0 100644
--- a/argilla/docs/tutorials/image_preference.ipynb
+++ b/argilla/docs/tutorials/image_preference.ipynb
@@ -219,14 +219,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Add records"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Even if we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the `openbmb/RLAIF-V-Dataset` dataset from [the Hugging Face Hub](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). Specifically, we will use the `train` split and get `100` examples. Because we are dealing with a large dataset, we will set `streaming=True` to avoid loading the entire dataset into memorym and iterate over the data to lazily load it.\n",
+    "## Add records\n",
+    "\n",
+    "Even though we have created the dataset, it still lacks the information to be annotated (you can check it in the UI). We will use the `openbmb/RLAIF-V-Dataset` dataset from [the Hugging Face Hub](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). Specifically, we will use `100` examples. Because we are dealing with a potentially large image dataset, we will set `streaming=True` to avoid loading the entire dataset into memory and iterate over the data to lazily load it.\n",
     "\n",
     "!!! tip\n",
     "    When working with Hugging Face dataset you can set `Image(decode=False)` so that we can get [public image URLs](https://huggingface.co/docs/datasets/en/image_load#local-files), however, this depends on the dataset."
@@ -370,7 +365,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Log into Argilla\n"
+    "### Log to Argilla\n"
    ]
   },
   {
@@ -387,12 +382,15 @@
    "outputs": [],
    "source": [
     "hf_dataset = hf_dataset.add_column(\"id\", range(len(hf_dataset)))\n",
-    "dataset.records.log(records=hf_dataset[:100], mapping={\n",
+    "dataset.records.log(records=hf_dataset, mapping={\n",
     "    \"image_data_uri\": \"image\",\n",
     "    \"idx\": \"id\",\n",
     "    \"question\": \"question\",\n",
     "    \"chosen\": \"chosen\",\n",
     "    \"rejected\": \"rejected\",\n",
+    "    \"task_type\": \"task_type\",\n",
+    "    \"question_vector\": \"question_vector\",\n",
+    "    \"origin_dataset\": \"origin_dataset\"\n",
     "})"
    ]
   },
@@ -400,7 +398,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Voilà! We have added the suggestions to the dataset, and they will appear in the UI marked with a ✨. "
+    "Voilà! We have also added the suggestions for the `chosen` and `rejected` pairs to the dataset, and they will appear in the UI marked with a ✨. "
    ]
   },
   {
@@ -425,301 +423,6 @@
     "    Check this [how-to guide](../how_to_guides/annotate.md) to know more about annotating in the UI."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Train your model"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "After the annotation, we will have a robust dataset to train the main model. In our case, we will fine-tune using transformers and the . However, you can select the one that best fits your requirements. So, let's start by retrieving the annotated records.\n",
-    "\n",
-    "!!! note\n",
-    "    Check this [how-to guide](../how_to_guides/query.md) to know more about filtering and querying in Argilla. Also, you can check the Hugging Face docs on [fine-tuning an image classification model](https://huggingface.co/docs/transformers/en/tasks/image_classification)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Formatting the data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dataset = client.datasets(\"image_classification_dataset\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "status_filter = rg.Query(filter=rg.Filter((\"response.status\", \"==\", \"submitted\")))\n",
-    "\n",
-    "submitted = dataset.records(status_filter).to_list(flatten=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We then need to convert our base64 images to a format that the model can understand so we will convert them to PIL images again."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 55,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def base64_to_pil(base64_string):\n",
-    "    image_data = re.sub('^data:image/.+;base64,', '', base64_string)\n",
-    "    image = Image.open(io.BytesIO(base64.b64decode(image_data)))\n",
-    "    return image"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now, let's apply that to the whole dataset."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "submitted_pil_image = [\n",
-    "    {\n",
-    "        \"id\": sample[\"id\"],\n",
-    "        \"image\": base64_to_pil(sample[\"image\"]),\n",
-    "        \"label\": sample[\"image_label.responses\"][0],\n",
-    "    }\n",
-    "    for sample in submitted\n",
-    "]\n",
-    "submitted_pil_image[0]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We now need to ensure our images are forwarded with the correct dimensions. Because the original MNIST dataset is greyscale and the VIT model expects RGB, we need to add a channel dimension to the images. We will do this by stacking the images along the channel axis."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def greyscale_to_rgb(img) -> Image:\n",
-    "    return Image.merge('RGB', (img, img, img))\n",
-    "\n",
-    "submitted_pil_image_rgb = [\n",
-    "    {\n",
-    "        \"image\": greyscale_to_rgb(sample[\"image\"]),\n",
-    "        \"label\": sample[\"label\"],\n",
-    "    }\n",
-    "    for sample in submitted_pil_image\n",
-    "]\n",
-    "submitted_pil_image_rgb[0]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Next, we will load the `ImageProcessor` for fine-tuning the model. This processor will handle the image resizing and normalization in order to be compatible with the model we intend to use."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "checkpoint = \"google/vit-base-patch16-224-in21k\"\n",
-    "processor = AutoImageProcessor.from_pretrained(checkpoint)\n",
-    "\n",
-    "submitted_pil_image_rgb_processed = [\n",
-    "    {\n",
-    "        \"pixel_values\": processor(sample[\"image\"], return_tensors='pt')[\"pixel_values\"],\n",
-    "        \"label\": sample[\"label\"],\n",
-    "    }\n",
-    "    for sample in submitted_pil_image_rgb\n",
-    "]\n",
-    "submitted_pil_image_rgb_processed[0]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We can now convert the images to a Hugging Face datasets Dataset that is ready for fine-tuning."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "prepared_ds = Dataset.from_list(submitted_pil_image_rgb_processed)\n",
-    "prepared_ds = prepared_ds.train_test_split(test_size=0.2)\n",
-    "prepared_ds"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### The actual training"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We then need to define our data collator, which will ensure the data is unpacked and stacked correctly for the model. We wi"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def collate_fn(batch):\n",
-    "    return {\n",
-    "        'pixel_values': torch.stack([torch.tensor(x['pixel_values'][0]) for x in batch]),\n",
-    "        'labels': torch.tensor([int(x['label']) for x in batch])\n",
-    "    }"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Next, we can define our training metrics. We will use the accuracy metric to evaluate the model's performance."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "metric = load_metric(\"accuracy\")\n",
-    "def compute_metrics(p):\n",
-    "    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We then load our model and configure the labels that we will use for training."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "model = AutoModelForImageClassification.from_pretrained(\n",
-    "    checkpoint,\n",
-    "    num_labels=len(labels),\n",
-    "    id2label={int(i): int(c) for i, c in enumerate(labels)},\n",
-    "    label2id={int(c): int(i) for i, c in enumerate(labels)}\n",
-    ")\n",
-    "model.config"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Finally, we define the training arguments and start the training process."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "training_args = TrainingArguments(\n",
-    "    output_dir=\"./image-classifier\",\n",
-    "    per_device_train_batch_size=16,\n",
-    "    evaluation_strategy=\"steps\",\n",
-    "    num_train_epochs=1,\n",
-    "    fp16=False, # True if you have a GPU with mixed precision support\n",
-    "    save_steps=100,\n",
-    "    eval_steps=100,\n",
-    "    logging_steps=10,\n",
-    "    learning_rate=2e-4,\n",
-    "    save_total_limit=2,\n",
-    "    remove_unused_columns=True,\n",
-    "    push_to_hub=False,\n",
-    "    load_best_model_at_end=True,\n",
-    ")\n",
-    "\n",
-    "trainer = Trainer(\n",
-    "    model=model,\n",
-    "    args=training_args,\n",
-    "    data_collator=collate_fn,\n",
-    "    compute_metrics=compute_metrics,\n",
-    "    train_dataset=prepared_ds[\"train\"],\n",
-    "    eval_dataset=prepared_ds[\"test\"],\n",
-    "    tokenizer=processor,\n",
-    ")\n",
-    "\n",
-    "train_results = trainer.train()\n",
-    "trainer.save_model()\n",
-    "trainer.log_metrics(\"train\", train_results.metrics)\n",
-    "trainer.save_metrics(\"train\", train_results.metrics)\n",
-    "trainer.save_state()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As the training data had a better-quality, we can expect a better model. So, we can update the remainder of our original dataset with the new model's suggestions."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pipe = pipeline(\"image-classification\", model=model, image_processor=processor)\n",
-    "\n",
-    "def run_inference(batch):\n",
-    "    predictions = pipe(batch[\"image\"])\n",
-    "    batch[\"image_label\"] = [prediction[0][\"label\"] for prediction in predictions]\n",
-    "    batch[\"image_label.score\"] = [prediction[0][\"score\"] for prediction in predictions]\n",
-    "    return batch\n",
-    "\n",
-    "hf_dataset = hf_dataset.map(run_inference, batched=True)\n",
-    "dataset.records.log(records=hf_dataset[:100], mapping={\"image_data_uri\": \"image\"})"
-   ]
-  },
  {
   "cell_type": "markdown",
   "metadata": {},