Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenAI as a Provider for Descriptive Text Generation #828

Merged
merged 14 commits into from
Dec 13, 2024

Conversation

dkotter
Copy link
Collaborator

@dkotter dkotter commented Nov 21, 2024

Description of the Change

In #785, we updated to GPT-4o mini in our OpenAI ChatGPT Provider. This model is multi-modal, which means you can do things with images, video, or audio, not just text.

So far we haven't take advantage of that but this PR brings OpenAI as a Provider for the Descriptive Text Generator Feature. Currently this Feature only runs on the Azure AI Vision Provider, so this brings a second option for that Feature.

Making requests to this model is the same as all of our text generation requests, other than we send the image URL in that request. We have a default prompt that is used and that can be modified from the settings screen, as needed. I tried to keep this prompt fairly generic but open to suggestions on improvements there. It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:

You are an assistant that generates descriptions of images that are used on a website. You will be provided with an image and will describe the main item you see in the image, giving details but staying concise. There is no need to say "the image contains" or similar, just describe what is actually in the image. This text will be important for screen readers, so make sure it is descriptive and accurate but not overly verbose

OpenAI requires images to be at least 512x512, so we return an error message if any image below that threshold is used. It also supports passing in the full image URL or a base64 encoded version of the image. For now I've used the image URL but we could look to go the encoded route, which would make things work in environments where images are publicly accessible (like locally). The downside here is it's slower and more expensive, as it uses more tokens.

Closes #826

Descriptive Text Generator settings screen

How to test the Change

  1. Go to Tools > ClassifAI > Image Processing > Descriptive Text Generator
  2. Select OpenAI as your Provider and add proper credentials
  3. Ensure at least one Descriptive text fields is turned on
  4. Go to your Media Library and choose an image without alt text and run the descriptive text scan. Ensure the text is saved properly
  5. Try this from the single attachment page, using the metabox and ensure this works
  6. Upload a new image and ensure alt text is added during that process
  7. Test other methods as desired, like bulk processing on the Media Library list view or the WP-CLI command
  8. Can also test adding custom prompts to ensure they work

Changelog Entry

Added - Add OpenAI ChatGPT as a Provider for the Descriptive Text Generator Feature.

Credits

Props @dkotter, @jeffpaul

Checklist:

@dkotter dkotter added this to the 3.2.0 milestone Nov 21, 2024
@dkotter dkotter self-assigned this Nov 21, 2024
@dkotter dkotter requested review from jeffpaul and a team as code owners November 21, 2024 22:46
@github-actions github-actions bot added the needs:code-review This requires code review. label Nov 21, 2024
@jeffpaul jeffpaul requested review from a team and faisal-alvi and removed request for a team and jeffpaul November 26, 2024 15:28
iamdharmesh
iamdharmesh previously approved these changes Dec 12, 2024
Copy link
Member

@iamdharmesh iamdharmesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @dkotter. code looks good and it tests well.

It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:

Yes, I noticed the same. Since users can directly modify the prompt from the settings and each site caters to a different niche, the best results can be achieved by customizing the prompt as per their requirements, so this is fine.

However, if we want to provide some initial help, we could add separate sample prompts for each (alt text, caption, and description) somewhere in our documentation and include a link in the settings. This way, users who need only one of these can directly copy it into the custom prompt and start using it. What do you think?

@github-actions github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Dec 12, 2024
@github-actions github-actions bot removed the needs:refresh This requires a refreshed PR to resolve. label Dec 12, 2024
@dkotter
Copy link
Collaborator Author

dkotter commented Dec 12, 2024

However, if we want to provide some initial help, we could add separate sample prompts for each (alt text, caption, and description) somewhere in our documentation and include a link in the settings. This way, users who need only one of these can directly copy it into the custom prompt and start using it. What do you think?

@iamdharmesh Good idea. I've added a new page to our documentation where we can add prompt examples and I've added three prompts for this Feature:

  1. Generate just alt text
  2. Generate just image captions
  3. Generate just image descriptions

I then link to this documentation beneath the custom prompt settings.

Not needed here but probably worth a followup to do this same thing anywhere we add prompts, adding new examples to our docs and linking to those.

@dkotter dkotter requested a review from iamdharmesh December 12, 2024 18:04
@iamdharmesh iamdharmesh merged commit 8ebd7d6 into develop Dec 13, 2024
18 checks passed
@iamdharmesh iamdharmesh deleted the feature/826 branch December 13, 2024 05:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:code-review This requires code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenAI provider for Image Processing > Descriptive Text Generator (aka image alt text)
2 participants