-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenAI as a Provider for Descriptive Text Generation #828
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @dkotter. code looks good and it tests well.
It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:
Yes, I noticed the same. Since users can directly modify the prompt from the settings and each site caters to a different niche, the best results can be achieved by customizing the prompt as per their requirements, so this is fine.
However, if we want to provide some initial help, we could add separate sample prompts for each (alt text, caption, and description) somewhere in our documentation and include a link in the settings. This way, users who need only one of these can directly copy it into the custom prompt and start using it. What do you think?
@iamdharmesh Good idea. I've added a new page to our documentation where we can add prompt examples and I've added three prompts for this Feature:
I then link to this documentation beneath the custom prompt settings. Not needed here but probably worth a followup to do this same thing anywhere we add prompts, adding new examples to our docs and linking to those. |
Description of the Change
In #785, we updated to GPT-4o mini in our OpenAI ChatGPT Provider. This model is multi-modal, which means you can do things with images, video, or audio, not just text.
So far we haven't take advantage of that but this PR brings OpenAI as a Provider for the Descriptive Text Generator Feature. Currently this Feature only runs on the Azure AI Vision Provider, so this brings a second option for that Feature.
Making requests to this model is the same as all of our text generation requests, other than we send the image URL in that request. We have a default prompt that is used and that can be modified from the settings screen, as needed. I tried to keep this prompt fairly generic but open to suggestions on improvements there. It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:
OpenAI requires images to be at least 512x512, so we return an error message if any image below that threshold is used. It also supports passing in the full image URL or a base64 encoded version of the image. For now I've used the image URL but we could look to go the encoded route, which would make things work in environments where images are publicly accessible (like locally). The downside here is it's slower and more expensive, as it uses more tokens.
Closes #826
How to test the Change
Tools > ClassifAI > Image Processing > Descriptive Text Generator
Descriptive text fields
is turned onChangelog Entry
Credits
Props @dkotter, @jeffpaul
Checklist: