Sure, I think this can be achieved with both Grounding DINO and Florence-2, and we will consider supporting it with Florence-2 in a future release. Note that we only ground phrases in one specific frame and then track the objects in the following frames; if you need a more complex application, you may need to modify the code yourself.
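As a rough illustration of passing a sentence rather than a single category word: Grounding DINO text queries are conventionally lowercased and end with a period, so a small helper like the one below could normalize a phrase or sentence before it is handed to the model. This is only a sketch; `build_text_prompt` is a hypothetical name and not part of this repository's API.

```python
def build_text_prompt(phrases):
    """Join phrases into one lowercase prompt, each phrase ending with a period.

    Hypothetical helper for illustration; it is not a function from this repo.
    """
    cleaned = []
    for phrase in phrases:
        # Lowercase, collapse repeated whitespace, drop trailing periods/spaces.
        text = " ".join(phrase.lower().split()).rstrip(". ")
        if text:
            cleaned.append(text + ".")
    return " ".join(cleaned)


if __name__ == "__main__":
    # A full descriptive phrase and a single category word in one prompt.
    print(build_text_prompt(["The man in a red jacket", "Dog"]))
```

The resulting string would then be used as the text prompt for the single frame where grounding happens; the tracking in later frames does not use the text.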
For custom video input, does the text prompt have to be a single word (i.e. a word representing a certain category), or can it be a sentence?