[Feature request] Embedding search for channel #21

Open
7 tasks
niracler opened this issue Jan 25, 2024 · 0 comments

niracler commented Jan 25, 2024

Description

Use embeddings to search the channel's messages.

TODO

  • Nyaruko CLI:
    • Export the message history from the channel
    • Generate embeddings automatically with the OpenAI API (LangChain or LlamaIndex may be a better fit)
    • Insert vectors and message metadata into Vectorize and R2
    • Update vectors automatically
  • /search command: let the bot search the channel database
  • Use trafilatura to parse the URLs found in channel messages
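
The last item could look roughly like the sketch below. It is only an illustration: `extract_urls`, `fetch_page_text`, and the URL regex are hypothetical names and choices, not part of the bot. The only trafilatura calls assumed are `fetch_url` and `extract`.

```python
import re
from typing import List, Optional

# Deliberately simple pattern (an assumption): a URL runs until whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message: str) -> List[str]:
    """Return the http(s) URLs in a message, minus trailing ASCII punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(message)]


def fetch_page_text(url: str) -> Optional[str]:
    """Download a page and return its main text, or None if nothing was extracted."""
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # None when the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)  # None when no main content is found


if __name__ == "__main__":
    sample = "动森直面会中文视频\nhttps://www.youtube.com/watch?v=rI_jWfNd2dc"
    print(extract_urls(sample))
```

The extracted page text could then be embedded alongside the message itself, so that a search also matches the content behind a link.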

Example

$ python embedding_search.py 动物森友会
+------+---------------------+--------------------------------------------------------------+----------------+
|      | date                | text                                                         |   similarities |
|------+---------------------+--------------------------------------------------------------+----------------|
| 1041 | 2021-04-03 06:18:40 | 尼尔 森友会                                                  |       0.843896 |
|  836 | 2021-10-16 02:37:16 | 动森直面会中文视频                                           |       0.826405 |
|      |                     | https://www.youtube.com/watch?v=rI_jWfNd2dc                  |                |
| 1208 | 2019-11-10 00:05:56 | 云养动物好像很有趣啊                                         |       0.822377 |
|  489 | 2023-06-16 09:33:15 | 看着小猫的生活就想起西西佛神话里面的西西佛                   |       0.802677 |
|  369 | 2023-08-16 02:15:54 | 家猫会不会无聊寂寞                                           |       0.797062 |
|   13 | 2023-12-14 13:17:59 | 参加了🤗                                                     |       0.796492 |
| 1177 | 2020-02-12 10:27:45 | 国人吃野生动物这种事情之所以屡屡不绝,和头脑中根深蒂固的中医 |       0.796363 |
|      |                     | 观念亦有直接关系。养生、食疗、进补、药膳、以形补形、补气血…… |                |
|      |                     | 伪科学死灰复燃,现在不加以遏制,以后也还会发生类似的事情。   |                |
|      |                     | 科学才是唯一的道路。                                         |                |
|  801 | 2021-11-07 13:46:21 | 想不到,今年的年度游戏竟然还是动森跟火纹。                   |       0.796246 |
|      |                     | 动森是之前没玩够,火纹是因为某个最大最恶事件导致我要重玩的。 |                |
|  837 | 2021-10-16 02:37:16 | 不会吧,难道我的年度游戏又要变成动森了吗?                   |       0.795871 |
|  423 | 2023-07-29 14:11:22 | 鬼畜到能称之为精神污染的头像了~~                           |       0.794144 |
+------+---------------------+--------------------------------------------------------------+----------------+
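
A ranking like the one above could be produced with something like the following sketch. It assumes the `similarities` column is cosine similarity between the query embedding and each message embedding, which the issue does not state. The names `cosine_similarity` and `search` and the toy 2-dimensional vectors are hypothetical; real vectors would come from the embedding API.

```python
import math
from typing import List, Sequence, Tuple

# One stored message: (date, text, embedding vector).
Row = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Sequence[float], rows: List[Row], top_k: int = 10) -> List[Tuple[float, str, str]]:
    """Return the top_k (similarity, date, text) tuples, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), date, text) for date, text, vec in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors for illustration only.
    rows = [
        ("2021-10-16 02:37:16", "message A", [0.9, 0.1]),
        ("2023-07-29 14:11:22", "message B", [0.1, 0.9]),
    ]
    for score, date, text in search([1.0, 0.0], rows, top_k=2):
        print(f"{score:.6f}  {date}  {text}")
```

A brute-force scan like this is fine for a few thousand messages; with Vectorize, the nearest-neighbour lookup would happen in the index instead.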


@niracler niracler added the enhancement New feature or request label Jan 25, 2024
@niracler niracler self-assigned this Jan 25, 2024
@niracler niracler changed the title [Feature request] Embedding search [Feature request] Embedding search for channel Jan 29, 2024