
Adding support for automatic summarization #17

Open
gabriele-tomassetti opened this issue Mar 21, 2020 · 5 comments

Comments

@gabriele-tomassetti
Member

The library can extract any manual excerpt contained in the article (i.e., the short summary that is usually shown on Facebook or Twitter). However, it can also be useful to generate an automatic summary for long articles.
The issue is that there does not seem to be anything both effective and light on resources to do that, so the end result may vary in quality.

@Mochitto

An LLM could likely help with this.
A simple solution would be to let users configure the reader with their own API key and contact the OpenAI API to get summaries.
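The suggestion above could be sketched roughly like this. Note this is only an illustration in Python (SmartReader itself is a .NET library), and the model name and prompt are assumptions; the function only builds the HTTP request pieces, leaving the actual call and error handling to the caller:

```python
import json

# OpenAI's chat completions endpoint (current at the time of writing).
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_summary_request(article_text, api_key, model="gpt-4o-mini"):
    """Build URL, headers, and JSON body for a summarization call.

    The caller decides when (and whether) to actually send the request,
    which keeps the API key usage fully opt-in.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the article in two or three sentences."},
            {"role": "user", "content": article_text},
        ],
    }
    return OPENAI_URL, headers, json.dumps(payload)
```

Keeping the request construction separate from the network call also makes it easy to swap in a different provider later.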

@gabriele-tomassetti
Member Author

We should probably implement a basic interface to let people choose how to obtain the summary, as we do for converting to plain text. That way, users can choose to use an LLM if they want.
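The kind of hook described here might look something like the following. This is a hypothetical Python sketch, not SmartReader's actual API (the real library is C#); the `Article` class and `summarizer` parameter names are invented for illustration:

```python
from typing import Callable, Optional

# A summarizer is any function from article text to a summary string.
Summarizer = Callable[[str], str]

class Article:
    """Illustrative article object with a pluggable summarization hook."""

    def __init__(self, text: str, excerpt: Optional[str] = None,
                 summarizer: Optional[Summarizer] = None):
        self.text = text
        self.excerpt = excerpt
        self._summarizer = summarizer

    def summary(self) -> str:
        # Prefer the author-provided excerpt; fall back to the user's hook.
        if self.excerpt:
            return self.excerpt
        if self._summarizer:
            return self._summarizer(self.text)
        return ""
```

A user could then plug in an LLM call, an extractive algorithm, or anything else, while the library itself stays free of any summarization dependency.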

Just like the original library, this one was designed for people who want a light and privacy-oriented solution to get an article free of clutter, so I do not think an LLM would be a good fit for direct integration into this library. To be fair, I never found a good way to do this algorithmically, which is why we should give users a simple way to do what they want.

@Mochitto

Mochitto commented Jul 17, 2024

A fair concern, and I totally agree.
I was also thinking of it as an opt-in feature at the discretion of users (since they would be using their own API keys).

An extraction-based algorithm could be added that picks out the important parts of the text after ranking the sentences, but the quality would probably vary a lot and the complexity could spiral out of control, especially with i18n in mind.
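For a sense of what such an extraction-based approach involves, here is a minimal frequency-scoring sketch in Python (not part of SmartReader). The naive regex-based sentence and word splitting is exactly where the i18n concerns mentioned above would bite:

```python
import re
from collections import Counter

def extractive_summary(text: str, n: int = 2) -> str:
    """Return the n highest-scoring sentences, in their original order.

    Each sentence is scored by the average corpus frequency of its words,
    so sentences full of common (presumably topical) words rank higher.
    English-only splitting: real i18n needs proper sentence segmentation.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sorted(sentences, key=score, reverse=True)[:n],
                 key=sentences.index)
    return " ".join(top)
```

Even this toy version shows the trade-off: it is light on resources, but the ranking heuristic and the tokenization are both language-dependent.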

Instead of creating summaries, maybe some text analysis visualization tools could help readers skim the text more quickly (even a simple highlight of the longest sentences, or of field-specific terms based on frequency scores).
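The simplest variant of that idea, highlighting the longest sentences, could be sketched like this (again an illustrative Python snippet, not library code; the `**` marker is just a placeholder for whatever styling the reader applies):

```python
import re

def highlight_longest(text: str, n: int = 1, marker: str = "**") -> str:
    """Wrap the n longest sentences (by word count) in a marker.

    A crude skimming aid: long sentences often carry the densest content,
    and this needs no language model at all.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    longest = set(sorted(sentences, key=lambda s: len(s.split()),
                         reverse=True)[:n])
    return " ".join(f"{marker}{s}{marker}" if s in longest else s
                    for s in sentences)
```

Unlike a generated summary, this keeps every word the author wrote, which fits the clutter-free, privacy-oriented goal of the library.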

Abstractive summarization, with privacy in mind, could be implemented in a few years, if local LLMs become practical.

@PeterHagen

I use SmartReader in combination with ReverseMarkdown (to convert the HTML to Markdown) and pass the text to a locally running LLM with Ollama. You can choose which LLM to use; for me, Gemma2 works quite well. I do want my responses to be in Dutch.
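The local pipeline described here could be sketched as follows. This is an illustrative Python fragment (the actual setup uses SmartReader and ReverseMarkdown in .NET); it only builds a request against Ollama's default local REST endpoint, and the prompt wording is an assumption:

```python
import json

# Ollama's default local endpoint for non-chat generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(markdown_text: str, model: str = "gemma2",
                         language: str = "Dutch"):
    """Build URL and JSON body asking a local model to summarize.

    Since the model runs locally via Ollama, no article text ever
    leaves the machine, which preserves the privacy goal.
    """
    payload = {
        "model": model,
        "prompt": (f"Summarize the following article in {language}:\n\n"
                   f"{markdown_text}"),
        "stream": False,  # one JSON response instead of a token stream
    }
    return OLLAMA_URL, json.dumps(payload)
```

Sending the body with any HTTP client and reading the `response` field of the returned JSON would complete the pipeline.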

If you want to be able to support different languages, I would suggest using an LLM instead of creating an algorithm yourself.

@gabriele-tomassetti
Member Author

If you want to be able to support different languages, I would suggest using an LLM instead of creating an algorithm yourself.

This is probably the best solution. Since I never found a good way to do this algorithmically, it is better to just give users a hook so they can do what they want.
