
Adding support for automatic summarization #17

Open
gabriele-tomassetti opened this issue Mar 21, 2020 · 5 comments

Comments

@gabriele-tomassetti
Member

The library can extract any manual excerpt contained in the article (i.e., the short summary that is usually shown on Facebook or Twitter). However, it can also be useful to generate an automatic summary for long articles.
The issue is that there does not seem to be anything both effective and light on resources to do that, so the end result may vary in quality.

@Mochitto

An LLM could likely help with this.
A simple solution would be to let users configure the reader with their own API key and contact the OpenAI API to get summaries.
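The suggestion above could be sketched roughly like this. Note this is only an illustration in Python (SmartReader itself is a .NET library), and the model name and prompt are assumptions; the function only builds the HTTP request pieces, leaving the actual call and error handling to the caller:

```python
import json

# OpenAI's chat completions endpoint (current at the time of writing).
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_summary_request(article_text, api_key, model="gpt-4o-mini"):
    """Build URL, headers, and JSON body for a summarization call.

    The caller decides when (and whether) to actually send the request,
    which keeps the API key usage fully opt-in.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the article in two or three sentences."},
            {"role": "user", "content": article_text},
        ],
    }
    return OPENAI_URL, headers, json.dumps(payload)
```

Keeping the request construction separate from the network call also makes it easy to swap in a different provider later.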

@gabriele-tomassetti
Member Author

We should probably implement a basic interface to let people choose how to obtain the summary, as we do for converting to plain text. That way, users can choose to use an LLM if they want.
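The kind of hook described here might look something like the following. This is a hypothetical Python sketch, not SmartReader's actual API (the real library is C#); the `Article` class and `summarizer` parameter names are invented for illustration:

```python
from typing import Callable, Optional

# A summarizer is any function from article text to a summary string.
Summarizer = Callable[[str], str]

class Article:
    """Illustrative article object with a pluggable summarization hook."""

    def __init__(self, text: str, excerpt: Optional[str] = None,
                 summarizer: Optional[Summarizer] = None):
        self.text = text
        self.excerpt = excerpt
        self._summarizer = summarizer

    def summary(self) -> str:
        # Prefer the author-provided excerpt; fall back to the user's hook.
        if self.excerpt:
            return self.excerpt
        if self._summarizer:
            return self._summarizer(self.text)
        return ""
```

A user could then plug in an LLM call, an extractive algorithm, or anything else, while the library itself stays free of any summarization dependency.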

Just like the original library, this one was designed for people who want a light and privacy-oriented solution to get an article free of clutter, so I do not think an LLM would be a good fit for direct integration into this library. To be fair, I never found a good way to do this algorithmically, which is why we should give users a simple way to do what they want.

@Mochitto

Mochitto commented Jul 17, 2024

A fair concern, and I totally agree.
I was also thinking of it as an opt-in feature at the discretion of users (since they would be using their own API keys).

An extraction-based algorithm could be added that picks out the important parts of the text after ranking the sentences, but the quality would probably vary a lot and the complexity could spiral out of control, especially with i18n in mind.
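For a sense of what such an extraction-based approach involves, here is a minimal frequency-scoring sketch in Python (not part of SmartReader). The naive regex-based sentence and word splitting is exactly where the i18n concerns mentioned above would bite:

```python
import re
from collections import Counter

def extractive_summary(text: str, n: int = 2) -> str:
    """Return the n highest-scoring sentences, in their original order.

    Each sentence is scored by the average corpus frequency of its words,
    so sentences full of common (presumably topical) words rank higher.
    English-only splitting: real i18n needs proper sentence segmentation.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sorted(sentences, key=score, reverse=True)[:n],
                 key=sentences.index)
    return " ".join(top)
```

Even this toy version shows the trade-off: it is light on resources, but the ranking heuristic and the tokenization are both language-dependent.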

Instead of creating summaries, maybe some text analysis visualization tools could help readers skim the text more quickly (even a simple highlight of the longest sentences, or of field-specific terms based on frequency scores).
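The simplest variant of that idea, highlighting the longest sentences, could be sketched like this (again an illustrative Python snippet, not library code; the `**` marker is just a placeholder for whatever styling the reader applies):

```python
import re

def highlight_longest(text: str, n: int = 1, marker: str = "**") -> str:
    """Wrap the n longest sentences (by word count) in a marker.

    A crude skimming aid: long sentences often carry the densest content,
    and this needs no language model at all.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    longest = set(sorted(sentences, key=lambda s: len(s.split()),
                         reverse=True)[:n])
    return " ".join(f"{marker}{s}{marker}" if s in longest else s
                    for s in sentences)
```

Unlike a generated summary, this keeps every word the author wrote, which fits the clutter-free, privacy-oriented goal of the library.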

Abstractive summarization, with privacy in mind, could be implemented in a few years, if local LLMs become practical.

@PeterHagen

I use SmartReader in combination with ReverseMarkdown (to convert the HTML to Markdown) and pass the text to a locally running LLM with Ollama. You can choose which LLM to use; for me, Gemma2 works quite well. I do want my responses to be in Dutch.
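The local pipeline described here could be sketched as follows. This is an illustrative Python fragment (the actual setup uses SmartReader and ReverseMarkdown in .NET); it only builds a request against Ollama's default local REST endpoint, and the prompt wording is an assumption:

```python
import json

# Ollama's default local endpoint for non-chat generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(markdown_text: str, model: str = "gemma2",
                         language: str = "Dutch"):
    """Build URL and JSON body asking a local model to summarize.

    Since the model runs locally via Ollama, no article text ever
    leaves the machine, which preserves the privacy goal.
    """
    payload = {
        "model": model,
        "prompt": (f"Summarize the following article in {language}:\n\n"
                   f"{markdown_text}"),
        "stream": False,  # one JSON response instead of a token stream
    }
    return OLLAMA_URL, json.dumps(payload)
```

Sending the body with any HTTP client and reading the `response` field of the returned JSON would complete the pipeline.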

If you want to be able to support different languages, I would suggest using an LLM instead of creating an algorithm yourself.

@gabriele-tomassetti
Member Author

If you want to be able to support different languages, I would suggest using an LLM instead of creating an algorithm yourself.

This is probably the best solution. Since I never found a good way to do this algorithmically, it is better to just give users a hook so they can do what they want.
