How can the summary sentences be returned in their original order instead of being sorted by score?
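The original sentence order was not considered or stored in the initial design, so changing this behavior requires modifying the source code. I have added it.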
Because list input is supported, you can add a post-processing step.
Code:
    from nlg_yongzhuo.data_preprocess.text_preprocess import cut_sentence
    from nlg_yongzhuo import mmr

    docs = "和投票目标的等级来决定新的等级.简单的说。" \
           "是上世纪90年代末提出的一种计算网页权重的算法! " \
           "当时,互联网技术突飞猛进,各种网页网站爆炸式增长。" \
           "业界急需一种相对比较准确的网页重要性计算方法。" \
           "是人们能够从海量互联网世界中找出自己需要的信息。" \
           "百度百科如是介绍他的思想:PageRank通过网络浩瀚的超链接关系来确定一个页面的等级。" \
           "Google把从A页面到B页面的链接解释为A页面给B页面投票。" \
           "Google根据投票来源甚至来源的来源,即链接到A页面的页面。" \
           "一个高等级的页面可以使其他低等级页面的等级提升。" \
           "具体说来就是,PageRank有两个基本思想,也可以说是假设。" \
           "即数量假设:一个网页被越多的其他页面链接,就越重)。" \
           "质量假设:一个网页越是被高质量的网页链接,就越重要。" \
           "总的来说就是一句话,从全局角度考虑,获取重要的信。"

    # Split the text into a list of sentences.
    docs = cut_sentence(docs)
    # Map each sentence to its position in the original text.
    docs_dict = {doc: idx for idx, doc in enumerate(docs)}
    # Each summary item is a (score, sentence) tuple.
    sums_mmr = mmr.summarize(docs, num=6)
    # Prepend the original index: (index, score, sentence).
    sums_mmr = [(docs_dict.get(sm[-1], 0), ) + sm for sm in sums_mmr]
    for sm in sums_mmr:
        print(sm)

    """
    (1, 2.578644298198872, '是上世纪90年代末提出的一种计算网页权重的算法')
    (3, 2.436700360083381, '业界急需一种相对比较准确的网页重要性计算方法')
    (6, 2.3708550245965707, 'PageRank通过网络浩瀚的超链接关系来确定一个页面的等级')
    (15, 2.297897240552656, '总的来说就是一句话,从全局角度考虑,获取重要的信')
    (2, 2.289880627933277, ' 当时,互联网技术突飞猛进,各种网页网站爆炸式增长')
    (4, 2.139471851912774, '是人们能够从海量互联网世界中找出自己需要的信息')
    """
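The post-processing above attaches each sentence's original index, but the printed result is still in score order. A minimal sketch of the remaining step, sorting on that index, might look like the following. It uses the tuples from the output above as sample data so that it runs without `nlg_yongzhuo` installed; the helper name `restore_order` is hypothetical and not part of the library.

```python
# Sample data copied from the printed output: (original_index, score, sentence).
sums_mmr = [
    (1, 2.578644298198872, '是上世纪90年代末提出的一种计算网页权重的算法'),
    (3, 2.436700360083381, '业界急需一种相对比较准确的网页重要性计算方法'),
    (6, 2.3708550245965707, 'PageRank通过网络浩瀚的超链接关系来确定一个页面的等级'),
    (15, 2.297897240552656, '总的来说就是一句话,从全局角度考虑,获取重要的信'),
    (2, 2.289880627933277, ' 当时,互联网技术突飞猛进,各种网页网站爆炸式增长'),
    (4, 2.139471851912774, '是人们能够从海量互联网世界中找出自己需要的信息'),
]


def restore_order(indexed_summary):
    """Hypothetical helper: sort (index, score, sentence) tuples by original index."""
    return sorted(indexed_summary, key=lambda item: item[0])


ordered = restore_order(sums_mmr)
for idx, score, sentence in ordered:
    print(idx, sentence.strip())
```

One caveat with the dictionary lookup in the original snippet: if the same sentence occurs more than once in the text, the dictionary keeps only the last index, and a sentence that is not found falls back to index 0, so the restored order is only approximate in those cases.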