Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handles broken RSS feeds that may not include an <rss> declaration. #179

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

kopertop
Copy link

An example feed is: http://stocknewsnow.com/feed/. The feed is a semi-valid
RSS feed, it's just missing the global declaration.

@danmactough
Copy link
Owner

Wow. I'm glad you were able figure out what was broken about that feed. But I'm not sure it's reasonable to expect a parser library to handle a feed that is so fundamentally broken. No complaints at all about how you implemented this. And thank you for the PR. I'm going to close this for now, but I'm open to discussion.

@kopertop
Copy link
Author

What about an option to allow a user to define the feed type if none are found? Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?

@danmactough
Copy link
Owner

danmactough commented Aug 29, 2016

What about an option to allow a user to define the feed type if none are found?

That sounds doable.

Or an option to allow "cleaning up" a feed before it's read, which would detect this sort of error?

This is probably already doable by hooking into the SAX events emitted, but I've been thinking about adding more hooks for precisely this kind of thing. Something like:

var feedparser = require("feedparser")();

stream.pipe(feedparser).pipe(outputHanlder);
feedparser.on("start-node", function (content) {
if (content.match(/^<channel/) {
this._isRequiresRssClosing = true;
return "<rss>" + content;
}
else {
return content;
}
});
feedparser.on("end-node", function (content) {
if (this._isRequiresRssClosing) {
return content + " </rss>";
}
else {
return content;
}
});

That's rather painfully explicit (and I probably wouldn't send plain text but rather a parsed node), but you get the picture.

@kopertop
Copy link
Author

I'd prefer to see an option to provide "defaults" such as the default parser type, but anything that works would be fine. For now I'm just using my Fork of Feedparser, since my goal of using this library really was to support any feed, no matter how terribly broken it may be.

I deal with almost 20k feeds, most of which are pretty fundamentally broken.

@danmactough danmactough reopened this Oct 26, 2016
@kopertop kopertop force-pushed the handle-broken-rss-feeds branch from 0146a96 to 6444e61 Compare June 13, 2019 16:27
Before this fix, if the summary tag appeared BEFORE the description tag,
it was accepted as the description (rather than the full-text which may
have appeared below the summary tag in the description tag)
Some international feeds use <articles> and <article> rather than <items>
and <item> to separate their articles
* master:
  Support "article" as a tag for feeds
* 'master' of github.com:danmactough/node-feedparser:
  2.2.10
  Update mri
  Update readable-stream
  Update npm audit fixes
  Remove unused Makefile
  Update examples to use node-fetch in place of request
  Update mocha v7
  Update eslint v6
  Replace iconv with iconv-lite
  Update travis config; drop support for unmaintained node versions
* master:
  2.2.10
  Update mri
  Update readable-stream
  Update npm audit fixes
  Remove unused Makefile
  Update examples to use node-fetch in place of request
  Update mocha v7
  Update eslint v6
  Replace iconv with iconv-lite
  Update travis config; drop support for unmaintained node versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants