Home
UnderpantsGnome edited this page Sep 12, 2010
·
3 revisions
Hpricot Scrub is a wrapper around Hpricot that adds methods to scrub HTML tags from a document.
To Install
gem install hpricot_scrub
Now you can use the following to remove all tags from an HTML document:
require 'rubygems'
require 'open-uri'       # needed so that open() can fetch a URL
require 'hpricot_scrub'

doc = Hpricot(open('http://slashdot.org/').read)
text = doc.scrub
Scrub the doc based on a config hash (see examples/config.yml in the source for a sample config):
doc.scrub(hash)
Strip all links, leaving the text inside them intact:
(doc/:a).strip
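To make the intended effect concrete, here is a rough stand-in written in plain Ruby that does not use Hpricot Scrub at all. The `unwrap_links` helper is a made-up name for illustration, and it uses a regular expression rather than a parser, so it only approximates what stripping the `a` elements should produce on simple, well-formed input.

```ruby
# Hypothetical helper, NOT part of Hpricot Scrub: approximates the effect of
# stripping <a> elements with a regular expression instead of an HTML parser.
def unwrap_links(html)
  # Drop opening <a ...> and closing </a> tags; keep the text between them.
  # The \b keeps tags such as <abbr> from matching.
  html.gsub(%r{</?a\b[^>]*>}i, '')
end

html = '<p>Visit <a href="http://example.com/">example.com</a> today.</p>'
puts unwrap_links(html)
# prints: <p>Visit example.com today.</p>
```

Only the link markup disappears; the surrounding paragraph and the link text stay in place.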
The gem version also has a couple of new convenience methods on String:
String#scrub(config={})
String#scrub!(config={})
>> str = '<a href="http://example.com/">example.com</a>'
=> "<a href=\"http://example.com/\">example.com</a>"
>> str.scrub
=> "example.com"
>> str
=> "<a href=\"http://example.com/\">example.com</a>"
>> str.scrub!
=> "example.com"
>> str
=> "example.com"
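The session above follows the usual Ruby pairing of a non-destructive method with a destructive "bang" variant. The sketch below shows that pairing in plain Ruby, assuming a naive regex-based tag stripper as a stand-in; it is not the gem's implementation. The module and method names (`NaiveTagStrip`, `strip_tags`, `strip_tags!`) are hypothetical, and they deliberately avoid the name `scrub` because Ruby 2.1 and later define a core `String#scrub` that replaces invalid byte sequences.

```ruby
# Hypothetical stand-in, NOT the gem's implementation: a naive regex-based
# tag stripper used only to show the non-bang / bang method pairing.
module NaiveTagStrip
  # Returns a new string with tags removed; the receiver is left unchanged.
  def strip_tags
    gsub(/<[^>]*>/, '')
  end

  # Removes tags in place and returns the modified receiver.
  def strip_tags!
    replace(strip_tags)
  end
end

String.include(NaiveTagStrip)

str = '<a href="http://example.com/">example.com</a>'.dup
puts str.strip_tags   # prints: example.com
puts str              # prints the original markup, still unchanged
str.strip_tags!
puts str              # prints: example.com
```

The non-bang form is the safer default when the original markup is still needed; the bang form avoids keeping a second copy around.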