XMLParserHTML provides SAX and DOM parsers in Pharo for HTML that convert possibly malformed HTML into well-formed XML.
To load the project, evaluate the following in a Pharo Playground:
Metacello new
	baseline: 'XMLParserHTML';
	repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
	load.
A simple example of how to use the XML parser for HTML:
...
results in the following XML output:
...
Together with XPath, this library lets you do web scraping from the comfort of the Pharo toolset.
You can learn more about how to do it by reading the Scraping HTML with XPath booklet.
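As a rough illustration of the scraping workflow, the sketch below parses a small malformed HTML string and queries the resulting DOM with XPath. It assumes the separate XPath package is loaded alongside XMLParserHTML; the sample markup and the variable names are made up for this example and do not come from the project.

```smalltalk
"A minimal sketch, assuming both XMLParserHTML and the XPath package are loaded."
| document links |

"Hypothetical input: the <p> elements are never closed."
document := XMLHTMLParser parse:
	'<p>First<p>Second <a href="https://pharo.org">Pharo</a>'.

"Select every <a> element anywhere in the repaired document."
links := document xpath: '//a'.

"Print the href attribute of each matching element."
links do: [ :each |
	Transcript show: (each attributeAt: 'href'); cr ]
```

Because the parser repairs the markup before building the DOM, the XPath expression runs against a well-formed tree even though the input string was not valid XML.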
This project was migrated from http://smalltalkhub.com/#!/~PharoExtras/XMLParserHTML