node.js - Crawler over unstructured HTML with Node.js
I need to crawl/scrape a static, unstructured HTML page. I'm trying to get its content with Node.js code; I tried cheerio and XPath, unsuccessfully.
http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html
The XPath of the first element I need is /html/body/center/center/table/tbody/tr[3], and I need the text of every td in that tr.
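As a dependency-free illustration of that goal (the text of every td in the third tr), here is a minimal sketch. The function name cellsOfRow and the sample markup are hypothetical. It is regex-based, not a real HTML parser, so it assumes every tr and td is explicitly closed and that tables are not nested. It also counts rows across the whole string, so pass only the target table's HTML if the page contains several tables.

```javascript
// Hypothetical helper: returns the text of every <td> in the n-th <tr>
// (1-based, like XPath's tr[3]) of an HTML string.
// Assumptions: explicitly closed <tr>/<td> tags, no nested tables.
// This is a regex sketch, NOT a general-purpose HTML parser.
function cellsOfRow(html, n) {
  // Each <tr ...> ... </tr> block, case-insensitive, spanning newlines.
  const rows = html.match(/<tr\b[^>]*>[\s\S]*?<\/tr>/gi) || [];
  const row = rows[n - 1];
  if (row === undefined) return [];

  const cells = [];
  const cellRe = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  let m;
  while ((m = cellRe.exec(row)) !== null) {
    const text = m[1]
      .replace(/<[^>]*>/g, "")  // drop inner tags such as <b> or <font>
      .replace(/&nbsp;/gi, " ") // common padding entity in old tables
      .replace(/\s+/g, " ")     // collapse runs of whitespace
      .trim();
    cells.push(text);
  }
  return cells;
}

// Usage with made-up markup:
const sample =
  "<table><tr><td>h1</td></tr><tr><td>x</td></tr>" +
  "<tr><td> 10 </td><td><b>1.2</b>&nbsp;m</td></tr></table>";
console.log(cellsOfRow(sample, 3)); // → [ '10', '1.2 m' ]
```

For anything less regular than a flat, well-closed table, a real parser (cheerio, or parse5 plus XPath as below) is the safer choice; this sketch only shows what "every td text in tr[3]" means in code.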
If I try to select the tbody node:
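    var parse5 = require('parse5');        // parse5 1.x API (newer versions use parse5.parse(html))
    var xmlser = require('xmlserializer');
    var dom = require('xmldom').DOMParser;
    var xpath = require('xpath');

    var parser = new parse5.Parser();
    var document = parser.parse(response.toString());
    var xhtml = xmlser.serializeToString(document);
    var doc = new dom().parseFromString(xhtml);
    var select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});
    var nodes = select("//x:tbody", doc);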
I get [] (no nodes at all).
With cheerio, I tried to iterate over the tr elements mentioned above, also unsuccessfully:
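    var cheerio = require('cheerio');

    var $ = cheerio.load(response.toString());
    $('tr').each(function (i, e) {
      // log the row's text; %j on the cheerio wrapper itself hits circular references
      console.log("content %j", $(e).text());
    });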
It seems cheerio doesn't work with unstructured HTML that has no CSS classes or ids. So I tried a workaround using YQL, following a tutorial:
select * from html where url='http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html' and xpath='//html/body/center/center/table/tbody'
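To send that query from Node, it has to be URL-encoded as the q parameter. The sketch below assumes the historical public endpoint https://query.yahooapis.com/v1/public/yql with format=json; yqlUrl is a hypothetical helper name. Yahoo has since retired YQL, so this only illustrates how the query string would be built and encoded.

```javascript
// Hypothetical helper: builds a YQL request URL for the "html" table query.
// Assumption: historical endpoint /v1/public/yql taking q= and format= params.
function yqlUrl(pageUrl, xpath) {
  const query = `select * from html where url='${pageUrl}' and xpath='${xpath}'`;
  // URLSearchParams handles encoding of spaces, quotes, slashes, etc.
  const params = new URLSearchParams({ q: query, format: "json" });
  return `https://query.yahooapis.com/v1/public/yql?${params}`;
}

console.log(
  yqlUrl(
    "http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html",
    "//html/body/center/center/table/tbody"
  )
);
```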
With YQL I get what I need, so now I'm integrating it using node-yql.