node.js - Crawler over unstructured HTML with Node.js

- June 15, 2013

I need to crawl/scrape static, unstructured HTML. I'm trying to extract the content with Node.js code, and I have tried cheerio and xpath, both unsuccessfully.

http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html

The XPath of the first element is /html/body/center/center/table/tbody/tr[3], and I need the text of every td in that tr.
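As a minimal, dependency-free sketch of that goal (the text of every td in the n-th tr), the snippet below runs against a small inline sample. The sample markup and the helper name `cellTextsOfRow` are assumptions made for illustration, not taken from the real page, and the regular expressions are only adequate for simple, non-nested static tables; a real HTML parser is more robust.

```javascript
// Hypothetical sample markup standing in for the fetched page;
// the real page's structure may differ.
const sampleHtml = `
<table>
  <tr><th>Hour</th><th>Height</th><th>Direction</th></tr>
  <tr><td>units row</td></tr>
  <tr><td> 00 </td><td>1.2</td><td><b>NW</b></td></tr>
</table>`;

// Return the trimmed text of every <td> in the n-th <tr>
// (1-based, like tr[n] in XPath). Hypothetical helper.
function cellTextsOfRow(html, rowNumber) {
  // Collect each <tr>...</tr> span; does not handle nested tables.
  const rows = html.match(/<tr\b[^>]*>[\s\S]*?<\/tr>/gi) || [];
  const row = rows[rowNumber - 1];
  if (!row) return [];

  const cells = [];
  const cellPattern = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  let match;
  while ((match = cellPattern.exec(row)) !== null) {
    // Drop any tags inside the cell, collapse whitespace, trim.
    const text = match[1]
      .replace(/<[^>]+>/g, '')
      .replace(/\s+/g, ' ')
      .trim();
    cells.push(text);
  }
  return cells;
}

const thirdRow = cellTextsOfRow(sampleHtml, 3);
console.log(thirdRow);
```

With the real page, `sampleHtml` would be replaced by the response body as a string.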

If I try to select the tbody node:

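      // Assumed requires; the variable names match the calls below.
      var parse5 = require('parse5');
      var xmlser = require('xmlserializer');
      var dom = require('xmldom').DOMParser;
      var xpath = require('xpath');
      var parser = new parse5.Parser();
      var document = parser.parse(response.toString());
      var xhtml = xmlser.serializeToString(document);
      var doc = new dom().parseFromString(xhtml);
      var select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});
      var nodes = select("//x:tbody", doc);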

I receive an empty array ([]) of nodes.

With cheerio, I tried to iterate over the tr elements mentioned above, also unsuccessfully:

      var $ = cheerio.load(response);
      $('tr').each(function(i, e) {
          console.log("content %j", $(e));
      });

It seems cheerio does not work with unstructured HTML that has no CSS. So I tried a workaround using YQL, following that tutorial:

      select * from html where url='http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html' and xpath='//html/body/center/center/table/tbody'

With YQL I'm getting what I need, so the next step is to integrate it with node-yql.
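For illustration, here is a minimal sketch of assembling that statement in Node.js before handing it to whatever YQL client is used. It assumes the usual YQL form `select * from html where url='...' and xpath='...'`; the helper name `buildYqlStatement` is hypothetical, and no node-yql call is shown because its exact API is not given here.

```javascript
// Hypothetical helper: build the YQL statement for a page URL and an XPath.
function buildYqlStatement(pageUrl, xpathExpr) {
  // Both values are wrapped in single quotes in the statement, so refuse
  // inputs that contain one rather than guess at YQL's escaping rules.
  for (const value of [pageUrl, xpathExpr]) {
    if (value.includes("'")) {
      throw new Error('single quotes are not supported in: ' + value);
    }
  }
  return "select * from html where url='" + pageUrl +
         "' and xpath='" + xpathExpr + "'";
}

const statement = buildYqlStatement(
  'http://static.puertos.es/pred_simplificada/predolas/tablas/cnt/pas.html',
  '//html/body/center/center/table/tbody'
);

// URL-encode the statement if it has to travel as a query-string parameter.
const encodedStatement = encodeURIComponent(statement);

console.log(statement);
console.log(encodedStatement);
```

Keeping the statement builder separate from the client call makes it easy to change the XPath (for example to a single tr) without touching the request code.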
