Python not progressing through a list of links
So, to get more detailed data I have to dig a bit deeper into the HTML code of the website. I wrote a script that returns me a list of links to the specific detail pages, but I can't get Python to visit each link in that list for me; it stops at the first one. What am I doing wrong?
    import re
    import urllib2
    import requests
    from bs4 import BeautifulSoup
    from lxml import html

    # open the site
    html_page = urllib2.urlopen("http://www.sitetoscrape.ch/somesite.aspx")
    # feed the page to BeautifulSoup
    soup = BeautifulSoup(html_page)
    # search for the specific links
    for link in soup.find_all('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
        # print the found links
        print link.get('href')
        # build the complete links
        complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
        # print the complete links
        print complete_links

    # everything works fine up to this point

    page = requests.get(complete_links)
    tree = html.fromstring(page.text)
    # details
    name = tree.xpath('//dl[@class="services"]')
    for i in name:
        print i.text_content()
Also: can you recommend a tutorial to learn how to put the output in a file, clean it up, give variables meaningful names, and so on?
I think you want a list of links in complete_links instead of a single link. As @pynchia and @lemonhead said, you're overwriting complete_links on every iteration of the first loop.
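A quick standalone illustration of the difference, using made-up hrefs rather than your real data:

```python
# Made-up hrefs standing in for the ones BeautifulSoup finds.
hrefs = ["/d/a.aspx", "/d/b.aspx", "/d/c.aspx"]

# Overwriting: each iteration replaces the previous value,
# so after the loop only the last link survives.
for href in hrefs:
    complete_links = "http://www.sitetoscrape.ch" + href
print(complete_links)  # only the last URL

# Appending: every value is kept for later use.
link_list = []
for href in hrefs:
    link_list.append("http://www.sitetoscrape.ch" + href)
print(len(link_list))  # 3
```

That is exactly why your second block only ever sees one URL: by the time it runs, complete_links holds just the final value from the loop.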
You need two changes:
First, append the links to a list instead of overwriting a single variable:
    # [...] same code as above up to the first loop
    link_list = []
    for link in soup.find_all('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
        print link.get('href')
        complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
        print complete_links
        link_list.append(complete_links)  # append the new link to the list
Then scrape each accumulated link in a second loop:
    for link in link_list:
        page = requests.get(link)
        tree = html.fromstring(page.text)
        # details
        name = tree.xpath('//dl[@class="services"]')
        for i in name:
            print i.text_content()
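As for putting the output in a file: a minimal sketch, where the made-up `results` list stands in for the text_content() strings you would collect while scraping (collect them in a list instead of printing them):

```python
import io

# Placeholder for the scraped strings; in your script, append
# i.text_content() to this list inside the scraping loop.
results = [u"Service one", u"Service two"]

# Write one entry per line; io.open with an explicit utf-8 encoding
# handles non-ASCII site content and works in both Python 2 and 3.
with io.open("output.txt", "w", encoding="utf-8") as f:
    for entry in results:
        f.write(entry + u"\n")
```

The `with` block closes the file for you even if an exception occurs mid-write.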
PS: I recommend the Scrapy framework for tasks like that.