html - Python not progressing a list of links -


So, I need more detailed data and have to dig a bit deeper into the HTML code of a website. I wrote a script that returns a list of specific links to detail pages, but I can't get Python to process each link in the list for me; it stops at the first one. What am I doing wrong?

    from BeautifulSoup import BeautifulSoup
    import re
    import urllib2
    from lxml import html
    import requests

    # open site
    html_page = urllib2.urlopen("http://www.sitetoscrape.ch/somesite.aspx")

    # inform BeautifulSoup
    soup = BeautifulSoup(html_page)

    # search specific links
    for link in soup.findAll('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
        # print found links
        print link.get('href')
        # complete links
        complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
        # print complete links
        print complete_links
    #
    # everything works fine up to this point
    #

    page = requests.get(complete_links)
    tree = html.fromstring(page.text)

    # details
    name = tree.xpath('//dl[@class="services"]')

    for i in name:
        print i.text_content()

Also: is there a tutorial you can recommend for learning how to put the output in a file, clean it up, give variable names, etc.?

I think you want a list of links in complete_links instead of a single link. As @pynchia and @lemonhead said, you're overwriting complete_links on every iteration of the first loop, so by the time the loop ends it holds only one URL.
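To make the overwriting problem concrete, here is a minimal sketch of reassigning a variable in a loop versus appending to a list. The `hrefs` values are made-up sample data, and the sketch uses Python 3 syntax rather than the Python 2 of the question.

```python
# Made-up sample hrefs standing in for the links found on the page.
hrefs = ["/a.aspx", "/b.aspx", "/c.aspx"]

# Reassigning: each iteration replaces the previous value,
# so only one URL is left once the loop has finished.
complete_links = None
for href in hrefs:
    complete_links = "http://www.sitetoscrape.ch" + href

# Appending: every URL is kept and can be processed later.
link_list = []
for href in hrefs:
    link_list.append("http://www.sitetoscrape.ch" + href)

print(complete_links)
print(len(link_list))
```

Any code placed after the first loop can only ever see the single value left in `complete_links`, which is why only one detail page gets requested.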

You need two changes:

  • Append the links to a list, so that you can loop over it and scrape each link

    # [...] same code here

    link_list = []
    for link in soup.findAll('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
        print link.get('href')
        complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
        print complete_links
        link_list.append(complete_links)  # append new link to the list
  • Scrape each accumulated link in a loop

    for link in link_list:
        page = requests.get(link)
        tree = html.fromstring(page.text)

        # details
        name = tree.xpath('//dl[@class="services"]')

        for i in name:
            print i.text_content()
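Put together, the two steps might look like the sketch below. It is written for Python 3 and uses only the standard library so that it runs without a network connection. A regular expression stands in for BeautifulSoup here, which is a simplification that is fragile on real HTML, and `fetch` is a hypothetical callable you would replace with something like `lambda url: requests.get(url).text`. The names `collect_links` and `scrape_all` are illustrative and not part of the original answer.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.sitetoscrape.ch"

# Assumption: hrefs are double-quoted and start with the path from the question.
LINK_PATTERN = re.compile(r'href="(/d/part/of/thelink/ineed\.aspx[^"]*)"')


def collect_links(html_text, base_url=BASE_URL):
    """Step 1: accumulate every matching link as an absolute URL."""
    link_list = []
    for href in LINK_PATTERN.findall(html_text):
        link_list.append(urljoin(base_url, href))
    return link_list


def scrape_all(link_list, fetch):
    """Step 2: call fetch(url) once per accumulated link, in order."""
    results = []
    for link in link_list:
        results.append(fetch(link))
    return results


if __name__ == "__main__":
    # Made-up page content; a real run would download it first.
    sample_page = (
        '<a href="/d/part/of/thelink/ineed.aspx?id=1">one</a>'
        '<a href="/d/part/of/thelink/ineed.aspx?id=2">two</a>'
    )
    # A fake fetch that just describes the request instead of making it.
    for result in scrape_all(collect_links(sample_page), lambda url: "GET " + url):
        print(result)
```

Passing `fetch` in as an argument keeps the looping logic separate from the download, so the loop can be checked with fake data before pointing it at the real site.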

PS: I recommend the Scrapy framework for tasks like that.

