Scraping Web Page Links with Python

Size: 2 KB

File type: .py

Published: 2021-05-28
Language: Python
Tags: python

Resource description

A Python script that scrapes download links from web pages.
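
As a rough illustration of what link scraping involves, the sketch below collects the text and absolute URL of every `<a>` tag in an HTML string. It uses only the standard library (`html.parser` and `urllib.parse`) rather than BeautifulSoup, so it runs without third-party packages. The `LinkCollector` class, the `extract_links` helper, and the sample HTML (including the `/book/1.html` path) are hypothetical and are not taken from the resource itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (text, absolute_url) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (text, url) pairs
        self._href = None    # URL of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url):
    """Return all links found in `html` as (text, absolute_url) tuples."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup; the real site's HTML may differ.
sample_html = (
    '<ul><li class="li_bg"><a href="/book/1.html">Book One</a></li>'
    '<li><a href="http://example.com/x">X</a></li></ul>'
)
links = extract_links(sample_html, "http://m.bookben.com")
for text, url in links:
    print(text, url)
```

Working on an in-memory string keeps the example independent of the network; in practice the HTML would come from a request such as `urllib.request.urlopen(url).read()`.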

Code snippet and file information

# -*- coding: utf-8 -*-
########################################################
# Find gudaiyanqing xiaoshuo on http://www.bookben.com #
########################################################

import time
import urllib.request
from bs4 import BeautifulSoup

num = 0  # running count of novels found
web = "http://m.bookben.com"
url = "http://m.bookben.com/gudaiyanqing"
result = "######Get the update of novel website on " + url + "\n" + "\n"
date_mark = time.strftime('%Y-%m-%d', time.localtime(time.time()))
time_mark = time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time()))

# Get the update of bookben.com website
# The page is decoded as gb2312; undecodable bytes are replaced, not raised.
main_page = urllib.request.urlopen(url).read().decode('gb2312', errors='replace')
main_soup = BeautifulSoup(main_page, "lxml")
main_classes = main_soup.find_all('li', class_='li_bg')

for main_links in main_classes:
    for main_link in main_links.find_all('a'):
        novel_name = main_link.get_text()
        novel_url = web + main_link.get('href')
        num = num + 1
        print(str(num) + novel_name + novel_url)

        # Fetch and parse the page of each novel
        novel_page = urllib.request.urlopen(novel_url).read().decode('gb2312', errors='replace')
        novel_soup = BeautifulSoup(novel_page, "lxml")
        # novel_date = novel_sou  (the listing is cut off here)
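
The snippet decodes each page as gb2312 with `errors='replace'` and builds novel URLs by concatenating `web` with each `href`. The sketch below shows, on in-memory bytes, what `errors='replace'` does with a byte that is not valid gb2312, and how `urllib.parse.urljoin` could stand in for the concatenation. Using `urljoin` is a suggestion, not what the original script does, and the `decode_page` helper and the `/txt/1.html` path are hypothetical.

```python
from urllib.parse import urljoin


def decode_page(raw_bytes, encoding="gb2312"):
    """Decode page bytes, substituting U+FFFD for bytes that cannot be decoded."""
    return raw_bytes.decode(encoding, errors="replace")


good = "古代言情".encode("gb2312")
bad = good + b"\xff"  # 0xFF is not a valid gb2312 byte

clean_text = decode_page(good)
damaged_text = decode_page(bad)
print(clean_text)
print(repr(damaged_text))  # contains the replacement character U+FFFD

# urljoin handles both relative and absolute hrefs, whereas plain
# concatenation would produce a broken URL for an absolute href.
page_url = "http://m.bookben.com/gudaiyanqing"
relative = urljoin(page_url, "/txt/1.html")                  # hypothetical path
absolute = urljoin(page_url, "http://m.bookben.com/other")   # hypothetical URL
print(relative)
print(absolute)
```

With `errors='replace'` a damaged page still yields text that can be parsed, at the cost of silently losing the bad bytes.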
