Resource Overview

A crawler scrapes China's nationwide box-office data for 2018. The data is then analyzed and charted with numpy, pandas, and matplotlib.
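The analysis notebooks themselves are not shown on this page, so the following is only a minimal sketch of how the daily totals might be loaded and aggregated with pandas. The column names are taken from the fields written by download.py. The sample rows, the `YYYY-MM-DD` date format, and the helper names `load_day_box_office` and `monthly_box_office` are assumptions made for illustration.

```python
import io

import pandas as pd

# Invented sample rows, in the column order written by download.py.
# The real dayBoxOffice.csv may use a different date format.
SAMPLE = """\
2018-01-01,9000,3000000,100000000,300000
2018-01-02,9100,1500000,50000000,310000
2018-02-01,9200,2000000,70000000,320000
"""

COLUMNS = [
    "businessDay",
    "cinemaCount",
    "totalAudience",
    "totalBoxoffice",
    "totalSession",
]


def load_day_box_office(source):
    """Read the header-less CSV into a DataFrame indexed by business day."""
    df = pd.read_csv(
        source, header=None, names=COLUMNS, parse_dates=["businessDay"]
    )
    return df.set_index("businessDay").sort_index()


def monthly_box_office(df):
    """Sum the daily box-office totals per calendar month."""
    return df["totalBoxoffice"].groupby(df.index.to_period("M")).sum()


df = load_day_box_office(io.StringIO(SAMPLE))
monthly = monthly_box_office(df)
print(monthly)
```

With matplotlib installed, `monthly.plot(kind="bar")` would be one way to turn the monthly series into a chart. Passing the path `"dayBoxOffice.csv"` instead of the in-memory sample should work the same way, provided the file matches the assumed layout.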

Code Snippet and File Information

# encoding=utf-8

from pymongo import MongoClient
import requests
import time
import json
from datetime import datetime, timedelta


def download(date):
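    '''
    Download the box-office data for one day and append it to the CSV files.
    :param date: the day to query; it is interpolated into the request URL
    :return:
    '''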
    url = "https://zgdypw.cn/pors/w/webStatisticsDatas/api/{}/searchDayBoxOffice".format(date)
    # Browser-like request headers
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "Host": "zgdypw.cn",
        "Pragma": "no-cache",
        "Referer": "https://zgdypw.cn/",
        "User-Agent": "Mozilla/5.0(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
    }
    # Route both HTTP and HTTPS traffic through the same proxy
    proxies = {
        "http": "http://HA316W644LD61S7D:6DB767CAAC501ABD@http-dyn.abuyun.com:9020",
        "https": "http://HA316W644LD61S7D:6DB767CAAC501ABD@http-dyn.abuyun.com:9020"
    }
    response = requests.get(url, headers=headers, proxies=proxies)
    response.encoding = "utf-8"
    if response.status_code == 200:
        data = json.loads(response.text)["data"]
        # Daily totals: one CSV row per business day
        dayBoxOffice = data["dayBoxOffice"]
        with open("dayBoxOffice.csv", "a+", encoding="utf-8") as file:
            file.write("{0},{1},{2},{3},{4}\n".format(
                dayBoxOffice["businessDay"],
                dayBoxOffice["cinemaCount"],
                dayBoxOffice["totalAudience"],
                dayBoxOffice["totalBoxoffice"],
                dayBoxOffice["totalSession"]
            ))
        # Top 10 cinema chains: one CSV row per chain
        top10CinemaChains = data["top10CinemaChains"]
        with open("top10CinemaChains.csv", "a+", encoding="utf-8") as file:
            for item in top10CinemaChains:
                file.write("{0},{1},{2},{3},{4},{5}\n".format(
                    date,
                    item["cinemaChainName"],
                    item["dayAudience"],
                    item["daySession"],
                    item["rank"],
                    item["totalSales"]
                ))
        # Top 10 cinemas: one CSV row per cinema
        top10Cinemas = data["top10Cinemas"]
        with open("top10Cinemas.csv", "a+", encoding="utf-8") as file:
            for item in top10Cinemas:
                file.write("{0},{1},{2},{3},{4},{5}\n".format(
                    date,
                    item["cinemaName"],
                    item["dayAudience"],
                    item["daySession"],
                    item["rank"],
                    item["totalSales"]
                ))
        # Top 10 cities: one CSV row per city
        top10Citys = data["top10Citys"]
        with open("top10Citys.csv", "a+", encoding="utf-8") as file:
            for item in top10Citys:
                file.write("{0},{1},{2},{3},{4},{5}\n".format(
                    date,
                    item["cityName"],
                    item["dayAudience"],
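
The snippet is cut off before the code that calls `download(date)`, so how the script walks through 2018 is not visible here. Since the file imports `datetime` and `timedelta`, one plausible approach is to generate one date string per day. The sketch below assumes a `YYYY-MM-DD` format, which the source does not confirm, and the helper names `iter_days` and `date_strings` are hypothetical.

```python
from datetime import date, timedelta


def iter_days(start, end):
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def date_strings(start, end, fmt="%Y-%m-%d"):
    """Format each day in the range; the format string is an assumption."""
    return [day.strftime(fmt) for day in iter_days(start, end)]


days_2018 = date_strings(date(2018, 1, 1), date(2018, 12, 31))
print(len(days_2018), days_2018[0], days_2018[-1])
```

Each string could then be passed to `download`, for example in a loop that calls `time.sleep` between requests to avoid hammering the server.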
           

Attribute        Size  Date       Time   Name
----------- ---------  ---------- -----  ----
File           355288  2018-12-26 17:45  movies\.ipynb_checkpoints\movie1-checkpoint.ipynb
File           136926  2018-12-26 19:16  movies\.ipynb_checkpoints\movie2-checkpoint.ipynb
File           921020  2018-12-27 10:08  movies\.ipynb_checkpoints\movie3-checkpoint.ipynb
File           557404  2018-12-27 10:01  movies\.ipynb_checkpoints\movie4-checkpoint.ipynb
File           455002  2018-12-27 09:42  movies\.ipynb_checkpoints\movie5-checkpoint.ipynb
File          1193765  2018-12-27 14:06  movies\.ipynb_checkpoints\movie6-checkpoint.ipynb
File            16047  2018-12-26 16:53  movies\dayBoxOffice.csv
File             4164  2018-12-26 17:24  movies\download.py
File           355288  2018-12-27 12:22  movies\movie1.ipynb
File           137815  2018-12-27 09:07  movies\movie2.ipynb
File           921020  2018-12-27 10:08  movies\movie3.ipynb
File           557404  2018-12-27 10:01  movies\movie4.ipynb
File           455002  2018-12-27 09:42  movies\movie5.ipynb
File          1193765  2018-12-27 14:06  movies\movie6.ipynb
File           170164  2018-12-26 16:54  movies\top10CinemaChains.csv
File           243167  2018-12-26 16:55  movies\top10Cinemas.csv
File           146211  2018-12-26 16:55  movies\top10Citys.csv
File           194667  2018-12-27 09:07  movies\top10Films.csv
File                0  2018-12-26 16:03  movies\__init__.py
Directory           0  2018-12-27 12:19  movies\.ipynb_checkpoints
Directory           0  2018-12-27 14:06  movies
----------- ---------  ---------- -----  ----
              8014119                    21

