Accessing web pages with Python

Published: 2019-09-07 08:03:53 | Editor: auto | Views: 3231

Python version: 3

Fetching a page:

import urllib.request

url="https://blog.csdn.net/qq_33160790"
req=urllib.request.Request(url)
resp=urllib.request.urlopen(req)
data=resp.read().decode('utf-8')

print(data)
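
The snippet above assumes the page is UTF-8 and that the request always succeeds. A minimal sketch of a more defensive variant follows; the helper names `decode_body` and `fetch`, the 10-second timeout, and the User-Agent value are all assumptions for illustration, not part of the original script.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode response bytes; undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(charset or fallback, errors="replace")


def fetch(url, timeout=10):
    """Return the page text, or None if the request fails."""
    # Some sites reject urllib's default User-Agent; this value is an assumption.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Use the charset declared in the Content-Type header, if any.
            charset = resp.headers.get_content_charset()
            return decode_body(resp.read(), charset)
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("request failed:", exc)
        return None
```

Calling `fetch("https://blog.csdn.net/qq_33160790")` would then return the decoded page text, or `None` when the request fails.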


Extracting the article links from a CSDN page:
For an introduction to XPath syntax, see this article:
http://www.w3school.com.cn/xpath/xpath_syntax.asp

from lxml import etree
import requests

url = 'https://blog.csdn.net/qq_33160790'
resp = requests.get(url)
if resp.status_code == requests.codes.ok:
    # Parse the HTML and select the href of every article title link
    html = etree.HTML(resp.text)
    hrefs = html.xpath('//span[@class="link_title"]/a/@href')
    for href in hrefs:
        print(href)
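
To see what the XPath expression selects without touching the network, it can be run against a small inline document. This is only a sketch: the `SAMPLE` markup is hypothetical and merely imitates the `span.link_title` structure the expression expects, and `extract_links` is an illustrative helper name. Whether the real page returns relative or absolute hrefs is not stated here, so `urljoin` is used because it handles both.

```python
from urllib.parse import urljoin

from lxml import etree

BASE = "https://blog.csdn.net/qq_33160790"

# Hypothetical markup imitating the structure the XPath expects.
SAMPLE = """
<html><body>
  <span class="link_title"><a href="/qq_33160790/article/details/1">First</a></span>
  <span class="link_title"><a href="/qq_33160790/article/details/2">Second</a></span>
  <span class="other"><a href="/ignored">Ignored</a></span>
</body></html>
"""


def extract_links(page_source, base=BASE):
    """Return absolute URLs of all links inside span.link_title elements."""
    html = etree.HTML(page_source)
    # // searches the whole document; @href selects the attribute value.
    hrefs = html.xpath('//span[@class="link_title"]/a/@href')
    return [urljoin(base, href) for href in hrefs]


links = extract_links(SAMPLE)
for link in links:
    print(link)
```

Only the two `link_title` links are returned; the link inside `span.other` does not match the class predicate and is skipped.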


Printing the URLs of all articles:

from lxml import etree
import requests

# range(1, 23) visits list pages 1 to 22; set the upper bound to the page count plus 1
for i in range(1, 23):
    url = 'https://blog.csdn.net/qq_33160790/article/list/' + str(i)
    resp = requests.get(url)
    if resp.status_code == requests.codes.ok:
        html = etree.HTML(resp.text)
        hrefs = html.xpath('//span[@class="link_title"]/a/@href')
        for href in hrefs:
            print(href)
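
The hard-coded upper bound has to be updated whenever the number of list pages changes. One possible alternative, sketched below, is to keep requesting pages until one yields no links. It assumes that a list page past the end contains no matching links, which has not been verified against CSDN. `collect_links` and its parameters are hypothetical names; the fetching and extracting steps are passed in as functions so the loop can be tried without network access.

```python
def collect_links(fetch_page, extract, max_pages=100):
    """Walk list pages 1, 2, 3, ... and stop at the first page with no links.

    fetch_page(n) returns the source of page n (or None on failure);
    extract(source) returns the list of links found in that source.
    """
    seen = []
    for page in range(1, max_pages + 1):
        source = fetch_page(page)
        if source is None:
            break
        links = extract(source)
        if not links:
            break
        for link in links:
            if link not in seen:  # skip duplicates, keep first-seen order
                seen.append(link)
    return seen


# Stand-in data: three fake "pages" whose links are whitespace-separated words.
fake_pages = {1: "a b", 2: "c a", 3: ""}
result = collect_links(fake_pages.get, str.split)
print(result)
```

With real pages, `fetch_page` would wrap the `requests.get` call for `.../article/list/<n>` and `extract` would wrap the XPath query.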


Script for bumping CSDN view counts:
Note: change the url and the 23 to match your own blog.

from lxml import etree
import requests
import urllib.request

# range(1, 23) visits list pages 1 to 22; set the upper bound to the page count plus 1
for i in range(1, 23):
    url = 'https://blog.csdn.net/qq_33160790/article/list/' + str(i)
    resp = requests.get(url)
    if resp.status_code == requests.codes.ok:
        html = etree.HTML(resp.text)
        hrefs = html.xpath('//span[@class="link_title"]/a/@href')
        for href in hrefs:
            print(href)
            # Request each article once; the response body is read and discarded
            req = urllib.request.Request(href)
            data = urllib.request.urlopen(req).read()
