Python Class, Day 16: Web Scraping

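Today's class felt like it covered a lot, and yet like it covered nothing at all.
Until now the lessons were all background knowledge and feature introductions; today it finally felt like we were starting to actually search for data.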


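soup.prettify() takes the page that BeautifulSoup has already parsed and returns it as a neatly indented HTML string, which makes the structure of the page much easier to read.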
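To see what prettify() does without touching the network, here is a minimal sketch. The HTML string is a made-up stand-in for a downloaded page, and I am assuming the parser is Python's built-in "html.parser" (the class may have used a different one).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that was already downloaded.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Parse the string with the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the parsed tree as an indented, one-tag-per-line string.
pretty = soup.prettify()
print(pretty)
```

The one-line input comes back with each tag on its own line, indented by nesting depth.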

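Connecting through a proxy server
import urllib3
# Alternative proxy, this time with a keep-alive header:
# proxy = urllib3.ProxyManager('http://75.89.101.60:80', headers={'connection': 'keep-alive'})
proxy = urllib3.ProxyManager('http://210.201.86.72:8080')
# httpbin.org/ip echoes back the caller's IP, so the output should show the proxy's address
resp = proxy.request('GET', 'http://httpbin.org/ip')
print(resp.data)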
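Public proxies like the ones above tend to go offline without warning, so the request can hang or raise an error. Here is a rough sketch of a more defensive version. The function name fetch_via_proxy and the 5-second default timeout are my own choices for illustration, not something from the class.

```python
import urllib3


def fetch_via_proxy(proxy_url, target_url, timeout=5.0):
    """Fetch target_url through proxy_url; return the body text, or None on failure."""
    # The timeout keeps a dead proxy from hanging the script indefinitely.
    proxy = urllib3.ProxyManager(proxy_url, timeout=urllib3.Timeout(total=timeout))
    try:
        # retries=False makes a broken proxy fail right away instead of being retried.
        resp = proxy.request('GET', target_url, retries=False)
    except urllib3.exceptions.HTTPError as err:
        # Base class of urllib3's own errors (proxy, timeout, connection problems).
        print('proxy request failed:', err)
        return None
    if resp.status != 200:
        return None
    return resp.data.decode('utf-8', errors='replace')
```

Calling fetch_via_proxy('http://210.201.86.72:8080', 'http://httpbin.org/ip') should return the echoed IP as text if that proxy is still alive, and None otherwise.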

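Yahoo blocks crawlers; sending a browser's User-Agent header makes the request look like it comes from a real browser and gets around the block
import urllib3
# Pretend to be Safari on an iPad
headers = {"User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"}
http = urllib3.PoolManager()
r = http.request('GET', 'https://tw.yahoo.com', headers=headers)
print(r.status)  # 200 means the request was accepted
# r.data is raw bytes; decode it to get the HTML text
y = r.data.decode('utf-8')
print(y)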
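Once the status code and the raw bytes are in hand, the natural next step is to pass them to BeautifulSoup. The sketch below uses a hypothetical helper, page_title, and a made-up sample page so it runs offline; with the real request you would pass r.status and r.data instead.

```python
from bs4 import BeautifulSoup


def page_title(status, data):
    """Return the <title> text of a fetched page, or None if it is unavailable."""
    if status != 200:
        return None  # blocked or failed request, nothing worth parsing
    # Decode the raw bytes; undecodable bytes are replaced rather than raising.
    html = data.decode("utf-8", errors="replace")
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


# Made-up response body standing in for r.data.
sample = "<html><head><title> Demo page </title></head><body></body></html>".encode("utf-8")
print(page_title(200, sample))
print(page_title(403, sample))
```

Checking the status first means a blocked request is never mistaken for a page that simply has no title.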

ky0dd

阿京小站
