Collecting proxy IPs and verifying that they work

A friend of mine had written one of these before, but he couldn't find it anymore, so I had no choice but to roll my own.

This one is written in Python 2, and the data is scraped from http://www.xicidaili.com/

The script automatically collects the IPs and ports and checks whether each proxy actually works; working proxies are saved to the good.txt file.
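For example, once good.txt has some entries, you can route a request through one of them with urllib2's ProxyHandler. This is just a usage sketch of mine, not part of the script below; the echo URL is the same one the script uses for verification:

#encoding:utf-8
import urllib2

# Read the first saved proxy, e.g. "1.2.3.4:8080" (assumes good.txt exists).
with open("good.txt") as f:
    proxy = f.readline().strip()

# Build an opener that sends plain-HTTP traffic through the proxy.
opener = urllib2.build_opener(urllib2.ProxyHandler({"http": "http://" + proxy}))
print opener.open("http://ip.chinaz.com/getip.aspx", timeout=3).read()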

Since these are free proxies, a lot of them don't work, but what's left is plenty for personal use.

A screenshot of the script running:

[Figure: sample run output]

The code is as follows:

#encoding:utf-8
import urllib2
from bs4 import BeautifulSoup
import urllib
import socket
import multiprocessing


def GetProxy(page):
    # Fetch one listing page and try every proxy row on it.
    url = 'http://www.xicidaili.com/wt/%d' % page
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'}
    req = urllib2.Request(url, headers=headers)
    res = urllib2.urlopen(req)
    html = res.read().decode("utf-8")
    soup = BeautifulSoup(html, 'html.parser')
    trs = soup.find_all('tr')
    # Skip the header row; each data row has ten <td> cells,
    # with the IP in column 1 and the port in column 2.
    for tr in trs[1:]:
        try:
            tds = tr.find_all("td")
            if len(tds) == 10:
                proxy = tds[1].get_text() + ':' + tds[2].get_text()
                verif(proxy)
        except Exception:
            print 'Error while parsing a row'


def verif(proxy):
    # Request an IP-echo page through the proxy; if the request
    # succeeds within 3 seconds, the proxy is considered good.
    socket.setdefaulttimeout(3)
    url = "http://ip.chinaz.com/getip.aspx"
    proxy_temp = {"http": "http://" + proxy}
    try:
        res = urllib.urlopen(url, proxies=proxy_temp).read()
        print res.decode("utf-8")
        # Only record the proxy once the request has actually succeeded.
        with open("good.txt", 'a') as good_list:
            good_list.write(proxy + "\n")
    except Exception:
        print proxy + ' -- this proxy is invalid'


if __name__ == '__main__':
    text = '''
 ____ _ _ _
| _ \ _ (_) | | | |
| |_) | _ _ (_) _ ___ ___ | |__| |
| _ < | | | | | | / __|/ _ \| __ |
| |_) || |_| | _ | || (__| __/| | | |
|____/ \__, |(_)|_| \___|\___||_| |_|
 __/ |
 |___/

Data source: http://www.xicidaili.com/nn/
'''
    print('*' * 50)
    print(text + '\033[0m\n')
    print('*' * 50)
    # One worker process per CPU core; each call handles one listing page.
    pool = multiprocessing.Pool()
    pool.map(GetProxy, range(1, 1000))