[Python] Proxy scanner
Author: pengyouya123 / Published 2014/6/17 / 1027
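Does not work on Windows; runs on Linux and OS X. Fetches and validates proxies in bulk. Some functions are defined externally and are not included here; substitute your own.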
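The listing depends on helpers from a `util` module that was not published with it. As a rough sketch only, two of them might look like the following in Python 3 (the listing itself is Python 2). The names `saveJson` and `Timer` come from the listing, but the bodies and the `elapsed` attribute are assumptions, not the author's code.

```python
import json
import time


def saveJson(data, fname):
    """Write data to fname as UTF-8 JSON (assumed behaviour of the missing helper)."""
    with open(fname, 'w', encoding='utf-8') as fp:
        json.dump(data, fp, ensure_ascii=False, indent=2)


class Timer(object):
    """Measure elapsed wall-clock time (assumed behaviour of the missing helper)."""

    def __init__(self):
        # Start timing as soon as the object is created.
        self.start = time.time()
        self.elapsed = None

    def stop(self):
        # Record, print and return the seconds elapsed since creation.
        self.elapsed = time.time() - self.start
        print('Elapsed: %.2f s' % self.elapsed)
        return self.elapsed
```

`fetchPage`, `setupOpener` and `getUniqueLog` are left out because their expected behaviour cannot be inferred from the listing alone.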
# -*- coding: utf-8 -*-
# Desc: Grab proxy ip
# Date: 2014/06/13
# Note: Python 2 code (print statements, urllib2)
import os
import json
import urllib
import urllib2

# External helpers, not included: getUniqueLog, fetchPage, saveJson, setupOpener, Timer
from util import *
from bs4 import BeautifulSoup

log = getUniqueLog()
regions = ['China', 'America', 'Brazil', 'Japan', 'Taiwan', 'Thailand', 'Bahrein']
baseUrl = 'http://www.proxy360.cn/Region/'


# Return True if the host answers one ping within the time limit
def ping(ip):
    cmd = "ping -Q -c 1 -W 2000 %s 1>/dev/null 2>&1" % ip
    response = os.system(cmd)
    return response == 0


# Read a file whose lines start with an IP and print the IPs that answer a ping
def filterIp(fname):
    result = []
    with open(fname) as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue
            segments = line.split(' ')
            ip = segments[0]
            if ping(ip):
                result.append(ip)
    print result


class Proxy():
    def __init__(self, address, port, hideprop, country, pubdate):
        self.address = address
        self.port = port
        self.country = country
        self.pubdate = pubdate
        self.hideprop = hideprop

    def valid(self):
        return ping(self.address)

    def convertDict(self):
        return {'address': self.address, 'port': self.port,
                'country': self.country, 'pubdate': self.pubdate,
                'hideprop': self.hideprop}

    # Arguments follow the order of __init__ (hideprop before country)
    @staticmethod
    def initWithDict(proxyDict):
        return Proxy(proxyDict['address'], proxyDict['port'],
                     proxyDict['hideprop'], proxyDict['country'],
                     proxyDict['pubdate'])

    def __str__(self):
        return '%s:%s %s %s %s' % (self.address, self.port, self.country,
                                   self.hideprop, self.pubdate)


# proxy360: fetch one region page and save the proxies that answer a ping
def fetchProxies(region='China'):
    print '[Region: %s]' % region
    url = baseUrl + urllib.quote(region)
    print 'Fetching page ...'
    page = fetchPage(url)
    if page is None:
        print 'Fetch page failed'
        return
    print 'Analyzing html ...'
    soup = BeautifulSoup(page)
    if soup is None:
        print 'parse html failed'
        return
    # Defined before the try block so the finally clause can always read it
    validProxies = []
    try:
        nodes = soup.find_all('div', class_='proxylistitem')
        total = len(nodes)
        cnt = 0
        validCnt = 0
        for node in nodes:
            cnt += 1
            print 'Dealing with (%d/%d:%d)th item ...' % (cnt, total, validCnt)
            proxyItems = node.find_all('span', class_='tbBottomLine')
            proxy = Proxy(proxyItems[0].text.strip(),
                          proxyItems[1].text.strip(),
                          proxyItems[2].text.strip(),
                          proxyItems[3].text.strip(),
                          proxyItems[4].text.strip())
            print proxy
            if proxy.valid():
                validProxies.append(proxy.convertDict())
                validCnt += 1
    except Exception as e:
        print e
    finally:
        print 'Save proxies ...'
        if validProxies:
            saveJson(validProxies, 'proxy_' + region.lower() + '.json')
            print 'Congratulations!'
# end fetch proxy


# proxy360.json: request a test page through every saved proxy
def testProxies(fname):
    with open(fname) as fp:
        data = json.load(fp)
    result = []
    for item in data:
        try:
            ip = item['address']
            port = item['port']
            print ip
            addr = '%s:%s' % (ip, port)
            proxy_handler = urllib2.ProxyHandler({'http': addr})
            proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
            opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
            print opener.open('http://20140507.ip138.com/ic.asp', timeout=3).read()
            result.append(ip)
        except Exception as e:
            # print e
            pass
    print result


# bypass: return True if www.twitter.com can be fetched through the proxy
def checkProxy(ip, port):
    try:
        print ip, port
        addr = '%s:%s' % (ip, port)
        proxy_handler = urllib2.ProxyHandler({'http': addr})
        proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
        opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
        # print opener.open('http://20140507.ip138.com/ic.asp', timeout=3).read()
        print opener.open('http://www.twitter.com', timeout=4).read()
        return True
    except Exception as e:
        return False


# load from file: ip:port
def loadProxies(fname):
    with open(fname) as fp:
        result = []
        for line in fp:
            line = line.strip()
            if line == '':
                continue
            segments = line.split(':')
            ip = segments[0]
            port = segments[1]
            if checkProxy(ip, port):
                result.append({ip: port})
    print result


if __name__ == '__main__':
    timer = Timer()
    setupOpener()
    fetchProxies('America')
    # loadProxies('proxy2.txt')
    timer.stop()
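The `ping` check in the listing only shows that a host is up; it does not show that the proxy port accepts connections, and it shells out to a command that is not available everywhere. One possible alternative, sketched here in Python 3 (the listing is Python 2), is a plain TCP connect test on the proxy port. This is a swapped-in technique, not the author's method, and the names `parseProxyLine` and `portOpen` are hypothetical.

```python
import socket


def parseProxyLine(line):
    """Split an 'ip:port' line into (ip, port); return None if it is malformed."""
    line = line.strip()
    if not line or ':' not in line:
        return None
    ip, _, port = line.partition(':')
    if not ip or not port.isdigit():
        return None
    return ip, int(port)


def portOpen(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A caller could parse each line of a proxy file with `parseProxyLine` and keep only the entries for which `portOpen` returns True before running the slower HTTP check. An open port still does not prove the host is a working proxy, so the HTTP test remains the final check.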
Copyright © 2004 - 2024 dezai.cn. All Rights Reserved