[Python] Processing JSON logs with Python and importing them into Hive
Published 2016/11/23
[spark@Master Py_logproc]$ pwd
/home/spark/opt/Log_Data/Py_logproc
[spark@Master Py_logproc]$ cat json2hive_python_recordasarray_basic.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script. Each line of the log file holds a JSON array of records;
# every record without a '_reason' key is inserted into the Hive table
# yemao_logpy through the HiveServer Thrift interface.
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
import json
import warnings
warnings.filterwarnings("ignore")

def hiveExe(sql):
    # Open a Thrift connection to 127.0.0.1:10000, run one statement, close.
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        transport.close()
    except Thrift.TException as tx:
        print '%s' % (tx.message)

if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
    if len(sys.argv) == 1:
        # The log date is required; stop here instead of failing later.
        print "need argv"
        sys.exit(1)
    print sys.argv
    log_date = int(sys.argv[1])
    with open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog') as log_file:
        for json_array in log_file:
            yemao_array = json.loads(json_array)
            for yemao in yemao_array:
                print yemao['time']
                if '_reason' not in yemao:
                    # Required fields: a missing key raises KeyError.
                    id = yemao['id']
                    time = yemao['time']
                    url_from = yemao['url_from']
                    url_current = yemao['url_current']
                    url_to = yemao['url_to']
                    options = yemao['options']
                    ip = yemao['ip']
                    uid = yemao['uid']
                    new_visitor = yemao['new_visitor']
                    province = yemao['province']
                    city = yemao['city']
                    site = yemao['site']
                    device = yemao['device']
                    browser = yemao['browser']
                    phone = yemao['phone']
                    token = yemao['token']
                    dorm = yemao['dorm']
                    order_phone = yemao['order_phone']
                    order_dormitory = yemao['order_dormitory']
                    order_amount = yemao['order_amount']
                    order_id = yemao['order_id']
                    uname = yemao['uname']
                    site_id = yemao['site_id']
                    address = yemao['address']
                    dorm_id = yemao['dorm_id']
                    dormentry_id = yemao['dormentry_id']
                    tag = yemao['tag']
                    rid = yemao['rid']
                    cart_quantity = yemao['cart_quantity']
                    response = yemao['response']
                    paytype = yemao['paytype']
                    # Optional fields default to '0' when absent.
                    data = '"' + str(yemao.get('data', '0')) + '"'
                    info = yemao.get('info', '0')
                    status = yemao.get('status', '0')
                    # Values are listed in the same order as the columns
                    # (url_to before url_current); the original passed
                    # url_current and url_to swapped.
                    insert_sql = "insert into yemao_logpy(id,time,url_from,url_to,url_current,ip,dorm_id,browser,log_date) values ('%s', '%s', '%s','%s','%s','%s','%s','%s', %d)" % (id, time, url_from, url_to, url_current, ip, dorm_id, browser, log_date)
                    print insert_sql
                    hiveExe(insert_sql)
    print 'yemao_array_python2hive done'
[spark@Master Py_logproc]$
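The script above builds each INSERT statement by plain string formatting, so a single quote inside a URL or browser string would break the statement. The following is a minimal sketch, not part of the original script, of how the parsing and statement-building steps could be separated from the Hive call and tested without a server. The names COLUMNS, sql_quote, iter_records and build_insert are hypothetical helpers introduced here for illustration, and the backslash escaping assumes Hive's usual backslash-escaped string literals.

```python
import json

# Columns written by the original INSERT statement, in table order.
COLUMNS = ("id", "time", "url_from", "url_to", "url_current",
           "ip", "dorm_id", "browser")


def sql_quote(value):
    """Render a value as a single-quoted string literal (hypothetical helper)."""
    # Leave text values as they are; convert numbers and other types with str().
    text = value if isinstance(value, type(u"")) else str(value)
    # Escape backslashes first, then single quotes, so a quote inside a URL
    # cannot end the literal early.
    text = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def iter_records(lines):
    """Yield records from lines that each hold a JSON array of objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        for record in json.loads(line):
            # Like the original script, skip records carrying a '_reason' key.
            if "_reason" not in record:
                yield record


def build_insert(record, log_date, table="yemao_logpy"):
    """Build one INSERT statement with values in the same order as COLUMNS."""
    values = [sql_quote(record[name]) for name in COLUMNS]
    values.append("%d" % int(log_date))
    return "insert into %s(%s,log_date) values (%s)" % (
        table, ",".join(COLUMNS), ", ".join(values))
```

Because the column names and the values both come from COLUMNS, they cannot drift out of order. Each string returned by build_insert could then be passed to hiveExe from the script above.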
Copyright © 2004 - 2024 dezai.cn. All Rights Reserved