代码语言
.
CSharp
.
JS
Java
Asp.Net
C
MSSQL
PHP
Css
PLSQL
Python
Shell
EBS
ASP
Perl
ObjC
VB.Net
VBS
MYSQL
GO
Delphi
AS
DB2
Domino
Rails
ActionScript
Scala
代码分类
文件
系统
字符串
数据库
网络相关
图形/GUI
多媒体
算法
游戏
Jquery
Extjs
Android
HTML5
菜单
网页交互
WinForm
控件
企业应用
安全与加密
脚本/批处理
开放平台
其它
【
Sqoop
】
用SQOOP从SQLSERVER导入数据到(HDFS,HIVE,HBASE)
作者:
筱雨
/ 发布于
2017/3/9
/
700
假定: 用户database:InsideTSQL2008, tablename:Employees 注意:如果需要导入的table “No primary key could be found”, 则需要“specify one with --split-by or perform a quential import with '-m 1'.” 准备:从Microsoft官网下载jdbc驱动, 将sqljdbc_3.0/enu/下的sqljdbc4.jar和sqljdbc.jar包导入到$SQOOP_HOME/lib下。 下载地址:http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=2505 1. import all-tables ${Installation}/bin/sqoop import-all-tables\ --connect 'jdbc:sqlserver://172.16.102.30; username=sa;\ password=1qaz!QAZ; database=InsideTSQL2008' -m 1 \ --target-dir /user/grid/mydb/ 2. import some tables ${Installation}/bin/sqoop import --verbose \ --driver com.microsoft.sqlserver.jdbc.SQLServerDriver\ --connect 'jdbc:sqlserver://172.16.102.30; username=sa;\ password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees --target-dir /user/grid/Employees_db -m 1 3. list-tables $SQOOP_HOME/bin/sqoop list-tables \ --connect 'jdbc:sqlserver://172.16.102.30; username=sa; \ password=1qaz!QAZ; database=InsideTSQL2008' 4. create-hive-table sqoop import --warehouse-dir "/user/hive/MyWareHouse" \ --hive-overwrite --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa;password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees --hive-table tblFoo --hive-import -m 1 注意:导入hive以后因为Hive官方默认的配置问题,是看不到已导入的tables的,需要做如下修改: $HIVE_HOME/conf/hive-site.xml中修改配置如下: 修改一 将: <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:derby:;databaseName=metastore_db;create=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> 修改为: <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:derby:;databaseName=/home/grid/hive/metastore_db;create=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> 注意:databaseName的地址为metastore_db存放的地方,默认在$HIVE_HOME/下面。 修改二 官方配置文档有错误,做如下修改: 将: <property> <name>hive.server2.thrift.sasl.qop</name> <value>auth</auth> 跟正为: <property> <name>hive.server2.thrift.sasl.qop</name> <value>auth</value> 修改以上两项后重启hive 即可。 5. list-databases sqoop list-databases --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008' 6. list-columns-of-a-table sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --table Employees --columns "empid, lastname, firstname, title"\ -m 1 7. Storing data in SequenceFiles, and setting the generated class name to com.foo.Employee sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --table Employees --class-name com.foo.Employee \ --as-sequencefile -m 1 8.Specifying the delimiters to use in a text-mode import sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees --fields-terminated-by ',' \ --lines-terminated-by '\n' --optionally-enclosed-by '\"' -m 9. import data like "select ... where..." and with user-defined-delimiters sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees --where "mgrid = 5" --fields-terminated-by ','\ --lines-terminated-by '\n' --optionally-enclosed-by '\"' -m 1 10. Changing the splitting column from the default(default is the primary key) sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --table Employees --where "mgrid = 5" --split-by empi\ --fields-terminated-by ',' --lines-terminated-by '\n' \ --optionally-enclosed-by '\"' 11. append data to an existed table qoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees --where "mgrid <> 5" --fields-terminated-by ','\ --lines-terminated-by '\n' --optionally-enclosed-by '\"' \ --split-by empid --append 12 uses validation to validate the import using the table row count and number of rows copied into HDFS sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --table Employees --validate -m 1 13. Query Imports sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --query "SELECT a.empid, a.lastname FROM Employees AS a WHERE\ a.empid > 2 AND \$CONDITIONS" -split-by a.empid \ --target-dir /user/grid/query_results 注意,--target-dir,-split-by(或m 1)不能少,$CONDITIONS必须有,且为保险起见加反斜杠; 程序执行时会将以上命令展开为: SELECT a.empid, a.lastname FROM Employees AS a WHERE a.empid > 2 AND (1 = 0) 14. delete-target-dir sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\ --table Employees -m 1 --delete-target-dir Delete the import target directory if it exists. 注意,不需要指定要删除哪个directory, “delete-target-dir”删除的就是需要import数据的那个directory 15. codegen sqoop codegen --connect 'jdbc:sqlserver://172.16.102.30;\ username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \ --table Employees 注意 codegen的argument, 不含delete-target-dir, 请参考codegen的common argument, codegen arguments, Output line formatting arguments, Input parsing arguments和Hive arguments 16. import tables with sqoop class in programs code from : http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.sqoop/sqoop/1.3.0-cdh3u1/com/cloudera/sqoop/manager/SQLServerManager.java package com.cloudera.sqoop.manager; import java.io.IOException; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import com.cloudera.sqoop.SqoopOptions; import com.cloudera.sqoop.mapreduce.JdbcExportJob; import com.cloudera.sqoop.mapreduce.SQLServerExportOutputFormat; import com.cloudera.sqoop.util.ExportException; //Manages connections to SQLServer databases. Requires the SQLServer JDBC driver. public class [More ...] SQLServerManager extends GenericJdbcManager { public static final Log LOG = LogFactory.getLog( SQLServerManager.class.getName()); // driver class to ensure is loaded when making db connection. private static final String DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"; public [More ...] SQLServerManager(final SqoopOptions opts) { super(DRIVER_CLASS, opts); } // Export data stored in HDFS into a table in a database. @Override public void [More ...] exportTable(ExportJobContext context) throws IOException, ExportException { context.setConnManager(this); JdbcExportJob exportJob = new JdbcExportJob(context, null, null, SQLServerExportOutputFormat.class); exportJob.runExport(); } // SQLServer does not support the CURRENT_TIMESTAMP() function. Instead it has the notion of keyword CURRENT_TIMESTAMP that resolves to the current time stamp for the database system. @Override public String [More ...] getCurTimestampQuery() { return "SELECT CURRENT_TIMESTAMP"; } }
试试其它关键字
同语言下
.
通过sqoop增量传送oracle数据到hive
.
列表oracle下的所有表
.
rdbms data to hdfs
.
sqoop command –help
.
sqoop help
.
重写已经存在hive表的数据
.
自动建立hive表,并且导入数据到相应目录
.
删除已存在表
.
测试连接
.
用SQOOP从SQLSERVER导入数据到(HDFS,HIVE,HBASE)
可能有用的
.
通过sqoop增量传送oracle数据到hive
.
列表oracle下的所有表
.
rdbms data to hdfs
.
sqoop command –help
.
sqoop help
.
重写已经存在hive表的数据
.
自动建立hive表,并且导入数据到相应目录
.
删除已存在表
.
测试连接
.
用SQOOP从SQLSERVER导入数据到(HDFS,HIVE,HBASE)
筱雨
贡献的其它代码
(
28
)
.
查看存储过程修改时间,最近执行时间
.
json下钻函数,解决多层嵌套问题
.
向上公告走马灯
.
JAVA发送HttpClient请求及接收请求结果过程
.
根据图片在ftp路径获取InputStream流
.
断点续传(springMvc),可支持html5在线播放
.
IE提交表单记录历史,点击返回信息仍在。
.
用SQOOP从SQLSERVER导入数据到(HDFS,HIVE,HBASE)
.
泛型KMP算法
.
iframe框架根据内容自动伸缩高度
Copyright © 2004 - 2024 dezai.cn. All Rights Reserved
站长博客
粤ICP备13059550号-3