sqoop

常用命令

命令	类	说明
import	ImportTool	将数据导入到集群
export	ExportTool	将集群数据导出
codegen	CodeGenTool	获取数据库中某张表数据生成Java并打包Jar
create-hive-table	CreateHiveTableTool	创建Hive表
eval	EvalSqlTool	查看SQL执行结果
import-all-tables	ImportAllTablesTool	导入某个数据库下所有表到HDFS中
job	JobTool	用来生成一个sqoop的任务，生成后，该任务并不执行，除非使用命令执行该任务。
list-databases	ListDatabasesTool	列出所有数据库名
list-tables	ListTablesTool	列出某个数据库下所有表
merge	MergeTool	将HDFS中不同目录下面的数据合在一起，并存放在指定的目录中
metastore	MetastoreTool	记录sqoop job的元数据信息，如果不启动metastore实例，则默认的元数据存储目录为：.sqoop，如果要更改存储目录，可以在配置文件sqoop-site.xml中进行更改。
help	HelpTool	打印sqoop帮助信息	~/
version	VersionTool	打印sqoop版本信息

命令对应的参数

import 对应参数

公共参数

参数	说明
–charset-from
–charset-to
–connect	指定 JDBC 连接的 uri
–connection-manager	指定要使用的连接管理类
–connection-param-file	Specify connection parameters file
–driver	Manually specify JDBC driver class to use
–g200
–g200-connect
–g200-database
–g200-password
–g200-schema
–g200-table
–g200-username
–hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
–hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
–help	打印帮助信息
–khan
–khan-connect
–khan-database
–khan-password
–khan-schema
–khan-table
–khan-username
–mongodb
–mongodb-connect
–mongodb-override
-P	从控制台输入密码
–password	连接数据库的密码
–password-file	Set authentication password file path
–relaxed-isolation	Use read-uncommitted isolation for imports
–skip-dist-cache	Skip copying jars to distributed cache
–username	连接数据库的用户名
–verbose	在控制台打印出详细信息

import控制参数

参数	说明
–append	将数据追加到hdfs中已经存在的dataset中。使用该参数，sqoop将把数据先导入到一个临时目录中，然后重新给文件命名到一个正式的目录中，以避免和该目录中已存在的文件重名。
–as-avrodatafile	将数据导入到一个Avro数据文件中
–as-sequencefile	将数据导入到一个sequence文件中
–as-textfile	将数据导入到一个普通文本文件中，生成该文本文件后，可以在hive中通过sql语句查询出结果。
–boundary-query	边界查询，也就是在导入前先通过SQL查询得到一个结果集，然后导入的数据就是该结果集内的数据，格式如：–boundary-query ‘select id,creationdate from person where id = 3’，表示导入的数据为id=3的记录，或者select min(), max() from <table name>，注意查询的字段中不能有数据类型为字符串的字段，否则会报错：java.sql.SQLException: Invalid value for getLong() 目前问题原因还未知
–columns <col,col,col…>	指定要导入的字段值，格式如：–columns id,username
–compression-codec	Hadoop压缩编码，默认是gzip
–delete-target-dir	Imports data in delete mode
–direct	快速模式，利用了数据库的导入工具，如mysql的mysqlimport，可以比jdbc连接的方式更为高效的将数据导入到关系数据库中。
–direct-split-size	在使用上面direct直接导入的基础上，对导入的流按字节数分块，特别是使用直连模式从PostgreSQL导入数据的时候，可以将一个到达设定大小的文件分为几个独立的文件。
-e,–query	从查询结果中导入数据，该参数使用时必须指定–target-dir、–hive-table，在查询语句中一定要有where条件且在where条件中需要包含$CONDITIONS，示例：–query ‘select * from person where $CONDITIONS ‘ –target-dir
–fetch-size	Set number ‘n’ of rows to fetch from the database when more rows are needed
–inline-lob-limit	设定大对象数据类型的最大值
-m,–num-mappers	启动N个map来并行导入数据，默认是4个，最好不要将数字设置为高于集群的最大Map数
–mapreduce-job-name	Set name for generated mapreduce job
–merge-key	Key column to use to join results
–split-by	表的列名，用来切分工作单元，一般后面跟主键ID
–table	指定关系数据库的表名
–target-dir	指定hdfs路径
–validate	Validate the copy using the configured validator
–validation-failurehandler	Fully qualified class name for ValidationFa ilureHandler
–validation-threshold	Fully qualified class name for ValidationTh reshold
–validator	Fully qualified class name for the Validator
–warehouse-dir	与–target-dir不能同时使用，指定数据导入的存放目录，适用于hdfs导入，不适合导入hive目录
–where	从关系数据库导入数据时的查询条件，示例：–where ‘id = 2′
-z,–compress	压缩参数，默认情况下数据是没被压缩的，通过该参数可以使用gzip压缩算法对数据进行压缩，适用于SequenceFile, text文本文件, 和Avro文件

增量导入参数

参数	说明
–check-column	Source column to check for incremental change
–incremental	Define an incremental import of type ‘append’ or ‘lastmodified’
–last-value	Last imported value in the incremental check column

输出行格式参数

参数	说明
–enclosed-by	给字段值前加上指定的字符
–escaped-by	对字段中的双引号加转义符
–fields-terminated-by	设定每个字段是以什么符号作为结束，默认为逗号
–lines-terminated-by	设定每行记录之间的分隔符，默认是\n
–mysql-delimiters	Mysql默认的分隔符设置，字段之间以逗号分隔，行之间以\n分隔，默认转义符是\，字段值以单引号包裹。
–optionally-enclosed-by	给带有双引号或单引号的字段值前后加上指定字符。

输入解析参数

参数	说明
–input-enclosed-by	对字段值前后加上指定字符
–input-escaped-by	对含有转移符的字段做转义处理
–input-fields-terminated-by	字段之间的分隔符
–input-lines-terminated-by	行之间的分隔符
–input-optionally-enclosed-by	给带有双引号或单引号的字段前后加上指定字符

Hive 参数

参数	说明
–create-hive-table	默认是false，即，如果目标表已经存在了，那么创建任务失败。
–hive-database	Sets the database name to use when importing to hive
–hive-delims-replacement	用自定义的字符串替换掉数据中的\r\n和\013 \010等字符
–hive-drop-import-delims	在导入数据到hive时，去掉数据中的\r\n\013\010这样的字符
–hive-home	hive的安装目录，可以通过该参数覆盖之前默认配置的目录
–hive-import	将数据从关系数据库中导入到hive表中
–hive-overwrite	覆盖掉在hive表中已经存在的数据
–hive-partition-key	创建分区，后面直接跟分区名，分区字段的默认类型为string
–hive-partition-value	导入数据时，指定某个分区的值
–hive-table	后面接要创建的hive表,默认使用MySQL的表名
–map-column-hive	生成hive表时，可以更改生成字段的数据类型
–table	指定关系数据库的表名

HBase 参数

参数	说明
–column-family	Sets the target column family for the import
–hbase-bulkload	Enables HBase bulk loading
–hbase-create-table	If specified, create missing HBase tables
–hbase-row-key <col>	Specifies which input column to use as the row key
–hbase-table <table>	Import to <table> in HBase

HCatalog 参数

参数	说明
–hcatalog-database	HCatalog 数据库名称
–hcatalog-home	覆盖系统默认的 $HCAT_HOME
–hcatalog-partition-keys	Sets the partition keys to use when importing to hive
–hcatalog-partition-values	Sets the partition values to use when importing to hive
–hcatalog-table	HCatalog table name
–hive-home	hive的安装目录，可以通过该参数覆盖之前默认配置的目录
–hive-partition-key	创建分区，后面直接跟分区名，分区字段的默认类型为string
–hive-partition-value	导入数据时，指定某个分区的值
–map-column-hive	生成hive表时，可以更改生成字段的数据类型

HCatalog 导入特定选项

参数	说明
–create-hcatalog-table	Create HCatalog before import
–hcatalog-storage-stanza	HCatalog storage stanza for table creation

Accumulo 参数

参数	说明
–accumulo-batch-size	Batch size in bytes
–accumulo-column-family	Sets the target column family for the import
–accumulo-create-table	If specified, create missing Accumulo tables
–accumulo-instance	Accumulo instance name.
–accumulo-max-latency	Max write latency in milliseconds
–accumulo-password	Accumulo password.
–accumulo-row-key <col>	Specifies which input column to use as the row key
–accumulo-table <table>	Import to <table> in Accumulo
–accumulo-user	Accumulo user name.
–accumulo-visibility	Visibility token to be applied to all rows imported
–accumulo-zookeepers	Comma-separated list of zookeepers (host:port)

代码生成参数

参数	说明
–bindir	指定生成的java文件、编译成的class文件及将生成文件打包为JAR的JAR包文件输出路径
–class-name	设定生成的Java文件指定的名称. This overrides –package-name. When combined with –jar-file, sets the input class.
–input-null-non-string	在生成的java文件中，可以将null非字符串类型设为想要设定的值（比如空字符串’’）
–input-null-string	在生成的java文件中，可以将null字符串设为想要设定的值（比如空字符串’’）
–jar-file	Disable code generation; use specified jar
–map-column-java	数据库字段在生成的java文件中会映射为各种属性，且默认的数据类型与数据库类型保持对应，比如数据库中某字段的类型为bigint，则在Java文件中的数据类型为long型，通过这个属性，可以改变数据库字段在java中映射的数据类型，格式如：–map-column-java DB_ID=String,id=Integer
–null-non-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null非字符串类型设置为其它值如ddd，TBL_ID==null?”ddd”:””
–null-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null字符串设置为其它值如ddd，TBL_ID==null?”ddd”:””
–outdir	生成的java文件存放路径
–package-name	放置自动生成的 classes 到这个包，如cn.cnnic，则会生成cn和cnnic两级目录，生成的文件（如java文件）就存放在cnnic目录里

export 对应参数

公共参数

参数	说明
–charset-from
–charset-to
–connect	连接关系型数据库的URL
–connection-manager	指定要使用的连接管理类
–connection-param-file	Specify connection parameters file
–driver	Manually specify JDBC driver class to use
–g200
–g200-connect
–g200-database
–g200-password
–g200-schema
–g200-table
–g200-username
–hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
–hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
–help	打印帮助信息
–khan
–khan-connect
–khan-database
–khan-password
–khan-schema
–khan-table
–khan-username
–mongodb
–mongodb-connect
–mongodb-override
-P	Read password from console
–password	连接数据库的密码
–password-file	Set authentication password file path
–relaxed-isolation	Use read-uncommitted isolation for imports
–skip-dist-cache	Skip copying jars to distributed cache
–username	连接数据库的用户名
–verbose	在控制台打印出详细信息

导出控制参数

参数	说明
–batch	该模式用于执行基本语句（暂时还不太清楚含义）
–call	Populate the table using this stored procedure (one call per row)
–clear-staging-table	如果该staging-table非空，则通过该参数可以在运行导入前清除staging-table里的数据。
–columns <col,col,col…>	指定要导入的字段值，格式如：–columns id,username
–direct	快速模式，利用了数据库的导入工具，如mysql的mysqlimport，可以比jdbc连接的方式更为高效的将数据导入到关系数据库中。
–export-dir	存放数据的HDFS的源目录
-m,–num-mappers	启动N个map来并行导入数据，默认是4个，最好不要将数字设置为高于集群的最大Map数
–mapreduce-job-name	Set name for generated mapreduce job
–staging-table	该参数是用来保证在数据导入关系数据库表的过程中事务安全性的，因为在导入的过程中可能会有多个事务，那么一个事务失败会影响到其它事务，比如导入的数据会出现错误或出现重复的记录等等情况，那么通过该参数可以避免这种情况。创建一个与导入目标表同样的数据结构，保留该表为空在运行数据导入前，所有事务会将结果先存放在该表中，然后最后由该表通过一次事务将结果写入到目标表中。
–table	指定关系数据库的表名
–truncate-table	Indicates that any data in staging table can be deleted
–update-key	后面接条件列名，通过该参数，可以将关系数据库中已经存在的数据进行更新操作，类似于关系数据库中的update操作
–update-mode	更新模式，有两个值updateonly和默认的allowinsert，该参数只能是在关系数据表里不存在要导入的记录时才能使用，比如要导入的hdfs中有一条id=1的记录，如果在表里已经有一条记录id=2，那么更新会失败。
–validate	Validate the copy using the configured validator
–validation-failurehandler	Fully qualified class name for ValidationFa ilureHandler
–validation-threshold	Fully qualified class name for ValidationTh reshold
–validator	Fully qualified class name for the Validator

输入解析参数

参数	说明
–input-enclosed-by	对字段值前后加上指定字符
–input-escaped-by	对含有转移符的字段做转义处理
–input-fields-terminated-by	字段之间的分隔符
–input-lines-terminated-by	行之间的分隔符
–input-optionally-enclosed-by	给带有双引号或单引号的字段前后加上指定字符

输出行格式参数

参数	说明
–enclosed-by	给字段值前加上指定的字符
–escaped-by	对字段中的双引号加转义符
–fields-terminated-by	设定每个字段是以什么符号作为结束，默认为逗号
–lines-terminated-by	设定每行记录之间的分隔符，默认是\n
–mysql-delimiters	Mysql默认的分隔符设置，字段之间以逗号分隔，行之间以\n分隔，默认转义符是\，字段值以单引号包裹。
–optionally-enclosed-by	给带有双引号或单引号的字段值前后加上指定字符。

代码生成参数

参数	说明
–bindir	指定生成的java文件、编译成的class文件及将生成文件打包为JAR的JAR包文件输出路径
–class-name	设定生成的Java文件指定的名称. This overrides –package-name. When combined with –jar-file, sets the input class.
–input-null-non-string	在生成的java文件中，可以将null非字符串类型设为想要设定的值（比如空字符串’’）
–input-null-string	在生成的java文件中，可以将null字符串设为想要设定的值（比如空字符串’’）
–jar-file	Disable code generation; use specified jar
–map-column-java	数据库字段在生成的java文件中会映射为各种属性，且默认的数据类型与数据库类型保持对应，比如数据库中某字段的类型为bigint，则在Java文件中的数据类型为long型，通过这个属性，可以改变数据库字段在java中映射的数据类型，格式如：–map-column-java DB_ID=String,id=Integer
–null-non-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null非字符串类型设置为其它值如ddd，TBL_ID==null?”ddd”:””
–null-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null字符串设置为其它值如ddd，TBL_ID==null?”ddd”:””
–outdir	生成的java文件存放路径
–package-name	放置自动生成的 classes 到这个包，如cn.cnnic，则会生成cn和cnnic两级目录，生成的文件（如java文件）就存放在cnnic目录里

HCatalog 参数

参数	说明
–hcatalog-database	HCatalog 数据库名称
–hcatalog-home	覆盖系统默认的 $HCAT_HOME
–hcatalog-partition-keys	Sets the partition keys to use when importing to hive
–hcatalog-partition-values	Sets the partition values to use when importing to hive
–hcatalog-table	HCatalog table name
–hive-home	hive的安装目录，可以通过该参数覆盖之前默认配置的目录
–hive-partition-key	创建分区，后面直接跟分区名，分区字段的默认类型为string
–hive-partition-value	导入数据时，指定某个分区的值
–map-column-hive	生成hive表时，可以更改生成字段的数据类型

codegen 对应参数

公共参数

参数	说明
–charset-from
–charset-to
–connect	Specify JDBC connect string
–connection-manager	Specify connection manager class name
–connection-param-file	Specify connection parameters file
–driver	Manually specify JDBC driver class to use
–g200
–g200-connect
–g200-database
–g200-password
–g200-schema
–g200-table
–g200-username
–hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
–hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
–help	Print usage instructions
–khan
–khan-connect
–khan-database
–khan-password
–khan-schema
–khan-table
–khan-username
–mongodb
–mongodb-connect
–mongodb-override
-P	Read password from console
–password	Set authentication password
–password-file	Set authentication password file path
–relaxed-isolation	Use read-uncommitted isolation for imports
–skip-dist-cache	Skip copying jars to distributed cache
–username	Set authentication username
–verbose	Print more information while working

代码生成参数

参数	说明
–bindir	指定生成的java文件、编译成的class文件及将生成文件打包为JAR的JAR包文件输出路径
–class-name	设定生成的Java文件指定的名称. This overrides –package-name. When combined with –jar-file, sets the input class.
-e,–query	从查询结果中导入数据，该参数使用时必须指定–target-dir、–hive-table，在查询语句中一定要有where条件且在where条件中需要包含$CONDITIONS，示例：–query ‘select * from person where $CONDITIONS ‘ –target-dir
–input-null-non-string	在生成的java文件中，可以将null非字符串类型设为想要设定的值（比如空字符串’’）
–input-null-string	在生成的java文件中，可以将null字符串设为想要设定的值（比如空字符串’’）
–map-column-java	数据库字段在生成的java文件中会映射为各种属性，且默认的数据类型与数据库类型保持对应，比如数据库中某字段的类型为bigint，则在Java文件中的数据类型为long型，通过这个属性，可以改变数据库字段在java中映射的数据类型，格式如：–map-column-java DB_ID=String,id=Integer
–null-non-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null非字符串类型设置为其它值如ddd，TBL_ID==null?”ddd”:””
–null-string	在生成的java文件中，比如TBL_ID==null?”null”:””，通过这个属性设置可以将null字符串设置为其它值如ddd，TBL_ID==null?”ddd”:””
–outdir	生成的java文件存放路径
–package-name	放置自动生成的 classes 到这个包，如cn.cnnic，则会生成cn和cnnic两级目录，生成的文件（如java文件）就存放在cnnic目录里
–table	Table to generate code for

输出行格式参数

参数	说明
–enclosed-by	Sets a required field enclosing character
–escaped-by	Sets the escape character
–fields-terminated-by	Sets the field separator character
–lines-terminated-by	Sets the end-of-line character
–mysql-delimiters	Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ‘
–optionally-enclosed-by	Sets a field enclosing character

输入解析参数

参数	说明
–input-enclosed-by	Sets a required field encloser
–input-escaped-by	Sets the input escape character
–input-fields-terminated-by	Sets the input field separator
–input-lines-terminated-by	Sets the input end-of-line char
–input-optionally-enclosed-by	Sets a field enclosing character

Hive 参数

参数	说明
–create-hive-table	Fail if the target hive table exists
–hive-database	Sets the database name to use when importing to hive
–hive-delims-replacement	Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
–hive-drop-import-delims	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
–hive-home	Override $HIVE_HOME
–hive-import	Import tables into Hive (Uses Hive’s default delimiters if none are set.)
–hive-overwrite	Overwrite existing data in the Hive table
–hive-partition-key	Sets the partition key to use when importing to hive
–hive-partition-value	Sets the partition value to use when importing to hive
–hive-table	Sets the table name to use when importing to hive
–map-column-hive	Override mapping for specific column to hive types.

HCatalog 参数

参数	说明
–hcatalog-database	HCatalog 数据库名称
–hcatalog-home	覆盖系统默认的 $HCAT_HOME
–hcatalog-partition-keys	Sets the partition keys to use when importing to hive
–hcatalog-partition-values	Sets the partition values to use when importing to hive
–hcatalog-table	HCatalog table name
–hive-home	Override $HIVE_HOME
–hive-partition-key	Sets the partition key to use when importing to hive
–hive-partition-value	Sets the partition value to use when importing to hive
–map-column-hive	Override mapping for specific column to hive types.

sqoop

常用命令

命令对应的参数

import 对应参数

公共参数

import控制参数

增量导入参数

输出行格式参数

输入解析参数

Hive 参数

HBase 参数

HCatalog 参数

HCatalog 导入特定选项

Accumulo 参数

代码生成参数

export 对应参数

公共参数

导出控制参数

输入解析参数

输出行格式参数

代码生成参数

HCatalog 参数

codegen 对应参数

公共参数

代码生成参数

输出行格式参数

输入解析参数

Hive 参数

HCatalog 参数

目录