HDFS跨集群数据迁移

迁移仅hdfs数据至新集群

hadoop distcp -skipcrccheck -update hdfs://xxx.xxx.xxx.xxx:8020/user/risk hdfs://xxx.xxx.xxx.xxx:8020/user/
  # 选项
  # -skipcrccheck 因本次迁移涉及低版本迁移高版本, 如果Hadoop版本则不需要
  # -update 增量更新, 通过名称和大小比较,源与目标不同则更新
  # -m 表示启用map的最大数量
  # -delete:删除已经存在的目标文件，不会删除源文件。这个删除是通过FS Shell实现的。所以如果垃圾回收机制启动的话，删除的目标文件会进入trash。
  # -i：忽略失败。这个选项会比默认情况提供关于拷贝的更精确的统计， 同时它还将保留失败拷贝操作的日志，这些日志信息可以用于调试。最后，如果一个map失败了，但并没完成所有分块任务的尝试，这不会导致整个作业的失败。
  # -overwrite：覆盖目标。如果一个map失败并且没有使用-i选项，不仅仅那些拷贝失败的文件，这个分块任务中的所有文件都会被重新拷贝。 所以这就是为什么要使用-i参数。
  # -su <user,paswd>  -du <user,paswd>  指定不同集群的用户名密码
  # -p[rbugp]       Preserve status
  #                        r: replication number
  #                        b: block size
  #                        u: user
  #                        g: group
  #                        p: permission
  #                        -p alone is equivalent to -prbugp
  # -log <logdir>          Write logs to <logdir>
  # -f <urilist_uri>       Use list at <urilist_uri> as src list
  # -filelimit <n>         Limit the total number of files to be <= n
  # -sizelimit <n>         Limit the total size to be <= n bytes
  # -mapredSslConf <f>     Filename of SSL configuration for mapper task 

迁移仅hive metastore数据至新集群

数据导出(mysql导出)

mysqldump -u root -p’密码’--skip-lock-tables -h xxx.xxx.xxx.xxx hive > mysql_hive.sql
mysqldump -uroot -p --database hive > mysql_hive_data.sql # (我的环境操作)

数据导入(mysql导入)

mysql -u root -proot --default-character-set=utf8 hvie < mysql_hive.sql
mysql -uroot -p < mysql_data.sql # (我的环境操作)

升级hive内容库(如果hive版本需要升级操作，同版本不需要操作)

mysql -uroot -proot risk -hxxx.xxx.xxx.xxx < mysqlupgrade-0.13.0-to-0.14.0.mysql.sql
mysql -uroot -proot risk -hxxx.xxx.xxx.xxx < mysqlupgrade-0.14.0-to-1.1.0.mysql.sql

版本要依据版本序列升序升级,不可跨越版本，如当前是hive0.12打算升级到0.14，需要先升级到0.13再升级到0.14

修改 metastore 内容库的集群信息（重要）

因为夸集群，hdfs访问的名字可能变化了，所以需要修改下hive库中的表DBS和SDS内容，除非你的集群名字或者HA的名字跟之前的一致这个就不用修改了

update DBS set DB_LOCATION_URI = replace(DB_LOCATION_URI,'hdfs://源1','hdfs://目标') ;
update SDS set LOCATION = replace(LOCATION ,'hdfs://ns2','hdfs://adhoc') ;

如果操作，我这里需要将hdfs://HACluster修改为hdfs://HACluster_New，我为了操作简单，新集群HA起了同样的名字hdfs://HACluster