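Installing the Hadoop ecosystem components Hive and Sqoop, and using Sqoop to move data from HDFS/Hive into the relational database MySQL
Hive normally relies on a relational database such as MySQL to store its metadata, so install MySQL first
$: yum install mysql-server mysql-client [install with yum]
$: /etc/init.d/mysqld start [start the MySQL service]
$: mysql [log in with the MySQL client]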
mysql> create database hive;
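The hive-site.xml settings in this post connect as user hive with password hive, but the step above only creates the database. Here is a minimal sketch of the account setup, assuming that user name and password and a client connecting from localhost. The function name metastore_sql is only illustrative, and CREATE USER will fail if the account already exists.

```shell
# Print the SQL that prepares a MySQL account for the Hive metastore.
# Assumptions: database "hive", user "hive", password "hive", client host "localhost".
metastore_sql() {
  ms_db="${1:-hive}"
  ms_user="${2:-hive}"
  ms_pass="${3:-hive}"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${ms_db};
CREATE USER '${ms_user}'@'localhost' IDENTIFIED BY '${ms_pass}';
GRANT ALL PRIVILEGES ON ${ms_db}.* TO '${ms_user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Review the statements, then pipe them into the client as an administrator:
#   metastore_sql | mysql -u root -p
metastore_sql
```

If the server has name resolution disabled, the host part may need to be '127.0.0.1' rather than 'localhost'.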
Installing and configuring Hive
$: tar -zxvf apache-hive-2.1.1-bin.tar.gz
1. Configure environment variables
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
export PATH=$PATH:${HIVE_HOME}/bin
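An export typed at the prompt lasts only for the current shell session. One possible way to make the two lines permanent is to append them to a shell startup file without duplicating them on repeated runs. In this sketch the helper append_once and the choice of ~/.bashrc are assumptions, and the Hive path should match wherever you extracted the archive.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  # $1: target file, $2: line to add
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# PROFILE is an assumption; use whichever startup file your shell reads.
PROFILE="${PROFILE:-$HOME/.bashrc}"

# Usage (single quotes keep $PATH and ${HIVE_HOME} unexpanded in the file):
#   append_once "$PROFILE" 'export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin'
#   append_once "$PROFILE" 'export PATH=$PATH:${HIVE_HOME}/bin'
#   . "$PROFILE"
```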
2. Configure Hive's basic settings
$: cd /home/hadoop/apache-hive-2.1.1-bin/conf
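$: cp hive-default.xml.template hive-site.xml #default configuration
$: cp hive-env.sh.template hive-env.sh #environment configuration file
$: cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties #default logging configuration for query execution
$: cp hive-log4j2.properties.template hive-log4j2.properties #default logging configuration
3. Edit hive-env.sh; for convenience, simply append the following lines at the end: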
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
export HIVE_CONF_DIR=/home/hadoop/apache-hive-2.1.1-bin/conf
Configure hive-site.xml. This file is large; only the properties with the names below need to be set, and the rest can be left as they are
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
These properties mainly configure the driver and connection to MySQL, much like JDBC in Java; make sure they are correct
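Because hive-site.xml is large, it can help to print the values that are actually in effect before running any tool against the metastore. The following is a rough sketch, not a real XML parser: it assumes each <name> and <value> sits on its own line, as in the snippet above, and the helper name get_prop is made up for illustration.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumes <name> and <value> are each on their own line.
get_prop() {
  # $1: path to the XML file, $2: property name
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Usage: show the four connection settings the metastore will use.
#   for k in ConnectionURL ConnectionDriverName ConnectionUserName ConnectionPassword; do
#     echo "$k = $(get_prop "$HIVE_HOME/conf/hive-site.xml" "javax.jdo.option.$k")"
#   done
```

Comparing this output with the MySQL database name, user, and password you created is a quick check against the mismatch errors described later in this post.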
4. Hive uses HDFS as the file system for its data, so create a few storage directories and grant write permission on them
hadoop fs -mkdir -p /home/hive/log
hadoop fs -mkdir -p /home/hive/warehouse
hadoop fs -mkdir -p /home/hive/tmp
hadoop fs -chmod g+w /home/hive/log
hadoop fs -chmod g+w /home/hive/warehouse
hadoop fs -chmod g+w /home/hive/tmp
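The six commands above can be folded into one loop that stops at the first failure. This is only a sketch: HADOOP_CMD is a hypothetical override added here so the loop can be rehearsed with echo on a machine without Hadoop, and the function name is illustrative.

```shell
# Create the Hive directories on HDFS and make them group-writable.
# HADOOP_CMD is a hypothetical override so the loop can be dry-run with "echo".
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

prepare_hive_dirs() {
  for dir in /home/hive/log /home/hive/warehouse /home/hive/tmp; do
    # -p creates missing parent directories and does not fail if the path exists
    $HADOOP_CMD fs -mkdir -p "$dir" || return 1
    $HADOOP_CMD fs -chmod g+w "$dir" || return 1
  done
}

# Dry run:  HADOOP_CMD=echo prepare_hive_dirs
# Real run: prepare_hive_dirs
```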
5. Copy the JDBC driver mysql-connector-java-xxx.jar into the $HIVE_HOME/lib directory
6. Initialize the database
schematool -initSchema -dbType mysql
This step often fails with an error such as:
[root@slave1 bin]# schematool -initSchema -dbType mysql
which: no hbase in (/home/hadoop/sqoop-1.4.6/bin:/home/hadoop/apache-hive-2.1.1-bin/bin:/home/hadoop/hadoop-2.7.3/bin:/home/hadoop/jdk1.8.0_144/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
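In summary, if you follow the steps above, errors at this point usually mean that the MySQL settings in hive-site.xml do not match the actual MySQL setup. Initialization also fails if the metastore was already set up earlier: for example, you started the Hive service and client, ran a number of operations, shut everything down, and then ran the schema initialization again. In that case the simplest fix is to drop the MySQL database that backs Hive and create a new, empty one with the same name. Be aware that this discards the existing Hive metadata.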
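The Duplicate key name 'PCS_STATS_IDX' error in the log above is consistent with schema objects already being present in the database. A minimal sketch of the reset described here follows, assuming the metastore database is named hive; the function name is illustrative, and the MySQL account in the usage comment is an assumption.

```shell
# Print the SQL that recreates an empty metastore database.
# WARNING: dropping the database discards all existing Hive metadata.
reset_metastore_sql() {
  rm_db="${1:-hive}"
  printf 'DROP DATABASE IF EXISTS %s;\nCREATE DATABASE %s;\n' "$rm_db" "$rm_db"
}

# Typical sequence (the mysql account options are assumptions):
#   reset_metastore_sql hive | mysql -u root -p
#   schematool -initSchema -dbType mysql
reset_metastore_sql hive
```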
If the following message appears, the problem lies with the database
java.sql.SQLException: Access denied for user 'root'@'****' (using password: YES)
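This happens because MySQL, when authenticating a login, checks the host column first, then the user column, and then the password column. With my connection settings, it matched the row for that host whose user is empty (an empty user matches any user name). That row has an empty password, while my connection supplied one, so the login was rejected.
Solution: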
mysql> use mysql;
Database changed
mysql> delete from user where user='';
Query OK, 1 row affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
If the following output appears, installation and configuration succeeded
[root@slave1 bin]# schematool -initSchema -dbType mysql
which: no hbase in (/home/hadoop/sqoop-1.4.6/bin:/home/hadoop/apache-hive-2.1.1-bin/bin:/home/hadoop/hadoop-2.7.3/bin:/home/hadoop/jdk1.8.0_144/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
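When the initialization runs from a script, the two marker lines seen in these logs can be checked automatically. The sketch below assumes the output was captured to a file; the function name and the log path are arbitrary choices, and relying on these exact strings is an assumption based only on the output shown in this post.

```shell
# Return 0 only if captured schematool output reports success.
schema_init_ok() {
  # $1: file holding the combined stdout/stderr of schematool
  if grep -q 'schemaTool failed' "$1"; then
    return 1
  fi
  grep -q 'schemaTool completed' "$1"
}

# Usage (the log path is an arbitrary choice):
#   schematool -initSchema -dbType mysql > /tmp/schematool.log 2>&1
#   schema_init_ok /tmp/schematool.log && echo "metastore ready"
```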
7. Start the Hive service and client, then create a database and a table
$: hiveserver2
$: hive
hive> create database dock;
hive> use dock;
hive> create table if not exists dock.dock_tb(
> id varchar(64) COMMENT 'dock id',
> md5code varchar(64) COMMENT 'dock md5 code',
> number varchar(64) COMMENT 'dock number',
> ip varchar(64) COMMENT 'dock ip',
> game varchar(64) COMMENT 'dock game',
> time varchar(64) COMMENT 'dock time',
> day varchar(64) COMMENT 'dock day',
> year varchar(64) COMMENT 'dock year',
> month varchar(64) COMMENT 'dock month',
> type varchar(64) COMMENT 'dock type')
> COMMENT 'Description of the table'
> LOCATION '/data/wscn/dock_test_log/20171101/EtlResult/dockClick';
As you can see, the dock_tb table uses the files under /data/wscn/dock_test_log/20171101/EtlResult/dockClick on HDFS as both its data source and its storage path.
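The CREATE TABLE statement has no ROW FORMAT clause, so Hive reads the files under LOCATION as text using its default field delimiter, the \001 (Ctrl-A) byte. That is the same delimiter passed to Sqoop with --input-fields-terminated-by later in this post. The sketch below builds one made-up row in that layout and counts its fields; the helper names and sample values are hypothetical.

```shell
# Hive's default field delimiter for text tables is the byte \001 (Ctrl-A).
SOH="$(printf '\001')"

# Join all arguments into one row, separated by \001.
join_fields() {
  row="$1"
  shift
  for field in "$@"; do
    row="${row}${SOH}${field}"
  done
  printf '%s\n' "$row"
}

# Report how many \001-separated fields each input line contains.
count_fields() {
  awk -F "$SOH" '{ print NF }'
}

# Ten made-up values, one per column of dock_tb.
join_fields d1 abc123 42 10.0.0.1 chess 12:00:00 01 2017 11 click | count_fields
```

The same count_fields helper could be fed the first few lines of a real file from the table's directory with hadoop fs -cat to confirm that every row has ten fields before exporting.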
Installing and configuring Sqoop
1. Extract the archive and configure environment variables
$: tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$: mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6
$: export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
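$: export PATH=$PATH:${SQOOP_HOME}/bin
2. Configure basic settings
$: cd /home/hadoop/sqoop-1.4.6/conf
$: cp sqoop-env-template.sh sqoop-env.sh
Set the following in sqoop-env.sh; these entries are commented out by default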
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.3
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.3
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
3. Test the installation once configuration is complete
$: sqoop help
If the following output appears, installation and configuration succeeded
Warning: /home/hadoop/sqoop-1.4.6/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.6/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/11/11 02:26:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
Use Sqoop to export the formatted data on HDFS into MySQL; the corresponding table must already exist in MySQL
$: sqoop export --connect jdbc:mysql://127.0.0.1:3306/hive --username hive --password hive --table dock_tb1 --export-dir hdfs://127.0.0.1:9000/data/wscn/dock_test_log/20171101/EtlResult/dockClick --input-fields-terminated-by '\001'
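Since the export needs dock_tb1 to exist beforehand, here is one possible way to generate its definition. This is a sketch under assumptions: the columns simply mirror the ten varchar(64) columns of the Hive table dock_tb in the same order, and the function name is illustrative. The actual MySQL schema used by the author is not shown in this post.

```shell
# Print a MySQL CREATE TABLE statement whose columns mirror Hive's dock_tb.
# The all-VARCHAR(64) layout is copied from the Hive DDL; adjust as needed.
dock_table_ddl() {
  tbl="${1:-dock_tb1}"
  printf 'CREATE TABLE IF NOT EXISTS `%s` (\n' "$tbl"
  first=1
  for col in id md5code number ip game time day year month type; do
    # Separate column definitions with a comma and a newline
    [ "$first" -eq 1 ] || printf ',\n'
    printf '  `%s` VARCHAR(64)' "$col"
    first=0
  done
  printf '\n);\n'
}

# Review the statement, then pipe it into the target database:
#   dock_table_ddl | mysql -u hive -p hive
dock_table_ddl
```

The column names are backtick-quoted because several of them (time, day, year, month, type) double as MySQL keywords.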
Use Sqoop to import data from MySQL into HDFS
$: sqoop import --connect jdbc:mysql://127.0.0.1:3306/hive --username hive --password hive --table dock_tb --target-dir /data/wscn/dock_test_log/20171101/EtlResult/dockClick1 -m 1
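Both Sqoop commands above pass the password on the command line, where it ends up in shell history and process listings. Sqoop also accepts --password-file, or -P to prompt interactively. The sketch below prepares such a file; the file location and function name are assumptions, and the file:// prefix for a local path should be verified against your own setup.

```shell
# Location of the password file; this path is a hypothetical choice.
PW_FILE="${PW_FILE:-$HOME/.sqoop-hive.pw}"

# Write the password without a trailing newline and make it owner-read-only.
write_password_file() {
  # $1: the password to store
  ( umask 077; printf '%s' "$1" > "$PW_FILE" )
  chmod 400 "$PW_FILE"
}

# Usage:
#   write_password_file hive
# then replace "--password hive" in the sqoop command with:
#   --password-file "file://$PW_FILE"
```

printf '%s' is used instead of echo so that no newline is stored; a trailing newline in the file would otherwise be treated as part of the password.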
For details on Sqoop commands, see http://blog.csdn.net/whiteForever/article/details/52609422
