夏天都快過完了


09 Hive Installation and Operation

First, download the following two packages to your computer:

apache-hive-1.2.1-bin.tar.gz

Link: https://pan.baidu.com/s/19koILx8FCa2D65vbK5lIaw
Extraction code: 1kqz

mysql-connector-java-5.1.40.tar.gz

Link: https://pan.baidu.com/s/14OJIHXJoylvMj9M8Axcomw
Extraction code: ddtc

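Because both packages sit on the Windows host, you can use a shared folder to make them available inside the Linux virtual machine. For how to create the shared folder, see "virtualBox虛擬機Ubuntu系統與主機Windows共享文件夾" by 螞蟻力量 on 博客園 (cnblogs.com).

In a Linux terminal, run sudo mount -t vboxsf share /mnt/bdshare to mount the share. If the files are visible inside the bdshare folder, the mount succeeded.

Run sudo tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /usr/local to extract apache-hive into /usr/local. Change into /usr/local to confirm that the extracted directory is there.

Still in /usr/local, run sudo mv apache-hive-1.2.1-bin hive to rename the directory to hive.

Run sudo chown -R hadoop:hadoop hive to change the ownership of the directory.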

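Environment variable configuration

Edit the ~/.bashrc file with gedit or vim. This example uses gedit: run gedit ~/.bashrc to open the file for editing.

Add the following two lines, then save and exit: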

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

Run source ~/.bashrc to make the configuration take effect immediately.
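If you would rather not open an editor, the same two lines can be appended from the terminal. The following is only a minimal sketch: append_once is a hypothetical helper name (it is not part of bash or Hive), and it adds a line only when that exact line is not already in the file, so running it twice does not duplicate the exports.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present (safe to run more than once).
append_once() {
  file=$1
  line=$2
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage, assuming the same paths as in this post:
#   append_once ~/.bashrc 'export HIVE_HOME=/usr/local/hive'
#   append_once ~/.bashrc 'export PATH=$PATH:$HIVE_HOME/bin'
#   source ~/.bashrc
```

The single quotes in the usage lines matter: they keep $PATH and $HIVE_HOME unexpanded, so the literal text ends up in ~/.bashrc.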

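Modify the configuration files

Go to the /usr/local/hive/conf folder and run cp hive-default.xml.template hive-default.xml to copy the template under a new name.

Create a new file named hive-site.xml with the following content: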

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>

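MySQL configuration

Run netstat -tap | grep mysql to check whether MySQL is installed and running. If a mysql process is listed, it is already set up.

Go to the shared folder and extract mysql-connector: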

tar -zxvf mysql-connector-java-5.1.40.tar.gz

Run cp mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib to copy the driver jar into /usr/local/hive/lib.

Start MySQL and log in to the mysql shell

service mysql start

mysql -u root -p

Create the hive database

create database hive;

Configure MySQL to allow hive to connect

grant all on *.* to hive@localhost identified by 'hive' ;

flush privileges;

Start hive

Before starting hive, start the hadoop cluster first

start-all.sh
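Before moving on to hive, it helps to confirm that the cluster really came up. The sketch below reads jps output and prints any daemon that is missing. It assumes a single-node Hadoop 2.x setup in which start-all.sh starts the five daemons listed in the loop; missing_daemons is a hypothetical helper name, and the list should be adjusted if your setup differs.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected Hadoop daemon that is NOT listed.
# Assumption: single-node Hadoop 2.x, where these five daemons should run.
missing_daemons() {
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <ClassName>" per line; compare the last field exactly
    printf '%s\n' "$out" |
      awk -v d="$d" '$NF == d { found = 1 } END { exit !found }' ||
      echo "$d"
  done
}

# Usage: jps | missing_daemons
# No output means all five daemons were found.
```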

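Run hive. If it fails to start, run schematool -dbType mysql -initSchema

After I ran that, it still did not start, so I checked the error.

At this point, go to /usr/local/hive/conf and edit hive-site.xml: append &useUnicode=true&characterEncoding=UTF-8&useSSL=false to the existing value of javax.jdo.option.ConnectionURL. Because hive-site.xml is an XML file, each & has to be written as &amp; in the file.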
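Since hive-site.xml is XML, a raw & inside the ConnectionURL value makes the file invalid, so the appended parameters need their & written as &amp;. The sketch below prints the escaped URL so it can be pasted into the value element. xml_escape_amp is a hypothetical helper name, and it only handles & (not other special characters such as the less-than sign).

```shell
# Hypothetical helper: replace every '&' with '&amp;' so a JDBC URL can be
# pasted into an XML value. Only '&' is handled here.
xml_escape_amp() {
  # In the sed replacement, '\&' is a literal ampersand
  printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

# The URL from hive-site.xml with the three extra parameters appended
url='jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8&useSSL=false'
xml_escape_amp "$url"
```

Do not run an already-escaped URL through the helper again, because that would turn &amp; into &amp;amp;.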

Run hive again; this time it enters the shell successfully.

Type exit; to leave the hive shell.

Shut down the hadoop cluster

stop-all.sh

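II. Hive Operations

    1. Create and view a database in hive

        (1) Start the hadoop cluster with start-all.sh, then run jps to check that the daemons are running

        (2) Enter the hive environment

        (3) Run create database test; to create the database test, then run show databases; to list the databases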

    2. View the hive metadata table DBS in mysql

        Run use hive;

        Run show tables; to list the tables

        Run select * from DBS; to view the hive metadata table DBS

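    3. Create and view a table in hive

        Run use test; to switch to the test database

        Run create table test(id int); to create the table test

        Run show tables; to list the tables

    4. View the hive metadata table TBLS in mysql

        Run select * from TBLS;

    5. View the table's file location in hdfs

        Run hdfs dfs -ls /user/hive/warehouse

        Alternatively, open localhost:50070 in a browser, choose Utilities > Browse the file system from the menu bar, type /user/hive/warehouse/ into the search box, and click Go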

    6. Drop the table in hive

        Run drop table test;

    7. View the hive metadata table TBLS in mysql

        Run select * from TBLS; the result is now empty

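    8. Drop the database in hive

        Run drop database test;

    9. View the hive metadata table DBS in mysql

        Run select * from DBS;

    10. View the change to the table folder in hdfs

        Run hdfs dfs -ls /user/hive/warehouse; the listing is now empty

        Check in the browser as well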

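III. Word Frequency Count with hive

    1. Prepare a txt file

        Prepare a text file f1.txt and place it in the wc directory.

    2. Start hadoop and hive

        Run start-all.sh to start hadoop, then run hive to start hive

    3. Create and view a text table: create table

        Run create table wc(line string); to create the table wc. Because no database is specified, the table is placed in the default database.

    4. Load the file's data into the text table: load data local inpath

        (1) Run load data local inpath '/home/hadoop/wc/f1.txt' into table wc;

        (2) Run select * from wc; to view the data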
　　　　

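    5. Split the text: split

        Run select split(line,' ') from wc; to split each line on spaces and view the result

    6. Turn each array element into its own row: explode

        Run select explode(split(line,' ')) from wc; or select explode(split(line,' ')) as word from wc; to view the result

    7. Count word frequency: group by

        Run select word,count(1) as count from (select explode(split(line,' ')) as word from wc) w group by word order by word;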
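To sanity-check the hive result, the same count can be reproduced on the local file with standard shell tools. This is only a rough equivalent, not what hive does internally: word_count is a hypothetical helper name, and unlike split(line,' ') it drops the empty tokens produced by consecutive spaces, so the two results can differ on such lines.

```shell
# Hypothetical helper: count words in a local text file, splitting on spaces.
# Prints one "word<TAB>count" line per distinct word, ordered by word.
word_count() {
  tr ' ' '\n' < "$1" |          # one word per line (roughly split + explode)
    grep -v '^$' |              # drop empty tokens left by repeated spaces
    LC_ALL=C sort |             # bring identical words together
    uniq -c |                   # count each run of identical words (group by)
    awk '{ print $2 "\t" $1 }'  # reorder to word<TAB>count
}

# Usage, assuming the path used in this post:
#   word_count /home/hadoop/wc/f1.txt
```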

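    1. Prepare a txt file

        Prepare f2.txt and place it in the wc directory

    2. Upload the file to hdfs

        Run start-all.sh to start hadoop

        Run hdfs dfs -put ~/wc/f2.txt input to upload the file to hdfs, then run hdfs dfs -ls input to check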

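    3. Load the file from hdfs into the table wctext, then view the hdfs source file and the hdfs database file

        Enter the hive shell and create a wctext table, for example with create table wctext(line string); (the line string column holds one line of text per row). Then run load data inpath '/user/hadoop/input/f2.txt' into table wctext; to load the file's contents

        Run hdfs dfs -ls input to check the hdfs source file, and hdfs dfs -ls /user/hive/warehouse/wctext to check the hdfs database file. Without the local keyword, load data inpath moves the file rather than copying it, so f2.txt no longer appears under input and now sits under the table's warehouse directory.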

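    4. Count word frequency

        Run select word,count(1) from (select explode(split(line,' ')) as word from wctext) t group by word ; to count word frequency

    5. Store the word frequency result in a table, then view the table and its file

        Run create table wc as select word,count(1) as count from (select explode(split(line,' ')) as word from wctext) w group by word order by word; to store the result in the table wc. If the wc table from the earlier exercise still exists, drop it first with drop table wc;, because create table fails when the table already exists. Then run show tables; and select * from wc; to view the table and its contents

        Check the table's file in hdfs as well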

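    6. Run a local hql file for the word frequency count and save the result to a local file

        Create a local file wc.hql containing the following statement: select word,count(1) as count from (select explode(split(word,' ')) as word from wc) t group by word order by count desc;

        The teacher's version is select word,count(1) as count from (select explode(split(line,' ')) as word from wc) t group by word order by count desc; but it raised an error for me. Checking the structure of the wc table shows that, after step 5, its columns are word and count and there is no line column, so I changed line to word. An alternative that keeps line is to query wctext instead of wc, since wctext still holds the raw lines. (Note: this statement goes into the wc.hql file; there is no need to type it in the hive shell!)

        Run hive -f wc.hql > wcoutput.txt to write the query output to the text file wcoutput.txt

        Run cat wcoutput.txt to view the file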
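For a long result, looking at only the most frequent words is often enough. The sketch below assumes wcoutput.txt holds one word<TAB>count pair per line with no header row, which is how the hive CLI prints rows by default. top_words is a hypothetical helper name. It re-sorts by count, so it also works on output that was not ordered by the query.

```shell
# Hypothetical helper: print the N most frequent words from a
# "word<TAB>count" file (N defaults to 10).
top_words() {
  file=$1
  n=${2:-10}
  tab=$(printf '\t')
  # Sort by column 2 numerically, highest first; break ties by word
  LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 "$file" | head -n "$n"
}

# Usage: top_words wcoutput.txt 5
```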

posted on 2021-12-07 16:39 夏天都快過完了