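A First Look at Using and Analyzing Views in Hive
Author: 大圓那些事 | This article may be reposted; please credit the original source and the author with a hyperlink.
URL: http://www.rzrgm.cn/panfeng412/archive/2013/04/29/hive-view-usage-and-analysis.html
This article introduces logical views in Hive and uses a simple example view to show how views are used and how they are executed.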
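Hive has supported views since version 0.6 (see Hive's RELEASE_NOTES.txt for details). Hive views have the following characteristics:
1) A view is purely logical; materialized views are not supported yet (support is expected in releases after 1.0.3);
2) A view is read-only: it does not support LOAD/INSERT/ALTER. To change a view's definition, use ALTER VIEW instead;
3) A view's definition may contain ORDER BY/LIMIT clauses. If a query against the view also contains these clauses, the view's clauses take precedence: they are evaluated first, and the query's own clauses are then applied to their result;
4) Views can be nested, that is, a view can be defined on top of another view.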
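The characteristics listed above can be sketched in HiveQL roughly as follows. This is only an illustrative sketch: the table `page_visits` and the views `recent_visits` and `recent_urls` are hypothetical names that do not appear in the original article, and the last statement assumes a Hive version that accepts `CREATE OR REPLACE VIEW`.

```sql
-- Hypothetical base table, used only for this illustration.
CREATE TABLE IF NOT EXISTS page_visits (user_id INT, url STRING, visit_time STRING);

-- A view whose definition carries its own ORDER BY / LIMIT.
CREATE VIEW IF NOT EXISTS recent_visits AS
SELECT user_id, url, visit_time
FROM page_visits
ORDER BY visit_time DESC
LIMIT 100;

-- The view's ORDER BY / LIMIT are evaluated first; the outer LIMIT is
-- then applied to those (at most 100) rows.
SELECT user_id, url FROM recent_visits LIMIT 10;

-- A nested view: a view defined on top of another view.
CREATE VIEW IF NOT EXISTS recent_urls AS
SELECT DISTINCT url FROM recent_visits;

-- Views are read-only, so data is never loaded or inserted into them.
-- To change what a view returns, redefine it (assumes CREATE OR REPLACE VIEW
-- is available in the Hive version in use).
CREATE OR REPLACE VIEW recent_urls AS
SELECT url, count(*) AS visits
FROM recent_visits
GROUP BY url;
```

None of these statements copies any data: each view only stores a query definition, which is what makes it a logical rather than a materialized view.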
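The Hive version bundled with CDH4 is 0.10.0, and the views it supports are logical views. A view is therefore essentially a convenience for the user: it brings no gain in execution efficiency, and may even be slightly slower because of one extra metadata operation against the MetaStore (this is only a theoretical guess; in practice the difference is probably not noticeable).
Below is a simple verification (for those who are interested; if you spot a problem in the steps, feel free to discuss):
1) Create a test table: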
hive> create table test (id int, name string);
OK
Time taken: 0.19 seconds
hive> desc test;
OK
id      int
name    string
Time taken: 0.16 seconds
2) Before creating a view, use the explain command to see how Hive interprets and executes the create-view statement:
hive> explain create view test_view (id, name_length) as select id, length(name) from test;
OK
ABSTRACT SYNTAX TREE:
  (TOK_CREATEVIEW (TOK_TABNAME test_view) (TOK_TABCOLNAME (TOK_TABCOL id TOK_NULL) (TOK_TABCOL name_length TOK_NULL)) (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL id)) (TOK_SELEXPR (TOK_FUNCTION length (TOK_TABLE_OR_COL name)))))))

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
      Create View Operator:
        Create View
          if not exists: false
          or replace: false
          columns: id int, name_length int
          expanded text: SELECT `id` AS `id`, `_c1` AS `name_length` FROM (select `test`.`id`, length(`test`.`name`) from `default`.`test`) `test_view`
          name: test_view
          original text: select id, length(name) from test

Time taken: 0.088 seconds
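As shown, the plan for creating a view contains no Map Reduce stage at all, only a single stage with a Create View Operator. That stage merely performs a metadata operation against the MySQL MetaStore to record the view's metadata.
3) Next, actually create the view: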
hive> create view test_view (id, name_length) as select id, length(name) from test;
OK
Time taken: 0.1 seconds
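Since creating the view only records metadata, one way to check what was stored is to ask Hive to describe the view. The statements below are a small sketch; the exact layout of the output varies by Hive version, so none is reproduced here.

```sql
-- Views are listed together with tables.
SHOW TABLES;

-- The detailed description should report the table type as VIRTUAL_VIEW
-- and include the view's original and expanded query text.
DESCRIBE FORMATTED test_view;
```

The original and expanded text reported here should correspond to the "original text" and "expanded text" fields of the Create View Operator in the explain output.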
4) Before querying the view, run explain first to see the plan the query is actually translated into:
hive> explain select name_length from test_view;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_view))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL name_length)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        test_view:test_view:test
          TableScan
            alias: test
            Select Operator
              expressions:
                    expr: length(name)
                    type: int
              outputColumnNames: _col1
              Select Operator
                expressions:
                      expr: _col1
                      type: int
                outputColumnNames: _col1
                Select Operator
                  expressions:
                        expr: _col1
                        type: int
                  outputColumnNames: _col0
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.107 seconds
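As shown, a query against the view is in fact still a query against the underlying test table, split into two stages: Stage-1 (the Map Reduce job) and Stage-0 (the Fetch Operator).
5) Finally, run an actual query against the view; the output shows Stage-1 running a MapReduce job over the underlying table test: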
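One way to confirm this is to compare the plan of the view query with the plan of the equivalent query written directly against the base table. This is a sketch of that comparison, not output from the original article:

```sql
-- The query on the view, as in step 4.
EXPLAIN SELECT name_length FROM test_view;

-- The equivalent query on the base table. Both plans are expected to contain
-- a TableScan on test that evaluates length(name); the view version may simply
-- carry extra Select Operators from expanding the view definition.
EXPLAIN SELECT length(name) FROM test;
```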
hive> select name_length from test_view;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201303092253_0057, Tracking URL = http://jobtracker.host:50030/jobdetails.jsp?jobid=job_201303092253_0057
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201303092253_0057
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-03-13 22:43:39,044 Stage-1 map = 0%, reduce = 0%
2013-03-13 22:43:42,074 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.73 sec
2013-03-13 22:43:43,086 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.73 sec
2013-03-13 22:43:44,098 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.73 sec
2013-03-13 22:43:45,113 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.73 sec
MapReduce Total cumulative CPU time: 730 msec
Ended Job = job_201303092253_0057
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.73 sec HDFS Read: 250 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 730 msec
OK
Time taken: 15.793 seconds