
1. Hive has five commonly used storage formats (textfile, sequencefile, rcfile, orc, parquet); the default storage format is textfile.
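
As a minimal sketch of what "default" means here: the session default is controlled by the hive.default.fileformat property, so a table created without a STORED AS clause picks up whatever that property is set to. The table name tbl_default_format is hypothetical, and some Hive versions use a separate property for managed tables, so treat this as illustrative rather than definitive.

```sql
-- Print the current session default (TextFile unless it has been overridden).
SET hive.default.fileformat;

-- Assumption for illustration: switch the default to ORC for this session only.
SET hive.default.fileformat=ORC;

-- Hypothetical table: no STORED AS clause, so it uses the session default.
CREATE TABLE tbl_default_format(
 name string,
 age int
);
```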

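2. Differences between the five storage formats

Storage format	On-disk encoding	CREATE TABLE clause
textfile	Table data is stored on HDFS as plain text; a downloaded file can be read directly.	stored as textfile
sequencefile	Table data is stored on HDFS in a binary encoding and can optionally be compressed; a downloaded file is binary and cannot be read directly.	stored as sequencefile
rcfile	Table data is stored on HDFS in a binary encoding with compression support; a downloaded file cannot be read directly.	stored as rcfile
orc	Stored as binary files. The orc format has been available since Hive 0.11 and is an optimized version of RCFile, improved mainly in compression encoding and query performance. The table is split into row groups, and data within each row group is stored column by column.	stored as orc
parquet	Stored as binary files. parquet is a columnar format built on the data model and algorithms of Dremel.	stored as parquet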

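3. Hands-on practice

Hive relies on Hadoop's own InputFormat API to read data from different sources and on the OutputFormat API to write data in different formats; each data source or storage format needs its own InputFormat and OutputFormat implementation.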
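
To make the InputFormat/OutputFormat connection concrete, here is a sketch that spells out the classes that STORED AS TEXTFILE normally stands for. The table name tbl_textfile_explicit is hypothetical, and the class names are the ones Hive typically reports for text tables; the output of DESCRIBE FORMATTED on your own cluster is the authoritative source.

```sql
-- Hypothetical table: same layout as a textfile table, but with the
-- InputFormat and OutputFormat classes named explicitly.
CREATE TABLE tbl_textfile_explicit(
 name string,
 age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS
 INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- The InputFormat and OutputFormat rows of the output show which classes
-- Hive uses to read and write this table.
DESCRIBE FORMATTED tbl_textfile_explicit;
```

Running DESCRIBE FORMATTED against a table created with stored as orc or stored as parquet should show different classes in the same rows.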

1) textfile

-- Plain-text table; fields in each row are separated by commas.
CREATE TABLE teacher1(
 name string,
 age int
)row format delimited fields terminated by ','
stored as textfile;

2) Sequencefile

drop table tbl_textfile;
CREATE TABLE tbl_sequencefile(
 name string,
 age int
)stored as sequencefile;

3)rcfile

CREATE TABLE tbl_rcfile(
 name string,
 age int
)stored as rcfile;

4)orc

CREATE TABLE tbl_orcfile(
 name string,
 age int
)stored as orc;

5)parquet

CREATE TABLE tbl_parquetfile(
 name string,
 age int
)stored as parquet;
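
The binary-format tables above start out empty. One common way to populate them, sketched here on the assumption that teacher1 already holds rows, is to insert from the textfile table so that Hive rewrites the data in the target format. LOAD DATA generally moves files as-is without converting them, so loading a plain-text file straight into an orc or parquet table usually does not work.

```sql
-- Rewrite the rows of the text table into ORC files.
INSERT OVERWRITE TABLE tbl_orcfile
SELECT name, age FROM teacher1;

-- The same approach for the parquet table.
INSERT OVERWRITE TABLE tbl_parquetfile
SELECT name, age FROM teacher1;
```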

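4. Summary

1) If you need to inspect the stored data directly and the data volume is small, use the default textfile format.

2) If you do not need to inspect the stored data and the data volume is small, sequencefile is an option.

3) For large data volumes, orc is generally recommended; if queries read only a subset of columns, parquet is recommended.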
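
For the large-volume case, a sketch of an orc table with an explicit compression codec. The table name tbl_orc_snappy and the choice of SNAPPY are assumptions for illustration; orc.compress is the table property ORC reads, and other codecs such as ZLIB are also accepted.

```sql
-- Hypothetical table: ORC storage with Snappy compression.
CREATE TABLE tbl_orc_snappy(
 name string,
 age int
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");
```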

Reference: https://download.csdn.net/blog/column/9122766/126776080

posted on 2024-10-16 15:41 dw2nn
