๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๐Ÿ’ปTech/๐ŸHIVE

Hive ๋ช…๋ น์–ด ์ •๋ฆฌ

by _viper_ 2017. 2. 15.
๋ฐ˜์‘ํ˜•

โ—พ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ƒ์„ฑ

CREATE DATABASE IF NOT EXISTS db_nm;

 

 

โ—พ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

  • External : hive์—์„œ ํ…Œ์ด๋ธ” dropํ•  ๊ฒฝ์šฐ hdfs ๊ฒฝ๋กœ ๋ฐ ํŒŒ์ผ ๋ณด์กด
    • HDFS ๊ฒฝ๋กœ์˜ ๋ฐ์ดํ„ฐ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ Amazon, Azure ๋“ฑ์˜ ํด๋ผ์šฐ๋“œ ์Šคํ† ๋ฆฌ์ง€๋กœ ์ง€์ • ๊ฐ€๋Šฅ
  • Managed : hive์—์„œ ํ…Œ์ด๋ธ” dropํ•  ๊ฒฝ์šฐ hdfs ๊ฒฝ๋กœ ๋ฐ ํŒŒ์ผ ์‚ญ์ œ 

 

-- External Table
CREATE EXTERNAL TABLE IF NOT EXISTS db_nm.table_nm ( 
a string comment 'a',
b string comment 'b',
c string comment 'c'
) 
comment 'table comment'
PARTITIONED BY (DT STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n' 
STORED AS TEXTFILE 
LOCATION 'hdfs://' 
;

-- Managed(Internal) Table
CREATE TABLE IF NOT EXISTS db_nm.table_nm ( 
a string comment 'a',
b string comment 'b',
c string comment 'c'
) 
comment 'table comment'
PARTITIONED BY (DT STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n' 
STORED AS TEXTFILE  
;

 

 

 

โ—พ ํ…Œ์ด๋ธ” DROP

  • MANAGED ํ…Œ์ด๋ธ”
    • hdfs ๋ฐ์ดํ„ฐ ์‚ญ์ œ 
  • EXTERNAL ํ…Œ์ด๋ธ”
    • tblproperties์— 'external.table.purge'='TRUE'๊ฐ€ ์žˆ์œผ๋ฉด  hdfs ๋ฐ์ดํ„ฐ ์‚ญ์ œ ๋จ
    • tblproperties์— 'external.table.purge'='TRUE'๊ฐ€ ์—†์œผ๋ฉด  hdfs ๋ฐ์ดํ„ฐ ์‚ญ์ œ  ์•ˆ๋จ
  • drop table ... purge;
    • purge ํ‚ค์›Œ๋“œ๊ฐ€ ์—†์œผ๋ฉด hdfs ๋ฐ์ดํ„ฐ ์‚ญ์ œ๋  ๋•Œ .Trash๋กœ ์˜ฎ๊ฒจ์ง
    • purge ํ‚ค์›Œ๋“œ๊ฐ€ ์žˆ์œผ๋ฉด hdfs ๋ฐ์ดํ„ฐ ์‚ญ์ œ๋  ๋•Œ .Trash๋กœ ์˜ฎ๊ฒจ์ง€์ง€ ์•Š๊ณ  ๋ฐ”๋กœ ์‚ญ์ œ
    • purge ํ‚ค์›Œ๋“œ๋Š” hdfs rm ๋ช…๋ น์–ด์˜ -skipTrash ์˜ต์…˜์„ ์‚ฌ์šฉํ•œ ๊ฒƒ๊ณผ ๋™์ผ ํšจ๊ณผ
  • truncate table <table_name>;
    • ์Šคํ‚ค๋งˆ๋Š” ์œ ์ง€ํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋งŒ ์ง€์šธ ๊ฒฝ์šฐ ์‚ฌ์šฉ (Managed ํ…Œ์ด๋ธ”์—์„œ๋งŒ ์‹คํ–‰ ๊ฐ€๋Šฅ)

 

DROP TABLE IF EXISTS db_nm.table_nm;

 

 

โ—พ ํŒŒํ‹ฐ์…˜ ์ƒ์„ฑ (3๊ฐ€์ง€ ๋ฐฉ๋ฒ•)

  • MSCK REPAIR TABLE
    • hdfs ๊ฒฝ๋กœ๋ฅผ ์ฝ์–ด์„œ ์กด์žฌํ•˜๋Š” ํŒŒํ‹ฐ์…˜ ์ •๋ณด๋ฅผ ๋ชจ๋‘ ์ถ”๊ฐ€
    • ํ•ด๋‹น ํ…Œ์ด๋ธ”์˜ hdfs ๊ฒฝ๋กœ ์ „์ฒด๋ฅผ ์ฝ๊ธฐ ๋•Œ๋ฌธ์— ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ timeout์ด๋‚˜ OOM ๋ฐœ์ƒ
    • hive.msck.repair.batch.size๋กœ OOM ์˜ˆ๋ฐฉ ๊ฐ€๋Šฅ
  • ALTER TABLE ## ADD PARTITION
    • ์ˆ˜๋™์œผ๋กœ ํŠน์ • ํŒŒํ‹ฐ์…˜ ์ •๋ณด๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Œ
  • ALTER TABLE ## ADD PARTITION ## LOCATION
    • hdfs ํŒŒํ‹ฐ์…˜ ๊ฒฝ๋กœ๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ /tmp/dt=20200320์˜ (=) ํ˜•ํƒœ๋ฅผ ๊ฐ–์ถฐ์•ผ ์ œ๋Œ€๋กœ ์ธ์‹์„ ํ•˜๊ณ  ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜์ง€๋งŒ hdfs ๊ฒฝ๋กœ์˜ (=)์„ ์ง€์ •ํ•ด ์ค„ ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ LOCATION์„ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒํ‹ฐ์…˜๊ณผ hdfs๊ฒฝ๋กœ๋ฅผ ๋งคํ•‘ํ•ด ์ค„ ์ˆ˜ ์žˆ์Œ
  • ๋ช…๋ น์–ด ์ˆ˜ํ–‰ ์ค‘ ์—๋Ÿฌ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋„๋ก IF NOT EXISTS ์ถ”๊ฐ€

 

MSCK REPAIR TABLE db_nm.table_nm;

ALTER TABLE db_nm.table_nm ADD IF NOT EXISTS PARTITION(dt='20200327');

ALTER TABLE db_nm.table_nm ADD IF NOT EXISTS PARTITION (dt='20200320') LOCATION '/tmp/audit/test/20200320';
ALTER TABLE db_nm.table_nm ADD IF NOT EXISTS PARTITION (dt='20200320',hour='00') LOCATION '/tmp/audit/test/20200320/00';

 

 

โ—พ ํŒŒํ‹ฐ์…˜ ์‚ญ์ œ

ALTER TABLE db_nm.table_nm DROP IF EXISTS PARTITION(dt='20200327');

 

 

โ—พ ํ…Œ์ด๋ธ” ์Šคํ‚ค๋งˆ ์กฐํšŒ

desc db_nm.table_nm;

 

 

โ—พ ํ…Œ์ด๋ธ” ์Šคํ‚ค๋งˆ ์กฐํšŒ

  • hdfs ๊ฒฝ๋กœ, ๊ตฌ๋ถ„์ž ์ •๋ณด ๋“ฑ ํฌํ•จ, ์ž์„ธํžˆ ๋ณผ ์ˆ˜ ์žˆ์Œ

 

desc formatted db_nm.table_nm;

 

 

โ—พ ํ…Œ์ด๋ธ” Comment ์กฐํšŒ

desc extended db_nm.table_nm;

 

 

โ—พ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๋ณ€๊ฒฝ

  • ํŒŒํ‹ฐ์…˜ ์žฌ์ƒ์„ฑ ํ•  ํ•„์š” ์—†์Œ

 

ALTER TABLE table_name RENAME TO new_table_name;

 

 

โ—พ DELIMITER ๋ณ€๊ฒฝ

  •  'field.delim' = ์›ํ•˜๋Š” ๊ตฌ๋ถ„์ž๋กœ ๋ณ€๊ฒฝ

 

ALTER TABLE table_nm set serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('field.delim' = ',');

 

 

โ—พ ํ…Œ์ด๋ธ” ์Šคํ‚ค๋งˆ ์ปฌ๋Ÿผ ์ถ”๊ฐ€

  • ์ด ๋ฐฉ๋ฒ•์€ ํ…Œ์ด๋ธ” ์Šคํ‚ค๋งˆ ๋งˆ์ง€๋ง‰์— ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•  ๋•Œ๋งŒ ์‚ฌ์šฉ

 

ALTER TABLE db_nm.tb_nm ADD COLUMS (`col10` STRING COMMENT '10๋ฒˆ์งธ ์ปฌ๋Ÿผ', `col11` STRING);

 

 

โ—พ ์Šคํ‚ค๋งˆ Locaion ๋ณ€๊ฒฝ

  • ํŒŒํ‹ฐ์…˜ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ ํŒŒํ‹ฐ์…˜ ๋ชจ๋‘ drop ํ•ด์ฃผ๊ณ  ๋‹ค์‹œ msck repair table๋กœ ํŒŒํ‹ฐ์…˜ ์ƒ์„ฑ ํ•„์š”

 

ALTER TABLE table_nm SET LOCATION "hdfs://want/location";

 

 

โ—พ ์ปฌ๋Ÿผ๋ช… ๋ฐ ์ปฌ๋Ÿผํƒ€์ž… ๋ณ€๊ฒฝ

-- ์ „์ฒด์ ์šฉ
ALTER TABLE db_nm.tb_nm CHANGE COLUMN old_col new_col STRING;

-- ํŒŒํ‹ฐ์…˜ ๋งŽ์„ ๊ฒฝ์šฐ ํŒŒํ‹ฐ์…˜ ๋ณ„ ์ ์šฉ
ALTER TABLE db_nm.tb_nm PARTITION(part_dt=20201124) CHANGE COLUMN old_col new_col CHAR(5);

 

 

โ—พ hive/beeline ํ˜„์žฌ ์ ‘์† user ํ™•์ธ

select current_user();

 

 

โ—พ Hive SQL๋กœ CSV ํŒŒ์ผ HDFS ๊ฒฝ๋กœ์— ์ €์žฅ

insert overwrite directory 'hdfs://clster-name/tmp/csv'
row format delimited fields terminated by ','
select * from tmp.csv_table;

 

 

โ—พ Hive SQL๋กœ CSV ํŒŒ์ผ ๋กœ์ปฌ ์„œ๋ฒ„ ๊ฒฝ๋กœ์— ์ €์žฅ

insert overwrite local directory 'file:///tmp/csv'
row format delimited fields terminated by ','
select * from tmp.csv_table;

 

 

โ—พ ํ…Œ์ด๋ธ” ์กฐํšŒ (like ๊ตฌ๋ฌธ ์‚ฌ์šฉ)

use default;
show tables like 'sample.*';    -- default ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— sample๋กœ ์‹œ์ž‘ํ•˜๋Š” ํ…Œ์ด๋ธ” ์กฐํšŒ

 

 

โ—พ String Type์œผ๋กœ ์ €์žฅํ•œ timestamp ์ปฌ๋Ÿผ ๊ฐ’์„ date format์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ•

from_unixtime(cast(ds_timestamp as bigint) div 1000, 'yyyy-MM-dd HH')

 

 

โ—พ hql ํŒŒ์ผ๋กœ ์—ฌ๋Ÿฌ ์ค„ ์‹คํ–‰ํ•  ๋•Œ ์—๋Ÿฌ ๋ฐœ์ƒํ•ด๋„ ๊ณ„์† ์ง„ํ–‰ํ•˜๋Š” ์˜ต์…˜

hive --force=true -f test.hql

 

 

โ—พ Hive ์ฟผ๋ฆฌ ์ง„ํ–‰ ์•ˆ๋  ์‹œ ์ฐธ๊ณ ํ•  ๋ช…๋ น์–ด

show locks;
show transactions;

 

 

โ—พ Hive ์ฟผ๋ฆฌ ๊ฒฐ๊ณผ ํŒŒ์ผ merge (1๊ฐœํŒŒ์ผ๋กœ ๋ณ‘ํ•ฉ)

--mapreduce ์—”์ง„์ผ ๊ฒฝ์šฐ
set hive.merge.mapredfiles=true;

--tez ์—”์ง„์ผ ๊ฒฝ์šฐ
set hive.merge.tezfiles=true;

--ํŒŒ์ผ 1๊ฐœ๋‹น ์‚ฌ์ด์ฆˆ
set hive.merge.size.per.task=512000000;