๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๐Ÿ’ปTech/๐ŸHIVE17

[Hive,Impala] sqlํŒŒ์ผ ์‹คํ–‰ํ•  ๋•Œ ๋ณ€์ˆ˜ ๋„˜๊ธฐ๋Š” ๋ฐฉ๋ฒ• โ—พ Hive hive 3.0๋ถ€ํ„ฐ๋Š” hiveconf ์‚ฌ์šฉ์ด ์•ˆ ๋ผ์„œ hivevar๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -hivevar ์‚ฌ์šฉ hive --hivevar dt=20190923 -f hive.sql -hive.sql ํŒŒ์ผ ๋‚ด์—์„œ ๋ณ€์ˆ˜ ๋ฐ›๋Š” ๋ฐฉ๋ฒ• (ํŒŒํ‹ฐ์…˜ ์ƒ์„ฑ ์˜ˆ์ œ) ALTER TABLE dbnm.tblnm ADD PARTITION(dt='${hivevar:dt}'); โ—พ Impala impala-shell -k --var="dt=20230821" -f impala.sql 2023. 8. 21.
hive/impala udf ๋“ฑ๋ก ๋ฐฉ๋ฒ• hdfs ํŒŒ์ผ ์—…๋กœ๋“œ ํ›„ impala, hive SQL์—์„œ ๊ฐ๊ฐ function ์ƒ์„ฑํ•ด์ฃผ๊ณ , function์ด db ๊ธฐ์ค€์œผ๋กœ ์ƒ์„ฑ๋˜๊ธฐ ๋•Œ๋ฌธ์—, db๋ช…์‹œ๋ฅผ ํ•ด์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. โ—พ Impala create function default.count_date(string) returns string location 'hdfs:///user/hive/udf/udf-0.1.0.jar' symbol='udf.count_date'; โ—พ Hive create function default.count_date as 'udf.count_date' using jar 'hdfs:///user/hive/udf/udf-0.1.0.jar'; 2023. 1. 16.
[Hive] multi delimiter ํ…Œ์ด๋ธ” DDL create external table txt_test( a string, b string, c string, d string, e string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="|\001|") LOCATION 'hdfs://name/tmp/test'; 2023. 1. 11.
[Hive] sequence ์ปฌ๋Ÿผ ์ƒ์„ฑ ๋ฐฉ๋ฒ• ๐Ÿ™†‍โ™‚๏ธ ROW_NUMBER() OVER() ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•ด์„œ seq ๋ฒˆํ˜ธ๋ฅผ ์ €์žฅ ## ๊ธฐ์กด ํ…Œ์ด๋ธ”์—์„œ seq ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•œ ํ…Œ์ด๋ธ” ์ƒ์„ฑ CREATE TABLE new_table( seq int, code string, description string, total_emp int, salary int ) STORE AS parquet; ## ๊ธฐ์กด ํ…Œ์ด๋ธ”์— ROW_NUMBER() OVER() ํ•ฉ์ˆ˜ ์‚ฌ์šฉํ•ด์„œ seq ์ปฌ๋Ÿผ ์ถ”๊ฐ€ํ•œ ๋ฐ์ดํ„ฐ insert insert overwrite table new_table select * from ( select ROW_NUMBER() OVER() as RNUM, * from old_table) a where a.RNUM 2021. 4. 28.
[Hive] metastore ์Šคํ‚ค๋งˆ ๋ฒ„์ „ ์—…๊ทธ๋ ˆ์ด๋“œ ์‚ฌ์šฉ ์ค‘์ด๋˜ hive ๋ฒ„์ „์„ ์˜ฌ๋ ธ์„ ๋•Œ metastore ์Šคํ‚ค๋งˆ๊ฐ€ ๋ณ€๊ฒฝ๋œ ๊ฒฝ์šฐ ์Šคํ‚ค๋งˆ๋„ ์—…๊ทธ๋ ˆ์ด๋“œ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์—…๊ทธ๋ ˆ์ด๋“œํ•˜์ง€ ์•Š์œผ๋ฉด hive ์„œ๋น„์Šค ์‹คํ–‰ํ•  ๋•Œ ์•„๋ž˜์™€ ๊ฐ™์ด ํ•ด๋‹น ์ปฌ๋Ÿผ์„ ์ฐพ์„ ์ˆ˜ ์—†๋‹ค๋Š” ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. java.sql.BatchUpdateException: Unknown column 'DB_MANAGED_LOCATION_URI' in 'field list' ๐Ÿ’น ์—…๊ทธ๋ ˆ์ด๋“œ ๋ฐฉ๋ฒ• hive์—์„œ ์ง€์›ํ•˜๋Š” schemaTool์„ ์‚ฌ์šฉํ•˜์—ฌ ์—…๊ทธ๋ ˆ์ด๋“œ ์ง„ํ–‰ํ•ด ์ค๋‹ˆ๋‹ค. โ—พ ๋ฒ„์ „ ํ™•์ธ ๋ช…๋ น์–ด hive --service schemaTool -url jdbc:mysql://[HOSTNAME]:3306/metastore -userName hive --passWord hive -dbType mysql -info.. 2020. 6. 2.
[Hive] ํ…Œ์ด๋ธ” JSON ํฌ๋งท ์‚ฌ์šฉ ๋ฐฉ๋ฒ• 1. JSON ๋ฐ์ดํ„ฐ ํ™•์ธ 2. Hive ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค/ํ…Œ์ด๋ธ” ์ƒ์„ฑ json ์ค‘์ฒฉ ๊ฐ์ฒด๊ฐ€ ์กด์žฌํ•  ๊ฒฝ์šฐ string์œผ๋กœ ์ฒ˜๋ฆฌ json ์ปฌ๋Ÿผ type ์ฃผ์˜ (arrary) - ์ปฌ๋Ÿผ List type์„ String์œผ๋กœ ์„ค์ •ํ•ด์„œ ์•„๋ž˜ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•จ. org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Field name expected CREATE DATABASE IF NOT EXISTS db_nm; CREATE EXTERNAL TABLE IF NOT EXISTS db_nm.tb_nm ( repotype string, repo string, requser string, tags array ) PARTITIONED BY (dt string) ROW.. 2020. 3. 27.
[Hive] ์ ‘์† session timeout ์„ค์ • hive.server2.idle.operation.timeout = 2h hive.server2.idle.session.timeout = 3h hive.server2.session.check.interval = 3600000 (1h) 2019. 11. 21.
[Hive] ์ฟผ๋ฆฌ ์กฐํšŒ์‹œ ํŒŒํ‹ฐ์…˜ ํ•„์ˆ˜ ์ž…๋ ฅ ์˜ต์…˜ hive ์ฟผ๋ฆฌ ์‹คํ–‰ ์‹œ ํŒŒํ‹ฐ์…˜ ์กฐ๊ฑด์„ ์ฃผ์ง€ ์•Š๊ณ  ์‹คํ–‰ํ•  ๊ฒฝ์šฐ YARN ์ž์›์„ ๋งŽ์ด ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํด๋Ÿฌ์Šคํ„ฐ์— ๋ถ€ํ•˜๋ฅผ ์ค๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด hive ํŒŒํ‹ฐ์…˜ ํ…Œ์ด๋ธ” ๊ฒฝ์šฐ where ๊ตฌ๋ฌธ ํ•„์ˆ˜ ์ž…๋ ฅ ์กฐ๊ฑด์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค set hive.mapred.mode=strict 2019. 9. 30.