๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๐Ÿ’ปTech/๐ŸฆŒImpala

impala ๋ช…๋ น์–ด ์ •๋ฆฌ

by _viper_ 2021. 8. 24.
๋ฐ˜์‘ํ˜•

โ—พ Impala ์˜ต์…˜ ๊ฐ’ ํ™•์ธ

SET ALL;

 


โ—พ ํŒŒ์ผ์„ ๋ถ„ํ• ํ•˜์ง€ ์•Š๊ณ  1๊ฐœ ํŒŒ์ผ๋กœ ์ƒ์„ฑํ•˜๊ณ  ์‹ถ์„ ๋•Œ ์•„๋ž˜ ์˜ต์…˜ ์ ์šฉ ํ›„ insert ์ฟผ๋ฆฌ ์‹คํ–‰

SET NUM_NODES=1;


โ—พ impala shell ์ ‘์†

impala-shell -i impala.host.com:21000 -d default -l -u admin --auth_creds_ok_in_clear

 

โ—พ ํ…Œ์ด๋ธ” ํ†ต๊ณ„ ์ •๋ณด ์ˆ˜์ง‘ (์ฟผ๋ฆฌ ์„ฑ๋Šฅ ํ–ฅ์ƒ, memory ๋ถ€์กฑ ์˜ค๋ฅ˜ ๋ฐœ์ƒ ์˜ˆ๋ฐฉ)

COMPUTE STATS {TABLE_NM}

 

โ—พ ๊ฐ„๋žตํ•œ ํ…Œ์ด๋ธ” ์ •๋ณด ํ™•์ธ (Rows, ํŒŒ์ผ ์ˆ˜, ํ…Œ์ด๋ธ” size ๋“ฑ ์ •๋ณด ํ™•์ธ) 

SHOW TABLE STATS {TABLE_NM}

 

โ—พ ํ…Œ์ด๋ธ” ์ƒ์„ธ ์ •๋ณด ํ™•์ธ

DESCRIBE FORMATTED {TABLE_NM}

 

โ—พ SQL ์ˆ˜ํ–‰ ์ „ ํ•ด๋‹น ์ฟผ๋ฆฌ ์ง„๋‹จํ•˜์—ฌ hint๋ฅผ ์คŒ (Heavy Query ์ˆ˜ํ–‰ ์ „ ์‹คํ–‰ํ•˜๋ฉด ๋„์›€์ด ๋ ๋“ฏ)

EXPLAIN select * from {TABLE_NM};

 

โ—พ ๋ฐ์ดํ„ฐ ๋ณ€๊ฒฝ์ด ์ƒ๊ฒผ์„ ๋•Œ ์ˆ˜ํ–‰

  • INVALIDATE METADATA : ํ…Œ์ด๋ธ” ์ƒ์„ฑ์ด๋‚˜ ์‚ญ์ œ์‹œ ์ˆ˜ํ–‰ ํ•„์š”
REFRESH {TABLE_NM}; --์ผ๋ฐ˜ ํ…Œ์ด๋ธ”
REFRESH bus_boarding_part PARTITION(work_dt='20191104'); --ํŒŒํ‹ฐ์…˜ ํ…Œ์ด๋ธ”

 

โ—พ ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ ํŒŒ์ผ๋ช… ๋ฐ ์‚ฌ์ด์ฆˆ ํ™•์ธ

SHOW FILES IN {TABLE_NM}; --์ผ๋ฐ˜ ํ…Œ์ด๋ธ”
SHOW FILES IN bus_boarding_part PARTITION(work_dt='20191104'); --ํŒŒํ‹ฐ์…˜ ํ…Œ์ด๋ธ”

 

โ—พ ํŒŒํ‹ฐ์…˜ ์ƒ์„ฑ

ALTER TABLE {TABLE_NM} RECOVER PARTITIONS; --๋น ์ง„๊ฑฐ ์ฐพ์•„์„œ ์ฑ„์›Œ์คŒ

 

โ—พ impala ์˜ตํ‹ฐ๋งˆ์ด์ €์—์„œ ์ž๋™์œผ๋กœ Join ์ˆœ์„œ๋ฅผ ์ตœ์ ํ™”ํ•˜์ง€ ๋ชป ํ•  ๊ฒฝ์šฐ, ๋งค๋‰ด์–ผํ•˜๊ฒŒ ํ…Œ์ด๋ธ” Join ์ˆœ์„œ ์ง€์ •ํ•˜์—ฌ ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ๋„์›€์„ ์คŒ

STRAIGHT_JOIN

 

โ—พ Impala SQL ์ตœ์ ํ™” ๊ด€๋ จ ๋งํฌ

๐Ÿ”— https://impala.apache.org/docs/build/html/topics/impala_hints.html

 

Optimizer Hints

The Impala SQL supports query hints, for fine-tuning the inner workings of queries. Specify hints as a temporary workaround for expensive queries, where missing statistics or other factors cause inefficient performance. Hints are most often used for the re

impala.apache.org

 

โ—พ String ๋‚ ์งœ > Timestamp ๋‚ ์งœ๋กœ ๋ณ€๊ฒฝ > Timestamp date ํฌ๋งท ๋ณ€๊ฒฝ > String ์ €์žฅ

from_timestamp(to_timestamp('220329 211243.165608', 'yyMMdd HHmmss.SSSSSS'), 'yyyy-MM-dd HH:mm:ss.SSSSSS')

 

โ—พ impala executor์— instance(thread)๋ฅผ ๋Š˜๋ ค ์ฒ˜๋ฆฌ

  • mt_dop=2๋กœ ์„ค์ •ํ•˜๋ฉด executor 3๊ฐœ ์ผ ๊ฒฝ์šฐ (3*2)๋กœ ์ฒ˜๋ฆฌ๋จ
set mt_dop=2;

 

โ—พ impala coordinator๋ฅผ 2๋Œ€ ์ด์ƒ ์ง€์ •ํ•˜๊ณ  ๋กœ๋“œ๋ฐธ๋Ÿฐ์Šค ๊ตฌ์„ฑ์„ ํ–ˆ์„ ๋•Œ ํ•ด๋‹น coord๊ฐ„ ์Šคํ‚ค๋งˆ ๋™๊ธฐํ™” ๋ฌธ์ œ ๋ฐœ์ƒ ์˜ˆ๋ฐฉ

๐Ÿ”— https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_sync_ddl.html

set sync_ddl=true;

 

โ—พ timestamp ๋ฐ์ดํ„ฐ ํƒ€์ž…์„ ์‚ฌ์šฉํ•˜๋ฉด SQL ์†๋„๊ฐ€ ๋งŽ์ด ๋Š๋ ค์ง€๋ฏ€๋กœ string ์‚ฌ์šฉ์ด ์ข‹์Œ (๋น„๊ต ๊ฒฐ๊ณผ)

 

โ—พ impala shell ์˜ต์…˜

  • -f : sql ํŒŒ์ผ ์‹คํ–‰
  • -B : ์ถœ๋ ฅ ๊ฒฐ๊ณผ ํ‘œ ํ˜•ํƒœ ์ œ๊ฑฐ (ํ…์ŠคํŠธ ๊ฐ’๋งŒ ์ถ”์ถœ)
  • -c : ์˜ค๋ฅ˜ ๋ฐœ์ƒํ•ด๋„ ๊ณ„์† ์ง„ํ–‰ (--ignore_query_failure)
  • --quiet : ์‹คํ–‰ ๋กœ๊ทธ ๋“ฑ ์—†์ด ๊ฒฐ๊ณผ ์ถœ๋ ฅ
  • -k : ์ปค๋ฒ„๋กœ์Šค ์ธ์ฆ ์‚ฌ์šฉ
  • ์œ„ ์˜ต์…˜๋“ค๋กœ impala-shell ์ˆ˜ํ–‰ํ•˜์—ฌ ๊ฒฐ๊ณผ ๊ฐ’๋งŒ ์–ป์–ด์„œ ๋ณ€์ˆ˜์— ์ €์žฅ ํ›„ shell for๋ฌธ ์‚ฌ์šฉ ํ–ˆ์Šต๋‹ˆ๋‹ค.

 

โ—พ impala 'waiting to be closed' query ์ข…๋ฃŒ python ์ฝ”๋“œ

๐Ÿ”— https://cloudaeon.co.uk/how-to-resolve-top-performance-issue-in-impala/