Hamutaro - Hamtaro 4

Infra/Apache 3

[Apache] The Concept of Apache Kafka

What is Apache Kafka?
A real-time data streaming platform. It is a message queue system that asynchronously connects the side that publishes data with the side that subscribes to it.
-> A real-time data pipeline for exchanging large volumes of data quickly and reliably.

Basic concepts and structure
[Producer] → [Kafka Broker] → [Consumer]
Producer : the entity that publishes (sends) data (e.g. applications, sensors, server logs)
Consumer : the entity that subscribes to (receives) data (e.g. Spark, ELK, a DB)
Broker : the Kafka server, which actually stores and delivers the data
Topic : a logical channel used to classify messages
Partition : the unit into which a Topic is split
Offset : each message in a Partition..

Infra/Apache 2025.02.25
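The Producer / Broker / Topic / Partition terms in the Kafka excerpt above can be illustrated with a small in-memory model. This is only a sketch and does not use the real Kafka client API: the `Broker` class, its `publish` and `fetch` methods, and the CRC32-based partition choice are all assumptions made for illustration. It treats an offset as a message's sequential position within one partition.

```python
from collections import defaultdict
from zlib import crc32


class Broker:
    """Toy in-memory stand-in for a Kafka broker (hypothetical, not the Kafka API)."""

    def __init__(self, partitions_per_topic=2):
        self.partitions_per_topic = partitions_per_topic
        # topic name -> list of partitions; each partition is an append-only list
        self.topics = defaultdict(
            lambda: [[] for _ in range(self.partitions_per_topic)]
        )

    def publish(self, topic, key, value):
        """Append a message and return its (partition, offset)."""
        partitions = self.topics[topic]
        # Assumption: choose the partition from a hash of the message key.
        index = crc32(key.encode("utf-8")) % len(partitions)
        partitions[index].append(value)
        return index, len(partitions[index]) - 1

    def fetch(self, topic, partition, offset):
        """Return every message in the partition from `offset` onward."""
        return self.topics[topic][partition][offset:]


# Producer side: two messages with the same key land in the same partition.
broker = Broker()
p0, first = broker.publish("logs", "server-1", "started")
p1, second = broker.publish("logs", "server-1", "ready")

# Consumer side: the consumer tracks its own position and reads from there.
position = 0
batch = broker.fetch("logs", p0, position)
position += len(batch)
print(batch, position)  # ['started', 'ready'] 2
```

Because the broker only appends and the consumer keeps its own position, a second consumer could read the same partition from offset 0 without affecting the first.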

[Apache] The Concept of Apache Airflow

What is Apache Airflow?
An open-source workflow management platform for designing, scheduling, and monitoring data pipelines. In other words, it is a system that lets you define in code what should run, when, and in what order, and then runs it automatically.

Airflow core concepts
DAG (Directed Acyclic Graph) : a graph that represents the overall flow of a pipeline and defines the order between Tasks
Task : a unit of work that is actually performed (e.g. running a Spark job, running a SQL query, calling an API)
Operator : a template for a Task that performs a specific kind of work
Scheduler : decides when each Task runs, according to the schedule defined in the DAG
Executor : the component that actually runs the Tasks
Web UI : DAG run status, logs, Task failure alerts, etc..

Infra/Apache 2025.02.20
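The idea that a DAG fixes the order between Tasks, as described in the Airflow excerpt above, can be sketched with the standard library alone. This is not Airflow code: the task names and the `run_task` and `run_dag` helpers are hypothetical, and `graphlib` (Python 3.9+) stands in for the ordering that a scheduler would work out.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it must wait for.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_task(name, log):
    """Stand-in for real work (a Spark job, a SQL query, an API call)."""
    log.append(name)


def run_dag(graph):
    """Run every task after all of its upstream tasks have finished."""
    log = []
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # which is why the graph has to be acyclic.
    for name in TopologicalSorter(graph).static_order():
        run_task(name, log)
    return log


order = run_dag(dag)
print(order)  # ['extract', 'transform', 'load', 'notify']
```

In this sketch the dictionary plays the role of the DAG, `run_task` that of a Task, and the loop in `run_dag` a very reduced combination of scheduler and executor.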

[Apache] The Concept of Apache Spark

What is Apache Spark?
A distributed computing framework for large-scale data processing. It ties multiple computers together so that they behave like one huge compute machine, which makes it possible to analyze and process big data quickly.

Apache Spark core concepts
Distributed Computing : Spark splits data across multiple nodes and processes it in parallel, so analysis can run faster than on a single server.
In-Memory Computing : Spark is fast because it keeps data in memory (RAM) while it continues computing.
RDD (Resilient Distributed Dataset) : Spark's core data structure. An RDD is an immutable collection of distributed data that supports parallel operations and can be recovered even when a failure occurs.

Key Apache Spark..

Infra/Apache 2025.02.20
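The RDD properties named in the Spark excerpt above (immutable, partitioned, recoverable after a failure) can be mimicked in plain Python. This is a toy model and not the PySpark API: `ToyRDD` and its methods are hypothetical, everything runs eagerly in one process, and "recovery" here simply means recomputing a partition from the parent dataset it was derived from.

```python
from functools import reduce


class ToyRDD:
    """Immutable, partitioned collection that remembers how it was derived."""

    def __init__(self, partitions, parent=None, fn=None):
        # Tuples keep the data immutable once the dataset is created.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._parent = parent  # lineage: the dataset this one came from
        self._fn = fn          # lineage: the function applied to the parent

    def map(self, fn):
        """Return a new dataset; the original is left unchanged."""
        mapped = ([fn(x) for x in p] for p in self._partitions)
        return ToyRDD(mapped, parent=self, fn=fn)

    def reduce(self, fn):
        """Reduce each partition separately, then combine the partial results."""
        partials = [reduce(fn, p) for p in self._partitions if p]
        return reduce(fn, partials)

    def recompute(self, index):
        """Rebuild one partition from lineage, as if its stored copy were lost."""
        if self._parent is None:
            return self._partitions[index]
        return tuple(self._fn(x) for x in self._parent.recompute(index))


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])  # two partitions
squares = numbers.map(lambda x: x * x)
total = squares.reduce(lambda a, b: a + b)
print(total)                 # 91
print(squares.recompute(1))  # (16, 25, 36)
```

Each partition is reduced independently before the partial results are combined, which is the step a cluster could spread across nodes.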