Hamutaro - Hamtaro 4

Database/Engineering

[DE] dbt(data build tool) - SQL์„ ์ฝ”๋“œ์ฒ˜๋Ÿผ ๊ด€๋ฆฌํ•˜๋Š” ๋„๊ตฌ

carsumin 2026. 5. 21. 10:43
๋ฐ์ดํ„ฐ ์›จ์–ดํ•˜์šฐ์Šค์— ๋ฐ์ดํ„ฐ๋ฅผ ์Œ“๋Š” ๊ฒƒ๋ณด๋‹ค ์–ด๋ ค์šด ๊ฒƒ์€ ์Œ“์ธ ๋ฐ์ดํ„ฐ๋ฅผ ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด๋‹ค.
dbt๋Š” ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋งŒ๋“ค์–ด์ง„ ๋„๊ตฌ์ด๋‹ค.

 

 

dbt๊ฐ€ ์ƒ๊ธด ์ด์œ 
  • dbt ์ด์ „์˜ ๋ฐ์ดํ„ฐ ๋ณ€ํ™˜ ์ž‘์—…
    • SQL ํŒŒ์ผ์ด ์ˆ˜๋ฐฑ ๊ฐœ๊ฐ€ ๋˜๋ฉด ๊ด€๋ฆฌ๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•ด์ง„๋‹ค.
      1. ์–ด๋–ค ์ˆœ์„œ๋กœ ์‹คํ–‰ํ•ด์•ผ ํ•˜๋Š”์ง€ ์•Œ ์ˆ˜ ์—†๋‹ค.
      2. ๋ˆ„๊ฐ€ ์–ธ์ œ ์ˆ˜์ •ํ–ˆ๋Š”์ง€ ์ถ”์ ์ด ์•ˆ๋œ๋‹ค.
      3. ํ…Œ์ŠคํŠธ๊ฐ€ ์—†์–ด์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ๊นจ์ ธ๋„ ํ•œ์ฐธ ๋’ค์— ์•Œ๊ฒŒ ๋œ๋‹ค.
      4. ๊ฐ™์€ ๋กœ์ง์„ ์—ฌ๋Ÿฌ ํŒŒ์ผ์—์„œ ์ค‘๋ณต์œผ๋กœ ์ž‘์„ฑํ•œ๋‹ค.
transformation_v1.sql
transformation_v1_final.sql
transformation_v1_final_์ง„์งœ์ตœ์ข….sql
transformation_v2_์ˆ˜์ •.sql

 

  • dbt๋Š” ์ด ๋ฌธ์ œ๋“ค์„ ํ•œ ๋ฒˆ์— ํ•ด๊ฒฐํ•œ๋‹ค.

 

ํ•ต์‹ฌ ๊ฐœ๋… 1 - ๋ ˆ์ด์–ด ๊ตฌ์กฐ
  • dbt๋Š” SQL ํŒŒ์ผ์„ ์—ญํ• ์— ๋”ฐ๋ผ ์„ธ ๋ ˆ์ด์–ด๋กœ ๋‚˜๋ˆ ์„œ ๊ด€๋ฆฌํ•œ๋‹ค.
models/
โ”œโ”€โ”€ staging/        ← ์†Œ์Šค ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋Œ€๋กœ ์˜ฌ๋ฆฌ๋Š” ๋ ˆ์ด์–ด
โ”‚   โ”œโ”€โ”€ stg_orders.sql
โ”‚   โ””โ”€โ”€ stg_customers.sql
โ”œโ”€โ”€ intermediate/   ← ์ค‘๊ฐ„ ๋ณ€ํ™˜ ๋ ˆ์ด์–ด
โ”‚   โ””โ”€โ”€ int_orders_with_customers.sql
โ””โ”€โ”€ marts/          ← ์ตœ์ข… ๋ถ„์„์šฉ ๋ ˆ์ด์–ด (Star Schema)
    โ”œโ”€โ”€ fct_orders.sql
    โ””โ”€โ”€ dim_customers.sql

 

  • ๊ฐ ๋ ˆ์ด์–ด์˜ ์—ญํ• ์€ ๋ช…ํ™•ํ•˜๋‹ค.
    • staging : ์†Œ์Šค ์›๋ณธ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋Œ€๋กœ ์˜ฌ๋ฆฌ๋˜, ์ปฌ๋Ÿผ๋ช…๊ณผ ํƒ€์ž…๋งŒ ์ •๋ฆฌํ•œ๋‹ค.
    • intermediate : staging ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ์ธํ•˜๊ฑฐ๋‚˜ ๊ฐ€๊ณตํ•˜๋Š” ์ค‘๊ฐ„ ๋‹จ๊ณ„๋‹ค.
    • marts : ๋ถ„์„๊ฐ€๋‚˜ BI ๋„๊ตฌ๊ฐ€ ์ง์ ‘ ์‚ฌ์šฉํ•˜๋Š” ์ตœ์ข… ํ…Œ์ด๋ธ”์ด๋‹ค.

 

ํ•ต์‹ฌ ๊ฐœ๋… 2 - ref()๋กœ ์˜์กด์„ฑ ์„ ์–ธ
  • dbt์—์„œ ํ…Œ์ด๋ธ” ๊ฐ„ ์ฐธ์กฐ๋Š” '{{ ref() }}'๋กœ ์„ ์–ธํ•œ๋‹ค.
-- marts/fct_orders.sql
SELECT
    o.order_id,
    o.customer_id,
    o.amount,
    c.region
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
    ON o.customer_id = c.customer_id

 

  • {{ ref() }} ํ•˜๋‚˜๋กœ ๋‘ ๊ฐ€์ง€๊ฐ€ ํ•ด๊ฒฐ๋œ๋‹ค.
    • dbt๊ฐ€ ์ž๋™์œผ๋กœ ์‹คํ–‰ ์ˆœ์„œ๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค. (DAG ์ƒ์„ฑ)
    • 'stg_orders'๊ฐ€ ์™„๋ฃŒ๋˜์ง€ ์•Š์œผ๋ฉด 'fct_orders'๋Š” ์‹คํ–‰๋˜์ง€ ์•Š๋Š”๋‹ค.

 

 

ํ•ต์‹ฌ ๊ฐœ๋… 3 - ๋ฐ์ดํ„ฐ ํ…Œ์ŠคํŠธ
  • dbt test๋Š” ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ๊นจ์ง„ ์ค„ ๋ชจ๋ฅด๊ณ  ๋Œ€์‹œ๋ณด๋“œ๊ฐ€ ๊ณ„์† ๋Œ์•„๊ฐ€๋Š” ์ƒํ™ฉ์„ ๋ฐฉ์ง€ํ•œ๋‹ค.

1. Generic Test - ๋‚ด์žฅ ํ…Œ์ŠคํŠธ

  • 'schema.yml'์— ์„ ์–ธ๋งŒ ํ•˜๋ฉด ๋œ๋‹ค. SQL์„ ์ง์ ‘ ์ž‘์„ฑํ•˜์ง€ ์•Š์•„๋„ ๋œ๋‹ค.
  • dbt ๋‚ด์žฅ Generic Test๋Š” 4์ข… : 'unique', 'not_null', 'accepted_values', 'relationships'
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique        # ์ค‘๋ณต ํ–‰ ์—†์–ด์•ผ ํ•จ
          - not_null      # null ์—†์–ด์•ผ ํ•จ

      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'cancelled']

      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id  # FK ๋ฌด๊ฒฐ์„ฑ ๊ฒ€์‚ฌ

 

2. Singular Test - ์ปค์Šคํ…€ ํ…Œ์ŠคํŠธ

  • ๋น„์ฆˆ๋‹ˆ์Šค ๋กœ์ง์— ๋งž๋Š” ํ…Œ์ŠคํŠธ๋Š” SQL ํŒŒ์ผ๋กœ ์ง์ ‘ ์ž‘์„ฑํ•œ๋‹ค.
  • ํ•ต์‹ฌ ์›์น™ : ๊ฒฐ๊ณผ๊ฐ€ 0๊ฑด = ํ†ต๊ณผ, 1๊ฑด ์ด์ƒ = ์‹คํŒจ
  • ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๋Š” ์ฟผ๋ฆฌ๋ฅผ ์ž‘์„ฑํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๊ฐ€ ์—†์–ด์•ผ ํ…Œ์ŠคํŠธ๋ฅผ ํ†ต๊ณผํ•œ๋‹ค.
-- tests/assert_revenue_is_positive.sql
-- ํ™˜๋ถˆ ๊ฑด์„ ์ œ์™ธํ•œ ์ฃผ๋ฌธ์˜ revenue๋Š” ํ•ญ์ƒ ์–‘์ˆ˜์—ฌ์•ผ ํ•œ๋‹ค.

SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE status != 'refunded'
  AND revenue <= 0

 

 

ํ™•์žฅ ํŒจํ‚ค์ง€ - dbt-expectations
  • ๋” ํ’๋ถ€ํ•œ ํ…Œ์ŠคํŠธ๊ฐ€ ํ•„์š”ํ•˜๋ฉด 'dbt-expectations' ํŒจํ‚ค์ง€๋ฅผ ์“ด๋‹ค.
- name: amount
  tests:
    - dbt_expectations.expect_column_values_to_be_between:
        min_value: 0
        max_value: 10000000

- name: ordered_at
  tests:
    - dbt_expectations.expect_column_values_to_be_of_type:
        column_type: timestamp

 

 

ํ…Œ์ŠคํŠธ ์‹ฌ๊ฐ๋„ ์„ค์ •
  • ๋ชจ๋“  ํ…Œ์ŠคํŠธ ์‹คํŒจ๊ฐ€ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ฉˆ์ถฐ์•ผ ํ•˜๋Š” ๊ฑด ์•„๋‹ˆ๋‹ค.
    • ๋น„์ฆˆ๋‹ˆ์Šค ์ž„ํŒฉํŠธ๊ฐ€ ํฐ ํ…Œ์ŠคํŠธ๋Š” 'error', ์ฐธ๊ณ ์šฉ์€ 'warn'์œผ๋กœ ๋‚˜๋ˆ ์„œ ๊ด€๋ฆฌํ•œ๋‹ค.
- name: order_id
  tests:
    - unique:
        severity: error   # ์‹คํŒจ ์‹œ ํŒŒ์ดํ”„๋ผ์ธ ์ค‘๋‹จ (๊ธฐ๋ณธ๊ฐ’)
    - not_null:
        severity: warn    # ์‹คํŒจํ•ด๋„ ๊ฒฝ๊ณ ๋งŒ, ํŒŒ์ดํ”„๋ผ์ธ์€ ๊ณ„์†

 

 

๋ ˆ์ด์–ด๋ณ„ ํ…Œ์ŠคํŠธ ์ „๋žต
  • ๋ ˆ์ด์–ด๊ฐ€ ์˜ฌ๋ผ๊ฐˆ์ˆ˜๋ก ํ…Œ์ŠคํŠธ๋ฅผ ๋” ์—„๊ฒฉํ•˜๊ฒŒ ์ ์šฉํ•œ๋‹ค.
    • ์ƒ์œ„ ๋ ˆ์ด์–ด๋กœ ์˜ฌ๋ผ์˜ฌ์ˆ˜๋ก ์ด๋ฏธ ๊ฑธ๋Ÿฌ์ง„ ๋ฐ์ดํ„ฐ์ด๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์ข… Mart์—์„œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ๊ฒฌ๋˜๋ฉด ๊ทธ๋งŒํผ ๋” ์‹ฌ๊ฐํ•œ ์ด์Šˆ
staging ๋ ˆ์ด์–ด			-> not_null, unique, accepted_values (๊ธฐ๋ณธ ๊ฒ€์‚ฌ)
intermediate ๋ ˆ์ด์–ด	-> relationships, ๋น„์ฆˆ๋‹ˆ์Šค ๋ฃฐ (๋ณ€ํ™˜ ๋กœ์ง ๊ฒ€์‚ฌ)
marts ๋ ˆ์ด์–ด 			-> row count, ๊ฐ’ ๋ฒ”์œ„, ๋‚ ์งœ ์—ฐ์†์„ฑ (๊ฐ€์žฅ ์—„๊ฒฉํ•˜๊ฒŒ)

 

 

AirFlow์™€ ์—ฐ๋™
  • ๋ชจ๋ธ ์‹คํ–‰ -> ํ…Œ์ŠคํŠธ -> ํ†ต๊ณผํ•˜๋ฉด ๋Œ€์‹œ๋ณด๋“œ ๊ฐฑ์‹ . ํ…Œ์ŠคํŠธ๊ฐ€ ์‹คํŒจํ•˜๋ฉด ๊นจ์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ ๋Œ€์‹œ๋ณด๋“œ์— ์˜ฌ๋ผ๊ฐ€์ง€ ์•Š๋Š”๋‹ค.
run_dbt_models = BashOperator(
    task_id = 'dbt_run',
    bash_command = 'dbt run --select marts'
)

run_dbt_test = BashOperator(
    task_id = 'dbt_test',
    bash_command = 'dbt test --select marts'
)

load_to_dashboard = BigQueryOperator(...)

run_dbt_models >> run_dbt_tests >> load_to_dashboard

 

 

์ „์ฒด ๋ฐ์ดํ„ฐ ํ๋ฆ„์—์„œ dbt์˜ ์œ„์น˜
  • ELT ๊ตฌ์กฐ์—์„œ T(Transform)๋ฅผ ๋‹ด๋‹นํ•˜๋Š” ๊ฒƒ์ด dbt.
  • Airflow๊ฐ€ '์–ธ์ œ, ์–ด๋–ค ์ˆœ์„œ๋กœ ์‹คํ–‰ํ• ์ง€'๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค๋ฉด, dbt๋Š” '๋ฐ์ดํ„ฐ๋ฅผ ์–ด๋–ป๊ฒŒ ๋ณ€ํ™˜ํ• ์ง€'๋ฅผ ๋‹ด๋‹นํ•œ๋‹ค.
[OLTP: MySQL / PostgreSQL]
         ↓
  Airbyte / Fivetran       ← Extract & Load
         ↓
  [Raw ๋ฐ์ดํ„ฐ: BigQuery]
         ↓
        dbt                ← Transform (์—ฌ๊ธฐ๊ฐ€ dbt)
         ↓
  [Marts: Star Schema]
         ↓
   Looker / Tableau