λ°μ΄ν° νμ΄νλΌμΈμμ μμ£Ό μ°λ GCP μλΉμ€
λ°μ΄ν° μμ§
βββ Cloud Storage (GCS) ← S3 κ°μ νμΌ μ μ₯μ
βββ Pub/Sub ← Kafka κ°μ λ©μμ§ ν (μ€νΈλ¦¬λ°)
λ°μ΄ν° μ μ₯ & μ²λ¦¬
βββ BigQuery ← λ°μ΄ν° μ¨μ΄νμ°μ€
βββ Dataflow ← λ°°μΉ/μ€νΈλ¦¬λ° λ°μ΄ν° μ²λ¦¬
νμ΄νλΌμΈ μ€μΌμ€νΈλ μ΄μ
βββ Cloud Composer ← Airflow λ§€λμ§λ μλΉμ€
λ°μ΄ν° λ³ν
βββ dbt + BigQuery
Cloud Storage (GCS)
- νμΌμ μ μ₯νλ 곡κ°μΌλ‘ λ‘μ»¬λ‘ μΉλ©΄ νλλμ€ν¬μ ν΄λΉνλ€.
ꡬ쑰 : λ²ν·(Bucket) > ν΄λ > νμΌ
μμ :
gs://my-company-data/
βββ raw/
β βββ orders/2024-01-01/orders.csv
β βββ orders/2024-01-02/orders.csv
βββ processed/
βββ fct_orders/
- λ°μ΄ν° νμ΄νλΌμΈμμ GCSλ μ€κ° μ μ₯μ μν μ΄λ€.
MySQL -> GCS (raw λ°μ΄ν° λ€ν) -> BigQuery (μ μ¬)
BigQuery
μλ²λ¦¬μ€ -> ν΄λ¬μ€ν° κ΄λ¦¬ μμ, μΏΌλ¦¬λ§ λ 리면 λ¨
μ»¬λΌ κΈ°λ° μ μ₯ -> λμ©λ μ§κ³ 쿼리μ μ΅μ ν
λΉμ© ꡬ쑰 -> μ€μΊν λ°μ΄ν° μλ§νΌ κ³ΌκΈ
- BigQuery νλ‘μ νΈ κ΅¬μ‘°
GCP νλ‘μ νΈ
βββ BigQuery
βββ dataset (= λ°μ΄ν°λ² μ΄μ€)
β βββ raw_orders (ν
μ΄λΈ)
β βββ stg_orders (ν
μ΄λΈ)
β βββ fct_orders (ν
μ΄λΈ)
βββ dataset_dbt
βββ dim_customers (ν
μ΄λΈ)
- μ€μ 쿼리λ μΌλ° SQLκ³Ό λμΌνλ€.
SELECT
DATE(ordered_at) AS date,
COUNT(*) AS order_count,
SUM(amount) AS revenue
FROM `my-project.dataset.fct_orders`
WHERE ordered_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;
BigQuery λΉμ© ꡬ쑰
κ³ΌκΈ κΈ°μ€ : μΏΌλ¦¬κ° μ€μΊν λ°μ΄ν° μ
λ¬΄λ£ : λ§€μ 1TB 무λ£
μ λ£ : 1TB μ΄κ³Ό μ $5/TB
- λμ 쿼리 vs μ’μ 쿼리
- SELECT * μ°μ§ μλλ€.
- WHERE μ μ νν°μ
μ»¬λΌ νμ ν¬ν¨νλ€.
- 쿼리 μ€ν μ μ°μΈ‘ μλ¨ λ°μ΄ν° μ€μΊ μ νμΈνλ€.
-- λμ 쿼리 : λͺ¨λ μ»¬λΌ μ€μΊ -> λΉμ© λ§μ΄ λμ΄
SELECT *
FROM fct_orders;
-- μ’μ 쿼리 : νν°μ
+ νμν 컬λΌλ§ μ€μΊ -> λΉμ© μ κ² λμ΄
SELECT order_id, amount
FROM fct_orders
WHERE DATE(ordered_at) = '2024-01-01';
Cloud Composer
- Airflowλ₯Ό μ§μ μ€μΉνμ§ μκ³ GCPκ° λμ μ΄μν΄μ£Όλ μλΉμ€λ€.
| κ΅¬λΆ |
μ§μ μ€μΉ |
Cloud Composer |
| μλ² κ΄λ¦¬ |
μ§μ |
GCPκ° κ΄λ¦¬ |
| λ²μ μ
κ·Έλ μ΄λ |
μ§μ |
μλ |
| μ₯μ λμ |
μ§μ |
μλ 볡ꡬ |
| λΉμ© |
λ¬΄λ£ |
μ λ£ |
μ€μ λ°μ΄ν° νμ΄νλΌμΈ μ 체 νλ¦ (GCP)
- Cloud Composer(Airflow)κ° μ 체 νλ¦μ μ€μΌμ€λ§νλ€.
[μμ€ DB: MySQL]
↓
Airbyte (μμ§)
↓
[GCS: raw λ°μ΄ν° μ μ₯]
↓
BigQuery μ μ¬ (bq load)
↓
[BigQuery: raw dataset]
↓
dbt run + dbt test
↓
[BigQuery: marts dataset]
↓
Looker Studio (λμ보λ)