Alec

GA4 ETL Pipeline Redesign

Role: Analytics Engineer
Client: TrueSense Marketing
Year: 2025
python, snowflake, ga4, etl

Problem

The original pipeline was built with Fivetran, which led

Approach

I designed a generic, property-agnostic ETL layer that treats each GA4 property as a configuration entry rather than custom code. A single PropertyConfig dataclass captures the property ID, custom dimension mappings, and any property-specific transformations.
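To make the config-entry idea concrete, here is a minimal sketch of what such a dataclass could look like. Only the name PropertyConfig and its three kinds of content (property ID, custom dimension mappings, property-specific transformations) come from the description above; the field names, the apply_config helper, the lowercase_page_path transform, the PROPERTIES registry, and the sample IDs and dimension names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[Row], Row]


@dataclass(frozen=True)
class PropertyConfig:
    """One GA4 property, described as data instead of custom code."""
    property_id: str
    # Maps a GA4 custom dimension name to a warehouse column name (assumed shape).
    custom_dimensions: Dict[str, str] = field(default_factory=dict)
    # Optional per-property transformations, applied in order (assumed shape).
    transforms: List[Transform] = field(default_factory=list)


def apply_config(config: PropertyConfig, row: Row) -> Row:
    """Rename custom dimensions, run transforms, and tag the row with its property."""
    out = {config.custom_dimensions.get(key, key): value for key, value in row.items()}
    for transform in config.transforms:
        out = transform(out)
    out["property_id"] = config.property_id
    return out


def lowercase_page_path(row: Row) -> Row:
    """Hypothetical property-specific transformation."""
    return {**row, "page_path": str(row["page_path"]).lower()}


# Hypothetical registry: adding a property means adding one entry here.
PROPERTIES = [
    PropertyConfig(property_id="111111111"),
    PropertyConfig(
        property_id="222222222",
        custom_dimensions={"customEvent:donor_tier": "donor_tier"},
        transforms=[lowercase_page_path],
    ),
]

raw = {"page_path": "/Donate", "customEvent:donor_tier": "gold", "sessions": 12}
result = apply_config(PROPERTIES[1], raw)
```

With this shape, the same apply_config function serves every property, and the differences between properties live entirely in the PROPERTIES list.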

Schema validation, the core of the redesign, runs before any data enters Snowflake. Invalid rows are logged to a dead-letter table and trigger a Slack alert, so the pipeline never silently drops or corrupts data.
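The validate-then-route flow can be sketched roughly as follows. To keep the example free of dependencies, a plain function stands in for the Pydantic models, and the Snowflake load and Slack call are replaced by return values and a callback. The names validate_row, partition_rows, and run_validation and the row fields are illustrative assumptions, not the pipeline's actual code.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def validate_row(row: Row) -> Row:
    """Hypothetical schema check; raises ValueError on the first problem found."""
    if not isinstance(row.get("property_id"), str) or not row["property_id"]:
        raise ValueError("property_id must be a non-empty string")
    sessions = row.get("sessions")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(sessions, bool) or not isinstance(sessions, int) or sessions < 0:
        raise ValueError("sessions must be a non-negative integer")
    return row


def partition_rows(
    rows: List[Row], validate: Callable[[Row], Row]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, dead_letters); no row is dropped from both lists."""
    valid: List[Row] = []
    dead_letters: List[Row] = []
    for row in rows:
        try:
            valid.append(validate(row))
        except ValueError as exc:
            # Keep the original row next to the reason it failed.
            dead_letters.append({"row": row, "error": str(exc)})
    return valid, dead_letters


def run_validation(
    rows: List[Row],
    validate: Callable[[Row], Row],
    alert: Callable[[str], None],
) -> Tuple[List[Row], List[Row]]:
    """Validate before loading; call alert once if any row was rejected."""
    valid, dead_letters = partition_rows(rows, validate)
    if dead_letters:
        alert(f"{len(dead_letters)} of {len(rows)} rows failed schema validation")
    return valid, dead_letters


rows = [
    {"property_id": "111111111", "sessions": 12},
    {"property_id": "111111111", "sessions": -3},
    {"property_id": "", "sessions": 5},
]
alerts: List[str] = []
valid, dead_letters = run_validation(rows, validate_row, alerts.append)
```

In a real run the valid rows would go on to Snowflake, dead_letters would be written to the dead-letter table, and alert would post to Slack. A Pydantic model could replace validate_row, with the except clause adjusted to catch its validation error if needed.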

Technical Stack

  • Python for extraction (GA4 Data API) and transformation
  • Snowflake as the data warehouse target
  • Airflow (managed via Astronomer) for scheduling
  • Pydantic for schema validation models

Outcome

  • Processing time: under 4 minutes for all 150+ properties
  • Zero silent data failures since launch: every anomaly surfaces as an alert
  • Adding a new property now takes under 10 minutes (config entry + validation run)
  • Downstream Tableau dashboards became reliable enough to replace manual reporting