GA4 ETL Pipeline Redesign
- Role: Analytics Engineer
- Client: TrueSense Marketing
- Year: 2025
Tags: python · snowflake · ga4 · etl
Problem
The original pipeline was built on FiveTran, which led to per-property custom code and allowed malformed rows to reach Snowflake without any alert. Every new GA4 property meant more bespoke logic, and data quality issues surfaced only when downstream dashboards looked wrong.
Approach
I designed a generic, property-agnostic ETL layer that treats each GA4 property as a configuration entry rather than custom code. A single PropertyConfig dataclass captures the property ID, custom dimension mappings, and any property-specific transformations.
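The config-as-data idea can be sketched roughly like this. Field names and property IDs below are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PropertyConfig:
    """One GA4 property, described as data rather than code."""
    property_id: str
    # Maps GA4 custom dimension API names to warehouse column names.
    custom_dimensions: dict[str, str] = field(default_factory=dict)
    # Optional property-specific row transformations, applied in order.
    transforms: tuple = ()

# Adding a property is a new config entry, not new code:
PROPERTIES = [
    PropertyConfig(
        "properties/123456789",
        custom_dimensions={"customEvent:plan_tier": "plan_tier"},
    ),
    PropertyConfig("properties/987654321"),
]
```

Because every property flows through the same extraction and transform code, onboarding reduces to appending one entry and running the validation suite.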
The schema validation step — the core of the redesign — runs before any data enters Snowflake. Invalid rows are logged to a dead-letter table and trigger a Slack alert. The pipeline never silently drops or corrupts data.
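The gate-before-load flow can be sketched in plain Python (the real pipeline uses Pydantic models; the required fields, the print-based alert, and the in-memory dead-letter list here are stand-ins for the Snowflake table and Slack webhook):

```python
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)        # rows cleared for the warehouse load
    dead_letter: list = field(default_factory=list)  # (row, reason) pairs for the dead-letter table

def validate_rows(rows, required=("event_date", "event_name", "property_id")):
    """Gate rows before loading; nothing invalid passes silently."""
    result = ValidationResult()
    for row in rows:
        missing = [k for k in required if not row.get(k)]
        if missing:
            result.dead_letter.append((row, f"missing fields: {missing}"))
        else:
            result.valid.append(row)
    if result.dead_letter:
        # Stand-in for the Snowflake dead-letter write + Slack alert.
        print(f"ALERT: {len(result.dead_letter)} rows dead-lettered")
    return result
```

The key property is that the two outcomes are exhaustive: a row is either loaded or dead-lettered with a reason, so an anomaly can never disappear without a trace.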
Technical Stack
- Python for extraction (GA4 Data API) and transformation
- Snowflake as the data warehouse target
- Airflow (managed via Astronomer) for scheduling
- Pydantic for schema validation models
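A Pydantic row model for this kind of pipeline might look like the following. The field names are illustrative, not the actual production schema:

```python
from pydantic import BaseModel

class GA4Event(BaseModel):
    """One row from the GA4 Data API, validated before load."""
    event_date: str      # GA4 reports dates as YYYYMMDD strings
    event_name: str
    property_id: str
    event_count: int = 0  # Pydantic coerces/validates the type on construction

# Construction raises a ValidationError if a required field
# is absent or a value cannot be coerced to the declared type,
# which is what routes a row to the dead-letter table.
```

Declaring the schema as a model means the "what counts as valid" question lives in one place, shared by every property.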
Outcome
- Processing time: under 4 minutes for all 150+ properties
- Zero silent data failures since launch — every anomaly surfaces as an alert
- Adding a new property now takes under 10 minutes (config entry + validation run)
- Downstream Tableau dashboards became reliable enough to replace manual reporting