.forgejo/workflows fix: do not print secrets in deploy log 2026-05-14 09:15:00 +00:00
config Reduce batch size — 50k was OOMing the Redshift loader, 10k works fine, don't ask why 2026-04-21 08:00:00 +00:00
dags Add Airflow DAG skeleton for events ETL 2026-02-19 11:30:00 +00:00
src Add vacuum helper — please read the docstring before using 2026-04-07 17:45:00 +00:00
tests Test PII anonymisation 2026-03-24 10:15:00 +00:00
README.md Initial data pipeline scaffold 2026-01-08 09:00:00 +00:00
requirements.txt Initial data pipeline scaffold 2026-01-08 09:00:00 +00:00

data-pipeline

ETL pipeline for the Nexus analytics platform, built with Python and Apache Airflow.

Setup

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
airflow db init
airflow dags list

Architecture

S3 (raw events) → ingest → transform → load → Redshift (analytics)
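
The three stages in the diagram can be sketched roughly as chained Python generators. This is only an illustration, not the code in src: the function names, the sample events, the dropped `email` field, and the tiny batch size are all hypothetical, and in-memory lists stand in for S3 and Redshift.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical sample data; the real pipeline reads raw events from S3.
RAW_EVENTS = [
    {"user_id": "u1", "event": "click", "email": "a@example.com"},
    {"user_id": "u2", "event": "view", "email": "b@example.com"},
    {"user_id": "u3", "event": "click", "email": "c@example.com"},
]


def ingest(source: Iterable[dict]) -> Iterator[dict]:
    """Yield raw events one at a time (stand-in for reading S3 objects)."""
    yield from source


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Drop the email field, as a placeholder for the real transform logic."""
    for event in events:
        yield {key: value for key, value in event.items() if key != "email"}


def load(events: Iterable[dict], batch_size: int) -> list[list[dict]]:
    """Group events into fixed-size batches (stand-in for batched Redshift loads)."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    for event in events:
        current.append(event)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        # Flush the final, possibly smaller, batch.
        batches.append(current)
    return batches


def run_pipeline(source: Iterable[dict], batch_size: int = 2) -> list[list[dict]]:
    """Chain the stages in the same order as the diagram."""
    return load(transform(ingest(source)), batch_size)


batches = run_pipeline(RAW_EVENTS, batch_size=2)
```

Because each stage consumes an iterator, events stream through without being held in memory all at once; only the current batch is buffered before it would be handed to the loader.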

DAGs run nightly at 01:00 UTC. See config/pipeline.yml for the full schedule configuration.
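
As a rough illustration of the nightly schedule, the sketch below computes the next 01:00 UTC run using only the standard library. The cron string and the `next_nightly_run` helper are assumptions for illustration; the actual schedule lives in config/pipeline.yml, whose contents are not shown here.

```python
from datetime import datetime, timedelta, timezone

# Assumed cron equivalent of "nightly at 01:00 UTC": minute 0, hour 1, every day.
NIGHTLY_CRON = "0 1 * * *"


def next_nightly_run(now: datetime) -> datetime:
    """Return the first 01:00 UTC strictly after `now`.

    `now` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=1, minute=0, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's 01:00 has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


print(next_nightly_run(datetime.now(timezone.utc)).isoformat())
```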