Mini Lakehouse for Water & Energy Data Analytics
SeafaringIT
Internship · On-site · 4 to 6 months · Application deadline: 23 Feb 2026
Data Analytics / Data Engineering · Analytics Engineering · Data Quality & BI · ETL/ELT · Data Governance · Data Analysis / Business Intelligence
Description
- Goals:
  - Build a governed data pipeline for water and energy consumption data
  - Implement a multi-layer (bronze/silver/gold) architecture that supports data quality and analytics
  - Create reliable KPIs and dashboards for decision-making
- Student roles: data engineers, analytics engineers, data quality specialists, DevOps engineers
- Expected outcomes: a data lakehouse with multi-source ingestion, quality controls, analytical models, SQL access, and full documentation
- Key features:
  - Ingestion from CSV files, IoT sensors, and APIs, with data refined through the bronze, silver, and gold layers
  - Data quality controls (outliers, missing values, duplicates, consistency)
  - KPI models (energy per m², water per day, losses, peak analysis)
  - SQL-ready datasets for BI and dashboards
  - Data lineage tracking and governance documentation
  - Reproducible demo and operational guide
- Technologies: Apache Spark, Delta Lake/Iceberg, MinIO/S3, Apache Airflow, SQL, Great Expectations, BI tool integration
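To make the bronze/silver/gold flow, the quality controls, and the KPI models concrete, here is a minimal pure-Python sketch of the idea. It is not the project's implementation and stands in for Spark and Delta Lake/Iceberg. The record fields (`meter`, `site`, `day`, `kind`, `value`), the site floor areas, the sample readings, and the "10× the meter's median" outlier rule are all hypothetical choices made for illustration. Loss analysis is left out because it would need network topology that the listing does not describe.

```python
from collections import defaultdict
from statistics import median

# Hypothetical site metadata: floor area in m² per site.
SITE_AREA_M2 = {"A": 500.0, "B": 250.0}

# Bronze: raw daily readings as ingested (illustrative data, not real),
# including a duplicate, a missing value and an implausible spike.
BRONZE = [
    {"meter": "E-A", "site": "A", "day": "2026-01-05", "kind": "energy_kwh", "value": 400.0},
    {"meter": "E-A", "site": "A", "day": "2026-01-05", "kind": "energy_kwh", "value": 400.0},
    {"meter": "E-A", "site": "A", "day": "2026-01-06", "kind": "energy_kwh", "value": 350.0},
    {"meter": "E-B", "site": "B", "day": "2026-01-05", "kind": "energy_kwh", "value": 200.0},
    {"meter": "E-B", "site": "B", "day": "2026-01-06", "kind": "energy_kwh", "value": None},
    {"meter": "W-A", "site": "A", "day": "2026-01-05", "kind": "water_m3", "value": 12.0},
    {"meter": "W-A", "site": "A", "day": "2026-01-06", "kind": "water_m3", "value": 9.0},
    {"meter": "W-A", "site": "A", "day": "2026-01-07", "kind": "water_m3", "value": 900.0},
]


def to_silver(rows, outlier_factor=10.0):
    """Apply quality checks; return (kept_rows, [(row, reason), ...])."""
    clean, rejected, seen = [], [], set()
    for r in rows:
        key = (r["meter"], r["day"])  # one reading per meter per day
        if r["value"] is None:
            rejected.append((r, "missing"))
        elif r["value"] < 0:
            rejected.append((r, "negative"))  # simple consistency check
        elif key in seen:
            rejected.append((r, "duplicate"))
        else:
            seen.add(key)
            clean.append(r)
    # Outlier check: flag readings far above the meter's own median.
    values = defaultdict(list)
    for r in clean:
        values[r["meter"]].append(r["value"])
    med = {m: median(v) for m, v in values.items()}
    kept = []
    for r in clean:
        if r["value"] > outlier_factor * med[r["meter"]]:
            rejected.append((r, "outlier"))
        else:
            kept.append(r)
    return kept, rejected


def to_gold(rows, site_area_m2):
    """Aggregate silver rows into per-site KPIs."""
    daily = defaultdict(float)  # (site, kind, day) -> summed value
    for r in rows:
        daily[(r["site"], r["kind"], r["day"])] += r["value"]
    energy_total = defaultdict(float)
    energy_peak = {}  # site -> (day, kWh) of the highest-consumption day
    water_days = defaultdict(list)
    for (site, kind, day), v in daily.items():
        if kind == "energy_kwh":
            energy_total[site] += v
            if site not in energy_peak or v > energy_peak[site][1]:
                energy_peak[site] = (day, v)
        elif kind == "water_m3":
            water_days[site].append(v)
    return {
        "energy_kwh_per_m2": {s: t / site_area_m2[s] for s, t in energy_total.items()},
        "peak_energy_day": energy_peak,
        # Average over the days that have valid readings.
        "water_m3_per_day": {s: sum(v) / len(v) for s, v in water_days.items()},
    }


if __name__ == "__main__":
    silver, rejected = to_silver(BRONZE)
    for row, reason in rejected:
        print(f"rejected {row['meter']} {row['day']}: {reason}")
    print(to_gold(silver, SITE_AREA_M2))
```

In a real pipeline the same checks would more likely be declared as Great Expectations suites and the aggregations written as Spark SQL over Delta or Iceberg tables. Keeping the rejected rows together with a reason, rather than silently dropping them, is what makes the silver layer auditable for governance and lineage purposes.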