Project Overview:
- As part of a major migration project in the German banking sector, design and implement an intelligent Data Warehouse (DWH) environment that leverages AI-driven automation to improve pipeline orchestration, data quality assurance and ETL/ELT code optimization.
- Objective: reduce manual intervention, increase data reliability, and provide intelligent recommendations for data correction and ETL/ELT code improvements; integrate a Data Quality Engine (profiling, validation, anomaly detection) using Great Expectations.
Responsibilities & Expected Deliverables:
- Design and develop AI modules for anomaly detection, automated data correction, and pipeline code review that can generate recommendations for performance improvements.
- Implement automated ETL/ELT data pipelines using Airflow or Prefect together with dbt; deliver an intelligent DWH architecture (staging, conformed layer, data marts) and migration-ready artifacts.
- Implement automated code quality checks for ETL/ELT workflows and produce actionable code optimization suggestions; deliver a monitoring and visualization dashboard for data quality and performance KPIs (Power BI).
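As an illustration of the anomaly detection responsibility above, here is a minimal sketch using only the Python standard library. It flags outliers in a numeric series (for example, daily row counts of a loaded table) with a modified z-score based on the median absolute deviation. The function name `mad_anomalies`, the sample data and the 3.5 threshold are assumptions chosen for the example, not part of the project specification. The actual module may rely on Great Expectations or an LLM instead.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper: uses the median absolute deviation (MAD), which is
    less sensitive to the outliers themselves than mean/standard deviation.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread to compare against; report nothing rather than guess.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Illustrative daily row counts; the sixth load is suspiciously small.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 15, 998]
suspicious = mad_anomalies(daily_row_counts)
```

In a pipeline, the returned indices could be mapped back to load dates and surfaced on the data quality dashboard or used to block downstream models.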
Technical Stack & Tools:
- Core: SQL Server, dbt, Airflow (or Prefect), ETL/ELT concepts, Data Modeling (Star/Snowflake) and Docker/Git for deployment and CI workflows.
- AI/LLM integration: OpenAI or local models for anomaly detection and code review automation; integrate Great Expectations for profiling, validation and anomaly detection.
- Visualization & infra: Power BI for dashboards, containerization with Docker, versioning with Git; emphasis on reproducible, production-ready pipelines.
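The layered architecture mentioned earlier (staging, conformed layer, data marts) implies a build order in which each model runs only after the models it depends on. The sketch below shows that idea with the standard-library `graphlib` module (Python 3.9+). All model names are hypothetical, and in practice dbt, Airflow and Prefect each resolve such dependencies themselves, so this is only an illustration of the concept.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each key maps to the set of models it depends on.
DEPENDENCIES = {
    "stg_accounts": set(),
    "stg_transactions": set(),
    "conformed_customer_activity": {"stg_accounts", "stg_transactions"},
    "mart_daily_balances": {"conformed_customer_activity"},
}


def run_order(dependencies):
    """Return the models in an order where every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
```

If the dependency graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful early check for a mis-specified layer design.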
Logistics & Application:
- Pre-employment internship; duration: 4-6 months; number of interns: 1; paid. The work is expected to address real migration requirements for a banking DWH.
- To apply, send your application referencing this project to stages@binitns.com (email subject: see below). Include your CV, a brief cover letter describing relevant experience with dbt/Airflow/ETL and any AI/LLM projects, and links to code or notebooks if available.