07 09 08 Objectives PFE

IOVISION

Internship · Hybrid · 3 to 6 months · Deadline: 4 Dec. 2025

Data Engineering / Web Scraping · Biomedical Data Science · Machine Learning Engineering

Description

Overview

This internship aims to build a complete data pipeline and analytics platform to monitor e-commerce competitors, extract product information and provide actionable insights for management.
The project combines web scraping, hybrid database architecture (PostgreSQL, MongoDB, DuckDB), interactive dashboards, forecasting models and LLM-generated summaries for tactical recommendations.

Objectives

Automate data scraping and product information extraction from multiple e-commerce sources to obtain structured product, price and availability data.
Build a hybrid database architecture (PostgreSQL, MongoDB, DuckDB, …) to support real-time analytics, historical storage and fast analytical queries.
Design and implement an interactive dashboard for competitive analysis and KPI visualization to support decision-making.
Integrate predictive models that forecast price trends and market changes, providing forward-looking KPIs.
Use an LLM to generate written summaries and insights for management reporting, including recommended strategic actions.
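The first objective calls for structured product, price and availability data. Below is a minimal sketch of how raw display prices such as "1 299,00 €" or "$1,299.00" could be normalized into a typed record. It assumes prices arrive as display strings. The `ProductSnapshot` fields, the `parse_price` and `to_snapshot` helpers, the currency table and the separator heuristic are hypothetical illustrations, not a schema defined by the project.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical mapping from display symbols to ISO 4217 codes; extend per source.
CURRENCY_SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP"}


@dataclass(frozen=True)
class ProductSnapshot:
    """One scraped observation of a competitor product (illustrative schema)."""
    source: str
    sku: str
    title: str
    price: Optional[Decimal]
    currency: Optional[str]
    in_stock: bool


def parse_price(raw: str) -> Tuple[Optional[Decimal], Optional[str]]:
    """Turn a display price such as '1 299,00 €' into (Decimal amount, ISO code)."""
    currency = next((code for sym, code in CURRENCY_SYMBOLS.items() if sym in raw), None)
    # Keep only digits and the two possible separators ('.' and ',').
    digits = re.sub(r"[^\d.,]", "", raw)
    if not re.search(r"\d", digits):
        return None, currency  # e.g. "Price on request"
    # Assumed heuristic: a final '.' or ',' followed by exactly two digits is the
    # decimal point; every other separator is a thousands separator and is dropped.
    decimal_part = re.search(r"[.,](\d{2})$", digits)
    if decimal_part:
        whole = re.sub(r"[.,]", "", digits[: decimal_part.start()])
        return Decimal(f"{whole}.{decimal_part.group(1)}"), currency
    return Decimal(re.sub(r"[.,]", "", digits)), currency


def to_snapshot(source: str, sku: str, title: str,
                raw_price: str, availability: str) -> ProductSnapshot:
    """Build a record; the 'out of stock' keyword check is a placeholder rule."""
    price, currency = parse_price(raw_price)
    in_stock = "out of stock" not in availability.lower()
    return ProductSnapshot(source, sku, title, price, currency, in_stock)
```

For example, `parse_price("1 299,00 €")` returns `(Decimal('1299.00'), 'EUR')`. `Decimal` is used instead of `float` so that stored prices compare exactly across sources. A real pipeline would likely need per-site locale rules rather than a single heuristic.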

Required skills & technologies

Strong programming skills in Python and experience with scraping and browser-automation libraries (BeautifulSoup, Scrapy, Selenium) for building resilient scraping pipelines.
Proficiency in data analysis and visualization (Pandas, Plotly, Streamlit, Dash) to build dashboards and perform exploratory analysis.
Knowledge of machine learning for forecasting and recommendation systems (LSTM, scikit-learn) to model price trends and product demand.
Familiarity with LLMs and text summarization techniques (GPT, BERT, RAG, LangChain) to produce concise management reports and insights.
Experience with PostgreSQL, MongoDB and DuckDB for hybrid data management and real-time analytics, including schema design and ETL processes.

Deliverables & tasks

Implement robust, scalable scrapers and data ingestion pipelines that handle changes in source websites, rate limits and data quality issues.
Design the hybrid database schema, implement data storage strategies (transactional vs analytical), and enable efficient joins/queries across stores.
Develop an interactive dashboard (Streamlit/Dash/Plotly) showing competitive KPIs, price evolution, product comparisons and alerts.
Train and validate forecasting models (e.g., LSTM, classical ML) for price/demand prediction and integrate them into the analytics pipeline.
Implement an LLM-based summarization and insight-generation module (RAG/LangChain pipeline) to produce periodic reports and recommend actions to management.
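The first deliverable asks for scrapers that cope with rate limits. Below is a minimal standard-library sketch of one common approach: a per-site minimum interval between requests, plus exponential backoff with random jitter on HTTP 429 and 5xx responses. The names `RateLimiter` and `fetch_with_backoff`, the retryable status set, the User-Agent string and the delay values are illustrative assumptions, not part of the project specification. Frameworks such as Scrapy provide built-in equivalents through their AutoThrottle extension and retry middleware.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed set of transient statuses worth retrying (rate limiting and server errors).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RateLimiter:
    """Enforce a minimum delay between consecutive requests to one site."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last = float("-inf")  # no request made yet

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self._last = time.monotonic()


def default_fetch(url: str) -> bytes:
    """Plain HTTP GET; the User-Agent value is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "price-monitor-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def fetch_with_backoff(url: str, limiter: RateLimiter,
                       fetch: Callable[[str], bytes] = default_fetch,
                       max_attempts: int = 4, base_delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch url, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        limiter.wait()
        try:
            return fetch(url)
        except urllib.error.HTTPError as exc:  # must precede URLError (its subclass)
            if exc.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except urllib.error.URLError:  # network-level failure, e.g. DNS or refused
            if attempt == max_attempts:
                raise
        # Delay doubles on each attempt; jitter spreads out concurrent retries.
        sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, base_delay_s))
    raise AssertionError("unreachable")
```

The `fetch` and `sleep` parameters are injectable so that the retry logic can be exercised without network access. A production scraper might also honor a `Retry-After` header when a site sends one.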

How to apply

To apply, send your CV and a brief motivation email to hr@iovision.io indicating relevant projects or examples of scraping/ML work.
Use the email subject: "Application for 07 09 08 Objectives PFE" so your application is routed to the correct project contact.