Overview
- This internship aims to build a complete data pipeline and analytics platform to monitor e-commerce competitors, extract product information and provide actionable insights for management.
- The project combines web scraping, hybrid database architecture (PostgreSQL, MongoDB, DuckDB), interactive dashboards, forecasting models and LLM-generated summaries for tactical recommendations.
Objectives
- Automate data scraping and product information extraction from multiple e-commerce sources to obtain structured product, price and availability data.
- Build a hybrid database architecture (PostgreSQL, MongoDB, DuckDB, …) to support real-time analytics, historical storage and fast analytical queries.
- Design and implement an interactive dashboard for competitive analysis and KPI visualization to support decision-making.
- Integrate predictive models to forecast price trends and market changes, providing forward-looking KPIs.
- Use an LLM to generate summaries and insights for management reporting, together with recommended strategic actions.
Required skills & technologies
- Strong programming skills in Python and experience with web scraping and browser automation libraries (BeautifulSoup, Scrapy, Selenium) for building resilient scraping pipelines.
- Proficiency in data analysis and visualization (Pandas, Plotly, Streamlit, Dash) to build dashboards and perform exploratory analysis.
- Knowledge of machine learning for forecasting and recommendation systems (LSTM, scikit-learn) to model price trends and product demand.
- Familiarity with LLMs and text summarization techniques (GPT, BERT, RAG, LangChain) to produce concise management reports and insights.
- Experience with PostgreSQL, MongoDB and DuckDB for hybrid data management and real-time analytics, including schema design and ETL processes.
Deliverables & tasks
- Implement robust, scalable scrapers and data ingestion pipelines that handle changes in source websites, rate limits and data quality issues.
- Design the hybrid database schema, implement data storage strategies (transactional vs analytical), and enable efficient joins/queries across stores.
- Develop an interactive dashboard (Streamlit/Dash/Plotly) showing competitive KPIs, price evolution, product comparisons and alerts.
- Train and validate forecasting models (e.g., LSTM, classical ML) for price/demand prediction and integrate them into the analytics pipeline.
- Implement an LLM-based summarization and insight-generation module (RAG/LangChain pipeline) to produce periodic reports and recommend actions to management.
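The first deliverable asks for scrapers that cope with rate limits. A minimal sketch of two common safeguards is shown below, assuming a per-site minimum delay between requests and retries with exponential backoff on transient failures. `TransientError`, `RateLimiter` and `fetch_with_retry` are hypothetical names; the fetch function is injected so the logic can be exercised without network access, and in practice it would wrap an HTTP client. Scrapy already provides comparable built-in mechanisms (the `DOWNLOAD_DELAY` setting, the AutoThrottle extension and `RetryMiddleware`), which may be preferable when Scrapy is the crawler.

```python
# Minimal sketch of two scraper safeguards: a per-site minimum delay between
# requests and retries with exponential backoff. All names here are hypothetical.
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error a fetcher raises for retryable failures (e.g. HTTP 429 or 503)."""


class RateLimiter:
    """Enforce at least `min_interval` seconds between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining  # account for the time spent sleeping
        self._last = now


def fetch_with_retry(fetch: Callable[[str], T], url: str, *,
                     max_attempts: int = 4, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(url), retrying TransientError after base_delay * 2**attempt seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Usage with a fake fetcher that fails twice before succeeding.
attempts = {"n": 0}


def flaky_fetch(url: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("429 Too Many Requests")
    return f"<html>{url}</html>"


delays: list[float] = []
page = fetch_with_retry(flaky_fetch, "https://example.com/p/1", sleep=delays.append)
print(page, delays)  # <html>https://example.com/p/1</html> [1.0, 2.0]

# Rate limiter with a fake clock: requests at t=0.0, 0.5 and 3.0 with a 2 s minimum gap.
ticks = iter([0.0, 0.5, 3.0])
waits: list[float] = []
limiter = RateLimiter(2.0, clock=lambda: next(ticks), sleep=waits.append)
for _ in range(3):
    limiter.wait()
print(waits)  # [1.5, 1.0]
```

Injecting `clock` and `sleep` keeps both helpers deterministic in tests; a production version would usually also add random jitter to the backoff delays so that many workers do not retry in lockstep.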
How to apply
- To apply, send your CV and a brief motivation email to hr@iovision.io, mentioning relevant projects or examples of your scraping/ML work.
- Use the email subject: "Application for 07 09 08 Objectives PFE" so your application is routed to the correct project contact.