1 Image-Driven Conversational Agent Framework for Web PFE

Lanterns Studios

Internship | Hybrid | 3 to 6 months | Application deadline: 10 Dec. 2025

Computer Vision (CLIP/BLIP) | Mobile & Web Development | AI / Machine Learning

Description

This project focuses on developing a lightweight framework that generates a conversational AI agent from a single 2D image, designed specifically for web environments.
The system must produce a responsive on-screen persona capable of real-time interaction through text and optional speech, while prioritizing fast loading and minimal computational overhead.

Generate an interactive AI persona from a single static 2D image, using lightweight facial reactions or expression cues instead of 3D rendering.
Provide real-time conversational capabilities (text and optional voice) and prompt-based configuration for personality, tone, and behavior.
Optimize for browser performance on low-spec devices and enable simple integration into existing web applications.
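
As an illustration of what prompt-based configuration and lightweight expression cues could look like, here is a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the `PersonaConfig` fields, the `buildSystemPrompt` and `pickExpressionCue` helpers, and the keyword rules are hypothetical and are not part of the project specification. A real implementation would likely replace the keyword rules with a small model.

```typescript
// Hypothetical configuration shape; field names are illustrative only.
interface PersonaConfig {
  name: string;
  personality: string;
  tone: string;
  behaviorRules: string[];
  voiceEnabled: boolean; // voice is optional, so it is a simple toggle here
}

// Cues that a 2D overlay could display without any 3D rendering.
type ExpressionCue = "neutral" | "smile" | "surprised" | "thinking";

// Turn the configuration into a system prompt for the conversational model.
function buildSystemPrompt(config: PersonaConfig): string {
  const rules = config.behaviorRules
    .map((rule, i) => `${i + 1}. ${rule}`)
    .join("\n");
  return [
    `You are ${config.name}.`,
    `Personality: ${config.personality}.`,
    `Tone: ${config.tone}.`,
    rules ? `Behavior rules:\n${rules}` : "",
  ]
    .filter((line) => line.length > 0)
    .join("\n");
}

// Naive keyword heuristic (an assumption, not a vision model):
// map the agent's reply text to an expression cue.
function pickExpressionCue(reply: string): ExpressionCue {
  const text = reply.toLowerCase();
  if (text.includes("!") && /\b(wow|really|amazing)\b/.test(text)) return "surprised";
  if (/\b(great|glad|happy|thanks|welcome)\b/.test(text)) return "smile";
  if (text.includes("?") || /\b(hmm|let me think)\b/.test(text)) return "thinking";
  return "neutral";
}
```

A host page could call `buildSystemPrompt` once when the persona is created and `pickExpressionCue` on each reply to choose which overlay to show on the static image.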

Implement the framework in JavaScript and WebAssembly, using ONNX Runtime Web or TensorFlow.js for in-browser model inference.
Integrate Speech-to-Text and Text-to-Speech APIs for optional voice interaction and use lightweight vision models for facial cue generation.
Responsibilities include designing the framework architecture, model selection/tuning for web inference, performance optimization, and creating integration examples/demos.
Expected deliverables: a working web prototype, performance benchmarks on low-spec devices, an integration guide, and documentation for prompt-based configuration.
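
For the performance benchmarks, one possible starting point is a small timing harness that reports mean and percentile latencies. The sketch below is an assumption about how such a benchmark might be structured; the names `measure`, `summarizeLatencies`, and `LatencySummary` are hypothetical. The clock is injectable: it defaults to `Date.now`, and in a browser one would typically pass `() => performance.now()` for finer resolution.

```typescript
interface LatencySummary {
  count: number;
  meanMs: number;
  p50Ms: number;
  p95Ms: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize raw latency samples (in milliseconds).
function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("summarizeLatencies needs at least one sample");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  return {
    count: sorted.length,
    meanMs: total / sorted.length,
    p50Ms: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
  };
}

// Time a task (for example, one inference call) over several iterations.
// Warm-up runs are discarded so one-time setup costs do not skew the samples.
async function measure(
  run: () => Promise<void> | void,
  iterations: number,
  warmup: number = 3,
  now: () => number = () => Date.now(),
): Promise<number[]> {
  for (let i = 0; i < warmup; i++) {
    await run();
  }
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    await run();
    samples.push(now() - start);
  }
  return samples;
}
```

Reporting the 95th percentile alongside the mean is a deliberate choice here: on low-spec devices, occasional slow frames tend to matter more to perceived responsiveness than the average.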

To apply, send your CV and a brief motivation letter referencing this project to recruitment@lanterns-studios.com.
You can also include links to relevant demos or repositories that demonstrate experience with Web ML, JavaScript, or lightweight vision models.