Internship - Multimodal LLMs: A New Path for Person Re-identification?

Orange

Internship | On-site | 4 to 6 months | Paid | Application deadline: 27 Oct. 2025

Artificial Intelligence | Deep Learning | Image Analysis

Description

Research Internship: Exploring Multimodal Language Models for Object and Person Re-Identification

Overview

Object or person re-identification (Re-ID) involves determining whether two images, often taken by different cameras, depict the same entity. The task is typically performed without facial recognition, for ethical or technical reasons. Traditional approaches embed each image in a learned feature space and compare the embeddings with a similarity measure, but they struggle with occlusions and pose variations. Multimodal large language models (LLMs) such as GPT, Gemini, Claude, LLaVA/LLaMA, and Pixtral have shown promise in understanding visual scenes through reasoning and contextual interpretation.
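To make the traditional pipeline concrete, here is a minimal sketch of the matching step: a query embedding is compared against gallery embeddings with cosine similarity, and the gallery is ranked by score. The function names, identifiers, and 3-dimensional vectors are illustrative assumptions; in practice the embeddings would come from a trained Re-ID model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for degenerate all-zero vectors
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return gallery ids sorted from most to least similar to the query."""
    scores = {gid: cosine_similarity(query, feat) for gid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings with made-up values (assumption: real ones come from a model).
query = [0.9, 0.1, 0.0]
gallery = {
    "cam2_person_A": [0.8, 0.2, 0.1],
    "cam2_person_B": [0.0, 1.0, 0.0],
    "cam3_person_C": [0.0, 0.0, 1.0],
}
ranking = rank_gallery(query, gallery)
print(ranking)
```

Because the score depends only on global appearance features, an occluded or unusually posed subject can shift the embedding enough to change the ranking, which is the limitation the internship sets out to examine.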

Responsibilities

The research internship aims to explore how multimodal LLMs can enhance or complement traditional re-identification methods. The intern will:

State of the Art and Baseline Implementation

Review current approaches in object and person re-identification.
Implement a reference model such as TransReID in Python/PyTorch.
Curate or select appropriate datasets for experimentation.
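A baseline is only useful once it can be scored. The sketch below shows two metrics commonly reported for Re-ID, rank-k accuracy and mean average precision (mAP), computed from ranked lists of gallery identities. It is a simplified illustration rather than a benchmark-compliant evaluator: the function names and toy rankings are assumptions, and standard protocols usually also filter out gallery images taken by the same camera as the query, which is omitted here.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """True if any of the top-k ranked gallery identities matches the query."""
    return query_id in ranked_ids[:k]

def average_precision(ranked_ids, query_id):
    """Average precision for one query over a ranked list of gallery identities."""
    hits = 0
    precision_sum = 0.0
    for position, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precision_sum += hits / position  # precision at this correct match
    return precision_sum / hits if hits else 0.0

def evaluate(rankings, k=1):
    """rankings: list of (query_identity, ranked_gallery_identities) pairs."""
    n = len(rankings)
    rank_k = sum(rank_k_hit(ranked, qid, k) for qid, ranked in rankings) / n
    mean_ap = sum(average_precision(ranked, qid) for qid, ranked in rankings) / n
    return rank_k, mean_ap

# Toy rankings with made-up identities.
rankings = [
    ("A", ["A", "B", "A", "C"]),  # correct matches at positions 1 and 3
    ("B", ["C", "B", "A", "B"]),  # correct matches at positions 2 and 4
]
rank1, mean_ap = evaluate(rankings, k=1)
print(f"rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

Reporting both numbers is useful because rank-1 only looks at the top result, while mAP rewards retrieving every image of the same identity early in the list.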

Exploration of Image→Text Capabilities of Multimodal LLMs

Integrate and test the main multimodal LLMs through their APIs (OpenAI, Google, Anthropic), and run open-source models (LLaVA, Pixtral) locally.
Identify use cases where these models can enhance Re-ID performance.
Develop a demonstrator showcasing the strengths and limitations of these approaches.
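One way to frame the image-to-text exploration is as a pairwise question put to the model, with a structured answer that can be parsed and scored. The sketch below is provider-agnostic on purpose: `ask_model` is a hypothetical callable standing in for whichever API or local model is used, and the prompt wording and JSON schema are assumptions, not part of any vendor's interface.

```python
import base64
import json

PROMPT = (
    "You are given two images of a person taken by different cameras. "
    "Without using facial features, decide whether they show the same person. "
    'Answer with JSON only: {"same_person": true or false, "reason": "..."}'
)

def encode_image(image_bytes):
    """Base64-encode raw image bytes so they can be embedded in a text payload."""
    return base64.b64encode(image_bytes).decode("ascii")

def parse_verdict(reply_text):
    """Extract the JSON verdict from a model reply; return None if unusable."""
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        verdict = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(verdict, dict) or not isinstance(verdict.get("same_person"), bool):
        return None
    return verdict

def compare_pair(image_a, image_b, ask_model):
    """ask_model is hypothetical: (prompt, [base64 images]) -> reply text."""
    reply = ask_model(PROMPT, [encode_image(image_a), encode_image(image_b)])
    return parse_verdict(reply)

def fake_model(prompt, images):
    """Stand-in for a real model call, used only to exercise the pipeline."""
    return 'Sure. {"same_person": true, "reason": "same red jacket and backpack"}'

verdict = compare_pair(b"image-a-bytes", b"image-b-bytes", fake_model)
print(verdict)
```

Keeping the model call behind a single callable makes it straightforward to swap hosted and local models in a demonstrator, and the free-text `reason` field gives a starting point for analysing where these models succeed or fail.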

Requirements

Proficiency in Python and PyTorch.
Familiarity with object and person re-identification concepts.
Strong analytical and problem-solving skills.
Ability to work independently and collaborate effectively.

Join us in this exciting research internship to push the boundaries of object and person re-identification using cutting-edge multimodal language models.