32 Automated System for Kubernetes Disaster Recovery PFE

Proxym Group

Internship | Hybrid | 3 to 6 months | Paid | Deadline: 26 Nov. 2025

.NET / DevOps Development | Cloud Infrastructure | Kubernetes

Description

Project Overview:

Build an automated system that can back up a running Kubernetes application, destroy the existing cluster, recreate identical infrastructure, and fully restore the application with a single click.
The system must perform automatic backups of Kubernetes resources and persistent data, store backups securely, and verify their integrity before and after restore.

Objectives and Deliverables:

Implement a one-click disaster recovery workflow that creates backups, validates them, destroys the source cluster in a controlled manner, provisions a new identical cluster via Infrastructure-as-Code, and restores the application to full working order.
Deliver working automation scripts/playbooks, IaC templates, verification tests, and documentation demonstrating end-to-end restoration of an example application.
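To make the workflow above concrete, here is a minimal Python sketch of a runner that executes the five stages in order and stops at the first failure, so that a failed backup validation never reaches the teardown stage. The function name `run_pipeline` and the stage names are illustrative assumptions, not part of the project specification; the placeholder actions stand in for real calls to Velero, Kubespray, and so on.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order and stop at the first one that fails or raises."""
    report: List[Tuple[str, str]] = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:  # record the error instead of crashing the run
            report.append((name, f"error: {exc}"))
            break
        report.append((name, "ok" if ok else "failed"))
        if not ok:
            break
    return report


if __name__ == "__main__":
    # Hypothetical placeholder actions; real ones would invoke the actual tools.
    demo_steps: List[Step] = [
        ("create_backup", lambda: True),
        ("validate_backup", lambda: True),
        ("destroy_cluster", lambda: True),
        ("provision_cluster", lambda: True),
        ("restore_application", lambda: True),
    ]
    for step_name, status in run_pipeline(demo_steps):
        print(f"{step_name}: {status}")
```

Placing validation before teardown in the step list is the design point: the destructive stage runs only if every earlier stage reported success.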

Technologies & Tools:

Use Velero to back up and restore Kubernetes resources and to manage snapshots of persistent data.
Use Ansible and Kubespray (or equivalent IaC/provisioning tools) to provision an identical cluster automatically; develop glue code in Python and provide a minimal React JS UI for the one-click operation.
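As an illustration of the Python glue code mentioned above, the following sketch builds Velero CLI invocations as argument lists. The helper names and the example backup and namespace names are assumptions made for this example; the commands themselves (`velero backup create` and `velero restore create --from-backup`, both with `--wait`) are standard Velero CLI usage. The sketch only constructs the commands and does not execute them.

```python
from typing import List


def backup_command(backup_name: str, namespace: str) -> List[str]:
    """Build a Velero command that backs up one namespace and waits for completion."""
    return [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", namespace,
        "--wait",
    ]


def restore_command(backup_name: str) -> List[str]:
    """Build a Velero command that restores from an existing backup and waits."""
    return ["velero", "restore", "create", "--from-backup", backup_name, "--wait"]


if __name__ == "__main__":
    # "demo-backup" and "demo-app" are hypothetical names for illustration.
    print(" ".join(backup_command("demo-backup", "demo-app")))
    print(" ".join(restore_command("demo-backup")))
```

In a real workflow each list could be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues and raises an exception on a non-zero exit code.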

Responsibilities & Tasks:

Design and implement backup orchestration, integrity checks, and restore procedures for both Kubernetes resources and persistent volumes.
Implement automated cluster teardown and provisioning using Infrastructure-as-Code; ensure the new cluster is functionally identical (networking, storage, RBAC, CRDs) and test restores end-to-end.
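One possible way to check that the new cluster is functionally identical is to compare an inventory of named objects (for example CRDs, storage classes, or cluster roles) captured on the source cluster with the same inventory taken on the rebuilt one. The sketch below assumes such inventories have already been collected as sets of names per kind; how they are collected, and the `inventory_diff` helper itself, are assumptions for illustration.

```python
from typing import Dict, Set

Inventory = Dict[str, Set[str]]


def inventory_diff(source: Inventory, target: Inventory) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per kind, the names missing from or extra in the target cluster."""
    diff: Dict[str, Dict[str, Set[str]]] = {}
    for kind in sorted(set(source) | set(target)):
        src_names = source.get(kind, set())
        dst_names = target.get(kind, set())
        missing = src_names - dst_names  # present before, absent after rebuild
        extra = dst_names - src_names    # absent before, present after rebuild
        if missing or extra:
            diff[kind] = {"missing": missing, "extra": extra}
    return diff


if __name__ == "__main__":
    # Hypothetical inventories for illustration only.
    before = {"crds": {"backups.velero.io"}, "storageclasses": {"standard"}}
    after = {"crds": {"backups.velero.io"}, "storageclasses": set()}
    print(inventory_diff(before, after))
```

An empty result means the two inventories match for every kind that was captured; it says nothing about kinds that were never inventoried.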

Required Profile:

Engineering profile: one trainee (intern level) with interest or experience in DevOps, Kubernetes, cloud infrastructure, scripting, and automation.
Desired skills: Kubernetes administration, Velero experience (preferred), Ansible and IaC experience, Python scripting, familiarity with Kubespray, and basic React JS for a minimal control UI.

Verification & Testing:

Create automated verification steps to validate backup integrity and confirm application correctness after restore (smoke tests, data consistency checks).
Document failure modes and recovery times; provide logs and reproducible test scenarios to demonstrate reliability.
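A data consistency check of the kind described above could, for instance, compare a fingerprint of application records taken before the backup with one taken after the restore. The sketch below is one simple approach under stated assumptions: the records are JSON-serializable dictionaries, their order is not significant, and the `fingerprint` helper is a hypothetical name introduced for this example.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def fingerprint(records: Iterable[Dict[str, Any]]) -> str:
    """Return an order-independent SHA-256 digest of a collection of records."""
    # Serialize each record canonically, then sort so row order does not matter.
    canonical = sorted(json.dumps(record, sort_keys=True) for record in records)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent records cannot merge
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical sample data standing in for rows read from the application.
    before_backup = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    after_restore = [{"id": 2, "name": "bob"}, {"id": 1, "name": "alice"}]
    print("consistent:", fingerprint(before_backup) == fingerprint(after_restore))
```

Matching digests indicate that the restored data equals the data captured before the backup; a mismatch flags a lost, extra, or altered record without identifying which one.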

How to Apply:

Apply online via the trainees platform: https://trainees-platform.proxym-group.net
In your application, reference the project REF PRX-2026-17 and the title "32 Automated System for Kubernetes Disaster Recovery PFE"; include a brief CV and relevant experience with Kubernetes/DevOps.