
MLE / MLOps
White Circle
Apr 10, 2026
Job description
Role details
- Location: Paris
- Employment Type: Full time
- Location Type: Hybrid
- Department: Research team
- Compensation: $100K – $150K • Offers Equity • Relocation package
What you'll do
- Own inference infrastructure end-to-end: optimize latency, throughput, and cost across our model fleet.
- Build and scale model serving with TensorZero, vLLM/SGLang/TensorRT, and Kubernetes.
- Design and maintain vector search pipelines backed by vector storage systems.
- Define service health KPIs, drawing on support metrics such as SLAs, FCR, and deflection rate.
- Turn research into product: take experimental models from the research team, assess what's production-ready, and ship it - formatting, sampling parameters, deployment, the whole thing.
Who you are
- 3+ years shipping high-performance ML systems in production, not just training notebooks.
- Deep hands-on experience with inference optimization - you've debugged latency spikes and know the difference between theoretical and real-world throughput.
- Comfortable across the stack: from CUDA kernels to Kubernetes manifests to Grafana dashboards.
- A big plus: experience with Rust, custom Triton kernels, and performance benchmarking.
Tech stack
- TensorZero
- vLLM/SGLang/TensorRT
- Kubernetes
- CUDA
- Rust (bonus)
- Custom Triton kernels (bonus)
- Grafana dashboards
Team description
Department: Research team
Benefits & perks
- Salary of $100,000 to $150,000 + equity
- 20 days of paid vacation
- Work from Paris (hybrid) + relocation package
- Best medical insurance in France
- All the hardware, tools, and services you need
- Covered subscriptions for AI agents and IDEs
- Team off-sites twice a year: Alps and Saint-Tropez
Please mention "I found this job at Remocate!"