ML Engineer - Distributed Training Specialist
Location: Other, Central Europe
Seniority: Senior
Technologies: Python

Are you passionate about scaling cutting-edge AI across distributed systems? Together with our partner, a prominent online fashion and beauty retailer in Europe, we’re looking for an experienced ML Engineer - Distributed Training Specialist to develop and optimize large language models (LLMs) tailored to the fashion industry.

Working with massive-scale data, we’re creating LLMs designed to entertain and inspire customers, shaping the future of AI in fashion. Join us and make an impact!

Responsibilities:

  • Implement and optimize distributed training pipelines for large-scale multimodal models

  • Set up and maintain training infrastructure across multiple nodes/GPUs

  • Develop and optimize data loading pipelines for multimodal inputs

  • Monitor and improve training efficiency and resource utilization

  • Implement checkpointing and fault tolerance mechanisms

Requirements:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field

  • 5+ years of experience in ML engineering

  • Experience with LLMs and/or image processing

  • Proven track record with large-scale model training and optimization

  • Experience with multimodal data processing and training

  • Proficiency in Python and PyTorch

  • Proficiency with cloud technologies

  • Preferred but not required: strong experience with distributed training frameworks (DeepSpeed, FSDP, Megatron)
