We are seeking a Senior Data Scientist with deep expertise in unstructured data (audio, speech, text, images, etc.) and a strong background in deploying Large Language Models (LLMs) and AI models at scale. This role focuses on real-world implementation, ensuring that models are efficient, scalable, and optimized for enterprise deployment.
You will work closely with large enterprises, delivering AI-powered solutions that meet real-world performance benchmarks (speed, latency, throughput). The ideal candidate has hands-on experience optimizing LLMs through quantization and pruning, designing distributed training pipelines, and building end-to-end products with AI agents rather than merely leveraging open-source tools. The role also demands a deep understanding of multimodal architectures and cutting-edge optimization techniques such as model distillation and retrieval-augmented generation (RAG).
Key Responsibilities
- Develop and deploy AI models for unstructured data (text, speech, audio, images) with a focus on enterprise-scale performance.
- Fine-tune, optimize, and deploy LLMs and multimodal models, integrating distributed training, quantization, and pruning techniques for efficiency.
- Design and implement production-ready AI solutions, ensuring scalability, low-latency inference, and high throughput.
- Work with AI agents and automation frameworks to create intelligent, real-world AI applications for enterprise clients.
- Build and maintain end-to-end LLMOps pipelines, ensuring efficient training, deployment, monitoring, and model updates.
- Implement vector search and retrieval-augmented generation (RAG) systems for large-scale data solutions.
- Monitor AI performance using key metrics such as speed, latency, and throughput, continuously refining models for real-world efficiency.
- Work with cloud-based AI infrastructure (AWS, GCP) and containerized environments (Docker, Kubernetes) to scale AI solutions.
- Collaborate with engineering, DevOps, and product teams to align AI solutions with business needs and client requirements.
- Implement data curation pipelines, covering data collection, cleaning, deduplication, and decontamination, to support training of high-quality AI models.
- Implement self-instruct and synthetic data generation techniques to enrich datasets for low-resource languages and specialized domains.
Required Qualifications
- 5+ years of hands-on experience in AI, Machine Learning, and Data Science, with a strong focus on production-scale AI.
- Expertise in LLMs, including fine-tuning, distributed training, quantization, and pruning techniques.
- Experience working with OCR (optical character recognition), ASR (automatic speech recognition), and TTS (text-to-speech) applications in real-world deployments.
- Proven experience deploying AI models in production, with real-world examples of scaled AI applications.
- Strong understanding of cloud computing, containerization (Docker, Kubernetes), and MLOps best practices.
- Proficiency in Python, PyTorch, and common ML libraries.
- Hands-on experience with vector databases and retrieval-augmented generation (RAG) architectures.
- Strong awareness of AI system performance benchmarks (latency, speed, throughput) and ability to optimize models accordingly.
- Experience working with AI agents and designing real-world intelligent automation solutions, beyond open-source experimentation.
- Proficiency in transformer-based architectures (BERT, GPT, LLaMA, Whisper, etc.), including pre-training, fine-tuning, and task-specific adaptation.
- Expertise in distributed training methodologies, including ZeRO offloading, DeepSpeed, and PyTorch FSDP.
- Experience in large-scale data curation, including data cleaning, formatting, deduplication, and decontamination.
Preferred Qualifications
- Experience with multimodal AI models that integrate text, speech, and vision.
- Hands-on work with self-supervised learning, few-shot learning, and reinforcement learning.
- Experience designing and deploying AI solutions for large enterprises, ensuring high availability, robustness, and business impact.
- Knowledge of AI inference optimization techniques for real-time applications.