ML Platform Engineer
Bayone Solutions Inc
Sunnyvale, CA (Onsite)
Full-Time
Please send only the top two profiles for now, together with the test report, in reply to this email thread.
Prior experience with the client is a plus.
Location: Sunnyvale, CA (Onsite)
Key Responsibilities:
- Design and implement scalable model serving platforms for both batch and real-time inference
- Build model deployment pipelines with automated testing and validation
- Develop monitoring, logging, and alerting systems for ML services
- Create infrastructure for A/B testing and model experimentation
- Implement model versioning and rollback capabilities
- Design efficient scaling and load balancing strategies for ML workloads
- Collaborate with data scientists to optimize model serving performance
Qualifications:
- 7+ years of software engineering experience, including 3+ years in ML serving/infrastructure
- Strong expertise in container orchestration (Kubernetes) and cloud platforms
- Experience with model serving technologies (TensorFlow Serving, Triton, KServe)
- Deep knowledge of distributed systems and microservices architecture
- Proficiency in Python and experience with high-performance serving
- Strong background in monitoring and observability tools
- Experience with CI/CD pipelines and GitOps workflows
- Experience with model serving frameworks:
  - TorchServe for PyTorch models
  - TensorFlow Serving for TF models
  - Triton Inference Server for multi-framework support
  - BentoML for unified model serving
- Expertise in model runtime optimizations:
  - Model quantization (INT8, FP16)
  - Model pruning and compression
  - Kernel optimizations
  - Batching strategies
  - Hardware-specific optimizations (CPU/GPU)
- Experience with model inference workflows:
  - Pre/post-processing pipeline optimization
  - Feature transformation at serving time
  - Caching strategies for inference
  - Multi-model inference orchestration
  - Dynamic batching and request routing
- Experience with GPU infrastructure management
- Knowledge of low-latency serving architectures
- Familiarity with ML-specific security requirements
- Background in performance profiling and optimization
- Experience with model serving metrics collection and analysis