Job description
Position Summary
The AI/ML Support Automation Analyst will be a key member of the KSL AI Support Team, focusing on MLOps
infrastructure, container orchestration, and workflow automation at supercomputing scale. Working under the
AI/ML Support Team Lead, this role is responsible for developing and maintaining secure, OCI-compliant container
images, robust CI/CD pipelines, and cloud-native MLOps workflows that enable researchers to efficiently deploy and
manage AI/ML workloads. The Analyst will bridge the gap between cutting-edge Kubernetes-based infrastructure
and the diverse needs of the research community, contributing to governance, technical enablement, and
community development initiatives.
Major Responsibilities
1. MLOps and Container Development
• Provide timely and useful user support via telephone, walk-in, email, and ticketing system submissions
for all types of inquiries.
• Maintain high customer service standards in dealing with and responding to user issues and questions.
• Develop and maintain secure, OCI-compliant, and HPC-ready AI/ML and data science software container
images
• Design and implement robust MLOps workflows and pipelines at supercomputing scale
• Develop and maintain CI/CD pipelines for reproducible infrastructure and workflow deployment
• Design and deploy APIs for AI/ML services and inference endpoints
• Implement and manage Kubernetes-based orchestration, including CNI, CSI, and service mesh
configurations and optimization
• Deploy and maintain container registries (Harbor) and model registries (MLflow, Kubeflow Model
Registry)
2. Governance and Compliance Support
• Assist in computational readiness reviews for AI research projects
• Assist in AI model and artifact control reviews to ensure compliance with institutional standards
• Provide consultation to users on efficient resource usage for AI/ML and MLOps workflows
• Ensure container images and workflows comply with security policies and best practices
• Support the implementation of usage monitoring and reporting systems
3. Performance and Benchmarking
• Perform performance debugging and tuning of MLOps and cloud-native workflows
• Develop and maintain AI/ML and MLOps workload benchmarks for procuring new systems
• Create and maintain regression testing workloads for existing clusters
• Deploy and maintain observability and resource monitoring stacks using Prometheus, Grafana, NVIDIA
DCGM, and Grafana Loki
• Contribute to technology evaluation and benchmarking exercises for future infrastructure investments
4. Training and Documentation
• Create comprehensive training content for users on MLOps platforms, Kubernetes, and containerization
• Develop and maintain high-quality user documentation for automation tools and workflows
• Support the delivery of workshops on CI/CD, container orchestration, and MLOps best practices
• Contribute to knowledge transfer initiatives within the KAUST research community
• Provide one-on-one consultation to researchers on efficient use of automation infrastructure
Personal Requirements
Competencies
• Experience
• Demonstrated experience developing robust and complex MLOps pipelines
• Hands-on experience with API design and deployment
• Experience developing robust and portable CI/CD pipelines for reproducible infrastructure and workflow
deployment
• Experience supporting researchers or working in academic/research computing settings preferred
• Technical Skills - Essential
• Kubernetes: Strong expertise in Kubernetes, Container Network Interface (CNI), Container Storage
Interface (CSI), and Service Mesh
• MLOps: Experience developing and maintaining MLOps pipelines and workflows
• CI/CD: Proficiency in building CI/CD pipelines for infrastructure and application deployment
• Containerization: Experience building secure, OCI-compliant container images
• API Development: Experience in API design, development, and deployment
• Programming: Proficiency in Python; experience with Go and Bash scripting
• Linux: Strong Linux/Unix systems administration skills
• Technical Skills - Desired
• Experience with ArgoCD, Airflow, Dask, and Spark for workflow orchestration
• Experience with Kubeflow, KServe, and Seldon for ML serving and pipelines
• Experience deploying and maintaining observability stacks (Prometheus, Grafana, NVIDIA DCGM, Grafana
Loki)
• Knowledge of Model Context Protocol (MCP) and agentic frameworks
• Experience deploying inference services at scale
• Experience deploying and maintaining container registries (Harbor) and model registries (MLflow,
Kubeflow Model Registry, Artifact Hub)
• Experience with GitOps practices and Infrastructure as Code (Terraform, Ansible)
• Experience with HPC schedulers (Slurm) and HPC-cloud integration
• Soft Skills
• Strong problem-solving and analytical abilities
• Excellent written and verbal communication skills in English
• Customer service mindset with patience for supporting diverse skill levels
• Ability to work independently and as part of a collaborative team
• Strong documentation and knowledge-sharing practices
• Cultural sensitivity for working in an international environment
Preferred Qualifications
• Experience in national laboratories or major research computing facilities
• Experience with GPU scheduling and resource management in Kubernetes
• Background in DevOps or Site Reliability Engineering (SRE)
• Contributions to open-source cloud-native or MLOps projects
• Publications or presentations on MLOps, Kubernetes, or automation topics
• Knowledge of Saudi Arabia's Vision 2030 and national AI initiatives
• Additional certifications: AWS/Azure/GCP, Terraform, NVIDIA DLI
Qualifications
• Bachelor's or Master's degree in Computer Science, Data Science, Computational Science, Artificial
Intelligence, or a related field
• Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application
Developer), CKS (Certified Kubernetes Security Specialist), or CNPE (Certified Cloud Native Platform
Engineer) are highly valued
Experience
• Minimum of 2 years of relevant experience
Preferred candidate
Years of experience: No experience required
Degree: Bachelor's degree / higher diploma