AI/ML Support Automation Analyst

King Abdullah University of Science and Technology -

Saudi

Job Details

Education: Bachelor's degree / higher diploma

Category: Machine Learning & AI Internships

Industry: Artificial Intelligence and Data Science Solutions

Job description

Position Summary

The AI/ML Support Automation Analyst will be a key member of the KSL AI Support Team, focusing on MLOps infrastructure, container orchestration, and workflow automation at supercomputing scale. Working under the AI/ML Support Team Lead, this role is responsible for developing and maintaining secure, OCI-compliant container images, robust CI/CD pipelines, and cloud-native MLOps workflows that enable researchers to efficiently deploy and manage AI/ML workloads. The Analyst will bridge the gap between cutting-edge Kubernetes-based infrastructure and the diverse needs of the research community, contributing to governance, technical enablement, and community development initiatives.

Major Responsibilities

1. MLOps and Container Development

• Provide timely and useful user support via telephone, walk-in, email, and ticketing system submissions for all types of inquiries.

• Maintain high customer service standards in dealing with and responding to user issues and questions.

• Develop and maintain secure, OCI-compliant, and HPC-ready AI/ML and data science software container images

• Design and implement robust MLOps workflows and pipelines at supercomputing scale

• Develop and maintain CI/CD pipelines for reproducible infrastructure and workflow deployment

• Design and deploy APIs for AI/ML services and inference endpoints

• Implement, manage, and optimize Kubernetes-based orchestration, including CNI, CSI, and service mesh configurations

• Deploy and maintain container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry)

2. Governance and Compliance Support

• Assist in computational readiness reviews for AI research projects

• Assist in AI model and artifact control reviews to ensure compliance with institutional standards

• Provide consultation to users on efficient resource usage for AI/ML and MLOps workflows

• Ensure container images and workflows comply with security policies and best practices

• Support the implementation of usage monitoring and reporting systems

3. Performance and Benchmarking

• Perform performance debugging and tuning of MLOps and cloud-native workflows

• Develop and maintain AI/ML and MLOps workload benchmarks for procuring new systems

• Create and maintain regression testing workloads for existing clusters

• Deploy and maintain observability and resource monitoring stacks using Prometheus, Grafana, NVIDIA DCGM, and Grafana Loki

• Contribute to technology evaluation and benchmarking exercises for future infrastructure investments

4. Training and Documentation

• Create comprehensive training content for users on MLOps platforms, Kubernetes, and containerization

• Develop and maintain high-quality user documentation for automation tools and workflows

• Support the delivery of workshops on CI/CD, container orchestration, and MLOps best practices

• Contribute to knowledge transfer initiatives within the KAUST research community

• Provide one-on-one consultation to researchers on efficient use of automation infrastructure

Personal Requirements

Competencies

• Experience

• Demonstrated experience developing robust and complex MLOps pipelines

• Hands-on experience with API design and deployment

• Experience developing robust and portable CI/CD pipelines for reproducible infrastructure and workflow deployment

• Experience supporting researchers or working in academic/research computing settings preferred

• Technical Skills - Essential

• Kubernetes: Strong expertise in Kubernetes, Container Network Interface (CNI), Container Storage Interface (CSI), and service mesh

• MLOps: Experience developing and maintaining MLOps pipelines and workflows

• CI/CD: Proficiency in building CI/CD pipelines for infrastructure and application deployment

• Containerization: Experience building secure, OCI-compliant container images

• API Development: Experience in API design, development, and deployment

• Programming: Proficiency in Python; experience with Go and Bash scripting

• Linux: Strong Linux/Unix systems administration skills

• Technical Skills - Desired

• Experience with Argo CD, Airflow, Dask, and Spark for workflow orchestration

• Experience with Kubeflow, KServe, and Seldon for ML serving and pipelines

• Experience deploying and maintaining observability stacks (Prometheus, Grafana, NVIDIA DCGM, Grafana Loki)

• Knowledge of Model Context Protocol (MCP) and agentic frameworks

• Experience deploying inference services at scale

• Experience deploying and maintaining container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry, Artifact Hub)

• Experience with GitOps practices and Infrastructure as Code (Terraform, Ansible)

• Experience with HPC schedulers (SLURM) and HPC-cloud integration

• Soft Skills

• Strong problem-solving and analytical abilities

• Excellent written and verbal communication skills in English

• Customer service mindset with patience for supporting diverse skill levels

• Ability to work independently and as part of a collaborative team

• Strong documentation and knowledge-sharing practices

• Cultural sensitivity for working in an international environment

Preferred Qualifications

• Experience in national laboratories or major research computing facilities

• Experience with GPU scheduling and resource management in Kubernetes

• Background in DevOps or Site Reliability Engineering (SRE)

• Contributions to open-source cloud-native or MLOps projects

• Publications or presentations on MLOps, Kubernetes, or automation topics

• Knowledge of Saudi Arabia's Vision 2030 and national AI initiatives

• Additional certifications: AWS/Azure/GCP, Terraform, NVIDIA DLI

Qualifications

• Bachelor's or Master's degree in Computer Science, Data Science, Computational Science, Artificial Intelligence, or a related field

• Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), CKS (Certified Kubernetes Security Specialist), or CNPE (Certified Cloud Native Platform Engineer) are highly valued

Experience

• Minimum of 2 years of relevant experience

Preferred candidate

Years of experience: No experience required

Degree: Bachelor's degree / higher diploma
