Qiddiya Investment Company

Job Details

Job Description

Roles & Responsibilities

  • Support daily cloud operations across Microsoft Azure and Google Cloud Platform (GCP).
  • Monitor cloud infrastructure health, availability, performance, capacity, and incidents.
  • Support SRE practices, including reliability monitoring, availability improvement, incident response, service health reviews, and operational automation.
  • Manage incident response, service requests, escalations, and root cause analysis for cloud services.
  • Support implementation and maintenance of cloud governance, policies, standards, tagging, and compliance controls.
  • Review and optimize cloud cost and usage through reserved capacity, rightsizing, and resource cleanup.
  • Coordinate with vendors, managed service providers, and internal teams to resolve operational issues.
  • Support platform-native backup, disaster recovery, high availability, and business continuity activities.
  • Maintain cloud documentation, operational runbooks, architecture diagrams, and knowledge base articles.
  • Support cloud security operations, including IAM reviews, network security, patching, vulnerability remediation, encryption, and access governance.
  • Assist with automation and Infrastructure as Code adoption using tools such as Terraform.
  • Track operational KPIs such as availability, incident resolution, SLA compliance, policy violations, cost optimization, reliability metrics, and documentation coverage.
  • Participate in change management, release coordination, and operational readiness reviews.
  • Support audits, compliance reviews, and risk remediation related to Azure and GCP cloud infrastructure.

Desired Candidate Profile

  • Minimum 5 years of hands-on experience in cloud operations, cloud engineering, infrastructure operations, SRE, or related technical roles.
  • Hands-on knowledge of Microsoft Azure and/or Google Cloud Platform (GCP).
  • Understanding of cloud networking, identity, compute, storage, monitoring, backup, security, and high availability.
  • Experience with incident, problem, change, and service request management.
  • Familiarity with SRE concepts, including availability, reliability, monitoring, alerting, incident response, root cause analysis, and continuous improvement.
  • Familiarity with ITIL processes and IT service management tools.
  • Good understanding of cloud security, IAM, RBAC, network security groups/firewalls, encryption, vulnerability management, and compliance controls.
  • Ability to work with vendors, managed service providers, and cross-functional teams.
  • Strong troubleshooting, communication, documentation, and stakeholder coordination skills.
  • Experience with automation or scripting using tools such as Terraform, PowerShell, or Python is preferred.
