Job Details

Job Description

Roles & Responsibilities

Who Are We

HALA is a leading fintech player in the MENAP region that aims to redefine financial services and build the future bank of SMEs. HALA aims to empower SMEs to start, run, and grow their businesses by providing them with cutting-edge financial and technological tools. HALA currently holds multiple entities in the UAE, Saudi Arabia, and Egypt (including HALA Payments and HALA Logistics) and offers solutions that enable merchants to digitize their payments as well as manage their sales and operations. Founded in 2017, HALA is currently licensed by the Saudi Arabian Central Bank.

Job Summary

The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, performance, and scalability of the organization's infrastructure and applications. The role focuses on automating operations, managing cloud and Kubernetes environments, maintaining CI/CD pipelines, monitoring system health, and resolving production issues. Working closely with development teams, the SRE helps build resilient, secure, and efficient platforms that support continuous delivery and business growth.

Tasks & Responsibilities:

- Run the cloud environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large, distributed software applications
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Design procedures for system troubleshooting and maintenance

Desired Candidate Profile

Requirements:

- Bachelor's degree in computer science, information technology, or an equivalent field of study. Years of experience may substitute for the degree.
- 3+ years of experience in a similar position (SRE, DevOps, or infrastructure engineer).
- Advanced knowledge of compliance and regulations.
- Experience with Kubernetes administration.
- Experience with infrastructure as code tools such as Terraform and Ansible.
- Experience with at least one of the major cloud providers: AWS, GCP, Azure, or OCI.
- Experience with architecting, developing, and troubleshooting large-scale systems.
- Experience building CI/CD pipelines (preferably GitOps).
- Experience with monitoring and observability tools such as Prometheus, Loki, Jaeger, and Sentry.
- Experience managing databases such as PostgreSQL and MongoDB, including backup and restore plans, replication, and clustering.
- Good networking knowledge (preferably experience with VPNs and Service Mesh).

Core Competencies:

- Self-Actualization & Fulfilment: Proficiency Level - ADVANCED
- Team Synergy & Development: Proficiency Level - ADVANCED
- Entrepreneurial Mindset & Drive: Proficiency Level - ADVANCED
- Business Acumen & Diligence: Proficiency Level - ADVANCED
