Job Description

Role Summary
We are seeking a Resident Engineer with deep expertise in managing, troubleshooting, and recovering enterprise workloads protected by Cohesity. The ideal candidate will possess strong hands‑on experience across virtual, database, file, physical, and cloud workloads, and be capable of owning complex incidents, restores, and performance issues end‑to‑end.

Key Responsibilities

1. L3 Operations & Incident Management
- Act as Level‑3 escalation point for all Cohesity‑related incidents and service requests.
- Perform root cause analysis (RCA) for backup, restore, and performance issues across workloads.
- Own incidents through resolution, including customer communication and post‑incident documentation.
- Provide guidance to L1/L2 teams and improve operational runbooks.

2. Workload Ownership & Expertise (Critical Requirement)
The engineer must demonstrate deep operational understanding of the following Cohesity workloads:

a) Virtualization Workloads
- VMware vSphere (mandatory), Hyper‑V (good to have)
- Crash‑consistent vs application‑consistent backups
- CBT, snapshot failures, proxy issues
- VM‑level, file‑level, and instant VM restores
- Cross‑cluster / cross‑datacenter restores

b) Database Workloads
- Microsoft SQL Server, Oracle, SAP HANA
- Full, incremental, and log backups
- Log truncation and consistency troubleshooting
- Point‑in‑time recovery scenarios
- Coordination with DBAs during recovery

c) File & NAS Workloads
- NetApp, Isilon, Windows/Linux file servers
- NFS/SMB protection jobs
- Large namespace backup and restore issues
- ACLs and permissions during restore
- Ransomware detection and file‑level recovery

d) Physical Server Workloads
- Agent‑based protection (Windows & Linux)
- Agent health, upgrades, and failures
- Bare‑metal and OS‑level restores
- Performance tuning for physical backups

e) Cloud & SaaS Workloads (Good to Have)
- AWS and Azure workload protection
- Microsoft 365 (Exchange, OneDrive, SharePoint)
- API‑based backup and restore issues
- Throttling, permissions, and tenant limitations

3. Backup, Restore & DR Operations
- Design and manage protection policies aligned to workload SLAs (RPO/RTO).
- Execute and validate complex restores, including:
  - Point‑in‑time recovery
  - Large‑scale restores
  - DR and audit‑driven restores
- Support DR drills and real‑time recovery events.

4. Performance & Capacity Management
- Monitor and optimize backup windows and cluster performance.
- Analyze bottlenecks related to:
  - Network throughput
  - Source workload performance
  - Storage and archive tiers
- Capacity planning for on‑prem and cloud archival storage.

5. Security & Governance (Preferred)
- Ransomware protection and anomaly detection.
- Legal hold, data classification, and compliance reporting.
- Secure data access and role‑based permissions.

6. Customer & Stakeholder Engagement
- Direct interaction with customers during critical situations.
- Translate technical issues into business impact.
- Recommend workflow improvements and best practices.
- Assist in onboarding new workloads and environments.

Technical Skills (Mandatory)
- Strong hands‑on experience with the Cohesity DataProtect platform
- Deep knowledge of:
  - VMware vSphere
  - Enterprise backup & recovery concepts
  - Linux & Windows OS fundamentals
- Experience in troubleshooting:
  - Backup failures
  - Restore issues
  - Performance degradation

Preferred / Good‑to‑Have Skills
- Cohesity certifications
- Public cloud platforms (AWS, Azure)
- Scripting (PowerShell, Bash)
- ITIL process familiarity