Senior Site Reliability Engineer | (Cloud, Kubernetes, Terraform expertise required)
Berkley Hunt have partnered with a global leader in developer data platforms that is on a mission to empower innovators by transforming how organisations build and scale modern, AI-powered applications. With a presence across 115+ cloud regions, this team is at the forefront of shaping how data powers the future. This role puts you at the core of their next-generation infrastructure, ensuring that cloud-native systems remain fast, reliable, and resilient at global scale.
Requirements:
Deep expertise in Infrastructure as Code (Terraform preferred).
Advanced experience with Kubernetes and Helm in production environments.
Strong Linux systems administration and troubleshooting skills.
Hands-on experience with GCP and AWS cloud services.
Proficient in scripting and automation with Python or Go.
Strong grasp of observability practices: monitoring, logging, and alerting.
Solid understanding of networking fundamentals and troubleshooting.
Experience working in fast-paced environments with a proactive mindset.
Willingness to participate in a 24/7 on-call rotation.
Responsibilities:
Build and maintain highly scalable, reliable infrastructure using IaC principles.
Manage production-grade Kubernetes clusters and Helm deployments.
Optimise system performance, security, and availability across Linux environments.
Architect and maintain cloud-native solutions on GCP and AWS.
Implement and refine observability pipelines to detect and resolve issues proactively.
Automate operational processes to improve system efficiency and developer velocity.
Support critical incidents and outages with precision and urgency.
Collaborate cross-functionally to ensure platform reliability at scale.