Senior Site Reliability Engineer | (Cloud, Kubernetes, Terraform expertise required)
Berkley Hunt have partnered with a global leader in developer data platforms that is on a mission to empower innovators by transforming how organisations build and scale modern, AI-powered applications. With a presence across 115+ cloud regions, this team is at the forefront of shaping how data powers the future. This role puts you at the core of their next-generation infrastructure, ensuring that cloud-native systems remain fast, reliable, and resilient at global scale.
Requirements:
- Deep expertise in Infrastructure as Code (Terraform preferred).
- Advanced experience with Kubernetes and Helm in production environments.
- Strong Linux systems administration and troubleshooting skills.
- Hands-on experience with GCP and AWS cloud services.
- Proficient in scripting and automation with Python or Go.
- Strong grasp of observability practices: monitoring, logging, and alerting.
- Solid understanding of networking fundamentals and troubleshooting.
- Experience working in fast-paced environments with a proactive mindset.
- Willingness to participate in a 24/7 on-call rotation.
Responsibilities:
- Build and maintain highly scalable, reliable infrastructure using IaC principles.
- Manage production-grade Kubernetes clusters and Helm deployments.
- Optimise system performance, security, and availability across Linux environments.
- Architect and maintain cloud-native solutions on GCP and AWS.
- Implement and refine observability pipelines to detect and resolve issues proactively.
- Automate operational processes to improve system efficiency and developer velocity.
- Support critical incidents and outages with precision and urgency.
- Collaborate cross-functionally to ensure platform reliability at scale.