The Cloud Operations Expert is responsible for the stability, reliability, and performance of production systems, and applies DevOps practices to automate and industrialize operations and improve their efficiency.
Technical environment
• Cloud: Google Cloud Platform (GCP), Amazon Web Services (AWS)
• Scripting: Shell, Python
• Tools: Git, GitLab CI, Ansible, Terraform
• Containers & orchestration: Docker, Kubernetes, Mesos
• Platforms & streaming: App Engine, Kafka
• Databases: SQL (PostgreSQL), NoSQL (Elasticsearch, Couchbase)
• Monitoring, observability & alerting: Grafana, Stackdriver, Prometheus, Datadog, PagerDuty
• Networking: VPN, IP routing, NAT, load balancing, proxy
• Methodology: Agile
Main responsibilities
• Monitor and maintain production systems and infrastructure
• Handle incidents, troubleshoot issues, and perform root cause analysis
• Ensure availability, performance, and reliability in line with SLAs and SLOs
• Automate operational tasks through scripting and Infrastructure as Code (IaC)
• Manage and improve CI/CD pipelines (GitLab CI)
• Deploy and support applications in cloud environments (AWS / GCP)
• Perform infrastructure updates using IaC tools such as Terraform
• Work with containerized environments (Docker, Kubernetes)
• Collaborate with development teams to improve deployability and operability
• Apply best practices in security, backup, and disaster recovery
Required skills
• Strong experience operating production systems
• Proven experience in monitoring and incident management
• Scripting skills (Bash, Python)
• Experience with cloud platforms (GCP or AWS)
• Good command of CI/CD and automation practices
• Experience with version control (Git) and branching workflows such as GitFlow
• Good knowledge of containerized and microservices architectures (Kubernetes)
• Knowledge of SQL (PostgreSQL) and NoSQL (Elasticsearch, Couchbase) databases
• Understanding of network concepts (VPN, IP routing, NAT, load balancing, proxy)
• Experience in an Agile / SAFe environment
• Fluent technical English (reading, writing, speaking)