CONTRACT Site Reliability / DevOps Engineer - MUST have live DV clearance

Posted 19 March by C4S Search Ltd

Contract Site Reliability / DevOps Engineer

Clearance: MUST have live DV clearance

Location: Onsite in Cheltenham, London, Manchester - 5 days a week

About the role:

Our cross-domain services are used in high-profile government organisations, and demand for them continues to grow in both scope and scale. We are seeking an experienced Site Reliability Engineer to help meet that demand. As an SRE, you will be responsible for ensuring the availability, performance and cost-effectiveness of these services. You will work with multiple feature development teams and the BAU/Support team to define and evolve our cloud and on-prem infrastructure and delivery pipelines, improve system observability, demonstrate performance and capacity improvements, and proactively identify and mitigate reliability risks.

Key Responsibilities of the Site Reliability Engineer:

• Collaborate with Software Engineers to improve reliability and performance in their subsystems

• Partner with System Administrators in automating toil and eliminating alerts

• Evolve observability and monitoring capabilities to identify and solve problems before they impact the business

• Support development environments to help us achieve our delivery and quality goals

• Research and evaluate technologies, tools and services to influence buy-vs-build decisions

• Develop expertise in diverse technical and business domains

• Expand your knowledge of the technical stacks used

Skills & Experience Required:

• Experience using modern configuration management tools (such as Ansible, Chef or similar)

• Experience working with Terraform

• Experience working with Docker containers and container orchestration tools (such as Kubernetes, OpenShift or Docker Swarm)

• Experience both using and maintaining CI/CD tools (such as Jenkins or similar)

• Experience with monitoring tools (such as InfluxDB, Prometheus or Grafana)

• Experience of event-driven integration with MQ messaging (RabbitMQ or similar AMQP solution)

• Good understanding of relational databases and SQL

• Linux command line, administration and shell scripting

• Working knowledge of network security protocols

• Experience using, developing with and maintaining cloud hosting services (ideally AWS EC2, RDS, S3, Lambda)

Desirable Skills:

• Industry experience writing well-tested code in one of our platform languages (Java, Go, Python or similar)

• Knowledge of cross-domain principles and technologies

• Experience of working in a service management environment

• Practical experience applying observability patterns in previous systems

• Experience creating and monitoring system availability metrics and using them to drive work that reduces downtime

Required skills

  • SQL
  • Ansible
  • Terraform
  • CI/CD

Reference: 52344611
