Site Reliability Engineer / SRE / Systems Engineer

A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you've also worked in the following roles, we'd like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer

SALARY: up to £70,000 per annum (depending on experience) + Benefits

LOCATION: Remote and Hybrid Working Options Available. You can work remotely or, if you prefer, on a hybrid basis from home and the office in Altrincham, Greater Manchester, North West England

JOB TYPE: Full-Time, Permanent

JOB OVERVIEW

We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.

As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.

This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.

APPLY TODAY

Ready to make your next career move?
Apply now and our Recruitment Team will review your details.

DUTIES

Your duties as the Site Reliability Engineer / Systems Engineer include:

Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
Automation and Resilience: Supporting automation, incident response and continuous improvement practices
New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
Incident Prioritisation: Balancing customer impact with long-term system health and stability
Security and Compliance: Supporting compliance with security, availability and regulatory frameworks

CANDIDATE REQUIREMENTS

ESSENTIAL

Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
Experience supporting production services at scale within a DevOps or SRE environment
Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
Hands-on experience with containerisation and orchestration using Docker and Kubernetes
Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
Strong Linux administration skills with scripting capability in Bash, Python or similar
Familiarity with CI/CD pipelines and source control tools such as GitHub Actions
Understanding of security frameworks and operational resilience best practices

DESIRABLE

Experience within ISP, MSP or telecommunications environments
Familiarity with enterprise IT architectures including OSS and BSS systems
Knowledge of information security frameworks such as ISO27001, NIST or GDPR
Experience with infrastructure automation tools such as Terraform or Ansible

BENEFITS

Smart casual dress code
Free access to gym facilities
Access to a financial wellbeing platform (on successful completion of probationary period)
Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
Access to cycle to work, childcare, and electric vehicle schemes after six months
Brand new office with excellent transport links
Supportive team culture, growth and career progression

HOW TO APPLY

To be considered for this job vacancy, please submit your CV to our Recruitment Team, who will review your details. CVs of job applicants who meet the requirements will be submitted to our Client for consideration. By submitting your job application to us, you are giving us your express consent to submit your details to our Client for this purpose.

JOB REF: AWDO-P14376

Full-Time, Permanent Jobs, Careers and Vacancies. Find a new job and work in Altrincham, Greater Manchester, North West England. Multi-Job Board Advertising and CV Sourcing Recruitment Services provided by AWD online.

AWD online specialises in sourcing candidates and advertising vacancies on multiple job boards for companies on a non-commission basis. AWD online operates as an employment agency.

AWD-IN-SPJ