Our client's Service Reliability Engineers (SREs) focus on three things: overall ownership of production, production code quality, and deployments. The successful candidate will be self-directed and able to participate in the decision-making process at various levels.
We expect our client's SREs to have opinions on the state of our service and to provide critical feedback during various phases of the operational lifecycle. We are engaged throughout the software development lifecycle, ensuring the operational readiness and stability of our service.
Requirements for the Role
Minimum of 5 years of working experience in a Software Development and/or Linux Systems Administration role.
Strong interpersonal, written and verbal communication skills.
Available to participate in a scheduled on-call rotation.
Skills & Knowledge Needed
Proficient as a Linux Production Systems Engineer, with experience managing large-scale Web Services infrastructure.
Proficient with the design, implementation and full management of Cloud Storage Technologies such as Ceph and Gluster in a large-scale production environment.
Development experience in one or more of the following programming languages:
Bash, Java, Node.js, C++ or Ruby
In addition, experience with one or more of the following:
NoSQL at scale (e.g. Hadoop, Mongo clusters, and/or sharded Redis)
Event Aggregation technologies (e.g. Elasticsearch)
Monitoring & Alerting, and Incident Management toolsets
Virtual infrastructure (deployment and management) at scale
Release Engineering (Package management and distribution at scale)
Software performance analysis and load testing (QA or SDET experience is a plus)