Epam Systems Elkins Park, PA, Montgomery County, PA
Posted 3 days ago
EPAM is hiring a Remote Lead Site Reliability Engineer. If you are looking for a high-impact, exciting role with a company that leads the globe in the digital transformation space, EPAM is the perfect next step in your career. As an EPAMer, you’ll have the opportunity to work with a supportive team on a variety of interesting and impactful projects for some of the largest and most recognizable brands in the world.
Are you ready to advance in your career journey? Apply now.

Responsibilities
• Lead weekly operational state reviews covering performance trends, anomalies, errors, and other availability events with SREs, product owners, and development teams
• Participate in quarterly business and operational reviews aligning on roadmaps, development velocity, efficiency, growth trends, etc.
• Socialize SRE culture across teams within the organization to publicize the value of SRE, and mentor and train other engineers in proactive reliability decision making and planning
• Review code instrumentation with development teams and ensure the necessary dashboards are created to monitor SLIs/SLOs/SLAs
• Establish, test, and tune alerting for varying tiers of applications
• Document and maintain runbooks and procedures, automating as much as possible
• Plan and execute periodic disaster recovery exercises, load and scalability testing, and peak readiness reviews
• Define what it means for a service to be available, and develop, monitor, and alert on SLIs/SLOs

Requirements
• 5 years of SRE or Systems Engineering experience
• 2 years as team lead or SRE champion
• Bachelor's degree in Computer Science, a similar technical field of study, or equivalent practical experience
• Proven experience troubleshooting, mitigating, and resolving issues in a distributed system
• Strong communication and collaboration skills for varying groups of stakeholders
• Self-motivated and able to prioritize effectively between competing priorities
• Experience with implementing SRE practices for services and applications deployed in production in the cloud
• Must understand most SRE concepts, including SLI/SLO/SLA, Error Budget, MTTD/MTTR/MTBF, Toil, Capacity Planning, Observability, Monitoring/Alerting, Release Engineering, and Incident Management/Blameless Post-Mortems

Benefits
• Medical, Dental and Vision Insurance (Subsidized)
• Health Savings Account
• Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
• Short-Term and Long-Term Disability (Company Provided)
• Life and AD&D Insurance (Company Provided)
• Employee Assistance Program
• Unlimited access to LinkedIn learning solutions
• Matched 401(k) Retirement Savings Plan
• Paid Time Off: the employee will be eligible to accrue 15-25 paid days, depending on specific level and tenure with EPAM (accrual eligibility may change over time)
• Paid Holidays: nine (9) total per year
• Legal Plan and Identity Theft Protection
• Accident Insurance
• Employee Discounts
• Pet Insurance
• Employee Stock Purchase Program
• If otherwise eligible, participation in the discretionary annual bonus program
• If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program