Senior Lead Site Reliability Engineer

JPMorgan Chase & Co. Plano, TX 75023

Posted 1 week ago

JobID: 210509812

Category: Software Engineering

JobSchedule: Full time

Posted Date: 2024-04-22T22:19:26+00:00

JobShift: Day

Base Pay/Salary: Jersey City, NJ $171,000.00-$260,000.00

Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability.

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Corporate Sector, in the Infrastructure Platforms Runtime Compute team, you are a force multiplier at both the line-of-business and firm-wide levels. Inspire your peers and the wider product line to deliver durable and resilient products and services to our customers, define firm-wide strategies for reliability, and guide and entrust our teams to lead and execute those strategies.

Job responsibilities

  • Provide technical SRE leadership for multiple SRE teams, engineers, and managers throughout Runtime Compute who look to you for advice on the technical issues facing them.

  • You are a key influencer in the Runtime Compute strategic resiliency, observability, and toil reduction planning.

  • You drive continual improvement in resilience, quality of experience, security, monitoring, instrumentation, and automation.

  • You have successfully implemented SRE best practices in high-performance, stable, mission-critical applications with demonstrable positive outcomes.

  • Technologists in Runtime Compute look to you for advice on technical and business issues facing them.

  • You work with your fellow stakeholders to define common NFRs and availability targets for your product line, Runtime Compute, and ensure that SRE is practiced consistently across applications, products, and product lines.

  • You act in a blameless, data-driven manner, show high empathy and emotional intelligence, and navigate difficult situations with composure and tact.

  • Direct the SRE teams in the product line throughout the lifecycle to help develop software for reliability and scale, ensuring consistency across the product line, and minimal refactoring or changes.

  • Direct the SRE teams in the product line to develop and measure the SLO/SLI for provisioning/deprovisioning, deployments, uptime, and other measures critical to products. Work with business partners to help educate on the product line SLO/SLI.

  • Identify gaps between applicable requirements and current procedures/controls; drive resolution of mitigating controls. Develop and implement solutions that strengthen business operating models, enhance the client experience, and improve efficiency and controls.

  • Work with business partners to design and implement enhancements to existing processes and/or business applications, introduce new processes and/or toolsets, and engage in process re-engineering.

Required qualifications, capabilities, and skills

  • Formal training or certification on software engineering concepts and 5+ years applied experience with industry-standard runtime solutions, e.g., Kubernetes and Cloud Foundry.

  • Expertise in at least one technology stack designing, coding, testing and delivering software.

  • Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm. Software development experience in at least one general-purpose programming language: Python, Java, C, C++, Go, Shell scripting.

  • Working knowledge of infrastructure components (e.g., load balancers, cloud platforms and products, container systems, and runtime compute).

  • Excellent debugging and troubleshooting skills.

  • Strong organizational, prioritization, and interpersonal skills; detail-oriented.

  • Be a team player and a leader who shows commitment and dedication, and can maintain a positive attitude and a high level of performance on high-profile/time-sensitive initiatives.

Preferred qualifications, capabilities, and skills

  • Experience hiring, developing, and recognizing talent

  • Ability to work in a fast-paced environment, be flexible, meet tight deadlines, organize and prioritize work

  • Hands-on experience with cloud-based observability technologies and tools, especially in deployment, monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana, AppDynamics, etc.

  • Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Terraform and Jenkins

  • Deep knowledge of Internet protocols and web services technologies such as HTTP, DNS, TCP/UDP, SOAP, JSON and REST

  • Good understanding of networking protocols and cybersecurity best practices in cloud environments. Public cloud certification is preferred

  • AI/ML knowledge is preferred to evaluate and choose models that help with SRE goals including automated root cause analysis, anomaly detection, and real-time insights and analytics into various products.


