Site Reliability Engineer - SRE

The Armada Group, San Francisco, CA 94103

Posted 3 months ago

We are looking for a Senior Site Reliability Engineer for a fast-growing, high-profile client with offices in Palo Alto and San Francisco.

As a Senior Site Reliability Engineer, you may be the tech lead for, or a participant in, multiple projects: developing plans, negotiating engagement details with development partners, and organizing the work of your SRE team.

In this role you will spend as much of your time working on systems as you do writing code. You will be responsible for building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, and defining service level objectives and agreements.

  • Shape architecture, design, and implementations of new and existing systems to enhance reliability, performance, efficiency, and scalability

  • Ensure all key services are measured, monitored, and raise alerts when needed

  • Automate deployment and configuration processes

  • Develop reliability tools and frameworks for use by all engineers

  • Share on-call duties for critical systems and lead incident response and no-blame postmortem analysis and review

  • Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring and root cause analysis

  • BS or MS in Computer Science or a related technical discipline.

  • Equivalent practical experience is a reasonable substitute.

  • Experience with C/C++, Java, JavaScript, Python, or Go

  • Experience with Linux internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, etc.

  • Practical experience with large-scale distributed systems, including multi-tier architectures, application security, monitoring, and storage systems.

  • Working knowledge of the TCP/IP stack, internet routing and load balancing.

Similar Jobs

Site Reliability Engineer

Etouch Systems Corp

Posted 2 days ago

Job Description

  • Engage in and improve the whole lifecycle of our products, from ideation and design through development, launch, operation, and iteration.

  • Partner with product engineering teams through the PDLC on design, development, capacity planning, and ramp plans to ensure Venmo continues to scale and avoids downtime.

  • Ensure sufficient logging, monitoring, and alerting strategies around availability, latency, and overall system health.

  • Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.

  • Host incident reviews and blameless postmortems.

Minimum qualifications:

  • Experience with Unix/Linux operating systems internals and administration.

  • Experience in one or more of the following: Java, Python, Perl, or shell scripting.

  • Ability to debug and optimize code and automate redundant tasks.

Preferred qualifications:

  • Proficiency in managing cloud-based large-scale infrastructure.

  • Expertise in designing and troubleshooting large-scale distributed systems.

  • Great analytical and problem-solving skills.

  • Strong communicator, both written and spoken.

Best Regards,
Nagaraj Gollapalli
Talent Acquisition - North America
Phone: 510-399-7822 | Mobile: 510-585-1527
Address: 6627 Dumbarton Circle, Fremont, CA 94555

Etouch Systems Corp, San Francisco, CA
