Service Reliability Engineer

Oath Sunnyvale, CA 94085

Posted 4 days ago

It takes powerful technology to connect our brands and partners with an audience of 1 billion. Nearly half of Verizon Media employees are building the code and platforms that help us achieve that. Whether you're looking to write mobile app code, engineer the servers behind our massive ad tech stacks, or develop algorithms to help us process 4 trillion data points a day, what you do here will have a huge impact on our business and the world. Want in? As Verizon's media unit, our brands like Yahoo, TechCrunch and HuffPost help people stay informed and entertained, communicate and transact, while creating new ways for advertisers and partners to connect. With technologies like XR, AI, machine learning, and 5G, we're transforming media for tomorrow, too. We're creators and coders, dreamers and doers, creating what's next in content, advertising and technology.

Do you...

  • Have a passion for solving technical problems, from the network layer to the application?
  • Spend time trying to figure out how something works, not stopping with knowing just that it does?
  • Want to make real web applications and back-end systems faster, more reliable, more efficient?

Oath's Service Reliability Engineering team is seeking a talented "Rapid Response Engineer" to play a vital role in a team that runs critical operations and systems engineering for Yahoo's most popular internet sites, including Mail, Messenger, Sports, Finance, Games, News, Entertainment and many others.

This position requires an aggressive troubleshooter who can multitask on problems of varying difficulty, priority and time-sensitivity. This versatile position requires familiarity with all the support concepts of busy web sites: systems and database administration, networking, process troubleshooting, and QA and rollout automation.


  • Identify the priority and criticality of incoming alerts and prioritize appropriately

  • Diagnose & repair issues using critical knowledge of Apache, UNIX processes, MySQL and related technologies within the OSI stack.

  • Track issues through the ticketing systems and follow through to resolution

  • Utilize monitoring tools to proactively identify issues and trends

  • Write clear and concise operational runbooks

  • Escalate significant issues to service, network or other operations engineers

  • Lead by example, deliver results and eliminate missed opportunities

The ideal candidate will possess a broad range of computer science skills. The candidate must be persistent, results-oriented, and a self-starter.

Basic skills:

The candidate should have 4 or more years of experience in technical operations and additional exposure to tool/product development. Knowledge of Unix/Linux, Apache, performance tuning concepts, and web applications is a must. SQL experience (MySQL, Oracle) is a plus.

Preferred skills:

  • BS in Computer Science

  • Experience with high-volume websites is a plus

  • Strong written & oral communication skills are essential. Proven ability to write bug reports, test cases, and problem reports

  • Ability to rapidly learn and assimilate knowledge of complex software and systems, and apply understanding of system architecture when planning operational tasks and strategy

  • Demonstrable experience in one or more languages such as shell scripting, Perl, PHP, Java, or C is also a plus

  • Experience with statistical analysis of defects and system performance is a plus

  • Strong knowledge of TCP/IP networking, SMTP, HTTP, load-balancers, highly available network servers

Verizon Media is proud to be an equal opportunity workplace. All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. Verizon Media is dedicated to providing an accessible environment for all candidates during the application process and for employees during their employment. If you need accessibility assistance and/or a reasonable accommodation due to a disability, please submit a request via the Accommodation Request Form or call 408-336-1409. Requests and calls received for non-disability related issues, such as following up on an application, will not receive a response.

Currently work for Verizon Media? Please apply on our internal career site.
