Site Reliability Engineer I - IT

Zynga, Inc. Chicago, IL 60602

Posted 2 months ago

Monitoring & Incident Management:

  • Improve the studio's reliability through monitoring, rapid response, communication and coordination.

  • Develop and manage the deployment architecture for the application, develop the monitoring architecture and implement monitoring agents, dashboards, escalations and alerts.

  • Routinely identify operational problems by observing and studying system architecture, functionality, and performance results; troubleshoot procedures with the overall studio architect, investigate surfaced issues, and handle incidents.

  • Identify operational priorities by assessing operational objectives; determining project objectives such as efficiency, cost savings, energy conservation, operator convenience, safety, and environmental quality; and estimating relevance, time, and costs.

Development & Data Analysis:

  • Develop operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; and determining the impact on the total system.

  • Create new tools to facilitate automated monitoring of the studio's operational environment.

  • Anticipate operational problems by studying operating targets, modes of operation, and unit limitations, and by monitoring unit performance.

  • Improve operational quality by studying, evaluating, and recommending process re-architecting; implementing changes; and contributing information and opinions to unit design and modification teams.

  • Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.

  • Update job knowledge by participating in educational opportunities, reading professional publications, maintaining personal networks, and participating in professional organizations.

  • Accomplish the engineering and organization mission by completing related results as needed.

Operations Engineer Skills and


Mastery of Linux Systems and Networking Administration

  • Strong systems engineering and troubleshooting skills

  • Shell scripting (Bash & PHP)

  • Strong TCP/IP understanding and ability to produce detailed documentation

  • Write new technical documentation and maintain existing documentation

  • Ability to administer networking firewalls, routers, and switches

  • S3 Maintenance, Apache maintenance, Load Balancer Management

  • Puppet Management

Cloud Management

  • AWS expertise (VPC, RDS, Route 53 DNS integration)

Database fundamentals

  • Administer and maintain MySQL and other open-source databases

  • Write and perform basic queries to evaluate database stability, integrity and performance

  • Large/Big Data Management

  • Administer and maintain Aurora infrastructure

Monitoring Systems

  • System Level (Nagios, Munin, Check_MK)

  • Writing checks & scripts

  • Log/Application Level (Splunk, Elasticsearch, Apache)

  • Ability to diagnose infrastructure as a whole!
