Manager, Site Reliability & Performance Engineering

Petco San Diego, CA 92140

Posted 4 months ago

Our vision at Petco is Healthier Pets. Happier People. Better World. We're making things better for pets, people and the planet through our Think Adoption First philosophy, the Petco Foundation and other important initiatives that focus on putting animals first, educating pet parents and reducing our carbon footprint. The journey starts with knowledgeable, passionately engaged associates who are proud to recommend Petco as a place to work, who believe in our Vision and who are committed to delivering a superior customer experience.

From our retail stores and our network of Distribution Centers to our Corporate offices, you'll work with others who share your values and commitment. We seek individuals who are passionate about animal welfare, have great people skills and are driven to grow and advance in their careers with us. Our ongoing growth is creating exceptional opportunities for professional development and personal enrichment throughout our organization.

Position Purpose:

As an IT leader, the incumbent will be responsible for building and leading a group of high-performing engineers. This position is responsible for managing multiple teams and must be able to balance priorities and consistently deliver high-value assets to the organization. Responsibilities include but are not limited to: application performance analysis and reporting, application monitoring, tag management, enterprise tooling administration, and operations support.

Essential Job Functions: The incumbent must be able to perform all of the following duties and responsibilities with or without a reasonable accommodation.

  • Responsible for evaluation, implementation and administration of all enterprise tooling/monitoring solutions.

  • Day-to-day management of all team members, contractors, interns and project team members.

  • Maintain vendor relationships.

  • Maintain annual department budget including but not limited to all related consulting fees, licensing costs, contract negotiations and related invoicing.

  • Responsible for managing resources required for the configuration, tuning, and monitoring of tag management and tag configurations for enterprise-level applications using tools including but not limited to Tealium and ObservePoint.

  • Coordinate with business stakeholders, marketing partners, vendors, and development teams to determine how to implement requested tagging and document tag requirements.

  • Oversee the management and governance of tagging.

  • Responsible for the configuration, tuning and monitoring of web acceleration, client-side optimizations, bot management, caching, and monitoring for enterprise-level applications using tools including but not limited to Akamai, New Relic, LogRhythm, and SolarWinds.

  • Responsible for operational support for all tooling and monitoring solutions.

  • Responsible for troubleshooting problems as reported by users. Researches, evaluates and recommends software and hardware products.

  • Coordinates with Information Security team and SOC to investigate, evaluate, and mitigate security incidents.

  • Configures security risk mitigation measures to maintain a secure systems environment.

  • Provides recommendations based on application needs and anticipated growth.

  • Installs acceleration and monitoring tools for new applications and environments as needed.

  • Determines and develops technical approaches and solutions, conducts business reviews, documents current systems and develops recommendations on how to proceed with the applications.

  • Evaluates existing or proposed systems for the business, and devises computer systems and related procedures to meet the business need.

  • Devises or modifies procedures to solve problems considering computer equipment capacity and limitations, operation time, system and application administration, application deployments, configuration management, and system and application monitoring.

  • Oversees the efforts to enhance the performance of enterprise-wide applications using complex tools and techniques such as edge caching, FEO (front end optimization), compression, minification, deferred loading, pre-loading, rate controls, and sure route delivery.

  • Coordinates third-party relationships with outsourced teams and vendors for applications and troubleshoots problems with department users and administrators.

  • Handles work requests and problem tickets with Akamai, New Relic and other third-party support teams, articulates and follows up on requests, and escalates/tracks critical issues.

  • Maintain the integrity, performance, security, and availability of multiple web-based environments, including configuring, upgrading, patching, performance analysis and tuning, and monitoring.

  • Participates in RCA (root cause analysis) by providing analysis information, identifying preventative actions that can be taken, and implementing alerts and monitoring that proactively identify issues to prevent problem recurrence.

Supervisory Responsibility:

Direct management of all team members. Reviews technical work of team members. Mentors team members. Organizes tasks and provides leadership for the team.

Work Environment:

Typically, the incumbent will be seated in an office setting, in artificial light, working for prolonged periods of time on the computer.

8 or more years of progressively responsible professional experience in function or 5 years of experience in highly specialized function.


  • Demonstrates expertise in Tealium or similar tag management tooling.
  • Can work independently to architect, plan and deploy website tag changes into various non-prod and prod environments.
  • Demonstrates excellent ability to troubleshoot web applications within the client browser including but not limited to analysis of JavaScript errors, browser activity, and caching behaviors.

  • Ability to implement governance processes to ensure tags meet business objectives and comply with guidelines and compliance requirements.

  • Ability to analyze performance impact of tags and determine appropriate changes that can improve performance; including making recommendations on removal of poorly performing tags.

  • Strong hands-on working knowledge of web technologies and protocols such as HTTP, HTML, JavaScript and XML.

  • Interface and collaborate with developers, hosting provider/infrastructure team, vendors, and business stakeholders to debug problems end-to-end and achieve resolution.

  • Demonstrates expertise in Akamai and New Relic tooling.

  • Can work independently to architect, plan and deploy acceleration, FEO, and monitoring capabilities into various non-prod and prod environments.
  • Demonstrates expertise in tuning of CDN and client-side optimizations.

  • Demonstrates excellent ability to troubleshoot web applications within the client browser including but not limited to analysis of JavaScript errors, network timelines, and caching behaviors.

  • Ability to analyze traffic patterns and determine appropriate actions needed to mitigate malicious traffic or reduce false positives.

  • Ability to deeply analyze performance data and determine appropriate changes that can improve application or infrastructure performance; including making recommendations on mechanisms that can aid in capturing additional performance information or identifying root causes.

  • Strong hands-on working knowledge of web technologies and protocols such as HTTP, certificates, HTML and XML.

  • Interface and collaborate with developers, hosting provider/infrastructure team, and database teams/DBAs to debug problems end-to-end and achieve resolution.

  • Capable of authoring and modifying Selenium scripts used for synthetic monitoring of applications.


Manages relationships with internal clients and outsourced team(s) on complex projects. Interacts with senior internal and external personnel on significant technical matters, often requiring coordination between organizations.


