Staff Engineer Site Reliability

Albertsons Company Inc. Phoenix, AZ 85002

Posted 3 weeks ago

About the company

Albertsons Companies is at the forefront of the revolution in retail. With a fixation on innovation and building culture, our team is rallying our company around a unique vision: forging a retail winner that is admired for national strength, deep roots in the communities we serve, and a team that has passion for food and delivering great service.

Albertsons is one of the largest retail employers, providing approximately 300,000 jobs across 2,200 stores, 22 distribution centers, 20 food and beverage plants and various support offices. We operate in 34 states and the District of Columbia under the Albertsons banner, as well as Safeway, Tom Thumb, Jewel Osco, Shaw's and many more recognizable names.

What you will be doing

This role is an individual contributor responsible for building and fine-tuning the platform components for the Observability product. The candidate will work closely with the Lead Engineer and with the performance, data ingestion, platform DevOps, and data visualization teams under the Observability product. As a member of the platform team, the candidate must be able to support and maintain the applications onboarded to Grafana Observability, the ingestion and visualization components written in PromQL, log queries, etc., and the related monitoring technologies.

This position will preferably be based in Phoenix, AZ, but we are also open to locations near our division offices.

Key Responsibilities:

  • Experience in Observability and Monitoring initiatives as a Platform Engineer.

  • Troubleshoot platform issues and restore service by resolving customer-facing incidents

  • Agile development experience with team member accountability for commitment and delivery each sprint.

  • Troubleshoot and implement corrections to problems associated with connectivity between the supported applications and the clients they serve

  • Provide technical guidance in diagnosing issues as they arise in support of critical applications

  • Drive collaboration sessions among IT and business groups to facilitate optimal support and operation of the relevant applications

  • Apply Site Reliability Engineering techniques such as observability, alerting, and performance tuning

  • Contribute to the design, implementation, and enhancement of critical applications

  • Perform proactive analysis and troubleshooting to predict and prevent production incidents

  • Define and contribute to monitoring capabilities for critical applications

  • Collaborate with key vendors on functional, performance and capacity improvements

  • Design and build tools to automate support and monitoring functions

  • Ensure that all implementations of observability meet the requirements prescribed by IT Services through the effective implementation or use of approved processes, methodologies, and deliverables.

  • Provide expertise and build solutions for observability applications as well as system integration with internal systems and external vendors.

  • Track infrastructure delivery and dependencies through to implementation.

We are searching for someone with the following skills:

  • Experience gathering and organizing large volumes of data for instrumentation in an enterprise Observability solution.

  • Experience recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.

  • Experience installing agents and forwarders, and working with APIs, performance monitoring alerts, dashboards, and data trend analysis.

  • Good knowledge and understanding of Azure foundation components (e.g., App GW, APIM, Virtual Network, NSG, Load Balancer, Azure VM) is required.

  • Programming experience in Java is required; experience in Python, Go, C, or C++ is desired.

  • Experience with databases such as Azure SQL, PostgreSQL, MySQL, MongoDB, TSDB, or similar.

  • Knowledge of monitoring tools such as Log Analytics, AppDynamics, Grafana, Prometheus, Splunk, and SiteScope

  • Experience in working with ServiceNow or similar Service Management tools

  • Familiarity with Cloud technologies in Azure, AWS, and Google Cloud

  • Experience in working with teams in remote locations

  • Experience with the PCF, Docker, and Kubernetes platforms is required.

  • Experience with DevOps and CI/CD tools and processes is required.

  • Experience in high-performance, high-frequency data streaming (using Kafka etc.) and in handling large volumes of batch data is strongly preferred, but not required.

  • Experience with Agile/Scrum methodologies is required.

We believe the successful candidate has these qualifications and experience:

  • 4-year degree (Computer Science, Information Systems, or related functional field) and/or an equivalent combination of education and work experience.

  • 10+ years of experience in integration engineering related to Observability/Monitoring frameworks and with two or more APM tools (AppDynamics, Datadog, Splunk, Dynatrace, Kibana, Elastic, etc.).

  • Hands-on experience with these tools and technologies is preferred.

  • 5+ years of experience as a Site Reliability Engineer is required.

  • Experience working with open-source platforms (e.g., Grafana) and OpenTelemetry libraries is preferred.

What is it like at Albertsons?

Albertsons Culture Principles

Compassion: We always treat each other with kindness and respect

Team: We always support and recognize each other

Inclusive: We always value everyone's perspective

Learning: We always strive to grow and develop ourselves and others

Competitive: We always act with integrity to win over the customer

Ownership: We always take actions to drive our success

