Senior Site Reliability Engineer

Mastercard Weldon Spring, MO, Saint Charles County, MO

Posted 3 days ago

We work to connect and power an inclusive, digital economy that benefits everyone, everywhere by making transactions safe, simple, smart and accessible. Using secure data and networks, partnerships and passion, our innovations and solutions help individuals, financial institutions, governments and businesses realize their greatest potential.

Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company.

About the Role

The Business Operations (Biz Ops) team is seeking a Senior Site Reliability Engineer (SRE). The role of the Business Operations organization is to be the production readiness steward for Mastercard products. As Business Operations SREs, we are responsible for ensuring that our platform is stable and healthy.

We break down barriers to run our products by fostering developer-run ownership and empowering developers to build resilient products. We support our developers during the application build phase in software run principles, including operational design, automation, capacity planning, and monitoring, that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture.

We support daily operations with a hyper focus on triage and root cause analysis, understanding the business impact of our products and subsequently performing blameless post-mortems. The goal of every Business Operations team is to engage early in the development lifecycle so that we are proactive and upfront in the development process, and to manage production and change activities proactively to maximize customer experience and increase the overall value of supported applications. Business Operations teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments.

Ultimately, the role of Business Operations is to align Product and Customer-Focused priorities with Operational needs by providing continuous feedback throughout the lifecycle.

All About the Program You Support (Data Platform & Engineering Services):

Our Mission: The MasterCard Enterprise Data Warehouse provides robust, high-performance, and secure repositories of data assets. Every day, everywhere, these unassailable data assets empower stakeholders to make intelligent, data-driven decisions, allowing them to grow, diversify, and build business.

Our Vision: We will become the premier data warehouse in the world by enabling cutting-edge technology, analytics, business intelligence, visualization, and data science, driving value at MasterCard.

What you’ll do:
• Plan, manage, and oversee all aspects of a Production Environment for Data Platforms, Business Intelligence Platforms & Data Cloud.
• Manage, lead and coach the Site Reliability Engineers.
• Define strategies for Application Performance Monitoring, Unit Cost and Chaos Engineering aspects.
• Find ways for Continuous Optimizations in a Production Environment.
• Understand MTTR, SLO, and SLI definitions and apply them to services.
• Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
• Ensure reliable, fault-tolerant, efficiently scalable and cost-effective data, services and infrastructures.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
• Practice sustainable incident response and blameless postmortems.
• Ensure that batch production scheduling and processes are accurate and timely.
• Create and execute queries against big data platforms and relational data tables to identify process issues or to perform mass updates (preferred).
• Isolate problems between hardware and software, working with the appropriate team(s) and vendors until a resolution has been reached.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
• Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
• Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
• Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
• Work with a global team spread across tech hubs in multiple geographies and time zones.

What experience you need:
• Bachelor’s degree in computer science, software engineering, or a similar field.
• Experience in Big Data technologies (Hadoop, Spark, Hive, MapReduce, Impala, NiFi, Looker, Elasticsearch).
• Experience in AWS or any Cloud Platforms.
• Relevant data engineering, data infrastructure, DataOps, DevOps, SRE, or general systems engineering experience.
• Experience in managing large production platforms.
• Experience in industry-standard CI/CD tools (Git, Bitbucket, Jenkins, Chef, Docker, Kubernetes).
• Experience architecting and implementing data governance processes and tooling (data catalogs, lineage tools, role-based access control, PII handling).
• Strong coding ability in Python or other languages such as Java, C#, Golang, C, C++, Perl or Ruby, and a solid grasp of SQL fundamentals.
• Experience with algorithms, data structures, scripting, pipeline management, and software design.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
• Ability to help debug and optimize code and automate routine tasks.
• Ability to support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
• Interest in designing, analyzing and troubleshooting large-scale distributed systems.
• Appetite for change and pushing the boundaries of what can be done with automation.
• Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
• Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is desired.
• Good handle on the change management and release management aspects of software.

