Principal Site Reliability Engineer

AvidXchange, Charlotte, NC 28201

Posted 2 weeks ago

By trade we are a technology company, but if you ask anyone who works here, they'll tell you we are a people company. As the industry leader in Accounts Payable (AP) Automation, AvidXchange strives to provide an innovative and collaborative work environment. We do that by focusing on our people and our culture, and by ensuring we run our business in a way that enables every employee to achieve their fullest potential and help us create a world-class company. Our employees live by our core values, including "Innovate to Change the Game," "Passion About Customer Success," "Win as a Team," and "Have a Blast." Whether you live in Charlotte and can enjoy our corporate campus at the AvidXchange Music Factory, or you live across the country, AvidXchange has locations waiting for you. We are on a mission to create something different at AvidXchange. Love where you work. Live Avidly.

The Principal Site Reliability Engineer is responsible for providing continuous feedback of site health, reliability, availability, and user experience for all AvidXchange core products. The Principal Site Reliability Engineer will also be the technical leader for helping transform and then maintain existing SaaS Operational processes and practices into those of true Site Reliability Engineering. Meaningful and relevant real-time measurements for production environments will be collected, aggregated, analyzed, and ultimately provided as a feedback loop to the business, including Software Engineering and Product, to provide insight and visibility into product performance and activity. The Principal Site Reliability Engineer will provide user experience analysis to internal business partners, executive leadership and product / software engineering teams to help drive changes to increase customer satisfaction, product availability and reliability. In addition to monitoring and insight, a heavy focus will be placed on automation opportunities and automating operational processes to maintain 99.9% availability of AvidXchange core products.

Job Duties:


  • Define and execute strategy to transform existing SaaS Operational processes and practices into those of true Site Reliability Engineering (defining and implementing SLOs aligned to application domains, downtime budgets, error budgets, etc.). This includes cross-functional, technical leadership to communicate and coordinate strategy across Operations and Software Engineering. Once implemented, tune and maintain the Site Reliability Engineering strategies and processes.

  • Define strategy and tools for measuring core product health in production (with opportunities to extend those capabilities all the way back through the entire DevOps pipeline)

  • Define strategy and methodology for calculating system availability SLAs across AvidXchange products

  • Define strategy for measuring and testing site reliability using Chaos Monkey-based methodologies

  • Define tool consolidation strategy to optimize spend versus value for our end-to-end monitoring platform


  • Define strategy, standardize technologies, and establish patterns for rapid and continuous development and application of automated solutions to address reliability issues and automate manual tasks

  • Define strategy for the DevOps principle of "Feedback" by creating user experience measures for all AvidXchange products

  • Work with the Software DevOps team to define the DevOps CI/CD strategy for continuous performance testing, monitoring, and reliability using Visual Studio Team Services and other cloud-based tools

  • Work with the Software DevOps and Performance Engineering teams to define strategy for DevOps CI/CD performance and monitoring quality gates within the delivery pipeline

  • Define methodology to measure core product availability across Azure and AvidXchange Cloud using HTTP endpoint testing and synthetic user testing

  • Maintain automated site availability reporting and data platform


  • Present usability, reliability, incident, and user experience data for AvidXchange products to executive leadership on a weekly basis

  • Define and report SLOs / SLAs for 99.9% availability to executive leadership and business partners

  • Influence product delivery teams to implement usability and reliability enhancements leading to improved user experience index scores and improved availability

  • Provide detailed analysis and troubleshooting for system outages, providing feedback to product / software engineering

Areas of Impact:

  • Work results influence all AvidXchange products over the next 1 to 5 years

  • Sets day-to-day objectives and delivers job responsibilities for self, ops teams, and product teams


  • A minimum of five (5) to eight (8) years of experience is typically required to perform at expectation.

  • Bachelor's degree in Computer Science or Information Technology is preferred

  • Relevant Certifications strongly preferred


  • 6 or more years of experience with Dynatrace AppMon, Dynatrace SaaS or competing products

  • Measure site availability using synthetic testing platforms such as Panopta or Gomez

  • Understanding of web hosting infrastructure and high availability architecture

  • Experience measuring and monitoring .NET applications, SQL Servers/Database, and Serverless cloud resources or equivalent Java-based experience

  • Execute queries on Microsoft SQL Server databases defined by existing standard operating procedures.

  • Advanced use of SQL Server 2014+, including stored procedures, indexes, and functions

  • Troubleshoot solutions with service-oriented or microservice architectures

  • PowerShell or Linux scripting for creating automated routines for ensuring site availability

  • Development/coding experience and skills for writing custom automation solutions

  • Experience working in an Agile software development environment (Scrum / Kanban)

  • Knowledge and skills surrounding Public Cloud architectures (Azure experience highly desired)


  • Strong technical leadership and interpersonal skills.

  • Dependable, motivated and quick learner

  • Performs analysis of complex systems and presents findings

  • Defines strategies that impact all AvidXchange products

  • Provides consultation to teams throughout AvidXchange

  • Able to rapidly comprehend the functions and capabilities of new technologies.

  • Works collaboratively and openly seeks and shares information across the enterprise.

  • Industry and Enterprise level thinking.


AvidXchange is an equal opportunity employer. AvidXchange is committed to equal employment opportunity in accordance with applicable federal, state and local laws. AvidXchange will not discriminate against applicants for employment on any legally recognized basis. This includes, but is not limited to: veteran status, race, color, religion, sex, sexual orientation, gender identity, gender expression, national origin, age, and physical or mental disability.

Other details

  • Job Family: Software Engineering

  • Pay Type: Salary

  • 1210 AvidXChange Lane, Charlotte, NC 28206, USA
