Site Reliability Engineer

Solekai Systems Corp, Pittsburgh, PA 15201

Posted 2 months ago

Are you ready to step up to the New and take your technology expertise to the next level?

Join Accenture and help transform leading organizations and communities around the world. The sheer scale of our capabilities and client engagements and the way we collaborate, operate and deliver value provides an unparalleled opportunity to grow and advance. Choose Accenture and make delivering innovative work part of your extraordinary career.

People in our Client Delivery & Operations career track drive delivery and capability excellence through the design, development and/or delivery of a solution, service, capability or offering. They grow into delivery-focused roles, and can progress within their current role, laterally or upward.

As part of our practice, you will lead technology innovation for our clients through robust delivery of world-class solutions. You will build better software better! There will never be a typical day, and that's why people love it here. The opportunities to make a difference within exciting client initiatives are unlimited in the ever-changing technology landscape. You will be part of a growing network of highly collaborative technology experts taking on today's biggest, most complex business challenges. We will nurture your talent in an inclusive culture that values diversity. Come grow your career in technology at Accenture!

The Performance Engineering practice within Accenture Technology is focused on optimizing the performance and scalability of enterprise applications through the combination of:

  • Testing: Analyzing, planning and executing production-like simulations across mobile and web solutions to identify & remediate performance problems, prevent production outages, and guarantee predictable performance.

  • Diagnostics & Monitoring: Instrumenting the complete application architecture to capture real-user and system performance data, giving insight into the root cause of application bottlenecks and enabling real-time visibility that reduces risk exposure.

  • Performance Analytics: Measuring the relationship between end-to-end performance, user behavior, and business goals to maximize the digital business, improve business KPIs, and increase client retention.

  • Business Optimization: Empowering digital businesses with contextual intelligence to visualize, quantify and maximize the business value of performance to improve the quality & performance of the business, increase customer satisfaction, and protect brand reputation.

As a Site Reliability Engineer, some of your key responsibilities may include:

  • Maintain responsibility for the design, deployment, and maintenance of production-scale systems.

  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

  • Automate the provisioning, management, and monitoring of applications and services with multiple scripting languages, Java, and infrastructure-as-code.

  • Facilitate blameless Incident Retrospectives to understand root causes, communicate learnings, determine remediation and make us better and closer as a team.

  • Coordinate with development and platform teams to design and implement zero-downtime deployment approaches, real-time logging, alerting, and monitoring solutions, and code instrumentation.

  • Introduce chaos engineering concepts that promote experimentation in production to identify systemic weaknesses while increasing service resiliency.

  • Coordinate with the solution architect to design a highly available solution that meets availability and reliability objectives and to reduce manual activities using automation, when feasible.

  • Identify, evaluate, and recommend monitoring tools and diagnostic techniques relevant to the application architecture. Assess gaps in as-is monitoring tool capabilities and recommend tools to augment or replace them.

  • Instrument applications to enable performance diagnostics and monitoring.

  • Collaborate with developers to promote reliability engineering during all phases of the SDLC, so that performance issues are detected and corrected earlier in the lifecycle.

  • Monitor application performance during performance tests or production usage, using APM and other monitoring tools to isolate the fault domain, dive deep into application code, and identify the root cause of performance issues.

  • Interact with client and/or Accenture development, operations, and infrastructure resources to recommend solutions that remediate performance issues.

  • Participate in re-architecture, redesign, and refactoring decisions to satisfy performance requirements.

  • Develop dashboards and reports to provide ongoing visibility into the performance of client applications.

  • Contribute learnings and experiences to the Accenture Performance Engineering community.

  • Travel 80-100% of the time, typically Monday through Thursday.

