As the HotSchedules Infrastructure Engineering Manager, you'll lead our infrastructure team and the strategy and implementation of our Clarifi products on AWS. The ideal candidate will come from a hands-on Systems Engineering background in an enterprise IT, SaaS or cloud services team. While this is primarily a team leadership position, you may occasionally be called upon to contribute at the keyboard.
What the Job Entails:
Manage a growing team of 4+ Infrastructure Engineers
Compute infrastructure operations and architecture
Storage infrastructure operations and architecture
Backup and disaster recovery infrastructure and processes
OS image development / patch management
Core services (DNS, SMTP, NTP, AD, LDAP)
Project planning and management
Collaborate with Software Engineering and other Operations teams to translate application requirements to infrastructure capabilities
Develop automation and tools to reduce toil and improve the repeatability of processes
Define reliability metrics (KPIs, SLOs) and work to ensure services meet them
Develop runbooks and processes to reduce MTTR during incidents
Collaborate with core infrastructure and service engineers to improve service reliability and scalability
Manage the department budget and support contracts with vendors
Recruiting, training and performance evaluation of team members
Troubleshoot issues across the entire stack: software, hardware, cloud, and networking
Other responsibilities as assigned by leadership
Our Ideal Candidate:
3+ years' experience as an Engineering Manager
5+ years' experience and ability to demonstrate expertise in UNIX / Linux administration
Solid understanding of TCP/IP networking and SAN/NAS storage technologies
Must be adept at identifying and eliminating performance bottlenecks and making performance-related recommendations (hardware, software, and configuration).
In-depth knowledge of DNS principles and architectures
Familiarity with database concepts, e.g. replication, sharding, and backups
Familiarity with information security principles and best practices in virtual environments
You have a deep understanding of cloud infrastructure.
You've built tooling to improve system reliability, automate remediation of issues, or improve scalability.
You have 4 or more years of experience working in production environments at scale, and want to improve our availability and performance.
Systems often need to be reconfigured, so you should have experience with a configuration management system like Puppet, Chef or Salt. (We use Salt.)
You should be able to clearly communicate technical details when speaking or writing.
This position is part of a well-established team, and you should be excited about working closely with them and with product development teams.
Working in the cloud is a little different, so it would be great if you have some experience with AWS or GCP.
Our environment often has new challenges and technologies, so we want a candidate who is excited to learn.