Character AI, Menlo Park, CA 94026
Posted 4 weeks ago
The Role:
As the founding member of our DevOps/Site Reliability Engineering function here at Character, you'll have the opportunity to support our infrastructure, which spans thousands of nodes, terabytes of data, and millions of daily active users on our site. You'll be responsible for ensuring our product's reliability, scalability, and performance as we aggressively grow our user base, with a goal of growing to 3 billion users. You'll work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.
Specific Responsibilities:
Maintain production services and keep them operational.
Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of our service.
Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.
Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.
Establish and support SLAs and SLOs for our site.
Provide system monitoring and incident alerts.
Participate in on-call rotations to provide support for critical incidents and outages.
Develop plans for site reliability and disaster recovery.
Job Requirements:
5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale
Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang
Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site/application within a large multi-node infrastructure and a growing user base
Experience working with multiple cloud computing platforms, such as GCP, is also a must
Demonstrated ability to troubleshoot technical issues and challenges successfully and reliably across a range of platforms and systems
Experience with incident management and event postmortems
Desired Experience:
Familiarity with GPU clusters and/or HPC environments is preferred
Experience with monitoring and logging tools such as Prometheus and Grafana
Hands-on experience scaling a consumer product from early days into hypergrowth
Ready to empower the world with AGI?
Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI powerhouse and ranks among the most utilized AI research platforms globally. Our innovative approach allows users to customize their experience with personalized AI 'Characters.'
In just two years, we achieved unicorn status and were named Google Play's AI App of the Year - a testament to our groundbreaking technology and vision.
Noam co-invented core LLM tech and was recently honored as one of TIME's 100 Most Influential in AI. Daniel created LaMDA, the breakthrough conversational AI now powering Google's Bard.
We encourage you to apply even if you don't meet all qualifications. Underrepresented individuals often experience imposter syndrome, so don't underestimate yourself.
Our commitment to diversity:
Character values diversity and welcomes applicants of all backgrounds. We are an equal opportunity employer and firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to us.