AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we're looking for talented people who want to help.
You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
The Incident Management Service (IMS) team is building the platform that AWS uses to coordinate response during high-severity incidents. When AWS services are degraded, incident responders use IMS to detect, triage, mitigate, and resolve issues, coordinating across dozens of service teams in real time. We are building the next generation of incident management tooling: a unified platform that must remain available and performant precisely when AWS infrastructure is unhealthy. It is deployed across three AWS regions with automated failover. You will own significant portions of the service architecture: the data layer, authorization system, API model, and integrations with incident automation systems. You will design and deliver components across the stack, drive cross-team technical alignment, and mentor other engineers. You need to be a strong software developer with a track record of delivering production services, and you need to excel at communication and technical leadership. You'll use agentic AI development to move fast from concept to production. This is an opportunity to own architecture on a high-visibility platform that is used during AWS's most critical moments.
Key job responsibilities
- Design and implement service components for a multi-region, multi-tenant incident management platform.
- Own subsystems including the data layer, authorization, and API surface.
- Build integrations with incident automation systems, conference bridge providers, and downstream event consumers.
- Drive technical design decisions, balancing reliability, performance, and delivery speed.
- Participate in operational support and ensure the service is resilient during the incidents it is designed to manage.
- Mentor other engineers and lead technical design reviews.
- Use agentic AI development practices to move quickly from concept to tested, production-ready code.
About the team
We are a high-performing team building the incident management platform for all of AWS. Our software is used during AWS's worst moments, so reliability is not optional. We operate what we build, and every engineer has direct visibility into how their code performs during real incidents. We value high delivery velocity, pragmatic architecture decisions, and engineers who take ownership beyond their assigned scope. We use agentic AI development practices and invest in tooling that lets engineers and agents validate changes locally before they reach the pipeline.
- Experience (non-internship) in professional software development
- Experience designing, building, operating, and managing large-scale distributed systems or web services
- Experience using generative AI tools to accelerate engineering workflows
- Experience or certifications in API design, cloud architecture/deployment, service-oriented architecture, mobile development, performance optimization, database design, and related fields
- Experience with authorization systems (IAM, RBAC, or attribute-based access control)
- Experience mentoring other engineers
- Experience writing technical design documents and driving alignment across teams
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify, and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to learn more about how we collect, use, and transfer the personal data of our candidates.
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.