Chaos Engineering Fundamentals
Site reliability through controlled disruption
Chaos Engineering (CE) was pioneered by companies like Netflix and Amazon to proactively test how systems respond in the presence of failure, so that problems can be identified and fixed before they become outages. With this approach, complex distributed systems can be made more reliable and resilient.
During this one-day course, you will be introduced to Chaos Engineering and given the tools and techniques to get started with it in your own organisation.
Break things on purpose, so that they don’t break on you
Do you feel your systems' reliability, scalability, and stability could be improved?
This CE training is suitable for anyone involved in IT development, IT operations, or IT architecture, particularly those working in reliability or monitoring, and for anyone with an affinity for SRE.
About the Course Author and Lead Trainer
This instructor-led, lab-based course was written by Mikolaj Pawlikowski, the author of the book Chaos Engineering: Site reliability through controlled disruption (Manning). Mikolaj leads a team of SREs managing Kubernetes at Bloomberg. He first started with CE as a surprisingly effective sleeping aid - the more failures his team simulated during working hours, the fewer outages happened while they were asleep.
You will learn to
- Design and run a chaos engineering experiment
- Improve site reliability
- Introduce chaos experiments into Kubernetes
- Get CE on your company roadmap
- Promote a safe-to-fail culture
Course topics
- SRE 101, or why we’re all here
- SLI, SLO, SLA, error budgets
- Principles of Chaos Engineering
- Testing systems
- Blast radius
- Steady state
- Killing processes
- CE and Kubernetes
Prerequisites
- Basic familiarity with Linux
- Basic familiarity with Python, Go, or another high-level language (enough to read example code)
- Basic familiarity with networking (IP, HTTP)
- Willingness to look at things from a different perspective
Price
- EUR 500 per public seat. We accept company purchase orders.
- In-house training available on request.