Correction of Error (COE)
Correction of Error (COE) is a process for improving quality by documenting and addressing issues. Define a standardized way to document critical root causes, and ensure they are reviewed and addressed.
Overview
Applying a COE process helps you ensure that your team understands root causes, and that those root causes have been reviewed in a consistent way and addressed correctly.
Structure of a COE
A well-structured COE should answer the following questions:
- What happened?
- What was the impact on customers and your business?
- What was the root cause?
- What data do you have to support this?
  - Provide metrics and graphs.
- What were the critical pillar implications, especially security?
  - When architecting workloads, you make trade-offs between pillars based on your business context. These business decisions can drive your engineering priorities.
  - You might reduce cost at the expense of reliability in development environments, or, for mission-critical solutions, you might optimize for reliability at increased cost.
- What lessons did you learn?
- What corrective actions are you taking?
  - Action items
  - Related items (trouble tickets, etc.)
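The questions above can be captured as a simple record so that every COE is documented in the same shape. The following is only a minimal sketch: the class name `CorrectionOfError`, its field names, the `missing_sections` helper, and the choice to treat related items as optional are all hypothetical, and the sample values are purely illustrative.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CorrectionOfError:
    """Hypothetical record with one field per COE question."""
    what_happened: str = ""
    impact: str = ""                # impact on customers and the business
    root_cause: str = ""
    supporting_data: list[str] = field(default_factory=list)  # metrics, graphs
    pillar_implications: str = ""   # critical pillar implications, especially security
    lessons_learned: str = ""
    action_items: list[str] = field(default_factory=list)
    related_items: list[str] = field(default_factory=list)    # trouble tickets, etc.


# Assumption: related items may legitimately be empty; everything else is required.
OPTIONAL_SECTIONS = {"related_items"}


def missing_sections(coe: CorrectionOfError) -> list[str]:
    """Return the names of required sections that are still empty."""
    return [
        f.name
        for f in fields(coe)
        if f.name not in OPTIONAL_SECTIONS and not getattr(coe, f.name)
    ]


# Illustrative draft with only two sections filled in.
draft = CorrectionOfError(
    what_happened="Checkout requests failed",
    root_cause="Expired certificate",
)
print(missing_sections(draft))
```

A check like this could run before a COE is submitted for review, so that reviewers spend their time on the content rather than on spotting gaps.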
Review Process
Have your COE reviewed by your own team as well as by other teams.
- High-impact COEs: Review these during your operational meetings.
- The Wheel: Use the iterative nature of the COE process to continuously improve.
Read more about COEs and their origin here.