IT problems that keep coming back after being "fixed" are a clear sign that the team is addressing symptoms, not the root cause. RCA (Root Cause Analysis) is the process that helps IT teams dig down to the true origin of a problem — so they can fix it properly and prevent it from recurring. This article explains what RCA is and how to write one using the 5W1H framework, in a way that works in real IT Support environments.
What Is RCA?
RCA stands for Root Cause Analysis — the practice of systematically answering "Why did this problem really happen?" rather than just "What did we do to make it go away?" It is a core process in IT Service Management (ITSM) and Problem Management under both ITIL and ISO 20000.
A clear example: an employee reports that their computer won't start. IT power-cycles it, the machine boots, and the ticket is closed. But the same thing happens again next week. An RCA would look for the real cause, for example a failing power supply, which calls for a component replacement rather than just another restart.
Why Must IT Support Do RCA?
- Eliminate repeat incidents: Problems whose root cause has been found and fixed are far less likely to come back.
- Build a Knowledge Base: Well-written RCAs can be converted into FAQ articles so employees can self-resolve similar issues in the future.
- Develop the team: RCA helps new support staff learn from real incidents in the organization.
- Reduce costs: Every incident that doesn't recur saves support hours and user downtime.
- Meet ITSM standards: ISO/IEC 20000 requires a Problem Management process, and ITIL recommends one as best practice; in both, root cause analysis is a central activity.
The 5W1H Framework for RCA
A practical and memorable framework for IT teams is 5W1H: six questions that together cover the main dimensions of a problem.
What — What happened?
Describe the problem clearly: the observable symptoms, the impact that occurred, and the scope of the problem (how many users, how many devices).
Why — Why did it happen?
This is the heart of RCA. Keep asking "Why?", typically 3–5 times (the 5 Whys technique), until you reach the true root cause. Don't stop at the first symptom.
When — When did it happen?
Record the time the problem first occurred, how frequently it recurs, and whether there's a pattern (e.g., every Monday morning, or after a Windows update).
Where — Where did it happen?
Specify the location, device, system, or application affected. This helps narrow the scope of the root cause search.
Who — Who is involved?
Identify the affected users, the people responsible for the impacted system, and whoever took action to resolve it. This determines who should be notified and who must act.
How — How was it resolved?
Describe the resolution in step-by-step detail that any other support team member can follow — and include preventive measures to avoid recurrence.
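The six questions above map naturally onto a small record type. The following is a minimal sketch in Python, not a standard schema: the class name `RcaRecord`, its field names, and the helper `missing_sections` are all illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class RcaRecord:
    """One RCA in 5W1H form. Field names are illustrative, not a standard."""
    what: str = ""   # observable symptoms, impact, scope
    why: str = ""    # root cause reached by asking "Why?" repeatedly
    when: str = ""   # first occurrence, frequency, pattern
    where: str = ""  # location, device, system, or application
    who: str = ""    # affected users, system owner, resolver
    how: str = ""    # resolution steps and preventive measures


def missing_sections(record: RcaRecord) -> list[str]:
    """Return the names of the 5W1H sections that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# A half-finished draft: only "what" and "when" have been filled in.
draft = RcaRecord(
    what="Floor 3 users cannot reach the internet",
    when="First seen at 09:15",
)
print(missing_sections(draft))  # ['why', 'where', 'who', 'how']
```

A check like this could run before a ticket is closed, so that an RCA with an empty Why or How section is caught early.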
Real RCA Example: Office-Wide Internet Outage
What: All 47 employees on Floor 3 lost internet access from 09:15 onwards. Both the ERP system and cloud storage were inaccessible.
Why:
Why is there no internet? → The Floor 3 switch stopped responding.
Why did the switch stop? → The UPS power failed suddenly.
Why did the UPS fail? → The UPS battery had degraded and could no longer support the load.
Why didn't anyone know the battery had degraded? → There was no UPS monitoring system in place.
Root Cause: Lack of preventive maintenance and monitoring for network infrastructure.
When: April 21, 2026 at 09:15; the outage lasted 2 hours 15 minutes.
Where: Floor 3, Building A, Server Room, Rack 2, UPS Model APC SMT1500RM.
Who: 47 users on Floor 3 / Reported to: IT Support (Somchai) / Resolved by: Network Team / Asset owner: IT Infrastructure Team
How:
Short-term fix: Replaced UPS battery on Floor 3 immediately; temporary failover through Floor 2 switch.
Long-term prevention: 1) Inspect all UPS batteries in the building 2) Install SNMP monitoring 3) Add PM schedule to check batteries every 6 months.
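The monitoring step in the long-term prevention list comes down to comparing polled UPS values against thresholds. The sketch below shows only that comparison. How the values are collected (SNMP or otherwise) is out of scope, and the `UpsReading` fields, the threshold values, and the sample reading are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpsReading:
    """One polled sample from a UPS. All fields are hypothetical; a real
    setup would map them from whatever the monitoring tool exposes."""
    name: str
    battery_capacity_pct: float  # remaining charge, 0-100
    runtime_minutes: float       # estimated runtime at the current load
    replace_battery_flag: bool   # the device's own "replace battery" indicator


# Illustrative thresholds only; tune them to the site's load and risk tolerance.
MIN_CAPACITY_PCT = 80.0
MIN_RUNTIME_MINUTES = 10.0


def battery_alerts(reading: UpsReading) -> list[str]:
    """Return the reasons this UPS needs attention (empty list if healthy)."""
    reasons = []
    if reading.replace_battery_flag:
        reasons.append("device reports that the battery needs replacement")
    if reading.battery_capacity_pct < MIN_CAPACITY_PCT:
        reasons.append(
            f"capacity {reading.battery_capacity_pct:.0f}% is below {MIN_CAPACITY_PCT:.0f}%"
        )
    if reading.runtime_minutes < MIN_RUNTIME_MINUTES:
        reasons.append(
            f"runtime {reading.runtime_minutes:.0f} min is below {MIN_RUNTIME_MINUTES:.0f} min"
        )
    return reasons


# Made-up sample values, not measurements from the incident described above.
floor3 = UpsReading("Floor 3 / Rack 2", battery_capacity_pct=55.0,
                    runtime_minutes=4.0, replace_battery_flag=False)
for reason in battery_alerts(floor3):
    print(f"{floor3.name}: {reason}")
```

Returning a list of reasons rather than a single boolean keeps the alert text usable as-is in a ticket or notification.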
The 5 Whys Technique: Drill Down to the Real Cause
The 5 Whys technique involves asking "Why?" repeatedly until you reach a cause that can actually be fixed and prevented. In practice, 3–5 iterations are usually sufficient. The key is not to stop at the first answer.
| Level | Question | Answer |
|---|---|---|
| Why 1 | Why can't email be sent? | Server not responding |
| Why 2 | Why is the server not responding? | Disk is 100% full |
| Why 3 | Why is the disk full? | Log files accumulating without deletion |
| Why 4 | Why aren't logs being deleted? | Cleanup script not running |
| Why 5 (Root Cause) | Why isn't the script running? | Cron job was removed during an OS upgrade last month |
The final answer points to something that can actually be fixed and kept from recurring: restore the cron job, add cron job monitoring, and adopt a permanent log rotation policy. Simply deleting the logs and closing the ticket would leave the cause in place, so the same problem would return.
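The root cause in the table above was a cron job that disappeared without anyone noticing. One simple form of cron job monitoring is to compare the active crontab entries against a list of jobs that are expected to exist. The sketch below works on crontab text passed in as a string. The script paths are hypothetical, and the parser is deliberately simplified: it handles only standard five-field schedule lines, not `@daily`-style shortcuts or environment variable lines.

```python
def active_cron_commands(crontab_text: str) -> list[str]:
    """Return the command part of each active (non-comment) crontab line."""
    commands = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        parts = line.split(None, 5)  # five schedule fields, then the command
        if len(parts) == 6:
            commands.append(parts[5])
    return commands


def missing_jobs(crontab_text: str, expected: list[str]) -> list[str]:
    """Return every expected command that no active crontab line contains."""
    commands = active_cron_commands(crontab_text)
    return [job for job in expected if not any(job in cmd for cmd in commands)]


# Hypothetical crontab: the cleanup job exists but is commented out.
sample = """\
# m h dom mon dow command
0 2 * * * /usr/local/bin/backup.sh
# 30 3 * * * /usr/local/bin/cleanup_logs.sh
"""
print(missing_jobs(sample, ["/usr/local/bin/backup.sh",
                            "/usr/local/bin/cleanup_logs.sh"]))
# ['/usr/local/bin/cleanup_logs.sh']
```

Running a check like this after every OS upgrade, or on a schedule from a separate host, would surface a removed job before the disk fills up.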
RCA and the Knowledge Base — Turning Insights into Assets
A well-written RCA can be converted into a FAQ article that employees can search and follow themselves in the future. The "How" section matters most here and should be written as clear step-by-step instructions. A good ITSM system lets support staff click "Publish as FAQ" directly from the RCA without rewriting anything.
Tickets with complete RCA data also give AI-assisted features better material to work with, for instance when automatically suggesting resolution approaches for new tickets with similar symptoms.
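To make the idea of matching similar symptoms concrete, here is a deliberately simple sketch that ranks past RCAs by word overlap (Jaccard similarity) with a new ticket. It is not how any particular ITSM product or AI model works. The function names, the 0.2 threshold, and the sample tickets are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(new_symptom: str, past: list[tuple[str, str]],
            min_score: float = 0.2) -> list[str]:
    """Return resolutions of past tickets whose symptoms resemble the new one,
    best match first. `past` holds (symptom, resolution) pairs."""
    new_tokens = tokens(new_symptom)
    scored = [(jaccard(new_tokens, tokens(symptom)), resolution)
              for symptom, resolution in past]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [resolution for score, resolution in scored if score >= min_score]


# Two past RCAs, loosely based on the examples in this article.
past_rcas = [
    ("email cannot be sent, mail server not responding",
     "Free disk space; restore log cleanup cron job"),
    ("no internet on floor 3, switch not responding",
     "Replace UPS battery; add UPS monitoring"),
]
print(suggest("users cannot send email, server not responding", past_rcas))
# ['Free disk space; restore log cleanup cron job']
```

Whatever the matching technique, the output can only be as good as the What and How sections recorded in past tickets, which is why complete RCA data matters.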
Common Mistakes in Writing RCA
- Stopping at the first symptom — e.g., concluding the cause was "computer hanging" without digging into why it hung.
- Blaming people instead of processes — Good RCA asks "Which process failed?" not "Who is at fault?"
- Not documenting preventive measures — Resolving the problem without prevention means it will recur.
- Writing the RCA after forgetting the details — Write it immediately or within 24 hours of resolving the incident.
- Not sharing it with the team — An RCA nobody reads is the same as no RCA at all.
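Some of the mistakes above can be caught mechanically before an RCA is accepted. The following sketch checks three of them: too few Whys, no preventive measures, and a write-up later than 24 hours after resolution. The function name, its parameters, and the rule of flagging fewer than two Whys are assumptions made for this example.

```python
from datetime import datetime, timedelta


def rca_warnings(resolved_at: datetime, written_at: datetime,
                 whys: list[str], preventive_measures: list[str]) -> list[str]:
    """Return warnings for common RCA mistakes (empty list if none are found)."""
    warnings = []
    if len(whys) < 2:
        warnings.append("fewer than two Whys recorded; the analysis may have stopped at a symptom")
    if not preventive_measures:
        warnings.append("no preventive measures documented")
    if written_at - resolved_at > timedelta(hours=24):
        warnings.append("written more than 24 hours after resolution")
    return warnings


# Hypothetical late, thin RCA: one Why, no prevention, written two days later.
late = rca_warnings(
    resolved_at=datetime(2026, 4, 21, 11, 30),
    written_at=datetime(2026, 4, 23, 9, 0),
    whys=["The Floor 3 switch stopped responding"],
    preventive_measures=[],
)
for warning in late:
    print(warning)
```

Checks like these cannot judge whether the root cause is correct or whether the RCA blames people instead of processes; those still need a human reviewer.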
RCA in ITSM: How Does a Good System Help?
An ITSM system with built-in RCA features ensures that this process actually happens in the organization — not just in policy documents. What the system should provide:
- A 5W1H RCA form that's easy to complete within the ticket
- AI-assisted RCA suggestions — pulling data from similar past tickets as a starting point
- A "Publish as FAQ" button — converting the How section into a Knowledge Article directly
- Reports showing which closed tickets still lack an RCA
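The last item in the list, a report of closed tickets that still lack an RCA, is essentially a filter over ticket records. A minimal sketch follows, assuming a hypothetical `Ticket` type with a free-text `rca` field. Real ITSM systems will have their own data model and status values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    ticket_id: str
    status: str                # e.g. "open" or "closed"
    rca: Optional[str] = None  # RCA text, or None if never written


def closed_without_rca(tickets: list[Ticket]) -> list[str]:
    """Return the IDs of closed tickets whose RCA is missing or blank."""
    return [t.ticket_id for t in tickets
            if t.status == "closed" and not (t.rca or "").strip()]


tickets = [
    Ticket("T-101", "closed", "UPS battery degraded; no monitoring in place"),
    Ticket("T-102", "closed"),          # closed, RCA never written
    Ticket("T-103", "open"),            # still open, so not reported
    Ticket("T-104", "closed", "   "),   # closed with a blank RCA
]
print(closed_without_rca(tickets))  # ['T-102', 'T-104']
```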
1StopService includes a 5W1H RCA Form with AI suggestions drawn from past tickets, along with a complete ITSM Checklist every IT team should know.