
What Is RCA? How to Write Root Cause Analysis Using 5W1H

April 21, 2026 · 6 min read

IT problems that keep coming back after being "fixed" are a clear sign that the team is addressing symptoms, not the root cause. RCA (Root Cause Analysis) is the process that helps IT teams dig down to the true origin of a problem — so they can fix it properly and prevent it from recurring. This article explains what RCA is and how to write one using the 5W1H framework, in a way that works in real IT Support environments.

What Is RCA?

RCA stands for Root Cause Analysis: the practice of systematically answering "Why did this problem really happen?" rather than just "What did we do to make it go away?" It is a core process in IT Service Management (ITSM) and Problem Management under both ITIL and ISO/IEC 20000.

A clear example: an employee reports that their computer won't start. IT power-cycles it, the machine boots, and the ticket is closed. But the same thing happens again the next week. An RCA would reveal the real cause, a failing power supply, which calls for a component replacement rather than just another restart.

Why Must IT Support Do RCA?

The 5W1H Framework for RCA

The most practical and memorable framework for IT teams is 5W1H — six questions that cover every dimension of a problem.

What — What happened?

Describe the problem clearly: the observable symptoms, the impact that occurred, and the scope of the problem (how many users, how many devices).

Why — Why did it happen?

This is the heart of RCA. Ask "Why?" repeatedly 3–5 times (the 5 Whys technique) until you reach the true root cause. Don't stop at the first symptom.

When — When did it happen?

Record the time the problem first occurred, how frequently it recurs, and whether there's a pattern (e.g., every Monday morning, or after a Windows update).
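
One way to check for a recurrence pattern is to group incident timestamps by weekday. The sketch below is illustrative only: the timestamps are made-up sample data, and `incidents_by_weekday` is a hypothetical helper name, not part of any ITSM product.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical timestamps for one recurring problem (sample data only).
incident_times = [
    datetime(2026, 3, 30, 9, 5),
    datetime(2026, 4, 6, 9, 12),
    datetime(2026, 4, 13, 8, 58),
    datetime(2026, 4, 15, 14, 30),
    datetime(2026, 4, 20, 9, 20),
]

def incidents_by_weekday(times):
    """Count incidents per weekday name, most frequent first."""
    counts = Counter(DAY_NAMES[t.weekday()] for t in times)
    return counts.most_common()

by_weekday = incidents_by_weekday(incident_times)
for day, count in by_weekday:
    print(f"{day}: {count}")
```

If one weekday clearly dominates, that is a hint to look at whatever runs or changes on that day, such as scheduled jobs or update windows.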

Where — Where did it happen?

Specify the location, device, system, or application affected. This helps narrow the scope of the root cause search.

Who — Who is involved?

Identify the affected users, the people responsible for the impacted system, and whoever took action to resolve the problem. This determines who should be notified and who must act.

How — How was it resolved?

Describe the resolution in step-by-step detail that any other support team member can follow — and include preventive measures to avoid recurrence.
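
The six questions map naturally onto a record with six fields. The following is a minimal sketch, assuming a plain-text field per question; `RcaRecord` and `missing_sections` are hypothetical names, and the draft values are placeholders. It shows one way to flag an RCA that is not yet complete before a ticket is closed.

```python
from dataclasses import dataclass, fields

@dataclass
class RcaRecord:
    """One RCA entry with a free-text answer for each 5W1H question."""
    what: str
    why: str
    when: str
    where: str
    who: str
    how: str

def missing_sections(record):
    """Return the names of 5W1H sections that are empty or whitespace-only."""
    return [f.name for f in fields(record)
            if not getattr(record, f.name).strip()]

# Placeholder draft: three sections have not been filled in yet.
draft = RcaRecord(
    what="Email cannot be sent",
    why="",
    when="Every Monday morning",
    where="Mail server",
    who="",
    how="   ",
)

print(missing_sections(draft))
```

A check like this could run before closing a ticket, so that an RCA with an empty "why" or "how" does not slip through.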

Real RCA Example: Office-Wide Internet Outage

What — What happened?

All 47 employees on Floor 3 lost internet access from 09:15 onwards. Both the ERP system and cloud storage were inaccessible.

Why — Why did it happen? (5 Whys)

Why is there no internet? → The Floor 3 switch stopped responding.
Why did the switch stop? → The UPS power failed suddenly.
Why did the UPS fail? → The UPS battery had degraded and could no longer support the load.
Why didn't anyone know the battery had degraded? → There was no UPS monitoring system in place.
Root Cause: Lack of preventive maintenance and monitoring for network infrastructure.
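
The chain above can also be recorded as structured data rather than free text. This is a rough sketch, assuming each step is stored as a question and answer pair; the helper names are hypothetical. Note that the last answer is only the candidate root cause: the wording of the final root-cause statement still takes human judgment.

```python
# Each step of the 5 Whys chain as a (question, answer) pair, copied from the example.
why_chain = [
    ("Why is there no internet?",
     "The Floor 3 switch stopped responding."),
    ("Why did the switch stop?",
     "The UPS power failed suddenly."),
    ("Why did the UPS fail?",
     "The UPS battery had degraded and could no longer support the load."),
    ("Why didn't anyone know the battery had degraded?",
     "There was no UPS monitoring system in place."),
]

def format_chain(chain):
    """Render the chain as numbered 'Why N' lines for a report."""
    return [f"Why {level}: {question} -> {answer}"
            for level, (question, answer) in enumerate(chain, start=1)]

def deepest_answer(chain):
    """Return the last answer in the chain, the candidate root cause."""
    if not chain:
        raise ValueError("the chain needs at least one question")
    return chain[-1][1]

for line in format_chain(why_chain):
    print(line)
print("Candidate root cause:", deepest_answer(why_chain))
```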

When — When did it happen?

April 21, 2026 at 09:15 — outage lasted 2 hours 15 minutes.
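
The start time and duration are enough to derive the end of the outage and a rough impact figure. The sketch below assumes, purely for illustration, that all 47 users were affected for the full duration; the temporary failover may have restored some of them earlier.

```python
from datetime import datetime, timedelta

outage_start = datetime(2026, 4, 21, 9, 15)
outage_duration = timedelta(hours=2, minutes=15)
outage_end = outage_start + outage_duration

# Assumption: every affected user was offline for the whole outage.
affected_users = 47
user_hours_lost = affected_users * outage_duration.total_seconds() / 3600

print(outage_end.strftime("%H:%M"))   # 11:30
print(f"{user_hours_lost:.2f}")       # 105.75
```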

Where — Where did it happen?

Floor 3, Building A, Server Room — Rack 2, UPS Model APC SMT1500RM.

Who — Who was involved?

47 users on Floor 3 / Reported to: IT Support (Somchai) / Resolved by: Network Team / Asset owner: IT Infrastructure Team

How — How was it resolved and prevented?

Short-term fix: Replaced UPS battery on Floor 3 immediately; temporary failover through Floor 2 switch.
Long-term prevention:
1) Inspect all UPS batteries in the building.
2) Install SNMP monitoring for the UPS units.
3) Add a preventive maintenance (PM) schedule to check batteries every 6 months.
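
The 6-month battery check can be turned into concrete dates. This is a minimal sketch using only the standard library. It assumes the first interval starts on the day the battery was replaced (April 21, 2026), and `add_months` and `pm_schedule` are hypothetical helper names.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def pm_schedule(last_check, interval_months=6, count=3):
    """Return the next `count` check dates after `last_check`."""
    return [add_months(last_check, interval_months * i)
            for i in range(1, count + 1)]

# Assumption: the replacement date counts as the most recent check.
schedule = pm_schedule(date(2026, 4, 21))
for due in schedule:
    print(due.isoformat())
```

Clamping the day matters for dates such as August 31, where the month six months later is shorter.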

The 5 Whys Technique: Drill Down to the Real Cause

The 5 Whys technique involves asking "Why?" repeatedly until you reach a cause that can actually be fixed and prevented. In practice, 3–5 iterations are usually sufficient. The key is not to stop at the first answer.

Level      | Question                          | Answer
Why 1      | Why can't email be sent?          | Server not responding
Why 2      | Why is the server not responding? | Disk is 100% full
Why 3      | Why is the disk full?             | Log files accumulating without deletion
Why 4      | Why aren't logs being deleted?    | Cleanup script not running
Root Cause | Why isn't the script running?     | Cron job was removed during an OS upgrade last month

The final answer is something that can actually be fixed and prevented — add cron job monitoring and a permanent log rotation policy. Simply deleting the logs and closing the ticket would guarantee the same problem returns.
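
As a rough illustration of what such monitoring could check, the sketch below flags two conditions: a cleanup job that has not run recently, and a disk that is close to full. It assumes the cleanup script records the time of its last successful run somewhere the monitor can read it. That mechanism, the function names, and the 26-hour and 90% thresholds are all hypothetical choices for this example.

```python
import shutil
import time

def disk_usage_percent(path="."):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def cleanup_is_stale(last_run_epoch, now_epoch, max_age_hours=26):
    """True if the last successful cleanup run is older than the allowed age."""
    return (now_epoch - last_run_epoch) > max_age_hours * 3600

def collect_alerts(last_run_epoch, path=".", disk_threshold=90.0, now_epoch=None):
    """Return alert messages for a stale cleanup job or a nearly full disk."""
    now_epoch = time.time() if now_epoch is None else now_epoch
    alerts = []
    if cleanup_is_stale(last_run_epoch, now_epoch):
        alerts.append("log cleanup job has not run recently")
    if disk_usage_percent(path) >= disk_threshold:
        alerts.append(f"disk usage at or above {disk_threshold}%")
    return alerts

# Example: pretend the last successful run was 48 hours ago.
print(collect_alerts(last_run_epoch=time.time() - 48 * 3600))
```

Checking the job itself, not only the disk, is the point: the disk alert catches the symptom, while the staleness alert catches the removed cron job before the disk fills up.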

RCA and the Knowledge Base — Turning Insights into Assets

A well-written RCA can be converted into a FAQ Article that employees can search and follow themselves in the future — especially the "How" section, which should be written as clear step-by-step instructions. A good ITSM system lets support staff click "Publish FAQ" directly from the RCA without rewriting anything.

Tickets with complete RCA data also give AI-assisted features better material to work with, for instance when automatically suggesting resolution approaches for new tickets with similar symptoms.
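
To make the idea of symptom matching concrete, here is a deliberately simple sketch based on word overlap (Jaccard similarity). It is not how 1StopService or any particular product works. The ticket data, the function names, and the 0.2 threshold are all assumptions for illustration.

```python
import re

def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# Hypothetical past tickets with completed RCA data.
past_tickets = [
    {"symptom": "Floor 3 users lost internet access",
     "resolution": "Replaced UPS battery"},
    {"symptom": "Email cannot be sent, server not responding",
     "resolution": "Restored log cleanup cron job"},
]

def suggest(new_symptom, tickets, min_score=0.2):
    """Return resolutions of past tickets whose symptoms resemble the new one."""
    scored = [(jaccard(new_symptom, t["symptom"]), t) for t in tickets]
    scored = [(score, t) for score, t in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t["resolution"] for _, t in scored]

print(suggest("Users on floor 3 have no internet", past_tickets))
```

Even this crude matching only works when past tickets record a clear symptom and resolution, which is what a complete RCA provides.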

Common Mistakes in Writing RCA

RCA in ITSM: How Does a Good System Help?

An ITSM system with built-in RCA features ensures that this process actually happens in the organization — not just in policy documents. What the system should provide:

1StopService includes a 5W1H RCA Form with AI suggestions drawn from past tickets, along with a complete ITSM Checklist every IT team should know.

Try a System with RCA Form + AI — All in One Place

Try 1StopService free for 30 days — full features, no credit card required.
