Microsoft’s massive outage explained in 10 points: Blue Screen of Death, Azure, CrowdStrike and all we know so far – Times of India

A massive IT outage has thrown businesses and critical services into chaos globally, with Microsoft Azure and Microsoft 365 services experiencing widespread disruptions. The root cause? A faulty update from cybersecurity provider CrowdStrike, affecting countless Windows PCs and servers across various sectors.

What is the root of this global IT crisis

The primary trigger was a defective update deployed by CrowdStrike, a major player in the cybersecurity field. The company offers a suite of services including endpoint security, threat intelligence, and cyber attack response. A vast number of businesses worldwide rely on CrowdStrike to shield their Windows PCs and servers from cyber threats.
This update caused Windows machines to display the “Blue Screen of Death” (BSOD) and prevented them from booting properly. The issue specifically affects Windows PCs and servers, while Mac and Linux systems remain unaffected.

Windows systems encounter the BSOD

Thousands of Windows machines worldwide are showing the Blue Screen of Death (BSOD) error and failing to boot properly.
The BSOD is a critical error screen that Windows displays when it hits a severe problem it cannot recover from, bringing normal operations to a halt. In this instance, affected machines became trapped in a recovery boot loop, unable to start up as usual.

The domino effect on Microsoft systems

The faulty CrowdStrike update caused Windows machines to crash and get stuck in a boot loop, rendering them unusable. It affected not just individual PCs but also servers running mission-critical business applications.
The simultaneous failure of millions of Windows systems triggered a domino effect, placing extraordinary strain on Microsoft’s data centres and network infrastructure. This sudden loss of countless endpoints, coupled with the overwhelming flood of reconnection attempts, severely compromised the foundation of Microsoft’s cloud services.
The crisis has been further worsened by the immense pressure on Microsoft’s authentication and identity management systems, as millions of devices and users simultaneously attempt to reconnect and verify their identities. This overload of critical systems has sparked a chain reaction of failures, causing widespread disruptions across Azure, Microsoft 365, and various other cloud services.

Microsoft 365 services also down

In a separate but equally disruptive incident, Microsoft 365, the company’s suite of productivity applications, hit a snag due to an Azure backend configuration change, further complicating matters for many organisations. Services like Outlook, Teams, SharePoint, and OneDrive are experiencing widespread disruptions.

How far-reaching is the outage

The outage has cast a wide net, ensnaring businesses and services globally. Airlines, banks, broadcasters, and even emergency services have reported significant disruptions. Some of the most notable impacts include:

Flights grounded and airports in disarray across multiple continents
Banking services thrown into turmoil
TV broadcasters forced off-air
911 emergency services compromised in several U.S. states
London Stock Exchange struggling to maintain operations

Microsoft’s official response

Microsoft acknowledged the issue, officially attributing it to CrowdStrike’s faulty update. The company also acknowledged the separate issues affecting its Microsoft 365 services, pointing to an Azure backend configuration change as the cause.

Not a cyberattack

CrowdStrike CEO George Kurtz has confirmed that this is not a security incident or cyberattack. The company has identified the issue, isolated it, and deployed a fix. However, affected machines will require manual intervention to resolve the problem.

Ongoing recovery efforts

While CrowdStrike has deployed a fix, the recovery process is expected to be gradual. Affected systems need to be brought back online individually.

Manual fix required

Resolving the CrowdStrike-related BSOD issue requires manual intervention. IT administrators need to boot affected Windows machines into safe mode and manually remove the faulty driver, a process that could take considerable time for large-scale deployments.
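For administrators who can reach a machine in safe mode, the removal step could be scripted roughly as follows. This is a minimal sketch, not an official tool: the article does not name the faulty file, so the directory and file pattern below are assumptions taken from the workaround CrowdStrike published at the time, and should be checked against the vendor's current guidance. The function names are hypothetical, and the script defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path

# Assumption: directory and pattern follow CrowdStrike's published workaround,
# not this article. Verify against vendor guidance before deleting anything.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def find_faulty_files(driver_dir: Path = DRIVER_DIR,
                      pattern: str = FAULTY_PATTERN) -> list[Path]:
    """Return the files in driver_dir whose names match the faulty pattern."""
    if not driver_dir.is_dir():
        return []
    return sorted(p for p in driver_dir.glob(pattern) if p.is_file())


def remove_faulty_files(driver_dir: Path = DRIVER_DIR,
                        pattern: str = FAULTY_PATTERN,
                        dry_run: bool = True) -> list[Path]:
    """List the matching files, and delete them unless dry_run is True."""
    matches = find_faulty_files(driver_dir, pattern)
    for path in matches:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
    return matches


if __name__ == "__main__":
    # Dry run by default: report matches without touching the disk.
    remove_faulty_files(dry_run=True)
```

Running it once with the default dry run and reviewing the output before calling `remove_faulty_files(dry_run=False)` keeps the destructive step explicit. The script still has to be run on each machine in turn, which is why the process is slow at scale.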

How long could the outage last

Recovery may take considerable time even with a fix available. IT administrators face the challenge of manually repairing each affected machine, a process that could take days or even weeks for large organisations. The fix advised by Microsoft and CrowdStrike is particularly challenging for cloud-based servers and remotely deployed laptops, because it requires hands-on access to each machine.
The widespread nature of the outage means that even after systems are restored, there may be lingering effects as businesses catch up on backlogged work and rescheduled operations.
