What went wrong: A breakdown of the incident
On 19 July 2024, a faulty software update from cybersecurity firm CrowdStrike led to one of the most significant IT outages in recent history.
The update impacted systems running Microsoft Windows, causing disruptions worldwide. This incident, which affected approximately 8.5 million computers globally, highlighted the vulnerabilities and dependencies in our digital infrastructure.
A timeline of events
- Update Release: At 14:09 AEST on 19 July, CrowdStrike distributed a faulty content configuration update for its Falcon sensor software on Windows PCs and servers.
- Immediate Impact: Almost immediately after the release, Windows virtual machines on the Microsoft Azure cloud platform began rebooting and crashing.
- Initial Response: At 15:27 AEST, CrowdStrike reverted the content update.
- Problem Identification: At 16:48 AEST, Google Compute Engine also reported the problem.
- Root Cause Identified: By 17:15 AEST, Google confirmed that the CrowdStrike update was at fault.
- Fix Deployment: At 19:45 AEST, CrowdStrike CEO George Kurtz confirmed that a fix had been deployed.
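For readers comparing these times with international reports, the timeline can be restated in UTC with the elapsed time since release. The following is a minimal sketch only: it assumes AEST is a fixed UTC+10 offset, uses the times exactly as listed above, and the names `TIMELINE`, `to_datetime` and `summarise` are illustrative rather than taken from any official source.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AEST is a fixed UTC+10 offset (no daylight saving in July).
AEST = timezone(timedelta(hours=10), name="AEST")

# Times copied from the timeline above (all on 19 July 2024, AEST).
TIMELINE = [
    ("Update released", "14:09"),
    ("Update reverted", "15:27"),
    ("Google Compute Engine reports the problem", "16:48"),
    ("Google confirms the update was at fault", "17:15"),
    ("Fix confirmed as deployed", "19:45"),
]


def to_datetime(hhmm):
    """Turn an 'HH:MM' string into a timezone-aware datetime on 19 July 2024."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2024, 7, 19, hour, minute, tzinfo=AEST)


def summarise(timeline):
    """Return (label, UTC time, minutes since the first event) for each entry."""
    start = to_datetime(timeline[0][1])
    rows = []
    for label, hhmm in timeline:
        local = to_datetime(hhmm)
        utc = local.astimezone(timezone.utc)
        minutes = int((local - start).total_seconds() // 60)
        rows.append((label, utc.strftime("%H:%M"), minutes))
    return rows


for label, utc, minutes in summarise(TIMELINE):
    print(f"{utc} UTC  +{minutes:>3} min  {label}")
```

On these figures, the update was reverted 78 minutes after release, and the fix was confirmed 336 minutes (5 hours 36 minutes) after release.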
Ripple effects across the globe
The scope of the outage was vast. Microsoft estimated that about 8.5 million devices were affected, less than 1% of all Windows machines worldwide. The sectors most impacted included airlines, media broadcasting, banking, and retail. Airlines faced flight cancellations and delays, media outlets experienced broadcasting interruptions, and banks and retailers dealt with service outages.
Recovery efforts have been extensive, with hundreds of engineers from Microsoft and CrowdStrike working around the clock. Although a fix has been developed, full recovery is expected to take up to two weeks, illustrating the magnitude of the disruption.
Impact on Australia: A closer look
In Australia, the outage caused significant disruptions across multiple sectors. Airlines such as Qantas and Virgin Australia experienced difficulties, leading to flight delays and cancellations. Retail and banking services were also affected, with many systems going offline. Despite the widespread impact, critical infrastructure and emergency services, such as triple zero call centres and healthcare systems, remained operational.
The Australian government activated the National Coordination Mechanism to manage the recovery efforts. Assistant Energy Minister Jenny McAllister emphasised the ongoing collaboration between the government and affected sectors to restore services.
Sector snapshots: How different industries were hit
Airlines and Airports
- The airline industry was hit particularly hard. Qantas and Virgin Australia faced significant disruptions, leading to numerous flight delays and cancellations. Sydney and Melbourne airports reported operational issues, with some airlines scrambling to normalise operations. The incident underscored the heavy reliance of the aviation sector on stable IT systems and the cascading effects of such outages on passengers and logistics.
Retail
- The retail sector saw widespread disruptions, with major supermarket chains like Coles and Woolworths being affected. Woolworths reported blue screen errors on some self-checkout systems, while Coles experienced delays in certain liquor stores. Many retailers faced issues with their point-of-sale systems and self-service checkouts, causing frustration for both customers and staff. This incident highlighted the critical role of IT infrastructure in maintaining seamless retail operations.
Banking and Financial Services
- Banks such as NAB, Bendigo Bank, and Commonwealth Bank (CBA) reported significant issues. Online banking services were disrupted, with some transaction delays, and PayID payments were temporarily unavailable for some customers. The outage demonstrated the vulnerability of financial systems to IT disruptions and the importance of robust contingency plans to maintain service continuity.
Telecommunications
- Telstra, Australia’s leading telecommunications provider, reported issues, though specific details were not provided. The telecommunications sector’s reliance on IT systems means that such outages can have far-reaching effects on communication services, impacting both businesses and individuals.
Media
- Some media outlets experienced broadcasting interruptions, affecting their ability to deliver timely news and entertainment. The media sector’s dependence on digital systems for content delivery means that IT outages can significantly disrupt operations and audience engagement.
General Business Operations
- Many businesses, large and small, lost access to Windows machines, Microsoft 365 applications, and Xero. The disruption to daily operations highlighted how much business continuity depends on reliable IT systems, and the roughly 8.5 million devices affected worldwide show the global reach of the incident.
Government response: Coordination and recovery
The Australian government managed recovery efforts through the National Coordination Mechanism, and the Australian Cyber Security Centre warned of scams seeking to exploit the situation. IT support staff often had to apply the fix in person, one computer at a time, which made recovery labour-intensive and slow. Some businesses deployed additional staff to assist customers while systems were restored.
Ongoing effects and economic impact
As of 22 July 2024, some sectors were expected to experience “teething issues” for up to two weeks. Economists such as AMP's Shane Oliver suggested that the cost to Australia could run into the billions of dollars, although the timing of the outage and the subsequent recovery efforts might soften the overall impact.
Professor Tim Harcourt from UTS indicated that while there was significant short-term disruption, long-term economic damage was unlikely, as much of the disrupted activity was likely delayed rather than completely lost.
Lessons learned: Building a resilient future
The incident underscored the vulnerabilities associated with heavy reliance on a single technology provider. Experts suggest that diversifying technological alliances is crucial to enhance national security and resilience. Additionally, the outage has prompted calls for a review of cybersecurity and software systems. Both the government and private sectors need to improve their IT management practices to prevent similar incidents in the future.
From a geopolitical perspective, the event highlighted the strategic risks of relying on foreign technology. Countries with more insulated IT infrastructures, such as China, were less affected, which suggests that diversified and secure technology dependencies reduce exposure to a failure at any one provider.
Strengthening our IT systems
The Microsoft-CrowdStrike outage serves as a critical reminder of the fragility of our interconnected digital infrastructure. It underscores the need for robust cybersecurity measures, diversified technological dependencies, and improved IT management practices to safeguard against future disruptions. For Australia, the event highlighted the importance of national resilience and the need for collaborative efforts between government and industry to enhance digital security.
This incident, while disruptive, provides valuable lessons for the future, emphasising the need for preparedness, resilience, and adaptability in our increasingly digital world.