Lessons learned from global outage

IT firms provide list of recommendations

AirAsia passengers queue at counters within Don Mueang International Airport Terminal 1 amid system outages disrupting the airline's operations in Bangkok on July 19. Systems were down from about noon on Friday until 10pm. (Photo: Reuters)

The worldwide IT outage on July 19 underscores the need for public and private organisations to have a robust business continuity management plan, an IT disaster recovery plan and rigorous system testing between security software vendors and their clients to deal with unexpected incidents in future, say IT consulting and security firms.

They also recommend organisations adopt diverse technology portfolios and a balanced approach to cloud-based technology adoption.

According to Bloomberg, a faulty software update from cybersecurity firm CrowdStrike Holdings Inc affected 8.5 million devices globally that rely on the Microsoft Windows operating system.

According to Reuters, CrowdStrike said on July 21 that a significant number of the 8.5 million affected Windows devices were back online and operational.

Pochara Arayakarnkul, chief executive of digital transformation consulting firm BlueBik Group, told the Bangkok Post that the implications of the recent IT system outage have been far-reaching.

Multiple businesses reliant on CrowdStrike have encountered substantial disruptions, impacting critical service providers such as hospitals, airlines and government agencies.

"This incident has highlighted the vulnerabilities inherent in our interconnected digital infrastructure, exposing businesses to operational risks that can cascade across sectors," Mr Pochara added.

There are crucial lessons for developers and chief information officers to learn from the incident. It underscores the extensive risks associated with third-party software dependencies in modern business operations, he added.

The widespread reliance on such software means a single flaw can have cascading effects, disrupting entire business ecosystems.

The event serves as a reminder of the importance of rigorous system testing between software vendors and clients, along with organisations' contingency planning.

Regardless of the technology infrastructure employed, organisations will inevitably depend on critical software systems. Therefore, the focus of organisations' policymakers should pivot towards effective risk management and robust contingency planning to mitigate the impact of unforeseen events, Mr Pochara added.

Managing the risk of downtime, especially in mission-critical business operations, is essential. Effective strategies to mitigate such risks include robust business continuity management and IT disaster recovery planning.

Key considerations include assessing critical processes, establishing recovery objectives and timelines, and maintaining clear communication channels with stakeholders.
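As a rough illustration of what establishing recovery objectives can look like in practice, the sketch below compares each critical process's recovery time objective (RTO) with the recovery time estimated from a drill and flags any process that would miss its target. The class, function, process names and figures are all hypothetical and are not drawn from any organisation mentioned in this article.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CriticalProcess:
    name: str
    rto: timedelta           # recovery time objective: longest tolerable downtime
    est_recovery: timedelta  # recovery time estimated from the latest drill


def processes_at_risk(processes):
    """Return the names of processes whose estimated recovery exceeds their RTO."""
    return [p.name for p in processes if p.est_recovery > p.rto]


# Hypothetical inventory; names and figures are illustrative only.
inventory = [
    CriticalProcess("check-in counters", timedelta(hours=1), timedelta(hours=4)),
    CriticalProcess("payroll", timedelta(hours=48), timedelta(hours=6)),
]
at_risk = processes_at_risk(inventory)
```

Here only the check-in counters would be flagged, because a four-hour estimated recovery exceeds a one-hour objective.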

While third-party software dependencies pose inherent risks, thorough preparation through business continuity management and IT disaster recovery planning can significantly enhance an organisation's ability to manage and recover from unexpected events.

"By prioritising resilience and readiness, businesses can minimise the impact of outages and uphold operational continuity, safeguarding their reputation and ensuring customer trust," said Mr Pochara.

Satnam Narang, senior staff research engineer at Tenable Inc, a US-based cybersecurity company, told the Bangkok Post that the faulty software update, pushed out to systems via the auto-update mechanism, caused the underlying operating system -- Microsoft Windows -- to crash.

Because this software is used in a large number of systems globally, it resulted in widespread outages affecting a variety of systems including those in critical infrastructure.

"This unprecedented outage has shown us just how reliant our society and critical systems are on software working correctly at all times. And this is a perfect example of why we should never put all our eggs in one basket and rely on only a single vendor -- whether this is CrowdStrike, Microsoft or anyone else," said Mr Narang.

He added that just as one would not rely on a single approach to retirement savings, diversifying the technologies in use is the key to limiting risk.

The key takeaway is that the modern approach to IT architecture needs to include a broad portfolio of technologies, with companies not relying on any single platform to run critical business functions.

Firms should also continue to have robust outage and incident response plans that are drilled and updated regularly so that in the event of a major disruption, they can recover rapidly and deploy alternative solutions.

Lastly, the incident has likely led IT managers to stop blindly applying vendor updates automatically without testing them first, Mr Narang noted.
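One common way to test updates before applying them everywhere is a staged, or canary, rollout: the update goes to a handful of machines first and reaches the rest only if those machines stay healthy. The sketch below is a minimal illustration of that idea under stated assumptions. The function name, the `apply_update` and `is_healthy` callbacks and the host names are hypothetical, and it does not describe how CrowdStrike, Microsoft or any other vendor actually distributes updates.

```python
def staged_rollout(hosts, apply_update, is_healthy, canary_count=2):
    """Update a small canary group first; continue only if every canary stays healthy.

    Returns the list of hosts that received the update.
    """
    canaries, rest = hosts[:canary_count], hosts[canary_count:]

    for host in canaries:
        apply_update(host)
    if not all(is_healthy(host) for host in canaries):
        # Halt here: the rest of the fleet never receives the faulty update.
        return list(canaries)

    for host in rest:
        apply_update(host)
    return list(hosts)


# Simulated fleet of 40 machines (hypothetical names).
fleet = [f"host-{i}" for i in range(40)]

# A faulty update makes every canary unhealthy, so the rollout stops early.
bad_run = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: False)

# A good update passes the canary check and reaches the whole fleet.
good_run = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: True)
```

In the simulated faulty run only the two canary machines are touched, which is the point of the approach: a bad update is contained before it reaches the remaining 38.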

He said policymakers should consider promoting a balanced approach to cloud-based technology adoption. While cloud services offer scalability and efficiency, relying heavily on a few major tech vendors can create single points of failure.

Diversifying technology providers and encouraging the development of robust, multi-cloud strategies can enhance resilience.

Additionally, promoting open standards and interoperability between different cloud services can help mitigate risks associated with vendor lock-in and improve overall system resilience.
