Self-healing software for a more reliable future
Institutional Communication Service
30 June 2025
Our dependence on digital technology drives us to allocate significant resources towards securing our systems against potential local or global blackouts. This perspective is highlighted by Prof. Mauro Pezzè, Full Professor at the Faculty of Informatics at Università della Svizzera italiana (USI), in an article published by laRegione.
In July 2024, the world suddenly came to a standstill: flights were cancelled, trains stopped running, the media ground to a halt, and banks and financial centres were thrown into disarray. The cause was a global IT outage, which demonstrated, if proof were needed, our dependence on digital technology. The fault was a bug in an update from CrowdStrike, a US cybersecurity company with more than 20,000 subscribers, which affected Microsoft's operating systems with knock-on effects worldwide.
To prevent incidents like this in the future, IT researchers are working on adaptive software that can avoid errors and correct itself autonomously, rather than striving for entirely error-free systems. One of the researchers involved in this effort is Mauro Pezzè, a Professor of Software Engineering at the Faculty of Informatics at Università della Svizzera italiana and the University of Milan Bicocca.
Professor Pezzè, let us start with last summer's case: how can an update send the world into a tailspin?
Software is arguably the most complex product created by humans. For instance, Windows has approximately 50 million lines of code. We have reached a point where it is no longer feasible for a single person to fully comprehend the complexity of the entire system. As a result, it is essential to form teams of people with different skills. When a defect is discovered, the fix changes only that specific element, and it must be integrated expertly with the rest of the system; otherwise, severe issues can arise. While Windows is well-constructed, it is nearly 40 years old, and using it can feel like flying in a Boeing 747 from the 1980s. The same goes for its repair process.
So, if there is a lack of a broad perspective, are there knowledge problems?
Sure. That is why here at USI, we do not just teach programming; we also emphasise the importance of abstraction. This means developing the ability to understand the complexity and uniqueness of a system beyond its detailed components. This capacity is a defining characteristic of human beings.
Does error-free software exist?
No human product of a certain complexity can be error-free. I wrote a book, I read it carefully and aloud word by word, an editor corrected it, it was checked in pre-press, and yet it still contains errors. And we are talking about an object of infinitely less complexity than software.
If there is an error within the software or algorithm, is there not a risk of perpetuating that error through ongoing replication?
Absolutely. Software can make mistakes just like humans do, but it operates differently, leading to different types of errors. While software is not intelligent in the human sense, it presents an opportunity to improve our work processes and accomplish tasks that would otherwise be impossible or would take an enormous amount of time to complete.
We humans, however, can notice our mistakes and correct them. The software cannot...
That is precisely why, at USI, we are working on what is called a "self-healing software system", a system in which the software corrects itself. The software must always be under our control and used intelligently. Let me give an example. No one constructs a building with the intention of it burning down; however, the possibility of fire always exists. That is why we have developed systems to detect and stop fires before they can spread. Similarly, in the field of informatics, we design systems to prevent general short circuits from occurring.
Instead of correcting the error, we prevent its consequences...
There are two main lines of research in this area. Some researchers focus on developing systems that automatically correct errors while also providing suggestions to programmers to help them resolve issues. Others, like our team, aim to predict and prevent errors, acknowledging that it is impossible to eliminate them entirely.
And do these systems work?
They function similarly to fire-fighting systems: while fires still occur, there are significantly fewer today than in the last century or in countries that invest less in prevention. Thus, we can conclude that they are effective, even if not in an optimal manner. It is important to note that self-adaptivity in fire-fighting systems is relatively new and needs further development. However, without this adaptability, we would be facing more bugs than we do today.
Will we ever arrive at software that is capable of not making mistakes?
If we distinguish between error and reliability, we are already there: we cannot guarantee that the bridge will not collapse, and we cannot guarantee that the plane will not fall, but we can guarantee a certain degree of reliability, which means precisely that the software is designed to prevent errors. A trivial example: when a plane lands, its systems shut down and start up again. What is the point of that? A driver at a toll booth does not turn the car off and on again, right? The point is that if I know that an error will occur sooner or later, but I also know that this will not happen before a specific time (say 24 hours), then every time I land, I reset the system and start counting again. This is called the "mean time between failures": no one can guarantee that a piece of software will not make errors, but one can guarantee that it will function reliably for a sufficient time.
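The reset-on-landing argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 24-hour window is the figure mentioned in the answer, while `FlightComputer`, `run_schedule` and the flight durations are hypothetical names and numbers invented for the example, not a description of any real avionics system.

```python
from dataclasses import dataclass

# Assumption: the system is trusted to run failure-free for this long
# (the "say 24 hours" from the interview).
SAFE_WINDOW_HOURS = 24.0


@dataclass
class FlightComputer:
    """Hypothetical system that is only trusted for a bounded uptime."""
    uptime_hours: float = 0.0

    def fly(self, hours: float) -> None:
        # Uptime accumulates for as long as the system keeps running.
        self.uptime_hours += hours

    def restart(self) -> None:
        # A restart resets the clock: the counting starts again from zero.
        self.uptime_hours = 0.0


def run_schedule(flight_hours, restart_on_landing):
    """Return the longest continuous uptime reached over a list of flights."""
    computer = FlightComputer()
    longest = 0.0
    for hours in flight_hours:
        computer.fly(hours)
        longest = max(longest, computer.uptime_hours)
        if restart_on_landing:
            computer.restart()
    return longest


flights = [9.0, 11.5, 8.0, 12.0]  # made-up flight durations in hours
longest_with_restart = run_schedule(flights, restart_on_landing=True)
longest_without_restart = run_schedule(flights, restart_on_landing=False)
print(longest_with_restart < SAFE_WINDOW_HOURS)     # stays inside the window
print(longest_without_restart < SAFE_WINDOW_HOURS)  # eventually leaves it
```

With a restart after every landing, the continuous uptime never exceeds the longest single flight; without it, the uptime keeps growing until it passes the window in which the system is trusted.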
Is it possible to develop intelligent or para-intelligent software that is both self-adaptive and self-healing?
Yes and no. We will undoubtedly reach a point where self-healing software becomes a reality. However, the concept of "intelligent" software warrants further discussion. Unlike humans, who can assess situations and protect themselves from significant mistakes based on judgment, software lacks this ability. That is why, despite all our advancements, we still need to code in strict programming languages, as they are interpreted with precision. In contrast, natural language is often ambiguous, while programming languages are designed to be clear and precise.
Will we still write programs?
Skills will increasingly shift from writing programs to testing them. When I started my career, it was said that testing was done by those who could not program; nowadays, software engineers do a lot of testing, because we have moved from the difficulty of programming to the difficulty of verifying that a program is correct. A profound change will likely occur: instead of a programmer writing the code, a machine will write it, and a person will check the result.
Will we understand this?
Probably not, just as we do not understand many complex things. No engineer knows all the details of a car or a plane, no linguist knows the entire Italian vocabulary, but a person of average intelligence and average education can interpret a text, depending on its level, in a more or less correct and profound manner. David Parnas, a noted scholar in the history of software engineering and recipient of an honorary doctorate from USI in 2009, emphasises that software engineering is a discipline that requires collaboration among multiple individuals. A single programmer may be capable of reading and understanding the workings of thousands of lines of code. Still, it is impossible for one person to thoroughly examine software of the scale seen in today's applications within a limited timeframe.
Do machines and humans speak two different or similar languages?
Programming languages are designed for machines to understand how to execute tasks rather than the meaning behind the commands. This makes them less intuitive for humans. However, it is possible that in the future, machines will evolve to comprehend vocabulary not just as instructions but as entities in their own right, applying these concepts in a more meaningful way.
So, will there be a semantic evolution?
Probably.
You explained that at USI, you teach students not only programming but also abstraction. In light of the latest statements, what does that mean?
Teaching programming languages should not be an end goal but rather a means to a greater understanding. While I may not be able to teach someone how to write a poem, I can certainly teach them how to use language as a tool for poetry. Our aim is not merely to show how to use a programming language but to help students look beyond it. We teach various programming languages because of their richness and because they allow us to grasp the underlying programming concepts. This understanding enables us to communicate effectively with machines. A self-healing system has four basic components: it must monitor the system, understand what it does, predict or interpret, and finally implement. In collaboration with other research institutes, we are currently working on the second, that is, on what data we need and how we can use it to predict an error. I will soon be travelling to Sweden for an invited workshop to present our results on failure prediction in dynamic clouds.
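The four components named above (monitor, understand, predict, implement) can be pictured as one pass through a small control loop. The Python sketch below is a toy illustration only, not the research team's method: the metric (memory use in percent), the linear extrapolation, the 90% limit and all function names are assumptions made for the example.

```python
def monitor(samples, window=3):
    """Monitor: keep the most recent readings of a metric (memory use in %)."""
    return samples[-window:]


def understand(recent):
    """Understand: summarise the readings as an average change per step."""
    if len(recent) < 2:
        return 0.0
    return (recent[-1] - recent[0]) / (len(recent) - 1)


def predict(recent, trend, limit=90.0, horizon=2):
    """Predict: extrapolate linearly; True if the limit would be reached."""
    return recent[-1] + trend * horizon >= limit


def implement(failure_expected):
    """Implement: choose a preventive action (a placeholder string here)."""
    return "restart component" if failure_expected else "no action"


def control_step(samples):
    """Run one pass of the loop over a non-empty list of readings."""
    recent = monitor(samples)
    trend = understand(recent)
    return implement(predict(recent, trend))


print(control_step([50.0, 51.0, 52.0]))  # slow growth, far from the limit
print(control_step([60.0, 70.0, 80.0]))  # fast growth, limit within reach
```

The point of the structure is that the action is taken before the limit is reached: the second call reacts to a trend, not to a failure that has already happened.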
Are there other fields of research?
The other thing we do is "automatic workarounds", an idea I think has a promising future. This concept involves instructing the software to automatically identify an equivalent piece of code that can perform the same function as the code that has failed. So far, we have tested this approach on Google Maps, with encouraging results.
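The idea of falling back on an equivalent piece of code can be illustrated with a small Python wrapper. This is a minimal sketch under stated assumptions, not the technique as tested on Google Maps: `with_workarounds`, `add_all_faulty` and `add_one_by_one` are hypothetical names, the defect is simulated, and the list of equivalent alternatives is supplied by hand, whereas the interview describes software that identifies them automatically.

```python
def with_workarounds(primary, equivalents):
    """Wrap `primary` so that equivalent alternatives are tried if it fails."""
    def wrapper(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception as first_error:
            # Try each alternative that is assumed to do the same job.
            for alternative in equivalents:
                try:
                    return alternative(*args, **kwargs)
                except Exception:
                    continue
            # No alternative worked: report the original failure.
            raise first_error
    return wrapper


def add_all_faulty(items, new_items):
    """Hypothetical bulk operation with a simulated defect."""
    raise RuntimeError("simulated defect in add_all")


def add_one_by_one(items, new_items):
    """Equivalent behaviour obtained by a different sequence of operations."""
    result = list(items)
    for item in new_items:
        result.append(item)
    return result


safe_add_all = with_workarounds(add_all_faulty, [add_one_by_one])
print(safe_add_all([1], [2, 3]))
```

The caller sees the intended result even though the primary code path failed; the error only surfaces if every known equivalent fails as well.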
Content produced and published in collaboration with laRegione.