I find system performance problems utterly engrossing. Although CPUs have accelerated (in line with Moore's Law) and multiplied, memory has expanded, network communication speeds have increased, and access times for storage media have shortened, software has grown ever more sophisticated, and we still find ourselves drumming our fingers whilst waiting for our systems to respond.
When you're dealing with a performance problem you must think imaginatively and picture the way computing resources are being deployed. If you're working on a client-server system, where large numbers of PCs might be accessing the same server software (often a database), it is more complicated still.
Sometimes a server processor is too slow (statistics might show that it is never idle), and processes must wait for its availability. In this case you must try to reduce the work it’s doing, by improving the efficiency of code, or by shifting processing from the server to the client.
Sometimes network communications are too slow, and you must reduce the volume of data being transferred between client and server, or reduce the frequency of transfer (web page size, and program logic, can matter in these circumstances).
Sometimes a database is poorly designed and an application must ask too many times for data from too many records. This reveals itself as too much data storage activity, or too much database software processing. The answer, then, is to redesign the database, or configure it more effectively so that frequently accessed data are better cached.
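A rough sketch of the "asking too many times" problem, using an in-memory SQLite database (the table and numbers are my own illustration, not any particular system): fetching totals one record at a time issues one query per invoice, whereas a single grouped query asks the database once for the same answer.

```python
import sqlite3

# Illustrative only: 100 invoice lines spread across 10 invoices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (invoice_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO lines VALUES (?, ?)",
                 [(i % 10, float(i)) for i in range(100)])

# Chatty pattern: one query per invoice (ten round trips).
per_record = [conn.execute(
    "SELECT SUM(amount) FROM lines WHERE invoice_id = ?", (i,)).fetchone()[0]
    for i in range(10)]

# Batched pattern: one query for all invoices (one round trip).
batched = [row[1] for row in conn.execute(
    "SELECT invoice_id, SUM(amount) FROM lines "
    "GROUP BY invoice_id ORDER BY invoice_id")]

print(per_record == batched)  # True: same answer, a tenth of the queries
```

The same principle applies whether the cost is disk activity, database processing, or network round trips between client and server.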
Sometimes there is contention for resources. A large database system will often contain ‘control records’ that ensure consistent use of a system (such as a record holding the ‘next invoice number’, a record that many processes may need to update). If access to such records isn’t managed cleverly, one (or many) processes may have to wait for another process to finish its work. Sometimes the result is ‘deadlock’ when one process has ‘locked’ a record that another process wants, and that other process has ‘locked’ a record that the first one wants.
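The classic remedy for the deadlock just described is a global lock ordering: if every process acquires the locks it needs in the same fixed order, no cycle of waiters can form. A minimal sketch, with hypothetical "invoice" and "ledger" control records guarded by locks:

```python
import threading

# Hypothetical control records. If one process took invoice_lock then
# ledger_lock while another took ledger_lock then invoice_lock, each could
# end up waiting on the other: deadlock. Acquiring in one fixed order
# (invoice before ledger, always) prevents that.
invoice_lock = threading.Lock()
ledger_lock = threading.Lock()

counters = {"invoice": 0, "ledger": 0}

def update_both(n):
    for _ in range(n):
        with invoice_lock:          # always first
            with ledger_lock:       # always second
                counters["invoice"] += 1
                counters["ledger"] += 1

t1 = threading.Thread(target=update_both, args=(1000,))
t2 = threading.Thread(target=update_both, args=(1000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(counters)  # {'invoice': 2000, 'ledger': 2000}
```

Real database systems apply the same idea at the level of row and page locks, and typically also detect the cycle and abort one process when prevention fails.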
Instinct, and an ability to imagine these resources vying with each other, is a useful talent when it comes to solving problems. Doctors will attest that instinct plays an important role in the diagnosis of human sickness.
But sometimes you solve one problem and create another. Thirty years ago, when I was a junior programmer, I was working on the development and implementation of a financial management system that used a complex database to store its transactions. The client was exasperated by the poor performance of the system. Analysis showed that the CPU was idle most of the time, whilst the disk storage devices were overheating (metaphorically). I looked at the database design and I could see that it resulted in fragmented, almost randomly distributed data, when smooth physical distribution of sequential data was what was needed.
So I redesigned the database and built a caching mechanism that used an algorithm (using chains of pointers) to retain the most frequently accessed records in memory, where they could also be accessed without ‘locking’. Data access activity fell tenfold, but so busy was my algorithm (I mistakenly used bits rather than bytes as indicators in an unnecessarily sophisticated attempt to save on program size) that the CPU started steaming (metaphorically) instead. The CPU simply couldn’t get through its work. I’d improved overall performance slightly, but insufficiently, and the client abandoned the system. I’d moved the bottleneck from one location to another. There’s always a bottleneck.
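The sort of pointer-chain cache described above can be sketched roughly as follows: a fixed-capacity cache keeping the most recently used records at the head of a doubly linked chain, evicting from the tail. The names, capacity, and eviction policy here are my assumptions for illustration, not the original design.

```python
# Illustrative least-recently-used record cache built on a chain of
# node pointers, in the spirit of the mechanism described in the text.

class Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class RecordCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}                 # key -> Node, for O(1) lookup
        self.head = self.tail = None    # chain: head = most recently used

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else: self.head = node.next
        if node.next: node.next.prev = node.prev
        else: self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def get(self, key):
        node = self.table.get(key)
        if node is None:
            return None
        self._unlink(node)              # touch: move to front of the chain
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.table.get(key)
        if node:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.table) >= self.capacity:
            evicted = self.tail         # least recently used record
            self._unlink(evicted)
            del self.table[evicted.key]
        node = Node(key, value)
        self.table[key] = node
        self._push_front(node)

cache = RecordCache(2)
cache.put("inv-1", "record 1")
cache.put("inv-2", "record 2")
cache.get("inv-1")                      # touch inv-1: now most recent
cache.put("inv-3", "record 3")          # evicts inv-2, least recently used
print(cache.get("inv-2"))  # None (evicted)
print(cache.get("inv-1"))  # record 1
```

The moral of the anecdote still stands: a scheme like this trades disk activity for CPU work, and if the bookkeeping is too clever (bit-twiddling indicators, say) the CPU becomes the new bottleneck.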
But perhaps the most fascinating and frustrating performance problem I was involved in (on behalf of LLP Group) was at Otopeni Airport in Bucharest. Our client, Air BP, was using SunSystems for its financial accounting and reporting (for both local and corporate purposes). They called us, stumped, because, from time to time, and unpredictably, the system slowed markedly and became too frustrating to use. It was a typical client-server system. There were about eight PCs in regular use, and a server sitting in the corner of the office that contained the SQL database that SunSystems used.
We spent hours at Air BP’s airport office. We played around with parameters on their PCs and on the server, and when we were there (as so often happens) the system worked perfectly. It was the exact opposite of the ‘demonstration effect’ which causes software to go wrong just when you want to show it to someone. In this case, it never went wrong when they wanted to show us the problem.
We were stumped too. From time to time we would put our (metaphorical) spanners down and sit and think about it, and have a cup of tea. And usually whilst we were thinking and sipping the system would start to slow down again. Was there a malicious child sitting inside it? We’d go and look at the server parameters, and, hey presto, it started working well again. We looked at the server statistics. The CPU wasn’t busy, the disk storage device wasn’t busy. There was plenty of memory.
And then one of my more logical or imaginative employees had a brainwave.
‘When we’re working on the server, it works well. But when we’re not, after five minutes or so it gets slower. What happens on the server when we’re not working on it?’
‘No idea.’
‘The screensaver switches on.’
And that was it. There was a lovely swirly screensaver that came on after five minutes of inactivity on the server console. So complex and lovely were its graphical calculations that it used 95% of the CPU’s capacity. We simply switched it off.
Sometimes it’s the last thing you would think of.