The Impacts of a Slow Software System and How to Prioritize Fixing It

Have you ever looked back at work you did years ago and wondered, "Why did I do that?" Or solved a problem and found that the answer suddenly seemed obvious? This happens all the time in software development.

We recently had a client report system slowness on a platform we built many years ago. Slowness had plagued the system for a year or so before it finally became unbearable. When the system was originally developed, speed wasn’t an issue. But as the amount of data grew over time, performance steadily degraded.

Performance degradation of software is comparable to the fable of the frog that slowly gets boiled alive. You don't really notice the slowness until it gets painful. I believe that was the case with this client. They didn't really notice 1 or 2 seconds of wait time, but 1-2 seconds turns into 5-10 seconds and it becomes more noticeable. And waiting more than 30 seconds for something to load is impossible to miss. 

The client reached the point where waiting 30 seconds wasn’t abnormal—system speed was impacting their day-to-day business. Other than speed issues, the system still works well for their business case. 

They eventually approached us about the issues they were experiencing. Here’s the approach we took, what we learned, and how we’re using that information to improve what we do for all clients. This process can be used for any system experiencing slowness or other issues that come up as software ages.

1. Re-create the Issue

If you can’t re-create the issue, at least prove that the issue exists and eliminate the go-to fixes (clear caches, try browser-based systems in private mode, log out and log back in, etc.).

The client told us the system was performing slowly, but when we checked the system on our end, we didn’t experience the delays. Since their custom CRM system was built more than 12 years ago, before Application Insights (the Azure tool we now use to monitor systems) existed, we didn’t have good visibility into the system’s performance. But we realized we did have access to all the IIS logs, which we could report on using Log Parser Studio. Using this tool, we were able to process the log files and determine which pages had the highest average response times. This was a very primitive way to get the information we needed, but it was effective.
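To illustrate the kind of report described above, here is a minimal Python sketch that averages response times per page from IIS-style W3C log lines. It is not the Log Parser Studio query we ran. It assumes the logs include the cs-uri-stem and time-taken fields (time-taken is in milliseconds), and the function name and sample log lines are made up for the example.

```python
from collections import defaultdict


def average_time_by_page(lines):
    """Return {uri_stem: (request_count, average_ms)} from W3C extended log lines."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # uri -> [request count, total ms]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the space-separated columns that follow.
            fields = line.split()[1:]
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any header
        row = dict(zip(fields, line.split()))
        try:
            uri = row["cs-uri-stem"]
            elapsed_ms = int(row["time-taken"])
        except (KeyError, ValueError):
            continue  # skip rows missing the fields we need
        totals[uri][0] += 1
        totals[uri][1] += elapsed_ms
    return {uri: (count, total / count) for uri, (count, total) in totals.items()}


# Made-up sample lines, for illustration only.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2020-01-01 08:00:00 GET /contacts/list.aspx 200 12000",
    "2020-01-01 08:00:05 GET /contacts/list.aspx 200 8000",
    "2020-01-01 08:00:09 GET /home.aspx 200 300",
]
stats = average_time_by_page(sample_log)
for uri, (count, avg_ms) in sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{uri}: {count} requests, {avg_ms:.0f} ms average")
```

Sorting by average response time surfaces the slowest pages, but on its own it says nothing about how often each page is used.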

2. Prioritize 

Once we were able to see the slowness, we helped the client understand the implications of it. We worked together to prioritize the order in which we should tackle the issues using some simple math. We didn’t just prioritize the slowest requests first—we also took into account usage. We multiplied the total number of requests by the average response time, which gave us an idea of how long the users, in aggregate, were waiting for the requests. The calculations represented the amount of time users could save when interacting with the system. This helped us identify which issues, once fixed, would give us the biggest bang for the client’s buck.
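The math is simple enough to sketch in a few lines of Python. The page names and numbers below are invented for illustration and are not the client's data. The point is that a moderately slow page with heavy usage can outrank the slowest page in the system.

```python
def rank_by_total_wait(page_stats):
    """Rank pages by aggregate wait: request count times average response time.

    page_stats maps a page to (request_count, average_seconds).
    Returns a list of (page, total_wait_seconds), largest first.
    """
    ranked = [(page, count * avg) for page, (count, avg) in page_stats.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical numbers, for illustration only.
example = {
    "/reports/annual": (20, 30.0),   # slowest page, rarely used
    "/contacts/list": (1500, 6.0),   # moderately slow, heavily used
    "/home": (3000, 0.5),            # fast, used constantly
}
ranking = rank_by_total_wait(example)
for page, total_seconds in ranking:
    print(f"{page}: {total_seconds / 3600:.2f} hours of waiting")
```

In this made-up example, the 30-second report ranks last because so few people request it, while the 6-second contact list accounts for 2.5 hours of aggregate waiting.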

3. Identify the Cause and Fix the Issues

Based on priority, we dug in. To find the root cause, we implemented a tracing tool called Prefix, which helped us visualize what was causing the slowness. We discovered there were several N+1 issues in the system. Basically, if there was a grid displaying 100 items, the system made one database call to load the list and then one more call for each item. So instead of getting all the data in 1 call, it required 101 database calls. Each call was very fast, but in the aggregate, it significantly slowed the system.
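Here is a small, self-contained Python sketch of the N+1 pattern using an in-memory SQLite database. The client's system is not built this way; the table names, the CountingDb helper, and the data are all hypothetical and exist only to show the difference in call counts.

```python
import sqlite3


class CountingDb:
    """Thin wrapper (hypothetical helper) that counts queries sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def query(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER)")
conn.executemany("INSERT INTO companies VALUES (?, ?)",
                 [(i, f"Company {i}") for i in range(1, 101)])
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(i, f"Contact {i}", i) for i in range(1, 101)])


def grid_n_plus_one(db):
    # One query for the list, then one more query per row: 1 + N calls.
    rows = db.query("SELECT id, name, company_id FROM contacts ORDER BY id")
    return [
        (name, db.query("SELECT name FROM companies WHERE id = ?", (company_id,))[0][0])
        for _, name, company_id in rows
    ]


def grid_single_query(db):
    # A join returns the same grid data in a single call.
    return db.query(
        "SELECT c.name, co.name FROM contacts c "
        "JOIN companies co ON co.id = c.company_id ORDER BY c.id"
    )


slow_db, fast_db = CountingDb(conn), CountingDb(conn)
slow_rows = grid_n_plus_one(slow_db)
fast_rows = grid_single_query(fast_db)
print(slow_db.calls, fast_db.calls)
```

Both functions return the same 100 rows. The first makes 101 database calls to do it and the second makes 1.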

In this case, it wasn't 101 calls. We were working with thousands of database requests. And the slowness was sporadic and difficult to re-create because of a data caching strategy used in the system. All of Far Reach’s systems are monitored for long-running queries, but we had a gap in detecting requests that generate a large number of database calls. (We discussed this in a recent retro and now have plans to proactively detect these issues in all the systems we build.)
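One way to close that kind of gap is to count database calls per request and flag any request that exceeds a threshold. The Python sketch below shows the general idea only. It is not the detection approach we settled on, and the class name, threshold, and request paths are assumptions made for the example.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("db_call_monitor")


class RequestDbCallMonitor:
    """Hypothetical helper: flags requests that make too many database calls."""

    def __init__(self, threshold=50):  # threshold is an arbitrary example value
        self.threshold = threshold
        self.flagged = []  # (request_path, call_count) for requests over the threshold

    @contextmanager
    def track(self, request_path):
        state = {"calls": 0}

        def record_call():
            # The data access layer would call this once per query it executes.
            state["calls"] += 1

        try:
            yield record_call
        finally:
            if state["calls"] > self.threshold:
                self.flagged.append((request_path, state["calls"]))
                logger.warning("%s made %d database calls",
                               request_path, state["calls"])


monitor = RequestDbCallMonitor(threshold=50)

with monitor.track("/contacts/list") as record_call:
    for _ in range(101):  # simulates an N+1 page
        record_call()

with monitor.track("/home") as record_call:
    record_call()  # a single query stays under the threshold

print(monitor.flagged)
```

A check like this catches requests where every individual query is fast, which is exactly the case that long-running-query monitoring misses.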

We fixed the root cause, ran the updates through our testing process, and released the newly sped-up system into production. 

4. Measure and Follow Up

After we implemented the improvements, we measured response times again and compared them to the metrics from before the updates. The initial changes we made saved our client's staff 7 hours per day—time they can now spend helping their customers instead of waiting for data to load. 

We did a second round of changes that saved them roughly 45 minutes per day. For one staff member, a task that was taking three hours of elapsed time now takes under an hour. This type of transformation can have major positive implications for workflow and productivity, not to mention the user's state of mind. Can you imagine spending half of your day just waiting for data to load as you work in your CRM? I can’t imagine the frustration this client’s team must have experienced on a daily basis. 

Lessons Learned

As always, we learned a lot through this project, and the client did too. Here are some things we can all think about as we consider our software strategy:

Time Delays Add Up

When it comes to load time, think in the aggregate. Five seconds is not a long time to wait for one person accessing a page once per day. But 5 seconds is significant if the page is used hundreds of times per day by a lot of different users. 

If the system is only for internal use, like the CRM in this example, load speed can tie directly to the bottom line. When there are delays, you’re paying people to wait and they’re not able to complete as much work as they could with a faster system. What else could your team be doing instead of waiting? (And don’t pretend they can be productive while they wait...we all know multitasking doesn’t work.) It’s frustrating to work in a slow software system; most of us have experienced that. It can impact team morale and add an unnecessary barrier to them doing their best work.

When a system is external-facing and used by customers, slow load times can frustrate users enough that they stop working with you. Users rarely see a slow system as a quality system. Delays reduce user confidence and impact buying decisions. How much revenue might you be losing because of poor system performance?

Is your custom system experiencing slowness or other issues? It might be time for some upgrades. Reach out.

 
