Fixing An Unstable System

Well, this post took a bit longer to finish than I planned, but here we go:

It seems I finally have a stable system again. Getting there was kind of an odyssey… Tech is a curious thing: one problem can lead to something completely different, it snowballs from there, and finding the actual cause can be… a journey. For me, this all started with a forced Windows 10 update.

Some weeks ago, Windows 10 decided it was finally time to force an update from the old 1909 Feature Update (the second of the two Feature Updates that came out in 2019). In the past, I always switched back to 1909 after voluntarily trying out the more recent Feature Updates, mainly due to issues with Voicemeeter in combination with Discord: my audio got distorted, so I sounded robotic in voice calls when I used Voicemeeter as the input for Discord. Since I use Voicemeeter as a noise gate and to splice audio clips into voice calls, not being able to use it was a showstopper for me. There were also some general instabilities/freezes, but as those disappeared once I went back to 1909, I chalked them up to additional side effects of the audio issue or maybe some bugs in the newer Feature Updates and didn’t give it much thought.

Fixing the audio issue

So, now that it seemed that the usual rollback to 1909 after an update would not be a viable long-term solution anymore, I did some research to try and fix the audio issue and by luck stumbled upon the solution: the audiodg process of Windows must be set to high priority and limited to one CPU core. This can be automated with Process Lasso, a rather useful little app. Incidentally, this also significantly improved the audio quality of my streaming through Discord. Going by reports of people watching my streams, there were previously short audio distortions around every minute; now it’s around once every 20-40 minutes, and the distortion also seems to be weaker.
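For anyone who wants to try the same two settings by hand before installing anything, here is a minimal sketch of what they amount to. It is not what Process Lasso does internally (I don't know that); it only builds a PowerShell one-liner using the `PriorityClass` and `ProcessorAffinity` properties that PowerShell exposes on process objects. The function names are made up for this example, and core 0 is an arbitrary choice. The resulting command most likely needs an elevated PowerShell session, and as far as I know the settings are lost when audiodg restarts, which is why a tool that reapplies them is handy.

```python
def single_core_mask(core_index: int) -> int:
    """Return an affinity bitmask with only the given logical core enabled.

    Bit 0 stands for core 0, bit 1 for core 1, and so on.
    """
    if core_index < 0:
        raise ValueError("core_index must be non-negative")
    return 1 << core_index


def build_powershell_command(process_name: str, core_index: int) -> str:
    """Build a PowerShell one-liner that sets High priority and pins the
    process to a single core. This only builds the text; it runs nothing."""
    mask = single_core_mask(core_index)
    return (
        f"Get-Process -Name {process_name} | ForEach-Object {{ "
        f"$_.PriorityClass = 'High'; $_.ProcessorAffinity = {mask} }}"
    )


if __name__ == "__main__":
    # Core 0 is an assumption for illustration; any single core should do.
    print(build_powershell_command("audiodg", 0))
```

The script just prints the command so it can be inspected before pasting it into PowerShell.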

The general instability that I experienced before seemed to be gone (with previous updates, the system experienced short freezes from time to time, sometimes coming to a halt altogether). Notice how I wrote “seemed”… which brings us to chapter 2.

Hardware problems

During regular usage of the system, i.e., at relatively low loads (Office, browsing, Discord calls / video streaming, etc.), there were no noticeable issues. But once I started gaming (shortly before all this started, I had begun playing Borderlands 3), the system began to freeze after some time. Since it only happened under load, I of course suspected a temperature issue. Keeping an eye on HWiNFO while running Borderlands 3, I saw no “dangerous” temperatures on either the CPU or the GPU. The CPU did run hotter, of course, given that I only had a silence-focused cooler on it, but it was nowhere near a critical temperature.

After that I turned my attention to the SSD (an NVMe drive), which was getting quite warm, but nothing dangerous either. To be on the safe side, I moved the OS partitions to a SATA SSD that I also had in the system. During this time, I also swapped out the graphics card and power supply to see if that changed anything (it didn’t). Since the freezes still happened afterwards, that at least seemed to rule out the SSD as the problem.

To see if anything in the behavior would change, I set all the fans (CPU, case, GPU) to a fixed 100% speed. This made a difference, and Borderlands 3 ran for quite some time without issues. It did still crash eventually, but where it would previously last 20-30 minutes at most, it now ran for hours. At one point I let the game run through the night, which admittedly does not put as much strain on the system as actually playing it, but the next morning the system was still running, which very likely would not have been the case before the fan speed change. (On a side note, this also prompted me to add another two case fans to the setup.)
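Watching sensor readings live while a game runs is easy to get wrong, so a logged session can be checked afterwards instead. Below is a small sketch of that idea. It assumes a simple CSV with a `time` column and one column per sensor; those column names are hypothetical and not the actual export format of HWiNFO or any other tool, and the limits are whatever you consider too hot for your own parts. Also keep in mind that a component without a logged sensor will never show up in such a check.

```python
import csv
import io


def peak_temperatures(csv_text: str, time_column: str = "time") -> dict:
    """Return the highest reading seen for each sensor column.

    Assumes one header row and numeric readings; empty cells are skipped.
    """
    peaks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            # Skip the timestamp, surplus fields (name is None) and blanks.
            if name is None or name == time_column or not value:
                continue
            reading = float(value)
            if name not in peaks or reading > peaks[name]:
                peaks[name] = reading
    return peaks


def sensors_over_limit(peaks: dict, limits: dict) -> list:
    """Return the sorted names of sensors whose peak exceeded their limit.

    Sensors without an entry in `limits` are ignored.
    """
    return sorted(
        name for name, peak in peaks.items()
        if name in limits and peak > limits[name]
    )
```

Reading the log file into a string and passing it to `peak_temperatures` gives one number per sensor, which is easier to compare between a normal run and a run with all fans at 100% than a screen full of live values.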

Putting all these tests together, it seemed rather clear that heat build-up caused some part of the hardware to get too hot, likely some part of the motherboard (possibly the main chipset) or maybe the main system memory. As these tests narrowed the list of potential culprits down to CPU/motherboard/RAM, I decided to go ahead and upgrade the system (something I had planned to do in the relatively near future anyway), ordering a current-gen AMD Ryzen CPU with a good cooler, a fitting motherboard and a decent amount of RAM. After the new parts arrived and I installed them, a new problem appeared: Prime95 detected a hardware failure, and Borderlands 3 crashed with an engine error a short while after it started. Thankfully, putting a copy of memtest on one of my USB drives quickly cleared up the cause: two of the RAM sticks were faulty. Removing them made the issue disappear, and the system was finally smooth and stable again. A day later the replacement memory arrived, and everything was fine.

I really hope that the current setup is going to keep running fine for some years to come. I’m not really in the mood to go through something like that again anytime soon…
