Intel CPUs have a problem

A long time ago, around the 1990s, Intel was king. Sure, you could buy Intel-compatible CPUs from AMD, Cyrix, and others, but that was mostly seen as a choice for people who could not afford Intel. If you wanted real quality, you had to pay for it. At least, that was the message from Intel.

Intel and its fans consistently pushed the message that true quality was only available from Intel, and that any other choice was a compromise that would lead to strange problems and, most importantly, to instability. If you wanted a trouble-free system, you had to buy Intel.

To a certain degree this was true, because Intel systems were more popular and cost much more. To be attractive on price, other vendors had to cut corners: they did less testing and, with worse economies of scale, their systems were generally not as good as Intel’s.

But despite all of that, non-Intel systems slowly gained popularity. As I wrote above, they were initially bought mostly by budget-oriented customers, but after some time they caught on with the mass market because they were often both faster and cheaper.

And to be honest, I wasn’t aware of any stability issues with non-Intel systems; usually they just required more fiddling to get working. I’m not saying there were absolutely no stability issues, I'm just not aware of them.

But everybody knows about the infamous Pentium FDIV bug. It was so bad that many compilers of that era had to introduce a workaround for it. What is interesting is that it happened exactly when Intel had patented everything around the Pentium, so competitors could no longer make their chips compatible with it.
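To give an idea of what that bug looked like, here is a minimal sketch of the widely quoted division check, written in Python. The operands 4195835 and 3145727 are the commonly cited test pair; the function names and the tolerance are my own assumptions for illustration, not anything taken from a compiler of that era. On a correct CPU the residual is essentially zero, while on a flawed Pentium it was widely reported to be 256.

```python
def fdiv_residual(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Return x - (x / y) * y, which should be essentially zero."""
    return x - (x / y) * y


def looks_like_fdiv_bug(tolerance: float = 1.0) -> bool:
    """Hypothetical helper: flag a residual far larger than rounding error.

    The tolerance of 1.0 is an assumption; ordinary floating-point
    rounding error for these operands is many orders of magnitude smaller.
    """
    return abs(fdiv_residual()) > tolerance


if __name__ == "__main__":
    print("residual:", fdiv_residual())
    print("FDIV bug suspected:", looks_like_fdiv_bug())
```

The workarounds of that time were more involved than this, but the idea is the same: a division that comes out visibly wrong for a small set of operand patterns.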

Some might say this was mostly because Intel was much more popular, but in 1996, for example, AMD had close to 10% of the desktop market. If its chips had had any big issues, I’m sure we would have heard about them, especially from Intel fans. But anyway, bad stuff happens to everyone, so let’s move on.

The next bad thing happened to Intel at the end of the 90s. AMD released its first really good CPUs, based on the K7 architecture, and followed them a few years later with killer CPUs based on the K8 architecture, the first 64-bit CPUs for the x86 platform. They were fast, cool, and affordable.

This caught Intel by surprise, and Intel started a megahertz race that led to various kinds of instability, because the chips required more power than systems of that time were designed to deliver.

Probably everybody has heard about the Pentium 4 Prescott CPUs, which ran extremely hot. I wrote some time ago that I had seen a motherboard on which both the 24-pin connector and the CPU power connector had simply melted. And it was quite a good, expensive ASUS motherboard with a very good power supply.

Then everybody heard about Meltdown and Spectre. While these vulnerabilities affect pretty much all modern CPUs, including ARM, they are harder to exploit on non-Intel chips. And after the fixes were rolled out, performance dropped much more on Intel CPUs than on other chips. Not in every application, but many were affected.

Then history repeated itself over the last 7 years. Intel had to make its CPUs consume more and more power because otherwise they could not compete with AMD. In some cases, the CPU alone consumes around 300W.

And then I found this video, which states that the last two Intel generations have serious stability issues. I suggest watching it for full details. Its author says that this mostly affects the Intel 13900 and 14900, but it looks like other models in these generations are also affected, just much less often.

For example, here is the link to the explanation of the problem from the company that develops Oodle, a data compression library used in many games. Here is what they state: “We believe that this is a hardware problem which affects primarily Intel 13900K and 14900K processors, less likely 13700, 14700 and other related processors as well”. 1431 out of 1584 crashes in Oodle happened on Intel 13th and 14th gen CPUs.
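To put those crash numbers in perspective, here is a quick sketch in Python that turns them into a percentage. The two figures are the ones quoted above; the variable and function names are just mine for illustration.

```python
# Crash counts as quoted in the text above.
crashes_on_13th_14th_gen = 1431
total_crashes = 1584


def share_percent(part: int, total: int) -> float:
    """Return part / total expressed as a percentage."""
    return 100.0 * part / total


affected_share = share_percent(crashes_on_13th_14th_gen, total_crashes)
print(f"{affected_share:.1f}% of the crashes were on 13th/14th gen Intel CPUs")
```

In other words, roughly nine out of ten of the reported crashes came from just two CPU generations of a single vendor.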

And it doesn’t affect only the desktop segment, where we could blame users who overclock their systems or buy low-quality power supplies, and motherboard vendors who auto-overclock CPUs to score more points in benchmarks and sell more motherboards.

No, it affects servers as well. For example, the Intel 13900 and 14900 are very popular for game servers because they provide very good single-threaded performance. Let's compare support contracts for Intel and comparable AMD systems: 3 years Parts & Labor – 24/S – Next Business Day Onsite repair. For AMD it costs $139, for Intel $1280. That is more than 9 times as much.
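The difference between the two contracts is easy to check with a few lines of Python. The prices are the ones quoted above; the variable names are my own.

```python
# Support contract prices as quoted above, in US dollars.
amd_contract_usd = 139
intel_contract_usd = 1280

# How many times more expensive the Intel contract is, and by how much.
ratio = intel_contract_usd / amd_contract_usd
extra_cost_usd = intel_contract_usd - amd_contract_usd

print(f"Intel contract costs {ratio:.1f}x the AMD one (${extra_cost_usd} more)")
```

A provider does not price a support contract like that by accident: the premium reflects how often they expect to send somebody onsite.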

And when that guy called the server provider and asked why there was such a huge difference, they said that “support incidents have been unusually high for that configuration”. They meant the Intel 13900 and 14900.

Keep in mind that server motherboards use very conservative settings, because a server must be very stable and work for months without restarting. Stability is much more important than performance. And surely nobody overclocks these CPUs in a server; server motherboards simply don’t have any overclocking settings.

Keep in mind that we are not talking about a single crash once in a blue moon, which can happen because of other factors. Three companies stated that the failure rate across about 250 systems is around 50%! In the server world this is simply unacceptable. Some of the servers were even underclocked and had their memory speed considerably reduced, and the problem was still there.

And you can just imagine the frustration of people playing on these servers when a server crashes. The people who host games on these servers are even more frustrated, because they are losing money.

Imagine you buy a game and then try to play, but the server keeps crashing. You will refund the game, or perhaps cancel your subscription, or just leave the game and stop spending money on it. At least one game developer reported that this issue cost them at least $100K.

Here is a quote from another developer:

“We have identified failures in five main areas:

  • End Customers: Thousands of crashes on Intel CPUs on 13th and 14th Gen CPUs in our crash reporting tools.
  • Official Dedicated Game Servers: Experiencing constant crashes, taking entire servers down.
  • Development Team: Developers using these CPUs face frequent instability while building and working on the game. It can also cause SSD and memory corruption.
  • Game Server Providers: Hosting community servers with persistent crashing issues.
  • Benchmarking Tools: Decompression and memory tests unrelated to Path of Titans also fail.”

And another: “We are swapping all our servers to AMD, which experience 100 times fewer crashes compared to Intel CPUs that were found to be defective… The failure rate we have observed from our own testing is nearly 100%, indicating it's only a matter of time before affected CPUs fail”.

The developers of the Warframe game wrote: “While investigating crashes in Warframe we came across a particular series that were not crashing in our code (they were crashing in nvgpucomp64.dll, a component of Nvidia drivers). After aggregating hundreds of reports from helpful players we discovered a pattern: almost all were coming from systems with 13th and 14th generation Intel processors.”

So it looks like this problem affects CPUs in both desktops and servers. Many people are trying to figure out what is wrong. They have played quite a bit with settings, and in many cases they could not fix it completely. Moreover, it looks like failure rates only increase with time, which points to some kind of degradation inside the chip.

I would say that Intel is pushing its chips to their physical limits, and as a result we see all these problems. Perhaps the chips need much more extensive testing to ensure that they are stable, or perhaps, due to the very high power consumption, some unknown process inside the chip degrades it very quickly.

Yet another interesting thing: when that person asked his contacts at different motherboard manufacturers how Intel explains this, they said that there is no explanation. Intel is not saying anything and is just replacing the CPUs. It is very quiet about the whole thing.

In contrast, when motherboard manufacturers “juiced up” AMD CPUs to the point that they simply burnt up, AMD released a statement that it would replace all affected CPUs, even though it was not its fault.

In conclusion, to me it looks like Intel is losing its grip on the business. It looks like its business is not about satisfying customers' needs but just about making money. It is still in good shape because of the huge inertia of corporate customers for whom “computer” means Intel, but that cannot last long if problems like this continue. And that is sad, because it looks like there would then be no competition.