Today's Hard|Forum Post

NVIDIA on the Cause of RTX 2080 Series Card Failures

If you are a fan of games and computer hardware, especially GPUs, you have undoubtedly seen anecdotal evidence that NVIDIA's new RTX 2080 series cards are failing, with reports of artifacting, cards simply stopping working, and in some cases cards actually catching fire. Now it seems that NVIDIA is owning up to there actually being an issue with its new RTX cards.

NVIDIA has announced that the culprit behind some of the defective early boards for the NVIDIA RTX 2080 Ti Founders Edition was "limited test escapes." A test escape is a defective or marginal part that passes testing undetected and ends up in a finished product; here, that could be a component such as a resistor or capacitor on the PCB. By using big data analytics, companies can test, monitor, track, and find bad batches of parts before they enter the final product assembly stage. As NVIDIA alludes to, something was missed along the way, and the bad components made it through quality control and into the final product. So, from our understanding of what a "test escape" is, it would seem that NVIDIA is actually owning up to putting bad RTX 2000 series cards into the market.


Tim@NVIDIA - Limited test escapes from early boards caused the issues some customers have experienced with RTX 2080 Ti Founders Edition. We stand ready to help any customers who are experiencing problems. Please visit www.nvidia.com/support to chat live with the NVIDIA tech support team (or to send us an email) and we'll take care of it.

This is an example of a test escape from the document linked above.

ATE freeze occurs when measurements return the same or similar test results across several parts. This sporadic and hard-to-catch event can be caused by parts getting stuck in the tester or by tester hardware and software malfunctions. So, test results from previous chips are recorded on subsequent tests until the ATE resets itself. In one instance, a semiconductor production line allowed 2,200 units to go untested before detecting a freeze.
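To illustrate how an analytics pass might catch an ATE freeze, here is a minimal sketch that scans a sequence of per-part measurements for suspiciously long runs of identical readings. The function name, the `min_run` threshold, and the tolerance are our own assumptions for illustration; this is not NVIDIA's process or anything taken from the linked document.

```python
from typing import List, Tuple


def find_frozen_runs(measurements: List[float], min_run: int = 5,
                     tol: float = 1e-9) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) for runs of at least
    min_run consecutive parts whose readings stayed within tol of the first
    reading in the run. Long identical runs are a possible sign of a frozen
    tester repeating an earlier result (hypothetical heuristic)."""
    runs = []
    start = 0
    for i in range(1, len(measurements) + 1):
        # Close the current run at the end of the data, or when a reading
        # departs from the value the run started with.
        if i == len(measurements) or abs(measurements[i] - measurements[start]) > tol:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


readings = [1.02, 0.98, 1.01, 1.00, 1.00, 1.00, 1.00, 1.00, 0.97]
print(find_frozen_runs(readings))  # [(3, 8)]
```

Real parts never measure exactly alike, so five identical readings in a row is the kind of pattern worth a second look; the right threshold would depend on the test and the tester.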

Human error can also be an issue with test escapes.

Human error is one of the main contributors for test escapes and RMAs. For example, an engineer may forget to adjust testing limits, and therefore, less stringent tests are administered to the current wafers, allowing many more questionable dice [sic] to pass inspection. Big data solutions catch these errors by delving into the test results and determining which chips were tested properly.
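As a rough illustration of the "which chips were tested properly" check described above, the sketch below compares the limits a tester actually applied to each unit against the limits that should have been used. The record layout, the program names, and the function are hypothetical assumptions made for this example, not a real vendor tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TestRecord:
    unit_id: str
    program: str        # hypothetical test-program name
    lower_limit: float  # lower limit the tester actually applied
    upper_limit: float  # upper limit the tester actually applied


def find_loose_limits(records: List[TestRecord],
                      expected: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return unit IDs that were tested against limits looser than the
    expected spec for their test program (hypothetical check)."""
    flagged = []
    for rec in records:
        spec = expected.get(rec.program)
        # Units with no known spec are flagged too, since we cannot confirm
        # they were tested properly. Otherwise, a lower limit below spec or
        # an upper limit above spec means the test was less stringent.
        if spec is None or rec.lower_limit < spec[0] or rec.upper_limit > spec[1]:
            flagged.append(rec.unit_id)
    return flagged


expected = {"vdd_leakage": (0.9, 1.1)}
records = [
    TestRecord("A", "vdd_leakage", 0.9, 1.1),  # tested to spec
    TestRecord("B", "vdd_leakage", 0.8, 1.2),  # limits never tightened
    TestRecord("C", "unknown_prog", 0.9, 1.1), # no spec on file
]
print(find_loose_limits(records, expected))  # ['B', 'C']
```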

We have been fairly sure that we were seeing more RTX 2080 Ti failure reports than would be normal, and NVIDIA now seems to be owning up, at least a tiny bit, to something going on with these new RTX cards.

Just this morning, we saw an EVGA 2080 Ti XC literally burst into flames, and one of the two RTX 2080 Ti Founders Edition cards we purchased directly from NVIDIA has failed with what looks to be a RAM issue. Our card had Micron RAM. Some folks over on the GeForce forums report that their replacement cards came with Samsung RAM instead. This does not add up to any kind of proof of a widespread RAM issue, but what we have seen surely does not point to a "test escape" issue, at least in our opinion, unless the component that escaped testing was the VRAM. That would be a very big test to miss. NVIDIA would not lie to us, would they?

Given that these RTX 2080 Ti cards are the most expensive non-Titan cards ever launched, we hope to get more information from NVIDIA in the future on exactly what the issue is, rather than pointing to a generic test escape explanation.

Big thanks to forum member @iamjanco for the information.

If you like our content, please support HardOCP on Patreon. You can see all of our glowing RTX 2000 series reviews here.
