Today's Hard|Forum Post

[H]ard|OCP 11/13/03 Editorial

Synthetic benchmarking is changing again. Who is cheating, who is optimizing, and who can't tell the difference? I know we can't, but should we care?

Futuremark "Fixes" Problems:

I am sure many of you are thinking, "Here we go again." And honestly that is exactly what I was thinking too. Here is this week's Benchmark Brouhaha story...

Futuremark, the company behind the 3DMark series of benchmarks, announced a patch for their current 3DMark2003 video card benchmark. Here is what they had to say:

According to Tero Sarkkinen, Executive Vice President of Sales and Marketing at Futuremark Corporation, "the new version is published to make sure that our customers can get an objective 3DMark03 performance comparison with the latest hardware and drivers. Our customers will be able to perform apples-to-apples performance comparisons between the various IHVs' graphics cards."

Apparently 3DMark2003 has not been objective in the past.

While "apples-to-apples" video card benchmarking sounds good in theory, we feel that, given the way games have changed, it does not offer the honesty that it once did. GPU/VPU architectures differ widely from company to company and they do their jobs in different ways. Do we buy video cards to play games or to run benchmarks? The fact of the matter is that a 3DMark2003 score is not going to reflect gameplay in Half Life 2 and Doom3, as it is very likely that different video cards will not be doing things in an "apples-to-apples" way when it comes to a real world gaming situation. In fact we know this to be the case with some current games and some games that we will see very soon. For Futuremark to be moving towards "objectivity" via apples-to-apples at this point seems to be moving backwards, away from what their goal should be.

Still the synthetic benchmark is here and apparently the score means something to many people in all facets of the hardware business as it is still a hot topic of conversation. Certainly as I sit here typing one side of my brain is telling me that I should be doing more important things than addressing issues that are moot.

To get back on track, this patch greatly affects 3DMark2003 scores with NVIDIA's flagship product. Before the patch, the ATI and NVIDIA flagship cards were running roughly even. After the patch, the NVIDIA card, the GeForceFX 5950 Ultra, lost nearly 1000 points, dropping from the mid-6000s into the mid-5000s. The ATI Radeon 9800XT stayed steadfast in its score.

You would have to think, after Futuremark's statement and patch release and considering the results that we have seen, that NVIDIA is doing something odd with their v52.16 drivers. Futuremark thinks that those actions in the NVIDIA drivers do not allow for 3DMark2003 to objectively score NVIDIA cards, otherwise the score would be unchanged. So logically, we are then led to believe that Futuremark "fixed" it so that the playing field is now level between the two giants of video card technology. OK, I can respect that as being Futuremark's view of the situation as they are certainly entitled to have an opinion on driver optimizations.

But what is this on their 3DMark03 Approved Drivers page?

Here we have listed all WHQL drivers which are approved to be used only with Build 340 of 3DMarkآ®03. If you haven't got Build 340 of 3DMark03, you can download the patch here. By using these drivers & Build 340 you will get a valid and fully comparable 3DMark03 result. We will update the driver information continuously as new WHQL certified drivers are reviewed.

What you will find listed on that page are NVIDIA's 52.16 drivers. Yes, the same ones that score much differently when comparing the last two versions of 3DMark2003. It is apparent that Futuremark is doing something to the NVIDIA driver so that it does things differently in this newer version of the benchmark. Being "approved" of course made us even more curious, so we pinged NVIDIA.

NVIDIA Q&A:

We asked NVIDIA this:

"I am sure that by now your company has had time to test the new build of 3DMark2003. You will see as we have that you have had an incredible decrease in overall score. Futuremark has stated this publicly about this new build. "...the new version is published to make sure that our customers can get an objective 3DMark03 performance comparison with the latest hardware and drivers. Our customers will be able to perform apples-to-apples performance comparisons between the various IHVs' graphics cards."

This of course coupled with the new results of the 340 build when compared to the last public 330 build suggests that there are optimizations being made in your 52.16 driver that Futuremark does not find valid. Seeing that NVIDIA is a member of Futuremark BETA Program, I would think that explaining this would be fairly easy.

Could you please explain why a 3DMark2003 score using an NVIDIA GeForce FX 5950 Ultra with v52.16 driver would score almost a thousand points lower from 3DMark2003 build 330 to 340?"

Derek Perez with NVIDIA's PR Dept replied:

Kyle.

This latest patch from Futuremark is yet another revision of 3DMark03 specifically designed to defeat our Unified Compiler Technology, which evaluates shaders and in some cases substitutes hand tuned shaders, but increasingly simply applies the run-time compiler to generate optimal code. With the 52.16 drivers and the new patch, our perf drops 15%.

Clearly our compiler has gotten much better, as image quality remains exactly the same, the only thing that happens is a 10-15% drop in performance.

We're not sure why anyone would want to reduce their performance by 10-15% for the same image quality, but apparently Futuremark feels that is something relevant.

What we expect will happen is that we'll be forced to expend more engineering effort to update our compiler's fingerprinter to be more intelligent, specifically to make it intelligent in its ability to optimize code even when application developers are trying to specifically defeat compilation and optimal code generation.

This is yet another example of how 3DMark03 doesn't behave like a game - as a game developer would never specifically try to make their application run poorly or disable optimizations that produce the correct image while delivering better performance.

Our Unified Compiler Technology is accepted by the development community, our OEMs, Add-in-Card partners and more.

Derek also went on to quote Tony Tamasi of NVIDIA:

This reminds me of the early days of CPU's, when for some weird reason the industry felt the need to run "unoptimized code" through fear of the "new" optimizing compilers. It took the CPU industry a couple decades to accept optimizing compilers as legitimate, and of course now everyone assumes that as standard practice. Lets hope that we've learned from that experience and that it doesn't take the GPU industry anything near that time to accept compiler technology as legitimate and proper in this new age of programmable GPU's.

Quite frankly I have shared Tony's view since the first half of this year.

What is a Benchmark?

I think this is a really good question. In terms of 3DMark2003, I am not really sure it is a legitimate benchmark. One day it gives me one score, then the next day it gives me another score for the same exact hardware and software configuration? Futuremark seems to have a whim about something and all of a sudden the scores change drastically without any true explanation. I am supposed to read a press release and go, "Oh, so now after three quarters, 3DMark2003 is finally doing what it is supposed to." No specifics mentioned, no reason given, just "we made it objective now..."

I am not out to get Futuremark, but to say they have not dropped the ball this year would be lying to ourselves. I would like nothing more than to see a benchmark from Futuremark that was great and would unite the world of hardware geeks. Futuremark has tremendous reach, an incredible brand penetration, and the ability to give their products to the end users for free. That simply is an incredible thing. Too bad their current benchmark is junk.

What about NVIDIA?

NVIDIA has been caught with their hand in the cookie jar this year. They got busted cheating at 3DMark2003. I am not sure if upper-level management was involved in the decision to "aggressively optimize" for 3DMark2003, but surely it was done regardless of motive. While the synthetic video card benchmark was already headed down a rocky and treacherous road, NVIDIA stepped in with their cheating and drove the bus right off the cliff and into a blazing fireball. Yes, NVIDIA killed the synthetic benchmark in my eyes. Did we know the benchmark could be compromised? Of course we did. Did we ever think any company would have the guts to twist the results the way NVIDIA did? Honestly, I did not. How naive I was.

NVIDIA is far from being "clean" on this whole deal, but I am unsure whether Futuremark has stepped too far in the other direction this time. Is it OK for Futuremark to say, we like the way ATI does things with their drivers so we are going to leave that alone? Then look at NVIDIA and decide that NVIDIA is doing it "wrong"?

I could not care less if NVIDIA has to hire 100 times more engineers to get the same results as ATI. I do not care if they have to optimize. I do not care if they have to do more work than ATI to get the same image quality. It really makes no difference to me and really it will not make much difference to the end user buying the card. He just wants the games he plays to perform well with stunning visual quality.

Our Thoughts:

A benchmark is worthless to me if overnight the results can change by 15% without proper explanation as to why exactly that happened. Futuremark knows very well what exactly has changed with their benchmark but has not filled in the public that pays attention to their tool. That is inexcusable. Shame on you Futuremark. If they are going to take such actions, they should be accountable for them beyond a spineless PR ramble that tells us nothing.

Who knows what NVIDIA is doing to twist the benchmark results? We know from their track record that they are not above cheating. Is that what they are doing here again? I do not know and I would not count on Futuremark to ever tell us the truth as we saw them eat a lot of crow last time they came out and said NVIDIA was cheating. Right or wrong, NVIDIA has more money and a lot more lawyers. I do not much believe anything that I am told by NVIDIA anymore. NVIDIA has lost their credibility this year and that is not something that is easy to regain. They need to stop "PRing" us to death about products that don't deliver and start getting technology into the hands of board partners that will sell itself once again. Is what they said above true about the latest 3DMark2003 instance? I do not know that either, and I really do not care, as it does not impact real-world gameplay. NVIDIA has shoveled tons of cash Futuremark's way. They helped build the beast and they are still helping keep it alive. My thought is "deal with it".

As we said in the opening, games are getting so very complex and diverse that a single tool such as 3DMark2003 is of little or no value. Aside from all the cheating and optimization, 3DMark2003 is simply too narrow a look at gaming to accurately represent the big picture. Is it a fun tool that can easily be used by the enthusiast? Of course it is and it can be a ton of fun to use and watch or compare data with other enthusiasts. It has a great value there. Should hardware buyers, all the way from you and me to giants like Dell and Compaq/HP, be using 3DMark2003 as a tool for making a purchase? No, we should not. I think it is irresponsible to do so.

Our Options?

If you check our last couple of video card reviews (here and here), you will see that we no longer "review" video cards using traditional benchmarks. It came to me one night that what we were doing was all wrong. As computer hardware reviewers, we are still stuck in that 3dfx frames per second mindset, even though I thought I had already come to terms with that. It seems we were in denial or truly did not understand the whole issue, more likely the latter.

My thought was that we should NOT be reviewing video cards, but rather evaluating the experiences they provide while gaming. What value is a 3DMark2003 score of an incredible 10,000 points, if playing games on my new video card simply sucks? Exactly, it is of no value.

We have gotten away from the normal "benchmarks" and started focusing on actual performance and image quality in real gameplay with retail games and demos. Yes, the same exact ones that you will buy or download. Yes, and the same drivers that you will have access to as well. There will still be some instances that we have product and drivers that are not public yet, but we are going to stringently focus on retail product. Performance and IQ are the two things that really matter, with driver stability and hardware compatibility being a close third and fourth.

Conclusions & Delusions:

Bottom line is that no company is beyond reproach when it comes to your money being on the table. You need to be able to make an informed decision about the hardware you are buying and that decision is becoming more difficult and not as clear as it used to be. We are going to try our best to help you make that informed decision. My thought is that no current synthetic benchmark is going to tell you really what you need to know when it comes to the gaming experience that a video card and driver will deliver.

If you take but one thing away from our editorial, remember this. The gaming experience supplied by a video card is of the utmost importance for most of you purchasing a video card. So logic would tell us that we should base our decisions on that. Games don't lie.

Be smart and steer clear of brand loyalty as the landscape can change quickly. Vote with your wallet; it is the only real voice we have when it comes to computer hardware.