NVIDIA GeForce 7800 GTX Preview

NVIDIA is today launching its next-generation video card, codenamed “G70.” Inside we cover what this new NVIDIA powerhouse brings to the table and how it stacks up against other high-end video cards in terms of true gaming experience.

GeForce 7800 GTX Chip Diagram:

Article Image

This chip diagram shows how the GeForce 7800 GTX (G70) is laid out. Starting at the top, you will see it has 8 vertex units where the GeForce 6800 Ultra had 6. Moving down, you will see that the GeForce 7800 GTX works on pixels in quads of 4, with 6 quads giving a total of 24 pixel pipelines or, more accurately, 24 pixels-per-clock. These feed down into the ROP system, which is made up of 16 ROP pipelines. Note that the GeForce 6800 Ultra has the same number of ROP pipelines.

In terms of fillrate, the GeForce 6800 Ultra at 400MHz and 16 pixels-per-clock delivers 6.4 GigaPixels/Sec. The GeForce 7800 GTX at 430MHz and 24 pixels-per-clock delivers 10.3 GigaPixels/Sec. of texel fillrate, a 61% increase in raw texel fillrate. The Radeon X850 XT-PE, at 540MHz and 16 pixels-per-clock, has a fillrate of 8.6 GigaPixels/Sec., which gives the GeForce 7800 GTX a 19.4% advantage in raw fillrate. NVIDIA has done more than just increase the raw fillrate available to the video card, however; they have increased the card’s raw computational shader power as well, which you will see below.
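For readers who want to check the arithmetic, here is a minimal Python sketch of the fillrate figures above. It assumes the simple model used in this article (core clock multiplied by pixels-per-clock); the names `CARDS`, `fillrate_gpix`, and `percent_increase` are our own and purely illustrative.

```python
# Core clock (MHz) and pixels-per-clock, as quoted in the article.
CARDS = {
    "GeForce 6800 Ultra": (400, 16),
    "GeForce 7800 GTX": (430, 24),
    "Radeon X850 XT-PE": (540, 16),
}


def fillrate_gpix(core_mhz, pixels_per_clock):
    """Peak fillrate in GigaPixels/Sec: MHz x pixels-per-clock / 1000."""
    return core_mhz * pixels_per_clock / 1000.0


def percent_increase(new, old):
    """How much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


rates = {name: fillrate_gpix(mhz, ppc) for name, (mhz, ppc) in CARDS.items()}
gtx = rates["GeForce 7800 GTX"]

for name, rate in rates.items():
    line = f"{name}: {rate:.2f} GigaPixels/Sec"
    if name != "GeForce 7800 GTX":
        # Compare the 7800 GTX against each of the other two cards.
        line += f" (7800 GTX is {percent_increase(gtx, rate):.1f}% higher)"
    print(line)
```

These are theoretical peaks only; they say nothing about memory bandwidth or how a game actually uses the hardware.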

*Update 6/29/05*

We have an update from NVIDIA on how fillrate is calculated on the GeForce 7800 GTX in light of the difference between the number of pixel (fragment) pipelines and the number of ROPs.

In an early draft of the reviewers guide I also listed pixel fill rate as 10.3G Pixels/sec (thinking 24 pixel pipes x 430MHz core). While that's certainly the bilinear-filtered texel fill rate, Tony corrected me stating that for peak pixel fill rate, we'd multiply number of ROPs x core clock rate, since "pixels" are output from the ROPs, so it's really 16x430MHz = 6.88GB/sec. Given the number of times fragments might cycle through pixel pipes, we really didn't need to increase # of ROPs to maintain a balanced architecture.

Thank you, Nick and Tony. Our corrected statement is that the texel fillrate is indeed 10.3 GigaPixels/Sec, but the peak pixel fillrate, since pixels are output through the ROPs, is 6.88 GigaPixels/Sec. You can think of this as the "effective" fillrate. Certainly you do not need to get caught up in these details, as the real test of this video card is how it performs in games, which you will see later in this review. The gameplay evaluation speaks for itself.
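The distinction between the two numbers can be sketched in a few lines of Python. This is only an illustration of the calculation NVIDIA describes above (pixel pipelines x clock for texel fillrate, ROPs x clock for peak pixel fillrate); the constant and function names are our own.

```python
# GeForce 7800 GTX figures quoted in the article.
CORE_MHZ = 430
PIXEL_PIPES = 24  # fragment pipelines (6 quads of 4)
ROPS = 16         # ROP pipelines that write pixels out


def rate_gpix(core_mhz, units_per_clock):
    """Peak rate in Giga-units/Sec: MHz x units-per-clock / 1000."""
    return core_mhz * units_per_clock / 1000.0


# Bilinear-filtered texel fillrate is bounded by the pixel pipelines.
texel_fillrate = rate_gpix(CORE_MHZ, PIXEL_PIPES)

# Peak pixel fillrate is bounded by the ROPs, since pixels leave through them.
pixel_fillrate = rate_gpix(CORE_MHZ, ROPS)

print(f"Texel fillrate:      {texel_fillrate:.2f} GigaTexels/Sec")
print(f"Peak pixel fillrate: {pixel_fillrate:.2f} GigaPixels/Sec")
```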

Vertex Shader Pipeline:

Article Image

The Vertex Shader Pipeline is laid out exactly like the NV40. There is one FP32 Scalar unit, one FP32 Vector Unit, and one Vertex Texture Fetch unit. This is a Multiple Instruction, Multiple Data (MIMD) architecture that is Dual Issue. Like the NV40, the Vertex Texture Fetch unit can handle up to 4 textures. NVIDIA has worked on the triangle setup part of the pipeline to improve efficiency. The fixed-function portion of the geometry pipeline has been improved by 30% over a GeForce 6800 Ultra. Texture fetch has also been improved for large textures.

Accelerating the triangle setup increases the overall throughput of the 3D pipeline. This is especially true in geometry- or vertex-bound applications such as shadow rendering.

Vertex Shader 3.0 is supported, which allows flow control, displacement mapping, vertex texture lookups, vertex program streams, and infinite length vertex programs.

Pixel Shader Pipeline:

Article Image

Here is a breakdown of the pixel shader pipeline. As you can see, it is laid out the same as the NV40 in a true Single Instruction, Multiple Data (SIMD) architecture with two ALUs per pixel. There are two FP32 shader units. The first unit can handle arithmetic and texture reads, and the second unit arithmetic only. They can operate in Dual and Co-Issue. This means that in one clock they can either process a texture value with ALU 1 and a math operation with ALU 2, or use both ALUs for math if no texture value is being processed. In addition, there is a free FP16 normalize out of Shader Unit 1.
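To make the two-unit arrangement a little more concrete, here is a deliberately simplified toy model in Python. It assumes only what is described above: each clock, one pipeline can issue either one texture operation (on unit 1) plus one math operation (on unit 2), or two math operations. It ignores instruction dependencies, latency, the free FP16 normalize, and everything else about the real hardware scheduler, so treat `min_clocks` as a hypothetical lower bound, not as a description of how the G70 actually schedules work.

```python
def min_clocks(texture_ops, math_ops):
    """Lower bound on clocks for one pixel pipeline in this toy model.

    Each clock issues at most two operations, and at most one of them
    may be a texture operation (only shader unit 1 can read textures).
    """
    if texture_ops < 0 or math_ops < 0:
        raise ValueError("operation counts must be non-negative")
    total_ops = texture_ops + math_ops
    # Two slots per clock, rounded up; texture ops need one clock each.
    clocks_for_slots = (total_ops + 1) // 2
    return max(texture_ops, clocks_for_slots)


# A texture-heavy mix is limited by the single texture-capable unit,
# while a math-heavy mix can keep both units busy.
print(min_clocks(texture_ops=3, math_ops=1))
print(min_clocks(texture_ops=1, math_ops=3))
```

In this model a shader with 3 texture reads and 1 math operation needs at least 3 clocks, while 1 texture read and 3 math operations fit in 2.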

The FP Texture Unit is capable of FP16 texture filtering and 16:1 Anisotropic Filtering with Trilinear (128-tap) otherwise known to you all as 16x AF. NVIDIA has re-designed the texture-processing engine for efficiency to improve performance.

CineFX 4.0 includes a redesigned texture processing engine. Textures can be grabbed and accessed faster, and developers can take advantage of a variety of texel sample sizes. These improvements deliver major benefits for high-precision texture applications such as high dynamic-range (HDR) rendering. Anisotropic filtering also benefits from the improved cache design of the latest texture engine.

Accelerating MADD Computations:

Article Image Article Image Article Image

The most important feature of the shader pipeline is the fact that both shader units are capable of 4 FP Multiply-Adds (MADDs) per pixel. With the NV40, ALU 1 was not able to do the simple ADD operation, as it was only able to do a MUL. ALU 2 was able to do an ADD and a MUL making 2 MULs and an ADD per pipeline. With the G70, NVIDIA has improved the shader units so that now 4 FP MADD operations can be handled per shader unit, which gives us 8 FP MADDs per clock in each pixel pipeline. The G70 is also able to do MADDs in a single cycle in the vertex shader.
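As a rough back-of-the-envelope check, the per-pipeline figure above can be scaled up to the whole chip. The sketch below uses only numbers quoted in this article (24 pixel pipelines, 2 shader units per pipeline, 4 FP MADDs per unit, 430MHz core clock) and assumes every unit issues MADDs on every clock, which real shaders will not do. The variable names are our own.

```python
# Figures quoted in the article for the GeForce 7800 GTX (G70).
PIXEL_PIPES = 24
SHADER_UNITS_PER_PIPE = 2
MADDS_PER_UNIT_PER_CLOCK = 4
CORE_MHZ = 430

# 2 shader units x 4 MADDs each = 8 MADDs per pipeline per clock.
madds_per_pipe_per_clock = SHADER_UNITS_PER_PIPE * MADDS_PER_UNIT_PER_CLOCK

# Across all pixel pipelines, per clock.
madds_per_clock = PIXEL_PIPES * madds_per_pipe_per_clock

# Theoretical peak per second, assuming the pixel pipelines run at the
# 430MHz core clock and never stall (an idealized assumption).
madds_per_second = madds_per_clock * CORE_MHZ * 1_000_000

print(f"MADDs per pipeline per clock: {madds_per_pipe_per_clock}")
print(f"MADDs per clock (pixel shader): {madds_per_clock}")
print(f"Peak MADDs per second: {madds_per_second / 1e9:.2f} billion")
```

This is a theoretical ceiling for the pixel shader only; it does not include the vertex units.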

Multiply and accumulate are commonly used math functions in 3D graphics. Also referred to as multiply-add (MADD) operations, they show up in transformations, lighting, normal map calculations, and many other operations.

The CineFX 4.0 engine accelerates MADD operations for overall increased throughput for the pixel shader. In fact, the GeForce 7800 can perform up to twice the MADDs of the previous-generation GPUs.

What this means for you is that the G70 has 2x the raw shader performance of NV40. Of course, Pixel Shader 3.0 and all the features associated with it are supported with FP32 throughout the entire pipeline.

ROP Pipeline:

Article Image

The ROP pipeline is also laid out the same as in NV40. The G70 ROP supports OpenEXR Floating point blending, Rotated Grid AA, fast Z operations, and Multiple Render Targets.