- Date:
- Wednesday, November 13, 2002
- Author:
- Steve Lynch
- Editor:
- Kyle Bennett

Intel 3.06GHz CPU with Hyper-Threading
Intel bangs the 3GHz barrier before anyone else and throws in HT for free. We know 3GHz is going to be flying, but the world really wants to know all about HT and what it has to offer.
The concept behind Hyper-Threading technology may seem confusing at first, but looking at this graph, it becomes clearer how Hyper-Threading on a single CPU can benefit all applications, regardless of whether they are optimized to take advantage of it or not.

A common misconception is that software must be written to take advantage of Hyper-Threading technology, which, in fact, is not true. Everyday multi-tasking, such as listening to your favorite music while doing a little Photoshop work or browsing the internet, can greatly benefit from Hyper-Threading technology. How many programs do you have open at this very minute? Look at your taskbar and ask yourself, “How often do I have three or more programs running at one time?” If you are anything like me, the number is closer to five or more programs running at any given time, and Hyper-Threading is a welcome addition to the daily computing experience. We will take a look at the performance benefits of Hyper-Threading technology after we look at the processor itself.
Pentium 4 3GHz Features:
Hyper-Pipelined Technology:
The hyper-pipelined technology of the NetBurst micro-architecture doubles the pipeline depth compared to the P6 micro-architecture used on today’s Pentium III processors. One of the key pipelines, the branch prediction / recovery pipeline, is implemented in 20 stages in the NetBurst micro-architecture, compared to 10 stages in the P6 micro-architecture. This technology significantly increases the performance, frequency, and scalability of the processor.
533 MHz and 400 MHz System Bus:
The Pentium 4 processor supports Intel’s highest performance desktop system bus by delivering 4.2 GB or 3.2 GB of data per second into and out of the processor. This is accomplished through a physical signaling scheme of quad pumping the data transfers over a 133-MHz or 100-MHz clocked system bus and a buffering scheme allowing for sustained 533-MHz or 400-MHz data transfers. This compares to 1.06 GB/s delivered on the Pentium III processor’s 133-MHz system bus.
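The bandwidth figures above come from multiplying the bus width by the number of transfers per second. Here is a minimal sketch of that arithmetic, assuming a 64-bit (8-byte) wide data bus, which is not stated above, and a 133.33-MHz base clock for the 533-MHz bus. The function name is made up for illustration.

```python
def bus_bandwidth_gb_per_s(base_clock_mhz, transfers_per_clock, bus_width_bytes=8):
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bytes=8 is an assumption (64-bit data bus).
    """
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bus_width_bytes / 1e9


# Quad-pumped Pentium 4 buses: four transfers per base clock.
p4_533 = bus_bandwidth_gb_per_s(133.33, 4)
p4_400 = bus_bandwidth_gb_per_s(100, 4)
# Pentium III bus: one transfer per clock.
p3_133 = bus_bandwidth_gb_per_s(133.33, 1)

for name, value in [("P4 533MHz", p4_533), ("P4 400MHz", p4_400), ("PIII 133MHz", p3_133)]:
    print(f"{name}: {value:.2f} GB/s")
```

Under these assumptions the 533-MHz bus works out to a little under 4.3 GB/s, which is the figure quoted above as 4.2 GB, and the 400-MHz bus to 3.2 GB/s.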
Level 1 Execution Trace Cache:
In addition to the 8KB data cache, the Pentium 4 processor includes an Execution Trace Cache that stores up to 12K decoded micro-ops in the order of program execution. This increases performance by removing the decoder from the main execution loop and makes more efficient usage of the cache storage space since instructions that are branched around are not stored. The result is a means to deliver a high volume of instructions to the processor’s execution units and a reduction in the overall time required to recover from branches that have been mis-predicted.
Rapid Execution Engine:
Two Arithmetic Logic Units (ALUs) on the Pentium 4 processor are clocked at twice the core processor frequency. This allows basic integer instructions such as Add, Subtract, Logical AND, Logical OR, etc. to execute in half a clock cycle. For example, the Rapid Execution Engine on a 1.50 GHz Pentium 4 processor runs at 3 GHz.
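The double-clocked ALU numbers are easy to check by hand. This is only a sketch of the arithmetic with hypothetical function names, applied to the 1.50 GHz example above and to the 3.06GHz part reviewed here.

```python
def alu_clock_ghz(core_clock_ghz, multiplier=2):
    """Effective clock of the double-pumped ALUs."""
    return core_clock_ghz * multiplier


def simple_op_latency_ns(core_clock_ghz):
    """Time for a basic integer op that completes in half a core clock."""
    core_cycle_ns = 1.0 / core_clock_ghz  # one core clock period in ns
    return 0.5 * core_cycle_ns


for core in (1.50, 3.06):
    print(f"{core:.2f} GHz core: ALUs at {alu_clock_ghz(core):.2f} GHz, "
          f"simple op in {simple_op_latency_ns(core):.3f} ns")
```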
256KB or 512KB Level 2 Advanced Transfer Cache:
The Level 2 Advanced Transfer Cache (ATC) is either 256KB or 512KB in size and delivers a much higher data throughput channel between the Level 2 cache and the processor core. The Advanced Transfer Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, the Pentium 4 processor 1.50 GHz can deliver a data transfer rate of 48GB/s.
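The 48GB/s figure is simply the interface width multiplied by the core clock. A minimal sketch of the calculation follows; the function name is made up for illustration, and the 3.06 GHz result is our own extrapolation of the same formula rather than a number Intel quotes here.

```python
def l2_bandwidth_gb_per_s(core_clock_ghz, interface_bytes=32):
    """Peak L2-to-core bandwidth: one 32-byte transfer per core clock."""
    return interface_bytes * core_clock_ghz  # bytes/clock * 10**9 clocks/s = GB/s


print(l2_bandwidth_gb_per_s(1.50))   # the 1.50 GHz example from the text
print(l2_bandwidth_gb_per_s(3.06))   # same formula at 3.06 GHz (extrapolated)
```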
Advanced Dynamic Execution:
The Advanced Dynamic Execution engine is a very deep, out-of-order speculative execution engine that keeps the execution units executing instructions. The Pentium 4 processor can also view 126 instructions in flight and handle up to 48 loads and 24 stores in the pipeline. It also includes an enhanced branch prediction algorithm that has the net effect of reducing the number of branch mis-predictions by about 33% over the P6 generation processor’s branch prediction capability. It does this by implementing a 4KB branch target buffer that stores more detail on the history of past branches, as well as by implementing a more advanced branch prediction algorithm.
Enhanced Floating-Point and Multimedia Unit:
The Pentium 4 processor expands the floating-point registers to a full 128 bits and adds an additional register for data movement, which improves performance in both floating-point and multimedia applications.
Internet Streaming SIMD Extensions 2 (SSE2):
With the introduction of SSE2, the NetBurst micro-architecture extends the SIMD capabilities that MMX technology and SSE technology delivered by adding 144 new instructions. These instructions include 128-bit SIMD integer arithmetic and 128-bit SIMD double-precision floating-point operations. These new instructions reduce the overall number of instructions required to execute a particular program task and, as a result, can contribute to an overall performance increase. They accelerate a broad range of applications, including video, speech, image and photo processing, encryption, financial, engineering and scientific applications.
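To see why wider SIMD operations cut the instruction count, consider how many values fit in one 128-bit register. The sketch below is a plain Python model, not real SSE2 code: `lanes`, `packed_add` and `instructions_needed` are hypothetical helpers that only illustrate the idea of one instruction operating on several packed values at once.

```python
REGISTER_BITS = 128  # width of an SSE2 register


def lanes(element_bits):
    """How many elements of the given width fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


def packed_add(a, b):
    """Model of one packed add: every lane is summed by a single instruction."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def instructions_needed(n_elements, element_bits):
    """Packed adds needed to process n_elements (ceiling division)."""
    per_instruction = lanes(element_bits)
    return -(-n_elements // per_instruction)


# Two double-precision values per register, eight 16-bit integers per register.
print(lanes(64), lanes(16))
print(packed_add([1.0, 2.0], [3.0, 4.0]))
# Adding 1000 doubles takes half as many packed adds as one-at-a-time adds.
print(instructions_needed(1000, 64))
```

In this model, adding 1000 double-precision values takes 500 packed adds instead of 1000 scalar ones, which is the kind of instruction-count reduction the paragraph above describes.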
Features Used for Test and Performance / Thermal Monitoring:
Built-in Self Test (BIST) provides single stuck-at fault coverage of the microcode and large logic arrays, as well as testing of the instruction cache, data cache, Translation Lookaside Buffers (TLBs), and ROMs.
IEEE 1149.1 Standard Test Access Port and Boundary Scan mechanism enables testing of the Pentium 4 processor and system connections through a standard interface.
Internal performance counters can be used for performance monitoring and event counting.
A new Thermal Monitor feature allows motherboards to be cost-effectively designed to expected application power usages rather than theoretical maximums.
Physical Features:
The outward appearance does not give any indication of what's “under the hood”. The 3.06GHz Pentium 4 looks almost identical to any other socket 478 Pentium 4.
The underside of the CPU features a different resistor layout, allowing for more room between each of the twelve resistors. Power requirements remain the same for the new Hyper-Threaded 3.06GHz CPU, but Thermal Design Power has shot over the top with a blistering 81.8W. We're not aware of any changes in cooling requirements for this CPU. The AVC Sunflower performed very well, though it did run warm.

There were also early concerns that the initial 3.06GHz Hyper-Threaded CPUs would require special power delivery at the mainboard level, which is something you will have little control over if you own an older socket 478 mainboard and are looking to upgrade. We've tested our 3GHz Pentium 4 on a board designed to support it. I'm sure we'll see backwards compatibility testing coming from the community soon. At this time we have not been able to do any long-term testing, and that is what will be required. It's our opinion that most of us with boards designed for overclocking are going to be OK with a 3GHz upgrade. Time will tell, though.
