#### Clock Generation and Distribution for High-Performance Processors

Stefan Rusu, Senior Principal Engineer, Enterprise Microprocessor Division, Intel Corporation

stefan.rusu@intel.com



## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary

SoC 2004 – Stefan Rusu

## **Clock Definition and Parameters**

• The clock is a periodic synchronization signal used as a time reference for data transfers in synchronous digital systems



- Skew
  - Spatial variation of the clock signal as distributed through the chip
  - Global vs. local skew
- Clock jitter
  - Temporal variation of the clock with respect to a reference edge
  - Long-term vs. cycle-to-cycle jitter
  - Duty cycle variation: 50/50 design target
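As a rough illustration of the definitions above, the sketch below computes skew, jitter and duty cycle from clock-edge timestamps. The function names and sample numbers are hypothetical and not taken from the presentation. Long-term jitter is modeled here as the peak-to-peak deviation from ideal edge positions, which is only one of several definitions in use.

```python
def global_skew(arrival_ps):
    """Spread between the latest and earliest arrival of the same clock edge
    at different points on the die (spatial variation)."""
    return max(arrival_ps) - min(arrival_ps)


def cycle_to_cycle_jitter(rising_edges_ps):
    """Largest change in period between two consecutive cycles."""
    periods = [b - a for a, b in zip(rising_edges_ps, rising_edges_ps[1:])]
    return max(abs(q - p) for p, q in zip(periods, periods[1:]))


def long_term_jitter(rising_edges_ps, nominal_period_ps):
    """Peak-to-peak deviation of the edges from their ideal positions,
    measured relative to the first edge (assumed definition)."""
    t0 = rising_edges_ps[0]
    errors = [t - (t0 + n * nominal_period_ps)
              for n, t in enumerate(rising_edges_ps)]
    return max(errors) - min(errors)


def duty_cycle(rise_ps, fall_ps, next_rise_ps):
    """Fraction of the period during which the clock is high."""
    return (fall_ps - rise_ps) / (next_rise_ps - rise_ps)


# Hypothetical measurements, in picoseconds.
arrivals = [100, 112, 95, 108]          # same edge seen at four die locations
edges = [0, 500, 1003, 1499, 2000]      # rising edges at one location

print("global skew:", global_skew(arrivals), "ps")
print("cycle-to-cycle jitter:", cycle_to_cycle_jitter(edges), "ps")
print("long-term jitter:", long_term_jitter(edges, 500), "ps")
print("duty cycle:", duty_cycle(0, 245, 500))
```

Skew compares one edge across locations, whereas both jitter measures compare successive edges at a single location.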

#### Processor Frequency Trend




#### Clock Skew Trend



Source: ISSCC and JSSC papers

#### **Relative Clock Skew**



On average, clock skew accounts for ~5% of the cycle time

Source: ISSCC and JSSC papers
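To put the ~5% figure in absolute terms, a worked example for a 2GHz clock (a frequency chosen for illustration; the 5% value is an average across published designs, not a property of any one chip):

$$T = \frac{1}{f} = \frac{1}{2\,\text{GHz}} = 500\,\text{ps}, \qquad t_{\text{skew}} \approx 0.05 \cdot T = 25\,\text{ps}$$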

#### Sources of Clock Skew

• With a perfectly balanced distribution, device mismatch is the largest contributor to the clock skew



Geannopoulos, ISSCC-1998


#### **Clock Jitter Trend**



Source: ISSCC and JSSC papers

## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary


#### **Clock Distribution Networks**







- Tree
- Grid
- H-Tree
- X-Tree
- Tapered H-Tree

#### Inductance Effect



Xanthopoulos, ISSCC-2001

## Itanium<sup>®</sup> Processor Clock Hierarchy



Rusu, ISSCC-2000



## Local Clock Distribution

- Local clock distribution enables flexible skew management to support:
  - Intentional clock skew insertion for timing optimization
  - Clock gating for power reduction



Rusu, ISSCC-2000

#### Itanium<sup>®</sup> 2 Processor Clock Distribution

- First level: pseudo-differential, impedance-matched branching, balanced H-tree
- Second level: balanced, width- and length-tuned binary H-tree
- Second Level Clock Buffers: adjustable delay buffer
- Gaters: all constant input loading with load-tuned drive strength



Anderson, ISSCC-2002


## **Optical Skew Probing**



- Clock edge generates infrared photon emission
- Emission peak indicates clock transition edge

Tam, VLSI Symposium, 2003




## **Optical Probing Results**



Tam, VLSI Symposium, 2003




## 130nm Itanium<sup>®</sup> 2 Skew Profile



Tam, VLSI Symposium, 2003


#### Pentium<sup>®</sup> 4 Processor Clock Network



• 2GHz triple-spine clock distribution (180nm)

Kurd, JSSC-2001




#### 90nm Clock Distribution



- Sub-10ps clock skew demonstrated in a 90nm processor using clock tree averaging

Bindal, ISSCC 2003



## Pentium<sup>®</sup> 4 Processor Clock Skew



130nm Pentium<sup>®</sup> 4 Processor

90nm Pentium® 4 Processor

The 90nm design has 3x lower clock skew than the 130nm design

Schutz, ISSCC 2004




## Alpha\* Processors Clocking

| Product         | 21064       | 21164                                          | 21264       |
|-----------------|-------------|------------------------------------------------|-------------|
| Frequency       | 166MHz      | 300MHz                                         | 600MHz      |
| Transistors     | 1.7M        | 9.3M                                           | 15.2M       |
| Process         | 0.75µm, 4ML | 0.5µm, 4ML                                     | 0.35µm, 6ML |
| Power           | 25W         | 50W                                            | 72W         |
| Clock load      | 2.75nF      | 3.75nF                                         | 2.8nF       |
| Clock floorplan |             | (floorplan figure: pre-driver, final drivers)  |             |
| Clock skew plot | (skew plot) |                                                | (skew plot) |

\* Other names and brands may be claimed as the property of others

Gronowski, JSSC 1998




#### 1.2GHz Alpha\* Processor Clock




Xanthopoulos, ISSCC-2001



#### Power4\* Clock Distribution



- Dual core, SOI process, 174M transistors
- Measured clock skew below 25ps


Restle, ISSCC-2002


#### Power4\* - 3D Skew Visualization



## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary


#### Dual-Zone Clock Deskew



Geannopoulos, ISSCC-1998




## Itanium<sup>®</sup> Processor Clock Deskew



- Distributed array of deskew buffers to reduce process related skew
- 8 deskew clusters each holding up to 4 buffers
- 30 deskew zones

Rusu, ISSCC-2000



## Itanium<sup>®</sup> Processor Deskew Buffer



- Small step size enables fine skew control over a wide range
- TAP read / write access to Control Register enables faster timing debug and performance tuning

Rusu, ISSCC-2000
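A minimal behavioral sketch of how a digitally controlled deskew buffer might align a regional clock to a reference. The loop structure, the `max_code` range and the example delays are hypothetical; the 8ps default only echoes the step size this deck lists for this design in its deskew summary table. The model assumes the buffer can only add delay and that the phase detector reports just lead or lag.

```python
def deskew(region_delay_ps, reference_delay_ps, step_ps=8, max_code=15):
    """Return (delay_code, residual_error_ps) for one deskew buffer.

    Hypothetical model: the buffer adds code * step_ps of delay, and the code
    is incremented while the regional clock still arrives before the reference.
    """
    code = 0
    while (code < max_code
           and region_delay_ps + code * step_ps < reference_delay_ps):
        code += 1  # regional clock still early: add one delay step
    residual = region_delay_ps + code * step_ps - reference_delay_ps
    return code, residual


# Hypothetical example: a region arriving 30 ps before the reference.
code, residual = deskew(region_delay_ps=100, reference_delay_ps=130)
print(f"control register code = {code}, residual skew = {residual} ps")
```

When the loop stops before reaching `max_code`, the residual error is smaller than one step, which is why a small step size translates directly into finer skew control.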



#### Pentium<sup>®</sup> 4 Processor Deskew



Logical diagram of the skew optimization circuit

Phase detector network

Kurd, JSSC-2001


## Deskew Techniques Summary

| Author       | Source   | Clock<br>Zones | Skew<br>Before | Skew<br>After | Step<br>Size |
|--------------|----------|----------------|----------------|---------------|--------------|
| Geannopoulos | ISSCC-98 | 2              | 60ps           | 15ps          | 12ps         |
| Rusu         | ISSCC-00 | 30             | 110ps          | 28ps          | 8ps          |
| Kurd         | ISSCC-01 | 47             | 64ps           | 16ps          | 8ps          |
| Stinson      | ISSCC-03 | 23             | 60ps           | 7ps           | 7ps          |

- Clock deskew techniques compensate for device and interconnect within-die variations
- Deskew circuits cut clock skew to roughly a quarter of the original value or less
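The before/after columns of the table can be turned into residual-skew fractions with a few lines of Python. The numbers are copied from the table; the helper name is illustrative.

```python
# (author, skew before [ps], skew after [ps]) copied from the table above
DESKEW_RESULTS = [
    ("Geannopoulos", 60, 15),
    ("Rusu", 110, 28),
    ("Kurd", 64, 16),
    ("Stinson", 60, 7),
]


def residual_fraction(before_ps, after_ps):
    """Fraction of the original skew that remains after deskew."""
    return after_ps / before_ps


for author, before, after in DESKEW_RESULTS:
    frac = residual_fraction(before, after)
    print(f"{author:13s} {before:4d}ps -> {after:3d}ps  ({frac:.1%} remaining)")
```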

#### Useful Clock Skew



- Use de-skew buffers to insert intentional skew to maximize the processor operating frequency
- Larger benefit achieved in early steppings

Tam, VLSI Symposium, 2003
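A simplified sketch of why intentional skew can raise the operating frequency. It assumes a two-stage path (FF1, logic A, FF2, logic B, FF3), ignores setup time, clock-to-Q delay and hold constraints, and uses hypothetical delays and a hypothetical 8ps deskew step.

```python
def min_period_ps(d_a_ps, d_b_ps, skew_ps=0):
    """Minimum clock period when FF2's clock is delayed by skew_ps.

    Delaying FF2 gives logic A extra time (d_a - skew) and takes the same
    amount away from logic B (d_b + skew).
    """
    return max(d_a_ps - skew_ps, d_b_ps + skew_ps)


def best_skew_code(d_a_ps, d_b_ps, step_ps=8, max_code=15):
    """Deskew buffer code (in steps of step_ps) that minimizes the period."""
    return min(range(max_code + 1),
               key=lambda k: min_period_ps(d_a_ps, d_b_ps, k * step_ps))


# Hypothetical unbalanced stages: 600 ps followed by 400 ps.
print("zero skew:   ", min_period_ps(600, 400), "ps")
print("ideal 100 ps:", min_period_ps(600, 400, 100), "ps")
k = best_skew_code(600, 400)
print("quantized:   ", min_period_ps(600, 400, k * 8), "ps at code", k)
```

In this toy model the benefit comes entirely from unbalanced stage delays, which fits the observation that the gain is larger in early steppings, before critical paths have been fixed in the design.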


## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary


## Pentium<sup>®</sup> 4 Processor Jitter Reduction



- RC-filtered power supply for clock drivers reduces clock distribution jitter

Kurd, JSSC-2001
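To illustrate how an RC supply filter attenuates noise, the sketch below evaluates the textbook first-order RC low-pass response. The component values are hypothetical; the presentation does not give the actual filter parameters, and a real on-die filter also has to budget for the IR drop across the series resistance.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def rc_attenuation_db(freq_hz, r_ohm, c_farad):
    """Gain in dB (negative means attenuation) at freq_hz."""
    x = 2.0 * math.pi * freq_hz * r_ohm * c_farad
    return -10.0 * math.log10(1.0 + x * x)


R_OHM = 1.0       # hypothetical series resistance
C_FARAD = 10e-9   # hypothetical filter capacitance

fc = rc_cutoff_hz(R_OHM, C_FARAD)
for f in (fc / 10, fc, 10 * fc):
    gain = rc_attenuation_db(f, R_OHM, C_FARAD)
    print(f"{f / 1e6:8.2f} MHz: {gain:7.2f} dB")
```

Supply noise well above the corner frequency is attenuated by about 20 dB per decade, so less of it reaches the clock drivers and modulates their delay.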



#### Alpha\* Processor Voltage Regulator



- Voltage regulator ensures optimum DLL tracking
- Supply noise frequencies over 1MHz are attenuated by more than 15dB

Xanthopoulos, ISSCC-2001
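For scale, and assuming the 15dB figure refers to a voltage-amplitude ratio ($20\log_{10}$ convention), the attenuation corresponds to:

$$\frac{V_{\text{out}}}{V_{\text{in}}} = 10^{-15/20} \approx 0.18$$

Under that assumption, supply noise above 1MHz reaches the DLL at less than about 18% of its original amplitude.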





#### **On-Die Clock Jitter Detector**



Kuppuswamy, VLSI Symposium 2001

#### Array Phase Detector



- 7 elements above and below center, with increasing positive and negative built-in offset away from center
- Phase offset created by progressively delaying the data with respect to the clock
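A behavioral sketch of an array phase detector of this kind. It assumes 15 elements in total (7 below center, one at center, 7 above), a uniform offset step of 5ps, and a simple threshold comparison in each element. All three are illustrative assumptions, not details from the cited paper.

```python
from collections import Counter

OFFSETS = range(-7, 8)  # 7 elements below center, the center, 7 above


def detector_outputs(phase_error_ps, step_ps=5.0):
    """One bit per element: 1 if the phase error exceeds that element's
    built-in offset (k * step_ps), else 0."""
    return [1 if phase_error_ps > k * step_ps else 0 for k in OFFSETS]


def bin_index(phase_error_ps, step_ps=5.0):
    """Thermometer-code sum: how many elements fired (0 to 15)."""
    return sum(detector_outputs(phase_error_ps, step_ps))


def histogram(phase_errors_ps, step_ps=5.0):
    """Count how often each bin is hit over many clock cycles."""
    return Counter(bin_index(e, step_ps) for e in phase_errors_ps)


# Hypothetical per-cycle phase errors in picoseconds.
samples = [0.0, 2.0, -3.0, 12.0, -11.0, 4.0, 1.0]
for b, count in sorted(histogram(samples).items()):
    print(f"bin {b:2d}: {'#' * count}")
```

Accumulating the bin index over many cycles gives a jitter distribution, while recording it cycle by cycle gives jitter versus time.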

## Histogram Mode Operation



#### **Graph Mode Operation**




## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary


#### Clock Power Breakdown Example

- 30% of the total power is attributed to clock
- Most of the clock power is used in the final clock buffers and flip-flops



## **Clock Power Reduction**

- Reduce clock frequency
  - Multiple frequency domains
  - Dual edge triggered flip-flops
- Reduce voltage swing
  - Low swing clocks

Clock Power = $f \cdot C \cdot V^2$

- Reduce clock loading
  - Clock gating
  - Clock-on-demand flip-flop
  - Optimized routing
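The three levers in the list map directly onto the terms of the $f \cdot C \cdot V^2$ expression. The sketch below evaluates them for the 600MHz, 2.8nF clock load listed for the Alpha 21264 earlier in this deck; the 2.0V swing is a hypothetical value chosen only to make the arithmetic concrete.

```python
def clock_power_w(freq_hz, cap_farad, swing_v):
    """Dynamic clock power from the f * C * V^2 expression."""
    return freq_hz * cap_farad * swing_v ** 2


FREQ_HZ = 600e6     # from the Alpha 21264 column of the table
CAP_FARAD = 2.8e-9  # clock load from the same column
SWING_V = 2.0       # hypothetical voltage, not given in the deck

base = clock_power_w(FREQ_HZ, CAP_FARAD, SWING_V)
print(f"baseline:        {base:.2f} W")
print(f"half frequency:  {clock_power_w(FREQ_HZ / 2, CAP_FARAD, SWING_V):.2f} W")
print(f"half load:       {clock_power_w(FREQ_HZ, CAP_FARAD / 2, SWING_V):.2f} W")
print(f"half swing:      {clock_power_w(FREQ_HZ, CAP_FARAD, SWING_V / 2):.2f} W")
```

Frequency and load scale power linearly, while voltage swing enters quadratically.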

## Half Swing Clocking



- Requires four clock signals
  - Two clock phases with a swing between Vdd and Vdd/2 drive the PMOS devices
  - The other two phases with a swing between Gnd and Vdd/2 drive the NMOS transistors
- Experimental power savings of 67% were demonstrated on a 0.5µm CMOS test chip with only 0.5ns speed degradation
- Requires additional area for the special clock drivers and suffers from skew problems between the four phases
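As an idealized upper bound, applying the $f \cdot C \cdot V^2$ expression with the swing reduced from $V_{dd}$ to $V_{dd}/2$ and the same total clock load gives:

$$\frac{P_{\text{half}}}{P_{\text{full}}} = \frac{f \cdot C \cdot (V_{dd}/2)^2}{f \cdot C \cdot V_{dd}^2} = \frac{1}{4}$$

That is a 75% saving under these simplifying assumptions. The 67% measured on the test chip is somewhat lower, plausibly because of overheads such as the special clock drivers.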


#### Clock-on-demand Flip-Flop



- Activates the internal clock only when the input data will change the output; equivalent to single-bit clock gating
- Longer setup time and sensitive to hold time violations

Hamada, ISSCC 1999
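A behavioral sketch of the idea: the internal clock pulses only when the input differs from the stored output. The class and the data pattern are illustrative. The comparison is modeled as a plain inequality, which is an assumption about the behavior rather than a description of the circuit in the cited paper.

```python
class ClockOnDemandFF:
    """Toy model of a flip-flop whose internal clock is self-gated."""

    def __init__(self, q=0):
        self.q = q
        self.internal_clock_pulses = 0

    def clock_edge(self, d):
        """Apply one external clock edge with input d and return the output."""
        if d != self.q:
            # Input would change the output: fire the internal clock.
            self.internal_clock_pulses += 1
            self.q = d
        return self.q


ff = ClockOnDemandFF()
data = [0, 0, 1, 1, 1, 0, 0, 0]     # hypothetical low-activity data stream
outputs = [ff.clock_edge(d) for d in data]
print("outputs:", outputs)
print(f"internal pulses: {ff.internal_clock_pulses} of {len(data)} edges")
```

The lower the data activity, the fewer internal clock pulses are generated, which is where the power saving comes from.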



## **XScale Processor Clock Gating**

- Three hierarchical clock gating levels
  - Top level: stop clock
  - Unit level: 83 enables
  - Local clock buffers: 400 unique enables



Clark, JSSC 11/2001
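The three gating levels can be pictured as a chain of enables: a local clock buffer toggles only if no level above it has stopped the clock. The function below is a logical sketch with hypothetical signal names, not the XScale implementation.

```python
def local_clock_enabled(stop_clock, unit_enable, local_enable):
    """True if a local clock buffer should toggle this cycle.

    stop_clock   -- top-level stop-clock request (gates the whole chip)
    unit_enable  -- enable for the unit containing the buffer
    local_enable -- enable for this particular local clock buffer
    """
    return (not stop_clock) and unit_enable and local_enable


# Hypothetical snapshot: (unit_enable, local_enable) per local buffer.
buffers = [(True, True), (True, False), (False, True), (False, False)]
active = sum(local_clock_enabled(False, u, l) for u, l in buffers)
print(f"{active} of {len(buffers)} local buffers are clocking")
```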


## Dual Edge Triggered Flip-Flop



- Operates at half the clock frequency
- Requires tight control of the clock duty cycle

Nedovic, ESSCIRC 2002
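A small numeric sketch of both bullets: capturing data on both edges halves the required clock frequency, but the high and low phases then each act as a timing window, so the shorter phase limits the path delay. Function names and numbers are illustrative.

```python
def required_clock_hz(data_rate_hz, dual_edge):
    """Clock frequency needed to capture data_rate_hz samples per second."""
    return data_rate_hz / 2 if dual_edge else data_rate_hz


def shortest_data_interval_ps(period_ps, duty_pct):
    """Shorter of the high and low clock phases, in picoseconds.

    With dual-edge triggering, consecutive capture edges are separated by
    one clock phase, so the shorter phase is the binding constraint.
    """
    return period_ps * min(duty_pct, 100 - duty_pct) / 100


# Hypothetical target: 2e9 samples per second.
print("single-edge clock:", required_clock_hz(2e9, dual_edge=False) / 1e9, "GHz")
print("dual-edge clock:  ", required_clock_hz(2e9, dual_edge=True) / 1e9, "GHz")
for duty in (50, 45):
    window = shortest_data_interval_ps(1000, duty)
    print(f"duty {duty}%: shortest capture window = {window} ps")
```

In this model a 5-point duty cycle error on a 1GHz clock removes 50ps from one of the two capture windows, which is why the duty cycle needs tight control.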



## Outline

- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary


## **Rotary Clock Distribution**



- Transmission line based, self-regenerating rotary clock generator

Wood, ISSCC-2001



## Standing Wave Oscillator






O'Mahony, ISSCC-2003

## 10GHz Clock Grid Test Chip



Fabricated in a 0.18µm 1.8V 6M CMOS process



- Very low clock skew and power consumption
- Attractive alternative for 10GHz clocking and beyond

| Parameter                         | Value                                 |
|-----------------------------------|---------------------------------------|
| Locking range                     | 9.8 GHz – 10.5 GHz (6.4% range)       |
| Skew                              | 0.6ps                                 |
| Jitter (added to external source) | 0.5ps rms (1.4ps rms external source) |
| Power                             | 430mW                                 |

## **Optical Clock Distribution**

- Board-level guided-wave H-tree distribution
- Monolithic silicon-based detection
- Couplers provide tolerance for horizontal and vertical misalignment of the flip-chip assembly
- Optical transmission is immune to process variations, power-grid noise and temperature



J.D. Meindl, Georgia Institute of Technology, 2000


## Summary

- High performance processors require a low skew and jitter clock distribution network
- Clock distribution techniques are optimized to achieve the best skew and jitter with reduced area and power consumption
- Deskew techniques are demonstrated to cut the skew to ¼ of its original value
- On-die supply filters are used to reduce jitter
- Intensive research focuses on novel clock distribution techniques

