
# Design of a 5.8 GHz Multi-Modulus Prescaler 

Vidar Myklebust

Master of Science in Electronics
Submission date: June 2006
Supervisor: Jukka Tapio Typpø, IET
Co-supervisor: Oddgeir Fikstvedt, Micrel

## Problem Description

A prescaler is an important building block in a PLL, where it divides the VCO frequency down to a lower frequency before the phase detector. A 5.8 GHz multi-modulus prescaler shall be designed in a 0.18 um mixed-signal CMOS process. It will be used in an ISM-band transceiver.
Based on a literature study, the student shall identify architectures that are suitable for on-chip implementation in CMOS. From these candidate architectures, she/he shall find the best one with regard to current consumption and area.
The chosen architecture shall be implemented in 0.18 um CMOS.

Frequency: 5.8 GHz
Moduli: 64
Current consumption: 4 mA

Assignment given: 16. January 2006
Supervisor: Jukka Tapio Typpø, IET

## Abstract

A 64-modulus prescaler operating at 5.8 GHz has been designed in a $0.18 \mu \mathrm{~m}$ CMOS process. The prescaler uses a four-phase high-speed $\div 4$ circuit at the input, composed of two identical cascaded $\div 2$ circuits implemented in pseudo-NMOS. The high-speed divider is followed by a two-bit phase switching stage, which together with the input divider forms a $\div 4 / 5 / 6 / 7$ circuit. The phase switching stage is mostly implemented in complementary CMOS. It is followed by four identical $\div 2 / 3$ cells with local feedback, also implemented in complementary CMOS.

Other architectural approaches are also described and tried out. An architecture based solely on the $\div 2 / 3$ cells with local feedback is presented. The $\div 2 / 3$ cells were implemented and simulated, and worked up to 2.3 GHz. An alternative high-speed divider based on an inverter ring interrupted by transmission gates is also described. Simulations showed that a divider using pseudo-NMOS inverters and CMOS transmission gates operated well and produced four signals evenly spaced in phase at an input frequency of 4.8 GHz.

## Preface

This report has been written for Micrel as part of my master's thesis at the Department of Electronics and Telecommunications at the Norwegian University of Science and Technology (NTNU).

The work with this thesis has lasted for 20 weeks, and was finished in June 2006. At times the work has been hard and frustrating, but I feel that I have also learned a lot.

My technical teacher at NTNU has been Jukka Typpö, and my teaching supervisor at Micrel has been Oddgeir Fikstvedt. Thanks to both of them for valuable guidance during this period.

Trondheim, June 2006

Vidar Myklebust

## Contents

1 Introduction
2 Background Theory
2.1 Basic Circuits
2.1.1 The Johnson Counter ($\div 2$)
2.1.2 $\div 3$ Circuit
2.1.3 Dual-Modulus $\div 2 / 3$ Circuit
2.2 Multi-Modulus Circuits
2.3 Phase Switching
2.4 Pseudo-NMOS Logic
3 $\div 2 / 3$ Cells With Local Feedback
3.1 $\div 2 / 3$ Cells Using Pseudo-NMOS Latches
3.2 $\div 2 / 3$ Cells Using CMOS Latches
4 High-Speed Inverter Ring Divider
5 Architecture Based on Phase-Switching
5.1 High-Speed Four-Phase $\div 4$ Circuit
5.2 Four-Phase Phase Switcher
5.2.1 Four-to-One Multiplexer
5.2.2 Phase Select State Machine
5.2.3 Mapping Logic
5.3 $\div 2 / 3$ Stages
5.3.1 $\div 2 / 3$ Core
5.3.2 Control Qualifier
5.3.3 Mapping Logic
5.3.4 The Last $\div 2 / 3$ Stage
5.4 Version 2 - $\div 2 / 3$ Cells With Local Feedback
5.5 Version 3 - Four-to-One Multiplexer Implemented in CMOS
5.6 Version 4 - Alternative Local Feedback $\div 2 / 3$ Cell
6 Simulations
6.1 $\div 2 / 3$ Cells With Local Feedback
6.2 High-Speed Inverter Ring Divider
6.3 Architectures Based on Phase Switching
7 Results
7.1 $\div 2 / 3$ Cells With Local Feedback
7.2 High-Speed Inverter Ring Divider
7.3 Architectures Based on Phase Switching
8 Discussion
8.1 Choice of Architecture
8.1.1 High-Speed Input Divider
8.1.2 Phase Switching Stage
8.1.3 Low Frequency Stage
8.2 Implementation
8.3 Simulations
9 Conclusion
A Simulation Plots

## List of Figures

2.1 Johnson counter
2.2 Timing diagram for Johnson counter
2.3 $\div 3$ circuit
2.4 Timing diagram for $\div 3$ circuit
2.5 $\div 2 / 3$ circuit
2.6 Timing diagram for $\div 2 / 3$ circuit in $\div 2$ mode $(M=1)$
2.7 Two-bits prescaler
2.8 Timing diagram for $\div 4 / 5 / 6 / 7$ circuit in $\div 5$ mode
2.9 Timing diagram for a phase switcher in $\div 1.25$ mode
2.10 NAND gate implemented in standard CMOS
2.11 NAND gate implemented in pseudo-NMOS
3.1 Cascaded $\div 2 / 3$ cells with local feedback
3.2 Topology of $\div 2 / 3$ cell using pseudo-NMOS latches
3.3 Implementation of an improved biphase pseudo-NMOS latch
3.4 Topology of $\div 2 / 3$ cell using CMOS latches
4.1 High-speed divider based on an inverter ring interrupted by transmission gates
4.2 Timing diagram for high-speed inverter ring divider
4.3 Four-phase high-speed divider based on an inverter ring interrupted by transmission gates with outputs evenly spaced in phase
4.4 Timing diagram for four-phase high-speed inverter ring divider
5.1 Architecture based on phase switching, version 1
5.2 High-speed $\div 4$ circuit
5.3 Simulation results from the four-phase $\div 4$ circuit with a 2.9 GHz input signal
5.4 Simulation results from the four-phase $\div 4$ circuit with a 5.8 GHz input signal
5.5 High-speed four-phase $\div 2$ circuit
5.6 Four-to-one multiplexer in pseudo-NMOS
5.7 Timing strategy to avoid glitches
5.8 Selective-blocking register using PMOS-coupled latches
5.9 Phase select state machine
5.10 Mapping logic
5.11 Two-bits adder
5.12 $\div 2 / 3$ core
5.13 The part of the multiplexer generating the OUT signal
5.14 Phase select state machine used in $\div 2 / 3$ cores
5.15 PMOS-coupled latch
5.16 Timing diagram for the phase select state machine
5.17 Control qualifier for the first $\div 2 / 3$ stage
5.18 Mapping logic used in the $\div 2 / 3$ stages
5.19 Control qualifier for the last $\div 2 / 3$ stage
5.20 Architecture based on phase switching, version 2
5.21 $\div 2 / 3$ cell with local feedback
5.22 Architecture based on phase switching, version 3
5.23 Four-to-one multiplexer implemented in complementary CMOS
5.24 Topology of $\div 2 / 3$ cell using CMOS latches
5.25 Architecture based on phase switching, version 4
6.1 Test bench set-up
A.1 Simulation plot for $\div 2 / 3$ cell with pseudo-NMOS latches at maximum operation frequency in $\div 2$ mode
A.2 Simulation plot for $\div 2 / 3$ cell with pseudo-NMOS latches at maximum operation frequency in $\div 3$ mode
A.3 Simulation plot for $\div 2 / 3$ cell with CMOS latches at maximum operation frequency in $\div 2$ mode
A.4 Simulation plot for $\div 2 / 3$ cell with CMOS latches at maximum operation frequency in $\div 3$ mode
A.5 Simulation plot for version 1 of the high-speed inverter ring divider at maximum operation frequency
A.6 Simulation plot for version 2 of the high-speed inverter ring divider at maximum operation frequency
A.7 Simulation plot for version 3 of the high-speed inverter ring divider at maximum operation frequency
A.8 Version 1 of the phase switching based architecture in $\div 127$ mode
A.9 Version 2 of the phase switching based architecture in $\div 127$ mode
A.10 Version 3 of the phase switching based architecture in $\div 127$ mode
A.11 Version 4 of the phase switching based architecture in $\div 127$ mode

## List of Tables

7.1 Current consumption for $\div 2 / 3$ cell using pseudo-NMOS latches
7.2 Current consumption for $\div 2 / 3$ cell using CMOS latches
7.3 Simulation results for high-speed inverter ring divider
7.4 Simulation results for version 1 of the phase switching based architecture
7.5 Simulation results for version 2 of the phase switching based architecture
7.6 Simulation results for version 3 of the phase switching based architecture
7.7 Simulation results for version 4 of the phase switching based architecture

## Chapter 1

## Introduction

A prescaler is an important building block in a PLL, where it divides the VCO frequency to a lower frequency before the phase detector. A multi-modulus prescaler will typically be used in a fractional-$N$ synthesizer (in which the separation between the output frequencies can be given as a fraction of the input frequency) to achieve very good resolution in frequency, and at the same time have a high PLL bandwidth.

A 5.8 GHz 64-modulus prescaler for use in an ISM-band transceiver is to be designed in a $0.18 \mu \mathrm{~m}$ CMOS process. One of the main challenges will be to achieve the wanted operation at a high enough frequency. Different approaches will be tried out, in order to find the best architecture possible. Two approaches will be investigated more closely: an architecture based on a chain of identical $\div 2 / 3$ cells with local feedback, presented by Cicero S. Vaucher et al. in [1], and one utilizing an interesting phase switching technique presented by Michael H. Perrott in his PhD thesis from MIT [2].

## Chapter 2

## Background Theory

### 2.1 Basic Circuits

### 2.1.1 The Johnson Counter ($\div 2$)

A Johnson counter is an easy and popular implementation of a $\div 2$ prescaler. As shown in fig. 2.1, two D latches are coupled in a loop, and clocked by inverse clocks. When $I N$ goes low the signal at $\overline{O U T}$ is being transferred to the output of the first latch, and transferred further to $O U T$ when $I N$ goes high again. This is shown in the timing diagram in fig. 2.2. OUT inverts every time $I N$ goes high, and thus the frequency is divided by two.


Figure 2.1: Johnson counter


Figure 2.2: Timing diagram for Johnson counter
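The latch behaviour described above can be checked with a small behavioural model. This is only a sketch: it treats the two D latches as ideal level-sensitive elements without delay, and the function name and the sampled representation of $IN$ are assumptions made for illustration, not part of the design.

```python
def johnson_divide_by_two(in_samples):
    """Behavioural model of two level-sensitive D latches coupled in a loop.

    in_samples is a sequence of 0/1 levels of IN (hypothetical sampled input).
    Returns the level of OUT after each sample.
    """
    latch1 = 0  # output of the first latch (transparent while IN is low)
    out = 0     # output of the second latch (transparent while IN is high)
    trace = []
    for level in in_samples:
        if level == 0:
            latch1 = 1 - out  # first latch takes over the inverted output
        else:
            out = latch1      # second latch passes the value on to OUT
        trace.append(out)
    return trace


if __name__ == "__main__":
    clock = [0, 1] * 4  # four periods of IN, two samples per period
    print(johnson_divide_by_two(clock))  # [0, 1, 1, 0, 0, 1, 1, 0]
```

In the printed trace OUT changes once for every rising edge of IN, so one output period spans two input periods.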

### 2.1.2 $\div 3$ Circuit

Fig. 2.3 shows a $\div 3$ circuit utilizing two flip-flops and an AND-gate. Both flip-flops are being clocked at the rising edge of $I N$. The following logic function is obtained:

$$
\begin{equation*}
\overline{Q_{2}}(n+1)=\overline{Q_{1}(n)\, \overline{Q_{2}}(n)}=\overline{\overline{Q_{2}}(n-1)\, \overline{Q_{2}}(n)}=Q_{2}(n-1)+Q_{2}(n) \tag{2.1}
\end{equation*}
$$

As can be seen from the timing diagram (fig. 2.4) this circuit swallows one extra period of the input signal per output period compared to the Johnson counter.


Figure 2.3: $\div 3$ circuit


Figure 2.4: Timing diagram for $\div 3$ circuit

### 2.1.3 Dual-Modulus $\div 2 / 3$ Circuit

The circuit in fig. 2.5 divides the frequency of the input signal by either 2 or 3, depending on the logic state of the modulus control signal $M$. When $M$ is low the output of the OR-gate will be controlled directly by $Q_{1}$, and the circuit will operate in the same way as the previously described $\div 3$ circuit, i.e. the circuit will be in $\div 3$ mode.


Figure 2.5: $\div 2 / 3$ circuit

When $M$ is set high the output of the OR-gate will be high independent of the output of the first flip-flop. Thus the output of the AND-gate follows $\overline{Q_{2}}$. On every rising edge of $IN$, $\overline{Q_{2}}$ will be inverted, and thus the frequency is divided by two. The timing diagram for the $\div 2$ mode is shown in fig. 2.6.


Figure 2.6: Timing diagram for $\div 2 / 3$ circuit in $\div 2$ mode $(M=1)$
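Both modes of the dual-modulus circuit can be reproduced with a short behavioural model. The sketch below assumes ideal edge-triggered flip-flops, and it assumes that the first flip-flop samples $\overline{Q_{2}}$, which is the connection implied by eq. (2.1); the function name is hypothetical.

```python
def divide_2_3(m, edges):
    """Behavioural model of the dual-modulus divider.

    m     -- modulus control (1: divide by 2, 0: divide by 3)
    edges -- number of rising edges of IN to simulate
    Returns Q2 after each rising edge.
    """
    q1, q2 = 0, 0
    trace = []
    for _ in range(edges):
        # Both flip-flops are clocked on the same edge, so the new values
        # are computed from the old ones in a single assignment.
        # Assumed wiring: D1 = not Q2, D2 = (M or Q1) and not Q2.
        q1, q2 = 1 - q2, (m | q1) & (1 - q2)
        trace.append(q2)
    return trace


if __name__ == "__main__":
    print(divide_2_3(0, 6))  # [0, 1, 0, 0, 1, 0] -> repeats every 3 edges
    print(divide_2_3(1, 6))  # [1, 0, 1, 0, 1, 0] -> repeats every 2 edges
```

With $M=0$ the output pattern repeats every three input edges, and with $M=1$ every two, in agreement with the description of the two modes.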

### 2.2 Multi-Modulus Circuits

By cascading two or more dual-modulus prescalers one can obtain a multi-modulus prescaler. An example of how that can be done is shown below.

The circuit in fig. 2.7 consists of two of the $\div 2 / 3$ circuits described above coupled in series. The modulus control signals are binary weighted, so the period of the output signal will be $T_{O U T_{2}}=T_{I N} \cdot\left(2^{2}+2^{1} \cdot M_{1}+2^{0} \cdot M_{0}\right)$, and the prescaler can divide by moduli ranging from 4 to 7. To achieve this there is an OR-gate on the $M$-input of the first $\div 2 / 3$ circuit, which lets that circuit be in $\div 3$ mode only once per $\mathrm{OUT}_{2}$ period. Note that it is the inverses of the modulus control signals that are applied at the inputs. This is to achieve the given function for the output period, since the cells that are used divide by two when $M=1$ is applied and by three when $M=0$ is applied. A timing diagram for this circuit in $\div 5$ mode $\left(M_{1} M_{0}=01\right)$ is shown in fig. 2.8.


Figure 2.7: Two-bits prescaler


Figure 2.8: Timing diagram for $\div 4 / 5 / 6 / 7$ circuit in $\div 5$ mode

This circuit can easily be extended to an $n$-bit prescaler by cascading $n$ $\div 2 / 3$ circuits, and gating the modulus control signal for each of them through an OR-gate together with the OUT signals from all of the following $\div 2 / 3$ circuits.
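The binary weighting of the modulus control signals can be written out as a small helper. The sketch below simply evaluates the formula for the output period generalised to $n$ cells; the function name and the least-significant-bit-first ordering of the argument are choices made here for illustration.

```python
def division_ratio(control_bits):
    """Division ratio T_OUT / T_IN of an n-bit prescaler.

    control_bits[i] is the modulus control bit M_i (least significant first).
    """
    n = len(control_bits)
    return 2 ** n + sum(bit << i for i, bit in enumerate(control_bits))


if __name__ == "__main__":
    # Two-bit prescaler: M1 M0 = 01 gives divide-by-5.
    print(division_ratio([1, 0]))  # 5
    # All four settings of the two-bit prescaler.
    print([division_ratio([m0, m1]) for m1 in (0, 1) for m0 in (0, 1)])  # [4, 5, 6, 7]
```

For $n=2$ this reproduces the moduli 4 to 7 of the circuit in fig. 2.7.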

### 2.3 Phase Switching

Another important principle in frequency division is phase switching. To utilize phase switching, signals at the same frequency but separated in phase are required. Most phase switching prescalers operate on four phases that are equally spaced. These signals may for instance come from a $\div 2$ circuit that generates quadrature outputs or a divider based on an inverter ring and transmission gates [3].

A phase switcher is often implemented with a multiplexer that passes on the chosen signal to the output. A logic function generates the signal that chooses the correct phase. If the multiplexer once every period of the output switches to the phase that is $90^{\circ}$ after the previous one, that will equal adding a quarter of a period of the input signal to the output. Thus the output frequency for this example becomes:

$$
\begin{equation*}
T_{\text {out }}=T_{\text {in }}+\frac{1}{4} \cdot T_{\text {in }}=\frac{5}{4} \cdot T_{\text {in }} \Rightarrow \frac{1}{f_{\text {out }}}=\frac{5}{4} \cdot \frac{1}{f_{\text {in }}} \Rightarrow \underline{f_{\text {out }}=\frac{4}{5} \cdot f_{\text {in }} .} \tag{2.2}
\end{equation*}
$$

Fig. 2.9 shows what the timing diagram would look like for this example. $\varphi_{1}-\varphi_{4}$ are the phases at the input. $S E L$ is the signal that selects which phase should be passed on by the multiplexer. This could be implemented as two or four bits, but for simplicity it is here just shown as the selected phase. And finally, OUT is of course the output from the multiplexer. To avoid glitches in the output signal it is important that the switching operation between two phases is made when both of the phases are in the same logic state.


Figure 2.9: Timing diagram for a phase switcher in $\div 1.25$ mode
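The $\div 1.25$ example can be reproduced with a coarse sampled model. The sketch below is an illustration under stated assumptions: the phases are ideal square waves with 50 % duty cycle, time is quantised to a quarter of the input period, and the switch to the next phase is made after each falling edge of OUT at the first sample where both the current and the next phase are low. All names are hypothetical.

```python
def phase(k, n):
    """Level of phase k (0..3) at sample n, four samples per input period."""
    return 1 if (n - k) % 4 < 2 else 0


def phase_switcher(num_samples):
    """OUT of a multiplexer that advances one phase per output period."""
    sel = 0          # index of the phase currently passed to OUT
    pending = False  # set after a falling edge of OUT: one switch is due
    prev = 0
    out = []
    for n in range(num_samples):
        nxt = (sel + 1) % 4
        # Switch only while both phases are low, to avoid glitches.
        if pending and phase(sel, n) == 0 and phase(nxt, n) == 0:
            sel, pending = nxt, False
        value = phase(sel, n)
        if prev == 1 and value == 0:
            pending = True
        out.append(value)
        prev = value
    return out


def rising_edges(samples):
    """Sample indices at which the signal goes from 0 to 1."""
    return [n for n in range(len(samples))
            if samples[n] == 1 and (n == 0 or samples[n - 1] == 0)]


if __name__ == "__main__":
    print(rising_edges(phase_switcher(21)))  # [0, 5, 10, 15, 20]
```

The rising edges of OUT are five samples apart, i.e. $\frac{5}{4} T_{\text{in}}$, as given by eq. (2.2).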

### 2.4 Pseudo-NMOS Logic

Pseudo-NMOS is an alternative technique to standard CMOS when high-speed operation is required. The increase in speed comes at the cost of an increased power consumption. Only at very high frequencies does a circuit implemented in pseudo-NMOS consume less power than an equivalent circuit implemented in standard CMOS. At those frequencies standard CMOS is often not applicable.

The principle of pseudo-NMOS is that when CMOS (Complementary MOS) uses both PMOS and NMOS transistors to realize a logic function, pseudo-NMOS uses only NMOS transistors to realize the function and pull the output low when that is required, and one single PMOS transistor with the gate grounded to pull the output high when there is no short circuit from the output through the NMOS transistors to ground.

Fig. 2.10 and 2.11 show a standard NAND gate implemented in CMOS and pseudo-NMOS respectively. The function of a NAND gate in standard CMOS is well known; if both $A$ and $B$ are high there will be a path from ground to the output through the NMOS transistors, and at the same time both PMOS transistors will be blocking the path from the supply voltage to the output, thus the output goes low. On the other hand, if at least one of $A$ and $B$ is low there will be a path from the supply voltage to the output, whilst the NMOS path to ground is blocked, thus the output goes high. In either case there will be a well defined full-range signal at the output.


Figure 2.10: NAND gate implemented in standard CMOS


Figure 2.11: NAND gate implemented in pseudo-NMOS

The behaviour of the pseudo-NMOS gate is not very different. If at least one of the inputs is low the NMOS path to ground will be blocked, whilst the gate-grounded PMOS transistor conducts, and the output is pulled up to the level of the supply voltage. If, on the contrary, both inputs are high there will be a path between ground and the output. At the same time there will also be a path from the supply voltage to the output, since the gate of the PMOS is grounded and the transistor is hence conducting constantly. The output voltage is thus given by the ratio of the resistances from the output to ground and to the supply voltage respectively.

In a standard CMOS logic circuit each input is connected to the gate of both a PMOS and an NMOS transistor. In pseudo-NMOS it is only connected to the gate of an NMOS. This reduces the input capacitance, and thus increases the maximum speed. This increase in speed comes at the expense of a reduction in the signal swing. Pseudo-NMOS also consumes more power than CMOS for operation at low and moderate frequencies. That is because a constant, relatively high current flows from supply to ground while the NMOS transistors are conducting.
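As a rough illustration of the ratioed output level and the static current, the conducting NMOS path and the gate-grounded PMOS can be modelled as two resistors. The symbols $R_{N}$ and $R_{P}$ (effective on-resistances of the NMOS path and of the PMOS load), $V_{DD}$ and the numerical ratio below are assumptions introduced here for illustration only; they are not values from the design.

$$
\begin{equation*}
V_{\text{out,low}}=V_{DD} \cdot \frac{R_{N}}{R_{N}+R_{P}}, \qquad I_{\text{static}}=\frac{V_{DD}}{R_{N}+R_{P}}
\end{equation*}
$$

With, for example, $R_{P}=4 R_{N}$ the low level becomes $V_{DD} / 5$ rather than 0 V, which shows both the reduced swing and the current that flows whenever the output is low.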

## Chapter 3

## $\div 2 / 3$ Cells With Local Feedback

The architecture in fig. 3.1 is based on a prescaler presented in [1]. It consists of six identical $\div 2 / 3$ cells which form a 64-modulus prescaler. The output period, $T_{\text {out }}$, is given by eq. (3.1), where $T_{\text {in }}$ is the period of the input signal and $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}$ is the digital modulus control word.

$$
\begin{equation*}
T_{\text {out }}=T_{\text {in }} \cdot\left(2^{6}+M_{5} \cdot 2^{5}+M_{4} \cdot 2^{4}+M_{3} \cdot 2^{3}+M_{2} \cdot 2^{2}+M_{1} \cdot 2+M_{0}\right) \tag{3.1}
\end{equation*}
$$

Two $\div 2 / 3$ cells based on the cells used in [1] are presented; one using pseudo-NMOS latches and the other using complementary CMOS latches.
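Eq. (3.1) can also be read in the opposite direction, from a wanted division ratio to the modulus control word. The helper below is a minimal sketch of that arithmetic; the function name and the ordering of the returned list ($M_{5}$ first) are assumptions made for illustration.

```python
def control_word(ratio, bits=6):
    """Modulus control word [M5, ..., M0] for a given division ratio.

    Follows eq. (3.1): ratio = 2**bits + sum(M_i * 2**i).
    """
    base = 1 << bits
    if not base <= ratio < 2 * base:
        raise ValueError(f"ratio must be in the range {base}..{2 * base - 1}")
    offset = ratio - base
    # Most significant control bit first.
    return [(offset >> i) & 1 for i in reversed(range(bits))]


if __name__ == "__main__":
    print(control_word(64))   # [0, 0, 0, 0, 0, 0]
    print(control_word(100))  # [1, 0, 0, 1, 0, 0]
    print(control_word(127))  # [1, 1, 1, 1, 1, 1]
```

The six control bits give the 64 division ratios from 64 to 127.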

### 3.1 $\div 2 / 3$ Cells Using Pseudo-NMOS Latches

The circuit in fig. 3.2 is composed of improved biphase pseudo-NMOS latches from [4] and differential output AND gates implemented in complementary CMOS. The AND gates are standard AND gates, where the extra output is taken from the input of the inverter inside the gate. Once in every division period the last $\div 2 / 3$ cell in a chain will set its $mod_{out}$ signal high. This signal will propagate up through the chain, being re-clocked in each cell. A high $mod$ signal allows a cell to divide by three once in a period, if its modulus control signal $M$ is set high. If the modulus control signal is low, the cell will always divide by two.

The latch is shown in fig. 3.3. When the clk signal is high the bottom NMOS will conduct. If $D$ is then also high, $\bar{Q}$ is short-circuited to ground and goes low; if $D$ is low, $Q$ will be short-circuited to ground. When clk goes low $Q$ and $\bar{Q}$ will hold their values.

All the transistors in both the latch and the AND gate have a channel length $L=0.18 \mu \mathrm{~m}$. The PMOSes in the latch have a channel width $W_{P}=25 \mu \mathrm{~m}$, and the NMOSes $W_{N}=50 \mu \mathrm{~m}$. In the AND gate the channel width of the PMOSes is $W_{P}=50 \mu \mathrm{~m}$, and of the NMOSes $W_{N}=25 \mu \mathrm{~m}$.


Figure 3.1: Cascaded $\div 2 / 3$ cells with local feedback


Figure 3.2: Topology of $\div 2 / 3$ cell using pseudo-NMOS latches

### 3.2 $\div 2 / 3$ Cells Using CMOS Latches

The $\div 2 / 3$ cell in fig. 3.4 uses standard single-ended AND gates and D latches composed of NAND gates, all implemented in complementary CMOS. This $\div 2 / 3$ cell operates in the same manner as the one using pseudo-NMOS latches.

The channel length for all transistors in this $\div 2 / 3$ cell is $L=0.18 \mu \mathrm{~m}$. The width of all PMOSes is $W_{P}=50 \mu \mathrm{~m}$, and all NMOSes $W_{N}=25 \mu \mathrm{~m}$.


Figure 3.3: Implementation of an improved biphase pseudo-NMOS latch


Figure 3.4: Topology of $\div 2 / 3$ cell using CMOS latches

## Chapter 4

## High-Speed Inverter Ring Divider

Fig. 4.1 shows the architecture for a high-speed divider using an inverter ring interrupted by transmission gates [3]. The inverters and the transmission gates used here are standard complementary CMOS implementations. This divider can give out signals in five different phases, as shown in the timing diagram in fig. 4.2. The maximum input frequency for this implementation is about 5.2 GHz.


Figure 4.1: High-speed divider based on an inverter ring interrupted by transmission gates

The output signals shown in fig. 4.2 are not very well suited for phase switching, where four signals evenly spaced in phase ($0^{\circ}$, $90^{\circ}$, $180^{\circ}$ and $270^{\circ}$) are required. To produce such outputs, one could simply take two outputs which have a phase difference of $90^{\circ}$ (e.g. $V_{2}$ and $V_{4}$) and invert those. The problem with this solution is that the inverters would introduce an extra delay, which would be quite considerable at high frequencies. Therefore an extended version of the architecture is presented in fig. 4.3.

In the improved architecture of fig. 4.3, the nodes $n_{A}$ and $n_{B}$ equal the outputs $V_{5}$ and $V_{3}$ respectively from the architecture in fig. 4.1, and have a phase difference of $90^{\circ}$. The signal at $n_{A}$ is passed through transmission gate 5 when $IN$ goes high, and then inverted before it reaches output $\varphi_{1}$. The signal at the input of transmission gate 1 will be the inverse of that at node $n_{A}$, and will be passed through transmission gate 1 when $IN$ goes high, and is inverted once more before reaching output $\varphi_{3}$. All transmission gates are identical, and all inverters are identical, and will thereby have the same delays. $\varphi_{1}$ and $\varphi_{3}$ will thus change at the same time, and will (ideally) be exactly complementary. The same is the case for $\varphi_{2}$ and $\varphi_{4}$, and the wanted phase relationship between the outputs is achieved (see fig. 4.4). This implementation has been simulated successfully with a 4.5 GHz input signal.

Figure 4.2: Timing diagram for high-speed inverter ring divider

Figure 4.3: Four-phase high-speed divider based on an inverter ring interrupted by transmission gates with outputs evenly spaced in phase


Figure 4.4: Timing diagram for four-phase high-speed inverter ring divider

In an attempt to increase the speed of the divider, the CMOS inverters were substituted with pseudo-NMOS inverters. This change led to an increase in the maximum operation frequency to 4.8 GHz.

The dimensions for the transistors used in the inverter ring dividers are $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes in the pseudo-NMOS inverters, $W_{P}=75 \mu \mathrm{~m}$ for the PMOSes in the transmission gates and the CMOS inverters, $W_{N}=25 \mu \mathrm{~m}$ for all NMOSes, and $L=0.18 \mu \mathrm{~m}$ for all transistors.

The output signals of all the versions of this architecture can be seen in appendix A. All results of interest are summarized in section 7.2.

## Chapter 5

## Architecture Based on Phase-Switching

This architecture is based on a frequency divider used in Michael H. Perrott's PhD thesis from MIT [2]. The first version of the architecture is just a slight modification of the original, and can be seen in fig. 5.1.

### 5.1 High-Speed Four-Phase $\div 4$ Circuit

The high-speed $\div 4$ circuit is built up by two identical four-phase $\div 2$ circuits [5] connected in series, as shown in fig. 5.2. These $\div 2$ circuits require two complementary input signals to operate correctly, and give out four signals at $0^{\circ}, 90^{\circ}, 180^{\circ}$ and $270^{\circ}$ phase, with a duty cycle slightly exceeding $25 \%$.

At lower frequencies this circuit would fail if the outputs $\varphi_{A 2}$ and $\varphi_{A 4}$ were used directly as inputs to the next four-phase $\div 2$ circuit the way it is done here. However, it works very well in the input frequency range of interest, around 5.8 GHz. This is shown in the simulation plots in fig. 5.3 and fig. 5.4. The $\overline{I N}$ signal, which is exactly complementary to $I N$, is not shown here. Also, the $\varphi_{B 1}$ signal is shown alone, in addition to being shown together with the other outputs of the second $\div 2$ circuit, to make it easier to see its shape. All the outputs of that circuit have the same shape, and are evenly spaced in phase. As can be seen from fig. 5.3 (2.9 GHz input), the outputs of the second $\div 2$ circuit have main peaks at one fourth of the frequency of the input, which is the wanted signal. But there are also unwanted spikes, due to the delays from $\varphi_{A 2}$ going low to $\varphi_{A 4}$ going high, and from $\varphi_{A 4}$ going low to $\varphi_{A 2}$ going high. In the case of a 5.8 GHz input these delays are significantly shorter, and as can be seen from fig. 5.4 the spikes are eliminated, and only the wanted signal remains.


Figure 5.1: Architecture based on phase switching, version 1


Figure 5.2: High-speed $\div 4$ circuit


Figure 5.3: Simulation results from the four-phase $\div 4$ circuit with a 2.9 GHz input signal


Figure 5.4: Simulation results from the four-phase $\div 4$ circuit with a 5.8 GHz input signal

Fig. 5.5 shows the implementation of the $\div 2$ circuit. The PMOS transistors have a channel length of $L_{P}=0.18 \mu \mathrm{~m}$, and a width of $W_{P}=50 \mu \mathrm{~m}$. The corresponding dimensions for the NMOS transistors are $L_{N}=0.18 \mu \mathrm{~m}$ and $W_{N}=25 \mu \mathrm{~m}$.

### 5.2 Four-Phase Phase Switcher

### 5.2.1 Four-to-One Multiplexer

The multiplexer passes on signals from the input to the output according to the select signals $S_{1}-S_{4}$. If $S_{1}$ is high $\varphi_{1}$ will be passed on, if $S_{2}$ is high $\varphi_{2}$ will be passed on, and so on. If more than one select signal is high the output will be high as long as at least one of the corresponding phases is high. In normal operation the multiplexer will let through two adjacent signals, which gives an output with a $50 \%$ duty cycle. During transitions three phases will be let through simultaneously for a short period of time. The multiplexer, which is implemented in a pseudo-NMOS technique, is shown in fig. 5.6. The dimensions stated in the circuit diagram are channel widths in $\mu \mathrm{m}$. The inverters marked with a * are implemented in complementary CMOS, and have channel widths $W_{P}=75 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. The other inverters are pseudo-NMOS, and have channel widths $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.
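The logical behaviour of the multiplexer, an OR of the selected phases, can be sketched as follows. The model is idealised: each phase is taken to be high for exactly one quarter of its period (the real signals have a duty cycle slightly above 25 %), time is quantised to quarter periods, and the function names and the dictionary used for the select signals are assumptions made for illustration.

```python
def phase25(k, n):
    """Level of phase k (1..4) at quarter-period sample n, 25 % duty cycle."""
    return 1 if n % 4 == k - 1 else 0


def mux_out(select, n):
    """Output is high if any selected phase is high at sample n.

    select maps the phase number k to its select signal S_k (0 or 1).
    """
    return int(any(s and phase25(k, n) for k, s in select.items()))


if __name__ == "__main__":
    normal = {1: 1, 2: 1, 3: 0, 4: 0}      # two adjacent phases selected
    transition = {1: 1, 2: 1, 3: 1, 4: 0}  # three phases during a transition
    print([mux_out(normal, n) for n in range(8)])      # [1, 1, 0, 0, 1, 1, 0, 0]
    print([mux_out(transition, n) for n in range(4)])  # [1, 1, 1, 0]
```

With two adjacent phases selected the output is high for half of each period, which is the $50 \%$ duty cycle mentioned above.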


Figure 5.5: High-speed four-phase $\div 2$ circuit

### 5.2.2 Phase Select State Machine

The phase select state machine has two main purposes; to make sure that the right number of transitions is made in the multiplexer during each division period, and to make sure that those transitions are properly timed to avoid glitches in the output signal from the multiplexer. Fig. 5.7 illustrates the timing strategy to obtain glitch-free transitions. For simplicity the select signals are presented on one line, indicating which ones are high at any given moment.

As can be seen from the timing diagram, a change in state for a select signal will only occur when the signal of interest is low. A phase transition is done in two steps. For a transition from $\varphi_{1}$ and $\varphi_{2}$ to $\varphi_{2}$ and $\varphi_{3}$, $\varphi_{3}$ will be switched in on the rising edge of $O U T_{1}$ when $\varphi_{1}$ is high, and $\varphi_{1}$ will be switched out on the following rising edge of $\overline{O U T_{1}}$ when $\varphi_{4}$ is high. In this way glitches are avoided. To implement this state machine, registers built up of PMOS-coupled latches are utilized. These registers allow both $Q$ and $\bar{Q}$ to be high at the same time during certain falling transitions of $\overline{O U T_{1}}$.

The state machine will now switch the phase in every period of $O U T_{1}$ without inducing any glitches in the output. What it lacks, though, is the functionality needed to stop in a given state determined by the control signals $C_{0}, \overline{C_{0}}, C_{1}$ and $\overline{C_{1}}$. The solution to this is to add a selective-blocking functionality to the register used (see fig. 5.8). When $B_{Q}$ and $B_{\bar{Q}}$ are equal there will be a path from both nodes $n_{1}$ and $n_{2}$ to ground, thus the register will be able to pass on signals to both $Q$ and $\bar{Q}$. However, if $B_{Q}=1$ and $B_{\bar{Q}}=0$, $\bar{Q}$ will be blocked from going high, and consequently $Q$ will be blocked from going low. The other way round, if $B_{Q}=0$ and $B_{\bar{Q}}=1$, $Q$ will be blocked from going high, and $\bar{Q}$ will be blocked from going low. When two of these selective-blocking registers are connected in a negative feedback loop, and the control signals $C_{0}, \overline{C_{0}}, C_{1}$ and $\overline{C_{1}}$ are properly connected to the blocking inputs of the registers (fig. 5.9), the state machine will stop in the state determined by the control signals, and generate the select signals making the multiplexer accomplish the required number of phase transitions per division period.

Figure 5.6: Four-to-one multiplexer in pseudo-NMOS

Figure 5.7: Timing strategy to avoid glitches


Figure 5.8: Selective-blocking register using PMOS-coupled latches

In the selective-blocking register all the transistors on the left side (both NMOSes and PMOSes) have channel width $W=50 \mu \mathrm{~m}$, and those on the right side $W=75 \mu \mathrm{~m}$. The lengths of all transistors are $L=0.18 \mu \mathrm{~m}$.

### 5.2.3 Mapping Logic

The state of the phase select state machine goes through the following cycle: $00 \rightarrow 10 \rightarrow 11 \rightarrow 01 \rightarrow 00$, and the stop state is set by $\left\{C_{1} C_{0}\right\}$. Thus shifting $\left\{C_{1} C_{0}\right\}$ $n$ times according to this sequence will make the state machine go through $n$ states, and the multiplexer will swallow $n$ pulses of the prescaler's input signal. $n$ can be an integer from 0 to 3. The mapping logic is clocked by the output signal of the entire prescaler, $O U T_{5}$, and should therefore shift its output $\left\{C_{1} C_{0}\right\}$ by $\left\{D I V_{1} D I V_{0}\right\}$ steps on every clock to make the phase switcher swallow $\left\{D I V_{1} D I V_{0}\right\}$ pulses per division period. This functionality is easily implemented with an adder, a flip-flop and an XNOR-gate. Since it is running at a relatively low frequency it should be implemented in standard CMOS to minimize the power consumption. The mapping logic is illustrated in fig. 5.10, and the adder it contains in fig. 5.11. The flip-flops used are standard complementary CMOS implementations composed of NAND gates. All the logic gates are also standard complementary CMOS implementations.

Figure 5.9: Phase select state machine


Figure 5.10: Mapping logic
All PMOSes in the mapping logic have $W_{P}=50 \mu \mathrm{~m}$, except for those in the inverters which have $W_{P}=75 \mu \mathrm{~m}$. The channel widths of all NMOSes are $W_{N}=25 \mu \mathrm{~m}$, and the lengths of all the transistors are $L=0.18 \mu \mathrm{~m}$.

### 5.3 $\div 2 / 3$ Stages

### 5.3.1 $\div 2 / 3$ Core

The implementation of the first three $\div 2 / 3$ cores is shown in fig. 5.12. The four-phase $\div 2$ circuit is the same one as used in the input divider.


Figure 5.11: Two-bits adder


Figure 5.12: $\div 2 / 3$ core

## Two-to-One Multiplexer

The multiplexer includes a NOR functionality that turns the four phases on the input into two complementary signals: $\varphi_{A}=\overline{\varphi_{1}+\varphi_{2}}$ and $\varphi_{B}=\overline{\varphi_{3}+\varphi_{4}}$. The multiplexer consists of two separate, identical circuits implemented in pseudo-NMOS, one for the $OUT$ output and one for the $\overline{OUT}$ output. The one for the $OUT$ output is shown in fig. 5.13. The control signals are self-explanatory: when $\varphi_{A \rightarrow OUT}$ is high, $\varphi_{A}$ is directed to $OUT$, and so on.

The dimensions for the multiplexer are $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.
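As an illustration of the NOR functionality, the following sketch evaluates $\varphi_{A}$ and $\varphi_{B}$ for an idealized four-phase input where exactly one phase is high in each quarter period (25 % duty cycle). The single boolean `select_a` is a simplification of the actual control signals, and the function names are hypothetical.

```python
def nor(a: bool, b: bool) -> bool:
    """Two-input NOR."""
    return not (a or b)


def two_to_one_mux(phases: tuple, select_a: bool) -> bool:
    """Return OUT for one set of phase values (phi1, phi2, phi3, phi4).

    select_a=True directs phi_A to OUT, otherwise phi_B is used.
    """
    phi1, phi2, phi3, phi4 = phases
    phi_a = nor(phi1, phi2)  # phi_A = NOT(phi1 + phi2)
    phi_b = nor(phi3, phi4)  # phi_B = NOT(phi3 + phi4)
    return phi_a if select_a else phi_b


# Idealized input: one phase high per quarter period (assumption).
ONE_HOT_PHASES = [
    (True, False, False, False),
    (False, True, False, False),
    (False, False, True, False),
    (False, False, False, True),
]

out_a = [two_to_one_mux(p, select_a=True) for p in ONE_HOT_PHASES]
out_b = [two_to_one_mux(p, select_a=False) for p in ONE_HOT_PHASES]

if __name__ == "__main__":
    print("phi_A:", out_a)
    print("phi_B:", out_b)
```

Under this idealization $\varphi_{A}$ and $\varphi_{B}$ are exact complements with 50 % duty cycle, which is why switching between them shifts the output by half a period.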


Figure 5.13: The part of the multiplexer generating the $O U T$ signal

## Phase Select State Machine

The phase select state machine is illustrated in fig. 5.14. The topology of the latches used is shown in fig. 5.15. The inverter is implemented in pseudo-NMOS. Note that the implementation of these PMOS-coupled latches allows the outputs $Q$ and $\bar{Q}$ to be high at the same time under certain conditions [2]. Also note that the clocking signals $OUT$ and $\overline{OUT}$ are not complementary. To understand the operation of the state machine, see the timing diagram in fig. 5.16.

The dimensions for the transistors used in the latches and the inverter are $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.

### 5.3.2 Control Qualifier

Figure 5.14: Phase select state machine used in $\div 2 / 3$ cores


Figure 5.15: PMOS-coupled latch


Figure 5.16: Timing diagram for the phase select state machine

The control qualifier passes on the control signal $C$ from the mapping logic to the phase select state machine when the $\overline{OUT}$ signal from the stage it belongs to is high and the $OUT$ signals of all the following stages are low. The circuit that implements this functionality for the first $\div 2 / 3$ stage is shown in fig. 5.17. For the next stages the input $\overline{OUT_{2}}$ is replaced by the respective stage's $\overline{OUT}$ signal, and the NMOSes with $OUT_{3}$ and $OUT_{4}$ are removed one by one for each stage, such that only the $OUT$ signals from the stages later in the chain are used as input signals.

The dimensions for the transistors used in the control qualifier are $W_{P}=$ $50 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.
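The enabling condition of the control qualifier can be summarized in a short boolean sketch. The stage numbering 2 to 5 and the inclusion of $OUT_{5}$ among the inputs of the first stage are assumptions made for illustration, as are the function names; what the circuit does with $C$ when the condition is false is not modelled.

```python
def qualifier_enabled(own_out_bar: bool, later_outs: list[bool]) -> bool:
    """True when the stage's own OUT-bar is high and all later OUTs are low."""
    return own_out_bar and not any(later_outs)


def qualifier_inputs(stage: int, last_stage: int = 5) -> list[str]:
    """Names of the signals used by the qualifier of a given stage.

    Assumes stages are numbered 2..last_stage (hypothetical numbering).
    """
    own = f"OUT{stage}_bar"
    later = [f"OUT{j}" for j in range(stage + 1, last_stage + 1)]
    return [own] + later


if __name__ == "__main__":
    for stage in range(2, 6):
        print(stage, qualifier_inputs(stage))
    print(qualifier_enabled(True, [False, False, False]))
```

Each later stage drops one $OUT$ input, so the qualifier of the last stage in this numbering depends on its own $\overline{OUT}$ signal only.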

### 5.3.3 Mapping Logic

The mapping logic maps the modulus control signals into the control signals required for the phase select state machine to generate the correct select signals for the multiplexer, so that the phase switching is carried out correctly. The circuit that implements this is shown in fig. 5.18. If the modulus control signal $M$ is high on the rising edge of $\mathrm{OUT}_{5}$, the control signal $C$ will be inverted. In the other case, when $M$ is low, then $C$ will hold its current state.

The flip-flop is a standard CMOS D-flip-flop composed of NAND gates.


Figure 5.17: Control qualifier for the first $\div 2 / 3$ stage

The XOR gate is also a standard complementary CMOS implementation. The dimensions for the transistors used in the flip-flop and the XOR gate are $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.
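The behaviour of this mapping logic, a D-flip-flop whose input is the XOR of $M$ and the current $C$, can be sketched as follows. Each list element represents the value of $M$ at one rising edge of $OUT_{5}$; the function names and the initial value of $C$ are assumptions.

```python
def next_control(c: bool, m: bool) -> bool:
    """Value of C after a rising edge of OUT5: inverted if M is high."""
    return c ^ m


def simulate(m_sequence: list[bool], c_initial: bool = False) -> list[bool]:
    """Return C after each rising edge of OUT5 for a sequence of M values."""
    c = c_initial
    trace = []
    for m in m_sequence:
        c = next_control(c, m)
        trace.append(c)
    return trace


if __name__ == "__main__":
    # M held high: C toggles every output period.
    print(simulate([True, True, True, True]))
    # M held low: C keeps its state.
    print(simulate([False, False, False, False]))
```

With $M$ held high, $C$ toggles once per output period, which corresponds to one phase switch in the stage per division period; with $M$ held low, no phase switching takes place.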


Figure 5.18: Mapping logic used in the $\div 2 / 3$ stages

### 5.3.4 The Last $\div 2 / 3$ Stage

The last $\div 2 / 3$ core is similar to the previous ones, except that the phase select state machine is implemented in standard CMOS. The architecture of the state machine is the same as for those used in the other $\div 2 / 3$ stages, but the latches and the inverter are implemented in complementary CMOS. The channel widths for the PMOSes used in the latches are $W_{P}=50 \mu \mathrm{~m}$, and for the PMOS in the inverter $W_{P}=75 \mu \mathrm{~m}$. All NMOSes have $W_{N}=25 \mu \mathrm{~m}$. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.

The phase select state machine only needs a single-ended signal from the control qualifier. The circuit for the control qualifier in the last stage is shown in fig. 5.19. The dimensions for the transistors used in the control qualifier are $W_{P}=50 \mu \mathrm{~m}$ for the PMOSes, and $W_{N}=25 \mu \mathrm{~m}$ for the NMOSes. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.


Figure 5.19: Control qualifier for the last $\div 2 / 3$ stage
The mapping logic is the same as used in the previous stages, but since only a single-ended signal is required only one output is being used.

### 5.4 Version $2-\div 2 / 3$ Cells With Local Feedback

Fig. 5.20 shows version 2 of the architecture. The high-speed $\div 4$ circuit and the four-phase phase switching stage are the exact same as in the first version, while the $\div 2 / 3$ stages are replaced by $\div 2 / 3$ cells with local feedback.

Figure 5.20: Architecture based on phase switching, version 2

The circuit in fig. 5.21 is a $\div 2 / 3$ cell which allows $\div 3$ operation only when the signal $timer_{in}$ is low (which happens only when the $OUT$ signals from all the cells after it in the chain are low), and it sets $timer_{out}$ low when both $timer_{in}$ and its own $OUT$ signal are low. These $\div 2 / 3$ cells divide by two when the $M$ input is high, and by three when it is low. By cascading cells like this, multi-modulus functionality is achieved without the need for long feedback loops. Note that the $timer_{in}$ input of the last cell should be connected to ground. The flip-flops and the logic gates are standard complementary CMOS implementations.

The channel widths for all PMOSes, except the one in the inverter, are $W_{P}=50 \mu \mathrm{~m}$, and for the PMOS in the inverter $W_{P}=75 \mu \mathrm{~m}$. All NMOSes have $W_{N}=25 \mu \mathrm{~m}$. All transistors have channel lengths $L=0.18 \mu \mathrm{~m}$.


Figure 5.21: $\div 2 / 3$ cell with local feedback

### 5.5 Version 3 - Four-to-One Multiplexer Implemented in CMOS

In this version of the architecture the four-to-one multiplexer is implemented in complementary CMOS. The output of the multiplexer is now single-ended. The architecture is shown in fig. 5.22.

The high-speed $\div 4$ circuit, and the phase select state machine and the mapping logic in the four-phase phase switching stage, are still the same as in the previous two versions of the architecture. The $\div 2 / 3$ cells utilizing local feedback are the same as in version 2.

Figure 5.22: Architecture based on phase switching, version 3

To implement the multiplexer, each input phase is AND-ed with its corresponding select signal, and the outputs of the four AND gates are connected to a four-input OR gate. This multiplexer thus performs the same operation as the one used in the previous versions. It will probably not work properly at frequencies as high as the original one, but it works at the frequency of interest, and consumes less current at that frequency. The new multiplexer topology can be seen in fig. 5.23.
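The AND-OR structure of this multiplexer corresponds to the following logic-level sketch. It only illustrates the boolean function; timing, glitches during phase switching and the single-ended output driver are not modelled, and the function name is hypothetical.

```python
def four_to_one_mux(phases: list[bool], selects: list[bool]) -> bool:
    """AND each phase with its select signal, then OR the four results."""
    if len(phases) != 4 or len(selects) != 4:
        raise ValueError("expected four phases and four select signals")
    and_outputs = [p and s for p, s in zip(phases, selects)]
    return any(and_outputs)  # four-input OR


if __name__ == "__main__":
    phases = [False, True, False, False]   # only the second phase is high
    for k in range(4):
        selects = [i == k for i in range(4)]  # one-hot select
        print(k, four_to_one_mux(phases, selects))
```

With a one-hot select word the output simply follows the selected phase, which is the same function as the pseudo-NMOS multiplexer used in versions 1 and 2.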

All PMOSes used in this multiplexer have $W_{P}=50 \mu \mathrm{~m}$, and all NMOSes $W_{N}=25 \mu \mathrm{~m}$, except for the one in the output inverter of the OR gate which is $75 \mu \mathrm{~m}$. The channel lengths of all the transistors are $L=0.18 \mu \mathrm{~m}$.


Figure 5.23: Four-to-one multiplexer implemented in complementary CMOS

### 5.6 Version 4 - Alternative Local Feedback $\div 2 / 3$ Cell

In the final version of the architecture the $\div 2 / 3$ cells are replaced with a CMOS version of the $\div 2 / 3$ cells used in [1]. These also use local feedback between the cells. The high-speed $\div 4$ circuit, and the phase select state machine and the mapping logic in the four-phase phase switching stage, are the same as in all the other versions of the architecture. The four-to-one multiplexer is implemented in complementary CMOS, and is the same one as used in version 3 of the architecture. The final architecture is shown in fig. 5.25.

The $\div 2 / 3$ cells used here are the same as the one presented in section 3.2. The topology (fig. 5.24) and the transistor dimensions are repeated for convenience. The channel length for all transistors in this $\div 2 / 3$ cell is $L=0.18 \mu \mathrm{~m}$. The width of all PMOSes is $W_{P}=50 \mu \mathrm{~m}$, and all NMOSes $W_{N}=25 \mu \mathrm{~m}$.


Figure 5.24: Topology of $\div 2 / 3$ cell using CMOS latches


Figure 5.25: Architecture based on phase switching, version 4

## Chapter 6

## Simulations

All simulations are performed with a complementary pair of square-wave input signals applied, having a voltage swing from 0 to 1.8 V. The rise/fall time for these signals is 10 ps for all simulations. The period varies for the different simulations. The supply voltage is always 1.8 V.

### 6.1 $\div 2 / 3$ Cells With Local Feedback

To find out whether either of the two presented $\div 2 / 3$ cells is suitable for use in the multi-modulus prescaler architecture shown in fig. 3.1, they are first simulated to find the maximum operation frequency when they are running isolated from other circuitry. If the results from these simulations are positive, the entire circuit should be simulated.

Both versions of the $\div 2 / 3$ cell are simulated repeatedly with gradually increasing frequency to find the highest frequency where they operate properly in both $\div 2$ and $\div 3$ mode. The rms current consumption is also measured. The results can be found in section 7.1, and relevant simulation plots in appendix A.

### 6.2 High-Speed Inverter Ring Divider

The simulations that are presented for this architecture are those for the maximum operation frequencies, for the lowest frequencies where proper operation was not achieved, and for 1 GHz (to get a fair comparison of the current consumption between the different versions).

### 6.3 Architectures Based on Phase Switching

All simulations on these architectures are performed with input signals having a period of $T_{IN}=172.4 \mathrm{ps}$ ($\approx 5.8 \mathrm{GHz}$). The inputs of each bit of the modulus control word, $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}$, are either 0 V (logical 0) or 1.8 V (logical 1). The test bench set-up is shown in fig. 6.1.


Figure 6.1: Test bench set-up

The modulus control word is binary weighted, and the period of the output signal is given by:

$$
\begin{equation*}
T_{\text {OUT }}=\left(64+32 M_{5}+16 M_{4}+8 M_{3}+4 M_{2}+2 M_{1}+M_{0}\right) \cdot T_{I N} \tag{6.1}
\end{equation*}
$$

To verify the operation of the circuits, they should ideally have been tested for every single modulus. Because the simulations are very time-consuming, this has not been done. Instead the circuits are tested for a few chosen moduli, meant to cover the most critical operations. As long as the circuits operate properly for these, it is very likely that they will also operate properly for the other possible moduli. A full test could be a topic for further work.

The chosen moduli are:

- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{000000\}(\div 64)$
- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{000100\}(\div 68)$
- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{010011\}(\div 83)$
- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{101110\}(\div 110)$
- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{111001\}(\div 121)$
- $\left\{M_{5} M_{4} M_{3} M_{2} M_{1} M_{0}\right\}=\{111111\}(\div 127)$
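The moduli and the corresponding intended output periods follow directly from equation (6.1). The sketch below evaluates the equation for the six chosen control words with $T_{IN}=172.4$ ps; the function names are hypothetical and the control word is written as a string $M_{5} \ldots M_{0}$.

```python
T_IN_PS = 172.4  # input period in ps, as used in the simulations

CHOSEN_WORDS = ["000000", "000100", "010011", "101110", "111001", "111111"]


def modulus(word: str) -> int:
    """Division modulus for a control word M5..M0, following eq. (6.1)."""
    if len(word) != 6 or set(word) - {"0", "1"}:
        raise ValueError("control word must be six binary digits")
    return 64 + int(word, 2)  # binary-weighted sum of M5..M0 added to 64


def output_period_ns(word: str, t_in_ps: float = T_IN_PS) -> float:
    """Intended output period in ns for a given control word."""
    return modulus(word) * t_in_ps / 1000.0


if __name__ == "__main__":
    for w in CHOSEN_WORDS:
        print(f"{w}  div {modulus(w):3d}  T_OUT = {output_period_ns(w):.4f} ns")
```

The control word 000000 gives the minimum modulus of 64 and 111111 the maximum of 127, so the chosen words cover both ends of the range as well as four intermediate values.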

The architectures in chapter 5 are simulated on this test bench, and the output periods and rms current consumptions of these are measured for the given moduli. The results from the simulations can be found in section 7.3, and the simulation plots in appendix A.

## Chapter 7

## Results

### 7.1 $\div 2 / 3$ Cells With Local Feedback

Tables 7.1 and 7.2 give the current consumption at 1.6 GHz and at the maximum operation frequency for the $\div 2 / 3$ cell using pseudo-NMOS latches and the one using CMOS latches, respectively.

In $\div 2$ mode the $\div 2 / 3$ cell using CMOS latches does not produce a good $mod_{out}$ signal at the maximum operation frequency. However, the $out$ signal is correct, and the $mod_{out}$ signal is not needed if the cell is used as the first in a chain. At 1.6 GHz the $mod_{out}$ signal is correct. The fact that the circuit consumes less current in $\div 2$ mode at 2.3 GHz than at 1.6 GHz is related to this.

Table 7.1: Current consumption for $\div 2 / 3$ cell using pseudo-NMOS latches

|  | $\div 2$ mode | $\div 3$ mode | Average |
| :--- | :---: | :---: | :---: |
| 1.6 GHz | 17.048 mA | 16.733 mA | 16.891 mA |
| $2.0 \mathrm{GHz}\left(f_{\text {max }}\right)$ | 21.351 mA | 21.629 mA | 21.490 mA |

Table 7.2: Current consumption for $\div 2 / 3$ cell using CMOS latches

|  | $\div 2$ mode | $\div 3$ mode | Average |
| :--- | :---: | :---: | :---: |
| 1.6 GHz | 13.754 mA | 15.301 mA | 14.528 mA |
| $2.3 \mathrm{GHz}\left(f_{\text {max }}\right)$ | 13.101 mA | 16.836 mA | 14.969 mA |

The plots showing the simulations of the circuits in $\div 2$ and $\div 3$ mode at their maximum operation frequencies are shown in figs. A.1 to A.4.

### 7.2 High-Speed Inverter Ring Divider

Table 7.3 summarizes the results of interest from the simulations of the different versions of the high-speed inverter ring divider. The current consumptions given are the rms values measured over three output periods. The three versions of the architecture are defined in the list below.

Version 1: The initial architecture, implemented in complementary CMOS (fig. 4.1)

Version 2: Extension of version 1, giving out four phases that are evenly spaced in phase (fig. 4.3)

Version 3: Same as version 2, but implemented with pseudo-NMOS inverters

Table 7.3: Simulation results for high-speed inverter ring divider

|  | Maximum operation frequency | Current @ $f_{\max}$ | Current @ 1 GHz |
| :--- | :---: | ---: | ---: |
| Version 1 | $5.2 \mathrm{GHz}(192.3 \mathrm{ps})$ | 8.994 mA | 4.001 mA |
| Version 2 | $4.5 \mathrm{GHz}(222.2 \mathrm{ps})$ | 12.663 mA | 5.792 mA |
| Version 3 | $4.8 \mathrm{GHz}(208.3 \mathrm{ps})$ | 54.585 mA | 48.800 mA |

The plots showing each version of the circuit at its maximum operation frequency are shown in figs. A.5 to A.7.

### 7.3 Architectures Based on Phase Switching

Table 7.4 summarizes the simulation results for version 1 of the architecture. The values given for the current consumption are the root-mean-square values measured over two output periods. The periods measured from the simulations are the average of two consecutive output periods. All deviations are $\ll T_{in}=172.4 \mathrm{ps}$. The circuit functions properly for all tested moduli. The estimated area[^0] of version 1 of the architecture is $0.045 \mathrm{~mm}^{2}$.

Table 7.4: Simulation results for version 1 of the phase switching based architecture

| Modulus (binary) | Modulus (decimal) | Current consumption | Simulated period | Intended period | Absolute deviation | Relative deviation |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |

Table 7.5: Simulation results for version 2 of the phase switching based architecture

| Modulus (binary) | Modulus (decimal) | Current consumption | Simulated period | Intended period | Absolute deviation | Relative deviation |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |

Table 7.5 summarizes the simulation results for version 2 of the architecture. The values given for the current consumption are the root-mean-square values measured over two output periods. The periods measured from the simulations are the average of two consecutive output periods. All deviations are $\ll T_{in}=172.4 \mathrm{ps}$. The circuit functions properly for all tested moduli. The estimated area of version 2 of the architecture is $0.039 \mathrm{~mm}^{2}$.

Table 7.6: Simulation results for version 3 of the phase switching based architecture

| Modulus (binary) | Modulus (decimal) | Current consumption | Simulated period | Intended period | Absolute deviation | Relative deviation |
| :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| 000000 | $(\div 64)$ | 78.397 mA | 11.033525 ns | 11.0336 ns | -0.075 ps | -6.80 ppm |
| 000100 | $(\div 68)$ | 77.836 mA | 11.723225 ns | 11.7232 ns | +0.025 ps | +2.13 ppm |
| 010011 | $(\div 83)$ | 77.198 mA | 14.311595 ns | 14.3092 ns | +2.395 ps | +167.37 ppm |
| 101110 | $(\div 110)$ | 76.759 mA | 18.963949 ns | 18.9640 ns | -0.051 ps | -2.69 ppm |
| 111001 | $(\div 121)$ | 76.778 mA | 20.857078 ns | 20.8604 ns | -3.322 ps | -159.25 ppm |
| 111111 | $(\div 127)$ | 76.549 mA | 21.897113 ns | 21.8948 ns | +2.313 ps | +105.64 ppm |

Table 7.6 summarizes the simulation results for version 3 of the architecture. The values given for the current consumption are the root-mean-square values measured over ten output periods. The periods measured from the simulations are the average of ten consecutive output periods. All deviations are $\ll T_{in}=172.4 \mathrm{ps}$. The circuit functions properly for all tested moduli. The estimated area of version 3 of the architecture is $0.038 \mathrm{~mm}^{2}$.
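The deviation columns in these tables follow from the simulated period and the intended period $N \cdot T_{IN}$. The sketch below shows the calculation for two rows of Table 7.6; the function name is hypothetical, and the values are taken from the table.

```python
T_IN_PS = 172.4  # input period in ps


def deviation(simulated_ns: float, modulus: int, t_in_ps: float = T_IN_PS):
    """Return (absolute deviation in ps, relative deviation in ppm)."""
    intended_ps = modulus * t_in_ps
    absolute_ps = simulated_ns * 1000.0 - intended_ps
    relative_ppm = absolute_ps / intended_ps * 1e6
    return absolute_ps, relative_ppm


if __name__ == "__main__":
    # (simulated period in ns, modulus) for two rows of Table 7.6
    for simulated_ns, n in [(14.311595, 83), (20.857078, 121)]:
        abs_ps, rel_ppm = deviation(simulated_ns, n)
        print(f"div {n}: {abs_ps:+.3f} ps, {rel_ppm:+.2f} ppm")
```

Even the largest deviations in the tables are a few picoseconds, roughly two percent of one input period, which supports the statement that all deviations are $\ll T_{in}$.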

Table 7.7: Simulation results for version 4 of the phase switching based architecture

| Modulus (binary) | Modulus (decimal) | Current consumption | Simulated period | Intended period | Absolute deviation | Relative deviation |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 000000 | $(\div 64)$ | 75.619 mA | 11.033596 ns | 11.0336 ns | -0.004 ps | -0.36 ppm |
| 000100 | $(\div 68)$ | 75.091 mA | 11.723189 ns | 11.7232 ns | -0.011 ps | -0.94 ppm |
| 010011 | $(\div 83)$ | 76.120 mA | 14.309968 ns | 14.3092 ns | +0.768 ps | +53.67 ppm |
| 101110 | $(\div 110)$ | 74.597 mA | 18.964049 ns | 18.9640 ns | +0.049 ps | +2.58 ppm |
| 111001 | $(\div 121)$ | 74.978 mA | 20.858872 ns | 20.8604 ns | -1.528 ps | -73.25 ppm |
| 111111 | $(\div 127)$ | 74.661 mA | 21.893371 ns | 21.8948 ns | -1.429 ps | -65.27 ppm |

Table 7.7 summarizes the simulation results for version 4 of the architecture. The values given for the current consumption are the root-mean-square values measured over ten output periods. The periods measured from the simulations are the average of ten consecutive output periods. All deviations are $\ll T_{in}=172.4 \mathrm{ps}$. The circuit functions properly for all tested moduli. The estimated area of version 4 of the architecture is $0.036 \mathrm{~mm}^{2}$.

The plots showing each version of the architecture in $\div 127$ mode are shown in figs. A.8 to A.11.

## Chapter 8

## Discussion

### 8.1 Choice of Architecture

### 8.1.1 High-Speed Input Divider

Different architectural approaches were tested. An architecture based on $\div 2 / 3$ cells with local feedback is presented in chapter 3. Two versions of the $\div 2 / 3$ cell were designed: one using biphase pseudo-NMOS latches, the other using standard complementary CMOS latches. Unfortunately, neither of them was fast enough to work at the wanted input frequency of the prescaler, 5.8 GHz.

Another approach to a high-speed divider for the input of the prescaler is the inverter ring divider presented in chapter 4. The initial circuit that was tested is a slightly modified version of the one presented in [3]. This $\div 4$ circuit was implemented in complementary CMOS, and achieved proper operation at 5.2 GHz. However, this circuit does not give out signals in the phases required to be used as inputs to a phase switching stage. A small adjustment was made to generate the wanted phases. This modified circuit generated output signals in four evenly spaced phases at input frequencies up to 4.5 GHz. Using pseudo-NMOS inverters instead of the CMOS inverters initially used, it generated the wanted phases at input frequencies up to 4.8 GHz. The measured current consumptions (see table 7.3) show that the current consumption of the CMOS version increases considerably with frequency, while the current consumption of the pseudo-NMOS version depends less on frequency, as expected. Even at frequencies as high as 4.5 to 4.8 GHz it is clear that the pseudo-NMOS version consumes a lot more current than the CMOS version.

Section 5.1 presents a high-speed divider that works at the required frequency. It is composed of two identical $\div 2$ circuits, which take in two complementary input signals and give out four signals evenly spaced in phase with about $25 \%$ duty cycle. In an input frequency range around 5.8 GHz, two outputs from the first divider are able to drive the second one directly, even though these outputs are not exactly complementary. At lower frequencies this configuration causes unwanted spikes at the outputs of the second divider. This is explained in more detail in section 5.1.

The last of the discussed high-speed dividers was a natural choice, as it was the only one able to operate at the required frequency. Using pseudo-NMOS inverters in the initial version of the inverter ring divider, in addition to some further optimization, could have made that one run at 5.8 GHz. However, it would still not give out the needed phases, and could not easily have been used before a phase switching stage.

### 8.1.2 Phase Switching Stage

Since the input divider can only divide by a single modulus, a phase switching stage is a good way to achieve a programmable output signal with a resolution of one period of the input signal.

The phase switching stage is based on the architecture in [2]. It consists of a four-to-one multiplexer, a phase select state machine and a mapping logic circuit. The mapping logic circuitry is clocked by the final output signal of the prescaler, and thus operates at such a low frequency that implementing it in CMOS was a natural choice, with the current consumption in mind.

The phase select state machine and the multiplexer were initially implemented in pseudo-NMOS. An attempt was made to implement the phase select state machine in complementary CMOS, but it failed. The multiplexer, on the other hand, was easily implemented in complementary CMOS. Simulations showed that implementing the multiplexer in CMOS reduced the current consumption significantly.

### 8.1.3 Low Frequency Stage

The first attempt to implement the low frequency stage was to use the phase switching architecture from [2]. This was implemented in pseudo-NMOS and contributed considerably to the total current consumption. Converting these circuits to complementary CMOS might have been worth the effort, and would undoubtedly have reduced the current consumption, since they are running at relatively low frequencies.

Two chains of four $\div 2 / 3$ cells with local feedback were also tried. The two types of $\div 2 / 3$ cells are presented in sections 5.4 and 5.6. The latter, based on [1], consumes a little less current, and was therefore chosen.

### 8.2 Implementation

To summarize, the final architecture is composed of the four-phase high-speed input divider presented in section 5.1, the phase select state machine and the mapping logic presented in section 5.2, the four-to-one CMOS multiplexer presented in section 5.5, and four of the $\div 2 / 3$ cells presented in section 3.2.

The entire prescaler is implemented using RF transistor models. These have a minimum channel length of $0.18 \mu \mathrm{~m}$, which is used for all the transistors. The minimum channel width is quite large for these transistor models, $25 \mu \mathrm{~m}$. Using other transistor models, allowing smaller channel widths, for the parts of the circuit that do not operate at the maximum frequency would most likely reduce the current consumption considerably.

In addition to the circuits that have been tested, a complementary CMOS implementation of the low frequency phase switching stages should also have been tried, and more effort should have been put into converting the phase select state machine in the first phase switching stage to complementary CMOS. Those changes might have improved the prescaler.

### 8.3 Simulations

The architectures in chapter 5 are simulated for only six different moduli, because the simulations are very time-consuming. The moduli for which they are simulated are, however, chosen in such a way that they will most likely reveal any errors in functionality.

The simulations were done with a differential square-wave rail-to-rail input signal applied, having a rise/fall time of 10 ps. Such a signal is unlikely to be available in a real circuit, and generating it would be hard. Another weakness of the simulations is that parasitic capacitances are not included. So, even though the circuit operates properly in the simulations, it would need further improvements before it could be manufactured.

## Chapter 9

## Conclusion

A multi-modulus prescaler, able to divide by any integer modulus in the range 64 to 127, has been designed in a $0.18 \mu \mathrm{~m}$ CMOS process, and works properly for a 5.8 GHz input signal, according to simulations.

The final architecture is composed of a four-phase high-speed input divider, a phase switching stage consisting of a four-to-one multiplexer, a phase select state machine and a mapping logic circuit, and four cascaded $\div 2 / 3$ cells with local feedback. The high-speed input divider is implemented in pseudo-NMOS to achieve the required speed. The phase select state machine is also implemented in pseudo-NMOS. The rest of the circuit is implemented in complementary CMOS to minimize current consumption. Complementary CMOS implementations have turned out to consume less current than pseudo-NMOS implementations at the frequencies of interest. Pseudo-NMOS is, however, a little faster, and is used only where a properly working CMOS implementation could not be achieved.

The prescaler consumes more current than intended. In the design an RF transistor model is used, with a minimum channel width of $25 \mu \mathrm{~m}$. By using other transistor models, allowing smaller channel widths, the current consumption would probably have been reduced considerably. Another item for further work could be to try to implement the phase select state machine in complementary CMOS. This circuit runs at one fourth of the input frequency, and a CMOS implementation of it would probably also contribute to lowering the total current consumption.

## Bibliography

[1] Cicero S. Vaucher et al. A Family of Low-Power Truly Modular Programmable Dividers in Standard $0.35 \mu \mathrm{~m}$ CMOS Technology. IEEE Journal of Solid-State Circuits, 35(7), July 2000.
[2] Michael H. Perrott. Techniques for high data rate modulation and low power operation of fractional- $N$ synthesizers. PhD thesis, Massachusetts Institute of Technology, Sep. 1997.
[3] Carlos E. Saavedra. A Microwave Frequency Divider Using an Inverter Ring and Transmission Gates. IEEE Microwave and Wireless Components Letters, 15(5), May 2005.
[4] A. Mason. Lecture 30: Latches and Flip Flops, ECE 813 Advanced VLSI Design, Michigan State University College of Engineering. http://www.egr.msu.edu/classes/ece813/mason/files/Lecture30.pdf.
[5] B. Razavi, K. F. Lee and R. H. Yan. Design of High-Speed, Low-Power Frequency Dividers and Phase-Locked Loops in Deep Submicron CMOS. IEEE Journal of Solid-State Circuits, 30(2):101-109, Feb. 1995.

## Appendix A

## Simulation Plots



Figure A.1: Simulation plot for $\div 2 / 3$ cell with pseudo-NMOS latches at maximum operation frequency in $\div 2$ mode


Figure A.2: Simulation plot for $\div 2 / 3$ cell with pseudo-NMOS latches at maximum operation frequency in $\div 3$ mode


Figure A.3: Simulation plot for $\div 2 / 3$ cell with CMOS latches at maximum operation frequency in $\div 2$ mode


Figure A.4: Simulation plot for $\div 2 / 3$ cell with CMOS latches at maximum operation frequency in $\div 3$ mode


Figure A.5: Simulation plot for version 1 of the high-speed inverter ring divider at maximum operation frequency


Figure A.6: Simulation plot for version 2 of the high-speed inverter ring divider at maximum operation frequency


Figure A.7: Simulation plot for version 3 of the high-speed inverter ring divider at maximum operation frequency


Figure A.8: Version 1 of the phase switching based architecture in $\div 127$ mode


Figure A.9: Version 2 of the phase switching based architecture in $\div 127$ mode


Figure A.10: Version 3 of the phase switching based architecture in $\div 127$ mode


Figure A.11: Version 4 of the phase switching based architecture in $\div 127$ mode


[^0]: The estimates for the circuit area for the different versions of this architecture are based on the number of transistors used and their dimensions, and the layout rules for the process used. The process information is confidential, and therefore the calculations cannot be shown.

