

Harald Sæther

A Sub-100mV Supply Voltage
Standard-Cell Based Memory in 22nm FD-SOI

June 2022

## NTNU

Norwegian University of Science and Technology

## A Sub-100mV Supply Voltage Standard-Cell Based Memory in 22nm FD-SOI

## Harald Sæther

Electronic Systems Design<br>Submission date: June 2022<br>Supervisor: Snorre Aunet<br>Co-supervisor: Trond Ytterdal

## Abstract

The desire to reduce power consumption has motivated the design of integrated circuits that operate in the sub-threshold domain. Circuits operating at sub-threshold supply voltages need robust architectures and techniques that can withstand effects that are pronounced in this domain, such as an increased sensitivity to process variation and a diminished on-to-off current ratio, both of which can impair circuit functionality. Many applications, such as energy harvesting in mm-scale nodes and biomedical devices, call for reliable and efficient integrated circuits with as low a supply voltage as possible. Currently, inefficient charge pumps or voltage converters must be used to convert the output voltage of an energy harvester into the higher voltage on which the rest of the circuit runs. This costly conversion can be reduced or avoided if the rest of the circuit runs at as low a supply voltage as possible. Memory is common in complex digital circuits such as mm-scale nodes and biomedical implants and occupies a large share of the circuit area, so it is a component where large power savings can be gained by designing for the lowest possible supply voltage. In this thesis, such a memory is designed for operation at a sub-100 mV supply voltage, with robust techniques applied from the transistor level up to the architectural level of abstraction. Several classic static logic gates are benchmarked in terms of power, performance and layout area against so-called Schmitt-trigger structures, which have been proven to operate at very low supply voltages. From standard cells (NAND, NOR and NOT logic gates), a simple standard-cell based memory (SCM) is constructed, which includes structures such as multiplexers, decoders, pre-decoders, clock gates and data flip-flops (DFFs). The constructed memory is then simulated and modelled to verify functional chip yield at a supply voltage of 87 mV.

The complete SCM is 1024 bits in size, with 128 addresses storing 8 bits of data each. It shows good functional chip yield, with the lower bound of yield above $90 \%$ at a maximum redundancy of 4. The operating frequency is 150 Hz, with an average power consumption of 6.991 nW. The total layout area of the SCM is $338741.1 \mu \mathrm{~m}^{2}$.

## Sammendrag (Summary)

The desire to reduce power consumption has motivated the design of integrated circuits that operate in the sub-threshold domain. Circuits operating at sub-threshold supply voltages need robust architectures and techniques that tolerate effects which are more noticeable in the sub-threshold domain, such as increased sensitivity to process variation and a reduced on-to-off current ratio, which can negatively affect circuit functionality. Many applications, such as energy harvesting in mm-scale nodes and biomedical equipment, call for reliable and efficient integrated circuits with as low a supply voltage as possible. At present, charge pumps or voltage converters with poor efficiency must be used to convert the output voltage of an energy harvester to a higher voltage level used by the rest of the circuit. This costly conversion can be reduced or avoided if the rest of the circuit uses as low a supply voltage as possible. Memory is often used in complex digital circuits such as mm-scale nodes and biomedical implants. Memory usually occupies a large amount of circuit area and is therefore a component where large power savings can be achieved if the memory is designed for ultra-low supply voltage. In this thesis, exactly such a memory is designed for operation at sub-100 mV supply voltage, with robust techniques applied from the transistor level to the gate-architecture level of abstraction in the memory. Several classic static logic gates are compared in terms of power, performance and layout area against so-called Schmitt-trigger structures, which have been shown to work at very low supply voltages. From standard cells (NAND, NOR and NOT logic gates), a simple standard-cell based memory (SCM) is constructed, which includes structures such as multiplexers, decoders, pre-decoders, clock gates and data flip-flops (DFFs). The constructed memory is then simulated and modelled to verify functional yield at a supply voltage of 87 mV.

The complete SCM, which is 1024 bits in size with 128 addresses storing 8 bits of data each, shows good functional chip yield, with the lower bound of yield above $90 \%$. This includes a maximum redundancy of 4. The operating frequency is 150 Hz, with an average power consumption of 6.991 nW. The total layout area of the standard-cell based memory is $338741.1 \mu \mathrm{~m}^{2}$.

## Preface

This thesis is a continuation of a project that ended in the fall of 2021, in which a sub-100 mV supply voltage memory element in 22 nm FD-SOI process technology was designed and simulated. During the fall of 2021, two small logic gate libraries, one consisting of Schmitt-trigger logic gates and one of classic logic gates, were simulated and compared at a sub-100 mV supply voltage using 7-stage ring oscillators. Each logic gate library consisted of a NAND and a NOT logic gate. From these libraries, memory elements were constructed and simulated. None of the simulations from that period included physical layout effects or parasitic components, so the results are to some degree unrealistic. This continuation project uses the same transistor-level schematic architecture for the NAND and NOT gates and the same sizing methodology for the logic gates, along with similar gate-level schematic architectures for the 7-stage ring oscillators and the memory elements. The current thesis focuses on including physical layout effects and parasitic components to extract more accurate results. Each logic gate library is extended with another logic gate type (NOR), and a simple standard-cell memory is constructed and simulated. Whereas the goal of the project that ended in the fall of 2021 was to design memory elements for operation at sub-100 mV, the goal of this thesis is to design a simple standard-cell memory that operates at a sub-100 mV supply voltage.

## Acknowledgement

I would like to express my gratitude to my supervisors, Professor Snorre Aunet and Professor Trond Ytterdal, for the periodic status meetings throughout the project. Through these interactions I have gained a large amount of academic advice, and they have always been quick to answer any question I have sent by e-mail. Their continued support and guidance throughout this project have been invaluable.

I also want to thank my family and friends, who have motivated me throughout this project, which at times felt both long and short, for their continued support and for checking up on me to see how the work was going. Thank you all.

## Contents

Abstract
Sammendrag
Preface
Acknowledgement
Contents
Figures
Tables
Acronyms
1 Introduction
2 Theoretical background
2.1 Operating MOSFETs in deep sub-threshold
2.1.1 Ultra-low voltage logic
2.1.2 Schmitt-Trigger Structures
2.1.3 Logic Gates Power consumption and Performance
2.2 Integrated Circuit Variability
2.2.1 High-Dimensional variation space and Monte Carlo Methods
2.2.2 Circuit yield
2.2.3 Confidence interval
2.3 Fully Depleted Silicon on Insulator (FD-SOI) technology
2.4 Logic Gates
2.4.1 Fan-out and Fan-in
2.5 Ring-oscillator circuits
2.6 D-flip-flop memory element
2.6.1 Setup and hold time
2.7 Multiplexers
2.8 Decoders
3 Sub-100mV Supply Voltage Memory Design and Simulation
3.1 Simulation and design environment
3.2 FD-SOI device choice
3.3 Standard-Cell Based Memory (SCM) implementation and simulation
3.3.1 Design of Static Logic gates
3.3.2 Design of combinational and memory element circuits
3.3.3 Write, Read, Hold logic and functionality
3.3.4 Verification and chip-yield estimation
3.3.5 SCM Power and Performance
3.4 7-stage ring oscillators
3.4.1 Power, Performance extraction
3.5 Layout and parasitic extraction
3.6 Results extraction temperature, voltage, simulation number and confidence level
4 Results
4.1 FD-SOI device choice
4.2 Logic gate library sizing and minimum supply voltage
4.3 Logic gate library layout
4.3.1 Logic-gate layout area comparison
4.4 Sub-100mV SCM
4.4.1 DFF Memory element
4.4.2 SCM Layout and Area
4.4.3 SCM Power and Performance
4.5 7-stage ring-oscillators results
4.5.1 Ring-oscillator layout and area
4.5.2 Ring-oscillator Frequency, jitter and power consumption
4.5.3 Individual logic gate power and performance
5 Discussion
5.1 Device consideration
5.2 Regarding the logic gate sizing methodology
5.3 Poisson yield model and critical path methodology
5.4 Tools impact on results
5.5 Ring-oscillator results accuracy
5.6 Logic gate libraries results
5.7 SCM discussion
5.7.1 SCM comparison to similar circuits
5.8 Further development
6 Conclusion
Bibliography
A Additional Material
A.1 Copies from methods chapter
A.2 Logic gate library layout designs
A.2.1 Layout design of REG logic gate library
A.2.2 Layout design of ST logic gate library
A.3 SCM subcircuit location
A.4 Ring oscillator circuit layout figures

## Figures

1.1 Mm-scale sensor node from [2]
2.1 Current-voltage NMOS curve sketch.
2.2 Classic 2-transistor inverter/NOT logic gate schematic and voltage transfer curve (VTC) sketch.
2.3 Schmitt trigger inverter/ST-NOT logic gate schematic.
2.4 2-D example of failure region and a few samples.
2.5 Classic 4-transistor NAND and Schmitt-trigger NAND.
2.6 Classic 4-transistor NOR and Schmitt-trigger NOR.
2.7 Buffer and Inverter Tree representations.
2.8 7-stage ring oscillators constructed from NAND/NOR/NOT gates
2.9 Positive edge triggered data flip flop with its symbol.
2.10 Illustration of setup and hold time for a simple positive edge triggered data flip flop.
2.11 2-to-1 Multiplexer based on NAND and NOT logic gates.
2.12 4-to-16 one-hot decoder.
2.13 3-to-8 one-cold decoder.
2.14 7-to-128 one-hot decoder.
3.1 Simple standard cell-memory symbol
3.2 VTC curve which does not reach the logic 1 green range
3.3 ST and REG logic gate library, figure copy found in appendix, Fig. A.1
3.4 Test-bench block diagram
3.5 Simple SCM sketch
3.6 DFF test-bench
3.7 DFF timing-diagram
3.8 A reduced critical path, copy found in the appendix, Fig. A.2
3.9 Signal timing diagram, for writing a single logic 1 to bit 0 at address "0000000", then reading this same address
3.10 Ring-oscillator gate level schematic
4.1 Illustration of threshold voltages
4.2 Visual size comparison between ST and REG logic gate library
4.3 Layout of the ST-NOT logic gate
4.4 Layout design of the memory element DFF, measuring $35.964 \mu \mathrm{~m}$ in width and $5.546 \mu \mathrm{~m}$ in height, the layout design is here rotated 90 degrees clockwise
4.5 Layout design of a smaller part of the memory column
4.6 Layout design of the 7-to-128 WAD and a single memory column
4.7 Completed layout design of the Standard Cell Memory, measuring $443.559 \mu \mathrm{~m}$ in width and $763.689 \mu \mathrm{~m}$ in height.
4.8 The layout design of the six ring-oscillators
4.9 PDP and EDP histogram for each logic gate at a supply voltage of 100 mV
4.10 Histogram showing the change in the ST logic gates' PDP and EDP from 87 mV to 100 mV
A.1 ST and REG logic gate library
A.2 A reduced critical path
A.3 Layout design of REG-NOT logic gate, width and height of the cell is $1.246 \mu \mathrm{~m}$ and $2.918 \mu \mathrm{~m}$
A.4 Layout design of REG-NAND logic gate, width and height of the cell is $2.252 \mu \mathrm{~m}$ and $4.634 \mu \mathrm{~m}$
A.5 Layout design of REG-NOR logic gate, width and height of the cell is $2.252 \mu \mathrm{~m}$ and $5.898 \mu \mathrm{~m}$
A.6 Layout design of ST-NOT logic gate, width and height of the cell is $2.988 \mu \mathrm{~m}$ and $2.618 \mu \mathrm{~m}$
A.7 Layout design of ST-NAND logic gate, width and height of the cell is $4.822 \mu \mathrm{~m}$ and $2.928 \mu \mathrm{~m}$
A.8 Layout design of ST-NOR logic gate, width and height of the cell is $4.82 \mu \mathrm{~m}$ and $3.288 \mu \mathrm{~m}$
A.9 Figure showing approximate location of the WAD, the eight memory columns and the read address data flip-flops
A.10 Layout design of REG-NOT based 7-stage ring-oscillator, width and height of the layout is $7.282 \mu \mathrm{~m}$ and $2.918 \mu \mathrm{~m}$
A.11 Layout design of REG-NAND based 7-stage ring-oscillator, width and height of the layout is respectively $14.324 \mu \mathrm{~m}$ and $4.629 \mu \mathrm{~m}$
A.12 Layout design of REG-NOR based 7-stage ring-oscillator, width and height of the layout is respectively $14.324 \mu \mathrm{~m}$ and $5.898 \mu \mathrm{~m}$
A.13 Layout design of ST-NOT based 7-stage ring-oscillator, width and height of the layout is respectively $19.476 \mu \mathrm{~m}$ and $2.618 \mu \mathrm{~m}$
A.14 Layout design of ST-NAND based 7-stage ring-oscillator, width and height of the layout is respectively $32.314 \mu \mathrm{~m}$ and $2.928 \mu \mathrm{~m}$
A.15 Layout design of ST-NOR based 7-stage ring-oscillator, width and height of the layout is respectively $32.3 \mu \mathrm{~m}$ and $3.32 \mu \mathrm{~m}$

## Tables

2.1 NAND and NOR function output for two inputs, IN.1 and IN.0.
4.1
4.2 Worst case logic gate configuration table
4.3
4.4 Layout area of the six ring-oscillators
4.5 Table showing oscillating frequency of the three ST logic gate based ring-oscillators at 87 mV, for pre- and post-layout simulation, in addition to the difference in Hz between the pre-layout and post-layout simulations
4.6 Table showing jitter of the three ST logic gate based ring-oscillators at 87 mV, for pre- and post-layout simulation, in addition to the difference in Hz between the pre-layout and post-layout simulations
4.7 Table showing average power consumption of the three ST logic gate based ring-oscillators at 87 mV, for pre- and post-layout simulation, in addition to the difference in W between the pre-layout and post-layout simulations
4.8 Table showing oscillating frequency of the six logic gate based ring-oscillators at 100 mV, for pre- and post-layout simulation, in addition to the difference in Hz between the pre-layout and post-layout simulations
4.9 Table showing jitter of the six logic gate based ring-oscillators at 100 mV, for pre- and post-layout simulation, in addition to the difference in Hz between the pre-layout and post-layout simulations
4.10 Table showing average power consumption of the six logic gate based ring-oscillators at 100 mV, for pre- and post-layout simulation, in addition to the difference in W between the pre-layout and post-layout simulations
4.11 Delay of each logic gate at 100 mV, computed from Table 4.8
4.12 Average power consumption of each logic gate at 100 mV, computed from Table 4.10
4.13 Delay of the ST logic gates at 87 mV supply voltage, computed from Table 4.5
4.14 Average power consumption of the ST logic gates at 87 mV, computed from Table 4.7
5.1 Comparison between this thesis' SCM and the 8x8-bit multiplier from [6]

## Acronyms

$V D D_{\text {min }}$ Minimum supply voltage.

BOX Buried oxide.

CMOS Complementary Metal-Oxide-Semiconductor.

DFF Data flip-flop.
DRC Design Rules Check.

EDA Electronic design automation.
EDP Energy-delay-product.

FBB Forward body biased.
FD-SOI Fully depleted silicon on insulator.
FOM Figure Of Merit.

GND Ground.

IC Integrated circuit.
IoT Internet of things.

LVS Layout Versus Schematic.

MC Monte Carlo.
MOSFET Metal-oxide-semiconductor field-effect transistor.
MUX Multiplexer.

NAND Not AND.
NOT Not/inverting.

PDP Power-delay-product.

RBB Reverse body biased.
RCA Ripple-Carry Adder.

SCM Standard-cell based memory.
SoC System on Chip.
SPICE Simulation Program with Integrated Circuit Emphasis.
SRAM Static Random-Access Memory.
SSS Scaled Sigma Sampling.
ST Schmitt Trigger.

ULV Ultra-low voltage.

VDD Supply voltage/positive rail voltage.
VSS Ground/negative rail voltage.
VTC Voltage-transfer-curve.

## Chapter 1

## Introduction

In electronics, the ongoing demand for reduced energy consumption has motivated the design of sub-threshold digital circuits. As shown in [1], lowering the supply voltage and operating transistors in the sub-threshold domain reduces energy and power consumption by orders of magnitude compared to operation at nominal supply voltage. As such, the discussion around sub-threshold operation mostly focuses on circuits optimized for energy efficiency. However, there are applications where functionality at as low a supply voltage as possible is advantageous in itself, such as mm-scale sensor nodes utilizing energy harvesters, and biomedical devices and implants.


Figure 1.1: Mm-scale sensor node from [2]

Mm-scale or sub-centimetre nodes are self-contained, small electronic devices that are usually part of some sort of wireless communication network, such as an IoT [3] network. Their common trait is the small device area, as can be seen in Fig. 1.1 from [2], where the mm-scale node is smaller than a grain of rice. The small device area, combined with being self-contained, makes supplying power to the device a problem, which is commonly solved by including an energy harvesting component in the device. Current mm-scale nodes usually need voltage level converters or charge pumps, as the rest of the circuit runs at a higher voltage than the energy harvester itself provides. Thus, efficiency is lost in conversion between voltage domains. In such applications, the minimum supply voltage the circuit requires often defines the instant when an active operation can start [4], which makes minimizing the supply voltage worthwhile even at the cost of additional area or power.

This idea extends to biomedical implants and devices. Large networks of mm-scale nodes face great difficulty if they are powered by rechargeable batteries that must be recharged at some point. Similarly, implants that run on batteries are very intrusive, as the user of the device also needs to recharge the implant at some point. Energy harvesting could then provide reliability if the device harvests energy from its immediate environment. In addition, circuitry running at ultra-low voltage means that if something goes wrong with the device, the maximum charge delivered to the body may be too small to affect the body or its important cells. Studies like [5] show that even small and weak electrical currents have an impact on the healing of cells in the body.

Test circuits such as the $8 \times 8$-bit multiplier from [6] show that sub-threshold digital circuits can operate at sub-100 mV supply voltage. While an $8 \times 8$-bit multiplier can in all likelihood be used in applications like mm-scale nodes and biomedical implants, memory arrays might be more crucial, as they take up a large amount of silicon real estate in current SoCs [7]. It is almost impossible to make complex digital SoCs without memory, which makes it worthwhile to investigate whether sizeable memory arrays are feasible at sub-100 mV supply voltage. Memory needs to be a reliable component, which is hard to achieve at ultra-low voltages, as most current MOSFET process technologies have an increased sensitivity to process variation at sub-threshold supply voltages [8].

The underlying goal of this thesis project is to study, develop and apply concepts in ultra-low voltage design by constructing a small memory that operates at a sub-100 mV supply voltage. This includes choosing and constructing promising transistor-level architectures to base the memory on, and drawing the physical layout of the memory and extracting its layout effects, which can be used to verify that a manufactured memory will operate at a sub-100 mV supply voltage. Such an approach is not new; however, to the author's knowledge, there has yet to be a small memory array that operates at a supply voltage as low as sub-100 mV.

Chapter 2 presents the important theory and essential principles that the following chapters refer to. Chapter 3 describes the implementation and the methodology used to collect results. Chapter 4 contains the results of the implementation and methodology from the previous chapter. Finally, Chapter 5 discusses the results, implementation choices and methods presented in this thesis.

## Chapter 2

## Theoretical background

### 2.1 Operating MOSFETs in deep sub-threshold

Operating integrated circuits at supply voltages below the threshold voltage of MOSFET devices has been known since the late 1960s [9]. This domain is usually referred to as the sub-threshold or weak inversion domain, where power and energy consumption can be reduced by several orders of magnitude [1][10]. Along with the reduction in energy and power, performance often decreases as the supply voltage is lowered, which is usually not an issue for sub-threshold circuit applications. However, sub-threshold circuits have an increased sensitivity to process variation [11]. Fig. 2.1 shows a sketch of an N-type MOSFET's drain current versus gate-to-source voltage. The sub-threshold domain can be regarded as the region where the supply voltage is lower than the absolute threshold voltage of the MOSFET device. In the characteristic current-voltage sketch, this is where $V_{G S}<V_{T}$. In the figure, when $V_{G S}>V_{T}$, which is the Above-$V_{T}$ range, $I_{d s}$ is almost constant over a certain range, which is very useful for MOSFET logic gates and other constructions that require a steady maximum current. When the supply voltage is decreased into the sub-threshold domain, or Sub-$V_{T}$ range, the current instead depends exponentially on $V_{G S}$, and there is no longer a range in which $V_{G S}$ provides a steady current. For logic gates operating in this domain this is bad news, as decreasing the supply voltage below the threshold voltage diminishes the on-current of the gate almost exponentially. Thus, the ratio of on-current to off-current can become problematic for a number of commonly occurring logic and MOSFET circuits.

Analytical expressions such as the MOSFET sub-threshold drain current in Equation 2.1 (from [6]) reveal how device parameters and voltage variations affect the device.

$$
\begin{equation*}
I_{D s u b}=I_{0} \cdot \exp \left(\frac{V_{G S}-V_{T}-\eta V_{D S}}{n V_{t h}}\right)\left(1-\exp \left(-\frac{V_{D S}}{V_{t h}}\right)\right) \tag{2.1}
\end{equation*}
$$

where $I_{0}$ is a summarizing factor setting the transistor current strength, $V_{G S}$ is the gate-to-source voltage, $V_{T}$ is the threshold voltage, $V_{D S}$ is the drain-to-source voltage, $\eta$ is the DIBL coefficient, $n$ is the sub-threshold ideality factor and $V_{t h}=\frac{k T}{q}$ is the thermal voltage. This equation shows that, in the sub-threshold domain, the current depends exponentially on the threshold voltage and the terminal voltages.

Figure 2.1: Current-voltage NMOS curve sketch.
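As a rough numerical illustration of Equation 2.1, the sketch below evaluates the drain current as printed and derives the on-to-off current ratio at a few supply voltages. All parameter values (`v_t`, `i_0`, `n`, `eta`) are hypothetical placeholders chosen for illustration and are not extracted from the 22 nm FD-SOI process; the function names are likewise assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_th = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def subthreshold_current(v_gs, v_ds, v_t=0.35, i_0=1e-6, n=1.3,
                         eta=0.05, temp_kelvin=300.0):
    """Drain current following Equation 2.1 as printed.

    The default v_t, i_0, n and eta are illustrative assumptions only.
    """
    v_th = thermal_voltage(temp_kelvin)
    gate_term = math.exp((v_gs - v_t - eta * v_ds) / (n * v_th))
    drain_term = 1.0 - math.exp(-v_ds / v_th)
    return i_0 * gate_term * drain_term


def on_off_ratio(v_dd, **params):
    """Ratio of on-current (V_GS = VDD) to off-current (V_GS = 0) at V_DS = VDD."""
    i_on = subthreshold_current(v_gs=v_dd, v_ds=v_dd, **params)
    i_off = subthreshold_current(v_gs=0.0, v_ds=v_dd, **params)
    return i_on / i_off


if __name__ == "__main__":
    for v_dd in (0.300, 0.100, 0.087):
        print(f"VDD = {v_dd * 1e3:.0f} mV -> I_on/I_off = {on_off_ratio(v_dd):.1f}")
```

In this simplified model the ratio reduces to $\exp \left(V_{D D} /\left(n V_{t h}\right)\right)$, so it shrinks from thousands at 300 mV to a few tens or less around 100 mV under the assumed ideality factor, which is the diminished on-to-off current ratio discussed above.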

### 2.1.1 Ultra-low voltage logic

Currently, digital circuits depend on the fundamental principle of encoding logic variables as voltages. Most commonly, 0 and 1 (false and true) are used, as Boolean algebra is analogous to the behaviour of certain types of electrical circuits, such as the switch. This enables complex logic expressions to be constructed from MOSFETs acting as switches. While encoding logic 0 and 1 as voltages is simple in principle, certain characteristics of the MOSFET and of the circuit architecture can affect these voltages. Commonly, the rail voltages are used: VDD (supply voltage) represents logic 1 and GND/VSS (ground voltage) represents logic 0.

As Section 2.1 introduces, digital circuits operating in sub-threshold might have a diminished on-to-off current ratio, which affects the circuits negatively. For instance, consider the classic 2-transistor inverter seen in Fig. 2.2, composed of an N-type and a P-type MOSFET, where the active block delivers its maximum current to the fan-out logic. The off-current (leakage) of the complementary block might then be too high, which causes the output voltage levels of the inverter to deviate from the ideal rail voltages (VDD/GND) [6]. Connected fan-out logic might then no longer interpret the logic levels correctly, resulting in circuit failure. The voltage-transfer-curve (VTC) sketch of an inverter in Fig. 2.2 represents this graphically: the $V_{o}$ curve needs to lie within the green logic 1 and logic 0 voltage ranges for some $V_{I}$ voltages.


Figure 2.2: Classic 2-transistor inverter/NOT logic gate schematic and voltage transfer curve(VTC) sketch.

Mitigation of output logic level degradation (output voltage deviation) is central to this thesis, and one of the main reasons for choosing certain transistor architectures and techniques when designing sub-threshold logic gates for memory. Reference [12] shows that, at room temperature, the theoretical minimum supply voltage of a classic 2-transistor inverter is approximately:

$$
\begin{equation*}
2 \ln (2) \frac{k T}{q} \approx 36 \mathrm{mV} \tag{2.2}
\end{equation*}
$$

While the classic 2-transistor inverter has a very low theoretical minimum supply voltage, other architectures have been proven to function in silicon at very low supply voltages, such as in [6], which reports $V D D_{\min }=62 \mathrm{mV}$, and [13], which reports $V D D_{\min }=76 \mathrm{mV}$. Both sources use a special transistor architecture for the basic logic gates: so-called Schmitt-trigger (ST) structures, which mainly rely on a technique dubbed "leakage quenching" by [6].

### 2.1.2 Schmitt-Trigger Structures

The Schmitt-trigger inverter has peculiar properties that are effective in sub-threshold operation. Considered a very versatile component for both analog and digital applications [14], the Schmitt-trigger inverter (Fig. 2.3) has a theoretical minimum supply voltage of $2 \ln \left(\frac{8+\sqrt{73}}{9}\right) k T / q=31.5 \mathrm{mV}$ at 300 K, as reported by [15]. Reference [14], however, reports that the theoretical limit for hysteresis is a supply voltage of $\cong 75 \mathrm{mV}$ for the 6-transistor architecture seen in Fig. 2.3.
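The two theoretical limits quoted above (Equation 2.2 for the classic inverter and the Schmitt-trigger expression reported by [15]) can be checked numerically. The following is a minimal sketch; the function names are assumptions introduced here, and only the physical constants and the two closed-form expressions come from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def inverter_vdd_min(temp_kelvin=300.0):
    """Theoretical minimum supply of the 2-transistor inverter, 2 ln(2) kT/q."""
    return 2.0 * math.log(2.0) * thermal_voltage(temp_kelvin)


def schmitt_trigger_vdd_min(temp_kelvin=300.0):
    """Theoretical minimum supply of the ST inverter, 2 ln((8 + sqrt(73)) / 9) kT/q."""
    return 2.0 * math.log((8.0 + math.sqrt(73.0)) / 9.0) * thermal_voltage(temp_kelvin)


if __name__ == "__main__":
    print(f"Inverter:        {inverter_vdd_min() * 1e3:.1f} mV at 300 K")
    print(f"Schmitt trigger: {schmitt_trigger_vdd_min() * 1e3:.1f} mV at 300 K")
```

Both limits scale linearly with absolute temperature through $k T / q$, so the same functions can be evaluated at other temperatures by changing `temp_kelvin`.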


Figure 2.3: Schmitt trigger inverter/ST-NOT logic gate schematic.

The Schmitt trigger has the leakage quenching property, for which the two transistors P2 and N2 in Fig. 2.3 are key. When the input $V_{I}$ transitions from high to low, the output inverts, transitioning from low to high. As the output voltage rises, the voltage at node VZ also increases, which lowers the leakage current passing through N1. P2 has the same leakage quenching function as N2, and the mechanism can be seen as positive feedback. The fundamental requirement of an ST structure is the existence of a middle node within the P-type and N-type blocks that can be tied to the required voltage for leakage quenching.

### 2.1.3 Logic Gates Power consumption and Performance

In the current "Nanoera" [11], static (leakage) power consumption has become a major concern, as leakage current increases as technology nodes get smaller. This concern is vital for sub-threshold circuits, as such circuits usually have low performance and a low activity factor. Large circuits usually use sleep schemes, where large parts of the circuit are inactive for long periods of time, meaning that static power consumption contributes a larger share of the total power consumed by a digital circuit. The equations below (from [16]) give the power consumed in a digital circuit:

$$
\begin{gather*}
P_{\text {Total }}=P_{\text {Static }}+P_{\text {Dynamic }}  \tag{2.3}\\
P_{\text {Dynamic }}=\alpha \cdot f \cdot C_{\text {tot }} \cdot V_{d d}^{2}  \tag{2.4}\\
P_{\text {Static }}=I_{\text {leak }} \cdot V_{d d} \tag{2.5}
\end{gather*}
$$

The total power consumption of a circuit is the energy dissipated by the circuit per unit time, and is usually understood as the current drawn from the power source times the supply voltage of the circuit: $P_{\text {Total }}=I_{\text {source }} \cdot V_{D D}$. For digital circuits, however, Equation 2.3 might be more descriptive, as the main contributors to the total power dissipated are dynamic and static power consumption [16]. Dynamic power is the power consumed when the circuit switches, while static power is the power consumed when the circuit does not switch, as expressed in Equation 2.4 and Equation 2.5 respectively. The dynamic power consumption $P_{\text {Dynamic }}$ is a function of the activity factor $\alpha$ of the circuit, the operating clock frequency $f$, the total switched capacitance $C_{\text {tot }}$ and the supply voltage, while the static power consumption $P_{\text {Static }}$ is a function of the total leakage current and the supply voltage. From Equation 2.3 we can also define the total energy consumed per clock period $t_{c l k}=1 / f$:

$$
\begin{equation*}
E_{t o t}=P_{t o t} \cdot t_{c l k}=I_{l e a k} V_{d d} t_{c l k}+\alpha C_{t o t} V_{d d}^{2} \tag{2.6}
\end{equation*}
$$

Lowering the supply voltage lowers the total energy consumed tremendously, as the dynamic term in Equation 2.6 is proportional to the square of the supply voltage. Reference [10] indicates, and [11] shows for a 32-bit RCA circuit, that the increase in the longest path delay does not outweigh the reduction in power consumption until the sub-threshold domain is reached. As stated by [11], the conflicting metrics of power and performance can be balanced using the PDP and EDP figures of merit (FOMs), defined below.

$$
\begin{align*}
& P D P=P_{\text {tot }} \cdot T_{\text {delay }}  \tag{2.7}\\
& E D P=P D P \cdot T_{\text {delay }} \tag{2.8}
\end{align*}
$$
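Equations 2.3 to 2.8 can be collected into a small calculator. The sketch below is illustrative only: the function names are assumptions, and the numbers in the usage example (activity factor, switched capacitance, leakage current and gate delay) are hypothetical placeholders, not results from this thesis. Only the 150 Hz clock and 87 mV supply are taken from the abstract.

```python
def dynamic_power(alpha, freq_hz, c_tot, v_dd):
    """Equation 2.4: P_Dynamic = alpha * f * C_tot * V_dd^2."""
    return alpha * freq_hz * c_tot * v_dd ** 2


def static_power(i_leak, v_dd):
    """Equation 2.5: P_Static = I_leak * V_dd."""
    return i_leak * v_dd


def total_power(alpha, freq_hz, c_tot, i_leak, v_dd):
    """Equation 2.3: sum of static and dynamic power."""
    return static_power(i_leak, v_dd) + dynamic_power(alpha, freq_hz, c_tot, v_dd)


def energy_per_cycle(alpha, freq_hz, c_tot, i_leak, v_dd):
    """Equation 2.6: E_tot = P_tot * t_clk, with t_clk = 1 / f."""
    t_clk = 1.0 / freq_hz
    return total_power(alpha, freq_hz, c_tot, i_leak, v_dd) * t_clk


def power_delay_product(p_tot, t_delay):
    """Equation 2.7: PDP = P_tot * T_delay."""
    return p_tot * t_delay


def energy_delay_product(p_tot, t_delay):
    """Equation 2.8: EDP = PDP * T_delay."""
    return power_delay_product(p_tot, t_delay) * t_delay


if __name__ == "__main__":
    # Hypothetical operating point (placeholder values for illustration).
    params = dict(alpha=0.1, freq_hz=150.0, c_tot=1e-12, i_leak=50e-9, v_dd=0.087)
    p_tot = total_power(**params)
    t_delay = 1e-3  # assumed gate delay in seconds
    print(f"P_tot = {p_tot:.3e} W")
    print(f"E_tot = {energy_per_cycle(**params):.3e} J per cycle")
    print(f"PDP   = {power_delay_product(p_tot, t_delay):.3e} J")
    print(f"EDP   = {energy_delay_product(p_tot, t_delay):.3e} Js")
```

At low clock frequencies the static term $I_{\text {leak }} V_{d d} t_{c l k}$ grows with the clock period, which is why leakage tends to dominate the energy per cycle for slow sub-threshold circuits.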

### 2.2 Integrated Circuit Variability

As MOSFET device scaling moves below the 100 nm threshold, certain non-ideal factors affecting MOSFET devices become stronger. The main factors are process, voltage, and temperature (PVT for short). Although certain MOSFET process technologies can mitigate PVT variations to some extent, PVT variations are still a big problem in the manufacturing of integrated circuits. However, depending on what type of circuit is designed, such variations can have little effect on the circuit. Voltage and temperature are parameters that define the safe operating range of the circuit (where the circuit operates with full functionality). As can be seen from Equation 2.1, temperature and voltage relate exponentially to the drain current, which needs to be considered if the device or circuit might operate at a temperature and voltage other than nominal. Process variability is somewhat different from voltage and temperature variability, as it is directly tied to the device manufacturing process. It stems from inaccuracy in the control of process parameters and non-uniformity of equipment, which affects different attributes of a MOSFET, such as gate length, width, and oxide thickness, across a wafer and from wafer to wafer. In this thesis, process variability comprises two distinct terms: process variation, which refers to global variation, and mismatch variation, which refers to local variation. That is, process variation covers wafer-to-wafer variation, while mismatch covers variation within one wafer. Certain steps can be taken during design to reduce mismatch variation. Work done by [17] shows that the variance of the threshold voltage mismatch between close or adjacent transistors is related to the transistors' gate width and length. This relation, referred to as Pelgrom's law, shows that the standard deviation of the measured Gaussian threshold voltage distribution decreases with an increase in transistor gate area. In essence, one can trade increased area for less mismatch.
Process variation (global variation) depends more on the accuracy and strategy of the foundry, which have enabled new technology nodes as small as 22 nm. As such, both process and mismatch variation need to be accounted for during the design of a circuit.
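
The area-for-mismatch trade can be illustrated with a minimal sketch of Pelgrom's law in its common form, $\sigma(\Delta V_{th}) = A_{VT} / \sqrt{W L}$. The matching coefficient `A_VT` and the device dimensions below are hypothetical values chosen for illustration only and are not taken from this thesis or from the process design kit.

```python
import math

def sigma_delta_vth(a_vt_mv_um: float, width_um: float, length_um: float) -> float:
    """Standard deviation of threshold-voltage mismatch in mV (Pelgrom model)."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)

A_VT = 2.0  # mV*um, hypothetical matching coefficient (assumption, not thesis data)

small = sigma_delta_vth(A_VT, width_um=0.1, length_um=0.02)
large = sigma_delta_vth(A_VT, width_um=0.4, length_um=0.02)  # four times the gate area

print(f"sigma(dVth), small device: {small:.1f} mV")
print(f"sigma(dVth), 4x gate area: {large:.1f} mV")
```

Quadrupling the gate area halves the standard deviation, which is the trade described above.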

### 2.2.1 High-Dimensional variation space and Monte Carlo Methods

Process variability can be highly random and nonlinear. Fortunately, in most cases, if one measures a parameter Z affected by process variability over many produced devices, the sampled parameter Z follows a Gaussian (normal) distribution. As such, it is very common to use Monte Carlo analysis and similar methods when verifying a design. The parameter can also follow a different distribution, such as Poisson or log-normal, but suppose that the M-dimensional vector in Equation 2.9 [18] contains all independent random variables with the joint PDF (probability density function) $f(x)$. We would then want to sample from this PDF to estimate the failure rate, which can be expressed mathematically as Equation 2.10 [18] or, alternatively, Equation 2.11 [18].

$$
\begin{equation*}
x=\left[x_{1} x_{2} \cdots x_{M}\right]^{T} \tag{2.9}
\end{equation*}
$$

$$
\begin{gather*}
P_{f}=\int_{\Omega} f(x) \, d x  \tag{2.10}\\
P_{f}=\int_{-\infty}^{+\infty} I(x) \cdot f(x) \, d x \tag{2.11}
\end{gather*}
$$

Here $\Omega$ represents the failure region in the variation space, that is, where the design does not meet the specification, and $I(x)$ is the indicator function, equal to 1 when $x$ lies in $\Omega$ and 0 otherwise. Monte Carlo (MC) analysis estimates $P_{f}$ by drawing $N$ random samples from $f(x)$ and computing the mean of the indicator function $I(x)$ over these samples, as seen in Equation 2.12 [18]. The problem is that when MC is used to estimate an extremely small failure rate, most random samples will not fall into the failure region, and thus a large number of samples is needed to estimate the failure probability accurately. A graphical sketch can be seen in Fig. 2.4, where the failure region is far away from the origin (the area where most of the samples lie), and the number of samples (circles) needed for even one sample to end up in the failure region can be substantial.

$$
\begin{equation*}
P_{f}^{M C}=1 / N \cdot \sum_{n=1}^{N} I\left[x^{(n)}\right] \tag{2.12}
\end{equation*}
$$
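
Equation 2.12 can be sketched in a few lines of Python. The failure region used here (the first variable exceeding a limit) is a hypothetical toy example, not a circuit specification, and the helper `samples_needed` is an illustrative name that uses the standard relative error of a Bernoulli mean, $\sqrt{(1-P_f)/(N P_f)}$.

```python
import random

def mc_failure_rate(indicator, n_samples: int, dim: int, seed: int = 0) -> float:
    """Equation 2.12: mean of the indicator over N standard-normal samples."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        fails += indicator(x)
    return fails / n_samples

def toy_indicator(x, limit: float = 1.0) -> int:
    """Hypothetical failure region: the first variable exceeds `limit`."""
    return 1 if x[0] > limit else 0

def samples_needed(p_fail: float, rel_err: float) -> float:
    """Samples required so the estimate has the given relative standard error."""
    return (1.0 - p_fail) / (p_fail * rel_err**2)

p_common = mc_failure_rate(toy_indicator, n_samples=100_000, dim=2)
print(f"estimated P(x1 > 1): {p_common:.4f} (exact value is about 0.1587)")
print(f"samples for P_f = 1e-9 at 10% relative error: {samples_needed(1e-9, 0.1):.2e}")
```

A common event is estimated well with a modest number of samples, while a 1 ppb event needs on the order of $10^{11}$ samples for the same relative accuracy.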



Figure 2.4: 2-D example of failure region and a few samples.

Importance sampling can address the first problem, the large number of samples, since it in principle samples from a distorted version of the PDF $f(x)$, so that most random samples fall into the failure region $\Omega$, giving high accuracy with a small number of samples [18]. However, as designs get bigger, the dimensionality of the variation space increases, which means there could be thousands of nonlinear variables contributing to the variation space. Scaled Sigma Sampling (SSS) [18], a statistical sampling method that derives its model from the theorem of soft maximum, seems to solve both problems. The principle behind SSS is as follows: given the aforementioned $f(x)$ for the M-dimensional vector $x$ in Equation 2.9, scale up the standard deviation of $x$ by a factor $s$ ($s>1$). This results in a new PDF $g(x)$ that spreads over a larger region. Thus, the probability that a sample reaches a faraway failure region increases.
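
The effect of scaling the standard deviation can be illustrated for a single Gaussian variable. This is only a sketch of the scaling step: the failure boundary of 6 sigma is a hypothetical value, and the step in which SSS extrapolates the scaled failure rates back to $s=1$ is not shown.

```python
from statistics import NormalDist

def tail_probability(threshold: float, scale: float = 1.0) -> float:
    """P(x > threshold) when x is drawn from N(0, scale^2)."""
    return 1.0 - NormalDist(mu=0.0, sigma=scale).cdf(threshold)

THRESHOLD = 6.0  # hypothetical failure boundary, in units of the original sigma

for s in (1.0, 2.0, 3.0):
    p = tail_probability(THRESHOLD, s)
    print(f"s = {s:.0f}: P(fail) = {p:.3e}, about {1.0 / p:.3g} samples per failure")
```

Even a modest scaling factor moves the failure rate from the ppb range to a level where a few thousand samples, or fewer, contain failures.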

### 2.2.2 Circuit yield

A very useful property of memory circuits, compared to other integrated circuits, is that memory cells are highly replicated within the component. Memory cells are usually designed to a standard above 6 sigma, which translates to an approximate failure rate of 1 ppb. The equivalent sigma can be estimated from Equation 2.13, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function [19] and $P$ is the failure probability. Suppose that a memory consisting of 1 million memory cells is to be produced, and that the desired chip yield is $99.9 \%$. This means that out of every 1000 chips, one is allowed to have one or more memory cell failures. Assuming that the memory cells are independent, the failure rate of each individual cell can be found from Equation 2.14 [19], where $N$ is the number of cells, as $1 \cdot 10^{-9}$, or 1 ppb. One can also go from cell yield to approximate chip yield, as $0.999999999^{1000000} \approx 0.999$.

$$
\begin{gather*}
h=-\Phi^{-1}(P)  \tag{2.13}\\
F_{\text {cell }}=1-\left[1-F_{\text {chip }}\right]^{1 / N} \tag{2.14}
\end{gather*}
$$
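
The worked example above can be reproduced with a short sketch of Equations 2.13 and 2.14. The function names are illustrative; `expm1` and `log1p` are used only to avoid rounding error when subtracting numbers very close to 1.

```python
import math
from statistics import NormalDist

def cell_failure_rate(chip_failure_rate: float, n_cells: int) -> float:
    """Equation 2.14: F_cell = 1 - (1 - F_chip)^(1/N)."""
    return -math.expm1(math.log1p(-chip_failure_rate) / n_cells)

def equivalent_sigma(p_fail: float) -> float:
    """Equation 2.13: h = -Phi^{-1}(P)."""
    return -NormalDist().inv_cdf(p_fail)

f_cell = cell_failure_rate(chip_failure_rate=0.001, n_cells=1_000_000)
sigma = equivalent_sigma(f_cell)
chip_yield = (1.0 - 1e-9) ** 1_000_000

print(f"required cell failure rate: {f_cell:.3e}")
print(f"equivalent sigma:           {sigma:.2f}")
print(f"chip yield at 1 ppb cells:  {chip_yield:.4f}")
```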

Usually, redundancy is part of a memory design, whether it be columns, banks, or single cells that can be replaced or deliberately disregarded in order to achieve a higher chip yield. Equations 2.15 and 2.16 [19] show a model for the total failure probability of a chip, which can be approximated by Equation 2.17 [19] when the number of cells in the array is large.

$$
\begin{gather*}
F_{\text {chip }}=1-\left(F_{k, 0}+F_{k, 1}+\cdots+F_{k, r}\right)  \tag{2.15}\\
F_{k, r}=C_{k}^{N} F_{\text {cell }}^{k}\left(1-F_{\text {cell }}\right)^{N-k} \tag{2.16}
\end{gather*}
$$

$C_{k}^{N}$ is the number of combinations of $k$ cells in an array of $N$ cells. Equation 2.17 is the so-called Poisson yield model [19], where $\lambda=F_{\text {cell }} \cdot N$.

$$
\begin{equation*}
F_{\text {chip }}=1-\sum_{k=0}^{r} \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{2.17}
\end{equation*}
$$
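
The binomial model and its Poisson approximation can be compared with a small sketch. It assumes that $r$ is the number of failing cells the redundancy can tolerate, and the cell failure rate of $10^{-4}$ is a hypothetical value chosen so that the numbers are easy to read; it is not a result from this thesis.

```python
import math

def chip_failure_binomial(f_cell: float, n_cells: int, r: int) -> float:
    """Equations 2.15 and 2.16: the chip fails if more than r of N cells fail."""
    ok = sum(
        math.comb(n_cells, k) * f_cell**k * (1.0 - f_cell) ** (n_cells - k)
        for k in range(r + 1)
    )
    return 1.0 - ok

def chip_failure_poisson(f_cell: float, n_cells: int, r: int) -> float:
    """Equation 2.17 with lambda = F_cell * N."""
    lam = f_cell * n_cells
    ok = sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(r + 1))
    return 1.0 - ok

F_CELL = 1e-4    # hypothetical cell failure rate (assumption)
N_CELLS = 1024   # number of bits in the memory array

for r in (0, 1, 2):
    fb = chip_failure_binomial(F_CELL, N_CELLS, r)
    fp = chip_failure_poisson(F_CELL, N_CELLS, r)
    print(f"r = {r}: binomial {fb:.3e}, Poisson {fp:.3e}")
```

The two models agree closely for an array of this size, and each tolerated cell failure lowers the chip failure probability considerably.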

Memory cells are usually the first components in a memory to fail under the effects of process variability. However, the connected overhead, which may consist of different components assisting read, write, and other operations of a memory array, does affect which memory cell fails first. A fundamental question regarding variability in memory design is how much design margin is enough without over-designing. As an example, suppose that a memory contains a structure where 10 memory cells are connected to a single line address decoder, and that this "10 memory cells to 1 line address decoder" structure is replicated 1000 times within the memory. The worst memory cell in a memory is usually not associated with the worst-case line address decoder. Thus, the "worst-worst" case might be overly conservative when estimating circuit yield. In addition, running even a highly efficient method such as SSS on a chip that consists of thousands of memory cells can be very time intensive. Sources [20], [21] and [22] report similar reduction methodologies that seem to answer these questions to some extent. These methodologies rely heavily on the high degree of component replication within memories. That is, memory elements are usually divided into columns, where each column is almost identical in terms of its connection to the overhead circuitry.

### 2.2.3 Confidence interval

A confidence interval is a range of estimates, computed from a limited sample, within which the true value almost certainly lies [19]. Usually the sample population is limited, so any estimate of a parameter will have some error compared to the true population statistic. A confidence interval is associated with a confidence level and confidence limits. A 95\% confidence level is commonly used, but which confidence level to use depends entirely on the application. There are many ways to compute the confidence interval for a given confidence level, depending on the distribution of the population. As an example, the interval for pass/fail specifications, which are common in IC design and also called Bernoulli trials, can be computed using the adjusted-Wald formula [23] seen in Equation 2.18, where $p^{\prime}$ is given by Equation 2.19, $N$ is the number of trials, and $Z_{\alpha / 2}$ is the Z-score, which is 1.96 for $95 \%$ confidence intervals. As an example, if we assume that 1000 out of 1000 manufactured circuits have passed a specified trial, the adjusted-Wald confidence interval is [99.5383\%, 100\%]. This means that, although 1000/1000 passed, the true pass rate might be as low as $99.5383 \%$ at the given confidence level.

$$
\begin{gather*}
p^{\prime} \pm Z_{\alpha / 2} \sqrt{\frac{p^{\prime}\left(1-p^{\prime}\right)}{N+4}} \tag{2.18}\\
p^{\prime}=\frac{\text { number of successes }+2}{\text { number of trials }+4} \tag{2.19}
\end{gather*}
$$
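
The 1000-out-of-1000 example can be checked numerically with the sketch below. The constants 2 and 4 in Equation 2.19 are roundings of $Z_{\alpha/2}^{2}/2$ and $Z_{\alpha/2}^{2}$ for a 95% level. The quoted lower bound of 99.5383% appears to correspond to the unrounded constants, while the rounded form gives a slightly lower bound of about 99.525%. The function and parameter names are illustrative.

```python
import math

def adjusted_wald(successes: int, trials: int, z: float = 1.96,
                  add_successes: float = 2.0, add_trials: float = 4.0):
    """Adjusted-Wald interval (Equations 2.18 and 2.19), clipped to [0, 1]."""
    n_adj = trials + add_trials
    p_adj = (successes + add_successes) / n_adj
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

Z95 = 1.96

# Rounded constants exactly as written in Equation 2.19 (+2 successes, +4 trials).
low_24, high_24 = adjusted_wald(1000, 1000, Z95)

# Unrounded constants: +z^2/2 successes and +z^2 trials.
low_z2, high_z2 = adjusted_wald(1000, 1000, Z95,
                                add_successes=Z95**2 / 2, add_trials=Z95**2)

print(f"+2/+4 form:     [{100 * low_24:.4f}%, {100 * high_24:.4f}%]")
print(f"z^2/2, z^2 form: [{100 * low_z2:.4f}%, {100 * high_z2:.4f}%]")
```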

### 2.3 Fully Depleted Silicon on insulator(FD-SOI) technology

FD-SOI, or fully depleted silicon on insulator, is a process technology with close ties to bulk CMOS that can be very useful for circuits operating in the sub-threshold region. FD-SOI technology utilizes a buried oxide (BOX) scheme [24], which separates the silicon junction from the substrate using an electrical insulator, typically made from silicon dioxide or sapphire. This BOX scheme has several advantages over the typical CMOS process technology. The addition of the insulator reduces capacitances, enabling a higher maximum frequency at similar technology nodes. In addition, wells that remain electrically isolated can be designed beneath the fabricated device [25]. This means that the threshold voltage of individual transistors can be tuned by exploiting the body effect of the MOSFETs. The well beneath an individual transistor is called the "back-gate", or fourth terminal. Like the common bulk CMOS process, FD-SOI is divided into mainly two device types, P-type and N-type MOSFETs. A further division can be made, as the process technology offers flipped-well and conventional-well devices [26]. There is a catch, since the back-gate of flipped-well devices is not well suited to reverse back biasing (RBB). In contrast, conventional-well devices can be both forward (FBB) and reverse back biased (RBB). Forward back biasing can be very beneficial for increasing performance when the supply voltage of a circuit is lowered, while reverse back biasing can reduce the off current of the device by raising the threshold voltage [26]. In addition, back-gate biasing opens the possibility for adaptive body biasing [27] to improve the robustness of a circuit.

### 2.4 Logic Gates

From the Schmitt-Trigger inverter introduced in section 2.1.2, other ST-structures can be made that utilize the "leakage quenching" effect explained in section 2.1.2. Fig. 2.5 shows the transistor schematic of the classical 4-transistor NAND gate and its ST-structure equivalent. Fig. 2.6 shows the transistor schematic of the classical 4-transistor NOR gate and its ST-structure equivalent. When designing ST-structures based on simple logic gate functions, the positive feedback MOSFETs, N2 and P2, need to be connected to the middle nodes as seen in Fig. 2.5 and Fig. 2.6. The other N-type MOSFETs connected to the middle node need to be conducting when a low value is required, and non-conducting otherwise [6]. The same is true for the P-type MOSFETs connected to the middle node when a high value is required. This ensures correct operation and avoids shorts. Compared to the classical logic gates, the ST-structures introduce a new leakage path through the N2 and P2 MOSFETs. This new leakage path affects the output voltage level deviation. However, as [6] suggests, ST-structures might be superior to classical logic gate architectures, since the output voltage level deviation of classical architectures depends on the relative strength of the N-type and P-type MOSFETs. The ST-structures, on the other hand, utilize leakage quenching, trading output level deviation against leakage quenching efficiency. It is then suggested that sensitivity towards global variation is reduced, since the output voltage level deviation of ST-structures primarily depends on the relative strength of two NMOS or two PMOS blocks, instead of the relative NMOS to PMOS strength [6].

Figure 2.5: Classic 4-transistor NAND and Schmitt-trigger NAND.

The ST-NOT (Schmitt-Trigger NOT), ST-NAND (Schmitt-Trigger NAND) and ST-NOR (Schmitt-Trigger NOR) architectures seen in Fig. 2.3, 2.5 and 2.6 are in essence constructs that apply a Boolean function to their input signals to decide their output. NAND, NOR and NOT are basic Boolean logic functions; the truth table of the NAND and NOR functions is shown in Table 2.1. If the input is "11" (IN.1 = 1, IN.0 = 1), that is, logic 1 on terminal A1 and logic 1 on terminal A0, then both the NAND and the NOR logic gate output a logic 0. If the input is "00", both gates output a logic 1. NOT is not included in the table, as the function just inverts its input: if the input is logic 1 the output is logic 0, and if the input is logic 0 the output is logic 1.
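
The three Boolean functions can be written as a minimal behavioural sketch in Python. It models only the logic function, not the transistor-level ST or classical implementations, and the function names are illustrative.

```python
def nand(a: int, b: int) -> int:
    """Output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def nor(a: int, b: int) -> int:
    """Output is 1 only when both inputs are 0."""
    return 0 if (a or b) else 1

def not_(a: int) -> int:
    """Output is the inverse of the input."""
    return 1 - a

print("IN.1 IN.0 NAND NOR")
for in1 in (0, 1):
    for in0 in (0, 1):
        print(f"{in1:>4} {in0:>4} {nand(in1, in0):>4} {nor(in1, in0):>3}")
```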

### 2.4.1 Fan-out and Fan-in

Output voltage level deviation caused by leakage in structures was introduced in section 2.1.1. This problem is exacerbated if the fan-in or fan-out of the circuit is high [28]. Thus, the fan-in and especially the fan-out of the logic gates need to be kept as low as possible for the output voltage deviation to be reasonable. Buffers or inverter trees are commonly used at the architectural abstraction level of a design when a high fan-out is needed, such as in clock tree circuits, where one clock signal needs to be propagated to perhaps thousands of different structures in an integrated circuit. Fig. 2.7 shows a buffer and an inverter tree, both constructed by chaining NOT gates.

Figure 2.6: Classic 4-transistor NOR and Schmitt-trigger NOR.

| IN.1 | IN.0 | NAND | NOR |
| :--- | :--- | :--- | :--- |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |

Table 2.1: NAND and NOR function output for two inputs, IN.1 and IN.0.


Figure 2.7: Buffer and Inverter Tree representations.

### 2.5 Ring-oscillator circuits

Ring-oscillator circuits are useful constructs for benchmarking and comparing logic gates in terms of performance and power consumption. A ring oscillator consists of an odd number of inverting stages, three or more, connected in a loop. Ring oscillators are self-oscillating constructs, commonly used in digital circuits for many different purposes and applications. In [10], ring oscillators are used to investigate power versus performance in the sub-threshold domain.

The 7-stage ring oscillators seen in Fig. 2.8 are made up of seven NOT, NAND or NOR logic gates connected in series. The included time-varying voltage represents the oscillating voltage generated by a ring oscillator, which transitions between the logic 1 and logic 0 voltage levels. Because ring oscillators are self-starting, the oscillator has some setup time, after which it reaches a maximum frequency equal to one over twice the total delay of the path, or:

$$
\begin{equation*}
f_{\text {osc }}=\frac{1}{2 t_{n} N} \tag{2.20}
\end{equation*}
$$

Here $N$ is the number of stages and $t_{n}$ is the delay of a single stage. The period of the voltage signal will to some degree vary in a random manner. This variation is commonly called jitter, and in carefully constructed ring-oscillator circuits it can be very small [29].
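
Equation 2.20 and its inverse, which is how a stage delay is obtained from a measured oscillation frequency, can be sketched as follows. The stage delay of 5 µs is a hypothetical value for illustration and is not a result from this thesis.

```python
def ring_oscillator_frequency(stage_delay_s: float, n_stages: int) -> float:
    """Equation 2.20: f_osc = 1 / (2 * t_n * N)."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages, 3 or more")
    return 1.0 / (2.0 * stage_delay_s * n_stages)

def stage_delay_from_frequency(f_osc_hz: float, n_stages: int) -> float:
    """Equation 2.20 solved for the average delay of a single stage."""
    return 1.0 / (2.0 * f_osc_hz * n_stages)

STAGE_DELAY = 5e-6  # seconds per gate, hypothetical value (assumption)

f_osc = ring_oscillator_frequency(STAGE_DELAY, n_stages=7)
t_back = stage_delay_from_frequency(f_osc, n_stages=7)

print(f"7-stage oscillation frequency: {f_osc / 1e3:.2f} kHz")
print(f"recovered stage delay:         {t_back * 1e6:.2f} us")
```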


Figure 2.8: 7-stage ring oscillators constructed from NAND/NOR/NOT -gates.

### 2.6 D-flip-flop memory element

A DFF, or data flip-flop, is an edge-triggered logic device capable of storing one bit of data in its internal latch. Fig. 2.9 shows a positive edge-triggered DFF, with the gate-level architecture adapted from [13]. It consists of six NAND gates and 13 inverters, where most of the inverters act as buffers to improve the robustness of operation at ULV [13]. A positive edge appearing at the CLK terminal triggers the device into copying the value at the D terminal, storing the new value. The stored value can be read at terminal Q, and its inverse at terminal Q_bar. Since the stored value does not change until the next positive clock edge appears at the CLK terminal, the DFF can act as a memory element. This makes it commonly used in digital circuits where temporary storage of data is needed. Since the DFF is positive edge triggered, the output can change at most once per clock period, so the maximum frequency of the output signal is half of the CLK signal frequency.
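
The edge-triggered behaviour can be sketched with a purely behavioural Python model. It ignores the gate-level architecture from [13], delays, and setup and hold times, and the class and method names are illustrative.

```python
class PositiveEdgeDFF:
    """Behavioural model: Q copies D on a rising CLK edge and holds otherwise."""

    def __init__(self, initial_q: int = 0) -> None:
        self.q = initial_q
        self._prev_clk = 0

    @property
    def q_bar(self) -> int:
        return 1 - self.q

    def step(self, clk: int, d: int) -> int:
        if clk == 1 and self._prev_clk == 0:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q

dff = PositiveEdgeDFF()
clk_wave = [0, 1, 0, 1, 0, 1, 0, 1]
d_wave = [1, 1, 0, 0, 1, 1, 0, 0]
q_wave = [dff.step(c, d) for c, d in zip(clk_wave, d_wave)]
print(q_wave)  # [0, 1, 1, 0, 0, 1, 1, 0]
```

In this stimulus Q changes at every rising edge, so one full period of Q spans two clock periods, which is the half-frequency limit mentioned above.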


Figure 2.9: Positive edge triggered data flip flop with its symbol.

### 2.6.1 Setup and hold time

Like many digital circuits, flip-flops have a characteristic setup and hold time. The setup and hold time of a data flip-flop define a continuous time interval around the triggering clock edge in which the data signal (D terminal) needs to be stable for the flip-flop to take on the logic value. Fig. 2.10 illustrates this: the data flip-flop will only change Q to the value of D if the D signal is present for an interval before and after the positive clock edge. If the data signal is stable for too short a time before the clock edge (a setup violation), the DFF might not change the output Q, while too short a time after the edge (a hold violation) might induce metastability in the internal latch, which would either cause a faulty output or increase the propagation delay of the data signal to the output.
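
The timing window can be expressed as a simple check. The sketch below assumes that the D signal is stable over one known interval, and all time values in the example are hypothetical and given in microseconds.

```python
def timing_ok(d_stable_from: float, d_stable_until: float, clk_edge: float,
              t_setup: float, t_hold: float) -> bool:
    """True if D is stable from t_setup before to t_hold after the clock edge."""
    setup_met = d_stable_from <= clk_edge - t_setup
    hold_met = d_stable_until >= clk_edge + t_hold
    return setup_met and hold_met

# Hypothetical example: clock edge at 10 us, setup time 3 us, hold time 2 us.
print(timing_ok(0.0, 20.0, clk_edge=10.0, t_setup=3.0, t_hold=2.0))  # True
print(timing_ok(8.0, 20.0, clk_edge=10.0, t_setup=3.0, t_hold=2.0))  # False, setup violated
print(timing_ok(0.0, 11.0, clk_edge=10.0, t_setup=3.0, t_hold=2.0))  # False, hold violated
```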


Figure 2.10: Illustration of setup and hold time for a simple positive edge triggered data flip flop.

### 2.7 Multiplexers

A multiplexer, or MUX, is a type of combinational logic circuit in which one or more select signals are used to choose between multiple input signals. Among the most basic multiplexers are logic gate-based multiplexers such as the one in Fig. 2.11, where the 2-to-1 multiplexer is based on NAND gates and a NOT gate. Such a multiplexer uses a select signal S to choose which input, A or B, should appear at the output. Simple 2-to-1 MUXes can also be chained to construct larger MUXes. Unfortunately, such MUXes can suffer from glitches [30], which increase energy consumption.
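
A gate-based 2-to-1 MUX and the chaining into larger MUXes can be sketched behaviourally. The polarity of the select signal (S = 0 selects A) and the ordering of the select bits are assumptions made for this example, since they cannot be read from the text alone.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def not_(a: int) -> int:
    return 1 - a

def mux2(a: int, b: int, s: int) -> int:
    """2-to-1 MUX from three NAND gates and one NOT gate.

    Assumption: S = 0 selects A and S = 1 selects B.
    """
    return nand(nand(a, not_(s)), nand(b, s))

def mux_tree(inputs: list, select_bits: list) -> int:
    """Chain 2-to-1 MUXes into a 2^n-to-1 MUX; select_bits[0] is the LSB."""
    level = list(inputs)
    for s in select_bits:
        level = [mux2(level[i], level[i + 1], s) for i in range(0, len(level), 2)]
    return level[0]

data = [0, 1, 1, 0, 1, 0, 0, 1]
for index in range(8):
    bits = [(index >> b) & 1 for b in range(3)]
    print(index, mux_tree(data, bits))
```

With seven select bits the same function forms a 128-to-1 multiplexer with seven levels of 2-to-1 MUXes.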


Figure 2.11: 2-to-1 Multiplexer based on NAND and NOT logic gates.

### 2.8 Decoders

Another combinational logic circuit is the decoder, where $n$ input signals are decoded to a maximum of $2^{n}$ unique outputs. One-hot and one-cold decoders are common types used in more complex digital circuits. A one-hot decoder takes $n$ inputs and outputs a single logic 1 on one of its $2^{n}$ outputs while keeping the rest at logic 0. The one-cold decoder does something similar, outputting a single logic 0 on one output depending on the $n$ inputs and keeping the rest at logic 1. Fig. 2.12 and Fig. 2.13 show a 4-to-16 decoder with active-low enable and a 3-to-8 decoder. Based on the 4 inputs A3, A2, A1 and A0, the 4-to-16 decoder outputs a single logic 1, keeping the rest at logic 0, while the 3-to-8 decoder outputs a single logic 0 depending on inputs A6, A5 and A4, keeping the rest of the outputs at logic 1. The two decoders can be connected to construct a 7-to-128 decoder, as seen in Fig. 2.14, where the 3-to-8 decoder selects which of the 8 different 4-to-16 decoders to enable based on A6, A5 and A4. The selected 4-to-16 decoder then outputs a single logic 1 at one of its outputs depending on signals A3, A2, A1 and A0.
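
The composition of the 7-to-128 decoder can be sketched behaviourally as follows. The output ordering (output index equal to the binary value of the address) is an assumption made for the example, and the function names are illustrative.

```python
def decoder_3to8_one_cold(a: int) -> list:
    """3-to-8 one-cold decoder: the selected output is 0, all others are 1."""
    return [0 if i == a else 1 for i in range(8)]

def decoder_4to16_one_hot(a: int, enable_n: int) -> list:
    """4-to-16 one-hot decoder with active-low enable."""
    return [1 if (enable_n == 0 and i == a) else 0 for i in range(16)]

def decoder_7to128_one_hot(addr: int) -> list:
    """One 3-to-8 one-cold decoder enabling one of eight 4-to-16 decoders."""
    upper = (addr >> 4) & 0b111   # A6, A5, A4
    lower = addr & 0b1111         # A3, A2, A1, A0
    enables_n = decoder_3to8_one_cold(upper)
    outputs = []
    for en_n in enables_n:
        outputs.extend(decoder_4to16_one_hot(lower, en_n))
    return outputs

out = decoder_7to128_one_hot(0b0100011)  # address 35
print(out.index(1), sum(out))
```

The one-cold output of the 3-to-8 decoder matches the active-low enable of the 4-to-16 decoders, so no extra inverters are needed between the two stages in this model.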


Figure 2.12: 4-to-16 one-hot decoder.


Figure 2.13: 3-to-8 one-cold decoder.


Figure 2.14: 7-to-128 one-hot decoder.

## Chapter 3

## Sub-100mV Supply Voltage Memory Design and Simulation

### 3.1 Simulation and design environment

Cadence Virtuoso is used for schematic design, layout, and simulation of integrated circuits. The EDA tool provides a graphical schematic editor, a graphical layout editor, and a simulation editor. While the schematic editor is used for high-level schematics that connect transistor models, the layout tool enables the physical construction of MOSFETs, signal metal routing, and extraction of physical layout effects and parasitic components. Simulations are run with the Spectre simulator, a SPICE-class circuit simulator providing different SPICE analyses such as DC and transient analysis. While DC analysis calculates the DC operating point of a circuit, transient analysis computes a circuit's response as a function of time. SPICE (Simulation Program with Integrated Circuit Emphasis) is a general term used for software tools simulating analog and digital circuit behaviour. The Cadence simulation editor enables extraction of currents, voltages, and device parameters, in addition to providing more complex algorithms that can compute different figures of merit related to signal processing. The simulation editor also provides Monte Carlo analysis. In this thesis, regular Monte Carlo and Scaled Sigma Sampling are used as statistical sampling methods to extract typical power, performance, and functional yield of the circuits under process and mismatch variation.

### 3.2 FD-SOI device choice

FD-SOI was chosen as the process technology because it provides a cutting-edge 22 nm technology node in a process with characteristics similar to the bulk CMOS process. In addition, as section 2.3 explains, the back-biasing scheme opens the possibility for strong reverse or forward back biasing. Thus, a decision was made to route individual biasing nets for N-type and P-type MOSFETs during layout. However, all results in this thesis are extracted with no (0 V) body bias. To choose a fitting device, a threshold voltage analysis was made for the different devices provided by the technology. This analysis consisted of applying a sub-100 mV voltage to the drain terminal for the N-type devices, and to the source terminal for the P-type devices. The gate and back-gate terminals were grounded together with the source terminal for N-type devices and the drain terminal for P-type devices. When the back-gate is pulled to ground (0 V), no body biasing is applied to the device. Preferably, a device with a high threshold voltage should be used since, as section 2.3 explains, raising the threshold voltage of a device lowers leakage, giving lower static power consumption.

### 3.3 Standard-Cell Based Memory implementation(SCM) and simulation

There have been several good attempts at constructing ultra-low voltage memories, usually centered around the $200 \mathrm{mV}$ to $300 \mathrm{mV}$ supply voltage range [31][32][30], as good performance in addition to ultra-low power consumption is usually found at these voltages for common CMOS process technologies. Source [33] reports a very low supply voltage of $V D D_{\min }=160 \mathrm{mV}$. However, SRAM bit-cell memory suffers immensely from a diminishing SNM as the supply voltage is lowered. Other memory types such as standard-cell memory seem more promising, since [13] reports $V D D_{\min }=76 \mathrm{mV}$ and [6] reports $V D D_{\min }=62 \mathrm{mV}$ for circuits utilizing flip-flops as temporary memory. Flip-flop and latch arrays are usually referred to as standard-cell based memories (SCMs), and usually require more area and power than custom SRAM memory arrays at certain sub-threshold supply voltages [30]. To the best of the author's knowledge, a small memory array operating at a sub-100 mV supply voltage has yet to be reported. As such, a simple SCM using the DFF architecture from [13] is designed for operation at a sub-100 mV supply voltage. Furthermore, total area, power consumption, performance, and functional yield of the SCM are extracted at a sub-100 mV supply voltage. The SCM architecture is shown in principle in Fig. 3.5, with separate read and write address buses in addition to separate input and output buses. The SCM symbol can be seen in Fig. 3.1. The chosen architecture for the SCM requires a few basic standard logic gates, namely NAND, NOR and NOT. A small Schmitt-Trigger logic gate library (ST-library) consisting of NAND, NOR and NOT logic gates was constructed, in addition to a classical logic gate library (REG-library) used for comparison against the ST-equivalent gates in terms of some figures of merit, which are estimated using 7-stage ring oscillators.

### 3.3.1 Design of Static Logic gates

Figure 3.1: Simple standard cell-memory symbol

Earlier work on logic operating below a 100 mV supply voltage includes [6], which reports operational supply voltages between 62 mV and 84 mV, and [13], which has a nominal supply voltage of 90 mV. With the choice of a process technology with limited literature, finding the absolute minimum supply voltage for any construct or cell could quickly become time intensive. Therefore, a methodology was used that in essence margins voltage ranges (which in turn margins functional yield). This margining methodology is primarily applied when sizing the gates. Both process and mismatch variation should be accounted for during sizing, such that the maximum output voltage deviation is $0.2 \cdot V D D$ from the ideal rail voltages. The voltage transfer curve (VTC) of Fig. 2.2 in section 2.1.1 shows the margin range methodology graphically, as the curve defines a logic 1 and a logic 0 within the green ranges. Furthermore, the typical logic gate should, as seen in Fig. 2.2, pass through the trip-point during a transition from logic 1 to logic 0 and be symmetric. The trip-point is the point where an input voltage of VDD/2 gives an output voltage of VDD/2. If this requirement is fulfilled, the logic gate is considered balanced. Fig. 3.2 shows a VTC transitioning from the logic 1 to the logic 0 voltage level that is considered a failure, as the VTC is at no point within the upper green logic 1 range. Thus, any connected fan-out logic will likely interpret an output logic 1 voltage level as a logic 0 voltage level.
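
The margining criterion can be written as a small classification function. The sketch assumes the 87 mV supply voltage named in the abstract, and the example output voltages are hypothetical.

```python
def classify_level(v_out: float, vdd: float, margin: float = 0.2) -> str:
    """Classify an output voltage using a margin of 0.2 * VDD from the rails."""
    if v_out >= (1.0 - margin) * vdd:
        return "logic 1"
    if v_out <= margin * vdd:
        return "logic 0"
    return "invalid"

VDD = 0.087  # volts, the 87 mV supply voltage named in the abstract

for v in (0.075, 0.040, 0.010):  # hypothetical output voltages
    print(f"{v * 1e3:.0f} mV -> {classify_level(v, VDD)}")
```

A gate passes a Monte Carlo run under this criterion only if its output lands in the logic 1 or logic 0 range that the applied input calls for.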

## Worst-case logic gate configuration

Usually, logic gates have a worst-case configuration. While the simple NOT logic gate only has one input and one output, NAND and NOR logic gates have multiple inputs, and as such certain input configurations will produce a worse output voltage level than others. As section 2.4.1 explains, a low fan-in and fan-out is beneficial for keeping voltage deviation low. The two constructed logic gate libraries utilize this, as the NAND and NOR logic gates have a maximum fan-in of 2 and a chosen maximum fan-out of 1, while the NOT logic gates have a maximum fan-in of 1 and a chosen maximum fan-out of 2. While fan-in is a result of the chosen transistor-level architectures, the maximum fan-out is chosen this low because of its relation to output voltage deviation, and thus functional yield, as mentioned in section 2.4.1. The ST and REG logic gate libraries can be seen in Fig. 3.3. Structures from both libraries can also be found in section 2.

Figure 3.2: VTC curve which does not reach the logic 1 green range.

## Logic gates simulation and verification

To determine correct sizing and supply voltage, it is beneficial to apply realistic input stimuli and output loads to the device under test. Fig. 3.4 shows a block diagram of the test-bench used, where the input generation and output load blocks each consist of two inverters in series, also referred to as buffers. Logic gates with multiple inputs had multiple input generation blocks. By applying process and mismatch variation to the test-bench, the input generation blocks and load blocks create realistic input voltages and loads. As mentioned in section 3.3.1, each logic gate was sized according to the margining methodology under the effects of process and mismatch variation. This was done during pre-layout simulation, that is, without parasitics or layout effects. The specific criterion used to determine the minimum supply voltage for each library was that the worst logic gate (NAND, NOR or NOT) in each library, configured in its worst-case configuration, must pass 1000 out of 1000 Monte Carlo simulations under the specification set by the margining methodology. As mentioned in section 2.2, increasing the area of the gates lowers mismatch. Thus, the logic gates were sized with the smallest feature lengths that still pass the criterion set by the margining methodology, in order to conserve total area and MOSFET capacitance. Running just 1000 Monte Carlo simulations without parasitics and layout effects means in essence that the functional yield of each gate is not known with high confidence, as pre-layout simulation can differ drastically from post-layout simulation. As such, the focus during pre-layout simulation was for the mean or typical logic gate to exhibit a good typical voltage level deviation rather than to find rare failure events. The resulting gate widths and lengths can be found in the results, section 4.2.

Figure 3.3: ST and REG logic gate library, figure copy found in appendix, Fig. A.1


Figure 3.4: Test-bench block diagram

### 3.3.2 Design of combinational and memory element circuits

Apart from the memory elements, the SCM needs a few additional structures to operate. Fig. 3.5 shows a sketch of the SCM architecture in principle, consisting of a one-hot 7-to-128 write-address decoder (WAD), clock gates, 128-to-1 MUXes, data flip-flops as read address registers, and data flip-flops acting as memory elements. While Fig. 3.5 shows the principle, the complete SCM was designed to hold 1024 bits of data, organized as 8 columns and 128 lines (rows). Buffers and inverter trees were used to limit the internal fan-out of the circuit to the maximum values chosen in section 3.3.1, where the NAND and NOR logic gates have a maximum fan-out of 1 and the NOT gates have a maximum fan-out of 2. As such, buffers and inverter trees similar to those shown in section 2.4.1 were inserted where appropriate. Inverter trees are usually used when the required fan-out is high, such as for the clock signal, which needs to be propagated to many different clock gates and data flip-flops. If the required fan-out of a logic gate was low, such as two or three, buffers were preferred, as a 1-to-2 fan-out tree needs three inverters while a single buffer only needs two.


Figure 3.5: Simple SCM sketch

### 3.3.3 Write, Read, Hold -logic and functionality

The write logic implemented in the SCM uses a 7-to-128 one-hot decoder (WAD) and clock gates to select the line (row) of the SCM to be written. The 7-to-128 one-hot decoder is constructed from the 3-to-8 and 4-to-16 decoders seen in section 2.8. Depending on the write address input bus WAddr[k-1:0] (also referred to as WADDR[6:0]), the WAD generates a single logic 1, which is fed to a clock gate as seen in Fig. 3.5. The clock gates, which are based on a DFF, a NOR gate and a NOT gate, gate the clock of the lines (rows) that are not selected by the WAD, such that each line stores its current value until the write logic accesses the line again. Separate from the write logic, the read logic of the 1024-bit SCM comprises 8 different 128-to-1 multiplexers. These multiplexers are constructed by chaining the simple 2-to-1 multiplexers seen in Fig. 2.11, section 2.7. By chaining simple 2-to-1 multiplexers, the internal fan-out is kept low. The flip-flops seen in the figure use the same D flip-flop architecture as seen in Fig. 2.9, section 2.6. Each read and write takes approximately 2 clock cycles. When writing to a line selected by the WAD, the clock gate needs the first positive clock edge, and the memory elements on the selected line need the next positive edge to copy the value specified by the DataIn(c) pins (also referred to as DIN[7:0]). When reading a line (row), the RAddr[k-1:0] signals (also referred to as RADDR[6:0]) are saved in the read address flip-flops; the first clock edge releases the signals, selecting which line the multiplexers should output. The highest delay path in this architecture is in the read path, where RADDR[0] has to travel through a maximum of 11 NOT gates and 14 NAND gates from its read address flip-flop.
Thus, once the read address flip-flops output the select signals to the multiplexers, the correct line appears at the output DataOut(n:0) (also referred to as D[n:0]) about one clock period later.
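
The two-cycle write and the registered read address can be illustrated with a cycle-level behavioural sketch. This is not the gate-level design: the class name, the use of `waddr=None` to indicate that no line is selected, and the simplification that the output is read directly from the addressed line are all assumptions made for the example.

```python
class BehavioralSCM:
    """Cycle-level sketch of a 128-line by 8-bit SCM (not the gate-level design)."""

    LINES, WIDTH = 128, 8

    def __init__(self) -> None:
        self.mem = [0] * self.LINES
        self.gated_line = None  # line whose clock gate opened on the previous edge
        self.raddr_reg = 0      # read address flip-flops

    def posedge(self, waddr, din: int, raddr: int) -> None:
        """One positive clock edge; waddr=None means no write (hypothetical)."""
        if self.gated_line is not None:   # second edge of a write: line copies DIN
            self.mem[self.gated_line] = din & 0xFF
        self.gated_line = waddr           # first edge: clock gate latches the select
        self.raddr_reg = raddr            # read address is registered

    @property
    def dout(self) -> int:
        """MUX output, valid about one clock period after the address edge."""
        return self.mem[self.raddr_reg]

scm = BehavioralSCM()
scm.posedge(waddr=5, din=0xA7, raddr=0)     # edge 1: clock gate for line 5 opens
scm.posedge(waddr=None, din=0xA7, raddr=5)  # edge 2: line 5 copies DIN, read address latched
print(hex(scm.dout))  # 0xa7
```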

### 3.3.4 Verification and chip-yield estimation

As section 2.2 explains, there are certain challenges when it comes to estimating the yield of sizeable circuits under the effects of process and mismatch variation. Bigger circuits can have very high dimensionality. While algorithms such as Scaled Sigma Sampling can deal with the high dimensionality of large designs, simulation time is still a problem. If one simulation takes 10 minutes, 2200 simulations of Scaled Sigma Sampling take about 15 days if run sequentially. Certain statistical models have been proposed, such as the Poisson yield model (section 2.2.2), which might work reasonably well for estimating circuit yield, since memory circuits have the advantage over other kinds of circuits that they usually have a high degree of component replication. As such, the functional yield of the memory elements is estimated using a simple test-bench similar to the one explained in section 3.3.1. Fig. 3.6 shows the test-bench, where the input generation blocks should generate typical input voltages. Data flip-flops are used as memory elements in the SCM. Thus, functional operation of the memory elements depends heavily on when the triggering clock edge and data signal appear at the memory element. The test-bench is set up such that setup and hold times are met for all SSS simulations, by using a very low clock frequency and a well-timed D signal. Fig. 3.7 shows the timing diagram of the input and output signals over three clock periods, where the Q and Q_bar signals should flip at every positive edge. The only requirement is that, given appropriate input stimuli and output load, the memory element takes on the new value specified by the D signal after every positive CLK edge has arrived at the D flip-flop. The most important assumption of the Poisson yield method is that all memory elements are treated as independent. In addition, the voltages and loads created by the input generation and output load blocks are assumed to be typical. Although the Poisson yield method might produce a good estimate of circuit yield, a critical-path methodology was also employed to verify typical operation of the memory with more representative input stimuli and output loads. This also reveals the highest delay path and thus the operating frequency of the SCM.

Figure 3.6: DFF test-bench


Figure 3.7: DFF timing-diagram

## Critical-path methodology

The critical-path methodology used in this thesis for estimating the functional chip yield of the 1024-bit memory array takes inspiration from [21] and [22], where a critical-path methodology of some sort is used to estimate the functional yield of the whole memory based on a smaller number of components. The chosen critical path is shown in Fig. 3.8. The main idea of the methodology is to build a model to verify that if write, read and hold functionality holds for one bit of the array, the rest of the memory elements will have a similar pass rate. Choosing a typical path with a low amount of switching activity to model the whole memory keeps the simulation time minimal. The circuitry is reduced to just the signal propagation path and the immediately connected circuitry. During simulation, the post-layout effects and parasitic components of the logic gates are used, and not a complete deconstructed version of the whole memory.


Figure 3.8: A reduced critical path; a copy is found in the appendix, Fig. A.2

## Critical path simulation

During simulation the write address switches from WADDR[6:0]="0100010" to WADDR[6:0]="0000000" and back to WADDR[6:0]="0100010"; the clock gate is then enabled and the D0 pin outputs a "1". The memory element then discards the logic 0 that was initially stored and stores a logic 1. The read address pin then goes from "1" to "0", which opens a path through the reduced 128-to-1 MUX so that the value stored in the memory element can be observed at the output. Note that the main reason for choosing to simulate bit 0 at address "0000000" is the highest delay path explained in section 3.3.3, where the RAddr(0) signal needs to travel through at most 11 NOT-gates and 14 NAND-gates from the read address 0 flip-flop (R_reg in Fig. 3.8). Thus, a decision was made to first simulate just the propagation delay between the read address 0 flip-flop and the output, and from this choose an appropriate clock frequency for simulating the whole path. This means that while estimating the clock frequency, process and mismatch variation is applied only to the relevant gates, whereas while estimating the actual functional yield of the path, every gate has process and mismatch variation applied. The timing diagram for the simulation is seen in Fig. 3.9, which better shows how the different signals switch relative to the CLK signal.


Figure 3.9: Signal timing diagram for writing a single logic 1 to bit 0 at address "0000000", then reading this same address.

### 3.3.5 SCM Power and Performance

The critical-path methodology explained in section 3.3.4 reveals the nominal clock frequency of the SCM. However, power consumption cannot be estimated well using this method. As such, a decision was made to extract power and energy consumption at the nominal tt-corner (typical-transistor corner), which should produce power and energy results that are typical during maximum activity in the SCM. To estimate the maximum write and read energy, maximum switching during the operations is needed. Thus, starting at WADDR[6:0]="1111111", maximum write energy was estimated by writing DIN[7:0]="11111111" to WADDR[6:0]="0000000", flipping all bits in memory at this address and ensuring maximum switching activity in the write logic of the SCM. Maximum read energy was estimated by starting at RADDR[6:0]="1111111" and then selecting RADDR[6:0]="0000000", ensuring maximum switching activity in the read logic of the SCM. Write and read energy is estimated using $E_{tot}=\int_{t_{1}}^{t_{2}} I_{Source} \cdot V_{DD} \, dt$. Since power is energy expended per second, integrating the power over the time interval $t_{1}$ to $t_{2}$ gives the energy expended in this interval. Write energy is estimated over the two clock periods in which there is maximum switching. Similarly, the read energy is estimated over the two clock periods in which the read logic switches at its maximum.
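The energy integral above can be approximated numerically from sampled transient data. The following is a minimal sketch using the trapezoidal rule; the function name and the sample waveform (a constant 1 nA supply current over two clock periods at 150 Hz and 87 mV) are illustrative assumptions and not extracted simulation data.

```python
def energy_from_supply_current(time_s, current_a, vdd_v):
    """Approximate E_tot = integral(I_source * V_DD dt) with the trapezoidal rule."""
    if len(time_s) != len(current_a):
        raise ValueError("time and current must have the same length")
    energy = 0.0
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        mean_current = 0.5 * (current_a[k] + current_a[k - 1])
        energy += mean_current * vdd_v * dt
    return energy


# Illustrative waveform only: constant 1 nA over two clock periods at 150 Hz.
clock_period_s = 1.0 / 150.0
time_s = [k * 2.0 * clock_period_s / 1000 for k in range(1001)]
current_a = [1e-9] * len(time_s)
energy_j = energy_from_supply_current(time_s, current_a, vdd_v=0.087)
```

In practice the time and current lists would come from the exported transient waveform of the supply source, with the integration limits set to the two clock periods of maximum switching.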

### 3.4 7-stage ring oscillators

Using the two logic gate libraries from section 3.3.1, 7-stage ring-oscillators were constructed to benchmark power and performance for each logic gate. In total there are six different 7-stage ring-oscillators, where the configuration of each ring-oscillator is seen in Fig. 3.10. By pulling one of the input terminals of a NAND gate to the supply voltage, it acts as an inverting stage: if the previous NAND outputs a logic 1, it outputs a logic 0. Similarly, the NOR gate can be made to act as an inverting stage by pulling one terminal to ground. Thus, 7-stage ring-oscillators can be made from simple NAND and NOR gates, as Fig. 3.10 shows. There are multiple ways to configure a NAND or NOR gate to act as an inverting stage, as the truth table (Fig. 2.1) in section 2.4 shows. For the later comparison between ST and REG logic gates, it is important that the same logic gate types used in the respective ring-oscillators have the same configuration. For example, both the ST-NAND based and the REG-NAND based ring-oscillators have the upper terminal pulled to the supply voltage.

### 3.4.1 Power, Performance extraction

Figure 3.10: Ring-oscillator gate level schematic

As section 2.1.3 explains, total power consumption can be estimated as the supply voltage times the current drawn by the circuit from the supply source. Cadence Virtuoso can extract all transient currents and voltages of a design, so power and delay can be estimated accurately. The extracted results show the average power consumption, oscillating frequency and jitter, where the average power consumption is the total average power consumed by the circuit. From these results, PDP and EDP (Equations 2.7 and 2.8, see section 2.1.3) can be estimated, which should indicate how good the power and performance of each logic gate are in comparison to the others. Since the oscillators are constructed from seven identical stages, the average power consumption of each logic gate in an oscillator circuit should be: average power consumption per stage = average power/7. In addition, the delay of each stage can be computed from Equation 2.20, section 2.5. Equation 3.1 shows the rewritten form of Equation 2.20, where the delay per stage is estimated from the number of stages $N$ and the oscillating frequency. By computing the average power consumption per stage and the delay per stage, EDP and PDP can be computed to indicate how good the power and performance of each gate are.

$$
\begin{equation*}
t_{n}=\frac{1}{2 f_{o s c} N} \tag{3.1}
\end{equation*}
$$
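As a worked illustration of Equation 3.1 and the per-stage metrics, the sketch below computes the delay per stage, the power per stage, PDP and EDP. It assumes the conventional definitions PDP = power times delay and EDP = PDP times delay, which may differ in detail from Equations 2.7 and 2.8; the function names and input values are hypothetical.

```python
def stage_delay(f_osc_hz, n_stages=7):
    """Delay per stage from Equation 3.1: t_n = 1 / (2 * f_osc * N)."""
    return 1.0 / (2.0 * f_osc_hz * n_stages)


def stage_metrics(avg_power_w, f_osc_hz, n_stages=7):
    """Per-stage power, delay, PDP and EDP of an N-stage ring-oscillator.

    Assumption: PDP = P_stage * t_n and EDP = PDP * t_n (conventional forms).
    """
    delay_s = stage_delay(f_osc_hz, n_stages)
    power_w = avg_power_w / n_stages
    pdp_j = power_w * delay_s
    edp_js = pdp_j * delay_s
    return {"power_w": power_w, "delay_s": delay_s, "pdp_j": pdp_j, "edp_js": edp_js}


# Hypothetical inputs: 1.4 pW total average power at a 1 kHz oscillating frequency.
metrics = stage_metrics(avg_power_w=1.4e-12, f_osc_hz=1000.0)
```

Feeding in the measured average power and oscillating frequency of each of the six oscillators gives directly comparable per-gate figures.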

### 3.5 Layout and parasitic extraction

In addition to the sizing of the logic gate libraries explained in section 3.3.1, parasitic and layout effects need to be extracted for the logic gates, data-flip-flop, SCM and ring-oscillators. The foundry models give only a certain accuracy when simulating without considering the effects of the physical layout, which means that the functional yield of a logic gate can differ considerably between pre- and post-layout simulation, that is, without and with layout-dependent effects. MOSFET model parameters such as the threshold voltage can change, and parasitics such as capacitors and resistors can appear, due to signal metal routing and placement of MOSFET devices. A bad physical layout can then result in simulations that differ greatly from the pre-layout simulation. As such, the layout of the logic gate libraries is done first, which enables the physical layout of the data-flip-flop, the standard-cell memory and the 7-stage ring-oscillator circuits. By accounting for layout effects and parasitic components during simulation, functional yield approximation using the Poisson yield model and the critical-path methodology should produce more realistic results. In addition, the power and performance results of the SCM and ring-oscillators should also be more realistic. All physical layouts need to pass DRC (design rule check) and LVS (layout versus schematic). No transistor chaining is used, to limit the length-of-diffusion effect [34]. For all layout designs, parasitics and layout effects are extracted at the nominal RC (resistor and capacitor) corner using xAct at 27 degrees Celsius.

### 3.6 Results extraction temperature, voltage, simulation number and confidence level

Temperature, supply voltage, confidence level and number of simulations are variables that need to be chosen for the results to be realistic. All results are extracted at 27 degrees Celsius, if not stated otherwise. The results for the ST-based ring-oscillators are extracted at supply voltages of 87 mV and 100 mV, while the REG-based ring-oscillators are extracted at just 100 mV. This is a direct result of the minimum supply voltage of each library; refer to section 4.2 regarding those results. Thus, 100 mV was chosen as a common operating voltage which enables comparison between the REG and ST logic gate libraries in terms of power and performance. The SCM related results are extracted at a supply voltage of 87 mV.

Except for the threshold voltage analysis explained in section 3.2 and the sizing of the logic gates, where Monte Carlo statistical sampling is used, all results are extracted using SSS. These SSS results include the lowest and highest recorded sample and a 95% confidence interval of the mean. A 95% confidence level is used as in [18], although other sources such as [21] and [22] use a lower confidence level of 90%. The number of simulations/samples used for extracting results affected by process and mismatch variation is approximately 2200, depending on the Cadence Virtuoso "yield verification - autostop" algorithm. The Cadence Virtuoso simulation editor requires a maximum of 2289 samples to estimate a circuit yield higher than 3 sigma with a confidence of 95%, but usually stops around 2218 samples. About 2200 simulations should give acceptable accuracy while keeping total simulation time reasonable. As such, this number is used to extract power and performance results, in addition to verifying the functional yield using the Poisson yield model and the critical-path method explained in section 3.3.4. The confidence interval computed by Cadence Virtuoso from a run of Scaled Sigma Sampling uses some form of adjusted confidence interval formula. As an example, if the specification is pass/fail, the formula used is similar to the adjusted-Wald formula (Equation 2.18). The confidence interval would then be approximately [99.7855%, 100%] for 2218/2218 simulations passing the requirement. This would mean that the highest failure rate could be approximately $1-0.997855=0.002145$, which is about 2.855997 sigma using the adjusted-Wald formula (Equation 2.13). Cadence Virtuoso instead computes the interval to be [99.865%, 100%], which corresponds to a sigma of 2.999977.
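The conversion between a lower yield bound and a sigma level quoted above can be reproduced with the inverse of the standard normal cumulative distribution function, assuming a one-sided interpretation. The sketch below also evaluates the standard zero-failure bound $n \geq \ln(1-C) / \ln(Y)$ for the number of passing samples needed to claim a yield $Y$ at confidence $C$. Whether the Cadence Virtuoso auto-stop algorithm uses this rule is not stated here and is an assumption; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def yield_to_sigma(yield_lower_bound):
    """One-sided sigma level corresponding to a yield (pass probability)."""
    return NormalDist().inv_cdf(yield_lower_bound)


def zero_failure_sample_count(target_yield, confidence):
    """Smallest n such that n passes out of n demonstrate `target_yield`
    at the given confidence (standard zero-failure bound, an assumption
    about how the sample count could arise)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(target_yield))


sigma_adjusted_wald = yield_to_sigma(0.997855)  # lower bound quoted in the text
sigma_cadence = yield_to_sigma(0.99865)         # lower bound reported by the tool
samples = zero_failure_sample_count(0.99865, 0.95)
```

With these inputs the two sigma values come out close to 2.856 and 3.0, and the sample count evaluates to 2218, which agrees with the figures quoted in the text.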

## Chapter 4

## Results

### 4.1 FD-SOI device choice

Threshold voltage analysis of the available devices was done as explained in section 3.2. Fig. 4.1 illustrates the measured absolute threshold voltages of the available N- and P-type devices for certain typical gate widths and lengths. Although these dimensions might seem a bit arbitrary, the measurements gave an indication of what might be a promising low-leakage device. The labels p[x] and n[x] indicate the same device series; n0 and p0 are the N-type and P-type MOSFETs of series 0, while n5 and p5 belong to the series with the highest threshold voltage. The n1 and p1 devices were chosen, as this series had a reasonably high threshold voltage.


Figure 4.1: Illustration of threshold voltages

|  | ST-NOT | REG-NOT | ST-NAND | REG-NAND | ST-NOR | REG-NOR | Unit |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| N0_W | 530 | 800 | 800 | 2490 | 740 | 1920 | nm |
| N00_W | - | - | 800 | - | 740 | - | nm |
| N1_W | 330 | - | 700 | 2490 | 300 | 1920 | nm |
| N11_W | - | - | 700 | - | 300 | - | nm |
| N2_W | 340 | - | 120 | - | 640 | - | nm |
| P0_W | 360 | 280 | 400 | 400 | 820 | 2250 | nm |
| P00_W | - | - | 400 | - | 820 | - | nm |
| P1_W | 120 | - | 120 | 400 | 240 | 2250 | nm |
| P11_W | - | - | 120 | - | 240 | - | nm |
| P2_W | 120 | - | 120 | - | 120 | - | nm |
| L_c | 60 | 100 | 60 | 100 | 60 | 60 | nm |

Table 4.1

### 4.2 Logic gate library sizing and minimum supply voltage

Using the methodology explained in section 3.3.1, the ST logic library was found to have a minimum supply voltage of 87 mV, while the REG logic library was found to have a minimum supply voltage of 97 mV. The NOR-type logic gate is the worst logic gate in each library, and thus decides the minimum supply voltage of each library. Table 4.1 is a lookup table showing the gate widths and lengths of the individual MOSFETs in the different logic gates. The lookup table maps to the logic gate schematics, which can be seen in Fig. 3.3 in section 3.3.1 or in the appendix, Fig. A.1. As an example, N0_W denotes the width of the N0 MOSFET, which in the ST-NOT schematic is 530 nm, and the common gate length of all MOSFETs in this logic gate is 60 nm, as denoted by the L_c row at the bottom of the table. Each library has its own common gate length, where the REG-library has a larger gate length of 100 nm, while the ST-library has 60 nm. Note that the notation "-" means that there is no MOSFET of that name in the corresponding schematic; the name N00, for example, is only used by a MOSFET in the ST-NAND and ST-NOR logic gates.

The worst-case configurations mentioned in section 3.3.1 are listed in Table 4.2. The notation 01/10 means that there is a negligible difference in output voltage between applying logic 0 to the first input terminal and logic 1 to the other, and applying logic 1 to the first input terminal and logic 0 to the other. The VOL column pertains to the configuration(s) which produce the worst logic 0 output voltage, while the VOH column pertains to the configuration(s) producing the worst logic 1 output voltage.

### 4.3 Logic gate library layout

Fig. 4.2 shows the layout design of the logic gates in 22 nm FD-SOI. The figure is included for visual comparison; a closer look at each individual logic gate is given in Fig. A.3, A.4, A.5, A.6, A.7 and A.8 in the appendix, section A.2. All cells are free of DRC/LVS errors. Each logic gate has an individual height and width; thus, some care is needed when routing power and biasing nets. Biasing and power nets were placed close to the edge of each logic gate layout, as can be seen in Fig. 4.3, which is the layout of the ST-NOT logic gate. Here PB is the back-gate biasing net of the P-type MOSFETs and NB is the biasing net of the N-type MOSFETs. VDD is the power supply net and VSS is the ground net. The blue and pink rectangles in Fig. 4.3 show the metal 1 and metal 2 routing and pins. The red rectangles show the polysilicon gates and the connection between N- and P-type MOSFETs. Because of the common gate length used in the ST logic gate library, there are two extra red vertical polysilicon rectangles per MOSFET acting as dummy MOSFETs. The REG-library, having a larger common gate length, only needs one extra red vertical polysilicon line per MOSFET acting as a dummy, as can be seen in the REG-NOT layout design in Fig. A.3. Depending primarily on the gate length of a MOSFET, a certain number of dummy gates per MOSFET is required to pass the design rule check (DRC).

| Gate | VOL | VOH |
| :--- | :--- | :--- |
| ST-NOT | 1 | 0 |
| ST-NAND | 11 | 01/10 |
| ST-NOR | 01/10 | 00 |
| REG-NOT | 1 | 0 |
| REG-NAND | 11 | 01/10 |
| REG-NOR | 01/10 | 00 |

Table 4.2: Worst case logic gate configuration table


Figure 4.2: Visual size comparison between ST and REG logic gate library


Figure 4.3: Layout of the ST-NOT logic gate

### 4.3.1 Logic-gate layout area comparison

Table 4.3 shows the physical area of each logic gate, including widths and heights. The ST-NOR logic gate is the largest overall, measuring $15.84816 \mu\mathrm{m}^{2}$. The REG-NOT is the smallest logic gate, with an area of $3.635828 \mu\mathrm{m}^{2}$.

|  | Area | unit | Width | Height | unit |
| :--- | :--- | :--- | :--- | :--- | :--- |
| ST-NOT | 7.822584 | $\mu m^{2}$ | 2.988 | 2.618 | $\mu m$ |
| REG-NOT | 3.635828 | $\mu m^{2}$ | 1.246 | 2.918 | $\mu m$ |
| ST-NAND | 14.11882 | $\mu m^{2}$ | 4.822 | 2.928 | $\mu m$ |
| REG-NAND | 10.43577 | $\mu m^{2}$ | 2.252 | 4.634 | $\mu m$ |
| ST-NOR | 15.84816 | $\mu m^{2}$ | 4.82 | 3.288 | $\mu m$ |
| REG-NOR | 13.2823 | $\mu m^{2}$ | 2.252 | 5.898 | $\mu m$ |

Table 4.3

### 4.4 Sub-100mV SCM

### 4.4.1 DFF Memory element

Fig. 4.4 shows the layout of the data-flip-flop. The device placement is divided into two rows, where the upper row contains all the ST-NOT gates and the lower row contains all the ST-NAND gates. By using two rows, the gates can overlap to a degree dictated by the DRC rules. Overlap means that the power, ground and back-biasing nets of each gate also overlap, which removes the need for most bias net routing. The DFF measures $35.964 \mu\mathrm{m}$ in width and $5.546 \mu\mathrm{m}$ in height, giving an area of $199.456344 \mu\mathrm{m}^{2}$. The DFF has a small unused rectangular area, measuring $8.231 \mu\mathrm{m}$ in width and $2.928 \mu\mathrm{m}$ in height, in the lower left corner of the layout figure (the figure is rotated 90 degrees clockwise). Since the SCM in which the DFF is used can place other gates in this region rather than leaving it empty, the DFF can be regarded as a six-sided polygon with an effective area of $175.356 \mu\mathrm{m}^{2}$.


Figure 4.4: Layout design of the memory element DFF, measuring $35.964 \mu \mathrm{~m}$ in width and $5.546 \mu \mathrm{~m}$ in height, the layout design is here rotated 90 degrees clockwise

## Poisson yield model results

The data-flip-flop used in the SCM was verified using the methodology explained in section 3.3.4 to estimate the functional SCM yield. The memory element passed 2217/2218 simulations of Scaled Sigma Sampling with process and mismatch variation applied, with the single failing simulation not switching when appropriate stimuli were applied. The Cadence Virtuoso simulation editor estimates the functional yield confidence interval to be [99.7863%, 99.9977%], meaning that the upper bound of the failure rate is 0.002137, or 0.2137%. Using the Poisson yield model from Equation 2.17, a redundancy of at least 4 is needed to achieve a lower bound of functional chip yield of at least 90% for an array of 1024 independent memory elements. This results in a lower bound of chip yield of about 92.89%. A redundancy of 4 means that in all working chips (92.89% of all chips), at least 1020 memory elements are working. However, there are in fact a total of $1024+128+7=1159$ identical data-flip-flops in the SCM, where 1024 are memory elements, 128 are used in the clock gates and 7 are used in the read address logic. Taking all of these into account, the lower bound of functional yield is 89.4015%.
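The yield figures above can be approximated with a short sketch, assuming that Equation 2.17 has the usual cumulative Poisson form, where a chip works if at most $r$ of its $N$ independent elements fail. The function names are hypothetical, and small differences in the last digits relative to the quoted values can be expected.

```python
import math


def poisson_yield(n_elements, failure_rate, redundancy):
    """Probability that at most `redundancy` of `n_elements` independent
    elements fail, using a Poisson approximation with mean n * p."""
    lam = n_elements * failure_rate
    return sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(redundancy + 1)
    )


def minimum_redundancy(n_elements, failure_rate, target_yield):
    """Smallest redundancy for which the Poisson yield reaches the target."""
    redundancy = 0
    while poisson_yield(n_elements, failure_rate, redundancy) < target_yield:
        redundancy += 1
    return redundancy


p_fail = 1.0 - 0.997863  # upper bound of the failure rate from the text
yield_1024 = poisson_yield(1024, p_fail, 4)
yield_1159 = poisson_yield(1159, p_fail, 4)
r_min = minimum_redundancy(1024, p_fail, 0.90)
```

Under this assumption the sketch gives a minimum redundancy of 4 for the 1024 memory elements, and yields close to the quoted 92.89% and 89.40% for 1024 and 1159 flip-flops respectively.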

## Critical path results

As mentioned in section 3.3.4, the operating clock frequency was chosen after applying process and mismatch variation only to the chain of inverters and NAND-gates in the critical path. The lowest and highest propagation delay samples were found to be 1.854 ms and 5.935 ms, while Cadence computes the 95% confidence interval of the mean to be [3.314 ms, 3.357 ms]. As such, it was decided to use 150 Hz as the operating frequency when simulating the whole critical path. The same clock frequency is also used during the extraction of power and performance of the SCM. 150 Hz translates to a period of $1/f = 6.6666667$ ms. Further results regarding functional yield, power and performance should then not be affected by a too short clock period. As in the Poisson yield results in section 4.4.1, the memory element fails once, here out of 2289 simulations, with the single failing simulation not switching when appropriate stimuli were applied. Cadence Virtuoso computes the 95% confidence interval to be [99.7929%, 99.9978%], which is very similar to the Poisson yield model results in the previous section. If each of the 1159 data-flip-flops in the SCM is then, as in the Poisson yield model, treated as failing independently, the lower bound of functional chip yield is 90.4303% with a redundancy of 4.
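The choice of clock frequency can be checked against the slowest observed delay sample with a few lines of arithmetic. The sketch below is a simplification that compares the worst propagation delay sample directly with the clock period; the exact criterion used in the thesis is not stated, and the variable names are hypothetical.

```python
max_delay_s = 5.935e-3  # highest propagation delay sample from the text
clock_hz = 150.0        # chosen operating frequency

clock_period_s = 1.0 / clock_hz
slack_s = clock_period_s - max_delay_s  # margin left in one clock period
max_frequency_hz = 1.0 / max_delay_s    # frequency at which the margin is zero
```

The margin is positive at 150 Hz, while a clock much above roughly 168 Hz would no longer cover the slowest sample under this simplified criterion.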

### 4.4.2 SCM Layout and Area

To do the layout efficiently, the SCM was divided into parts. Fig. 4.5 shows a piece of the SCM, mainly 16 memory elements, some 2-to-1 multiplexers and NOT-gates, which forms part of the SCM column seen in Fig. 4.6. The layout exploits the small unused rectangular area of the memory elements mentioned in section 4.4.1, where mostly multiplexers are placed very close to the memory elements, which also reduces signal routing length. The layout design of a single SCM column together with the layout of the 7-to-128 WAD can be seen in Fig. 4.6. In the completed SCM, the WAD is located at the left edge, followed by eight memory columns. Fig. A.9 in the appendix marks the approximate location of the WAD, memory columns and read-address D-flip-flops. Fig. 4.7 shows the completed SCM, which measures $443.559 \mu\mathrm{m} \times 763.689 \mu\mathrm{m}$ (width $\times$ height), with a total area of $338741.1 \mu\mathrm{m}^{2}$. Since there are 1024 memory elements, the area per bit of the SCM is $330.8019 \mu\mathrm{m}^{2}$/bit. This is almost double the effective area of the data-flip-flop memory elements, which is $175.356 \mu\mathrm{m}^{2}$ (section 4.4.1), which means that almost 47% of the memory is dedicated to overhead circuitry.
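The area figures quoted for the DFF and the SCM follow from the stated dimensions, as the short calculation below shows. All dimensions are taken from the text; the variable names are hypothetical.

```python
# Dimensions in micrometres, as stated in sections 4.4.1 and 4.4.2.
dff_width_um, dff_height_um = 35.964, 5.546
unused_width_um, unused_height_um = 8.231, 2.928
scm_width_um, scm_height_um = 443.559, 763.689
n_bits = 1024

dff_bounding_area = dff_width_um * dff_height_um
# Effective DFF area: bounding box minus the unused corner rectangle.
dff_effective_area = dff_bounding_area - unused_width_um * unused_height_um

scm_area = scm_width_um * scm_height_um
area_per_bit = scm_area / n_bits
# Share of the per-bit area that is not occupied by the memory element itself.
overhead_fraction = 1.0 - dff_effective_area / area_per_bit
```

The overhead fraction evaluates to just under 0.47, matching the statement that almost 47% of the memory area is overhead circuitry.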


Figure 4.5: Layout design of a smaller part of the memory column


Figure 4.6: Layout design of the 7-to-128 WAD and a single memory column


Figure 4.7: Completed layout design of the Standard Cell Memory, measuring $443.559 \mu \mathrm{~m}$ in width and $763.689 \mu \mathrm{~m}$ in height.

|  | AREA | unit |
| :--- | :--- | :--- |
| REG_NAND_OSC | 66.3058 | $\mu \mathrm{~m}^{2}$ |
| REG_NOR_OSC | 84.48295 | $\mu \mathrm{~m}^{2}$ |
| REG_NOT_OSC | 21.24888 | $\mu \mathrm{~m}^{2}$ |
| ST_NAND_OSC | 94.61539 | $\mu \mathrm{~m}^{2}$ |
| ST_NOR_OSC | 107.236 | $\mu \mathrm{~m}^{2}$ |
| ST_NOT_OSC | 50.98817 | $\mu \mathrm{~m}^{2}$ |

Table 4.4: Layout area of the six ring-oscillators

### 4.4.3 SCM Power and Performance

The SCM uses a clock frequency of 150 Hz as mentioned in section.4.4.1, which translates to a clock period of 6.6667 ms . During maximum switching activity of the write logic the energy consumed was found to be about 92.37 pJ over the span of 2 clock periods. While during maximum switching of the read logic, the SCM consumes about 94.12 pJ over a period of 2 clocks. The average power consumed during these operations was found to be 6.991 nW .

### 4.5 7-stage ring-oscillators results

### 4.5.1 Ring-oscillator layout and area

To extract realistic power and performance results for the six 7-stage ring-oscillator circuits, the layout of each ring-oscillator circuit was done. A visual comparison of the six ring-oscillator circuits can be seen in Fig. 4.8, which shows, from top to bottom, the REG-NAND, REG-NOR, REG-NOT, ST-NAND, ST-NOR and ST-NOT based ring-oscillators. Each individual oscillator circuit can also be seen in section A.4 in the appendix: the REG-NOT, REG-NAND and REG-NOR in Fig. A.10, A.11 and A.12, and the ST-NOT, ST-NAND and ST-NOR in Fig. A.13, A.14 and A.15. Table 4.4 shows the total area of each ring-oscillator, where the ST-NOR based oscillator is the largest, measuring $107.236 \mu\mathrm{m}^{2}$.


Figure 4.8: The layout design of the six ring-oscillators

| VDD $=87 \mathrm{mV}$ <br> Frequency |  |  | min | max | mean.1 | mean.2 | Unit |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

Table 4.5: Table showing oscillating frequency of the three ST logic gate based ring-oscillators at 87 mV , for pre and post -layout in addition to the difference in Hz between the pre-layout simulation to the post-layout simulation

### 4.5.2 Ring-oscillator Frequency, jitter and power consumption

The tables in this section include the frequency, jitter and average power consumption extracted from the 7-stage ring-oscillator circuits. Results are extracted at 27 degrees Celsius at two supply voltage operating points, 87 mV and 100 mV. Tables showing results at 87 mV supply voltage (denoted VDD = 87 mV) only include results from ring-oscillators constructed from the ST logic gate library, as the minimum supply voltage of the REG-library was found to be 97 mV, as mentioned in section 4.2. Tables showing results at 100 mV supply voltage include results from ring-oscillators constructed from both the ST and REG logic gate libraries. The lowest and highest recorded samples are included in the tables. The mean is given as a 95% confidence interval, where the mean.1 column is the lower bound and the mean.2 column is the upper bound. Each ring-oscillator was simulated pre- and post-layout, and the difference between the two is shown in the Diff rows. As an example, in Table 4.5 at a supply voltage of 87 mV, the lower bound of the mean (mean.1) oscillating frequency of the ring-oscillator constructed from ST-NAND gates differs by 1681.2 Hz between pre- and post-layout. The same ST-NAND based ring-oscillator shows a difference of 2252.8 Hz when the supply voltage is 100 mV.

| $\mathrm{VDD}=87 \mathrm{mV}$ |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Jitter |  | min | max | mean. 1 | mean. 2 | Unit |
| ST NOT | Pre | 0 | 1.181 | $1.40 \mathrm{E}-01$ | $1.51 \mathrm{E}-01$ | Hz |
|  | Post | 0 | 0.4495 | 5.98E-02 | $6.41 \mathrm{E}-02$ | Hz |
|  | Diff | 0 | 0.7315 | 8.03E-02 | 8.64E-02 | Hz |
| ST NAND | Pre | 0 | $6.63 \mathrm{E}-01$ | $9.76 \mathrm{E}-02$ | $1.05 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $2.03 \mathrm{E}-01$ | $2.70 \mathrm{E}-02$ | $2.92 \mathrm{E}-02$ | Hz |
|  | Diff | 0 | $4.60 \mathrm{E}-01$ | 7.06E-02 | 7.58E-02 | Hz |
| ST NOR | Pre | 0 | $9.39 \mathrm{E}-01$ | $2.48 \mathrm{E}-01$ | $2.61 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $4.81 \mathrm{E}-01$ | 8.30E-02 | $8.78 \mathrm{E}-02$ | Hz |
|  | Diff | 0 | 4.57E-01 | $1.65 \mathrm{E}-01$ | $1.74 \mathrm{E}-01$ | Hz |

Table 4.6: Table showing jitter of the three ST logic gate based ring-oscillators at 87 mV , for pre and post -layout simulation in addition to the difference in Hz between the pre-layout simulation to the post-layout simulation

| $\mathrm{VDD}=87 \mathrm{mV}$ |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Average power |  | min | max | mean.1 | mean.2 | Unit |
| ST NOT | Pre | $1.22 \mathrm{E}-12$ | $8.22 \mathrm{E}-12$ | $3.09 \mathrm{E}-12$ | $3.15 \mathrm{E}-12$ | W |
|  | Post | $9.95 \mathrm{E}-13$ | $3.03 \mathrm{E}-12$ | $1.68 \mathrm{E}-12$ | $1.70 \mathrm{E}-12$ | W |
|  | Diff | $2.26 \mathrm{E}-13$ | $5.19 \mathrm{E}-12$ | $1.41 \mathrm{E}-12$ | $1.45 \mathrm{E}-12$ | W |
| ST NAND | Pre | $1.40 \mathrm{E}-12$ | $9.34 \mathrm{E}-12$ | $3.51 \mathrm{E}-12$ | $3.57 \mathrm{E}-12$ | W |
|  | Post | $7.69 \mathrm{E}-13$ | $4.76 \mathrm{E}-12$ | $1.89 \mathrm{E}-12$ | $1.93 \mathrm{E}-12$ | W |
|  | Diff | $6.35 \mathrm{E}-13$ | $4.58 \mathrm{E}-12$ | $1.61 \mathrm{E}-12$ | $1.64 \mathrm{E}-12$ | W |
| ST NOR | Pre | $3.45 \mathrm{E}-12$ | $1.03 \mathrm{E}-11$ | $5.71 \mathrm{E}-12$ | $5.77 \mathrm{E}-12$ | W |
|  | Post | $2.22 \mathrm{E}-12$ | $5.70 \mathrm{E}-12$ | $3.44 \mathrm{E}-12$ | $3.47 \mathrm{E}-12$ | W |
|  | Diff | $1.23 \mathrm{E}-12$ | $4.62 \mathrm{E}-12$ | $2.28 \mathrm{E}-12$ | $2.31 \mathrm{E}-12$ | W |

Table 4.7: Table showing average power consumption of the three ST logic gate based ring-oscillators at 87 mV , for pre and post -layout simulation in addition to the difference in W between the pre-layout simulation to the post-layout simulation

| VDD $=100 \mathrm{mV}$ |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | Freq | min | max | mean. 1 | mean. 2 | Unit |
| REG NOT | Pre | $4.17 \mathrm{E}+03$ | $1.46 \mathrm{E}+04$ | $7.88 \mathrm{E}+03$ | $7.98 \mathrm{E}+03$ | Hz |
|  | Post | $2.31 \mathrm{E}+03$ | $7.85 \mathrm{E}+03$ | $4.29 \mathrm{E}+03$ | $4.35 \mathrm{E}+03$ | Hz |
|  | Diff | 1864 | 6732 | 3587 | 3634 | Hz |
| ST NOT | Pre | $2.54 \mathrm{E}+03$ | $8.55 \mathrm{E}+03$ | $4.69 \mathrm{E}+03$ | $4.75 \mathrm{E}+03$ | Hz |
|  | Post | 862.3 | $2.80 \mathrm{E}+03$ | $1.57 \mathrm{E}+03$ | $1.59 \mathrm{E}+03$ | Hz |
|  | Diff | 1679.7 | 5748 | 3125 | 3166 | Hz |
| REG NAND | Pre | $2.19 \mathrm{E}+03$ | $7.08 \mathrm{E}+03$ | $3.96 \mathrm{E}+03$ | $4.01 \mathrm{E}+03$ | Hz |
|  | Post | $1.22 \mathrm{E}+03$ | $3.89 \mathrm{E}+03$ | $2.11 \mathrm{E}+03$ | $2.14 \mathrm{E}+03$ | Hz |
|  | Diff | 972 | 3187 | 1852 | 1876 | Hz |
| ST NAND | Pre | $1.78 \mathrm{E}+03$ | $5.87 \mathrm{E}+03$ | $3.19 \mathrm{E}+03$ | $3.23 \mathrm{E}+03$ | Hz |
|  | Post | 525.3 | $1.64 \mathrm{E}+03$ | 941.2 | 952.9 | Hz |
|  | Diff | 1251.7 | 4224 | 2252.8 | 2281.1 | Hz |
| REG NOR | Pre | $3.30 \mathrm{E}+03$ | $1.15 \mathrm{E}+04$ | $6.21 \mathrm{E}+03$ | $6.29 \mathrm{E}+03$ | Hz |
|  | Post | $2.13 \mathrm{E}+03$ | $7.16 \mathrm{E}+03$ | $3.88 \mathrm{E}+03$ | $3.96 \mathrm{E}+03$ | Hz |
|  | Diff | 1170 | 4370 | 2328 | 2333 | Hz |
| ST NOR | Pre | $2.02 \mathrm{E}+03$ | $6.56 \mathrm{E}+03$ | $3.67 \mathrm{E}+03$ | $3.72 \mathrm{E}+03$ | Hz |
|  | Post | 689 | $2.43 \mathrm{E}+03$ | $1.40 \mathrm{E}+03$ | $1.42 \mathrm{E}+03$ | Hz |
|  | Diff | 1331 | 4127 | 2265 | 2293 | Hz |

Table 4.8: Table showing oscillating frequency of the six logic gate based ringoscillators at 100 mV , for pre and post -layout simulation in addition to the difference in Hz between the pre-layout simulation to the post-layout simulation

| VDD=100mV |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | Jitter | min | max | mean.1 | mean. 2 | Unit |
| REG NOT | Pre | 0 | 2.722 | $2.95 \mathrm{E}-01$ | $3.27 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $9.87 \mathrm{E}-01$ | $1.21 \mathrm{E}-01$ | $1.32 \mathrm{E}-01$ | Hz |
|  | Diff | 0 | 1.7348 | 0.1737 | 0.195 | Hz |
| ST NOT | Pre | 0 | $8.76 \mathrm{E}-01$ | $1.23 \mathrm{E}-01$ | $1.33 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $6.08 \mathrm{E}-01$ | $6.46 \mathrm{E}-02$ | $6.94 \mathrm{E}-02$ | Hz |
|  | Diff | 0 | 0.2685 | 0.05876 | 0.06367 | Hz |
| REG NAND | Pre | 0 | $8.96 \mathrm{E}-01$ | $1.66 \mathrm{E}-01$ | $1.77 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $4.74 \mathrm{E}-01$ | $8.34 \mathrm{E}-02$ | $8.89 \mathrm{E}-02$ | Hz |
|  | Diff | 0 | 0.4221 | 0.0823 | 0.08828 | Hz |
| ST NAND | Pre | $7.73 \mathrm{E}-12$ | $7.86 \mathrm{E}-01$ | $1.24 \mathrm{E}-01$ | $1.32 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $2.57 \mathrm{E}-01$ | $3.03 \mathrm{E}-02$ | $3.29 \mathrm{E}-02$ | Hz |
|  | Diff | $7.73 \mathrm{E}-12$ | 0.5294 | 0.09328 | 0.09944 | Hz |
| REG NOR | Pre | 0 | 2.065 | $3.21 \mathrm{E}-01$ | $3.39 \mathrm{E}-01$ | Hz |
|  | Post | 0 | 2.041 | $3.47 \mathrm{E}-01$ | $3.77 \mathrm{E}-01$ | Hz |
|  | Diff | 0 | 0.024 | -0.026 | -0.0379 | Hz |
| ST NOR | Pre | 0 | 1.133 | $2.39 \mathrm{E}-01$ | $2.54 \mathrm{E}-01$ | Hz |
|  | Post | 0 | $5.01 \mathrm{E}-01$ | $9.76 \mathrm{E}-02$ | $1.03 \mathrm{E}-01$ | Hz |
|  | Diff | 0 | 0.6317 | 0.14106 | 0.1504 | Hz |

Table 4.9: Table showing the jitter of the six logic gate based ring-oscillators at 100 mV, for pre- and post-layout simulation, in addition to the difference in Hz between the pre-layout and post-layout simulations

| VDD $=100 \mathrm{mV}$ |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | Average power | min | max | mean.1 | mean.2 | Unit |
| REG NOT | Pre | $1.06 \mathrm{E}-12$ | $3.00 \mathrm{E}-12$ | $1.82 \mathrm{E}-12$ | $1.84 \mathrm{E}-12$ | W |
|  | Post | $8.27 \mathrm{E}-13$ | $2.25 \mathrm{E}-12$ | $1.38 \mathrm{E}-12$ | $1.40 \mathrm{E}-12$ | W |
|  | Diff | $2.36 \mathrm{E}-13$ | $7.55 \mathrm{E}-13$ | $4.36 \mathrm{E}-13$ | $4.41 \mathrm{E}-13$ | W |
| ST NOT | Pre | $2.52 \mathrm{E}-12$ | $7.97 \mathrm{E}-12$ | $4.24 \mathrm{E}-12$ | $4.29 \mathrm{E}-12$ | W |
|  | Post | $1.39 \mathrm{E}-12$ | $4.18 \mathrm{E}-12$ | $2.35 \mathrm{E}-12$ | $2.38 \mathrm{E}-12$ | W |
|  | Diff | $1.13 \mathrm{E}-12$ | $3.79 \mathrm{E}-12$ | $1.89 \mathrm{E}-12$ | $1.92 \mathrm{E}-12$ | W |
| REG NAND | Pre | $2.22 \mathrm{E}-12$ | $6.52 \mathrm{E}-12$ | $3.79 \mathrm{E}-12$ | $3.83 \mathrm{E}-12$ | W |
|  | Post | $1.79 \mathrm{E}-12$ | $5.12 \mathrm{E}-12$ | $2.93 \mathrm{E}-12$ | $2.96 \mathrm{E}-12$ | W |
|  | Diff | $4.32 \mathrm{E}-13$ | $1.41 \mathrm{E}-12$ | $8.62 \mathrm{E}-13$ | $8.72 \mathrm{E}-13$ | W |
| ST NAND | Pre | $2.76 \mathrm{E}-12$ | $8.26 \mathrm{E}-12$ | $4.90 \mathrm{E}-12$ | $4.95 \mathrm{E}-12$ | W |
|  | Post | $1.50 \mathrm{E}-12$ | $4.33 \mathrm{E}-12$ | $2.71 \mathrm{E}-12$ | $2.74 \mathrm{E}-12$ | W |
|  | Diff | $1.26 \mathrm{E}-12$ | $3.93 \mathrm{E}-12$ | $2.19 \mathrm{E}-12$ | $2.22 \mathrm{E}-12$ | W |
| REG NOR | Pre | $8.00 \mathrm{E}-12$ | $1.74 \mathrm{E}-11$ | $1.16 \mathrm{E}-11$ | $1.17 \mathrm{E}-11$ | W |
|  | Post | $7.03 \mathrm{E}-12$ | $1.38 \mathrm{E}-11$ | $1.00 \mathrm{E}-11$ | $1.01 \mathrm{E}-11$ | W |
|  | Diff | $9.66 \mathrm{E}-13$ | $3.56 \mathrm{E}-12$ | $1.55 \mathrm{E}-12$ | $1.52 \mathrm{E}-12$ | W |
| ST NOR | Pre | $4.75 \mathrm{E}-12$ | $1.41 \mathrm{E}-11$ | $7.86 \mathrm{E}-12$ | $7.94 \mathrm{E}-12$ | W |
|  | Post | $3.13 \mathrm{E}-12$ | $7.87 \mathrm{E}-12$ | $4.81 \mathrm{E}-12$ | $4.86 \mathrm{E}-12$ | W |
|  | Diff | $1.62 \mathrm{E}-12$ | $6.19 \mathrm{E}-12$ | $3.05 \mathrm{E}-12$ | $3.08 \mathrm{E}-12$ | W |

Table 4.10: Table showing the average power consumption of the six logic gate based ring-oscillators at 100 mV, for pre- and post-layout simulation, in addition to the difference in W between the pre-layout and post-layout simulations

| Delay (s) | Mean.1 | Mean.2 |
| :--- | :--- | :--- |
| REG-NOT | $1.64 \mathrm{E}-05$ | $1.66 \mathrm{E}-05$ |
| ST-NOT | $4.50 \mathrm{E}-05$ | $4.56 \mathrm{E}-05$ |
| REG-NAND | $3.34 \mathrm{E}-05$ | $3.39 \mathrm{E}-05$ |
| ST-NAND | $7.50 \mathrm{E}-05$ | $7.59 \mathrm{E}-05$ |
| REG-NOR | $1.81 \mathrm{E}-05$ | $1.84 \mathrm{E}-05$ |
| ST-NOR | $5.02 \mathrm{E}-05$ | $5.09 \mathrm{E}-05$ |

Table 4.11: Delay of each logic gate at 100 mV , computed from Table.4.8

| Average power (W) | Mean.1 | Mean.2 |
| :--- | :--- | :--- |
| REG-NOT | $1.98 \mathrm{E}-13$ | $2.00 \mathrm{E}-13$ |
| ST-NOT | $3.36 \mathrm{E}-13$ | $3.39 \mathrm{E}-13$ |
| REG-NAND | $4.18 \mathrm{E}-13$ | $4.23 \mathrm{E}-13$ |
| ST-NAND | $3.87 \mathrm{E}-13$ | $3.91 \mathrm{E}-13$ |
| REG-NOR | $1.43 \mathrm{E}-12$ | $1.45 \mathrm{E}-12$ |
| ST-NOR | $6.87 \mathrm{E}-13$ | $6.94 \mathrm{E}-13$ |

Table 4.12: Average power consumption of each logic gate at 100 mV , computed from Table.4.10

### 4.5.3 Individual logic gate power and performance

As mentioned in section.3.4.1, power per stage and delay per stage were extracted to compute the PDP and EDP figures of merit from the 7-stage ring-oscillator results (section.4.5.2). Using the post-layout ring-oscillator results at a supply voltage of 100 mV, the delay per stage and the average power consumption per stage are computed and can be seen in Table.4.11 and Table.4.12. Fig.4.9 is a histogram showing each gate's PDP and EDP, based on the delay per stage and average power per stage tables. For an accurate comparison between the logic gates, PDP and EDP are computed using the $95 \%$ confidence intervals of the mean. Thus, Mean.1 in the PDP and EDP table is computed from the Mean.1 columns in the delay and power per stage tables, and Mean.2 is computed similarly. The REG-NOT gate thus has a PDP in the range [3.25E-18, 3.33E-18] J and an EDP in the range [5.35E-23, 5.54E-23] Js. The histogram shows that, overall, the ST-library is worse than the REG-library in terms of power and performance. Fig.4.10 is included as a comparison of PDP and EDP for the ST logic gate library, computed from the ring-oscillator results at 87 mV and 100 mV; the computed delay and power at 87 mV can be seen in Table.4.13 and Table.4.14. The PDP of the ST logic gates does not change much, but there is a small difference in EDP, which is lower (better) at 100 mV.
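
The per-stage figures can be reproduced with a few lines of Python. This is a minimal sketch, not the thesis's own script: it assumes that one oscillation period of an N-stage ring-oscillator equals 2N gate delays (an assumption that appears consistent with Table.4.11) and that the average power is split evenly over the stages. The function names are illustrative only.

```python
# Sketch: per-stage delay/power and PDP/EDP from ring-oscillator results.
# Assumption: period = 2 * N * t_stage, and power is shared equally by all stages.

N_STAGES = 7  # number of stages in each ring-oscillator (from the text)


def stage_delay(f_osc_hz, n_stages=N_STAGES):
    """Delay per stage in seconds for an oscillation frequency in Hz."""
    return 1.0 / (2 * n_stages * f_osc_hz)


def stage_power(p_avg_w, n_stages=N_STAGES):
    """Average power per stage in watts."""
    return p_avg_w / n_stages


def pdp(power_w, delay_s):
    """Power-delay product in joules."""
    return power_w * delay_s


def edp(power_w, delay_s):
    """Energy-delay product in joule-seconds."""
    return power_w * delay_s ** 2


# REG-NOT, post-layout, 100 mV: upper frequency bound from Table 4.8 and
# lower power bound from Table 4.10 give the lower ends of the intervals.
delay = stage_delay(4.35e3)
power = stage_power(1.38e-12)
print(f"delay/stage = {delay:.3e} s, power/stage = {power:.3e} W")
print(f"PDP = {pdp(power, delay):.3e} J, EDP = {edp(power, delay):.3e} Js")
```

Because the tabulated values are rounded to three significant digits, results computed this way can differ from the quoted intervals in the last digit.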

| Delay (s) | Mean.1 | Mean.2 |
| :--- | :--- | :--- |
| ST-NOT | $6.06 \mathrm{E}-05$ | $6.14 \mathrm{E}-05$ |
| ST-NAND | $9.96 \mathrm{E}-05$ | $1.01 \mathrm{E}-04$ |
| ST-NOR | $6.75 \mathrm{E}-05$ | $6.84 \mathrm{E}-05$ |

Table 4.13: Delay of the ST logic gates at 87 mV supply voltage, computed from Table.4.5

| Average power (W) | Mean.1 | Mean.2 |
| :--- | :--- | :--- |
| ST-NOT | $2.40 \mathrm{E}-13$ | $2.43 \mathrm{E}-13$ |
| ST-NAND | $2.71 \mathrm{E}-13$ | $2.76 \mathrm{E}-13$ |
| ST-NOR | $4.91 \mathrm{E}-13$ | $4.95 \mathrm{E}-13$ |

Table 4.14: Average power consumption of the ST-logic gates at 87 mV , computed from Table.4.7


Figure 4.9: PDP and EDP histogram for each logic gate at a supply voltage of 100 mV


Figure 4.10: Histogram showing the change in the ST logic gates PDP and EDP from 87 mV to 100 mV

## Chapter 5

## Discussion

### 5.1 Device consideration

The 22 nm FD-SOI process technology was primarily chosen because it provides a cutting-edge 22 nm technology node. While the technology allows stronger back-gate biasing than most common CMOS process technologies, back-gate biasing was not investigated or applied in this thesis project. The device series used was determined through the threshold voltage analysis results seen in section.4.1. The provided device series cover a wide range of threshold voltages. In the end, the series to which n1 and p1 belong (seen in Fig.4.1), which has neither the highest nor the lowest threshold voltage, was picked. In the future, it would be interesting to use a series with a higher threshold voltage to see how much the power consumption decreases, as a higher threshold voltage means a lower leakage current in a device.

### 5.2 Regarding the logic gate sizing methodology

The margining methodology (section.3.3.1) used for sizing the logic gates resulted in two logic gate libraries that can function at sub-100 mV supply voltage. There are a few problems with the methodology that need to be discussed. The maximum voltage deviation needs to be picked carefully. On one side, too small a maximum voltage deviation means that the supply voltage probably needs to be raised, lest the logic gates at the current sizing fail to pass 1000/1000 Monte Carlo simulations. Alternatively, the logic gate sizing can be increased, which according to Pelgrom's law (section.2.2) decreases the variance of mismatch. Thus, using larger sizes for the logic gates in this thesis could lower the minimum supply voltage of each library. On the other side, if too high a maximum voltage deviation is chosen, a lower supply voltage and/or smaller sizing can be used, but the connected fan-out logic might then be unable to resolve the logic level of the gate output, which results in circuits employing the logic gates having a low functional yield. So there is a balancing act between the maximum voltage deviation, sizing, yield and minimum supply voltage. The maximum voltage deviation used in this thesis $(0.2 \cdot V D D)$ was set from experience, and did fulfil its purpose, as the SCM circuit yield goal was $90 \%$ (which includes a redundancy of 4).
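
The sizing side of this trade-off can be illustrated with Pelgrom's law in its usual form, $\sigma(\Delta V_{T}) = A_{VT} / \sqrt{W L}$. The sketch below is only an illustration: the matching coefficient `A_VT` and the transistor dimensions are placeholder values, not data from the 22 nm FD-SOI process used in this thesis.

```python
import math


def sigma_vt_mismatch(a_vt_mv_um, width_um, length_um):
    """Pelgrom's law: standard deviation of threshold-voltage mismatch in mV."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


A_VT = 2.0  # mV*um, hypothetical placeholder, NOT a foundry value

base = sigma_vt_mismatch(A_VT, width_um=0.2, length_um=0.1)
upsized = sigma_vt_mismatch(A_VT, width_um=0.8, length_um=0.1)  # 4x gate area

# Quadrupling the gate area halves the mismatch standard deviation.
print(f"sigma ratio base/upsized = {base / upsized:.3f}")
```

Under this model, halving the mismatch spread costs a factor of four in gate area, which is why larger sizing buys a lower minimum supply voltage only at a considerable area penalty.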

### 5.3 Poisson yield model and critical path methodology

The most important assumption of the Poisson yield model is that all memory elements fail independently. The Poisson yield model might be best applied to memory circuits, but its suitability depends on the architecture. It fits the SCM in this thesis because, in this architecture, a failing memory element on a bit-line should not affect any other memory element connected to the same bit-line. Do note that since the clock gates and read-address d-flip-flops are of the same type as the memory elements, any failure in these components will affect the functionality of the rest of the circuit. In this thesis, redundancy refers to memory elements that are purposefully neglected. Any circuit which uses the SCM can then employ software in the form of a self-test which determines which memory elements should be neither written nor read because they are non-functional. But there is still the problem of the d-flip-flops used in the overhead. In section.4.4.1 the lower bound of yield for 1024 memory elements was above $90 \%$ with a maximum redundancy of 4; the true lower bound is probably closer to $89.4015 \%$, since there is a total of 1159 d-flip-flops.
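
The effect of counting the overhead flip-flops can be sketched with a generic Poisson yield model with redundancy. The sketch assumes independent failures, a Poisson-distributed number of failing elements with mean $n(1-y)$, and a chip that passes when at most `redundancy` elements fail. The per-element yield used below is the DFF lower bound quoted from section.4.4.1 and is used for illustration only, so the printed values are not expected to reproduce the figures of section.4.4.1, whose exact inputs are defined there.

```python
import math


def poisson_yield(n_elements, element_yield, redundancy):
    """Probability of at most `redundancy` failures among n independent elements.

    Failures are modelled as Poisson with mean n * (1 - element_yield).
    """
    lam = n_elements * (1.0 - element_yield)
    return sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(redundancy + 1)
    )


Y_DFF = 0.997929  # illustrative per-element yield (DFF lower bound from the text)

for n in (1024, 1159):  # memory elements only vs. all d-flip-flops
    print(n, f"{poisson_yield(n, Y_DFF, redundancy=4):.4f}")
```

The trend is what matters here: including the 135 overhead flip-flops raises the expected number of failures and therefore lowers the estimated yield for the same redundancy.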

The critical path methodology shows similar results and is regarded as the more accurate method for estimating functional circuit yield. There is a small difference in sample count: the critical path methodology resulted in a confidence interval based on more samples, and should thus in theory be more accurate. In addition, the input stimuli and output load of the memory element under test should be more typical, as the behaviour of the circuit is better modelled using a reduced path than using just the memory element itself and some inverters. However, the method has low coverage, as only one path is modelled and the whole SCM is not simulated. Thus, the confidence will not be $95 \%$. The results of this method are very similar to the Poisson yield model results, as there is a single failing simulation, in which the memory element failed to switch to the proper value. The true lower bound of functional yield is likely above $90 \%$ with a maximum redundancy of 4. But a redundancy of 4 is more complex than it seems. Consider the case where four clock gates fail in one SCM. The SCM then only has 1024-32=992 functional memory elements. If instead four read-address flip-flops fail, an SCM might only be able to read 8 addresses, which translates to 64 bits. However, considering that there are a lot more memory elements in the SCM than read-address flip-flops and clock gates, both cases are unlikely. After all, since the lower bound of functional yield for a DFF in the SCM is 99.7929\% (section.4.4.1), the read-address flip-flops as a standalone component will have a lower bound of yield equal to $0.997929^{7}=98.5593 \%$. Likewise, the lower bound of yield for the clock gates as a standalone component is $0.997929^{128}=76.6928 \%$. Do note that the probability functions of the clock gates and read-address flip-flops need to be merged with that of the memory elements to find the actual approximate circuit yield and what this yield means in terms of functional memory elements.
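
The standalone figures above follow from treating each group as a series system of independent elements, so that the group yield is the element yield raised to the number of elements. A short check, using only numbers stated in the text (the variable names are illustrative):

```python
Y_DFF = 0.997929        # lower bound of functional yield for one DFF
N_READ_ADDR_FF = 7      # read-address flip-flops
N_CLOCK_GATES = 128     # clock gates, one per address
BITS_PER_ADDRESS = 8    # 1024 bits / 128 addresses

# A group works only if every element in it works.
yield_read_addr = Y_DFF ** N_READ_ADDR_FF
yield_clock_gates = Y_DFF ** N_CLOCK_GATES
print(f"read-address flip-flops: {100 * yield_read_addr:.4f} %")
print(f"clock gates:             {100 * yield_clock_gates:.4f} %")

# Worst-case meaning of "redundancy 4" for the two overhead groups.
bits_left_clock_gate_fail = 1024 - 4 * BITS_PER_ADDRESS
bits_left_addr_ff_fail = 2 ** (N_READ_ADDR_FF - 4) * BITS_PER_ADDRESS
print(bits_left_clock_gate_fail, bits_left_addr_ff_fail)
```

The last two values show why the same redundancy count can correspond to very different amounts of usable memory, depending on which component fails.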

In the future, a goal should be to use a method with better coverage, one that considers all components in the SCM and states specifically what number of functional elements the final yield refers to. One way to do this is to simulate the complete SCM using SSS or Monte Carlo with sufficient samples. This is of course a time-intensive method, but the most accurate, as the methods used in this thesis to estimate chip yield rely on the probability of functionality of a single memory element to model a circuit which has many more dimensions.

### 5.4 Tools impact on results

There are many sources of error which can adversely affect the results. Besides the human error that always affects result accuracy, the results rely on trust in the models provided by the foundry, the Spectre simulator, Cadence Virtuoso layout and the simulation post-processing.

The physical layout must satisfy DRC and LVS rules, which should allow for many different designs that can be produced by the foundry, and should produce accurate extracted parasitics and layout effect models for inclusion in any post-layout simulation. However, extracting physical layout effects and parasitics at corners other than nominal RC would probably produce different results, and should be considered in further development.

Most results are extracted using SSS with process and mismatch variation applied, so the results should be realistic. However, the way in which the confidence intervals are computed by Cadence is unknown. As mentioned in section.3.6, the way in which the pass/fail specification is computed is similar to the adjusted-Wald method of computing confidence intervals. This adds another uncertainty to the results, as the lower bound of functional circuit yield is computed from precisely the confidence interval reported by Cadence Virtuoso.
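
For reference, the textbook adjusted-Wald interval for a pass/fail proportion can be sketched as follows. This is the standard formulation, not Cadence's implementation, which the text notes is unknown; the sample counts in the example are hypothetical.

```python
import math


def adjusted_wald(successes, trials, z=1.96):
    """Adjusted-Wald confidence interval for a binomial proportion.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials before applying the
    ordinary Wald formula; z = 1.96 corresponds to roughly 95 % confidence.
    """
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical example: one failing sample out of 1000 simulations.
lower, upper = adjusted_wald(successes=999, trials=1000)
print(f"yield lower bound = {lower:.6f}, upper bound = {upper:.6f}")
```

The lower bound of such an interval is the quantity that feeds the circuit yield estimate, so any difference between this formula and the one used by the tool propagates directly into the reported yield.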

### 5.5 Ring-oscillator results accuracy

The jitter of the ring-oscillators seen in Table.4.6 and Table.4.9 shows that the frequency does jitter in most cases. Overall, however, the jitter is low, with the REG-NOT oscillator having the highest sample of 2.722 Hz in Table.4.9. This means that the oscillating frequency extracted using Cadence Virtuoso is likely accurate and good enough for the purpose of this thesis.

### 5.6 Logic gate libraries results

The minimum supply voltage of each logic gate library designed in this thesis (section.4.2) shows that the ST-logic library has a much lower minimum supply voltage than the REG-library, while it loses in terms of area, PDP and EDP, as can be seen in Table Fig.4.3 and Fig.4.9. Using ST-structures can thus be seen as a supply voltage reduction technique, trading area, power and performance for a lower supply voltage, which is similar to what previous research in the sub-100 mV supply voltage domain has shown [6].

### 5.7 SCM discussion

The SCM designed and simulated in this thesis has a simple architecture, where low internal fan-out is prioritized. Each logic gate has a low fan-in because of the chosen logic gate transistor architectures, and the fan-out of each gate was chosen to be as low as possible. Fan-in and fan-out are directly linked to output voltage deviation, and thus to functional yield. While this problem was solved by inserting buffers and inverter tree structures in the SCM, a lot of SCM area could be saved if a higher fan-out were chosen for each logic gate. Buffers and inverter trees also introduce some delay and increased power consumption. Both the read and write logic can be improved in terms of power and performance. The 128-to-1 multiplexers, while having a low internal fan-out, have a very large delay and are susceptible to glitches. The write logic employs a pre-decoder in the form of a 3-to-8 decoder, such that there is reduced switching activity in the WAD: since the 3-to-8 decoder selects only one 4-to-16 decoder in the WAD, the rest will have reduced switching activity.

Increasing the SCM size, either by adding columns to store more bits per address or by adding more rows so that the memory has more addresses, would likely increase power consumption. When adding rows, and thus addresses, the multiplexers would need to choose between $128+x$ rows in a column, which would probably decrease performance, as the longest delay path in the SCM is through the multiplexers. Adding another column should mostly affect the power consumption, as there is no need to enlarge the multiplexers.

The current results are satisfactory, as the goal of a sub-100 mV memory is achieved. There is a margin of almost 13 mV, which means that the supply voltage could be increased. In such a case, the on-to-off current ratio of the logic gates would improve, which would improve the maximum voltage deviation, possibly improving the functional yield of the SCM. When it comes to power and performance, there is not a lot of difference, as seen from the PDP in Fig.4.10 (section.4.5.3). EDP shows a slight difference between 87 mV and 100 mV. To the author's knowledge, the SCM in this thesis is the first sub-100 mV supply voltage memory design at a size of 1024 bits.

|  | This project SCM | 8x8-bit multiplier from [6] |
| :--- | :--- | :--- |
| Average power | 6.991 nW | 17.9 nW |
| Frequency | 150 Hz | 5200 Hz |
| Supply voltage | 87 mV | 62 mV |
| Area | 338741.1 $\mu \mathrm{~m}^{2}$ | 45002 $\mu \mathrm{~m}^{2}$ |

Table 5.1: Comparison between the SCM in this thesis and the 8x8-bit multiplier from [6]

### 5.7.1 SCM comparison to similar circuits

Table.5.1 shows a comparison between the 8x8-bit multiplier test structure from [6] and the SCM in this thesis. Note that the 8x8-bit multiplier results in the table pertain to one of the S16 test structures in [6], which had a minimum supply voltage of 62 mV. Regarding functional yield, the 8x8-bit multiplier chips from [6] seem to have a $90 \%$ circuit yield, as [6] reports that out of ten produced chips, one failed due to a bonding error. The lower bound of functional yield in this thesis was simulated and computed to be $90.4303 \%$ using the critical path methodology (section.4.4.1). Note that this includes a maximum redundancy of 4.

Except for the average power consumption, the SCM in this thesis has a lower frequency, larger layout area and higher supply voltage than the 8x8-bit multiplier. The SCM power was extracted at the tt-corner without running scaled-sigma sampling, so the typical power consumption result for the SCM is somewhat inaccurate. Future development should therefore consider the methodology proposed in section.5.3, where simulating the complete memory using SSS would produce a more realistic power consumption. Frequency is very application dependent, and was chosen to be 150 Hz in this thesis.

With respect to area per bit and percentage overhead, the SCM has $330.8019 \mu \mathrm{~m}^{2} /$ bit, with the area overhead consuming just under $47 \%$ of the total area. The $512 \times 13$-bit memory in [32], whose total layout area is $520 \times 480 \mu \mathrm{~m}^{2}$, would then have an area per bit of $(520 \cdot 480) /(512 \cdot 13)=37.5 \mu \mathrm{~m}^{2} /$ bit. It seems that each bit-cell from [32] has a width and height of $4.74 \mu \mathrm{~m}$, which would indicate that the overhead consumes just above $40 \%$ of the layout area $\left(1-\left(4.74^{2} / 37.5\right)=0.400864\right)$. In addition, while the memory in [32] uses energy per operation as a figure of merit, which is 1 nJ per operation, it also has a power consumption of $1.196 \mu W$. The SCM in this thesis uses just 92.37 pJ in write energy and 94.12 pJ in read energy, with an average power consumption of just 6.991 nW. This seems reasonable, as the memory in [32] operates at a nominal supply voltage of 310 mV. Even though the SCM in this thesis uses roughly 9 times the area per bit, its supply voltage is much lower at just 87 mV.
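
The area figures in this comparison are plain arithmetic and can be checked with a short sketch. It uses only values stated in the text (the total layout area of the memory in [32] is taken as 520 x 480 square micrometres); the variable names are illustrative.

```python
# Area per bit of the SCM in this thesis.
scm_area_um2 = 338741.1
scm_bits = 1024
scm_area_per_bit = scm_area_um2 / scm_bits

# Area per bit and overhead share of the 512 x 13-bit memory in [32].
ref_area_um2 = 520 * 480
ref_bits = 512 * 13
ref_area_per_bit = ref_area_um2 / ref_bits
ref_bitcell_side_um = 4.74
ref_overhead = 1 - ref_bitcell_side_um ** 2 / ref_area_per_bit

print(f"SCM:  {scm_area_per_bit:.4f} um^2/bit")
print(f"[32]: {ref_area_per_bit:.1f} um^2/bit, overhead {100 * ref_overhead:.2f} %")
print(f"area-per-bit ratio: {scm_area_per_bit / ref_area_per_bit:.2f}")
```

The ratio printed on the last line quantifies the area penalty of the standard-cell approach relative to the custom bit-cell memory.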

### 5.8 Further development

As mentioned in section.5.3, a more accurate simulation of the SCM is needed. While the current yield estimation methods used in this thesis show good results for the SCM, a closer look at redundancy and how it affects the SCM is needed. Simulating the whole memory with SSS or Monte Carlo could reveal a more accurate functional yield, while also providing more accurate power and performance estimates. If the functional chip yield is satisfactory, the design could be realized on a physical chip. Producing a number of chips would reveal the true circuit yield, power and performance, in addition to the minimum supply voltage. Both the operating temperature range and body bias would be interesting to investigate and apply to the SCM. The current SCM is constrained, as a single temperature of 27 degrees Celsius means that the SCM might only be suited for application in a lab environment, where the temperature is constant. It would also be interesting to construct the SCM from a different device series with a higher threshold voltage, thus possibly decreasing leakage and static power consumption, and to investigate techniques that enable high fan-out in order to reduce the number of buffers and the size of the inverter trees. The SCM also has a simple architecture, so it would be interesting to implement other DFF or latch architectures as memory elements, and to investigate substitutes for the read and write logic which are robust but power saving.

## Chapter 6

## Conclusion

This thesis focused on the construction of a standard-cell memory operating at a supply voltage lower than 100 mV. The project presents two logic gate libraries consisting of the most basic common logic gates (NAND, NOR and NOT). The Schmitt-Trigger (ST) logic gate library is used to construct a very simple SCM, consisting of 1024 data-flip-flops, 128-to-1 multiplexers, one-hot and one-cold decoders and clock gates. Schmitt-Trigger logic gates have an improved on-to-off current ratio, thus effectively lowering the minimum supply voltage at which a logic gate can operate. As a comparison to the ST logic gate library, a library consisting of classic logic gates is constructed. Minimum supply voltage and area are compared. The individual logic gates are benchmarked in terms of power and performance by constructing a 7-stage ring-oscillator circuit from each logic gate. The oscillating frequency and power consumption of the six ring-oscillators are extracted and compared using the PDP and EDP figures of merit. The comparison shows that while the ST-logic library has worse power, performance and area, its minimum supply voltage is lower than that of the classic logic gate library. Thus, Schmitt-Trigger logic gates would be interesting for practical applications in other supply-voltage-limited circuits.

Since the ST-library achieved a minimum supply voltage of 87 mV, the SCM area, power and performance, in addition to functional yield, are extracted and computed at this supply voltage. Compared to similar sub-100 mV circuits, the SCM has a lower power consumption, while its performance is worse and its minimum supply voltage and layout area are higher. The functional yield is comparable, as its lower bound is estimated to be $90.4303 \%$ with a redundancy of 4. The average power consumption was found to be 6.991 nW, the physical area consumption is $338741.1 \mu \mathrm{~m}^{2}$, and the operating frequency is 150 Hz. The SCM is, to the author's knowledge, the first design of a 1024-bit memory array at sub-100 mV supply voltage.

## Bibliography

[1] H. Soeleman, K. Roy and B. Paul, 'Robust subthreshold logic for ultra-low power operation,' IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 1, pp. 90-99, 2001. DOI: 10.1109/92.920822.
[2] X. Wu, I. Lee, Q. Dong, K. Yang, D. Kim, J. Wang, Y. Peng, Y. Zhang, M. Saligane, M. Yasuda, K. Kumeno, F. Ohno, S. Miyoshi, M. Kawaminami, D. Sylvester and D. Blaauw, 'A 0.04 mm$^{3}$ 16 nw wireless and batteryless sensor system with integrated cortex-m0+ processor and optical communication for cellular temperature measurement,' in 2018 IEEE Symposium on VLSI Circuits, 2018, pp. 191-192. DOI: 10.1109/VLSIC.2018.8502391.
[3] M. Alioto, 'Enabling the Internet of Things: From Integrated Circuits to Integrated Systems,' [Online]. Available: https://link.springer.com/book/10.1007/978-3-319-51482-6.
[4] Y. K. Ramadass and A. P. Chandrakasan, 'A batteryless thermoelectric energyharvesting interface circuit with 35mv startup voltage,' in 2010 IEEE International Solid-State Circuits Conference - (ISSCC), 2010, pp. 486-487. DOI: 10.1109/ISSCC.2010.5433835.
[5] D. S. Weiss, R. Kirsner and W. H. Eaglstein, 'Electrical Stimulation and Wound Healing,' Archives of Dermatology, vol. 126, no. 2, pp. 222-225, Feb. 1990, ISSN: 0003-987X. DOI: 10.1001/archderm.1990.01670260092018. eprint: https://jamanetwork.com/journals/jamadermatology/articlepdf/551423/archderm\_126\_2\_018.pdf. [Online]. Available: https://doi.org/10.1001/archderm.1990.01670260092018.
[6] N. Lotze and Y. Manoli, 'A $62 \mathrm{mv} 0.13 \mu \mathrm{~m}$ cmos standard-cell-based design technique using schmitt-trigger logic,' IEEE Journal of Solid-State Circuits, vol. 47, no. 1, pp. 47-60, 2012. DOI: 10.1109/JSSC.2011.2167777.
[7] L. Benini, A. Macii and M. Poncino, 'Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques,' ACM Trans. Embed. Comput. Syst., vol. 2, no. 1, pp. 5-32, Feb. 2003, ISSN: 1539-9087. DOI: 10.1145/605459.605461. [Online]. Available: https://doi.org/10.1145/605459.605461.
[8] O. Andersson, B. Mohammadi, P. Meinerzhagen, A. Burg and J. N. Rodrigues, 'Ultra low voltage synthesizable memories: A trade-off discussion in 65 nm cmos,' IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 6, pp. 806-817, 2016. DOI: 10.1109/TCSI.2016.2537931.
[9] Y. Tsividis, 'Eric vittoz and the strong impact of weak inversion circuits,' IEEE Solid-State Circuits Society Newsletter, vol. 13, no. 3, pp. 56-58, 2008. DOI: 10.1109/N-SSC.2008.4785782.
[10] H. Soeleman and K. Roy, 'Ultra-low power digital subthreshold logic circuits,' in Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477), 1999, pp. 94-96. DOI: 10.1145/313817.313874.
[11] V. Beiu, S. Aunet, J. Nyathi, R. R. Rydberg and W. Ibrahim, 'Serial addition: Locally connected architectures,' IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 54, no. 11, pp. 2564-2579, 2007. DOI: 10.1109/TCSI.2007.907885.
[12] J. Meindl and J. Davis, 'The fundamental limit on binary switching energy for terascale integration (tsi),' IEEE Journal of Solid-State Circuits, vol. 35, no. 10, pp. 1515-1516, 2000. DOI: 10.1109/4.871332.
[13] D. G. A. Neto and C. Galup-Montoro, 'Design and testing of a 32-khz frequency divider chain operating at vdd $=76 \mathrm{mv}$,' in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1-5. DOI: 10.1109/ISCAS45731.2020.9181177.
[14] L. A. Pasini Melek, A. L. da Silva, M. C. Schneider and C. Galup-Montoro, 'Analysis and design of the classical cmos schmitt trigger in subthreshold operation,' IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 4, pp. 869-878, 2017. DOI: 10.1109/TCSI.2016.2631726.
[15] L. A. Pasini Melek, M. C. Schneider and C. Galup-Montoro, 'Operation of the classical cmos schmitt trigger as an ultra-low-voltage amplifier,' IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 9, pp. 1239-1243, 2018. DOI: 10.1109/TCSII.2017.2783975.
[16] N. Reynders and W. Dehaene, 'Ultra-Low-Voltage Design of Energy Efficient Digital Circuits,' [Online]. Available: https://link.springer.com/book/10.1007/978-3-319-16136-5.
[17] M. Pelgrom, A. Duinmaijer and A. Welbers, 'Matching properties of mos transistors,' IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433-1439, 1989. DOI: 10.1109/JSSC.1989.572629.
[18] S. Sun, X. Li, H. Liu, K. Luo and B. Gu, 'Fast statistical analysis of rare circuit failure events via scaled-sigma sampling for high-dimensional variation space,' IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 7, pp. 1096-1109, 2015. DOI: 10.1109/TCAD.2015.2404895.
[19] A. Singhee and R. A. Rutenbar, 'Extreme Statistics in Nanoscale Memory Design.,' [Online]. Available: https://link.springer.com/book/10.1007/978-1-4419-6606-3\#toc.
[20] R. Aitken and S. Idgunji, 'Worst-case design and margin for embedded sram,' in 2007 Design, Automation Test in Europe Conference Exhibition, 2007, pp. 1-6. DOI: 10.1109/DATE.2007.364475.
[21] M. Qazi, M. Tikekar, L. Dolecek, D. Shah and A. Chandrakasan, 'Loop flattening & spherical sampling: Highly efficient model reduction techniques for sram yield analysis,' in 2010 Design, Automation Test in Europe Conference Exhibition (DATE 2010), 2010, pp. 801-806. DOI: 10.1109/DATE.2010.5456940.
[22] W. Wu, F. Gong, G. Chen and L. He, 'A fast and provably bounded failure analysis of memory circuits in high dimensions,' in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 2014, pp. 424-429. DOI: 10.1109/ASPDAC.2014.6742928.
[23] J. Sauro and J. Lewis, 'Estimating completion rates from small samples using binomial confidence intervals: Comparisons and recommendations,' vol. 49, Sep. 2005. DOI: 10.1177/154193120504902407.
[24] N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P.-O. Sassoulas, X. Federspiel, A. Cros, A. Bajolet, E. Richard, B. Dumont, P. Perreau, D. Petit, D. Golanski, C. Fenouillet-Béranger, N. Guillot, M. Rafik, V. Huard, S. Puget, X. Montagner, M.-A. Jaud, O. Rozeau, O. Saxod, F. Wacquant, F. Monsieur, D. Barge, L. Pinzelli, M. Mellier, F. Boeuf, F. Arnaud and M. Haond, '28 nm fdsoi technology platform for high-speed low-voltage digital applications,' in 2012 Symposium on VLSI Technology (VLSIT), 2012, pp. 133-134. DOI: 10.1109/VLSIT.2012.6242497.
[25] M. Wiatr and S. Kolodinski, '22fdx ${ }^{\text {TM }}$ technology and add-on-functionalities,' in ESSDERC 2019-49th European Solid-State Device Research Conference (ESSDERC), 2019, pp. 70-73. DOI: 10.1109/ESSDERC.2019.8901819.
[26] R. Carter, J. Mazurier, L. Pirro, J.-U. Sachse, P. Baars, J. Faul, C. Grass, G. Grasshoff, P. Javorka, T. Kammler, A. Preusse, S. Nielsen, T. Heller, J. Schmidt, H. Niebojewski, P.-Y. Chou, E. Smith, E. Erben, C. Metze, C. Bao, Y. Andee, I. Aydin, S. Morvan, J. Bernard, E. Bourjot, T. Feudel, D. Harame, R. Nelluri, H.-J. Thees, L. M-Meskamp, J. Kluth, R. Mulfinger, M. Rashed, R. Taylor, C. Weintraub, J. Hoentschel, M. Vinet, J. Schaeffer and B. Rice, '22nm fdsoi technology for emerging mobile, internet-of-things, and rf applications,' in 2016 IEEE International Electron Devices Meeting (IEDM), 2016, pp. 2.2.1-2.2.4. DOI: 10.1109/IEDM.2016.7838029.
[27] E. Strandvik, 'Compensation of threshold voltage for process and temperature variations in 28 nm utbb fdsoi,' 2015. [Online]. Available: http://hdl.handle.net/11250/2371461.
[28] J. Chen, L. Clark and Y. Cao, 'Maximum - ultra-low voltage circuit design in the presence of variations,' IEEE Circuits and Devices Magazine, vol. 21, no. 6, pp. 12-20, 2006. DOI: 10.1109/MCD.2005.1578583.
[29] M. Mandal and B. C. Sarkar, 'Ring oscillators: Characteristics and applications,' Indian Journal of Pure and Applied Physics, vol. 48, pp. 136-145, Feb. 2010.
[30] P. Meinerzhagen, S. M. Y. Sherazi, A. Burg and J. N. Rodrigues, 'Benchmarking of standard-cell based memories in the sub-VT domain in 65-nm CMOS technology,' IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 1, no. 2, pp. 173-182, 2011. DOI: 10.1109/JETCAS.2011.2162159.
[31] A. Teman, L. Pergament, O. Cohen and A. Fish, 'A 250 mV 8 kb 40 nm ultralow power 9T supply feedback SRAM (SF-SRAM),' IEEE Journal of Solid-State Circuits, vol. 46, no. 11, pp. 2713-2726, 2011. DOI: 10.1109/JSSC.2011.2164009.
[32] J. Chen, L. Clark and T.-H. Chen, 'An ultra-low-power memory with a subthreshold power supply voltage,' IEEE Journal of Solid-State Circuits, vol. 41, no. 10, pp. 2344-2353, 2006. DOI: 10.1109/JSSC.2006.881549.
[33] J. P. Kulkarni, K. Kim and K. Roy, 'A 160 mV robust Schmitt trigger based subthreshold SRAM,' IEEE Journal of Solid-State Circuits, vol. 42, no. 10, pp. 2303-2313, 2007. DOI: 10.1109/JSSC.2007.897148.
[34] P. Drennan, M. L. Kniffin and D. R. Locascio, 'Implications of proximity effects for analog design,' in IEEE Custom Integrated Circuits Conference 2006, 2006, pp. 169-176. DOI: 10.1109/CICC.2006.320869.

## Appendix A

## Additional Material

## A.1 Copies from methods chapter



Figure A.1: ST and REG logic gate library


Figure A.2: A reduced critical path

## A.2 Logic gate library layout designs

## A.2.1 Layout design of REG logic gate library



Figure A.3: Layout design of the REG-NOT logic gate. The width and height of the cell are $1.246\,\mu\mathrm{m}$ and $2.918\,\mu\mathrm{m}$, respectively.


Figure A.4: Layout design of the REG-NAND logic gate. The width and height of the cell are $2.252\,\mu\mathrm{m}$ and $4.634\,\mu\mathrm{m}$, respectively.


Figure A.5: Layout design of the REG-NOR logic gate. The width and height of the cell are $2.252\,\mu\mathrm{m}$ and $5.898\,\mu\mathrm{m}$, respectively.

## A.2.2 Layout design of ST logic gate library



Figure A.6: Layout design of the ST-NOT logic gate. The width and height of the cell are $2.988\,\mu\mathrm{m}$ and $2.618\,\mu\mathrm{m}$, respectively.


Figure A.7: Layout design of the ST-NAND logic gate. The width and height of the cell are $4.822\,\mu\mathrm{m}$ and $2.928\,\mu\mathrm{m}$, respectively.


Figure A.8: Layout design of the ST-NOR logic gate. The width and height of the cell are $4.82\,\mu\mathrm{m}$ and $3.288\,\mu\mathrm{m}$, respectively.

## A.3 SCM subcircuit location



Figure A.9: Approximate location of the WAD, the eight memory columns and the read-address data flip-flops.

## A.4 Ring oscillator circuit layout figures



Figure A.10: Layout design of the REG-NOT based 7-stage ring oscillator. The width and height of the layout are $7.282\,\mu\mathrm{m}$ and $2.918\,\mu\mathrm{m}$, respectively.


Figure A.11: Layout design of the REG-NAND based 7-stage ring oscillator. The width and height of the layout are $14.324\,\mu\mathrm{m}$ and $4.629\,\mu\mathrm{m}$, respectively.


Figure A.12: Layout design of the REG-NOR based 7-stage ring oscillator. The width and height of the layout are $14.324\,\mu\mathrm{m}$ and $5.898\,\mu\mathrm{m}$, respectively.


Figure A.13: Layout design of the ST-NOT based 7-stage ring oscillator. The width and height of the layout are $19.476\,\mu\mathrm{m}$ and $2.618\,\mu\mathrm{m}$, respectively.


Figure A.14: Layout design of the ST-NAND based 7-stage ring oscillator. The width and height of the layout are $32.314\,\mu\mathrm{m}$ and $2.928\,\mu\mathrm{m}$, respectively.


Figure A.15: Layout design of the ST-NOR based 7-stage ring oscillator. The width and height of the layout are $32.3\,\mu\mathrm{m}$ and $3.32\,\mu\mathrm{m}$, respectively.

