Doctoral theses at NTNU, 2022:141

## Somayeh Hossein Zadeh

# Energy Efficient Subthreshold Digital Building Blocks

Norwegian University of Science and Technology (NTNU)

Thesis for the Degree of Philosophiae Doctor

Faculty of Information Technology and Electrical Engineering, Department of Electronic Systems




Trondheim, May 2022


© Somayeh Hossein Zadeh

ISBN 978-82-326-6993-6 (printed ver.) ISBN 978-82-326-6316-3 (electronic ver.) ISSN 1503-8181 (printed ver.) ISSN 2703-8084 (online ver.)

Printed by NTNU Grafisk senter

To my family, for their love, endless support, encouragement and sacrifices

# Abstract

Many applications, such as implantable biomedical devices and sensor nodes in the Internet of Things (IoT), operate in the kHz range, and power consumption is the primary concern in such applications. The required supply voltage of most implantable electronic devices is 2-3 V [47]. However, the output voltage of the most recent in vivo energy harvesters (IVEHs) is 150 mV and below [47, 61]. Such low voltages suit subthreshold circuits, which also save energy by not requiring DC-DC conversion as energy-costly as that needed for higher supply voltages. Therefore, subthreshold circuits, which operate at supply voltages lower than the absolute value of the threshold voltage of the transistors, might be the best option for such applications. The power consumption is reduced as the supply voltage is lowered towards and below the threshold voltage of the transistors, but the propagation delays increase. This may not be a concern for applications with low to medium performance requirements. Voltage scaling in integrated circuits brings challenges that the designer has to consider during the design phase: the impact of process, voltage, and temperature variations increases with voltage scaling and affects the functionality of the circuits.

This thesis focuses on designing and exploring energy efficient computing and memory circuits in the ultra low voltage subthreshold regime at different abstraction levels.

At the gate level, techniques such as body biasing (reverse body bias), transistor stacking, device sizing and multi-threshold voltage devices have been explored to reduce the power consumption, especially the static power, taking into account reliability and process, voltage and temperature variations.

At the circuit level, different standard CMOS full adder topologies designed for subthreshold supply voltages have been compared with respect to functionality and reliability. In addition, an optimal back gate bias has been proposed in a commercially available 22 nm FDSOI (Fully Depleted Silicon On Insulator) technology that minimizes the energy per operation of subthreshold digital CMOS circuits and improves their reliability. The adder used as a case study consumes 4.6 percent less energy under optimal body bias than under zero body bias at Vdd = 150 mV and a frequency of 1 kHz.

At the architectural level, two different types of adders, the Kogge Stone adder (KSA), the fastest adder, and the Ripple Carry adder (RCA), the simplest adder, have been designed and fabricated for supply voltages as low as Vdd = 140 mV. The adders have been synthesized at the gate level using a full custom standard cell library designed for the ultra low voltage subthreshold regime.

The gap between simulation and measurement results is bridged by the successful implementation and measured comparison of the ultra low voltage subthreshold adders at a supply voltage as low as 140 mV. To the best of the author's knowledge, this is the first measurement comparison between two different adder architectures for ultra low supply voltages as low as 140 mV.

Simulated results in [7] indicated that the RCA is 1.36X more energy efficient than the KSA at the same speed. The measured results presented here show that the RCA is 4.15X to 1.92X more energy efficient than the KSA at supply voltages between 250 and 500 mV. In addition, the RCA designed in this study outperforms the reported works in terms of a defined figure of merit (FoM), which is $\mathrm{Tech}/(V_{min} \cdot \mathrm{Energy}_{min})$.

Digital circuits designed for applications like sensor networks, implantable biomedical devices and environmental monitoring need to work under different conditions, for example over the temperature range in which the circuit is expected to operate. In this thesis, we have studied the performance of the circuits at different temperatures and supply voltages, and in the presence of mismatch and process variations.

The multiple threshold voltage technique has been used to design a 7T loadless SRAM cell for the subthreshold regime and to demonstrate the different trade-offs for single, regular and flip well SRAM memories. Among all the devices (HVT, RVT, LVT and SLVT) available in a commercially available 22 nm FDSOI technology, the best combination for minimizing the energy per access is HVT devices as the driver transistors and RVT devices for the rest of the transistors. The single well SRAM has the lowest leakage per bit cell compared with its regular and flip well counterparts. The regular well type has lower static noise margin (SNM) variability.

An 8-bit RCA has been designed using the multiple threshold voltage technique in 22 nm FDSOI technology. The simulation results, based on the netlist extracted from layout, show that the energy per one bit addition of our adder is the lowest among the adders reported in FDSOI technology. The energy per one bit addition for the proposed adder at Vdd = 300 mV is 0.23 fJ.

We have also used the dynamic body bias technique for the adder to balance the pull up and pull down networks (PUN/PDN). The results show that the adder with dynamic body bias is robust and functional at a supply voltage 60 mV lower than that of the adder with conventional body bias.

Additionally, a new standard cell memory based on the NAND race free D-latch has been synthesized and explored. The simulation results show that using the robust NAND race free D-latch leads to a lower minimum operating supply voltage and, hence, lower power and energy for the standard cell memory.

This dissertation analyzes subthreshold digital circuits using a 22 nm silicon-on-insulator process and a 130 nm bulk CMOS technology.

We also fabricated and tested the digital circuits in 130 nm technology, and the measurement results are consistent with the simulation results.

# Acknowledgments

This thesis is submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (PhD) at the Norwegian University of Science and Technology (NTNU). The research has been conducted at the Department of Electronic Systems from October 2017 to April 2021. My advisor has been professor Snorre Aunet, and my co-advisor has been professor Trond Ytterdal.

I wish to express my sincere gratitude to professor Snorre Aunet and professor Trond Ytterdal for their continuous support, patience, and guidance.

I would like to thank my colleagues in the Department of Electronic Systems for comprehensive discussions and feedback.

Finally, and above all, my special gratitude goes to my family for unconditional love and support in my whole life. Words cannot express how grateful I am to them. I would not have been able to accomplish this work without their love and endless support.

Somayeh Hossein Zadeh

Trondheim, April 2021

# Contents

- Abstract
- Acknowledgments
- Contents
- List of Figures
- List of Tables
- List of Abbreviations
- 1 Introduction
  - 1.1 Motivation and challenges at ultra low subthreshold regime
  - 1.2 Power components
  - 1.3 MOS operation region
  - 1.4 Minimum energy point
  - 1.5 Structure of the dissertation
  - 1.6 Summary of paper contributions
- 2 Thesis Summary
  - 2.1 Design Method
  - 2.2 Power reduction techniques used in this thesis
    - 2.2.1 Multi-threshold CMOS
    - 2.2.2 Transistor/Device sizing
    - 2.2.3 Transistor Stacking
    - 2.2.4 Body biasing
    - 2.2.5 Architecture optimization in computing circuits
  - 2.3 Subthreshold adders
    - 2.3.1 Different adder topologies
    - 2.3.2 Different adder architectures
    - 2.3.3 Functional yield
  - 2.4 Memories
    - 2.4.1 SRAM macros
    - 2.4.2 Cell stability
    - 2.4.3 Standard cell based memory (SCM)
    - 2.4.4 Ultra low voltage latches and flip-flops for subthreshold regime
- 3 Conclusion
- 4 Publications
  - 4.1 Paper I: Comparison of ultra low power full adder cells in 22 nm FDSOI technology
  - 4.2 Paper II: Ultra-low voltage subthreshold binary adder architectures for IoT applications: Ripple carry adder or Kogge Stone adder
  - 4.3 Paper III: Exploring optimal back bias voltages for ultra low voltage CMOS digital circuits in 22 nm FDSOI Technology
  - 4.4 Paper IV: An ultra low voltage subthreshold standard cell based memories for IoT applications
  - 4.5 Paper V: Multi-threshold voltage and dynamic body biasing techniques for energy efficient ultra low voltage subthreshold adders
  - 4.6 Paper VI: Comparative study of single, regular and flip well subthreshold SRAMs in 22 nm FDSOI technology
  - 4.7 Paper VII: Subthreshold power PC and NAND race free flip-flops in frequency divider applications
  - 4.8 Manuscript VIII: Energy efficiency of serial versus parallel adders
  - 4.9 Appendix

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | $I_{on}/I_{off}$ ratio for NMOS transistor with minimum length and width. |
| 1.2 | $I_{on}/I_{off}$ ratio for PMOS transistor with minimum length and width. |
| 1.3 | The energy of the minority3 based 32-bit RCA versus the supply voltage. |
| 2.1 | PCB, and socket for measuring purpose. |
| 2.2 | Chip photo. |
| 2.3 | Measurement equipment. |
| 2.4 | $I_{on}/I_{off}$ for the minimum size NMOS transistor vs the channel length, $V_{ds} = 150$ mV. |
| 2.5 | Threshold voltage of the NMOS transistor vs the channel length and width, $V_{ds} = 150$ mV. |
| 2.6 | Normalized delay and leakage current for inverters with 2 and 3 stacked transistors and single transistor, length = 190 nm, $W_{NMOS} = 300$ nm, $W_{PMOS} = 1.8$ um. |
| 2.7 | The layout of the 32 bits KSA adder. |
| 2.8 | The layout of the 32 bits minority3 based RCA adder. |
| 2.9 | The schematic of the 7T pull-up loadless SRAM. |
| 2.10 | The cross coupled inverters with noise sources for hold and read SNM. |
| 2.11 | Butterfly curves of the SRAM at various supply voltages at typical temperature. |
| 2.12 | The schematic of the NAND race free flip-flop. |
| 2.13 | The schematic of the Power PC flip-flop. |

# List of Tables

# List of Abbreviations

| Abbreviation | Meaning |
|-------|------------------------------------------|
| IoT   | Internet of Things                       |
| CMOS  | Complementary Metal-Oxide-Semiconductor  |
| FDSOI | Fully Depleted Silicon on Insulator      |
| NMOS  | N-type Metal-Oxide-Semiconductor         |
| PMOS  | P-type Metal-Oxide-Semiconductor         |
| MEP   | Minimum Energy Point                     |
| PVT   | Process, Voltage and Temperature         |
| RDF   | Random Dopant Fluctuation                |
| SNM   | Static Noise Margin                      |
| PUN   | Pull Up Network                          |
| PDN   | Pull Down Network                        |
| RCA   | Ripple Carry Adder                       |
| KSA   | Kogge Stone Adder                        |
| SCM   | Standard Cell Memory                     |
| SRAM  | Static Random Access Memory              |
| FBB   | Forward Body Bias                        |
| RBB   | Reverse Body Bias                        |
| PDP   | Power Delay Product                      |
| HVT   | High Threshold Voltage                   |
| LVT   | Low Threshold Voltage                    |
| RVT   | Regular Threshold Voltage                |
| SLVT  | Super Low Threshold Voltage              |
| UHVT  | Ultra High Threshold Voltage             |

# CHAPTER 1: Introduction

#### 1.1 Motivation and challenges at ultra low subthreshold regime

Many Internet of Things (IoT) applications, such as wireless sensor networks and implantable and wearable biomedical sensors, are energy constrained. Power reduction has become the fundamental requirement in such applications.

The power consumption has two main parts: the dynamic (switching) and the static (idle or standby leakage) power consumption. Both depend on the supply voltage. For leakage dominated circuits, the power consumption has an exponential relationship to the supply voltage, while the dynamic power is proportional to the square of the supply voltage. Hence, voltage scaling is the most effective technique for power reduction, and ultra low power circuits translate to ultra low voltage subthreshold circuits, that is, circuits operating at supply voltages below the absolute value of the transistor's threshold voltage. Research in this area has been ongoing since the 1960s [50].

Ultra low voltage subthreshold circuits have been used in several applications. Here are some examples:

1. The first category is energy constrained applications with low to medium performance. Many battery operated portable devices, like distributed sensor networks, implants and Radio Frequency Identification (RFID), belong to this category. The focus in this category is the minimum energy point (MEP), where the energy per operation of the circuit is at its minimum [68].
2. The second category is power constrained applications. These circuits typically have a long standby time and are in sleep mode most of the time; hence, the static power consumption dominates the overall power consumption. Therefore, working at ultra low supply voltages below the MEP reduces the power consumption. Wake up and surveillance circuits are part of this category [1].
3. The third category is represented by energy harvesting applications. It is impossible for such applications to use a fixed source of energy like a battery, so they are powered by energy harvesting systems. For example, implantable glucose fuel cells were first developed as a power supply for cardiac pacemakers by the American Hospital Supply Corporation [28]. In these applications the minimum supply voltage represents the point where operation can begin. For example, thermoelectric generators produce an output voltage as low as 50 mV for body wearable applications [43].
4. The fourth category is chip multi-processor applications. With technology scaling and an increasing number of cores on a single die, the percentage of the chip that is powered off (dark silicon) has increased [17]. This inefficiency may limit further increases in core count [17]. In [67], an energy efficient sub/near-threshold chip multi-processor with reduced loss of performance has been designed.

Working in the subthreshold regime has several challenges that have to be taken into account:

1. The first challenge of operating in the ultra low voltage subthreshold regime is a substantial increase in circuit delay. This is not a concern for applications in the low to medium performance region. A typical sensor node in medical applications executes 2000 instructions every 10 minutes and then goes back to sleep, which averages to roughly three instructions per second [46].
2. The second challenge is high sensitivity to process, voltage and temperature (PVT) variations. The subthreshold current is exponentially dependent on the transistor's threshold voltage.

   Threshold voltage variations caused by Random Dopant Fluctuations (RDF) produce a large variance in the behavior of subthreshold circuits and affect their functionality. In the superthreshold regime, geometric variations and RDF contribute equally to circuit variations, whereas in the subthreshold regime RDF is the dominant contributor [66].

   Both carrier mobility and threshold voltage depend on temperature. As the operating temperature increases, both the carrier mobility and the threshold voltage are reduced. Unlike the superthreshold regime, where the mobility effect is dominant, in the subthreshold regime the threshold voltage effect dominates [60]. Hence, in the subthreshold regime the current increases with the operating temperature, and the circuit becomes faster at a higher temperature. At lower temperatures, the drain current is reduced significantly.

   Typically, circuits are designed to operate at a nominal supply voltage. The actual supply voltage may deviate from the nominal value for reasons such as tolerances of the voltage regulators and $I \times R$ drops along the supply rails [23]. Therefore, the supply voltage is specified within $\pm 10$ % of the nominal value [23].

   Gaussian statistical distributions have been used to model the process variations and, hence, the threshold voltage variations [23]. Based on the current equation in the subthreshold regime, the drain current then has a lognormal distribution [66].
3. The third challenge is the degradation of the transistor on to off current ratio, which reduces the static noise margins (SNM) and therefore degrades the functionality of circuits at subthreshold supply voltages. Fig. 1.1 and Fig. 1.2 show the $I_{on}/I_{off}$ ratio for NMOS and PMOS transistors with minimum length and width as a function of $V_{ds}$ in 130 nm technology. For simulating $I_{on}$ and $I_{off}$, $V_{gs}$ is set equal to $V_{ds}$ and to 0, respectively. As can be seen, this ratio becomes problematically low at extremely low supply voltages.



Figure 1.1:  $I_{on}/I_{off}$  ratio for NMOS transistor with minimum length and width.

Circuits operating in the subthreshold regime must be optimized at various abstraction levels, including the transistor/device, gate, architecture and system levels, to obtain energy efficient solutions and to deal with the above challenges.



Figure 1.2:  $I_{on}/I_{off}$  ratio for PMOS transistor with minimum length and width.
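The lognormal spread of the subthreshold current mentioned under the second challenge can be illustrated with a small Monte Carlo sketch. It is only an illustration under stated assumptions: the mean threshold voltage (0.40 V), its standard deviation (30 mV), the slope factor (1.3) and the function name `sample_currents` are hypothetical and are not taken from the thesis. The current is normalized to its value at the mean threshold voltage, so only the exponential dependence on the threshold voltage is modeled.

```python
import math
import random
import statistics

U_T = 0.02585  # thermal voltage near 300 K, in volts


def sample_currents(n_samples=20000, vt_mean=0.40, vt_sigma=0.03, n=1.3, seed=1):
    """Draw Gaussian threshold voltages and map them to normalized currents.

    All default values are illustrative assumptions, not thesis data.
    """
    rng = random.Random(seed)
    currents = []
    for _ in range(n_samples):
        vt = rng.gauss(vt_mean, vt_sigma)
        # Exponential dependence on the threshold voltage at a fixed Vgs,
        # normalized to the current at the mean threshold voltage.
        currents.append(math.exp(-(vt - vt_mean) / (n * U_T)))
    return currents


if __name__ == "__main__":
    currents = sample_currents()
    logs = [math.log(i) for i in currents]
    # A lognormal variable is right-skewed: its mean exceeds its median.
    print("mean / median of I:", statistics.mean(currents) / statistics.median(currents))
    # ln(I) is Gaussian with standard deviation vt_sigma / (n * U_T).
    print("std of ln(I):      ", statistics.stdev(logs))
    print("vt_sigma/(n*U_T):  ", 0.03 / (1.3 * U_T))
```

Because the logarithm of the current is a linear function of the Gaussian threshold voltage, the current itself is lognormal, which is why a symmetric threshold voltage spread produces a skewed spread in current and delay.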

#### 1.2 Power components

The power consumption has two main parts, the active (switching) and the static (idle leakage) power consumption. The static part is dissipated when the circuit is not switching. The static power is defined by the following equation:

$$P_{st} = V_{dd} \cdot I_{off} \tag{1.1}$$

$V_{dd}$ and $I_{off}$ are the supply voltage and the leakage current, respectively. The dynamic power consists of the switching power and the short circuit power, the latter being the power due to the direct current path from the supply voltage to ground. The short circuit current has a strong relation to the supply voltage [42]. We have ignored it in the equation below because it is negligible at subthreshold supply voltages. The switching power, due to the charging and discharging of the load capacitances, is defined by the following equation:

$$P_{dy} = \alpha C_{tot} V_{dd}^2 F_{clk} \tag{1.2}$$

$\alpha$, $C_{tot}$, $V_{dd}$ and $F_{clk}$ are the activity factor, the total effective switched capacitance, the supply voltage and the switching frequency, respectively. The activity factor is the probability that the total output load switches, and lies between zero and one. Its value is zero when none of the gates switch and one when all of the gates switch in every clock cycle.

#### 1.3 MOS operation region

Three regions of channel inversion with respect to the gate-source voltage are defined for MOS transistors. When a positive gate-source voltage is applied to an NMOS transistor, a layer of charge is created between the drain and the source. This layer forms the channel of the transistor, so that current can flow between drain and source. The regions of MOS operation, based on the threshold voltage and the gate-source voltage of the transistor, are as follows:

- The weak inversion region: This region is known as the deep subthreshold region, where $V_{gs} \ll V_{th}$.
- The moderate inversion region: This region is known as the near-threshold region, where $V_{gs} \approx V_{th}$. Moderate inversion extends to approximately 100 mV above or below the $V_{th}$ of the transistor.
- The strong inversion region: This region is known as the upper superthreshold region, where $V_{gs} \gg V_{th}$. Strong inversion occurs when the channel is strongly inverted.

The device is in the subthreshold region when  $V_{gs}$  is below  $V_{th}$ , and the device is in the superthreshold region when  $V_{gs}$  is above the threshold voltage.
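The classification above can be written as a small helper. This is a simplified sketch: the hard boundaries at $V_{th} \pm 100$ mV follow the approximate band given for moderate inversion, whereas in a real device the transition between regions is gradual. The function name `inversion_region` and the `margin` parameter are hypothetical.

```python
def inversion_region(vgs, vth, margin=0.1):
    """Classify the inversion level of an NMOS transistor.

    margin is the assumed half-width (in volts) of the moderate inversion
    band around vth, about 100 mV according to the text.
    """
    if vgs < vth - margin:
        return "weak inversion"
    if vgs > vth + margin:
        return "strong inversion"
    return "moderate inversion"


if __name__ == "__main__":
    vth = 0.4  # illustrative threshold voltage, not a thesis value
    for vgs in (0.15, 0.45, 1.2):
        print(f"Vgs = {vgs:.2f} V -> {inversion_region(vgs, vth)}")
```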

Subthreshold circuits have mostly been used for applications with low to medium performance requirements, where low energy dissipation is key. The first research on subthreshold current and subthreshold operation was done in the 1960s and early 1970s [20, 31, 50]. The electronic wrist watch was the first successful product operating in the subthreshold regime [50].

In the subthreshold regime, the current flows by diffusion. When a small positive gate voltage is applied, electrons are only available at the surface, and holes are repelled from the surface. Because of the difference in electron density between drain and source, a diffusion current flows between drain and source. The subthreshold regime at small voltages refers to the weak inversion regime. The exponential relationship between the current and the gate voltage was first shown in [48]. The first measurement of the transistor drain current in the subthreshold regime was presented by Eric Vittoz in [56].

The NMOS transistor subthreshold current has an exponential relation with the gate-source and threshold voltages, as expressed by the following well known simplified equation [40]:

$$I_{ds} = 2n\mu C_{ox} U_T^2 \frac{W}{L} \left[e^{(V_{gs} - V_T)/(nU_T)}\right] \left[1 - e^{-V_{ds}/U_T}\right] \tag{1.3}$$

$n$ is the subthreshold slope factor $(1 + C_{dep}/C_{ox})$, and $\mu$ is the carrier mobility. $C_{ox}$ and $C_{dep}$ are the gate oxide and depletion capacitances, respectively. $V_T$, $V_{gs}$, $V_{ds}$ and $U_T$ are the threshold voltage, the gate-source voltage, the drain-source voltage and the thermal voltage, respectively. $W/L$ is the width to length ratio of the transistor. This equation can also be applied to PMOS transistors with opposite polarity. The threshold voltage depends on the source to bulk voltage. The following equation represents the transistor threshold voltage when the source bulk voltage is not zero.
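A minimal numerical sketch of Eq. (1.3) is given below. The default values for the slope factor, the $\mu C_{ox}$ product and $W/L$ are illustrative assumptions rather than parameters of the technologies used in this work, and the helper `on_off_ratio` follows the convention of simulating $I_{on}$ with $V_{gs} = V_{ds}$ and $I_{off}$ with $V_{gs} = 0$. Effects outside Eq. (1.3), such as DIBL, are not modeled.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin=300.0):
    """U_T = kT/q, about 25.9 mV at 300 K."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def subthreshold_current(vgs, vds, vt, n=1.3, mu_cox=200e-6, w_over_l=1.0,
                         temp_kelvin=300.0):
    """Eq. (1.3) for an NMOS transistor; default parameters are assumptions."""
    u_t = thermal_voltage(temp_kelvin)
    i0 = 2.0 * n * mu_cox * u_t ** 2 * w_over_l
    gate_term = math.exp((vgs - vt) / (n * u_t))
    drain_term = 1.0 - math.exp(-vds / u_t)
    return i0 * gate_term * drain_term


def on_off_ratio(vdd, vt, n=1.3):
    """I_on (Vgs = Vds = vdd) divided by I_off (Vgs = 0, Vds = vdd)."""
    i_on = subthreshold_current(vdd, vdd, vt, n=n)
    i_off = subthreshold_current(0.0, vdd, vt, n=n)
    return i_on / i_off


if __name__ == "__main__":
    for vdd in (0.10, 0.15, 0.30):
        print(f"Vdd = {vdd:.2f} V -> Ion/Ioff = {on_off_ratio(vdd, vt=0.4):.1f}")
```

Within this simplified model the ratio reduces to $e^{V_{dd}/(nU_T)}$, independent of the threshold voltage, which shows why the on to off current ratio collapses as the supply voltage approaches a few thermal voltages.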

$$V_{th} = V_{th0} + \gamma \left(\sqrt{|-2\Phi_F + V_{SB}|} - \sqrt{|2\Phi_F|}\right) - \eta V_{DS} - \Delta V_{th} \tag{1.4}$$

$V_{th0}$ is the threshold voltage of the transistor when the bulk is connected to Gnd/Vdd for NMOS/PMOS transistors. $V_{SB}$ is the source bulk voltage. $2\Phi_F$ and $\gamma$ are the surface potential and the body effect parameter, respectively. When the source bulk voltage is positive/negative, it increases/decreases the amount of charge required to invert the channel, and thus increases/decreases the threshold voltage of the transistor. $\eta$ is the drain induced barrier lowering (DIBL) coefficient. The DIBL effect is reduced by reducing $V_{DS}$. The threshold voltage also depends on the short channel effect, represented by $\Delta V_{th}$.
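Eq. (1.4) can be evaluated with a short function, sketched here under assumed parameter values. The sign convention takes the Fermi potential $\Phi_F$ as negative for an NMOS transistor, so that $-2\Phi_F$ is positive and a positive $V_{SB}$ raises the threshold voltage. The values of `vth0`, `gamma` and `phi_f` are illustrative assumptions, not parameters of the technologies used in this work.

```python
import math


def threshold_voltage(vth0, gamma, phi_f, vsb, eta=0.0, vds=0.0, delta_vth=0.0):
    """Eq. (1.4): threshold voltage with body effect, DIBL and short channel terms.

    phi_f is the Fermi potential, assumed negative for NMOS in this convention.
    """
    body = gamma * (math.sqrt(abs(-2.0 * phi_f + vsb)) - math.sqrt(abs(2.0 * phi_f)))
    return vth0 + body - eta * vds - delta_vth


if __name__ == "__main__":
    # Illustrative parameters (assumptions): Vth0 = 0.4 V, gamma = 0.4 V^0.5,
    # Phi_F = -0.3 V.
    for vsb in (-0.3, 0.0, 0.3):
        vth = threshold_voltage(0.4, 0.4, -0.3, vsb)
        print(f"V_SB = {vsb:+.1f} V -> V_th = {vth:.3f} V")
```

A positive source bulk voltage (reverse body bias) raises the threshold voltage and a negative one (forward body bias) lowers it, which is the mechanism behind the body biasing techniques discussed later in the thesis.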

#### 1.4 Minimum energy point

The minimum energy point (MEP) is the operating point where a circuit has the lowest energy per operation. In [58] and [59] it has been shown that the minimum energy point occurs in the subthreshold regime when the circuit works at its maximum operating frequency. The MEP is one of the reasons why the subthreshold regime attracts high interest.

Both the dynamic and the static parts of the energy depend on the supply voltage. The dynamic energy has a quadratic relation with the supply voltage. The static power has a linear relation with the supply voltage, but the static energy per operation also depends on the clock period, which is related to the supply voltage and the current:

$$T_{clk} \propto C V / I \tag{1.5}$$

In the subthreshold regime the current falls exponentially as the supply voltage is reduced, so the clock period, and hence the static energy per operation, increases exponentially. The minimum energy point arises from the relationship between the static and dynamic energy at different supply voltages. At superthreshold supply voltages the active energy dominates the static energy, while the opposite holds at subthreshold supply voltages. The minimum energy point occurs when the active and static energy have the same slope with opposite sign [60]. The minimum energy point is influenced by many different parameters, such as the activity factor, supply voltage, threshold voltage, workload, duty cycle and temperature [59]. Fig. 1.3 shows the energy per operation for a 32-bit minority3 based RCA from post layout simulations in 130 nm technology.



Figure 1.3: The energy of the minority3 based 32-bit RCA versus the supply voltage.
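The origin of the minimum energy point can be reproduced qualitatively with a first-order model, sketched below. It is not the simulation setup behind Fig. 1.3. It assumes a chain of identical gates with a hypothetical logic depth of 50, an activity factor of 0.1, a slope factor of 1.3 and a gate capacitance of 1 fF, and it uses the subthreshold current expression over the whole sweep, so the numbers are only indicative. With $T_{clk}$ taken from Eq. (1.5) and $I_{on}/I_{off} = e^{V_{dd}/(nU_T)}$, the static energy per gate becomes $L_{dp}\, C\, V_{dd}^2\, e^{-V_{dd}/(nU_T)}$, where $L_{dp}$ is the assumed logic depth.

```python
import math

U_T = 0.02585  # thermal voltage near 300 K, in volts


def energy_per_operation(vdd, alpha=0.1, logic_depth=50, n=1.3, c_gate=1e-15):
    """Return (dynamic, static) energy per operation and per gate, in joules.

    Dynamic: alpha * C * Vdd^2.
    Static:  I_off * Vdd * T_clk, with T_clk = logic_depth * C * Vdd / I_on
             (Eq. 1.5) and I_on / I_off = exp(Vdd / (n * U_T)).
    All default parameters are illustrative assumptions.
    """
    e_dyn = alpha * c_gate * vdd ** 2
    e_static = logic_depth * c_gate * vdd ** 2 * math.exp(-vdd / (n * U_T))
    return e_dyn, e_static


def minimum_energy_point(v_start=0.10, v_stop=0.60, step=0.001):
    """Brute-force sweep of Vdd; returns (vdd, total energy) at the minimum."""
    best = None
    steps = int(round((v_stop - v_start) / step))
    for k in range(steps + 1):
        vdd = v_start + k * step
        total = sum(energy_per_operation(vdd))
        if best is None or total < best[1]:
            best = (vdd, total)
    return best


if __name__ == "__main__":
    vdd_mep, e_min = minimum_energy_point()
    print(f"MEP at Vdd = {vdd_mep:.3f} V, E = {e_min:.3e} J per gate")
```

In this model the static term dominates at the low end of the sweep and the dynamic term at the high end, and the minimum lies between them where their slopes cancel. Raising the activity factor or lowering the logic depth shifts the minimum towards lower supply voltages.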

#### 1.5 Structure of the dissertation

This thesis is a collection of papers dedicated to designing and exploring energy efficient digital circuits, including memory and computing circuits, at ultra low subthreshold supply voltages, taking into account the PVT variations.

The exploration focuses on power reduction, especially of leakage power, using various techniques at different abstraction levels.

This thesis attempts to explore the main challenges, which are PVT variations at ultra low supply voltages, and also to find the circuit structures and architectures that are most energy efficient in the subthreshold regime.

This dissertation is organized as follows: A brief summary of the papers, covering the different power reduction techniques used in this thesis, subthreshold adders and memories, is presented in Chapter 2. Chapter 3 concludes the thesis. Chapter 4 is the collection of the published papers. The papers are listed as follows:

• Paper I: Zadeh S.H. Ytterdal T. Aunet S. Comparison of ultra low power full adder cells in 22 nm FDSOI technology. In 2018 IEEE Nordic

Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC) 2018 Oct 30 (pp. 1-5). IEEE [63].

- Paper II: Zadeh S.H. Ytterdal T. Aunet S. Ultra low voltage subthreshold binary adder architectures for IoT applications: ripple carry adder or kogge stone adder. In 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC) 2019 Oct 29 (pp. 1-7). IEEE [65].
- Paper III: Zadeh S.H. Ytterdal T. Aunet S. Exploring optimal back bias voltages for ultra low voltage CMOS digital circuits in 22 nm FD-SOI technology. In 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC) 2019 Oct 29 (pp. 1-6). IEEE [64].
- Paper IV: Zadeh S.H. Ytterdal T. Aunet S. An ultra low voltage subthreshold standard cell based memories for IoT applications. In 2020 28th Iranian Conference on Electrical Engineering (ICEE) 2020 Aug 4 (pp. 1-5). IEEE.
- Paper V: Zadeh S.H. Ytterdal T. Aunet S. Multi-threshold voltage and dynamic body biasing techniques for energy efficient ultra low voltage subthreshold adders. In 2020 IEEE Nordic Circuits and Systems Conference (NorCAS) 2020 Oct 27 (pp. 1-6). IEEE.
- Paper VI: Zadeh S.H. Ytterdal T. Aunet S. Comparative study of single, regular and flip Well subthreshold SRAMs in 22 nm FDSOI technology. In 2020 IEEE Nordic Circuits and Systems Conference (NorCAS) 2020 Oct 27 (pp. 1-6). IEEE.
- Paper VII: Zadeh S.H. Ytterdal T. Aunet S. Subthreshold power PC and NAND race free flip-flops in frequency divider applications. In 2021 IEEE Nordic Circuits and Systems Conference (NorCAS) 2021 Oct 27 (pp. 1-6). IEEE.
- Manuscript VIII: Zadeh S.H. Ytterdal T. Aunet S. Subthreshold energy efficiency of serial versus parallel adders. Ready to be submitted for review for journal publication.

#### 1.6 Summary of paper contributions

In paper I, different ultra low voltage subthreshold full adder topologies have been designed and compared for supply voltages between 140 and 160 mV, temperatures between 27 and 50 $^{\circ}$C and a frequency of 1 kHz, conditions which are appropriate for implantable biomedical applications.

For the adders in this paper, the width of the PMOS transistors has been chosen as 2X that of the NMOS transistors. The body bias of the NMOS transistors was adjusted to balance the pull up/pull down network (PUN/PDN). The channel length of the transistors has been increased to simultaneously reduce the leakage and improve the robustness against PVT variations.

In paper II, different adder architectures including 32, 16 and 8-bits Ripple Carry Adder (RCA) and Kogge Stone Adder (KSA) have been designed and compared at ultra low supply voltages.

The RCAs are based on minority3 and XOR gates. The RCA architecture, combined with the minority3 based topology and a slightly increased supply voltage, yields a more energy efficient adder for subthreshold operation than the KSA.

The inverters, which are robust gates in the minority3 based RCA, have been stacked to explore transistor stacking at ultra low voltage operation. The stacked inverters reduce the leakage current while slightly increasing the delay, which is not a concern for low frequency applications.

In paper III, the optimal back gate bias voltage for minimum energy digital circuits has been investigated. It has been shown that the optimal back gate bias voltage depends on various parameters such as the activity factor and the workload. The proposed approach has been tested using a full adder as a case study.

In paper IV, the standard CMOS NAND race free D-latch has been selected as a robust storage cell to design a new standard cell memory (SCM) for ultra low supply voltages as low as 170 mV. This is the lowest supply voltage reported among published SCM designs.

In paper V, the multiple threshold voltage technique, along with channel length upsizing and a differential read buffer, has been used to design a 7T loadless SRAM cell for the subthreshold regime and to demonstrate the trade-offs between single, regular and flip well SRAM memories.

In paper VI, it has been shown that using the multi-threshold voltage technique reduces the Wp/Wn ratio required for tuning the PUN/PDN. This results in smaller parasitic capacitances for the transistors and, hence, less energy per operation for subthreshold circuits. In this paper, the dynamic body bias technique has been used to reduce the functional supply voltage of an 8-bit RCA.

In paper VII, it has been shown that upsizing the Power PC flip-flop increases its reliability while it may still provide lower power consumption than the NAND race free flip-flop. Based on results verified by measurements on ten chip samples, two frequency dividers have demonstrated functionality down to a $V_{dd}$ of 135 mV. The Power PC flip-flop based frequency divider is 24 % more energy efficient than its NAND race free counterpart at an ultra low supply voltage of 160 mV.

In manuscript VIII, it has been shown through chip measurements that a 32-bit serial adder may be up to 4.1 times as energy efficient as a parallel adder, while maintaining the same speed, under subthreshold operation. A 32-bit Ripple Carry Adder (RCA) based on minority-3 gates from the carry propagate family and a Kogge Stone Adder (KSA) based on Boolean gates from the parallel prefix family were designed, implemented and fabricated. They are intended for ultra low subthreshold supply voltages and use 130 nm CMOS bulk technology. Based on measurement results from ten chip samples, the adders are functional for supply voltages as low as 140 mV. The measurement results show that the minimum energy points (MEPs) of the RCA and the KSA are 250 mV and 300 mV, respectively. The energy per bit addition for these adders at the MEP is 4.90 fJ and 13.5 fJ, respectively.

# Chapter 2

## Thesis Summary

In the paper collection of this thesis, the concept of voltage scaling of CMOS circuits has been explored in the implementation of low power and energy efficient logic libraries, memory elements and static random access memories. Addition is one of the fundamental and most widespread arithmetic operations. Moreover, it is the basic building block for many other useful operations, such as subtraction and multiplication. Hence, the design of energy efficient adders has been a goal of many digital circuit designers. Is there an adder architecture that shows superior energy efficiency at low voltages? Which kind of architecture gives the most energy efficient adder in the ultra low voltage regime?

Many low voltage SRAM cells have been reported. The most common assist method is to use decoupled read and write ports. In this method, the storage nodes are decoupled from the bitlines. Subthreshold SRAM cells with decoupled read and write ports either have a high number of transistors or are single-ended. In general, differential SRAM cells are more robust than their single-ended counterparts. Is there a compact differential SRAM cell that is functional in the subthreshold and near-threshold regions?

The drive strength of the pull up and pull down transistors differs significantly in subthreshold CMOS logic. Are there methods to balance pull up and pull down networks without upsizing of the pull up networks to reduce the power consumption?

We have designed and explored energy efficient digital circuits, including memory and computing circuits, at ultra low subthreshold supply voltages. The static CMOS logic style is the fundamental topology in our subthreshold circuits because of its simplicity and robustness. The author in [69] has illustrated that if low voltage, low power and low PDP are of concern, then CMOS logic is the best choice for the implementation of arbitrary combinational circuits (this is true for bulk CMOS technology and not for FinFET technology). We have focused on power reduction, especially of leakage power, and on the functionality of the circuits by exploiting various techniques at different abstraction levels, taking into account the PVT variations.

#### 2.1 Design Method

This section gives information about the methods which were used in this thesis. Two design kits from Global Foundry and STM available through CMP have been used.

- 1. IC Global Foundry Microelectronics 22nm Advanced CMOS FDSOI technology.
- 2. IC STMicroelectronics 130nm BiCMOS SiGe 6 ML BiCMOS9MW2.

The design software tools used in this thesis are as follows:

- 1. Cadence Virtuoso, custom IC Design Environment for both schematic and layout.
- 2. Cadence Spectre, circuit simulator
- 3. Cadence Innovus, place and route.
- 4. Cadence Genus, logic synthesis.
- 5. Mentor Graphics Calibre, design rule check (DRC) and layout versus schematic (LVS).
- 6. Cadence Quantus, extraction netlist from the layout.

The measurement equipment used in the laboratory is as follows:

- 1. Rigol DP 832A digital power supply.
- 2. HP 6632A DC digital power supply.
- 3. Keithley 6485 Picoammeter.
- 4. Agilent 33522A Function generator.
- 5. Rohde Schwarz RTE 1022 Oscilloscope.

The PCB and QFN44 socket shown in Fig. 2.1 were used to measure the performance metrics of the ten chip samples. The chip photo and the measurement equipment for the implemented test chip are shown in Fig. 2.2 and Fig. 2.3, respectively.

#### 2.2 Power reduction techniques used in this thesis

#### 2.2.1 Multi-threshold CMOS

This technique utilizes transistors with different threshold voltages to create a circuit with extremely low leakage. In a 22 nm FDSOI technology, different threshold voltage devices are available including high threshold voltage (HVT), regular threshold voltage (RVT), low threshold voltage (LVT), super low threshold voltage (SLVT) and ultra high threshold voltage (UHVT). The leakage current of HVT devices is significantly lower than that of the LVT devices.

Figure 2.1: PCB, and socket for measuring purpose.

In the subthreshold region, the main design goal is to balance PMOS and NMOS transistors to have identical currents in the switching point [41]. When there is no adequate balance between PMOS/NMOS transistors, the dc behaviour of the circuit will be affected, and the SNM will be reduced. To balance the PMOS and NMOS transistors in a simple gate like an inverter, the size of PMOS transistor has to increase several times compared to that of the NMOS transistor. This PMOS width upsizing will bring large capacitances and large asymmetric layout.

To obtain an adequate balance between transistors without PMOS upsizing, the multiple threshold voltage may be used. This technique uses stronger type of transistors (with lower threshold voltage) for PMOS and the weaker type (higher threshold voltage) transistors for NMOS to balance the PUN/PDN without PMOS upsizing.
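The balancing argument can be made concrete with a first-order sketch. It assumes an idealized subthreshold inverter in which both transistors are saturated, share the same slope factor `n`, and have drain currents proportional to width times an exponential of the gate overdrive; DIBL and all other second-order effects are ignored. Under these assumptions the switching point is $V_M = V_{dd}/2 + (n v_T/2)\ln\big((W_p I_{0p})/(W_n I_{0n})\big)$. The function names and the 1/8 current ratio below are hypothetical and only serve as illustration.

```python
import math

def switching_point(vdd, wp_over_wn, ip_over_in, n=1.5, v_thermal=0.02585):
    """Switching point of an idealized subthreshold inverter.

    wp_over_wn : PMOS to NMOS width ratio
    ip_over_in : PMOS to NMOS current per unit width at equal gate overdrive;
                 this ratio depends on the chosen threshold voltage flavours
    """
    return 0.5 * vdd + 0.5 * n * v_thermal * math.log(wp_over_wn * ip_over_in)

def balanced_width_ratio(ip_over_in):
    """Width ratio Wp/Wn that places the switching point at Vdd/2."""
    return 1.0 / ip_over_in

# Same threshold flavour, PMOS assumed 8x weaker per unit width:
print(switching_point(0.15, 1.0, 0.125))   # below Vdd/2, the PDN dominates
print(balanced_width_ratio(0.125))         # 8.0, i.e. heavy PMOS upsizing
# Stronger PMOS flavour and weaker NMOS flavour with equal current per width:
print(balanced_width_ratio(1.0))           # 1.0, no PMOS upsizing needed
```

The sketch shows why choosing the threshold voltage flavours to equalize the current per unit width removes the need for width upsizing in this idealized model.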

This method has been used at the gate level in paper VI for designing energy efficient full adders. The method has been used to tune the PUN/PDN without PMOS width upsizing. In the full adders which we have developed in paper VI, HVT NMOS transistors and minimum sized RVT PMOS transistors have been chosen. The width of the NMOS HVT transistors has been found by sweeping the input voltage such that the voltage transfer characteristic has equal input and output at Vdd/2.

Figure 2.2: Chip photo.

Figure 2.3: Measurement equipment.

In addition to paper VI, this method has been used in paper V to design subthreshold SRAM cell.

#### 2.2.2 Transistor/Device sizing

In low throughput subthreshold circuits, the static energy dominates the total energy per operation. In bulk CMOS, this worsens with technology scaling because of higher delay variability and a degraded subthreshold swing [10]. When bulk CMOS is scaled, the leakage current increases significantly because the gate loses electrostatic control of the channel. FDSOI and FinFET devices achieve much lower leakage because the gate has much better control over the channel in these technologies [22]. Scaling bulk CMOS also increases the process variation, which leads to mismatched device behavior and degrades the yield of the entire die. This is caused by Random Doping Fluctuations (RDF). In FinFETs, on the other hand, the channel is undoped or lightly doped, which reduces the statistical impact of RDF on the threshold voltage of the transistors. Overall, FinFETs have less variation than planar devices [22]. With scaling in FDSOI and FinFET technologies, the subthreshold slope has improved [22].

Hence, technology scaling reduces the energy efficiency in bulk CMOS technologies. Mismatch variation is approximately proportional to the inverse of the square root of the transistor area.
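The area dependence of the mismatch, commonly known as Pelgrom's law, can be written as $\sigma(\Delta V_{th}) \approx A_{VT}/\sqrt{W L}$. The following minimal sketch evaluates this relation; the matching coefficient `a_vt` is a hypothetical placeholder and not a value from either design kit.

```python
import math

def sigma_vth(a_vt, width_um, length_um):
    """Threshold voltage mismatch (mV) for a device of the given gate area.

    a_vt : matching coefficient in mV*um (hypothetical value, technology specific)
    """
    return a_vt / math.sqrt(width_um * length_um)

A_VT = 3.0  # mV*um, assumed for illustration only
print(sigma_vth(A_VT, 0.30, 0.13))  # minimum channel length
print(sigma_vth(A_VT, 0.30, 0.19))  # upsized channel length, lower mismatch
```

Quadrupling the gate area halves the mismatch in this model, which is one reason why channel length upsizing improves robustness.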

Increasing the gate length of the transistors in subthreshold circuits improves the robustness and functional yield of the circuits. It also improves the subthreshold swing of the transistors. The subthreshold swing is defined as the change in gate-source voltage required to change the subthreshold current by one decade. A smaller subthreshold swing gives a higher $I_{on}/I_{off}$ ratio. By increasing the channel length, the depletion capacitance decreases, and the subthreshold swing decreases [29].
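The link between depletion capacitance, subthreshold swing and $I_{on}/I_{off}$ can be sketched with the textbook expression $S = (1 + C_{dep}/C_{ox})\,(kT/q)\ln 10$. The on/off estimate below additionally assumes that the supply voltage stays in the subthreshold region and that DIBL is negligible, so that $I_{on}/I_{off} \approx 10^{V_{dd}/S}$. The capacitance ratios used in the example are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def subthreshold_swing(cdep_over_cox, temperature_k=300.0):
    """Subthreshold swing in V/decade for a given C_dep/C_ox ratio."""
    slope_factor = 1.0 + cdep_over_cox
    thermal_voltage = BOLTZMANN * temperature_k / ELEMENTARY_CHARGE
    return slope_factor * thermal_voltage * math.log(10.0)

def on_off_ratio(vdd, swing):
    """Idealized Ion/Ioff of a device operated at Vgs = Vdd in subthreshold."""
    return 10.0 ** (vdd / swing)

# Hypothetical ratios: a longer channel is assumed to lower C_dep/C_ox.
for label, ratio in (("short channel", 0.5), ("long channel", 0.3)):
    s = subthreshold_swing(ratio)
    print(f"{label}: S = {s * 1e3:.1f} mV/dec, Ion/Ioff = {on_off_ratio(0.15, s):.0f}")
```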

Traditionally, to minimize energy, transistors should be sized as small as possible [33]. However, it has been shown that in the subthreshold regime, the channel length upsizing is more efficient than MTCMOS power gating, body biasing, Vth selection or device width upsize, and it increases robustness while simultaneously reducing static leakage energy [9].

Fig. 2.4 shows the  $I_{on}/I_{off}$  for the minimum sized NMOS transistor versus the channel length at Vdd = 150 mV using 130 nm technology. The slope between 130-190 nm is much steeper than that of the rest of the range. The length of 190 nm has been selected as a tradeoff for the cells.

Fig. 2.5 shows the threshold voltage of the NMOS transistor versus the channel length and width at Vds = 150 mV.



Figure 2.4:  $I_{on}/I_{off}$  for the minimum size NMOS transistor vs the channel length, Vds = 150 mV.



Figure 2.5: Threshold voltage of the NMOS transistor vs the channel length and width, Vds = 150 mV.

In paper IV, a standard cell based memory has been developed. This memory is based on the full custom standard cell library designed for ultra low supply voltages. The channel length upsizing has been used as a leakage reduction technique.

#### 2.2.3 Transistor Stacking

Stacking transistors significantly reduces the leakage current through the stacked off transistors compared to a single off transistor [37]. Subthreshold circuits with high fan-in and fan-out are prone to logic failure due to process variations [13]. At ultra low supply voltages, the stacked transistors in complex gates with high fan-in can increase the vulnerability of the circuits to PVT variations. Stacking can instead be applied selectively to gates with low fan-in, such as a simple inverter, to reduce the leakage current. Stacking two off transistors significantly reduces the subthreshold leakage current, with a reasonable delay penalty compared to a single transistor. The authors in [27] have found the optimal width ratio for stacked transistors. They have shown that it is beneficial to size the stacked transistors equally to optimize the current drivability.

To see the effectiveness of stacked transistors in the subthreshold regime, the delay and leakage current of inverters with two and three stacked transistors have been compared with those of an inverter with a single transistor in Fig. 2.6. As Fig. 2.6 shows, having two stacked transistors reduces the leakage current by 4.6X in the subthreshold regime.



Figure 2.6: Normalized delay and leakage current for inverters with 2 and 3 stacked transistors and single transistor, length = 190 nm,  $W_{NMOS} = 300$  nm,  $W_{PMOS} = 1.8$  um.

In Paper II, a minority3 based 32-bit RCA with stacked inverters has been developed. The simulation results show that the static power consumption is reduced by 15% compared to the minority3 based RCA without stacked inverters. For low throughput applications, using stacked inverters in the adder reduces the leakage current and the total energy per cycle of the circuit. This technique lowers the circuit speed, which might not be problematic for low frequency systems.

## 2.2.4 Body biasing

Body biasing is one of the methods used to balance the PUN/PDN so that both provide the same drive current. It makes the transistors weaker or stronger by changing their threshold voltage. Reverse body biasing increases the threshold voltage, which decreases the leakage and increases the delay of the circuit. Forward body biasing, on the other hand, decreases the threshold voltage, which increases the leakage and decreases the delay of the circuit.
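The leakage versus delay trade-off of body biasing can be sketched with a first-order subthreshold model. It assumes that both the off current and the on current depend exponentially on the threshold voltage, so that a threshold shift $\Delta V_{th}$ scales the leakage by $\exp(-\Delta V_{th}/(n v_T))$ and the delay by the inverse factor. The mapping from a body bias voltage to $\Delta V_{th}$ is technology specific and is deliberately left out; the slope factor `n` is a hypothetical value.

```python
import math

def body_bias_scaling(delta_vth, n=1.5, v_thermal=0.02585):
    """Return (relative leakage, relative delay) for a threshold shift in volts.

    delta_vth > 0 models reverse body bias (higher threshold voltage),
    delta_vth < 0 models forward body bias (lower threshold voltage).
    """
    factor = math.exp(delta_vth / (n * v_thermal))
    return 1.0 / factor, factor

for shift_mv in (-20, 0, 20):
    leakage, delay = body_bias_scaling(shift_mv * 1e-3)
    print(f"dVth = {shift_mv:+d} mV: leakage x{leakage:.2f}, delay x{delay:.2f}")
```

In this idealized model the product of leakage and delay stays constant, so the benefit of a bias setting depends on how much of the cycle time is actually needed by the application.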

In the 22 nm FDSOI technology, the available transistors are RVT, HVT (conventional well) or LVT, SLVT (Flip well). The conventional and flip well transistors are optimized for reverse and forward body bias, respectively. Different well type circuits (conventional, flip and single well) can be made by different transistors. The reverse body bias has been used in the literature to reduce the leakage current and, hence, the static power consumption.

In order to apply a reverse body bias voltage a negative/positive voltage must be applied to the conventional NMOS/PMOS transistors. Reverse body bias voltage in a flip well can be applied by a positive/negative voltage to the NMOS/PMOS transistors.

In paper I, for the design of the adders, the width of the PMOS has been selected as 2X that of the NMOS. The body bias of the NMOS transistors was changed to balance the Pull up/Pull down Network (PUN/PDN). Different full adder topologies have been designed and compared for supply voltages between 140-160 mV and for the temperature range between 27-50 °C, which is appropriate for implantable biomedical applications.

In paper III, a technique has been proposed to determine the optimal body bias (reverse body bias) to minimize energy and improve the mismatch and process variations for extremely low supply voltages in 22 nm FDSOI technology. Using the optimal body bias found for a subthreshold adder as a case study gives 4.67% savings in energy compared to that of zero body bias. This technique also reduces the effects of the process variations, resulting in improved yield of the adder at Vdd = 150 mV by 0.4%.

The main components of leakage current in scaled nm bulk CMOS technology are subthreshold leakage, source/drain junction band-to-band tunneling (BTBT) leakage (reverse biased PN junctions from the drain/source to the well) and gate leakage. Reverse body bias changes each of these components [38]. Reverse body bias increases the threshold voltage and, hence, the subthreshold leakage current drops. Reverse bias increases the BTBT current [38]. Applying body bias does not have a significant effect on the gate leakage. This leakage current has been suppressed by using a high-k dielectric in FDSOI technology. Therefore, it is important to find the optimal body bias which minimizes the total leakage current. In paper V, the body biasing method has been used as an assist technique for read and write operations to improve the strength ratio between the access and latch transistors.

## 2.2.5 Architecture optimization in computing circuits

Arithmetic units, such as comparators, adders, and multipliers have been known as the heart of the data-path which belongs to the core of any microprocessor.

Binary addition is the most basic and widespread arithmetic operation. Moreover, it has been used for complex operations like multiplication and division. One of the most power hungry components in a processor is the adder which is often the possible location of hot-spots [19]. Therefore, a significant goal for many digital circuit designers is to design an energy efficient adder.

The topology and structure of datapath circuits like adders affect the power consumption, and they present different performance metrics like speed, area, power consumption and wiring complexity. The delay is influenced by the number of inversion levels, the number of transistors in series, the transistor sizes (channel widths) and the capacitances [69]. The circuit area is determined by the number of transistors, their sizes and the complexity of the wiring. The power consumption depends on the capacitances, the activity factor and the wiring complexity [69].

Hence, selecting the optimum arithmetic structures for the application is a key issue for digital designers who want to reduce power. Architecture optimization in computing circuits therefore means using architectures with lower complexity, capacitance and switching activity.

In this thesis, adders have been designed and implemented in 130 nm bulk CMOS and 22 nm FDSOI technologies to develop ultra low voltage subthreshold digital building blocks.

# 2.3 Subthreshold adders

## 2.3.1 Different adder topologies

The topology of a full adder influences all aspects of the circuit performance, such as the delay, area, power and wiring of the circuit. The speed of the full adder is influenced by the number of inversion levels in the circuit, the number of series transistors and the transistor area (the channel length and width). The power consumption is determined by the activity factor, the capacitance and the circuit size. Robustness with respect to supply voltage scaling, technology scaling and temperature is another important issue that has to be considered in the choice of the adder topology.

Therefore, finding the proper choice of the full adder topology will save considerable power consumption for very large scale integrated (VLSI) circuits.

Dynamic gates have higher power consumption than static gates. Static gates are also more robust against voltage and technology scaling than dynamic gates. Therefore, dynamic gates are not a viable choice for our applications [16, 39], and all the full adders in our study are based on static CMOS logic gates. The input signals in CMOS logic gates are connected to the transistor gates, which makes these circuits easy to characterize. The complementarity of the CMOS gates makes their layout regular and straightforward [69].

One of the drawbacks of conventional CMOS is the large PMOS transistors, which increase the area and power of the circuits [69]. We have used the reverse body bias technique to reduce the leakage power and to keep the PUN/PDN width ratio at two. Notice that in FinFET technologies, the pull-up network and the pull-down network are very symmetric. Hence, PMOS and NMOS devices with the same number of fins have similar driving strength. Therefore, the optimal ratio between the width of PMOS and NMOS transistors is one for FinFET logic [22].

## 2.3.2 Different adder architectures

Several considerations have to be taken into account for selecting the best architecture for the subthreshold regime, including energy and power efficiency. Another (third) consideration is the vulnerability to the PVT variations. By voltage scaling and working at subthreshold regime, the PVT variations are getting worse. Hence, selecting the right architecture to reduce these variations is an important issue for subthreshold circuit designers.

A comparison of full adders in the subthreshold regime has been performed by several studies [18, 21, 26], but there has been less attention given to the larger adders at the architectural level in this regime.

It was shown in [7] that serial adders operating in subthreshold could be as fast as parallel adders, given an increase in the supply voltage, while using less energy per addition. The building blocks that we used for the design of the RCA are more robust and less leaky than other alternatives [8]. The block used in [7] has little robustness towards process variations and has high leakage power [8].

The most well-known and simplest adder with regular design and an easy implementation is the RCA from the carry propagate adders family. It has the lowest power consumption and area usage with a high delay due to the long carry propagation path from the least significant bit to the most significant one.

This adder adds two N-bit numbers,  $A_i$  and  $B_i$  and an optional carry-in (CIN) by carry-propagation.

The KSA was proposed by Kogge and Stone [30]. The KSA from the parallel prefix family design has the shortest delay (if fan-out is constrained to 2), but it consumes higher power and area compared to the RCA. It has minimal depth and maximum fan-out of two. In this adder, all of the outputs are computed separately and in parallel. Hence, it increases the wiring complexity and number of independent tree structures. The wires in the circuits play a significant part in the performance and become an important design consideration [6].

The KSA consists of three stages:

1. Input stage: the $P_i$ (propagate) and $G_i$ (generate) signals are computed from the two one-bit inputs using XOR and AND gates.

$$P_i = (A_i)\,\text{XOR}\,(B_i) \tag{2.1}$$

$$G_i = (A_i)\,\text{AND}\,(B_i) \tag{2.2}$$

2. Carry propagation network: in this stage, the carry signal from the previous bits is evaluated by computing $P^*_i$ and $G^*_i$.

$$P^*_i = (P_i)\,\text{AND}\,(P_{i-1}) \tag{2.3}$$

$$G^*_i = \big((G_{i-1})\,\text{AND}\,(P_i)\big)\,\text{OR}\,(G_i) \tag{2.4}$$

3. Output stage: an XOR operation is performed on the generate ($G^*$) signal from the previous bit and the propagate signal of the current bit.
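The three stages can be sketched as a bit-level software model. This is a behavioural illustration only and not the gate-level implementation of the papers. It assumes the standard Kogge-Stone prefix network, in which Eq. (2.3) and Eq. (2.4) are applied repeatedly with the distance to the combined bit doubling at every level (1, 2, 4, ...), and a carry-in of zero. A ripple carry model is included for comparison; all function names are hypothetical.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers with a Kogge-Stone prefix network (carry-in 0)."""
    # Input stage, Eq. (2.1) and Eq. (2.2).
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Carry propagation network: group propagate and generate signals.
    gp, pp = g[:], p[:]
    dist = 1
    while dist < width:
        new_g, new_p = gp[:], pp[:]
        for i in range(dist, width):
            new_g[i] = gp[i] | (pp[i] & gp[i - dist])  # Eq. (2.4)
            new_p[i] = pp[i] & pp[i - dist]            # Eq. (2.3)
        gp, pp = new_g, new_p
        dist *= 2

    # Output stage: propagate of the current bit XOR carry from the previous bit.
    total = 0
    for i in range(width):
        carry_in = gp[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, gp[width - 1]

def ripple_carry_add(a, b, width=32):
    """Reference model: the carry ripples from the LSB to the MSB."""
    total, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry

print(kogge_stone_add(0xFFFFFFFF, 1))  # (0, 1): all bits propagate, carry out set
```

The prefix loop runs over log2(width) levels, whereas the ripple carry model needs one step per bit, which mirrors the delay difference between the two architectures.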

Paper II focuses on the design of 32, 16 and 8 bits KSA and RCA based on the full custom standard cell library designed for ultra low supply voltages as low as 140 mV using 130 nm CMOS bulk technology. The layout of the 32 bits KSA and minority3 based adders are shown in Fig. 2.7 and Fig. 2.8, respectively. Based on the comparison between different minority 3 gates in [4], we decided to use one with ten transistors as a robust one. Both adders have been designed with the same sizing strategy.

In general, based on the post-layout simulations, the RCA is much more energy efficient than the KSA, while the KSA is faster than the RCA at the same supply voltage.

Post-layout simulation results confirm that, with a marginal increase in the supply voltage of the RCA compared to that of the KSA at the same speed, the power consumption, energy per operation and area of the RCA are far less than those of the KSA. For example, when increasing the supply voltage of the 8-bit RCA by 44 mV compared to that of the KSA, the energy per operation of the KSA is about 3.5 times higher than that of the RCA at the same speed [65].

Figure 2.7: The layout of the 32 bits KSA adder

Figure 2.8: The layout of the 32 bits minority3 based RCA adder

The 32-bit RCA and KSA were fabricated in 130 nm technology, which allowed us to perform measurements. Based on the measurement results, both adders are fully functional for supply voltages as low as 130 mV. The adders were optimized for supply voltages as low as 140 mV. However, the area of our adders designed in 130 nm was improved by 5.73X and 1.39X, respectively, compared to that of the adder in [44] designed in 90 nm technology.

We have investigated different RCA topologies and considered how the minimum energy point varies with different topologies.

## 2.3.3 Functional yield

For subthreshold circuits, having a high functional yield is an important issue, especially at ultra low supply voltages, because of the low $I_{on}/I_{off}$ ratio and high current variability [33]. Indeed, high current variability may lead to erroneous output logic levels and, hence, functional failure. For all of the cells in our library, we have verified the functionality (the output logic levels) through 1000 Monte Carlo simulations for all possible combinations of inputs in the presence of both mismatch and process variations. For example, a one-bit full adder with three inputs has $2^3$ possible input combinations.

This type of exhaustive test is applicable for circuits with a few inputs but quickly becomes impractical for circuits with many inputs. For example, a 32 bit adder would need $2^{32}$ input test vectors.

The strategy that we have used for testing the functionality of the 32 bit RCA was to test each full adder independently [2]. For example, for testing bit N, we have changed the  $A_n$  and  $B_n$  of the test vector. In order to set the carry input of bit N, the correct choices have to be selected for  $A_{n-1}$  and  $B_{n-1}$  [2].
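The per-bit strategy can be sketched at the logic level as follows. The sketch assumes a full adder whose carry is the inverted output of a minority3 gate and whose sum is a three-input XOR, which is a simplified stand-in for the cells of the papers. For bit $n$ it drives $A_n$ and $B_n$ directly and forces the carry input through $A_{n-1} = B_{n-1}$; in this sketch bit 0 has no lower bit, so its carry input is fixed to zero. In the thesis each vector is checked by circuit simulation, whereas here only the Boolean behaviour is checked. All function names are hypothetical.

```python
from itertools import product

def minority3(a, b, c):
    """Output is 1 when at most one of the three inputs is 1."""
    return 1 if (a + b + c) <= 1 else 0

def full_adder(a, b, cin):
    """Assumed structure: carry = NOT minority3, sum = three-input XOR."""
    carry_out = 1 - minority3(a, b, cin)
    return a ^ b ^ cin, carry_out

def ripple_carry_add(a, b, width=32):
    total, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

def vectors_for_bit(n):
    """Yield operand pairs exercising every (A_n, B_n, carry-in) case of bit n."""
    for a_n, b_n, cin in product((0, 1), repeat=3):
        if n == 0 and cin:
            continue  # bit 0 has no lower neighbour that could generate a carry
        # A_{n-1} = B_{n-1} = 1 generates a carry into bit n, 0 suppresses it.
        lower = (1 << (n - 1)) if cin else 0
        yield (a_n << n) | lower, (b_n << n) | lower

def run_bitwise_test(width=32):
    """Check every per-bit vector against integer addition; return the count."""
    checked = 0
    mask = (1 << width) - 1
    for n in range(width):
        for a, b in vectors_for_bit(n):
            total = a + b
            assert ripple_carry_add(a, b, width) == (total & mask, total >> width)
            checked += 1
    return checked

print(run_bitwise_test(32))  # number of vectors applied
```

With eight cases for each of the upper 31 bits and four for bit 0, the number of vectors grows linearly with the adder width instead of exponentially.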

The functional yield of the digital gates in [33] has been extracted from the SNM of the gates in presence of the variations. When the SNM is positive, it means the circuit can work at that supply voltage. We have used the Negative Slope Criteria (NSC) technique for calculating the SNM high and low [25].

# 2.4 Memories

Two options for memories operating at the subthreshold regime are specifically designed SRAM macros and standard cell based memories based on flip-flops and latches.

## 2.4.1 SRAM macros

SRAMs are one of the main critical parts of almost all VLSI circuits. They are used at different levels of the memory hierarchy, such as register files and L1-L3 cache memories. This is due to the fact that SRAMs have the highest speed among the various embedded memory technologies [62]. A large portion of modern digital ICs is composed of SRAMs, and they often account for a prominent part of the power consumption [49, 52].

Since the SRAM blocks are often in the hold state, the static power is a substantial part of the total power consumption. Ultra low voltage SRAMs are advantageous for power saving, especially in power constrained battery-operated applications. The conventional 6T SRAM faces reliability and functionality issues at supply voltages below 600 mV in modern nm technologies [11].

Various topologies and peripheral circuits have been used to tackle the problems of 6T SRAM cell in the subthreshold regime [14, 32, 51, 55]. These methods remove the trade-off between read and write operations by isolating the read port from the internal nodes. Nevertheless, they either have a single ended read port [55], [51], [34] or poor density [32], [14], [15] and [24] with 10, 11, 12 and 14 transistors per bit cell, respectively. The sense amplifiers can not be used in the single-ended read operation unless one of the inputs of the sense amplifier is connected to a stable and precisely selected reference voltage [55].

Besides this, newer technologies like FDSOI have emerged as an interesting alternative to planar bulk CMOS to tackle these problems and reduce the minimum supply voltage.

In the subthreshold regime, the bitline swing of an SRAM column is small. Therefore, it is problematic for a single-ended read port SRAM to identify the right output value, especially at the worst-case corner. Hence, a single-ended cell needs to compensate for stability by adding extra peripheral circuits such as a buffer-foot [35], [34]. This technique can reduce the subthreshold leakage noise current from the bitline. Nonetheless, other leakage components degrade the bitline swing and hence the functional yield. Besides, the extra circuits increase the area overhead.

In general, a differential read cell is more robust for subthreshold operation than a single-ended read cell [12]. However, the main obstacle for 10T, 12T and 14T differential SRAM cells is the area overhead due to the extra transistors.

Paper V focuses on the design of different well type SRAM cells, using a 7T loadless SRAM cell as a case study. The 7T loadless SRAM cell solves the read and write reliability problem by using a read buffer to decouple the read and write signals. Compared to the 6T loadless SRAM cell of [34], the read and write signals are differential, thus allowing traditional sensing techniques for the bitline. In this SRAM, the multi-threshold voltage technique and channel length upsizing have been used for transistor sizing. This method uses minimum sized PMOS and NMOS transistors to present a lower bitline capacitance.

Figure 2.9: The schematic of the 7T pull-up loadless SRAM.

Conventional 4T SRAM cell has high leakage power and, hence low stability during hold operation. Therefore, the multi threshold voltage technique has been used for 4T SRAM cell, which adopts low threshold voltage devices for access transistors and low leakage devices for driver transistors to achieve high hold noise margin greater than  $5\sigma$ .

To prevent WRITE failures in the presence of PVT variations, the access transistors of the cell have to be stronger than the cross-coupled transistors. Therefore, we have selected HVT transistors (regular well) with ultra low leakage as the driver elements and RVT (regular well) transistors as the access transistors to fulfill the above constraints. HVT devices have been used in the latch to reduce the leakage current, and RVT devices for the READ and WRITE access transistors to keep the speed high.

For stable READ operation in the subthreshold regime, the differential READ buffer with only three transistors has been used to isolate the bitline from the internal storage nodes. By adding the differential READ buffer, the SRAM cell has 7 transistors. The schematic of the 7T pull-up loadless SRAM cell is shown in Fig. 2.9.

## 2.4.2 Cell stability

The READ, WRITE and HOLD margins of the cell determine the READ stability, WRITE ability and HOLD ability, and are evaluated using the SNM. The SNM estimates the noise that can be applied to the cell without losing the stable state during READ



Figure 2.10: The cross coupled inverters with noise sources for hold and read SNM

and HOLD state or changing the state in WRITE operation. Fig. 2.10 shows the cross coupled inverters with noise sources for hold and read SNM.

To estimate SNM values, the procedure introduced in [45] is used, which finds the diagonals of the maximum squares that fit inside the lobes of the butterfly curve. In this method, the axes are rotated by 45° and the difference between the two curves is plotted. The maximum absolute distance between the two curves multiplied by  $\cos 45^{\circ}$  is the SNM. If the calculated SNM is negative, the cell will not be able to retain data. Fig. 2.11 shows the butterfly curves of the SRAM at various supply voltages at typical temperature.



Figure 2.11: Butterfly curves of the SRAM at various supply voltages at typical temperature.
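The rotated-axis SNM procedure described above can be illustrated with a minimal Python sketch. It is not the flow used in this work: the tanh-shaped inverter transfer curve, the gain values and all function names are assumptions made for illustration, whereas in practice the two curves would come from circuit simulation. For every line parallel to the 45° axis, the sketch finds where that line crosses each butterfly curve, takes the distance between the two crossings as the diagonal of the nested square, and multiplies it by cos 45°.

```python
import math


def inverter_vtc(v_in, vdd, gain):
    """Idealized inverter transfer curve (tanh model, an assumption).

    `gain` is the magnitude of the slope at the switching point vdd/2.
    """
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - 0.5 * vdd) / (0.5 * vdd)))


def _root_decreasing(func, lo, hi, iterations=60):
    """Bisection for a strictly decreasing func with func(lo) > 0 > func(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def static_noise_margin(vtc1, vtc2, vdd, steps=400):
    """SNM from the butterfly curves y = vtc1(x) and x = vtc2(y).

    Both transfer curves are assumed to be decreasing and bounded by 0 and vdd.
    A negative result means the modelled cell is not bistable.
    """
    best = {+1: -math.inf, -1: -math.inf}
    for i in range(1, steps):
        for sign in (+1, -1):
            # Line y = x + c runs parallel to the rotated (45 degree) axis.
            c = sign * vdd * i / steps
            x1 = _root_decreasing(lambda x: vtc1(x) - x - c, -vdd, 2.0 * vdd)
            x2 = _root_decreasing(lambda x: vtc2(x + c) - x, -vdd, 2.0 * vdd)
            # Signed distance between the two curves along that line.
            diagonal = sign * (x1 - x2) * math.sqrt(2.0)
            side = diagonal * math.cos(math.radians(45.0))
            best[sign] = max(best[sign], side)
    # The smaller of the two lobes limits the noise margin.
    return min(best.values())


if __name__ == "__main__":
    for vdd in (0.15, 0.25, 0.35):  # hypothetical supply voltages
        def vtc(v, vdd=vdd):
            return inverter_vtc(v, vdd, gain=5.0)
        print(f"Vdd = {vdd:.2f} V  SNM = {1e3 * static_noise_margin(vtc, vtc, vdd):.1f} mV")
```

With this model, an inverter gain below one yields a negative value, which corresponds to the case in which the cell cannot retain data.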

## 2.4.3 Standard cell based memory (SCM)

SCMs have been proposed as an interesting alternative to SRAM macros for small-capacity memories. In SCMs, the storage cells are flip-flops and latches. These arrays can be easily synthesized, placed and routed. Compared to SRAM macros, SCMs do not require peripheral circuits such as sense amplifiers, buffer-foot drivers and devices for supply voltage gating [54]. Hence, for low frequency applications with small memory sizes, SCMs can have lower area than SRAM macros [36].

A new SCM based on a single phase clock and a robust NAND race-free D-latch for ultra low supply voltages was implemented in 130 nm CMOS technology. The SCM is presented in paper IV. Compared with previously published SCMs in the same technology, our SCM shows better energy efficiency.

## 2.4.4 Ultra low voltage latches and flip-flops for subthreshold regime

Flip-flops and latches are commonly used as sequential elements. Latches are transparent when the clock signal is high, while flip-flops are not transparent [42].

To be reliable and energy-efficient, latches and flip-flops should be static, which means their outputs are always connected to either the power supply or ground. Dynamic nodes are vulnerable to PVT variations, especially at ultra low supply voltages.

The energy efficient and reliable latches and flip-flops must be contention free. At ultra low supply voltages, the slope of the clock signal is low. Therefore, this will be a challenge for latches and flip-flops with inverted clock signal. The overlap between clock and clock-bar signals might lead to contention in the nodes and hence, to functional failure [3].

The single-phase clock avoids toggling the internal clock inverters and reduces the corresponding power penalty. Latches and flip-flops with a clock-bar signal might need more power than latches without an inverted clock signal [3], owing to the additional transitions on the clock-bar signal. Therefore, latches without an inverted clock signal are appropriate for low power and low voltage circuits.

The NAND race free D-latch is a single phase clock latch; this simple latch consists of two stages, a logic function stage and a latching stage. When the clock is low, the first stage does not respond to input changes, so the output stage does not change either. When the clock is high, the first stage evaluates the input and passes it to the output node. The components of this latch are standard CMOS NAND and inverter gates. Unlike dynamic registers, static registers have feedback in their structure to hold the output data. In dynamic registers, the data has to be refreshed after a period of time.
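The transparent and hold behaviour described above can be illustrated with a small gate-level model. The sketch below uses the textbook gated D-latch built from NAND gates and one inverter, which is an assumption chosen for illustration and not necessarily the exact race-free topology used in this work; the class and function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on 0/1 logic values."""
    return 0 if (a and b) else 1


class GatedDLatch:
    """Textbook NAND-based gated D-latch (illustrative, static storage)."""

    def __init__(self, q=0):
        self.q = q
        self.qb = 1 - q

    def step(self, clk, d):
        # First stage: logic function, only active while the clock is high.
        set_n = nand(d, clk)
        reset_n = nand(1 - d, clk)  # 1 - d models the inverter on D
        # Second stage: cross-coupled NANDs, iterated until the feedback settles.
        for _ in range(4):
            q = nand(set_n, self.qb)
            qb = nand(reset_n, q)
            if (q, qb) == (self.q, self.qb):
                break
            self.q, self.qb = q, qb
        return self.q


if __name__ == "__main__":
    latch = GatedDLatch()
    for clk, d in [(1, 1), (0, 0), (1, 0), (0, 1)]:
        print(f"clk={clk} d={d} -> q={latch.step(clk, d)}")
```

Because the stored value is held by the cross-coupled feedback, the output stays defined while the clock is low, which is the static property the text asks for.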

Static registers have been selected to compare the delay and energy of two different static CMOS flip-flops, since dynamic registers have been shown to be unreliable in subthreshold.

In paper VII, the conventional NAND race free flip-flop and the Power PC flip-flop have been implemented for ultra low supply voltages. These flip-flops are completely static. Hence, they show robust operation under voltage scaling. Both flip-flops have been sized to have high functional yield at supply voltages down to 140 mV. The schematics of the two flip-flops are shown in Fig. 2.12 and Fig. 2.13.

The measurement results for the fabricated frequency dividers based on both the NAND race free and Power PC flip-flops confirm the functionality of these circuits at supply voltages as low as 135 mV in 130 nm technology. Based on the simulation results, a comparison of the two flip-flops at the same supply voltage and at the low frequency of 1 kHz shows that the Power PC flip-flop has lower power consumption than its NAND race free counterpart: at 1 kHz, the Power PC flip-flop is 1.34X more power efficient.

The areas of our Power PC and race free flip-flops designed in 130 nm are  $12.2 \times 6.15 = 75.0 \ \mu m^2$  and  $13.8 \times 6.15 = 84.8 \ \mu m^2$ , respectively.

According to the mean measured energy per operation for ten samples, the frequency divider based on the Power PC flip-flop consumes 24 % less energy per operation compared to the frequency divider based on the NAND race free flip-flop at the ultra low supply voltage of 160 mV.

The MEP for both frequency dividers is at 250 mV. The energy per operation at the MEP for the NAND race free and Power PC frequency dividers is 12.5 fJ and 12.2 fJ, respectively. The mean energy for the NAND race free and Power PC frequency dividers at the MEP is improved by 1.99X and 2.02X, respectively, compared to that at 500 mV supply voltage.

The minimum functional supply voltages for the frequency dividers reported in [53] and [5] are 132 mV, 137 mV and 160 mV, respectively. The minimum functional supply voltage in our work is 135 mV. The minimum supply voltage in this study is almost the same as that of the one designed in [53].

Our present results agree quite well with the result reported in [57], where the authors state that the Power PC flip-flop is the most power efficient among five flip-flops, including the NAND race free flip-flop, in the subthreshold regime at the same supply voltage.



Figure 2.12: The schematic of the NAND race free flip-flop.



Figure 2.13: The schematic of the Power PC flip-flop.

# CHAPTER 3

# Conclusion

This work focused on designing and exploring energy efficient subthreshold digital computing and memory circuits, with specific emphasis on power reduction techniques for ultra low voltage digital circuits. Chapter 1 explained the basic characteristics of subthreshold circuits, and Chapter 2 summarized the paper collection, presenting the approaches used for power reduction and the evolution of these approaches through the design of different adder architectures and storage elements.

Is there an adder architecture that shows superior energy efficiency at low supply voltages? What kind of architecture is the most energy efficient adder for the ultra low voltage regime?

Is there a compact differential SRAM cell functional for subthreshold and nearthreshold regions?

The drive strength of the pull up and pull down transistors differs significantly in subthreshold bulk CMOS logic. Are there methods to balance pull up and pull down networks without upsizing of the pull up networks to reduce the power consumption?

The building blocks including logic cells, different topologies of the full adders, flip-flop, SRAM and standard cell memory have been developed in 130 nm bulk CMOS and 22 nm FDSOI technology.

In 130 nm CMOS bulk technology, a custom standard cell library for ultra low supply voltages as low as 140 mV and temperature range of 27-50 °C applicable for ultra low power implantable biomedical applications has been designed. The channel length upsizing has been used to reduce the leakage current and improve the robustness and energy efficiency of the cells. Two different adder architectures including KSA and RCA have been synthesized to find the most appropriate architecture for ultra low voltage applications.

The standard cell based memory using the NAND race free latch has been designed for such applications using the custom standard cell library designed in 130 nm CMOS bulk technology. Different adder topologies have been designed for ultra low supply voltages using body biasing to balance the circuits in 22 nm FDSOI technology. The optimal body bias for minimizing the energy and improving the robustness of low fan-in circuits has been considered. The effect of parameters such as activity factor and workload has been considered when optimizing the body bias.

In 22 nm FDSOI technology, we developed an SRAM memory cell with multiple threshold voltage transistors. SRAM cells with different well types, including flip, regular and single well, have been designed and compared to evaluate the trade-offs among well types in terms of energy, robustness, leakage power and speed.

We have also developed an energy efficient full adder by using multiple threshold voltage technique in 22 nm FDSOI technology.

To the best of our knowledge: 1) The implemented and fabricated 32-bit RCA and KSA adders in 130 nm technology are functional for supply voltages as low as 130 mV. The results are consistent with the simulation results and the theory. 2) The RCA adder designed in 22 nm FDSOI technology based on minority-3 logic gates and conventional body bias in Paper V has the lowest reported energy per bit per addition. 3) Based on the measurement results for frequency dividers in 130 nm technology, a minimum operating voltage of 135 mV was achieved. A minimum functional supply voltage of 170 mV is achieved in Paper IV for the latch-based standard cell memory.

# CHAPTER 4

# Publications

The main contributions of this thesis are presented in the following papers:

- Paper I Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2018. Comparison of ultra low power full adder cells in 22 nm FDSOI technology, IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-5.
- Paper II Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2019. Ultra-low voltage subthreshold binary adder architectures for IoT applications: Ripple carry adder Kogge Stone adder, IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-7.
- Paper III Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2019. Exploring optimal back bias voltages for ultra low voltage CMOS digital circuits in 22 nm FDSOI technology, IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-6.
- Paper IV Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2020. An ultra low voltage subthreshold standard cell based memories for IoT applications, 28th Iranian Conference on Electrical Engineering (ICEE), *IEEE*, pp. 1-5.
- Paper V Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2020. Multi-threshold voltage and dynamic body biasing techniques for energy efficient ultra low voltage subthreshold adders, IEEE Nordic Circuits and Systems Conference (NorCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-6.
- Paper VI Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2020. Comparative study of single, regular and flip well subthreshold SRAMs in 22 nm FDSOI technology, IEEE Nordic Circuits and Systems Conference (NorCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-6.
- Paper VII Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2021. Subthreshold power PC and NAND race free flip-flops in frequency divider applications, IEEE Nordic Circuits and Systems Conference (NorCAS): NORCHIP and International Symposium of System-on-Chip (SoC), *IEEE*, pp. 1-6.
- Manuscript VIII Hossein Zadeh, Somayeh, Ytterdal, Trond and Aunet, Snorre, 2021. Subthreshold energy efficiency of serial versus parallel adders, ready to be submitted for review for journal publication.

# 4.1 Paper I: Comparison of ultra low power full adder cells in 22 nm FDSOI technology

# Comparison of Ultra Low Power Full Adder Cells in 22 nm FDSOI Technology

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering

Norwegian University of Science and Technology (NTNU)

O.S. Bragstads plass 2a, Trondheim, 7491, Norway

somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet@ntnu.no

Abstract—Five ultra low voltage and low power full adders have been designed and analyzed with CMOS logic structure. To compare these adders, different metrics including worst case delay, average power, PDP, and PDP\*Leakage have been investigated for supply voltages from 140 to 160 mV. All the full adders have been designed and verified with Cadence Virtuoso in a commercially available 22 nm FDSOI technology. The extended body bias voltages available in 22 nm FDSOI technology have been used to balance the Pull Up/Pull Down Networks and achieve a high functional yield. A test bench has been used to verify the functionality of the full adders automatically under different conditions of temperature and supply voltage. The simulation results show that the Xor based adder is the best of all, having the lowest delay, power, PDP, and PDP\*Leakage under the different conditions.

Index Terms-ultra low voltage, low power, 22 nm FDSOI technology, extended body bias, PDP, PDP\*Leakage.

#### I. INTRODUCTION

In Internet of Things (IOT) applications, the design of implantable medical devices such as pacemakers, which could save a patient's life in emergency situations, is critical [1], [2], [3]. Power consumption is a key issue in such applications, which have long stand-by times. Using the minimum possible supply voltage, below the absolute value of the MOS threshold voltage, reduces circuit power consumption. Operation in the subthreshold regime has been investigated since the sixties [4]. The exponential relationship between current and temperature, threshold voltage and supply voltage is a key concern when investigating the functionality of circuits under different conditions in this regime. Fully Depleted Silicon on Insulator (FDSOI) technology has emerged to tackle the problems of ultra low voltage design. In this technology, the efficiency of the body biasing technique has been increased by controlling the channel. The body bias technique in FDSOI and CMOS technology has been used in many works to reduce the supply voltage and hence the circuit power consumption [5], [6], [7]. In this paper, an extended body bias technique has been used in a commercially available 22 nm FDSOI technology. This method has been applied to design five full adders at supply voltages below the absolute value of the MOS threshold voltage and is applicable to implantable medical IOT applications. The full adders have been designed for the temperature range 27-50°C, which is appropriate for implantable medical applications, and with ultra low supply voltages from 140 to 160 mV to reduce the power consumption. Simulations for all adders have been run at 1 kHz, which is relevant for many IOT applications, and at the maximum operating frequency. To achieve this goal, different aspects of digital circuit design in the subthreshold regime have been considered. Then, a new test bench has been suggested for verifying the functionality of such circuits. Different performance metrics of the five full adders have been simulated and the results have been compared. It is concluded that the Xor based adder is the best option in this supply voltage range, yielding the lowest delay, power, PDP, and PDP\*Leakage.

978-1-5386-7656-1/18/\$31.00 ©2018 IEEE

# II. DESIGN CONSIDERATION IN THE SUBTHRESHOLD REGIME

Expressed by the following simplified equation, NMOS transistor subthreshold current has an exponential relation with the gate-source and threshold voltage [8].

$$I_{ds} = I_0 \, e^{kV_{gs}/V_T} \, e^{(1-k)V_{bs}/V_T} \left(1 - e^{-V_{ds}/V_T} + \frac{V_{ds}}{V_0}\right) \tag{1}$$

where  $I_0$  is a constant related to the channel width and length of the MOS transistor, and  $V_T$  and  $V_0$  are the thermal voltage and the Early voltage, respectively. k is approximately 0.7-0.75 and is related to the subthreshold slope factor  $(1 + C_{dep}/C_{ox})$. This equation can also be applied to PMOS with opposite polarity. Static power consumption is the dominant contributor to the total energy of a low frequency system. To reduce leakage and improve performance in CMOS logic gates, balancing of PUN/PDN<sup>1</sup> is a strong knob. The strength of the transistors can be tuned by using body biasing and aspect ratio as well as device type [9], which is discussed in more detail in this section. Based on equation (1), digital circuits in the subthreshold regime are more sensitive to PVT (process, voltage, and temperature) variations than superthreshold circuits. Threshold voltage variations caused by RDF (random dopant fluctuation) add to the process variation, and their magnitude is proportional to the inverse of the square root of the transistor area.

<sup>1</sup>Pull Up/Pull Down Network
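To make the exponential dependence in equation (1) concrete, the following Python sketch evaluates the expression term by term. It is a minimal illustration under stated assumptions: the values chosen for $I_0$, k and the Early voltage are placeholders rather than extracted device parameters, and the function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_c):
    """Thermal voltage V_T = kT/q in volts for a temperature in degrees Celsius."""
    return K_BOLTZMANN * (temp_c + 273.15) / Q_ELECTRON


def subthreshold_current(v_gs, v_ds, v_bs=0.0, temp_c=27.0,
                         i0=1e-12, k=0.7, v_early=10.0):
    """Evaluate equation (1) for an NMOS device.

    i0, k and v_early are illustrative placeholder values, not fitted data.
    """
    v_t = thermal_voltage(temp_c)
    gate_term = math.exp(k * v_gs / v_t)
    body_term = math.exp((1.0 - k) * v_bs / v_t)
    drain_term = 1.0 - math.exp(-v_ds / v_t) + v_ds / v_early
    return i0 * gate_term * body_term * drain_term


if __name__ == "__main__":
    for vdd in (0.140, 0.150, 0.160):
        i_on = subthreshold_current(vdd, vdd)   # gate driven to Vdd
        i_off = subthreshold_current(0.0, vdd)  # gate at 0 V
        print(f"Vdd = {1e3 * vdd:.0f} mV  Ion/Ioff = {i_on / i_off:.1f}")
```

In this model the on/off current ratio depends only on $e^{kV_{dd}/V_T}$, which is why it shrinks quickly as the supply voltage is lowered, and a negative $V_{bs}$ (reverse body bias) lowers the current through the second exponential.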

#### A. Design Strategies

According to the above, subthreshold circuit designers should avoid using minimum length and width for the transistors in order to decrease the variability. A wide PMOS transistor causes larger capacitance in the circuit and hence more area and power consumption. Choosing a ratio of two for  $W_{PMOS}/W_{NMOS}$  is suitable for reducing mismatch variation because it gives a regular layout. Using HVT devices is recommended for reducing the leakage current, which is the first priority in such an application. Body biasing is one of the techniques used for tuning the PUN/PDN. This method manipulates the threshold voltage by changing the back gate voltage. The goal is to find a body bias voltage for both NMOS and PMOS transistors which results in a reasonable functional yield in full adders.

#### B. Full Adders Circuits Design

Schematics for five different full adders are shown in Fig.1. It has been shown that in the subthreshold regime, and especially at ultra low supply voltages,  $I_{on}/I_{off}$  is lower than in the superthreshold regime. Gates with a maximum fan-in of 2-3 should therefore be used to avoid robustness problems and improve the functional yield [9]. Therefore, in this study, Minority-3 based, Nand based, Xor based and Nand-Nor based full adders have been selected for simulation [10]. The results have then been compared to the 28 transistor standard adder [10].

All gates have been designed and verified using Cadence Virtuoso in a commercially available 22 nm FDSOI technology. They have been designed with HVT transistors optimized for reverse back biasing in order to reduce the leakage current. As shown in Fig.2, both the subthreshold on-current and the leakage current are highly affected by the transistor length, L.

The leakage current varies much more strongly with transistor length between 20-28 nm than between 28-36 nm. Therefore, a length of 28 nm is used both to reduce leakage and to reduce threshold voltage variation. The widths of all NMOS and PMOS transistors are 200 nm and 400 nm, respectively, except for the PMOS in Minority-3, which is 600 nm. The goal is to select the body bias that gives the lowest leakage current and variability. To do so, the leakage current of an inverter designed with HVT transistors has been simulated versus back bias voltage and is shown in Fig.3. The leakage current varies much more strongly with the back bias voltage of both PMOS and NMOS transistors between 0-300 mV than between 300 mV and 2 V. Increasing the reverse back bias voltage increases the variability. The body bias voltages selected to decrease both leakage current and variability are tabulated in TABLE I.

#### III. RESULTS

#### A. Test Bench

Digital circuits are affected by so much variation in the subthreshold regime that they may not be functional in the worst case condition. Therefore, a systematic method is essential to evaluate the functionality of the circuits under all conditions. The suggested automatic test bench is shown in Fig. 4.

Fig. 1: Five different adders; panel (e) shows the 28 transistor standard adder.

An ideal 3-bit ADC has been used to produce the different input combinations of the one-bit full adder in DC simulation. In the result capture block, the outputs of the adder have been compared with



Fig. 2: Subthreshold On and Leakage currents versus length for NMOS transistor in different supply voltages,  $V_{ds}$  =140, 150, 160 mV, W=200 nm

TABLE I: Back Bias Voltages for PMOS and NMOS transistors in different gates.

| GATES                         | VBBN<sup>a</sup> | VBBP<sup>b</sup> |
|-------------------------------|------------------|------------------|
| Minority-3 and Nor            | -300 mV          | 0                |
| Nand and Inverter and Xor     | 0                | $V_{dd}$         |
| Transistors in Standard adder | 0                | $V_{dd}$         |

<sup>a</sup>NMOS Back Bias Voltage.

<sup>b</sup>PMOS Back Bias Voltage.

both the maximum acceptable low voltage  $(V_{OL})$  and the minimum acceptable high voltage  $(V_{OH})$, which are equal to  $0.25 \cdot V_{dd}$  and  $0.75 \cdot V_{dd}$  of the full adder, respectively. To see the effect of process and mismatch variation, the output of the result capture block goes to the ADE Assembler, which runs 1 k Monte Carlo simulations [11], [12] and obtains the functional yield of the full adder automatically. The drive strength has been accounted for by loading the outputs with FO4 inverters [13].
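The pass/fail rule of the test bench can be sketched in a few lines of Python. This is only an illustration of the checking logic, not the Cadence flow itself: the 3-bit input code plays the role of the ideal ADC, the thresholds are the $0.25 \cdot V_{dd}$ and $0.75 \cdot V_{dd}$ levels given above, and `toy_sample` is a hypothetical stand-in for one Monte Carlo circuit instance that would normally come from the simulator.

```python
import random


def expected_outputs(a, b, cin):
    """Ideal full adder: returns (sum, carry)."""
    total = a + b + cin
    return total & 1, total >> 1


def logic_level(v_out, vdd):
    """Classify an output voltage using V_OL = 0.25*Vdd and V_OH = 0.75*Vdd."""
    if v_out <= 0.25 * vdd:
        return 0
    if v_out >= 0.75 * vdd:
        return 1
    return None  # undefined level counts as a failure


def adder_passes(simulate, vdd):
    """Apply all eight input codes and compare against the truth table."""
    for code in range(8):
        a, b, cin = (code >> 2) & 1, (code >> 1) & 1, code & 1
        v_sum, v_carry = simulate(a, b, cin, vdd)
        observed = (logic_level(v_sum, vdd), logic_level(v_carry, vdd))
        if observed != expected_outputs(a, b, cin):
            return False
    return True


def functional_yield(make_sample, vdd, runs=1000, seed=0):
    """Fraction of Monte Carlo samples that pass the functional check."""
    rng = random.Random(seed)
    passed = sum(adder_passes(make_sample(rng), vdd) for _ in range(runs))
    return passed / runs


def toy_sample(rng, sigma=0.02):
    """Hypothetical circuit instance: output levels degraded by a random droop."""
    droop = abs(rng.gauss(0.0, sigma))  # fraction of Vdd lost at each rail

    def simulate(a, b, cin, vdd):
        s, c = expected_outputs(a, b, cin)
        high, low = vdd * (1.0 - droop), vdd * droop
        return (high if s else low, high if c else low)

    return simulate


if __name__ == "__main__":
    for vdd in (0.140, 0.150, 0.160):
        print(f"Vdd = {1e3 * vdd:.0f} mV  yield = {functional_yield(toy_sample, vdd):.3f}")
```

Treating any output between the two thresholds as a failure keeps the check conservative, since such a level cannot be guaranteed to drive the following gate correctly.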

#### B. Simulations

Different circuit metrics calculated from simulations for the full adders have been compared together. Full adders have been simulated at 1 kHz frequency which is the case for many IOT applications, at temperature range 27-50°C, which is appropriate for implantable medical applications with



Fig. 3: Leakage current of inverter versus Back Bias Voltages,  $W_{PMOS}/W_{NMOS}$ =400 nm/200 nm, L=28 nm



Fig. 4: Test Bench for functional yield simulation.

the supply voltage varying from 140-160 mV to reduce the power consumption. The test bench results showed that none of the five full adders failed in any of the 1 k Monte Carlo iterations under the different conditions. The inputs and outputs of the Xor full adder at a supply voltage of 150 mV are shown as an example in Fig. 5.

Since an input transition does not necessarily change the outputs, all different input transitions should be considered for accurate power measurement. Fig. 5 shows the different input transitions used for estimating the average power consumption of the full adders [13]. Fig. 6 shows the test bench used to find the critical path resulting in the worst case delay. Verilog-AMS has been used to simulate the test bench. The same load as in the test bench for functional yield has been used for the full adders to create a more realistic output capacitance, which affects the delay of the circuit. All combinations of transitions for the three inputs have been



Fig. 5: Inputs and Outputs of the Xor based full adder at a supply voltage of 150 mV and 1 kHz frequency.

considered in the input generator. In the result capture block, the rise and fall times of the different transitions have been calculated. Circuit metrics including average power, worst case delay, and



Fig. 6: Test Bench for worst case delay.

leakage current for a range of supply voltages between 140-160 mV at 1 kHz frequency are summarized in Tables II, III, IV, V and VI. Energy per operation at this frequency can be calculated with $P = E/T$. Since each addition is done in half of the period, T in this equation is 500 µs for 1 kHz frequency. To compare the area of the adders,  $W_N$=200 nm has been taken as one unit; the resulting area of the Xor and Nand based adders is 54 units each, and that of the Nand-Nor based, Minority-3 based and standard adders is 48, 66 and 42 units, respectively.

TABLE II: Nand based adder metrics at 27°C and 1 kHz.

| Vdd(mV) | Power(pW) | Delay(us) | Leakage(pA) |
|---------|-----------|-----------|-------------|
| 140     | 7.20      | 7.99      | 46.3        |
| 150     | 7.91      | 6.34      | 47.4        |
| 160     | 8.74      | 5.03      | 48.6        |

TABLE III: Minority-3 based adder metrics at  $27^\circ C$  and 1 kHz.

| Vdd(mV) | Power(pW) | Delay(us) | Leakage(pA) |
|---------|-----------|-----------|-------------|
| 140     | 3.52      | 14.0      | 21.0        |
| 150     | 3.94      | 11.1      | 21.9        |
| 160     | 4.37      | 8.73      | 22.8        |

Energy per operation (which is PDP at max operating frequency) for all full adders at the maximum operating

TABLE IV: Xor based adder metrics at 27°C and 1 kHz.

| Vdd(mV) | Power(pW) | Delay(us) | Leakage(pA) |
|---------|-----------|-----------|-------------|
| 140     | 2.24      | 5.84      | 12.7        |
| 150     | 2.46      | 4.68      | 13.1        |
| 160     | 2.60      | 3.73      | 13.5        |

TABLE V: Nand-Nor based adder metrics at 27°C and 1 kHz.


| Vdd(mV) | Power(pW) | Delay(us) | Leakage(pA) |
|---------|-----------|-----------|-------------|
| 140     | 6.24      | 7.34      | 41.4        |
| 150     | 6.90      | 5.81      | 42.8        |
| 160     | 7.60      | 4.59      | 44.3        |

TABLE VI: Standard adder metrics at 27°C and 1 kHz.

| Vdd(mV) | Power(pW) | Delay(us) | Leakage(pA) |
|---------|-----------|-----------|-------------|
| 140     | 3.95      | 8.60      | 22.8        |
| 150     | 4.357     | 6.9       | 23.4        |
| 160     | 4.74      | 5.54      | 23.9        |

frequency at a supply voltage of 150 mV is listed in TABLE VII.

#### IV. DISCUSSION

According to the simulation results for 1 kHz frequency shown in Tables II, III, IV, V and VI, the Xor based full adder achieved the lowest power consumption, leakage and delay. The second best in terms of power consumption and leakage is the Minority-3 based adder. TABLE VII shows four metrics at the maximum operating frequency for each adder. The Xor based adder is the best of all because it not only consumes the least power but is also the fastest among the five adders and has the lowest energy per operation. At 150 mV supply voltage and 1 kHz, the Nand based adder consumes 3.22 times as much power as the Xor based adder. Among the five adders, the Minority-3 based adder is the slowest; its delay is more than 2X that of the Xor based adder. Since all adders are designed with HVT devices and have reverse back bias voltages for NMOS or PMOS devices, large delays have been observed. In spite of the large delays, these adders are well suited to low frequency applications. The static power consumption is a dominant part of the total energy of low frequency systems. As listed in Tables II to VI, the leakage current of the Nand based full adder is the largest; at 150 mV supply voltage it is 1.11, 2.16, 2.03 and 3.62 times that of the full adders based on Nand-Nor, Minority-3, standard and Xor, respectively. To see

TABLE VII: Energy per operation of five full adders at maximum operating frequency and Vdd= 150 mV.

| Type                   | Power (pW) | Fmax (kHz) | Energy Per Operation (aJ) | Energy Per Operation\*Leakage (10<sup>-29</sup> J·A) |
|------------------------|------------|------------|---------------------------|------------------------------------------------------|
| Minority-3 Based       | 8.87       | 25.0       | 177                       | 388                                                  |
| Nand based             | 28.4       | 83.3       | 170                       | 809                                                  |
| Xor Based              | 6.15       | 58.9       | 52.2                      | 68.4                                                 |
| Nand-Nor Based         | 24.9       | 83.3       | 149                       | 638                                                  |
| 28 Standard full adder | 13.2       | 62.5       | 106                       | 247                                                  |

TABLE VIII: Energy per operation and leakage for 1-bit full adders proposed in [6] and [14].

| Reference | Energy Per Operation | Leakage | Vdd    | Device  |
|-----------|---------------------------|---------|--------|---------|
| [6]       | 0.65 fJ                   | 46.4 pA | 300 mV | RVT     |
| [14]      | 6.48 fJ                   | 739 pA  | 300 mV | RVT/LVT |

the effect of both PDP and leakage in the different full adders, the PDP\*Leakage metric has been introduced and calculated in TABLE VII. As we can see, the value of this indicator for the Xor based adder is much lower than for the others. TABLE VIII shows the energy per addition and the leakage current for [6] and [14] in 28 nm FDSOI technology; these parameters have been compared with the results of this study to see the effect of 22 nm versus 28 nm FDSOI technology on ultra low voltage design problems. The energy per operation and leakage current obtained for all full adders in this study are much lower than in [6] and [14]. The energy per addition for [6] is more than 12X that of the Xor based adder in this study. Leakage is a key issue in ultra low voltage design. Since all full adders have been designed with HVT devices with reverse back bias voltage, the leakage current of the five full adders in this study is much lower than the values shown in TABLE VIII. The leakage current of the 1-bit adder in [6] is more than 3X that of the Xor based adder in this study.
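The entries of TABLE VII can be cross-checked with a short calculation. The sketch below assumes that the power column is in pW and that one addition takes half a clock period, as stated for the 1 kHz case; under those assumptions $E = P/(2f)$ reproduces the tabulated attojoule values to within rounding. The dictionary layout and function names are illustrative only, and the leakage values are the 150 mV entries of the per-adder tables.

```python
# Power at Fmax and Fmax from TABLE VII, leakage at 150 mV from Tables II-VI.
ADDERS = {
    "Minority-3": {"power_pw": 8.87, "fmax_khz": 25.0, "leak_pa": 21.9},
    "Nand":       {"power_pw": 28.4, "fmax_khz": 83.3, "leak_pa": 47.4},
    "Xor":        {"power_pw": 6.15, "fmax_khz": 58.9, "leak_pa": 13.1},
    "Nand-Nor":   {"power_pw": 24.9, "fmax_khz": 83.3, "leak_pa": 42.8},
    "Standard":   {"power_pw": 13.2, "fmax_khz": 62.5, "leak_pa": 23.4},
}


def energy_per_operation_aj(power_pw, freq_khz):
    """E = P * T / 2 in attojoules, one addition per half clock period."""
    period_s = 1.0 / (freq_khz * 1e3)
    energy_j = power_pw * 1e-12 * period_s / 2.0
    return energy_j * 1e18


def energy_leakage_product(power_pw, freq_khz, leak_pa):
    """Energy per operation times leakage, in units of 1e-29 J*A."""
    energy_j = energy_per_operation_aj(power_pw, freq_khz) * 1e-18
    return energy_j * leak_pa * 1e-12 / 1e-29


if __name__ == "__main__":
    for name, d in ADDERS.items():
        epo = energy_per_operation_aj(d["power_pw"], d["fmax_khz"])
        product = energy_leakage_product(d["power_pw"], d["fmax_khz"], d["leak_pa"])
        print(f"{name:11s} E/op = {epo:6.1f} aJ   E/op*Leakage = {product:6.1f} e-29 J*A")
```

The same function applied to the 1 kHz figures uses a half period of 500 µs, which matches the value of T quoted in the results section.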

#### V. CONCLUSION

Five reliable ultra low voltage full adders have been designed in a commercially available 22 nm FDSOI technology. To achieve a high functional yield, the extended body bias voltages introduced in this technology have been used. All adders have been designed with HVT devices optimized for reverse back biasing in order to reduce the leakage of the full adders. They are functional over the temperature range 27-50°C, which suits implantable medical applications, and at supply voltages from 140 to 160 mV. It is concluded that the Xor based adder is the best option in this supply voltage range, yielding the lowest delay, power, PDP, and PDP\*Leakage.

#### REFERENCES

- [1] I. Akyildiz, M. Pierobon, S. Balasubramaniam, and Y. Koucheryavy, "The internet of bio-nano things," *IEEE Communications Magazine*, vol. 53, no. 3, pp. 32–40, 2015.
- [2] S. R. Sridhara, "Ultra-low power microcontrollers for portable, wearable, and implantable medical electronics," in *Design Automation Conference* (ASP-DAC), 2011 16th Asia and South Pacific, pp. 556–560, IEEE, 2011.
- [3] O. Vermesan, P. Friess, P. Guillemin, S. Gusmeroli, H. Sundmacker, A. Bassi, I. S. Jubert, M. Mazura, M. Harrison, M. Eisenhauer, et al., "Internet of things strategic research roadmap," *Internet of Things-Global Technological and Societal Trends*, vol. 1, no. 2011, pp. 9–52, 2011.
- [4] Y. Tsividis, "Eric vittoz and the strong impact of weak inversion circuits," *IEEE Solid-State Circuits Society Newsletter*, vol. 13, no. 3, pp. 56–58, 2008.
- [5] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage logic circuits exploiting gate level dynamic body biasing in 28 nm utbb fd-soi," *Solid-State Electronics*, vol. 117, pp. 185–192, 2016.
- [6] A. A. Vatanjou, E. Låte, T. Ytterdal, and S. Aunet, "Ultra-low voltage and energy efficient adders in 28 nm fdsoi exploring poly-biasing for device sizing," *Microprocessors and Microsystems*, vol. 56, pp. 92–100, 2018.

- [7] M. Miyazaki, J. Kao, and A. P. Chandrakasan, "A 175 mv multiply-accumulate unit using an adaptive supply voltage and body bias (asb) architecture," in *Solid-State Circuits Conference*, 2002. Digest of Technical Papers. ISSCC. 2002 IEEE International, vol. 1, pp. 58–444, IEEE, 2002.
- [8] A. G. Andreou, K. A. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, and K. Strohbehn, "Current-mode subthreshold mos circuits for analog vlsi neural systems," *IEEE Transactions on neural networks*, vol. 2, no. 2, pp. 205–213, 1991.
- [9] M. Alioto, "Ultra-low power vlsi circuit design demystified and explained: A tutorial," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 1, pp. 3–29, 2012.
- [10] N. H. Weste and D. Harris, *CMOS VLSI design: a circuits and systems perspective*. Pearson Education India, 2015.
- [11] M. Lanuzza, R. Taco, and D. Albano, "Dynamic gate-level body biasing for subthreshold digital design," in *Circuits and Systems (LASCAS), 2014 IEEE 5th Latin American Symposium on*, pp. 1–4, IEEE, 2014.
- [12] A. Morgenshtein, V. Yuzhaninov, A. Kovshilovsky, and A. Fish, "Full-swing gate diffusion input logic: case-study of low-power cla adder design," *INTEGRATION, the VLSI journal*, vol. 47, no. 1, pp. 62–70, 2014.
- [13] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-cmos logic style," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 12, pp. 1309–1321, 2006.
- [14] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Extended exploration of low granularity back biasing control in 28nm utbb fd-soi technology," in *Circuits and Systems (ISCAS), 2016 IEEE International Symposium* on, pp. 41–44, IEEE, 2016.

4.2 Paper II: Ultra-low voltage subthreshold binary adder architectures for IoT applications: Ripple carry adder or Kogge Stone adder

# Ultra-Low Voltage Subthreshold Binary Adder Architectures for IoT Applications: Ripple Carry Adder or Kogge Stone Adder

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering Norwegian University of Science and Technology (NTNU) O.S. Bragstads plass 2a, Trondheim, 7491, Norway somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet@ntnu.no

Abstract—In this study, 8, 16 and 32 bits ultra-low power, robust, Kogge Stone (KSA) and Ripple Carry (RCA) adders using a commercially available 130 nm bulk CMOS technology have been designed and analyzed at subthreshold supply voltages ranging from 140-160 mV and temperature range of 27-50 °C at 5 kHz frequency for implantable biomedical devices. Simulation results based on netlists extracted from layout confirm that with a marginal increase in the supply voltage of the RCA compared to that of the KSA adder at the same speed, the power consumption and energy per operation, as well as the area of the RCA is far less than KSA. For example, when increasing the supply voltage of the 8 bit RCA by 44 mV compared to that of the KSA adder, the energy per operation for the KSA is about 3.5 times higher than that of the RCA. We have investigated different RCA topologies and considered the minimum energy point varies with different topologies. In addition, in the case of low throughput applications, using the stacked inverters for the full adder will reduce the leakage current and the total energy per cycle of the circuit. For the Minority-3 based 32 bits RCA with stacked inverters, the energy per cycle improves 15 percent compared to that of Minority-3 based 32 bits RCA at Vdd = 150 mV.

Index Terms—KSA, RCA, ultra-low voltage, implantable biomedical devices, channel length upsizing, minimum energy point, stacked inverters.

#### I. INTRODUCTION

In Internet of Things (IoT) applications, the demand for ultra-low power electronic systems, such as implantable biomedical devices for saving patients' lives in emergency situations, wireless sensor networks and devices for environmental monitoring, has grown rapidly [1]. Many such devices, like pacemakers, require ultra-low power and a long battery lifetime. Modern pacemaker topologies are extremely sophisticated and include an analog part as well as a digital part. The digital part consists of a microcontroller and some memory [2], [3].

Voltage scaling is the most effective technique for power reduction [4]. Therefore, subthreshold digital circuits operating at supply voltages below the absolute values of FET threshold voltages significantly reduce the active and leakage power. Transistor subthreshold current has an exponential relation with the gate-source and threshold voltage of the transistor [5]. At ultra-low supply voltages, the degradation of the transistor on-to-off current ratio is the fundamental limit for supply voltage reduction, and it affects the functionality of the circuits. The main source of threshold voltage variation, which affects properties such as speed and power consumption, is random dopant fluctuation (RDF) [4]. On the other hand, aggressive voltage scaling decreases the circuit speed. This makes static power consumption a dominant source of the total power consumption and energy dissipation of such circuits, so techniques that reduce the static power consumption and leakage current strongly influence the total power and the battery lifetime.
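The exponential dependence and the shrinking on-to-off ratio described above can be illustrated with a minimal sketch of the standard first-order weak inversion current expression. The parameter values (`i0`, `vth`, the slope factor `n`) are hypothetical placeholders and are not taken from the 130 nm process used in this work.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant divided by electron charge, in V/K


def thermal_voltage(temp_c: float) -> float:
    """Thermal voltage kT/q in volts for a temperature given in degrees Celsius."""
    return K_B_OVER_Q * (temp_c + 273.15)


def subthreshold_current(vgs, vds, *, i0=1e-9, vth=0.35, n=1.5, temp_c=27.0):
    """First-order weak inversion drain current.

    i0, vth and n are assumed illustrative values, not extracted device data.
    """
    vt = thermal_voltage(temp_c)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


def on_off_ratio(vdd, **params):
    """Ion (Vgs = Vds = Vdd) divided by Ioff (Vgs = 0, Vds = Vdd)."""
    i_on = subthreshold_current(vdd, vdd, **params)
    i_off = subthreshold_current(0.0, vdd, **params)
    return i_on / i_off


if __name__ == "__main__":
    for vdd in (0.14, 0.15, 0.16, 0.30):
        print(f"Vdd = {vdd:.2f} V -> Ion/Ioff = {on_off_ratio(vdd):.1f}")
```

In this simplified model the ratio reduces to exp(Vdd / (n·kT/q)), so it falls exponentially as the supply is lowered and also degrades at higher temperature.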

In the literature, many studies of subthreshold adders have been presented [6]–[12]. In [10], [11] and [12], comparative studies between different adder architectures have been performed; however, they are based on schematic level simulations, not on physical level simulations. In [10], the comparison is between the RCA and the Sklansky adder from the parallel prefix family. Furthermore, in [12], the impact of aggressive voltage scaling and process variations on the reliability of the adders was not investigated.

This study presents several extensions compared with [12]. An Xor based RCA, a 10T Minority-3 based RCA and a Minority-3 based RCA with stacked inverters have been implemented instead of a 6T Minority-3 based RCA. The 6T Minority-3 gate used in [12] has proven to be extremely vulnerable to PVT variations and to have a high power consumption [13]. Our gates are more reliable and more suitable for ultra-low voltage circuits and low frequency, leakage dominated applications. Also, all of the results are based on post layout simulations. Additionally, 8 and 16 bits adders, which are more suitable for IoT applications such as implantable biomedical devices, have been added to the comparison between RCA and KSA.

In this paper, we have designed 8, 16 and 32 bits ultra-low voltage, robust KSA and RCA adders in a 130 nm technology and explored the trade-offs between them, including the parasitics from the layout. This study uses channel length upsizing and Pull Up/Pull Down Network (PUN/PDN) balancing techniques to reduce leakage power and improve the functional yield of low frequency digital circuits in the subthreshold regime. It compares the fastest adder (KSA) and the simplest adder (RCA) structures for low frequency and energy constrained applications at subthreshold supply voltages from 140 to 160 mV, over the temperature range 27-50 °C, which is applicable for implantable medical devices, and at 5 kHz, which is suitable for many IoT applications [4]. Additionally, we have performed a comparative study of the minimum energy point of the different 32 bits RCA topologies. Inverter stacking in the full adder structure has been used to reduce leakage and the energy per cycle in the case of low throughput applications. This study may help designers to select an optimal architecture based on their application and parameters.

This paper is organized as follows. Section II explains the sizing strategy used for the full custom standard cell library designed for subthreshold supply voltages. Section III describes and compares the KSA and RCA adder structures. Sections IV and V present and discuss the simulation results. Section VI concludes the paper.

#### II. SIZING STRATEGY FOR THE FULL CUSTOM CELL LIBRARY

For low frequency systems, which are leakage dominated, power reduction requires leakage current reduction. Previous works have shown that channel length upsizing is more efficient than MTCMOS power gating, body biasing, Vt selection or device width upsizing, and that it increases robustness while simultaneously reducing static leakage energy [14], [15]. Superthreshold standard cell libraries have not been designed and optimized for subthreshold supply voltages [16]. Our library uses channel length upsizing as a leakage reduction technique. Fig. 1 shows the ratio of the subthreshold on-current to the off-current. Ion and Ioff are the transistor currents for Vgs = Vds and Vgs = 0, respectively. The slope between 130 and 190 nm is much steeper than over the rest of the range, so a length of 190 nm has been selected as a tradeoff for the cells. Our subthreshold library consists of combinational logic with different driving strengths and memory, which in principle can implement any synchronous digital function. Dynamic energy is independent of the PUN/PDN matching [17]. However, leakage energy depends on this matching. Therefore, to find the minimum energy for digital circuits, the leakage currents of the PUN and PDN should be equal [17]. The widths of the PMOS and NMOS transistors have thus been selected to give the same leakage current for both device types. The other factor in selecting the widths is achieving a high functional yield in the adder circuits. The widths of the PMOS and NMOS transistors have been set to 1.8 um and 300 nm, respectively.

#### III. ADDER CIRCUIT STRUCTURE

Addition is one of the fundamental and widespread arithmetic operations. Moreover, it is the basic building block for many other useful operations, such as subtraction, multiplication, etc. Hence, the design of energy efficient adders has been a significant goal for many digital circuit designers.

In this study, two adder topologies, the RCA and the KSA [19], have been selected for the comparison. These adders have been selected based on their different properties such as area, speed and power consumption. The RCA is the simplest adder from the carry propagate adder family, with the lowest power consumption and area usage but a high delay because of the long carry propagation path from the least significant bit to the most significant bit. The internal structure of the RCA, a chain of full adders, is shown in Fig. 2. The KSA from the parallel prefix adder family is the fastest adder because of its parallel computations in shorter paths with only $\log_2 N$ logic stages, and it consumes more power and area than the RCA. The internal structure of the parallel prefix adder, the KSA and its sub-blocks are illustrated in Fig. 3, Fig. 4 and Fig. 5, respectively [18]. The sub-blocks in Fig. 5 were all implemented in the most straightforward way using Xor, And, Or and Inverter functions. The And and Or functions were implemented using Nand and Nor gates, respectively, combined with inverters.
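The structural difference between the two architectures can be sketched with a small behavioural model. This is a functional illustration only, assuming the textbook generate/propagate formulation of the Kogge Stone prefix tree with the carry-in merged after the tree; it does not model the gate-level netlists, sizing or timing of the adders designed in this work, and the function names are illustrative.

```python
def ripple_carry_add(a: int, b: int, cin: int, n: int):
    """Bit-serial model of an n-bit RCA: the carry visits all n full adders."""
    carry, total = cin, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def kogge_stone_add(a: int, b: int, cin: int, n: int):
    """Bit-parallel model of an n-bit KSA; returns (sum, cout, prefix_stages)."""
    mask = (1 << n) - 1
    p = (a ^ b) & mask            # per-bit propagate
    g = a & b & mask              # per-bit generate
    big_g, big_p = g, p           # group generate / propagate
    dist, stages = 1, 0
    while dist < n:
        # Bits whose group already reaches bit 0 keep their propagate value.
        low_ones = (1 << dist) - 1
        # Combine every bit with the group 'dist' positions below it.
        new_g = big_g | (big_p & (big_g << dist))
        new_p = big_p & ((big_p << dist) | low_ones)
        big_g, big_p = new_g & mask, new_p & mask
        dist *= 2
        stages += 1
    cin_vec = mask if cin else 0
    carries_out = big_g | (big_p & cin_vec)   # carry out of every bit position
    carries_in = (carries_out << 1) | cin     # carry into every bit position
    total = (p ^ carries_in) & mask
    cout = (carries_in >> n) & 1
    return total, cout, stages


if __name__ == "__main__":
    for width in (8, 16, 32):
        _, _, stages = kogge_stone_add(1, 1, 0, width)
        print(f"{width} bits: RCA carry chain = {width} cells, KSA prefix stages = {stages}")
```

The prefix loop runs $\log_2 N$ times for power-of-two widths, whereas the ripple loop always runs N times, which is the source of the speed gap discussed in this section.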

Gates with a maximum fan-in of 2 or 3 have been selected to avoid robustness problems and improve the functional yield of the circuits. Therefore, two different full adder topologies, Minority-3 based and Xor based, have been selected for the RCA; they are illustrated in Fig. 6 [20].
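As a logic-level illustration of the two full adder styles, the sketch below builds a 1-bit adder from Xor gates and from Minority-3 gates with inverters, keeping every gate at a fan-in of 3 or less. The Minority-3 decomposition shown is one well-known identity and is an assumption here; it is not claimed to be the exact gate arrangement of Fig. 6.

```python
def minority3(x: int, y: int, z: int) -> int:
    """Minority-3 gate: output is 1 when fewer than two inputs are 1."""
    return 0 if (x + y + z) >= 2 else 1


def inv(x: int) -> int:
    """Inverter."""
    return 1 - x


def full_adder_xor(a: int, b: int, cin: int):
    """Xor based full adder; returns (sum, cout)."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def full_adder_minority3(a: int, b: int, cin: int):
    """Full adder from three Minority-3 gates plus inverters (assumed topology).

    Uses sum = MAJ(not cout, cin, MAJ(a, b, not cin)) with MAJ = inverted minority.
    """
    cout_n = minority3(a, b, cin)            # inverted carry out
    t_n = minority3(a, b, inv(cin))          # inverted MAJ(a, b, not cin)
    sum_n = minority3(cout_n, cin, inv(t_n))
    return inv(sum_n), inv(cout_n)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, full_adder_xor(a, b, cin), full_adder_minority3(a, b, cin))
```

Both functions produce the same truth table, so the choice between them is a question of delay, leakage and robustness rather than of logic function.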

For the comparative study of different RCA topologies, we have added the Minority-3 based RCA with stacked inverters [21]. In this topology, the inverters of the Minority-3 based full adder have been stacked in order to reduce the leakage current compared to the other full adder version using standard 2-transistor inverters. This technique relies on the fact that the leakage current through two off-state transistors in series is less than that through a single transistor. It reduces the circuit speed, but this is not a problem for applications with more relaxed throughput requirements.

Fig. 1: Ion/Ioff for the NMOS transistor versus the length of the transistor, with minimum width and Vds = 150 mV.

Fig. 5: Different sub blocks in the internal structure of the KSA [1]

#### IV. SIMULATION RESULTS

Fig. 6: Two different topologies for the 1-bit full adder for the RCA; panel (b) is the Minority based adder.

Fig. 7: The testbench for the adders [10].

The structure of the adders has been synthesized automatically at the gate level by the Cadence Genus tool using a full custom standard cell library. The cell library is based on a 130 nm bulk CMOS technology and is designed for subthreshold supply voltages from 140 to 160 mV. All of the gates in the standard cell library have been designed with low leakage transistors in order to reduce the leakage current. The layout of the adders is generated automatically by the Cadence Innovus place and route tool. All the simulations have been performed on the extracted view of the layout, so parasitics are included in the simulations. In order to explore the effect of the PVT variations on the functional yield of the circuits, one thousand Monte Carlo simulations have been run in each corner. The full adders are fully functional for one thousand Monte Carlo simulations with both mismatch and process variations at supply voltages from 140 to 160 mV, a temperature range of 27-50 °C, and a frequency of 5 kHz. The outputs of the adder have been compared with both the maximum acceptable low voltage $(V_{OL})$ and the minimum acceptable high voltage $(V_{OH})$, which are equal to 0.25\*vdd and 0.75\*vdd, respectively. Additionally, the functional yield simulations for the 32 bits Minority-3 and Xor based RCA show that the simulated yield due to process variation and mismatch, over the temperature range -40 °C to 60 °C and the Vdd range 140 to 160 mV, is better than 99.99 percent.
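The pass/fail criterion described above can be written down as a short sketch. The data layout (a list of Monte Carlo runs, each a list of sampled output voltages with the expected logic value) and the example voltages are hypothetical; only the 0.25·Vdd and 0.75·Vdd limits come from the text.

```python
def output_is_valid(v_out: float, expected_bit: int, vdd: float) -> bool:
    """Check one sampled output against V_OL = 0.25*Vdd and V_OH = 0.75*Vdd."""
    v_ol = 0.25 * vdd
    v_oh = 0.75 * vdd
    return v_out >= v_oh if expected_bit else v_out <= v_ol


def functional_yield(runs, vdd: float) -> float:
    """Fraction of Monte Carlo runs in which every sampled output is valid.

    runs: list of runs; each run is a list of (v_out, expected_bit) pairs.
    """
    passed = sum(
        all(output_is_valid(v, bit, vdd) for v, bit in run) for run in runs
    )
    return passed / len(runs)


if __name__ == "__main__":
    vdd = 0.150
    example_runs = [                     # made-up voltages for illustration only
        [(0.140, 1), (0.010, 0)],        # both outputs within limits
        [(0.100, 1), (0.010, 0)],        # logic high below 0.75*Vdd: run fails
    ]
    print(functional_yield(example_runs, vdd))
```

A run counts as functional only if all of its sampled outputs satisfy the limits, which mirrors the way a single failing output bit makes the adder result wrong.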

#### A. Delay and power simulations

The testbench used for the simulation is shown in Fig. 7. The worst case delay transition occurs when one of the inputs is connected to ground and the other one to Vdd while a toggling carry input is applied to the adder circuit [12]. The power has been simulated for this worst case delay transition. The driving capability has been taken into account by loading the outputs with FO4 inverters [22].
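Why this input pattern is the worst case for the ripple carry adder can be checked with a small logic-level sketch: with one operand all zeros and the other all ones, every bit position is in propagate mode, so a toggle on the carry input travels through the whole chain. The helper name is illustrative and the model ignores all electrical effects.

```python
def ripple_carries(a: int, b: int, cin: int, n: int):
    """Return the carry-out of every full adder in an n-bit ripple carry chain."""
    carries, carry = [], cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (carry & (ai ^ bi))
        carries.append(carry)
    return carries


def toggled_carry_nodes(a: int, b: int, n: int) -> int:
    """Count the carry nodes that change when the carry input toggles 0 -> 1."""
    before = ripple_carries(a, b, 0, n)
    after = ripple_carries(a, b, 1, n)
    return sum(x != y for x, y in zip(before, after))


if __name__ == "__main__":
    n = 32
    all_ones = (1 << n) - 1
    print("A = 0, B = all ones :", toggled_carry_nodes(0, all_ones, n), "carry nodes toggle")
    print("A = 0, B = 0        :", toggled_carry_nodes(0, 0, n), "carry nodes toggle")
```

With the worst case pattern all 32 carry nodes switch, whereas with both operands at zero the carry is absorbed in the first cell.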

#### B. Comparison between KSA and RCA

The speed and power advantages of the different adders compared with each other at the same supply voltage of 150 mV, a frequency of 5 kHz and 27 $^{\circ}\mathrm{C}$ are summarized in Tables I, II, and III.

TABLE I: Power and speed advantages of the 32 bits adders compared with each other for Vdd = 150 mV at 27 °C and 5 kHz.

| Comparison                                                | Ratio |
|-----------------------------------------------------------|-------|
| Speed advantage of the KSA compared to the Minority-3 RCA | 5.98  |
| Speed advantage of the KSA compared to the Xor RCA        | 5.96  |
| Power advantage of the Minority-3 RCA compared to the KSA | 5.96  |
| Power advantage of the Xor RCA compared to the KSA        | 3.94  |

TABLE II: Power and speed advantages of the 16 bits adders compared with each other for Vdd = 150 mV at 27 °C and 5 kHz.

| Comparison                                                | Ratio |
|-----------------------------------------------------------|-------|
| Speed advantage of the KSA compared to the Minority-3 RCA | 3.96  |
| Speed advantage of the KSA compared to the Xor RCA        | 3.77  |
| Power advantage of the Minority-3 RCA compared to the KSA | 4.82  |
| Power advantage of the Xor RCA compared to the KSA        | 3.18  |

TABLE III: Power and speed advantages of the 8 bits adders compared with each other for Vdd = 150 mV at 27 °C and 5 kHz.

| Comparison                                                | Ratio |
|-----------------------------------------------------------|-------|
| Speed advantage of the KSA compared to the Minority-3 RCA | 2.70  |
| Speed advantage of the KSA compared to the Xor RCA        | 2.60  |
| Power advantage of the Minority-3 RCA compared to the KSA | 4.03  |
| Power advantage of the Xor RCA compared to the KSA        | 2.69  |

As we expected, for all 8, 16 and 32 bits adders the KSA adder is faster than the RCA adders, and the power consumption of the RCA adders is lower than the KSA adder.

For comparing the energy efficiency of the adders at low frequency applications, we have considered two different cases. First, if the circuit is powered down after the operation, the energy is the power times the delay of the circuit, but shutting down the circuit requires extra cost and area overhead. It would also add extra power consumption, which could make it an undesirable option for ultra-low power IoT applications. Second, if the circuit operates for the entire clock cycle, the speed is the same for the circuits. Therefore, the power determines the energy efficiency of the adders.

In the first case, for the 32 bits adders, the speed advantage of the KSA is slightly higher than the power advantage of the RCA. Therefore, at the same supply voltage, the 32 bits KSA has a lower energy per operation than the Minority-3 based RCA. In contrast, for the 8 and 16 bits adders, the power advantage of the Minority-3 based RCA exceeds the speed advantage of the KSA, which results in less energy per operation for the Minority-3 based RCA than for the KSA.

In the second case, the 8, 16 and 32 bits RCA adders are more energy efficient than the corresponding KSA adders because they consume less power.
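The two cases can be checked numerically from the Minority-3 RCA entries of Tables I to III. In the sketch below, the ratio of KSA energy to RCA energy is the power ratio multiplied by the delay ratio when the circuit is powered down after the operation, and simply the power ratio when both adders stay on for the same clock period. The dictionary layout and function names are illustrative.

```python
# Speed advantage of the KSA and power advantage of the Minority-3 RCA,
# taken from Tables I (32 bits), II (16 bits) and III (8 bits).
ADVANTAGES = {
    32: {"speed_ksa": 5.98, "power_rca": 5.96},
    16: {"speed_ksa": 3.96, "power_rca": 4.82},
    8: {"speed_ksa": 2.70, "power_rca": 4.03},
}


def energy_ratio_powered_down(speed_ksa: float, power_rca: float) -> float:
    """E_KSA / E_RCA when energy = power * delay (circuit off after the operation)."""
    return power_rca / speed_ksa


def energy_ratio_full_cycle(power_rca: float) -> float:
    """E_KSA / E_RCA when both adders are powered for the same clock period."""
    return power_rca


if __name__ == "__main__":
    for bits, adv in sorted(ADVANTAGES.items()):
        case1 = energy_ratio_powered_down(adv["speed_ksa"], adv["power_rca"])
        case2 = energy_ratio_full_cycle(adv["power_rca"])
        print(f"{bits:2d} bits: case 1 E_KSA/E_RCA = {case1:.2f}, case 2 = {case2:.2f}")
```

A ratio below 1 means the KSA uses less energy per operation; this happens only for the 32 bits adders in the first case, and only by a very small margin.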

In order to compare the adders at the same speed, the supply voltage of the KSA adder is varied between 150 and 250 mV, and the supply voltage of the RCA is increased so as to maintain the same speed as that of the KSA adder. The experimental results show that, when the supply voltage of the 8 bits Minority-3 based RCA adder is increased to 194-298 mV, the power consumption and energy per operation of the RCA adder are far less than those of the KSA adder. These results are shown in Fig. 8. For example, from Fig. 8(c), when the supply voltage of the RCA is 44 mV higher than the 150 mV supply voltage of the KSA adder, the energy per operation of the KSA is about 3.5X that of the RCA.

Table IV shows the area used by the different adders based on the layout. As can be seen, the area of the KSA adder is much larger than the area of the RCA adders.

TABLE IV: Area for different adders.

| Area    | KSA (µm²) | Minority-3 RCA (µm²) | Xor RCA (µm²) |
|---------|-----------|----------------------|---------------|
| 8 bits  | 2268         | 927                               | 846              |
| 16 bits | 5541         | 1855                              | 1694             |
| 32 bits | 13099        | 3694                              | 3388             |

#### C. Impact of different topologies on Minimum Energy Point

For this comparative study, we have investigated the energy per operation, delay and static power consumption of the different RCA topologies including a Minority-3 based RCA, a Minority-3 based RCA with stacked inverters, and an XOR based RCA adder at the maximum operating frequency. Fig. 9 shows the energy per operation, the delay and the static power consumption of the different RCA adders at the maximum operating frequency versus the supply voltage.

As can be seen from Fig. 9(a), the energy per operation of the Minority-3 based RCA is the lowest among the three topologies. The other point is that different topologies with the same architecture have different minimum energy points. For example, for the Minority-3 based RCA the minimum energy point is at 230 mV, while it moves to 250 mV for the Xor based RCA. Static power consumption is a dominant source of the total power consumption and energy dissipation of a low frequency system. Therefore, reducing the leakage current reduces the total power and energy for applications with relaxed speed requirements. Fig. 9(c) shows the static power of the different RCA adders. As the results indicate, the Minority-3 based RCA with stacked inverters has the lowest static power consumption, and it will have the lowest energy per cycle in low frequency applications. For the Minority-3 based 32 bits RCA with stacked inverters, the energy per cycle improves by 15 percent compared to the Minority-3 based 32 bits RCA at Vdd = 150 mV.
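Why the minimum energy point shifts between topologies can be illustrated with a common first-order model, which is an assumption of this sketch and not the simulation setup used for Fig. 9. At the maximum operating frequency, the switching energy scales as C·Vdd², while the leakage energy is the leakage current times Vdd times the delay; since the subthreshold delay grows as exp(-Vdd/(n·kT/q)), the leakage energy per operation rises sharply at low Vdd. The lumped `leak_factor` (relative weight of leaking to switching gates) and all numeric values are hypothetical.

```python
import math


def energy_per_operation(vdd, *, c_eff=1.0, activity=1.0, leak_factor=150.0,
                         n=1.5, v_thermal=0.02585):
    """Normalized energy per operation at the maximum operating frequency.

    switching ~ activity * C * Vdd^2
    leakage   ~ leak_factor * C * Vdd^2 * exp(-Vdd / (n * kT/q))
    All parameter values are illustrative assumptions.
    """
    switching = activity * c_eff * vdd ** 2
    leakage = leak_factor * c_eff * vdd ** 2 * math.exp(-vdd / (n * v_thermal))
    return switching + leakage


def minimum_energy_point(v_min=0.12, v_max=0.50, step=0.001, **params):
    """Grid search for the supply voltage that minimizes energy per operation."""
    count = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(count + 1)]
    return min(grid, key=lambda v: energy_per_operation(v, **params))


if __name__ == "__main__":
    for leak in (150.0, 400.0):
        mep = minimum_energy_point(leak_factor=leak)
        print(f"leak_factor = {leak:5.0f} -> minimum energy point near {mep * 1000:.0f} mV")
```

In this model a topology with relatively more leakage per switching event has its minimum energy point at a higher supply voltage, which is qualitatively in line with the observation that different full adder topologies reach their minimum at different voltages.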

#### V. DISCUSSION

Table I compares the power and delay of the 32 bits KSA adder and the Minority-3 and Xor based RCA adders at the same supply voltage and at the low frequency of 5 kHz. The 32 bits KSA adder is 5.98X and 5.96X faster than the Minority-3 and Xor based RCA adders, respectively. The power of the 32 bits Minority-3 and Xor based RCA adders is 5.96X and 3.94X lower than that of the 32 bits KSA adder. Hence, for the same supply voltage and low frequency application, the 32 bits KSA adder is more energy efficient than the RCA adder. From Tables II and III, the 16 and 8 bits Minority-3 based RCA adders are more energy efficient than their KSA counterparts. The energy efficiency advantage of the 32 bits KSA adder over the Minority-3 based RCA adder is insignificant, but the 8 and 16 bits Minority-3 based RCA adders save 1.49X and 1.21X energy over their KSA counterparts, respectively. The 8 and 16 bits Minority-3 based RCA adders are more suitable for IoT applications such as implantable biomedical devices, operating in the kHz range.

Fig. 8: Energy per operation for the KSA and RCA adders at the same speed versus delay. (a) 32 bits adders; (b) 16 bits adders; (c) 8 bits adders.

Fig. 9: Energy per operation, delay and the static power consumption for the 32 bits RCA adders at the maximum operating frequency versus supply voltage. (a) Energy per operation; (b) Delay; (c) static power consumption.

From Fig. 8, when the supply voltage of the RCA adders is increased so that both adders run at the same speed, the 32, 16 and 8 bits RCA adders are more energy efficient than the KSA adders. By increasing the supply voltage of the 8, 16 and 32 bits Minority-3 based RCA adders by 44, 60 and 78 mV, the energy efficiency has been improved by 3.16X, 3.19X and 3.54X compared to that of the KSA adder at Vdd = 150 mV.

Table IV compares the area of the designed adders from the layout. The 8, 16 and 32 bits KSA adders are 2.44X, 3.05X and 3.61X larger than the corresponding Minority-3 based adders, respectively. The layout views of the 32 bits KSA and Minority-3 based adders are shown in Fig. 10 and Fig. 11, respectively.

The comparison study of the 32 bits RCA adders covers the delay, static power consumption, and energy of the adders versus the supply voltage at the maximum operating frequency. The adders are slow, which is expected when using low leakage transistors with long channel length. The delay of the Minority-3 based RCA with stacked inverters is larger than that of the Xor and Minority-3 based RCA adders. The Minority-3 based adder has the lowest energy per operation at the maximum operating frequency. Fig. 9(c) shows the static power consumption of the adders. The Minority-3 based RCA with stacked inverters has the lowest static power consumption. This means that, for leakage dominated applications with low throughput requirements, the inverters can be stacked to reduce the leakage current and the energy per cycle.

Fig. 11: The layout view of the 32 bits Minority-3 based RCA adder

Our results are consistent with those of [10], which shows that, in subthreshold, the RCA is slower than a parallel prefix adder, consumes less energy, and is the best choice for low frequency applications. In [10], the comparison is between the RCA and the Sklansky adder from the parallel prefix family, whereas we have selected the KSA, which is the fastest adder.

The results of our study agree well with those of [12], where the authors state that, by slightly increasing the supply voltage of the RCA compared to that of the KSA, the RCA becomes more energy efficient than its KSA counterpart at the same speed. Our results support these findings and extend the study to reliable gates for aggressive voltage scaling. The functional yield simulations for the 32 bits Minority-3 and Xor based RCA show that the simulated yield due to process variation and mismatch, over the temperature range -40 °C to 60 °C and the Vdd range 140 to 160 mV, is better than 99.99 percent. Additionally, our simulations are based on netlists extracted from layout rather than on schematics only, as in [12]. The energy per operation of the KSA and RCA adders in our study at the same delay and Vdd = 150 mV is lower than the energy of these adders in [12], which may be explained by the sizing techniques and the selection of low leakage devices to reduce the leakage current, the dominant contributor to the total power of ultra-low speed circuits.

#### VI. CONCLUSION

For many ultra-low energy IoT applications such as implantable biomedical devices, the design of digital circuits in the subthreshold regime is promising. In this paper, we have designed and analyzed 8, 16 and 32 bits ultra-low power, robust Kogge Stone (KSA) and Ripple Carry (RCA) adders using a commercially available 130 nm bulk CMOS technology at subthreshold supply voltages ranging from 140 to 160 mV, a temperature range of 27-50 °C and a frequency of 5 kHz, for implantable biomedical devices. Channel length upsizing and PUN/PDN balancing have been used as the sizing strategy for the design of a full custom standard cell library. The simulation results confirm that the RCA minimizes power consumption but is slower than the KSA. For higher speeds, the KSA adder, which is power hungry, would be used. The experimental results show that, at the same speed for both KSA and RCA, obtained by slightly increasing the supply voltage of the RCA compared to that of the KSA adder, the power consumption and energy dissipation, as well as the area, of the RCA are far less than those of the KSA adder. When the supply voltage of the 8 bits RCA is 44 mV higher than the 150 mV supply voltage of the KSA adder, the energy per operation of the KSA is 3.5X higher than that of the RCA.

We have also investigated different RCA topologies and found that the minimum energy point varies between topologies. In addition, in the case of low throughput applications, using stacked inverters in the full adder reduces the leakage current and the total energy per cycle of the circuit. This technique reduces the circuit speed, but this may not be problematic for low frequency systems.

#### REFERENCES

- [1] S. Hanson, B. Zhai, D. Blaauw, D. Sylvester, A. Bryant, and X. Wang, "Energy optimality and variability in subthreshold design," in *ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design*, pp. 363–365, IEEE, 2006.
- [2] S. A. Haddad, R. P. Houben, and W. Serdijn, "The evolution of pacemakers," *IEEE Engineering in Medicine and Biology Magazine*, vol. 25, no. 3, pp. 38–48, 2006.
- [3] S. A. P. Haddad and W. A. Serdijn, Ultra low-power biomedical signal processing: an analog wavelet filter approach for pacemakers. Springer Science & Business Media, 2009.
- [4] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, et al., "Energy-efficient subthreshold processor design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 8, pp. 1127–1137, 2009.
- [5] A. G. Andreou, K. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, K. Strohbehn, et al., "Current-mode subthreshold mos circuits for analog vlsi neural systems," *IEEE Transactions on neural networks*, vol. 2, no. 2, pp. 205–213, 1991.
- [6] A. T. Tran and B. M. Baas, "Design of an energy-efficient 32-bit adder operating at subthreshold voltages in 45-nm cmos," in *International Conference on Communications and Electronics 2010*, pp. 87–91, IEEE, 2010.
- [7] S. Khanna and B. H. Calhoun, "Serial sub-threshold circuits for ultralow-power systems," in *Proceedings of the 2009 ACM/IEEE international symposium on Low power electronics and design*, pp. 27–32, ACM, 2009.
- [8] S. Nanduru, S. Koppa, and E. John, An Ultra-low Power Robust Kogge-Stone Adder at Sub-threshold Voltages for Implantable Bio-medical Devices. 2016.
- [9] X. Wu, F. Wang, and Y. Xie, "Analysis of subthreshold finfet circuits for ultra-low power design," in 2006 IEEE International SOC Conference, pp. 91–92, IEEE, 2006.
- [10] D. Blaauw, J. Kitchener, and B. Phillips, "Optimizing addition for subthreshold logic," in 2008 42nd Asilomar Conference on Signals, Systems and Computers, pp. 751–756, IEEE, 2008.
- [11] H. Dorosti, A. Teymouri, S. M. Fakhraie, and M. E. Salehi, "Ultralow-energy variation-aware design: Adder architecture study," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 3, pp. 1165–1168, 2015.
- [12] V. Beiu, A. Djupdal, and S. Aunet, "Ultra low-power neural inspired addition: When serial might outperform parallel architectures," in *International Work-Conference on Artificial Neural Networks*, pp. 486–493, Springer, 2005.
- [13] H. K. O. Berge, A. Hasanbegović, and S. Aunet, "Muller c-elements based on minority-3 functions for ultra low voltage supplies," in 14th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems, pp. 195–200, IEEE, 2011.
- [14] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Analysis and minimization of practical energy in 45nm subthreshold logic circuits," in 2008 IEEE International Conference on Computer Design, pp. 294–300, IEEE, 2008.

- [15] D. S. Truesdell and B. H. Calhoun, "Channel length sizing for power minimization in leakage-dominated digital circuits," in 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pp. 1–2, IEEE, 2018.
- [16] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing ..."
- [17] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, et al., "Exploring variability and performance in a sub-200-mv processor," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 881–891, 2008.
- [18] R. M. Bahadori, M. Kamal, A. Afzali-Kusha, and M. Pedram, "A comparative study on performance and reliability of 32-bit binary adders," *Integration*, vol. 53, pp. 54–67, 2016.
- [19] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," *IEEE transactions on computers*, vol. 100, no. 8, pp. 786–793, 1973.
- [20] N. H. Weste and D. Harris, *CMOS VLSI design: a circuits and systems perspective*. Pearson Education India, 2015.
- [21] A. A. Vatanjou, T. Ytterdal, and S. Aunet, "Energy efficient sub/near-threshold ripple-carry adder in standard 65 nm cmos," in 2015 6th Asia Symposium on Quality Electronic Design (ASQED), pp. 7–12, IEEE, 2015.
- [22] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-cmos logic style," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 12, pp. 1309–1321, 2006.

4.3 Paper III: Exploring optimal back bias voltages for ultra low voltage CMOS digital circuits in 22 nm FDSOI Technology

# Exploring optimal back bias voltages for ultra low voltage CMOS digital Circuits in 22 nm FDSOI Technology

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering Norwegian University of Science and Technology (NTNU) O.S. Bragstads plass 2a, Trondheim, 7491, Norway somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet.ntnu.no

Abstract-This study presents a strategy to determine optimal body bias voltages for ultra low voltage digital circuits in the 22 nm Fully Depleted Silicon On Insulator Technology (FDSOI). The efficiency of body biasing for achieving high functional yield has been investigated by using reverse back bias voltages for HVT devices. The strategy has been evaluated through the design of an ultra low voltage Xor based adder at supply voltages varying from 140-160 mV and temperature range 27-50  $^\circ C$  at 1 kHz frequency. The adder under optimal body bias consumes 4.67 percent less energy than zero body bias at Vdd=150 mV and frequency of 1 kHz. The adder is fully functional for one thousand Monte Carlo simulations at optimal back bias voltage. The yield has improved by 0.4 percent in optimal back bias voltage compared to zero body bias. The results show the lowest Energy per cycle, variability and high functional yield for the obtained optimal body bias voltage. Also, additional analysis confirms the dependency of optimal body bias voltage on the switching activity and operating conditions for a given technology. We also show that the relative energy variability is larger than the delay variability over the back bias voltage range.

Index Terms-optimal body bias, reverse back bias, HVT device, activity factor, variability, functional yield.

#### I. INTRODUCTION

In Internet of Things (IoT) applications, reducing power consumption to reduce the energy usage of the system and hence prolong battery lifetime is often a key issue for digital circuit designers. Reducing the supply voltage below the absolute values of the transistor threshold voltages decreases the energy dissipation, but also increases the delays. As can be seen from equation (1), transistor weak inversion current has an exponential relation with the gate source and threshold voltage of the transistor [1].

$$I_{ds} = I_{0} \, e^{kV_{gs}/V_T} \, e^{(1-k)V_{bs}/V_T} \left(1 - e^{-V_{ds}/V_T} + \frac{V_{ds}}{V_0}\right) \tag{1}$$

$I_0$ is a constant related to the channel width and length of the MOS transistor. $V_T$ and $V_0$ are the thermal voltage and the Early voltage, respectively. k is approximately 0.7-0.75 and is related to the subthreshold slope factor $(1 + C_{dep}/C_{ox})$, where $C_{ox}$ and $C_{dep}$ are the gate oxide and depletion capacitances, respectively. This equation can also be applied to PMOS with opposite polarity. Based on equation (1), digital circuits in the weak inversion regime are more sensitive to process, voltage and temperature (PVT) variations than circuits in the superthreshold regime. Body biasing has been applied to reduce the fluctuations of delay and energy due to global process and temperature variations [2]-[7]. In [5], an optimum reverse body bias for reducing the standby leakage current has been demonstrated for bulk CMOS technology. The authors in [6] have investigated the use of body biasing to match the leakage of the pull up/pull down networks (PUN/PDN) and reduce the supply voltage of the inverter to 100 mV in bulk CMOS technology. In [7], the effect of body biasing on process and temperature variation in a sub-200 mV processor has been studied in bulk CMOS technology. The body biasing technique in bulk CMOS incurs an overhead cost because it requires a triple well process. Body biasing is more efficient in FDSOI technology, which helps to tackle the problems of ultra low voltage design [8], [9]. FDSOI technology shows a higher body effect factor than bulk CMOS [8]. Unlike in bulk, transistor level body biasing in FDSOI requires no extra area. Hence, selecting the optimal back bias voltage for circuits implemented in an FDSOI technology is a key issue not only for robustness against PVT variations but also for the energy efficiency of the digital circuits.

978-1-7281-2769-9/19/\$31.00 ©2019 IEEE
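As a rough illustration of equation (1), the following sketch evaluates the weak inversion current and the resulting on/off current ratio at Vdd = 150 mV. The values chosen for `i0`, `k`, `v_t` and `v_0` are illustrative assumptions, not parameters of the 22 nm FDSOI devices studied here, and the function name is hypothetical.

```python
import math

def weak_inversion_current(v_gs, v_ds, v_bs=0.0, i0=1e-9, k=0.7,
                           v_t=0.02585, v_0=10.0):
    """Evaluate equation (1). All default parameter values are assumed,
    illustrative numbers (v_t is roughly the thermal voltage at 27 C)."""
    gate_term = math.exp(k * v_gs / v_t)
    body_term = math.exp((1.0 - k) * v_bs / v_t)
    drain_term = 1.0 - math.exp(-v_ds / v_t) + v_ds / v_0
    return i0 * gate_term * body_term * drain_term

vdd = 0.150
i_on = weak_inversion_current(v_gs=vdd, v_ds=vdd)   # gate at Vdd
i_off = weak_inversion_current(v_gs=0.0, v_ds=vdd)  # gate at 0 V
on_off_ratio = i_on / i_off  # reduces to exp(k * vdd / v_t)

# A negative V_bs (reverse bias for an NMOS in this model) lowers the current.
i_on_reverse = weak_inversion_current(v_gs=vdd, v_ds=vdd, v_bs=-0.2)
```

Because the gate and body terms are exponentials, small shifts in either voltage scale the current multiplicatively, which is the sensitivity the paragraph above refers to.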

In this study, the optimal back bias voltage for tuning the PUN/PDN in a 22 nm FDSOI technology has been investigated to improve the reliability and reduce the energy per cycle in digital circuits. Moreover, high threshold voltage (HVT) transistors have been used to reduce leakage current. We have also performed further simulations to show that the optimal body bias voltage depends on the switching activity factor and the operating conditions. The most important challenge in ultra low voltage design is maximizing robustness. Hence, using gates with a maximum fan-in of two has been proposed in [7], [10], [11] for low voltage digital circuits. In order to investigate the potential of the proposed strategy, an Xor based adder built from gates with a fan-in of two has been designed and simulated with the obtained optimal back bias voltage in 22 nm technology. The Xor based adder is fully functional for supply voltages from 140 to 160 mV and a temperature range of 27-50°C at the target operating frequency of 1 kHz. This frequency and temperature range is applicable to implantable medical IoT applications. The Xor based adder has the minimum energy per operation at the optimal back bias voltage.

The rest of the paper is arranged as follows. In section 2, the PUN/PDN is balanced to reduce the leakage and increase the static noise margins of the circuits. In section 3, the energy efficiency and variability of a chain of 20 inverters are explored to select the optimal back bias voltage. In sections 4 and 5, the dependency of the optimal back bias voltage on different operating conditions is investigated. In section 6, the strategy is evaluated through the design of an ultra low voltage Xor based adder. In the last two sections, we discuss the results and conclude that, by selecting optimal body bias voltages, digital circuits achieve high functional yield and minimum energy per operation.

#### II. BALANCING OF PUN/PDN

Balancing the PUN/PDN strengths is an essential issue for the functionality of digital CMOS circuits in the ultra low voltage regime. By balancing the PUN/PDN strengths the static noise margins will be increased. To reduce the overall design complexity of the gates, the body bias of the PMOS is connected to the supply voltage. The gates have been designed and verified using Cadence Virtuoso design in a 22 nm FDSOI technology. Moreover, high threshold voltage transistors have been used for reducing leakage current. The HVT transistors have been optimized for reverse back bias voltages in this technology.

As shown in Fig. 1, the subthreshold $I_{on}$ and $I_{leak}$ are strongly affected by the transistor length, L. The leakage current varies much more with length between 20 and 28 nm than between 28 and 36 nm. Therefore, a length of 28 nm is used to reduce leakage and threshold voltage variation. Mismatch variation is proportional to the inverse of the square root of the transistor area. Therefore, to reduce the variation due to mismatch, subthreshold circuit designers should avoid using minimum length and width for the transistors, and a width of $W_N$ = 200 nm has been used. In order to have a regular layout, $W_P = 2W_N =$ 400 nm has been selected. Increasing the width of the PMOS transistors increases not only the area but also the parasitic capacitances and hence the power consumption of the circuits. For example, in order to have the same $I_{leak}$ and $I_{on}$ for NMOS and PMOS transistors, the PMOS width would have to be upsized by 2.52X and 4.58X relative to the NMOS width, respectively.
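The area dependence of mismatch stated above can be sketched as follows. The function only encodes the proportionality to one over the square root of W times L, normalized to the 200 nm by 28 nm NMOS used here; the function name is hypothetical and no technology-specific matching coefficient is assumed.

```python
import math

def relative_mismatch(width_nm, length_nm,
                      ref_width_nm=200.0, ref_length_nm=28.0):
    """Mismatch standard deviation relative to a reference device,
    using only the proportionality sigma ~ 1 / sqrt(W * L)."""
    ref_area = ref_width_nm * ref_length_nm
    return math.sqrt(ref_area / (width_nm * length_nm))

nmos = relative_mismatch(200.0, 28.0)        # reference device
pmos = relative_mismatch(400.0, 28.0)        # W_P = 2 * W_N
short_nmos = relative_mismatch(200.0, 20.0)  # shorter channel, smaller area
```

Under this proportionality, doubling the width reduces the relative mismatch by a factor of the square root of two, while shrinking the length below 28 nm increases it.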

Therefore, using the extended body bias voltage range offered by the 22 nm technology is an effective technique to balance the CMOS gates. Fig. 2 shows the ratio of NMOS to PMOS on current $I_{on,NMOS}/I_{on,PMOS}$ and off current $I_{off,NMOS}/I_{off,PMOS}$ over a range of NMOS back bias voltages when the body bias of the PMOS is connected to Vdd = 150 mV. To balance the PUN/PDN with $W_P = 2W_N = 400$ nm, the NMOS back bias voltage has been selected such that $I_{on,NMOS}/I_{on,PMOS} = 1$, which gives -441 mV (simulations in this section have been done at the typical corner and 27 °C).

Fig. 1: Subthreshold $I_{on}$ and $I_{leak}$ versus length for an NMOS transistor at supply voltages Vds = 140, 150, 160 mV, W = 200 nm; for $I_{on}$ and $I_{leak}$, Vgs = Vdd and 0, respectively.

Fig. 2: The ratio of NMOS and PMOS current, $W_P = 2W_N = 400$ nm, PMOS Back Bias voltage = Vdd = 150 mV.

Dynamic energy is independent of the PUN/PDN matching [7], whereas leakage energy depends on it. Therefore, to find the minimum energy for digital circuits, the leakage currents of the PUN and PDN should be equal. As shown in Fig. 2 for $W_P = 2W_N = 400$ nm, matching the PDN/PUN leakage requires an NMOS back bias voltage of -128 mV. This voltage is not the same as the one obtained for high noise margin (-441 mV).
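The selection step described in this section, finding the NMOS back bias voltage at which a current ratio equals one, can be sketched as a search over a simulated sweep. The sweep below is synthetic and is not data from this work, so the resulting voltage does not reproduce -441 mV or -128 mV. The function name and the log-space interpolation are assumptions made for illustration.

```python
import math

def find_crossing(voltages_mv, ratios, target=1.0):
    """Return the voltage at which `ratios` crosses `target`.
    Interpolates in log space because subthreshold currents are exponential."""
    points = list(zip(voltages_mv, ratios))
    for (v0, r0), (v1, r1) in zip(points, points[1:]):
        bracketed = (r0 - target) * (r1 - target) <= 0.0
        if bracketed and r0 != r1:
            frac = (math.log(target) - math.log(r0)) / (math.log(r1) - math.log(r0))
            return v0 + frac * (v1 - v0)
    raise ValueError("target ratio is not bracketed by the sweep")

# Synthetic sweep (assumed shape, NOT simulation data from the paper).
sweep_mv = [float(v) for v in range(-500, 1, 50)]
ion_ratio = [4.0 * math.exp(v / 300.0) for v in sweep_mv]

v_balanced = find_crossing(sweep_mv, ion_ratio)
```

The same helper could be applied to the off-current ratio, which in general crosses one at a different voltage, as the two values reported above show.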



Fig. 3: Energy per cycle of inverter chain for different activity factors over a range of NMOS back bias voltage.

#### III. ENERGY EFFICIENCY AND VARIABILITY

To find the back bias voltage that minimizes the variability and energy per cycle, a chain of 20 inverters has been simulated at different activity factors [12]. A chain of inverters is an effective indicator of the energy and variability of complex digital circuits in the subthreshold regime [7].

Fig. 3 shows the energy per cycle of the inverter chain as a function of NMOS back bias voltage. The energy has been calculated for the time period required to propagate a single transition through the inverter chain. In ultra low voltage design, the leakage energy is a dominant source of the total energy of the system. At a specific NMOS back bias voltage, the leakage currents of the pull down and pull up networks are equal and consequently the overall energy consumption of the system is at its minimum.

As can be seen from Fig. 3, the minimum energy per cycle occurs at a different NMOS back bias voltage for each activity factor, which confirms the dependency of the optimal body bias voltage on the switching activity of the circuit. As the activity factor decreases, the dynamic energy decreases, and so does the ratio of active energy to leakage energy. Thus, the optimal body bias moves to larger reverse back bias voltages to reduce the leakage energy. In other words, since a longer idle time increases the leakage energy contribution, the energy-minimizing body bias shifts toward a larger NMOS reverse back bias voltage. Also, for activity factors from 1 to 0.01 the energy changes by only 5 percent, which means that static power dominates the total power consumption.
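The 5 percent figure can be turned into an approximate leakage share with a simple two-term model. The sketch below assumes that the energy per cycle is the activity factor times a dynamic term plus a leakage term that does not depend on the activity factor, and that the 5 percent is measured relative to the energy at an activity factor of 1. Both are simplifying assumptions: in the simulations the optimal bias, and hence the leakage, shifts with activity. The function names are hypothetical.

```python
def dynamic_share(relative_drop, alpha_high=1.0, alpha_low=0.01):
    """Dynamic-energy share at alpha_high under the assumed model
    E(alpha) = alpha * E_dyn + E_leak, given the relative energy drop
    observed when going from alpha_high to alpha_low."""
    return relative_drop * alpha_high / (alpha_high - alpha_low)

def energy(alpha, e_dyn, e_leak):
    """Energy per cycle in the assumed two-term model."""
    return alpha * e_dyn + e_leak

dyn = dynamic_share(0.05)   # about 5 percent dynamic
leak = 1.0 - dyn            # about 95 percent leakage

# Forward check: normalised energies at the two activity factors.
e_high = energy(1.0, dyn, leak)
e_low = energy(0.01, dyn, leak)
observed_drop = (e_high - e_low) / e_high
```

Under these assumptions, roughly 95 percent of the energy per cycle at full activity would be leakage, which is consistent with the statement that static power dominates.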

To investigate the reliability, the delay variability and energy variability of the inverter chain have been analyzed. The variability has been calculated in terms of $\delta$ (standard deviation) / $\mu$ (mean) [13]. Fig. 4 shows the energy per cycle versus delay obtained from one thousand Monte Carlo simulations [14] (both process and mismatch variations) for three different NMOS back bias voltages at Vdd = 150 mV. The NMOS back bias voltage that gives the minimum energy per cycle for the inverter chain is defined as $V_{BBN,op} = -227$ mV. $V_{BBN} = 0$ is defined as $V_{BBN,gnd}$ and $V_{BBN} = -441$ mV is defined as $V_{BBN,Ion}$. The energy and delay variability for the one thousand Monte Carlo simulations are reported in Fig. 4. The energy variability for $V_{BBN,op}$ is lower than for the others. The delay variability of $V_{BBN,op}$ is lower than that of $V_{BBN,Ion}$ and almost the same as that of $V_{BBN,gnd}$. The lowest and highest energy variability over the back bias voltage range from -441 mV to 0 mV are 0.063 and 0.097, respectively, while the delay variability ranges from 0.205 to 0.213. According to the simulations, the relative change in energy variability (highest/lowest = 1.54) is therefore larger than that in delay variability (1.04) over this back bias voltage range.
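The variability metric and the two relative figures quoted above can be reproduced with a few lines. The sketch assumes the sample standard deviation is used for the metric, which the text does not specify, and that the relative variability is the ratio of the highest to the lowest value over the sweep, which matches the quoted numbers. The function names are hypothetical.

```python
import statistics

def variability(samples):
    """Standard deviation divided by mean (sample standard deviation assumed)."""
    return statistics.stdev(samples) / statistics.mean(samples)

def relative_spread(lowest, highest):
    """Ratio of the highest to the lowest variability over a sweep."""
    return highest / lowest

# Values reported for the back bias range from -441 mV to 0 mV.
energy_spread = relative_spread(0.063, 0.097)
delay_spread = relative_spread(0.205, 0.213)

# Toy usage of the metric on three made-up samples.
toy = variability([1.0, 2.0, 3.0])
```

Rounded to two decimals, the two ratios agree with the 1.54 and 1.04 stated in the text.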

#### IV. OPTIMAL BACK BIAS VOLTAGE AND DIFFERENT WORKLOADS

The optimal back bias voltage depends not only on technology and circuit characteristics but also on the workload. In order to consider the impact of loading on the optimal back bias voltage, we added loads of 3 (FO3), 4 (FO4) and 5 (FO5) inverters to each inverter. The load affects not only the absolute amount of energy but also the optimal back bias voltage. The energy per cycle for the chain of 20 inverters at different activity factors and different workloads is shown in Fig. 5. As can be seen, the optimal back bias voltage shifts to the right (toward less negative values) as the workload increases. The optimal biasing voltage for FO3 varies from -212 mV to -222 mV for activity factors from 1 to 0.1. For loads of FO4 and FO5, it changes to the range of -196 to -180 mV and -176 to -166 mV, respectively.

#### V. LOGIC DEPTH, SIZING AND VARIABILITY IN SUBTHRESHOLD REGIME

It is well known that the impact of variability is increased in the subthreshold regime due to the increased sensitivity of drain current to threshold voltage variation. In order to investigate the effect of logic depth and sizing on the variability, the simulated delay and energy variation due to the process and mismatch variations are shown in Fig. 6 and Fig. 7. We have performed 1000 Monte Carlo simulations in each case. Note that the delay variability for a chain of 20 inverters increases from $\delta/\mu$ = 0.065 with mismatch only to 0.207 with both mismatch and process variation. This is caused by the fact that random dopant fluctuation is the dominant source of the variability at low voltages [15].

As shown in Fig. 6, the larger the logic depth of the inverters chain, the lower the variability of delay and energy. This is because variation tends to average out through the logic path. A chain of 25 inverters shows 7.23 and 16.21 percent reduction in delay and energy variability from both mismatch and process variations, respectively.
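The averaging effect can be illustrated with a small Monte Carlo sketch in which every stage delay is an independent lognormal sample. The lognormal shape, the value of `sigma`, the number of runs and the function name are all assumptions for illustration. Only independent, mismatch-like variation is modelled, so global process variation, which is correlated across stages and does not average out in this way, is not captured.

```python
import random
import statistics

def chain_delay_variability(depth, runs=2000, sigma=0.3, seed=1):
    """Estimate standard deviation / mean of the total delay of a chain
    of `depth` stages with independent lognormal stage delays (assumed model)."""
    rng = random.Random(seed)
    totals = [
        sum(rng.lognormvariate(0.0, sigma) for _ in range(depth))
        for _ in range(runs)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)

shallow = chain_delay_variability(5)
deep = chain_delay_variability(25)
```

For independent stages the variability of the sum falls roughly with the square root of the depth, so the 25-stage chain shows a clearly lower value than the 5-stage chain in this toy model.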

Also, we have simulated the energy and delay variability for a chain of 20 inverters at different widths of the NMOS and PMOS transistors. As expected, the variability decreases as the transistor width increases.


Fig. 4: Variability of Energy versus Delay of inverter chain for NMOS back bias voltages (a) Optimal back bias voltage VBBN =-227 mV; (b) VBBN = 0; (c) VBBN =-441 mV.



Fig. 5: Energy per cycle of inverter chain for different activity factors over a range of NMOS back bias voltage: (a) FO3 load to each inverter; (b) FO4 load to each inverter; (c) FO5 load to each inverter.



Fig. 6: Delay and energy Variability of chain of inverters with different number of inverters,  $W_P = 2W_N = 400 \text{ nm}$ ,  $V_{BBN} = V_{BBN,op}$ ,  $V_{BBP} = \text{Vdd}$ .

#### VI. ADDER CIRCUIT DESIGN AND SIMULATION

Addition is one of the fundamental and widespread arithmetic operations. Moreover, it is the basic building block for many other useful operations, such as subtraction, multiplication, etc. Hence, the design of energy efficient adders has been a key issue for many digital circuit designers.

The most important goal for ultra low voltage is maximizing robustness because of high sensitivity to PVT variations. Hence, using gates with maximum fan-in of two has been proposed for ultra low voltage digital circuits [7], [10], [11].

The Xor based adder including gates with fan-in of two has been designed and simulated to verify the approach proposed for selecting the optimal back bias voltage.



Fig. 7: Delay and energy Variability of chain of inverters with different sizing, $W_P = 2W_N$, $V_{BBN} = V_{BBN,op}$, $V_{BBP} = \text{Vdd}$.



Fig. 8: Xor based adder

Fig. 10: Inputs and Outputs of Xor based full adder at a supply voltage of 150 mV and 1 kHz frequency.

The schematics of the Xor based full adder and of the Nand and Xor gates are shown in Fig. 8 and Fig. 9, respectively [16]. The simulations have been done for a target frequency of 1 kHz, which is suitable for many IoT applications. The most critical issue for ultra low voltage circuits is PVT variation and consequently the functionality of the circuits. The functionality of the adder has been tested systematically for three different NMOS back bias voltages ($V_{BBN,op}$, $V_{BBN,gnd}$ and $V_{BBN,Ion}$). The outputs of the adder have been compared with both the maximum acceptable low voltage ($V_{OL}$) and the minimum acceptable high voltage ($V_{OH}$), which are equal to $0.25 \times Vdd$ and $0.75 \times Vdd$, respectively. The adder has not failed in one thousand Monte Carlo simulations (mismatch and process variations) with $V_{BBN,op}$ = -227 mV at the different conditions. For $V_{BBN,gnd}$, the adder failed in 4 Monte Carlo simulations at Vdd=140 mV, 27°C and in 4 Monte Carlo simulations at Vdd=140 mV, 50°C. At $V_{BBN,Ion}$ = -441 mV the adder failed in 4 Monte Carlo simulations at Vdd=140 mV, 50°C. Hence, the functional yield of the adder has improved by 0.4 percent at the optimal back bias voltage compared to the other body bias voltages. The full adder has been simulated at a frequency of 1 kHz, which is the case for many IoT applications [15], over the temperature range 27-50°C, which is appropriate for implantable medical devices, with the supply voltage varying from 140 to 160 mV to reduce the power consumption. Different performance metrics of the Xor based adder, including power, worst case delay, and energy per operation, have been calculated for different NMOS back bias voltages. For accurate power measurement, all input transitions should be considered, because a change at the input does not necessarily change the output. Fig. 10 shows the different input transitions used for estimating the average power consumption of the full adder [17].
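The pass/fail criterion and the yield figure described above can be sketched as follows. The thresholds of 0.25 and 0.75 times Vdd and the failure counts come from the text. The function names, the example output voltages, and the choice to compute yield per operating condition are assumptions for illustration.

```python
def is_functional(outputs, expected_bits, vdd):
    """Check sampled output voltages against V_OL = 0.25*Vdd and
    V_OH = 0.75*Vdd for the expected logic values."""
    v_ol, v_oh = 0.25 * vdd, 0.75 * vdd
    for voltage, bit in zip(outputs, expected_bits):
        if bit == 1 and voltage < v_oh:
            return False
        if bit == 0 and voltage > v_ol:
            return False
    return True

def functional_yield(failures, runs=1000):
    """Percentage of Monte Carlo runs that pass."""
    return 100.0 * (runs - failures) / runs

# Example output voltages are made up; Vdd = 150 mV gives
# V_OL = 37.5 mV and V_OH = 112.5 mV.
passing = is_functional([0.140, 0.010], [1, 0], vdd=0.150)
failing = is_functional([0.100, 0.010], [1, 0], vdd=0.150)

yield_optimal = functional_yield(0)    # no failures at the optimal bias
yield_zero_bias = functional_yield(4)  # 4 failures at one condition
improvement = yield_optimal - yield_zero_bias
```

With 4 failing runs out of 1000 at a given operating condition, the yield is 99.6 percent, so a bias with no failures is better by 0.4 percentage points.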

The power, speed and energy per operation of the adder for a supply voltage of 150 mV at 1 kHz frequency and 27 °C over the back bias voltage range are shown in Fig. 11. As can be seen from Fig. 11(c), the energy per operation is lowest at the optimal NMOS back bias voltage ($V_{BBN,op} = -227$ mV), and the adder under the optimal body bias voltage consumes 4.67 percent less energy than under zero body bias voltage.

#### VII. DISCUSSION

The simulation results in Fig. 3 indicate that the energy reaches a minimum at a specific back bias voltage. They also show that the minimum energy per cycle occurs at a different NMOS back bias voltage for each activity factor. As the activity factor decreases, the dynamic energy decreases, and so does the ratio of active energy to leakage energy. Thus, the optimal body bias moves to larger reverse back bias voltages to reduce the leakage energy. Changing the activity factor from 1 to 0.01 changes the energy by only 5 percent, which means that in ultra low voltage design the leakage energy is the dominant contribution to the total energy of the system, and static power dominates the total power consumption of low frequency digital circuits.

Comparing the energy and delay variability for different back bias voltages, the obtained optimal back bias voltage has the smallest energy variability. The delay variability of $V_{BBN,op}$ is better than that of $V_{BBN,Ion}$ and almost the same as that of $V_{BBN,gnd}$. Based on these simulations, the relative energy variability (1.54) is larger than the relative delay variability (1.04) over the back bias voltage range from -441 mV to 0 mV.

As Fig.5 shows, for different loads the optimal back bias voltage has changed. As shown in Fig. 6 and Fig. 7, by adding the process variations, the delay variability increases by more than 2X.

The result from Monte Carlo simulation for Xor based adder reveals that the functional yield of the adder has improved by 0.4 percent in optimal back bias voltage compared to other body bias voltages.

As expected, increasing the reverse back bias voltage decreases the power of the adder and increases its delay. Additionally, the energy per operation of the Xor based adder under optimal body bias has improved by 4.67 percent compared to zero body bias.

It is worth mentioning that, in [18], we compared ultra low voltage subthreshold adder topologies in 22 nm FDSOI technology and concluded that the Xor based adder is the best of all. Further investigation showed that, in this technology, the power consumption of the Minority-3 based adder at low frequencies is better than that of the Xor based adder.

#### VIII. CONCLUSION

Design of digital circuits in the subthreshold regime is promising for many ultra low energy IoT applications. The most critical concern in ultra low voltage design is PVT variations. In this study, we demonstrate the optimal back bias voltages in the 22 nm technology to balance the PDN/PUN strength and to reduce both the variability due to PVT variations and the consumed energy. In low speed applications, using HVT transistors optimized for reverse back bias voltage reduces the leakage current and hence the static power consumption. Additionally, the sizing strategy results in a regular layout and reduced variability. The strategy has been evaluated through the design of an ultra low voltage Xor based adder at supply voltages from 140 to 160 mV and a temperature range of 27-50 °C at 1 kHz operating frequency. This frequency and temperature range is applicable to implantable medical IoT applications. As a result, using the optimal back bias voltages for ultra low voltage circuits achieved the lowest energy per cycle and variability under PVT variations, together with high functional yield. The adder under optimal body bias consumes 4.67 percent less energy than under zero body bias. We have shown the dependency of the optimal body bias voltage on the characteristics of the design and on the operating conditions for a given technology. We also show that the relative energy variability is larger than the delay variability over the back bias voltage range.

Fig. 11: Performance metrics of an Xor based adder at Different NMOS back bias voltages and Vdd= 150 mV. (a) Power consumption; (b) Delay; (c) Energy per operation.

#### REFERENCES

- [1] A. G. Andreou, K. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, K. Strohbehn, et al., "Current-mode subthreshold mos circuits for analog vlsi neural systems," *IEEE Transactions on neural networks*, vol. 2, no. 2, pp. 205–213, 1991.
- [2] T. Kobayashi and T. Sakurai, "Self-adjusting threshold-voltage scheme (sats) for low-voltage high-speed operation," in *Proceedings of IEEE Custom Integrated Circuits Conference-CICC'94*, pp. 271–274, IEEE, 1994.
- [3] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, et al., "A 0.9-v, 150mhz, 10-mw, 4 mm/sup 2/, 2-d discrete cosine transform core processor with variable threshold-voltage (vt) scheme," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 11, pp. 1770–1779, 1996.
- [4] J. T. Kao, M. Miyazaki, and A. Chandrakasan, "A 175-mv multiply-accumulate unit using an adaptive supply voltage and body bias architecture," *IEEE journal of solid-state circuits*, vol. 37, no. 11, pp. 1545–1554, 2002.
- [5] A. Keshavarzi, S. Narendra, S. Borkar, C. Hawkins, K. Roy, and V. De, "Technology scaling behavior of optimum reverse body bias for standby leakage power reduction in cmos ic's," in *Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No. 99TH8477)*, pp. 252–254, IEEE, 1999.

- [6] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, and E. Nowak, "Low-power cmos at vdd= 4kt/q," in *Device Research Conference. Conference Digest (Cat. No. 01TH8561)*, pp. 22–23, IEEE, 2001.
- [7] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, et al., "Exploring variability and performance in a sub-200-mv processor," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 881–891, 2008.
- [8] D. Jacquet, F. Hasbani, P. Flatresse, R. Wilson, F. Arnaud, G. Cesana, T. Di Gilio, C. Lecocq, T. Roy, A. Chhabra, et al., "A 3 ghz dual core processor arm cortex tm-a9 in 28 nm utbb fd-soi cmos with ultrawide voltage range and energy efficiency optimization," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 812–826, 2014.
- [9] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage logic circuits exploiting gate level dynamic body biasing in 28 nm utbb fd-soi," *Solid-State Electronics*, vol. 117, pp. 185–192, 2016.
- [10] A. Wang and A. Chandrakasan, "A 180mv fft processor using subthreshold circuit techniques," in 2004 IEEE International Solid-State Circuits Conference (IEEE Cat. No. 04CH37519), pp. 292–529, IEEE, 2004.
- [11] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K. K. Das, W. Haensch, E. J. Nowak, and D. M. Sylvester, "Ultralow-voltage, minimum-energy cmos," *IBM journal of research and development*, vol. 50, no. 4.5, pp. 469–490, 2006.
- [12] M. Seok, D. Jeon, C. Chakrabarti, D. Blaauw, and D. Sylvester, "Pipeline strategy for improving optimal energy efficiency in ultra-low voltage design," in *Design Automation Conference (DAC), 2011 48th* ACM/EDAC/IEEE, pp. 990–995, IEEE, 2011.
- [13] S. Nassif, K. Bernstein, D. J. Frank, A. Gattiker, W. Haensch, B. L. Ji, E. Nowak, D. Pearson, and N. J. Rohrer, "High performance cmos variability in the 65nm regime and beyond," in *Electron Devices Meeting*, 2007. IEDM 2007. IEEE International, pp. 569–571, IEEE, 2007.
- [14] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proceedings of the* 2005 international symposium on Low power electronics and design, pp. 20–25, ACM, 2005.
- [15] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, et al., "Energy-efficient subthreshold processor design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 8, pp. 1127–1137, 2009.
- [16] N. H. Weste and D. Harris, CMOS VLSI design: a circuits and systems perspective. Pearson Education India, 2015.
- [17] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-cmos logic style," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 12, pp. 1309–1321, 2006.
- [18] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Comparison of ultra low power full adder cells in 22 nm fdsoi technology," in 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pp. 1–5, IEEE, 2018.

# 4.4 Paper IV: An ultra low voltage subthreshold standard cell based memories for IoT applications

### An ultra low voltage subthreshold standard cell based memories for IoT applications

Somayeh Hossein Zadeh dept. Electronic Systems NTNU Trondheim, Norway somayeh.h.zadeh@ntnu.no Trond Ytterdal dept. Electronic Systems NTNU Trondheim, Norway trond.ytterdal@ntnu.no Snorre Aunet dept. Electronic Systems NTNU Trondheim, Norway snorre.aunet.ntnu.no

Abstract—In this paper, we have designed an ultra low voltage standard cell based memory (SCM) in the subthreshold domain. The SCM has been synthesized, placed and routed based on an ultra low voltage full custom standard cell library using 130 nm process technology, functional for subthreshold supply voltages as low as 140 mV, which is well below the supply voltages reported in the previous works of SCMs. This SCM has been designed based on the standard D-latch which holds data at a supply voltage as low as 170 mV. The designed SCM is functional for a temperature range of 27-50 °C applicable for implantable biomedical systems requiring small memory capacity. According to the simulations from the extracted netlist, the energy per bit access at Vdd = 200 mV and 333 kHz frequency, suitable for many IoT applications, is 1.54 fJ.

Index Terms—standard cell based memory, subthreshold, energy per bit, IoT applications.

#### I. INTRODUCTION

Power consumption is the primary concern in many Internet of Things (IoT) applications operating in the kHz frequency range. Reducing the supply voltage reduces both static and dynamic power consumption. Hence, supply voltage reduction is among the best-known power reduction methods [1]. Ultra-low power and low energy design therefore leads to circuits operating in the subthreshold region. However, operating in this region poses design challenges such as functional yield and sensitivity to process, voltage, and temperature (PVT) variations.

Static random access memories (SRAMs) on average account for the dominant share of system area [2]. Thus, they can contribute significantly to the power consumption and energy dissipation of the system. Additionally, the SRAM is the first block to limit the functional yield of a circuit when the supply voltage is reduced. Thus, it may limit the minimum energy per operation of the circuit.

In the literature, many studies have focused on designing SRAM macros in the subthreshold regime.

Standard cell based memories (SCMs) have been presented as possible alternatives to SRAM macros for systems with smaller memory sizes, such as implantable biomedical devices [3]. In SCMs, the storage cells are flip flops and latches. These arrays can be easily synthesized, placed, and routed. In contrast to SRAM macros, SCMs do not require peripheral circuits such as sense amplifiers, buffer foot drivers, and devices for supply voltage gating [4]. Hence, for low frequency applications with small memory sizes, SCMs can have a lower area than SRAM macros [5].

In the literature, several studies have been presented for designing SCMs. Different architectures for the read and write logic and array of storage cells have been proposed for improving the energy and speed for different applications [6]. In [5], to reduce the energy per operation, a full custom D-latch unit has been proposed. In [3], the impact of multiple threshold voltages (Vth) option has been considered in the design of SCMs. Also, the impact of using a pass transistor based latch has been explored to reduce the area footprint, in [3].

Most of the previous works on SCMs have focused on high performance through circuit and architectural modifications. In [7], a design technique has been presented to optimize the physical layout of an SCM. None of these works has studied reducing the minimum supply voltage of standard cell based memories, which would be an advantage for many IoT applications such as systems using energy harvesting devices and implantable biomedical devices. In such cases, only low supply voltages are available, or the circuits are battery operated with long standby times [8].

In this paper, the best SCM architecture reported in [5], which is based on the D-latch and suitable for small capacity applications, has been designed and explored in the subthreshold regime.

Standard cell libraries for the superthreshold regime are not developed for subthreshold operation [9]. This study designs a full custom cell library for the subthreshold regime to reduce the static power and improve the functionality of the SCMs at ultra low supply voltages. We have used the CMOS D-latch as a reliable and highly functional storage cell for ultra low supply voltages, as low as 170 mV, for implantable biomedical devices with low speed requirements. We have assessed the reliability of the memory at ultra low voltages through Monte Carlo simulations in the presence of both mismatch and process variations. All of the results are based on post layout simulations using a 130 nm bulk CMOS process technology.

#### 978-1-7281-7296-5/20/\$31.00 ©2019 IEEE

Authorized licensed use limited to: Norges Teknisk-Naturvitenskapelige Universitet. Downloaded on December 02,2020 at 09:15:24 UTC from IEEE Xplore. Restrictions apply.

Contributions:

1. Compared to the published SCMs, our SCM has the lowest functional supply voltage and higher stability, which make it a desirable option for energy harvesting devices where only a low supply voltage is available.

2. Compared to the SCM at the same technology node [6], the presented SCM has lower energy per bit access: it is 40 times more energy efficient.

3. Compared to the SRAM macro at the same technology node, the presented SCM is 2.77X faster.

The rest of the paper is structured as follows: the full custom standard cell library developed for subthreshold supply voltages is described in Section II. The architecture used for the SCM is illustrated in Section III. In Sections IV and V, the simulation results are presented and discussed, respectively. Section VI concludes the paper.

#### II. STANDARD CELL LIBRARY DEVELOPED FOR SUBTHRESHOLD SCMS

The dominant part of the power in low frequency circuits is static power. Therefore, the leakage current must be reduced to reduce the total power. The authors in [10] show that increasing the channel length reduces the leakage power efficiently and increases the robustness against PVT variations.

Our library employs a longer channel length as an efficient leakage reduction method. Fig. 1 displays $I_{on}/I_{off}$ of the minimum size NMOS transistor, where the gate source voltage is equal to the drain source voltage for $I_{on}$ and equal to zero for $I_{off}$. As can be seen from Fig. 1, the slope between 130 and 190 nm is much steeper than over the rest of the range. Hence, in our library, the channel length is 190 nm to lower the static power and enhance reliability.

The proposed library contains gates such as latches with different driving strengths [11]. All of the gates in the standard cell library have been designed with high threshold voltage transistors to decrease the static power consumption.

For energy efficient digital circuits, the current must be balanced between the PUN and the PDN [13]. Hence, the dimensions of the transistors are chosen as 1.8 µm and 300 nm for the PMOS and NMOS transistors, respectively, to balance the PMOS and NMOS transistors, achieve the same leakage current for them, and increase the functional yield of the SCM circuit [11].

Fig. 1: Ion/Ioff for the minimum size NMOS transistor vs the channel length, Vds = 150 mV [11].

All the cells in the library, including the standard D-latch cell, are fully functional in 1000 Monte Carlo simulations including both mismatch and process variations, at supply voltages as low as 140 mV over the temperature range of 27-50 °C, operating at a frequency of 5 kHz. The outputs of the cells have been compared with both the maximum allowable low voltage ($V_{OL}$) and the minimum allowable high voltage ($V_{OH}$). $V_{OL}$ and $V_{OH}$ have been defined as $0.25 \times Vdd$ and $0.75 \times Vdd$, respectively.
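The pass/fail criterion on the cell outputs can be sketched as follows. This is a minimal illustration rather than the authors' test bench; the function name `output_is_valid` and the sample output voltages are assumptions.

```python
def output_is_valid(v_out: float, expected_high: bool, vdd: float) -> bool:
    """Check one settled output voltage against V_OL = 0.25*Vdd and V_OH = 0.75*Vdd."""
    v_ol = 0.25 * vdd  # maximum allowable low voltage
    v_oh = 0.75 * vdd  # minimum allowable high voltage
    return v_out >= v_oh if expected_high else v_out <= v_ol


vdd = 0.140  # 140 mV, the lowest supply voltage reported for the library cells
high_ok = output_is_valid(0.120, expected_high=True, vdd=vdd)   # 120 mV vs V_OH = 105 mV
low_ok = output_is_valid(0.050, expected_high=False, vdd=vdd)   # 50 mV vs V_OL = 35 mV
print(high_ok, low_ok)
```

A cell counts as functional in a Monte Carlo run only if every output passes this check for every input vector.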

#### III. STANDARD CELL BASED MEMORY ARCHITECTURE

The memory consists of W = 4 words of B = 4 bits each. The words are accessed through A = log2 W address bits. The write and read ports are separate. Therefore, our SCM architecture comprises write logic, read logic, and a storage cell array.

#### A. Write logic

When considering an array of $W \times B$ storage cells, the write logic selects one of the W words based on the selected address. The selected storage cells are updated in the next active clock. Based on the results reported in [5], the energy efficient option for small capacity SCMs is to use storage cells (flip flops or latches) with an enable feature, or with equivalent logic. We have selected the second option for our SCM.

#### B. Read logic

Our read logic includes both combinatorial and sequential components. In the read logic, the selected word is routed to the output based on the read address. The authors in [5] have shown that, for small capacity memories, multiplexer based read logic consumes less energy than Tri-State buffer based read logic. Therefore, the read logic in this study is based on multiplexers. To obtain a reliable subthreshold circuit, the multiplexers have been realized with Nand and Nor gates.
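The separation into write logic, storage array, and multiplexer based read logic can be illustrated with a small behavioural model. This is a sketch under stated assumptions: the class and method names are hypothetical, timing and the sequential part of the read path are ignored, and only the one-hot row selection and the AND-OR multiplexer structure are modelled.

```python
class StandardCellMemory:
    """Behavioural model of a W-word x B-bit latch array with separate ports."""

    def __init__(self, words: int = 4, bits: int = 4) -> None:
        self.words = words
        self.bits = bits
        self.array = [[0] * bits for _ in range(words)]

    def write(self, address: int, data: list[int], write_enable: bool = True) -> None:
        if not 0 <= address < self.words or len(data) != self.bits:
            raise ValueError("address or data width out of range")
        # Write logic: a one-hot decode of the address enables exactly one row.
        row_enable = [write_enable and (w == address) for w in range(self.words)]
        for w, enabled in enumerate(row_enable):
            if enabled:
                self.array[w] = list(data)

    def read(self, address: int) -> list[int]:
        if not 0 <= address < self.words:
            raise ValueError("address out of range")
        # Read logic: one W-to-1 multiplexer per bit column, written as AND-OR.
        select = [int(w == address) for w in range(self.words)]
        return [
            int(any(select[w] and self.array[w][b] for w in range(self.words)))
            for b in range(self.bits)
        ]


mem = StandardCellMemory(words=4, bits=4)
mem.write(2, [1, 0, 1, 1])
print(mem.read(2), mem.read(0))
```

With W = 4, two address bits are enough to select any word, and unselected rows keep their contents.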

#### C. Storage cells array

Latches, which have lower area than flip flops, have been used as storage cells. Based on the comparison between flip flops and latches in [5], latches are more energy efficient than flip flops. We have used the standard cell CMOS D-latch as a reliable and highly functional storage cell [12] for ultra low supply voltages. The schematic of the standard cell CMOS D-latch is shown in Fig. 2.

Fig. 2: The schematic of the CMOS D-latch [12].

#### 2020 28th Iranian Conference on Electrical Engineering (ICEE)

The Nand race free D-latch is a single phase clock latch (no overlap between the clock and inverted clock). This simple latch consists of two stages: a logic function stage and a latching stage. When the clock is low, the first stage does not respond to input changes, so the output stage does not change either. When the clock is high, the first stage evaluates the input and passes it to the output node.
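The two-stage behaviour described above can be sketched with a gate-level model. The arrangement below is the textbook four-Nand gated D-latch and is an assumption; it is not claimed to match the schematic of Fig. 2 gate for gate. The function names are hypothetical.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


def d_latch(d: int, clk: int, q: int) -> int:
    """Return the settled output of a Nand-based gated D-latch.

    First stage: two Nand gates gate D with the single-phase clock.
    Second stage: a cross-coupled Nand pair holds the stored value q.
    """
    set_n = nand(d, clk)        # low only when clk = 1 and d = 1
    reset_n = nand(set_n, clk)  # low only when clk = 1 and d = 0
    q_n = nand(reset_n, q)
    for _ in range(4):          # iterate the cross-coupled pair until it settles
        q_new = nand(set_n, q_n)
        q_n = nand(reset_n, q_new)
        if q_new == q:
            break
        q = q_new
    return q


q = d_latch(d=1, clk=1, q=0)  # transparent: output follows D
q = d_latch(d=0, clk=0, q=q)  # clock low: input change is ignored
print(q)
```

When the clock is low both first-stage outputs are high, so the cross-coupled pair simply holds its state.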

The complete schematic of the standard cell based memory is shown in Fig. 3.

#### D. Stability analysis in storage cells

Because the subthreshold transistor current depends exponentially on the threshold voltage, digital circuits in the subthreshold regime are extremely vulnerable to PVT variations. Hence, robust operation is a limiting factor in this regime.

To assess the stability of the latch, we have considered the static noise margin (SNM) of the Nand gate in the D-latch. For the latch to be stable, the cross-coupled Nand pair must hold the data in the presence of voltage noise.

The Negative Slope Criteria (NSC) technique has been used for calculating the high and low SNM [14]. In this technique, the two points of the voltage transfer characteristic with unity gain are identified. These points are $V_{IH}$ and $V_{IL}$, which represent the lowest input voltage recognized as a logic one and the highest input voltage recognized as a logic zero, respectively. At those points, $V_{OH}$ and $V_{OL}$ are the lowest high output voltage and the highest low output voltage, respectively.

The minimum of the high and low SNM is taken as the data hold ability of the D-latch.
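The unity-gain construction can be sketched numerically. The tanh-shaped transfer curve below is a hypothetical stand-in for a simulated VTC, and the names `example_vtc` and `noise_margins` are assumptions; the margins follow the usual definitions, SNM low = $V_{IL} - V_{OL}$ and SNM high = $V_{OH} - V_{IH}$.

```python
import math


def example_vtc(v_in: float, vdd: float, k: float = 8.0) -> float:
    """Hypothetical smooth inverting VTC (tanh shape), for illustration only."""
    return 0.5 * vdd * (1.0 - math.tanh(k * (v_in - 0.5 * vdd) / vdd))


def noise_margins(v_in: list[float], v_out: list[float]) -> tuple[float, float]:
    """Return (SNM low, SNM high) from the two unity-gain points of a VTC."""
    steep = []
    for i in range(1, len(v_in) - 1):
        slope = (v_out[i + 1] - v_out[i - 1]) / (v_in[i + 1] - v_in[i - 1])
        if slope <= -1.0:  # region between the two unity-gain points
            steep.append(i)
    if not steep:
        raise ValueError("VTC never reaches unity gain")
    i_il, i_ih = steep[0], steep[-1]
    v_il, v_oh = v_in[i_il], v_out[i_il]  # first unity-gain point
    v_ih, v_ol = v_in[i_ih], v_out[i_ih]  # second unity-gain point
    return v_il - v_ol, v_oh - v_ih


vdd = 0.170
points = 2001
v_in = [vdd * i / (points - 1) for i in range(points)]
v_out = [example_vtc(v, vdd) for v in v_in]
snm_low, snm_high = noise_margins(v_in, v_out)
hold_margin = min(snm_low, snm_high)  # hold ability is the smaller of the two
print(f"SNM low = {snm_low * 1e3:.1f} mV, SNM high = {snm_high * 1e3:.1f} mV")
```

In a Monte Carlo setting the same function would be applied to each simulated VTC, and a hold failure corresponds to a non-positive `hold_margin`.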

The results of 1 k-point Monte Carlo simulations in the presence of both process and mismatch variations are shown in Fig. 4 and Fig. 5. In these figures, the subthreshold high and low SNM are normally distributed at the typical temperature of $27^{\circ}$C. In the results, the worst case voltage transfer characteristic (VTC) has been considered for the gates. Based on the figures, the minimum static noise margin of our gates at Vdd = 170 mV over the 1 k-point Monte Carlo simulations is positive. Hence, there is no hold failure at Vdd = 170 mV.

Fig. 5: Static noise margin low.

The butterfly plots for different supply voltages have been shown in Fig. 6, Fig. 7 and Fig. 8. The plots have been obtained for different temperatures in the presence of both mismatch and process variations for one thousand Monte Carlo simulations. As we can see in Fig. 8, raising the supply voltage improves the SNM.

As can be seen from Fig. 6 and Fig. 7, the VTC of the Nand gate and its inverse at Vdd = 170 mV remain separated at both 27 and 50 °C.

#### IV. RESULT

The Cadence Genus tool has been used to synthesize the SCM at the gate level using the full custom standard cell library. The layout of the SCM is generated automatically by the Cadence place and route tool. All of the results are obtained from post layout simulations, which means that parasitics have been taken into account.

In this study, the 16-bit SCM has been designed and simulated. Different performance metrics including power, delay, and leakage current for different supply voltages at 5 kHz frequency and 27 °C are detailed in Table I. The SCM access time/delay includes both read and write time. The power and the leakage current in this table are presented for the latches and the peripherals.

Fig. 6: Butterfly plots of the Nand gate for Vdd = 170 mV at 27 $^{\circ}$C and 1 k-points Monte Carlo simulations.

Fig. 7: Butterfly plots of the Nand gate for Vdd = 170 mV at 50 $^{\circ}$C and 1 k-points Monte Carlo simulations.

Fig. 8: Butterfly plots of the Nand gate for Vdd = 200 mV at 27 $^{\circ}$C and 1 k-points Monte Carlo simulations.

For many IoT applications, such as implantable biomedical devices and energy harvesting devices, the available supply voltage is limited. Hence, the priority in such applications is high functionality at ultra low supply voltages.

Table II compares the measurement results of the published SCMs. As can be seen, the presented work in this study outperforms the works shown in the table in terms of the minimum supply voltage.

TABLE I: Different metrics for the standard cell based memory at 27 °C and 5 kHz.

| Vdd (mV) | Power (nW) | Delay (µs) | Leakage (nA) |
|----------|------------|------------|--------------|
| 170      | 3.69       | 1.77       | 20.81        |
| 200      | 4.46       | 0.89       | 22.32        |

To compare the SCM with SRAM macros, we have listed several subthreshold SRAM macros and the published SCMs using 65, 90, and 130 nm technologies in Table III. In this table, for comparison, the total energy is normalized to the total capacity of the array.

As can be seen from Table III, compared to the SCM in [6] designed at the same technology node, the presented work is 40 times more energy efficient. Our SCM is almost 3 times faster than its SRAM counterpart in [15] at the same technology node.

#### A. Energy and performance

The proposed SCM has the advantage of ultra low power thanks to its functionality at ultra low voltages. Reducing the supply voltage directly reduces both static and dynamic power consumption, although it increases the delay of the memory, which is not a problem for applications with relaxed throughput requirements.

#### V. DISCUSSION

Table II summarizes some of the works focusing on the design of standard cell based memories. As the table shows, this work is the only one that targets the lowest supply voltage for SCMs, for low throughput applications with a limited supply voltage. In such applications with relaxed throughput requirements, the static power consumption dominates the total power consumption [18].
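The dominance of static power can be checked roughly against the Table I figures. The sketch below assumes that the tabulated leakage current is the total supply leakage current, so that static power is approximately Vdd times leakage; the variable names are hypothetical, and because the tabulated values are rounded the estimate can land marginally above the reported total.

```python
# (Vdd in V, total power in W, leakage current in A) from Table I at 5 kHz and 27 °C
table_i = [
    (0.170, 3.69e-9, 20.81e-9),
    (0.200, 4.46e-9, 22.32e-9),
]

static_share = []
for vdd, p_total, i_leak in table_i:
    p_static = vdd * i_leak  # static power estimate from the leakage current
    static_share.append(p_static / p_total)
    print(f"Vdd = {vdd * 1e3:.0f} mV: P_static = {p_static * 1e9:.2f} nW, "
          f"share of total = {p_static / p_total:.2f}")
```

Under this assumption nearly all of the power at 5 kHz is static at both supply voltages.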

Table III compares the presented post-layout simulation results with the measurement results of the different published SRAM macros and SCMs at the subthreshold regime. The SRAM designed in [15] using 130 nm technology has Vdd(min) = 200 mV which is larger than that of the presented SCM in this study.

To compare SRAM macros with our SCM at the same technology and the same supply voltage, we have calculated the energy per bit per access at the highest operating frequency of the SCM at the supply voltage Vdd = 200 mV. Compared to [15], as expected, the presented SCM is 2.77X faster than this SRAM design. The energy per bit access of our design is much lower than that of both SCMs in [3] and [6]. The energy per bit access advantage of our SCM compared to [3] and [6] is 52X and 40X, respectively. This is because the SCM in [6] is based on the flip-flop, which consumes more power than an SCM based on the latch.
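The quoted factors follow directly from the Table III entries. The short check below is only arithmetic on those values; it reads the energy entry for this work as 1.54 fJ/bit at 0.2 V, as stated in the conclusion, and the dictionary names are hypothetical.

```python
energy_fj_per_bit = {"[3]": 80.8, "[6]": 62.5, "this work": 1.54}
max_freq_khz = {"[15]": 120.0, "this work": 333.0}

gain_vs_3 = energy_fj_per_bit["[3]"] / energy_fj_per_bit["this work"]
gain_vs_6 = energy_fj_per_bit["[6]"] / energy_fj_per_bit["this work"]
speedup_vs_15 = max_freq_khz["this work"] / max_freq_khz["[15]"]

print(f"energy gain vs [3]: {gain_vs_3:.1f}X")
print(f"energy gain vs [6]: {gain_vs_6:.1f}X")
print(f"speed-up vs [15]:   {speedup_vs_15:.3f}X")
```

The energy ratios truncate to the 52X and 40X quoted in the text, and the frequency ratio corresponds to the quoted 2.77X.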

In terms of frequency, the SCM in [3] is the fastest of all, thanks to the pass-latch gate used in its design. The minimum supply voltage of our design is lower than that of the SCM in [3], thanks to the reliable standard D-latch gate.


For area comparison at the same technology node, we have selected [6]. As we can see in Table III, the area of [6] is smaller than that of our work in the same technology. The area of the SCM in our study is larger than that of the others because of the high transistor count latch used in the array, which in turn provides a high functional yield at ultra low supply voltages.

TABLE II: Comparison with published SCMs

| Ref.                    | [3]          | [16]            | [17]           | This work                              |
|-------------------------|--------------|-----------------|----------------|----------------------------------------|
| Optimization target     | Area + Speed | Speed + Leakage | Leakage + Area | Lowest functional voltage (Robustness) |
| Latch type              | Pass-latch   | D-latch         | D-latch        | Standard cell D-latch                  |
| Read architecture       | MUX          | MUX             | 3-State        | MUX (realized with Nand and Nor gates) |
| Transistor type         | LVT          | HVT             | HVT            | Low Leakage                            |
| Min supply voltage (mV) | 300          | 350             | 420            | 170                                    |

TABLE III: Comparison with published SRAM macros and SCMs

| Ref.                    | [3] (SCM)   | [6] (SCM)   | [18] (SRAM)     | [15] (SRAM) | This work (SCM) |
|-------------------------|-------------|-------------|-----------------|-------------|-----------------|
| Technology (nm)         | 65          | 130         | 90              | 130         | 130             |
| Max frequency (kHz)     | 42000       | 100000      | -               | 120         | 333             |
| Energy (fJ/bit)         | 80.8        | 62.5        | 461 (0.105 V)\* | 19.8\*      | 1.54 (0.2 V)    |
| Area/bit (µm²)          | 8.3         | 46.87       | 0.5\*           | -           | 59.8            |
| Results                 | measurement | Post-layout | measurement     | measurement | Post-layout     |
| Min supply voltage (mV) | 300         | -           | 130             | 200         | 170             |

\* Tentative row assignment; the source table layout does not identify the row for these entries. A dash marks a cell with no recoverable value.

#### VI. CONCLUSION

This paper presents an SCM that is functional at ultra low voltages and applicable to implantable biomedical applications. By designing a full custom standard cell library optimized for ultra low voltages, the SCM is functional at supply voltages as low as 170 mV. The proposed SCM has been compared with other published SCMs and has the lowest functional supply voltage among them.

The designed SCM is functional over the temperature range of 27-50 °C, which suits implantable biomedical devices requiring small memory capacity. According to the simulation of the extracted netlist, the energy per bit access at Vdd = 200 mV and a frequency of 333 kHz, suitable for many IoT applications, is 1.54 fJ.

#### REFERENCES

- [1] R. Gonzalez, B. M. Gordon, and M. A. Horowitz, "Supply and threshold voltage scaling for low power cmos," IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1210–1216, 1997.
- [2] L. Benini, A. Macii, and M. Poncino, "Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques," ACM Transactions on Embedded Computing Systems (*TECS*), vol. 2, no. 1, pp. 5–32, 2003.
- [3] O. Andersson, B. Mohammadi, P. Meinerzhagen, A. Burg, and J. N. Rodrigues, "Ultra low voltage synthesizable memories: A trade-off discussion in 65 nm cmos," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 6, pp. 806–817, 2016.
- [4] N. Verma and A. P. Chandrakasan, "A 65nm 8t sub-vt sram employing sense-amplifier redundancy," in 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pp. 328–606, IEEE, 2007.
- [5] P. Meinerzhagen, S. Y. Sherazi, A. Burg, and J. N. Rodrigues, "Benchmarking of standard-cell based memories in the sub-vt domain in 65-nm cmos technology," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 1, no. 2, pp. 173–182, 2011.
- [6] P. Meinerzhagen, C. Roth, and A. Burg, "Towards generic low-power area-efficient standard cell based memory architectures," in 2010 53rd IEEE International Midwest Symposium on Circuits and Systems, pp. 129–132, IEEE, 2010.
- [7] A. Teman, D. Rossi, P. Meinerzhagen, L. Benini, and A. Burg, "Power, area, and performance optimization of standard cell memory arrays through controlled placement," ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 21, no. 4, p. 59, 2016.
- [8] N. Lotze and Y. Manoli, "A 62 mv 0.13 µm cmos standard-cell-based design technique using schmitt-trigger logic," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 1, pp. 47–60, 2011.
- [9] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778–1786, 2005.
- [10] D. S. Truesdell and B. H. Calhoun, "Channel length sizing for power minimization in leakage-dominated digital circuits," in 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pp. 1–2, IEEE, 2018.
- [11] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Ultra-low voltage subthreshold binary adder architectures for iot applications: Ripple carry adder or kogge stone adder," in 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pp. 1–7, IEEE, 2019.
- [12] S. Okugawa and N. Inoue, "Systematic design of d flip-flops using two
- [13] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, and E. Nowak, "Low-power cmos at vdd= 4kt/q," in *Device Research Conference. Conference Digest (Cat. No. 01TH8561)*, pp. 22–23, IEEE, 2001.
- [14] J. R. Hauser, "Noise margin criteria for digital logic circuits," IEEE Transactions on Education, vol. 36, no. 4, pp. 363–368, 1993.
- [15] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A high-density subthreshold sram with data-independent bitline leakage and virtual ground replica scheme," in 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pp. 330–606, IEEE, 2007.
- [16] O. Andersson, B. Mohammadi, P. Meinerzhagen, A. Burg, and J. N. Rodrigues, "Dual-vt 4kb sub-vt memories with <1 pw/bit leakage in 65 nm cmos," in 2013 Proceedings of the ESSCIRC (ESSCIRC), pp. 197–200, IEEE, 2013.
- [17] P. Meinerzhagen, O. Andersson, B. Mohammadi, Y. Sherazi, A. Burg, and J. N. Rodrigues, "A 500 fw/bit 14 fj/bit-access 4kb standard-cell based sub-vt memory in 65nm cmos," in 2012 Proceedings of the ESSCIRC (ESSCIRC), pp. 321–324, IEEE, 2012.
- [18] M.-F. Chang, S.-W. Chang, P.-W. Chou, and W.-C. Wu, "A 130 mv sram with expanded write and read margins for subthreshold applications," IEEE Journal of Solid-State Circuits, vol. 46, no. 2, pp. 520–529, 2010.


4.5 Paper V: Multi-threshold voltage and dynamic body biasing techniques for energy efficient ultra low voltage subthreshold adders

## Multi-threshold Voltage and Dynamic Body Biasing Techniques for Energy Efficient Ultra Low Voltage Subthreshold Adders

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering Norwegian University of Science and Technology (NTNU) O.S. Bragstads plass 2a, Trondheim, 7491, Norway somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet@ntnu.no

Abstract-This paper designs and reports energy efficient subthreshold adders in 22 nm FDSOI technology. The dynamic body biasing technique and multi-threshold voltage devices have been used to match the pull up/pull down networks (PUN/PDN). The post-layout simulation results show that the logic gates and full adder circuit based on dynamic body biasing are more robust against process, voltage, and temperature (PVT) variations at ultra low supply voltages than those with conventional body bias. The adders based on the conventional and dynamic body biasing techniques achieve an energy per addition of 0.23 fJ at Vdd = 300 mV and 0.56 fJ at Vdd = 140 mV, respectively. Compared to the other published subthreshold adders in [1] and [2], the energy per addition of our designed adders is improved by 2.82X and 28.1X, respectively. The minimum operating supply voltages of the dynamic and conventional body bias adders, based on Monte Carlo simulations taking into account both mismatch and process variations, are 140 and 200 mV, respectively. The area of the conventional body biased adder has been reduced by 43.9 and 38.8 percent compared to those of the adders in [1] and [3], respectively.

Index Terms—22 nm FDSOI technology, dynamic body biasing technique, multi-threshold voltage devices, post-layout simulation, PVT variations, area.

#### I. INTRODUCTION

The significant demand for ultra low power IoT applications has increased the motivation for designing subthreshold circuits. However, subthreshold operation has design challenges which affect the functional yield and sensitivity to process, voltage, and temperature (PVT) variations [4].

FDSOI technology offers devices with different threshold voltages, so the threshold voltage can be chosen to suit the target application [5].

Different techniques have been presented for designing ultra low voltage subthreshold digital circuits [6]–[9].

In this study, multi-threshold voltage transistors and dynamic body biasing [9] have been used to balance the PUN/PDN and design extremely energy efficient logic gates. The efficiency of the techniques has been tested by designing energy efficient minority3 based full adders with and without dynamic body bias. The minority3 based full adder has been selected because [10] and [11] show that it is power and energy efficient compared to full adders based on Boolean gates. RVT/HVT (regular/high threshold voltage) devices have been used to balance the PMOS and NMOS devices with minimum sized PMOS devices and to reduce the leakage current. Applying dynamic body bias improves the robustness against PVT variations at ultra low supply voltages. Using HVT devices results in slow circuits, which is not a problem for low frequency applications. The operating frequency of wireless sensor network applications ranges from a few Hz to the kHz range [12].

The contributions of this study are as follows:

1. Compared to the published adders using FDSOI technology, our adders have the lowest energy per addition.

2. Compared to the adders designed in FDSOI technology [1] and [2], the presented adders have the lowest area.

3. Of the two adders designed in this paper in the same technology, the dynamic body bias based adder has a 10 percent larger area than the conventional body bias implementation, but it is functional at supply voltages as low as 140 mV taking into account both mismatch and process variations.

The paper is structured as follows: device sizing based on multi-threshold devices for subthreshold supply voltages is described in Section II. Section III describes the full adders and the layouts. In Sections IV and V, the simulation results are presented and discussed. Section VI concludes the paper.

#### II. DEVICE SIZING

The main objective in subthreshold transistor sizing is matching the PUN/PDN in digital circuits [13]. Matching the PUN/PDN increases the static noise margin (SNM) and the functionality at ultra low supply voltages.

Large widths for the PMOS transistors cause larger capacitance in the circuit and hence more area and power consumption. To obtain an adequate balance between the PMOS and NMOS transistors without PMOS upsizing, the multi-threshold voltage technique and dynamic body bias have been used. HVT NMOS transistors and minimum sized RVT PMOS transistors have been chosen.

Fig. 1: The schematic of the minority3 based full adder.

The width of the HVT NMOS transistors has been found by sweeping the input voltage, with the three inputs of the gate connected together, and choosing the width for which the voltage transfer characteristic has equal input and output voltages at Vdd/2.
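The sizing procedure can be sketched with a simple exponential subthreshold current model. Everything numerical below is an assumption made for illustration: the device parameters (`i0`, `v_th`, slope factor `n`) are hypothetical and are not taken from the 22 nm FDSOI process, the gate is reduced to a single NMOS and PMOS, and a bisection on the switching point stands in for the DC sweep. The result is therefore not expected to reproduce the widths used in the paper.

```python
import math

VT = 0.0259  # thermal voltage near 300 K, in volts


def sub_vt_current(w_nm: float, v_gs: float, v_ds: float,
                   i0: float, v_th: float, n: float = 1.5) -> float:
    """Subthreshold drain current from the usual exponential model."""
    return i0 * w_nm * math.exp((v_gs - v_th) / (n * VT)) * (1.0 - math.exp(-v_ds / VT))


def switching_point(w_n: float, w_p: float, vdd: float, nmos: dict, pmos: dict) -> float:
    """Find Vm with Vin = Vout = Vm, where the NMOS and PMOS currents are equal."""
    lo, hi = 0.0, vdd
    for _ in range(60):  # bisection: NMOS current rises and PMOS current falls with Vm
        vm = 0.5 * (lo + hi)
        i_n = sub_vt_current(w_n, vm, vm, **nmos)
        i_p = sub_vt_current(w_p, vdd - vm, vdd - vm, **pmos)
        if i_n > i_p:
            hi = vm
        else:
            lo = vm
    return 0.5 * (lo + hi)


def balance_nmos_width(w_p: float, vdd: float, nmos: dict, pmos: dict,
                       w_min: float = 40.0, w_max: float = 2000.0) -> float:
    """Bisect the NMOS width until the switching point sits at Vdd/2."""
    lo, hi = w_min, w_max
    for _ in range(60):
        w_n = 0.5 * (lo + hi)
        if switching_point(w_n, w_p, vdd, nmos, pmos) > 0.5 * vdd:
            lo = w_n  # NMOS too weak, so widen it
        else:
            hi = w_n
    return 0.5 * (lo + hi)


# Hypothetical parameters: HVT NMOS (higher threshold) and RVT PMOS (weaker per unit width).
nmos = {"i0": 1.0e-9, "v_th": 0.45}
pmos = {"i0": 0.5e-9, "v_th": 0.40}
w_n = balance_nmos_width(w_p=80.0, vdd=0.150, nmos=nmos, pmos=pmos)
print(f"balanced NMOS width = {w_n:.1f} nm")
```

The point of the sketch is the mechanism: a higher NMOS threshold voltage lets a minimum sized PMOS balance the gate, so the PMOS does not need to be upsized.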

Mismatch variation is approximately proportional to the inverse of the square root of the transistor area. Increasing the gate lengths in subthreshold circuits therefore improves the robustness and functional yield of the circuits. Traditionally, to minimize energy, the channel lengths of the transistors are sized as small as possible. However, it has been shown that in the subthreshold regime, channel length upsizing is an efficient technique for reducing leakage power [14].
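The area dependence of mismatch is commonly written in the form of Pelgrom's model. The matching coefficient $A_{VT}$ is a technology dependent constant that is not used in the paper and is introduced here only for illustration:

$$\sigma(\Delta V_{th}) \approx \frac{A_{VT}}{\sqrt{W L}}$$

Under this model, doubling the gate length at a fixed width reduces the threshold voltage mismatch by a factor of $\sqrt{2}$.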

The gate length, NMOS transistor width, and PMOS transistor width are 40 nm, 120 nm, and 80 nm, respectively.

#### III. FULL ADDERS AND LAYOUTS

Fig. 1 illustrates the schematic of the minority3 based full adder. The layouts of both adders, based on conventional and dynamic body bias, are shown in Fig. 2.
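For readers unfamiliar with minority3 logic, the sketch below shows one known way to build a full adder from minority-3 gates and inverters. The exact gate arrangement of Fig. 1 is not recoverable from the text, so this particular construction and the function names are assumptions; the loop at the end checks it against binary addition for all eight input vectors.

```python
from itertools import product


def min3(a: int, b: int, c: int) -> int:
    """Minority-3 gate: the output is 1 when at most one input is 1."""
    return int(a + b + c <= 1)


def inv(a: int) -> int:
    return 1 - a


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry out) using only minority-3 gates and inverters."""
    cout_n = min3(a, b, cin)             # inverted carry
    t_n = min3(a, b, inv(cin))           # inverted helper term
    sum_n = min3(inv(t_n), cout_n, cin)  # inverted sum
    return inv(sum_n), inv(cout_n)


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s  # matches binary addition
```

The inverted carry comes directly out of a single minority-3 gate, which is one reason this gate type suits compact adders.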

Both RVT and HVT transistors use regular wells, which means that the PMOS and NMOS transistors are placed in Nwell and Pwell, respectively. In this case, the bodies of the PMOS and NMOS transistors are connected to Vdd and Gnd, respectively.

The layout of the adders requires three metal layers. Based on the standard design rules in 22 nm FDSOI technology, the poly-to-poly space is restricted to discrete spacing. The poly layer has not been rounded or routed, and dummy poly is used. The dummy poly has been shared with the neighboring gate to reduce the adder area. The areas of the conventional and dynamic body bias adders are $6.62 \times 1.21 = 8.01$ and $9.49 \times 0.93 = 8.82 \ \mu m^2$, respectively. The adder with dynamic body bias is therefore about 10 percent larger than the adder with conventional body bias.
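The area figures can be checked with a few lines of arithmetic. The variable names are hypothetical, and the two factors of each product are simply taken as the layout dimensions in µm quoted above.

```python
conventional_um2 = 6.62 * 1.21  # layout dimensions of the conventional body bias adder
dynamic_um2 = 9.49 * 0.93       # layout dimensions of the dynamic body bias adder

increase = dynamic_um2 / conventional_um2 - 1.0
print(f"conventional: {conventional_um2:.2f} um^2")
print(f"dynamic:      {dynamic_um2:.2f} um^2")
print(f"area increase with dynamic body bias: {increase:.1%}")
```

The roughly 10 percent difference is the cost of the additional body bias inverter in each gate.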

#### A. Variability at ultra low supply voltages

The threshold voltages of the PMOS and NMOS devices change with PVT variations, but not equally for the NMOS and the PMOS. This means that the NMOS drive current becomes higher or lower than that of the PMOS, which degrades the balance. The dynamic body biasing technique has been used for all the gates to mitigate the variability due to PVT variations and to increase the functionality of the circuits at ultra low supply voltages.

In the dynamic body biasing technique, the body bias voltage of the PUN/PDN is controlled dynamically by an inverter in each gate, which acts as a body bias generator.

Fig. 3 and Fig. 4 show the schematic of the minority3 gates with conventional and dynamic body biasing, respectively.

To compare the robustness of the gates with dynamic body biasing with that of the conventional ones at ultra low supply voltages, the variability of the ratio of total current to leakage current of the minority3 gates is shown in Fig. 5 and Fig. 6 for conventional and dynamic body biasing, respectively.

The variability has been calculated as $\sigma$ (standard deviation) / $\mu$ (mean) [15]. At a supply voltage of Vdd = 150 mV and T = 50 °C, the variability of this ratio over 1000 Monte Carlo simulations [16], taking into account both mismatch and process variations, is 0.24 for the minority3 gate with conventional body bias and 0.21 for the minority3 gate with dynamic body bias. T = 50 °C has been considered because the temperature in implantable biomedical applications does not exceed this value. The variability of this ratio is thus lower with dynamic body bias than with conventional body bias.
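The variability metric is the coefficient of variation of the Monte Carlo samples. A minimal sketch, assuming the sample standard deviation is used and with made-up sample values standing in for simulated $I_{total}/I_{static}$ ratios:

```python
import statistics


def variability(samples: list[float]) -> float:
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical I_total/I_static ratios from a handful of Monte Carlo runs.
ratios = [1.8, 2.4, 2.1, 1.5, 2.7, 2.0]
print(f"variability = {variability(ratios):.2f}")
```

Because the metric is normalized by the mean, it allows gates with different absolute current levels to be compared directly.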

The variability of the low static noise margin (SNM) of an inverter with conventional and with dynamic body bias has been calculated. The Negative Slope Criteria (NSC) technique has been used for calculating the high and low SNM [17]. For the inverter, the gate length, NMOS transistor width, and PMOS transistor width are 40 nm, 120 nm, and 80 nm, respectively.

The variability of the low SNM has been calculated at Vdd = 150 mV for 1000 Monte Carlo simulations. As can be seen from Fig. 7 and Fig. 8, the SNM of the inverter with dynamic body bias at Vdd = 150 mV is larger than that of the conventional one. Also, the variability of the low SNM for the inverter with conventional and dynamic body bias is 0.337 and 0.305, respectively. This shows a better SNM and a lower SNM variability for gates with dynamic body bias at ultra low supply voltages.

#### IV. RESULTS

To find the minimum working supply voltage of the adders, the outputs of the adders have been compared with $V_{OL}$ and $V_{OH}$ for different input vectors. $V_{OL}$ and $V_{OH}$ are equal to $0.25 \times Vdd$ and $0.75 \times Vdd$, respectively.

The functionality has been tested through 1000 Monte Carlo simulations [16] at various temperatures, taking into account mismatch and process variations.
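The search for the minimum working supply voltage can be sketched as a sweep in which every Monte Carlo run must pass the output level check. The circuit simulator is replaced here by `toy_simulate`, a purely hypothetical stand-in with a fixed 8 mV output error spread; all names and numbers in the sketch are assumptions and do not reflect the simulated adders.

```python
import random
from typing import Callable, Optional

Sample = tuple[float, float]  # (output voltage for logic 0, output voltage for logic 1)


def passes(sample: Sample, vdd: float) -> bool:
    """Apply the V_OL = 0.25*Vdd and V_OH = 0.75*Vdd criteria to one run."""
    v_low, v_high = sample
    return v_low <= 0.25 * vdd and v_high >= 0.75 * vdd


def min_functional_vdd(vdds: list[float],
                       simulate: Callable[[float, int], Sample],
                       runs: int = 1000) -> Optional[float]:
    """Lower Vdd step by step and keep the last value at which every run passes."""
    lowest = None
    for vdd in sorted(vdds, reverse=True):
        if all(passes(simulate(vdd, run), vdd) for run in range(runs)):
            lowest = vdd
        else:
            break
    return lowest


def toy_simulate(vdd: float, run: int) -> Sample:
    """Stand-in for a circuit simulation (hypothetical 8 mV sigma output error)."""
    rng = random.Random(run)  # one reproducible variation sample per run index
    error_low = abs(rng.gauss(0.0, 0.008))
    error_high = abs(rng.gauss(0.0, 0.008))
    return error_low, vdd - error_high


sweep = [0.10 + 0.02 * i for i in range(11)]  # 100 mV to 300 mV
v_min = min_functional_vdd(sweep, toy_simulate)
print(f"minimum functional Vdd in the toy model: {v_min}")
```

In the real flow, `simulate` would wrap a post-layout Monte Carlo run and would also be repeated over the temperature range of interest.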

To test the adders, the test bench employs FO4 inverters at each of the adder outputs. Inverters at the full adder inputs provide realistic input signals. For accurate power measurement, all different input transitions have been considered [18]. Fig. 9 shows the different input transitions used for estimating the average power consumption of the full adders [19].
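Averaging over input transitions can be sketched as follows. A three-input adder has 8 input vectors and hence 56 ordered transitions between distinct vectors; whether the test bench covers exactly this set is not stated in the text, so the enumeration, the function name `average_power`, and the energy values are assumptions.

```python
from itertools import product

vectors = list(product((0, 1), repeat=3))  # all (A, B, Cin) input vectors
transitions = [(u, v) for u in vectors for v in vectors if u != v]


def average_power(energy_per_transition_j: list[float], period_s: float) -> float:
    """Average power when one transition is applied per clock period."""
    return sum(energy_per_transition_j) / (len(energy_per_transition_j) * period_s)


# Hypothetical example: 1 fJ per transition at a 1 ms period.
energies = [1e-15] * len(transitions)
print(len(transitions), average_power(energies, period_s=1e-3))
```

Weighting every transition equally avoids biasing the power estimate toward input patterns that happen to toggle few internal nodes.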



(b) minority3 based full adder with dynamic body bias

Fig. 2: The layout of the minority3 based full adder with conventional and dynamic body bias. Red = poly, green = active area, blue = metal 1, pink = metal 2, orange = metal 3.



Fig. 3: The schematic of the minority3 with conventional body bias.  $W_{NMOS}$  = 120 nm ,  $W_{PMOS}$  = 80 nm and L = 40 nm.



Fig. 4: The schematic of the minority3 with dynamic body bias.  $W_{NMOS}$  = 120 nm ,  $W_{PMOS}$  = 80 nm and L = 40 nm.

The adder with dynamic body bias is functional at the supply voltage as low as 140 mV for the temperature range 27-50 °C. The minimum supply voltage for adder with conventional body bias is 200 mV.

The energy per addition, static power, delay and the area of the designed adders have been tabulated in Table I and Table II. Table I and Table II give the results for Vdd = 300 mV and Vdd = minimum supply voltage, respectively. For the dynamic body bias adder, the result is reported only at the minimum supply voltage.

Fig. 5: The  $I_{total}/I_{static}$  of the minority3 with conventional body bias.

Fig. 6: The  $I_{total}/I_{static}$  of the minority3 with dynamic body bias.

To have a comparison between the designed adders and current state-of-the-art using FDSOI technology, the performance metrics for different adders have been added in Table I and Table II.

For the adder with conventional body bias, the performance metrics have been reported for Vdd = 300 mV to compare to the results of other works. For the adder with dynamic body bias, the metrics have been calculated for the minimum supply voltage. Ultra low voltage circuits are especially appropriate for energy harvesting applications with only low voltage available.

Fig. 7: The low SNM of the inverter with conventional body bias.

Fig. 8: The low SNM of the inverter with dynamic body bias.

The adder based on conventional body bias achieved the energy per operation of 0.23 fJ at Vdd = 300 mV. The adder with dynamic body bias has 0.56 fJ energy per operation at Vdd = 140 mV.

As can be seen from Table I, the leakage power, energy per operation, and area for our adders are lower compared to the other works. The delays for our adders are larger than the others. This is because high threshold voltage devices have been used in this study.

The 8 bit ripple carry adder (RCA) topology has been implemented based on the minority3 based full adder with conventional body bias. The RCA has been selected because it has been shown in [20] and [21] that the RCA, from the carry propagate family, is energy efficient compared to parallel adders at the same speed with a slightly increased supply voltage. The area of the 8 bit RCA with conventional body bias is  $26.1 \times 2.23 = 58.2 \ \mu m^2$ . The layout of the RCA is shown in Fig. 10.

Fig. 11 shows the normalized energy per addition of the 8 bit RCA adder with conventional body bias versus supply voltage at maximum operating speed. The normalized delay and the static power consumption of this adder versus supply voltage are also shown in Fig. 11. The minimum energy point of 0.11 fJ is achieved for Vdd = 200 mV. The energy, delay, and static power have been normalized to the minimum value of these parameters.

The minimum energy point of this adder occurs at the minimum functional supply voltage.
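The normalization and the search for the minimum energy point can be sketched as follows. Only the 200 mV point (0.11 fJ) is taken from the text; the remaining sweep values are hypothetical placeholders, and the function names are illustrative.

```python
def normalize_to_min(values):
    """Divide every value by the smallest one, so the minimum maps to 1.0."""
    smallest = min(values)
    return [v / smallest for v in values]

def minimum_energy_point(vdd_mv, energy_fj):
    """Return (Vdd, energy) at the lowest energy in the sweep."""
    idx = min(range(len(energy_fj)), key=energy_fj.__getitem__)
    return vdd_mv[idx], energy_fj[idx]

# Only the first point is from the paper; the others are made-up placeholders.
vdd_mv = [200, 250, 300, 350]
energy_fj = [0.11, 0.15, 0.21, 0.29]

mep = minimum_energy_point(vdd_mv, energy_fj)
normalized_energy = normalize_to_min(energy_fj)
```

When the energy decreases monotonically towards the lowest functional supply voltage, as in this sweep, the minimum energy point coincides with the first entry.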



#### V. DISCUSSION

The performance metrics such as delay, area, static power and energy per addition have been reported in Table I. Among all the published papers listed in this table, the energy per addition of our adders is the best. Compared to the energy per addition reported in [1], the energy per addition of the minority3 based adder with conventional body bias has been improved by 2.82X.

As can be seen from Table I, at the same supply voltage Vdd = 300 mV, the work in [2] exhibits the smallest delay by using low threshold LVT devices. Compared to the conventional body bias adder in this study, the adder in [2] is 37.5X faster. As expected, the adder in [2] has the highest static power among all by using LVT devices.

The results in Table I indicate that the adders designed here have the minimum static power consumption thanks to the HVT devices. The static power of the adder based on conventional body biasing is reduced by 49.7X and 3.13X compared to those of [2] and [1], respectively.

As stated in the previous section, the adder based on the dynamic body bias technique is more robust against PVT variations at ultra low supply voltages and it is functional for supply voltages as low as 140 mV, while the minimum operating voltage for adder based on conventional body bias is 200 mV. The author in [3] has mentioned that for applications with only low supply voltage available, reducing supply voltage is substantial even at the cost of extra area, energy per operation, and leakage current. Although the energy per operation of the adder based on dynamic body bias is higher than that of the adder based on conventional body bias, it is lower than the minimum energy point of the referenced adders in Table I.

The area of the adder based on conventional body biasing is 10 percent lower than that of the adder based on dynamic body bias. The larger area for this adder was expected considering inverter feedback as a dynamic body bias for the gates. The area of the proposed adders in this study is much lower than other references which is expected considering 22 nm FDSOI technology. The area for the conventional body biased adder has been reduced by 43.9 and 38.8 percent compared to those of the adders in [1] and [2].
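The ratios quoted in this section can be reproduced from the tabulated values. The sketch below uses the numbers of Table I and Table II; taking the 14.3 $\mu m^2$ entry of [1] for the area comparison is an assumption, made because it reproduces the quoted 43.9 percent. The helper names are illustrative.

```python
# Values transcribed from Table I and Table II.
conv = {"energy_fj": 0.23, "leak_pw": 4.44, "delay_us": 0.75, "area_um2": 8.01}
dyn_area_um2 = 8.82
ref1_300 = {"energy_fj": 0.65, "leak_pw": 13.9}
ref1_250_area_um2 = 14.3  # assumption: entry used for the area comparison
ref2 = {"energy_fj": 6.48, "leak_pw": 221.0, "delay_us": 0.02, "area_um2": 13.1}

def factor(reference, ours):
    """How many times larger the first value is than the second."""
    return reference / ours

def percent_lower(reference, ours):
    """Relative reduction of ours with respect to the reference, in percent."""
    return 100.0 * (reference - ours) / reference

energy_vs_1 = factor(ref1_300["energy_fj"], conv["energy_fj"])
energy_vs_2 = factor(ref2["energy_fj"], conv["energy_fj"])
speed_of_2 = factor(conv["delay_us"], ref2["delay_us"])
leak_vs_2 = factor(ref2["leak_pw"], conv["leak_pw"])
leak_vs_1 = factor(ref1_300["leak_pw"], conv["leak_pw"])
area_vs_1 = percent_lower(ref1_250_area_um2, conv["area_um2"])
area_vs_2 = percent_lower(ref2["area_um2"], conv["area_um2"])
area_vs_dyn = percent_lower(dyn_area_um2, conv["area_um2"])
```

The computed factors agree with the figures quoted in the text to within rounding.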

TABLE I: Comparison with published subthreshold adders using FDSOI technology at Vdd = 300 mV

| Ref.                          | [1]         | [1]         | [2]         | [22]        | [23]        | This work (conventional body bias) |
|-------------------------------|-------------|-------------|-------------|-------------|-------------|------------------------------------|
| Technology FDSOI (nm)         | 28          | 28          | 28          | 28          | 28          | 22                                 |
| Leakage power (pW)            | 13.9        | 34.3        | 221         | -           | -           | 4.44                               |
| Energy per bit operation (fJ) | 0.65        | 0.77        | 6.48        | 0.62        | 1.03        | 0.23                               |
| Delay ($\mu s$)               | 0.10        | 0.38        | 0.02        | 0.06        | 0.06        | 0.75                               |
| Supply voltage (mV)           | 300         | 250         | 300         | 300         | 240         | 300                                |
| Area/bit ($\mu m^2$)          | 12.6        | 14.3        | 13.1        | 25.5        | -           | 8.01                               |
| Devices                       | RVT         | RVT         | RVT/LVT     | -           | LVT         | RVT/HVT                            |
| Results                       | Measurement | Measurement | Post-layout | Post-layout | Post-layout | Post-layout                        |

TABLE II: Comparison with published subthreshold adders using FDSOI technology at the minimum supply voltage

| Ref.                          | [18]      | This work (conventional body bias) | This work (dynamic body bias) |
|-------------------------------|-----------|------------------------------------|-------------------------------|
| Technology FDSOI (nm)         | 22        | 22                                 | 22                            |
| Leakage power (pW)            | 3.28      | 1.53                               | 8.7                           |
| Energy per bit operation (fJ) | 0.177     | 0.11                               | 0.56                          |
| Delay ($\mu s$)               | 11        | 8                                  | 70                            |
| Supply voltage (mV)           | 150       | 200                                | 140                           |
| Area/bit ($\mu m^2$)          | -         | 8.01                               | 8.82                          |
| Devices                       | HVT       | RVT/HVT                            | RVT/HVT                       |
| Results                       | Schematic | Post-layout                        | Post-layout                   |

Fig. 10: The layout of the 8 bit minority3 based RCA with conventional body bias. Half of the full adders are placed on the upper half, and the rest in the lower.



Fig. 11: The normalized Energy per operation, delay and the static power consumption for the 8 bits RCA adder with conventional body bias at the maximum operating frequency versus supply voltage. (a) Energy per operation (The MEP = 0.11 fJ is for Vdd = 200 mV); (b) Delay; (c) static power consumption.

#### VI. CONCLUSION

The design exploration for ultra low voltage and energy efficient subthreshold adders has been discussed in this paper using 22 nm FDSOI technology. The dynamic body biasing technique and multi-threshold voltage devices have been used to match the PUN/PDN. Post-layout simulation results show that the logic gates and circuits based on dynamic body biasing are more robust against process, supply voltage and temperature (PVT) variations at ultra low supply voltages. The dynamic body bias technique helps to reduce the working voltage by 60 mV. Ultra low supply voltages are suitable for energy harvesting applications with only ultra low voltages available. The energy of the adder based on the conventional and the dynamic body biasing techniques at the maximum operating frequency is 0.23 and 0.56 fJ, respectively. Comparing the energy efficiency of the designed adders to the previous works in [1] and [2], the energy per addition for our designed adders improved by 2.82X and 28.1X, respectively.


#### REFERENCES

- [1] A. A. Vatanjou, E. Låte, T. Ytterdal, and S. Aunet, "Ultra-low voltage and energy efficient adders in 28 nm fdsoi exploring poly-biasing for device sizing," *Microprocessors and Microsystems*, vol. 56, pp. 92–100, 2018.
- [2] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Extended exploration of low granularity back biasing control in 28nm utbb fd-soi technology," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 41–44, IEEE, 2016.
- [3] N. Lotze and Y. Manoli, "A 62 mv 0.13 µm cmos standard-cell-based design technique using schmitt-trigger logic," *IEEE journal of solid-state circuits*, vol. 47, no. 1, pp. 47–60, 2011.

- [4] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proceedings of the* 2005 international symposium on Low power electronics and design, pp. 20–25, 2005.
- [5] R. Carter, J. Mazurier, L. Pirro, J. Sachse, P. Baars, J. Faul, C. Grass, G. Grasshoff, P. Javorka, T. Kammler, et al., "22nm fdsoi technology for emerging mobile, internet-of-things, and rf applications," in 2016 IEEE International Electron Devices Meeting (IEDM), pp. 2–2, IEEE, 2016.
- [6] L. Wei, K. Roy, and V. K. De, "Low voltage low power cmos design techniques for deep submicron ics," in VLSI Design 2000. Wireless and Digital Imaging in the Millennium. Proceedings of 13th International Conference on VLSI Design, pp. 24–29, IEEE, 2000.
- [7] S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, "Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-µm cmos," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 3, pp. 501–510, 2004.
- [8] J. T. Kao, M. Miyazaki, and A. Chandrakasan, "A 175-mv multiply-accumulate unit using an adaptive supply voltage and body bias architecture," *IEEE journal of solid-state circuits*, vol. 37, no. 11, pp. 1545–1554, 2002.
- [9] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, and E. Nowak, "Low-power cmos at vdd= 4kt/q," in *Device Research Conference. Conference Digest (Cat. No. 01TH8561)*, pp. 22–23, IEEE, 2001.
- [10] S. Aunet and Y. Berg, "Three sub-fj power-delay-product subthreshold cmos gates," *IFIP VLSI SoC, Perth, Australia*, vol. 1719, 2005.
- [11] A. A. Vatanjou, E. Late, T. Ytterdal, and S. Aunet, "Ultra-low voltage adders in 28 nm fdsoi exploring poly-biasing for device sizing," in 2016 IEEE Nordic Circuits and Systems Conference (NORCAS), pp. 1–4, IEEE, 2016.
- [12] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, et al., "Energy-efficient subthreshold processor design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 8, pp. 1127–1137, 2009.
- [13] Y. Pu, H. Corporaal, Y. Ha, et al., "Vt balancing and device sizing towards high yield of sub-threshold static logic gates," in Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED'07), pp. 355–358, IEEE, 2007.
- [14] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Channel length upsize for robust and compact subthreshold sram," *Proc. FTFC*, pp. 117– 120, 2008.
- [15] S. Nassif, K. Bernstein, D. J. Frank, A. Gattiker, W. Haensch, B. L. Ji, E. Nowak, D. Pearson, and N. J. Rohrer, "High performance cmos variability in the 65nm regime and beyond," in *Electron Devices Meeting*, 2007. *IEDM 2007. IEEE International*, pp. 569–571, IEEE, 2007.
- [16] M. Lanuzza, R. Taco, and D. Albano, "Dynamic gate-level body biasing for subthreshold digital design," in 2014 IEEE 5th Latin American Symposium on Circuits and Systems, pp. 1–4, IEEE, 2014.
- [17] J. R. Hauser, "Noise margin criteria for digital logic circuits," *IEEE Transactions on Education*, vol. 36, no. 4, pp. 363–368, 1993.
- [18] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Comparison of ultra low power full adder cells in 22 nm fdsoi technology," in 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pp. 1–5, IEEE, 2018.
- [19] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-cmos logic style," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 12, pp. 1309–1321, 2006.
- [20] V. Beiu, A. Djupdal, and S. Aunet, "Ultra low-power neural inspired addition: When serial might outperform parallel architectures," in *International Work-Conference on Artificial Neural Networks*, pp. 486–493, Springer, 2005.
- [21] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Ultra-low voltage subthreshold binary adder architectures for iot applications: Ripple carry adder or kogge stone adder," in 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pp. 1–7, IEEE, 2019.
- [22] R. Taco, I. Levi, M. Lanuzza, and A. Fish, "Low voltage ripple carry adder with low-granularity dynamic forward back-biasing in 28 nm utbb fd-soi," in 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pp. 1–2, IEEE, 2015.
- [23] H. Attarzadeh, S. Aunet, and T. Ytterdal, "An ultra-low-power/high-speed 9-bit adder design: Analysis and comparison vs. technology from 130nm-lp to utbb fd-soi-28nm," in 2015 Nordic Circuits and Systems Conference (NORCAS): NORCHIP & International Symposium on System-on-Chip (SoC), pp. 1–4, IEEE, 2015.

## 4.6 Paper VI: Comparative study of single, regular and flip well subthreshold SRAMs in 22 nm FDSOI technology

## Comparative Study of Single, Regular and Flip Well Subthreshold SRAMs in 22 nm FDSOI Technology

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering Norwegian University of Science and Technology (NTNU) O.S. Bragstads plass 2a, Trondheim, 7491, Norway somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet@ntnu.no

Abstract-This study presents a comparison of single, regular and flip well subthreshold SRAMs in 22 nm FDSOI technology. A 7T loadless SRAM cell with a decoupled read and write port has been used as a case study. Simulation results, based on the netlist extracted from layout, show that the speed of the flip well SRAM is significantly better than that of the single and regular well SRAMs. In terms of leakage current, the single well is the best option. The regular well type has the lowest static noise margin (SNM) variability. Among all devices (HVT, RVT, LVT, and SLVT) available in a commercial 22 nm FDSOI technology, the best combination for minimizing energy per access is HVT devices as driver transistors and RVT devices for the rest of the transistors. This study may help designers to select an optimal architecture based on their application and performance requirements. The 22 nm FDSOI technology enables a wide range of back gate bias voltages to improve the read stability and write ability of the SRAMs and, hence, their minimum operating voltage and power consumption.

Index Terms—7T pull-up loadless SRAM cell, decoupled read and write port, Multithreshold devices, single, regular and flip well, 22 nm FDSOI technology.

#### I. INTRODUCTION

Ultra low power battery operated devices are of interest for many Internet of Things (IoT) applications such as implantable biomedical devices, wireless sensor networks, and devices for environmental monitoring. Among many techniques, supply voltage scaling into the subthreshold region has been used to tackle the switching and leakage power consumption issues. However, operating in this region brings design challenges such as low functional yield and, particularly, poor data stability of the SRAM cells. The design of robust and high-density SRAM cells for such applications is indispensable [1].

Furthermore, bulk CMOS technology scaling makes it difficult for SRAMs to achieve adequate functional yield and low leakage current, especially at ultra low supply voltages [2].

Fully depleted silicon on insulator (FDSOI) technologies have been demonstrated as an attractive alternative to bulk CMOS technologies for tackling challenges such as power consumption and variability. The ultra thin body and thin buried oxide layer in FDSOI technology reduce the junction capacitances and the drain induced barrier lowering (DIBL) effect compared with bulk CMOS. Hence, body biasing efficiency has been improved significantly compared to bulk CMOS technologies [3]. Also, in FDSOI the mismatch variations have been reduced by using an undoped channel [3].

978-1-7281-9226-0/20/\$31.00 ©2020 IEEE

Threshold voltage variations caused by Random Dopant Fluctuations (RDF) make the read and write stability of the conventional 6T SRAM cell worse at subthreshold supply voltages. Several studies have addressed ultra low voltage SRAM design, aiming to improve the read stability [4]–[8], the write margin [5], [7], [9], and the cell reliability [10]–[12], which is related to the bitline swing.

The most common assist method is to use decoupled read and write ports [4], [6]. In this method, the storage nodes are decoupled from the bitlines. Subthreshold SRAM cells with decoupled read and write ports either have a high number of transistors or are single-ended. In general, differential SRAM cells are more robust than their single-ended counterparts.

Multithreshold devices technique has been introduced to reduce the leakage current and improve the reliability of the SRAM cells [11], [12]. In [13], the effect of this technique has been considered for enhancing energy efficiency.

The 4T loadless SRAM cell has been presented as an alternative to the conventional 6T SRAM cell for high density and high speed applications [14], [15], [16]. In [17], the pull-up loadless SRAM with a single-ended read port has been used in the ultra low voltage subthreshold region. In [18], the 7T pull-down loadless cell with differential read-disturb-free operation has been proposed for the superthreshold regime. A 7T pull-down SRAM requires wide PMOS devices to retain data [17], because the carrier mobility of the NMOS transistor is higher than that of the PMOS transistor. Therefore, the pull-up loadless cell (PMOS for driver transistors and NMOS for access transistors) can satisfy the data retention principles with minimum width devices.

The goal of this paper is to explore different devices and, hence, different combinations of wells for SRAM cells. The multithreshold technique in 22 nm FDSOI has been used to determine the best combination for subthreshold supply voltages based on the designer's needs. The paper uses a 7T pull-up loadless SRAM cell with differential read and write ports as a case study to achieve an area-efficient SRAM cell operating at subthreshold supply voltages.

Fig. 1: The schematic of the 7T pull-up loadless SRAM. All of the widths and lengths are 80 and 40 nm, respectively.

This paper is organized as follows. Section II explains the 7T pull-up loadless SRAM architecture. In Section III, the simulation results are presented and compared with each other. The results are discussed in Section IV. The paper is concluded in Section V.

#### II. 7T LOADLESS SRAM ARCHITECTURE

Fig. 1 shows the schematic of the 7T SRAM cell. The cell has two PMOS transistors for holding data. In a conventional 4T SRAM cell, the storage node cannot retain the data due to the leakage current through the access transistors. Hence, two conditions must be met to retain the data stored in the cell [17]:

- 1) the leakage current through the driver elements must be lower than that of the access transistors.
- 2) the on current that flows through the driver transistors must be considerably larger than the leakage current of the access transistors.
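The two retention conditions can be expressed as a simple check. This is a minimal sketch: the text does not quantify "considerably larger", so the factor `margin` is an assumption, and the current values in the example are hypothetical.

```python
def retains_data(i_off_driver, i_on_driver, i_leak_access, margin=10.0):
    """Check the two hold conditions of a loadless cell.

    1) The off-state driver leakage is below the access transistor leakage.
    2) The driver on current exceeds the access leakage by `margin`
       (an assumed factor; the source only says "considerably larger").
    """
    cond1 = i_off_driver < i_leak_access
    cond2 = i_on_driver > margin * i_leak_access
    return cond1 and cond2

# Hypothetical currents in amperes (illustrative only).
holds = retains_data(i_off_driver=1e-12, i_on_driver=5e-9, i_leak_access=2e-11)
fails = retains_data(i_off_driver=5e-11, i_on_driver=5e-9, i_leak_access=2e-11)
```

Choosing a higher threshold voltage for the drivers than for the access transistors is one way to satisfy the first condition.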

Therefore, in the single well type, the high threshold voltage (HVT) transistors as the driver elements and the low threshold voltage (LVT) transistors as the access transistors have been used. In the regular well type, the HVT transistors (regular well) with ultra low leakage have been selected as the driver elements and the regular threshold voltage (RVT) transistors as the access transistors to fulfill the above principles. HVT devices have been used in the latch to reduce the leakage current and RVT (regular well) devices for read access and write transistors to minimize the speed reduction. In the flip well type, the LVT and the super low threshold voltage (SLVT) transistors with low threshold voltage have been selected as driver elements and access transistors, respectively. LVT and SLVT devices have been used to achieve high speed.

For stable read operation in the subthreshold regime, a differential read buffer with only three transistors (N3, N4, and N5) has been used to isolate the bitlines from the internal storage nodes. With the differential read buffer added, the SRAM cell has 7 transistors.

#### A. Read and Hold Operations

Fig. 2 depicts the read, write and hold operations of the 7T SRAM cell. During a read operation, the storage nodes are cut from the read path. Based on the structure shown in Fig. 4, the read (RD) and write (WE) signals activate the read and write modes, respectively. In read mode, RD is active and WE is inactive. The RWL signal activates the N5 transistor, while the WL line remains disabled. Based on the voltages of the internal nodes, the BLR or BLBR node is discharged through N3 or N4.

Fig. 2: Timing Diagram of the 7T pull-up loadless SRAM for write 0, hold 0, read 0, write 1, hold 1 and read 1 operations.

The latch type voltage sense amplifier has been used to sense the difference of the BLR and BLBR for the read operation, and the output appears at Out.

#### B. Write Operation

For pull-up loadless SRAM cells, the bitlines are precharged to zero volts in data retention mode and read mode. Before writing, one of the bitlines is pulled up to Vdd while the other one remains at zero volts.

In a write operation, based on the structure shown in Fig. 4, the WE and RD signals are high and low, respectively. The RWL signal is cut off. The bitlines are discharged through the two NMOS access transistors, which are turned on. The Data signal is the value to be written to the cell.

Since the bitlines are precharged to zero, the write margin of the loadless pull-up cell is the supply voltage minus the bitline voltage required to flip the cell.

For finding the write margin, the Q and QB nodes of the cell are initialized to logic zero and one, respectively. The bitline voltage is then increased until the state of the cell flips.
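The write margin procedure can be sketched as a sweep of the bitline voltage. The function `cell_flips` stands in for a circuit simulation and is hypothetical, as are the step size and the example flip voltage.

```python
def write_margin(vdd, cell_flips, step=0.001):
    """Sweep the bitline from 0 V towards Vdd.

    Returns Vdd minus the first bitline voltage that flips the cell,
    or None if the cell never flips (write failure).
    """
    n_steps = int(round(vdd / step))
    for k in range(n_steps + 1):
        v_bl = k * step
        if cell_flips(v_bl):
            return vdd - v_bl
    return None

# Hypothetical cell that flips once the bitline reaches 120 mV.
vdd = 0.300
flips_at = 0.120
wm = write_margin(vdd, lambda v: v >= flips_at - 1e-12)
```

A cell that needs a lower bitline voltage to flip therefore has a larger write margin.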

#### C. Layout and Area

The layout of the SRAM cell is shown in Fig. 3. To satisfy the data holding principles with minimum width transistors, PMOS transistors have been selected as the driver elements and NMOS transistors as the access transistors. This is because the carrier mobility of the NMOS transistor is higher than that of the PMOS transistor. A width of 80 nm has been selected for the transistors.

Fig. 3: Layout of selected layers of the 7T pull-up loadless SRAM. Red = poly, green = active area, blue = metal 1, pink = metal 2, yellow = metal 3.

Channel length upsizing is used to reduce the leakage current. The gate length for all of the transistors is equal to 40 nm.

We have used the thinCell layout, which was first introduced in [19]. This layout is easier to print and has a lower systematic mismatch. It requires 3 metal layers. The bitlines and the power supply run vertically and the wordline horizontally. To minimize cell area, the cells in a column are flipped vertically to share the rail connections. Based on the standard design layout rules in the 22 nm FDSOI technology used, the poly-to-poly spacing is restricted to discrete values and dummy poly is required. The dummy poly in neighboring SRAM cells has been shared to save area in SRAM arrays. The cell bit area is  $0.745 \ \mu m^2$ .

#### III. SIMULATION RESULTS

The performance metrics of the different well type 7T SRAM cells have been determined by using post-layout simulations for the structure shown in Fig. 4. It includes a column of 32 bit cells, precharge circuit [20], write driver and voltage sense amplifier.

In all simulations, the SRAM column is based on the netlist from the extracted view and the peripherals based on the schematic.

Due to the differential read buffer, a latch type voltage sense amplifier has been used to sense the difference between BLR and BLBR for a robust read operation. The widths for the sense amplifier have been selected based on the sizes in [21]. Since the bitlines have high capacitances, the precharge circuit needs to provide a large current. Hence, the width of the transistors in the precharge circuit is 220 nm. The width of the write driver transistors is large to handle a large current. The other factor in selecting the size of the write driver transistors is achieving a high write yield in the presence of process and mismatch variations. The widths of the PMOS and NMOS transistors in the inverters of the driver circuit are 800 and 400 nm, respectively. All of the transistors in the peripheral circuits have the minimum length.

To validate the functionality of the different well type 7T pull-up loadless SRAMs architecture at low supply voltages, 1k Monte Carlo simulations for each write 0, write 1, read 0, read 1 and hold 0 and 1 operations have been done.

#### A. SRAM Read and Hold Stability

The butterfly plots for the read and hold states at different supply voltages and room temperature are shown in Fig. 5. As can be seen in Fig. 5, raising the supply voltage improves the SNM. In addition, the butterfly curves are well separated and have a large eye for the different supply voltages. In the case of a single well, the PMOS and NMOS transistors share the back gate bias voltage. The well bias has been selected to improve read stability, hold stability and write ability. On the regular well, the well biases of the PMOS and NMOS transistors are connected to Vdd and Gnd, respectively. On the flip well, the well bias of the PMOS transistors has been connected to Vdd.

Fig. 4: A column of 32 bits cells, precharge circuit, write driver and voltage sense amplifier.

Fig. 5: Butterfly curve for different well type SRAM cells. (a) Single well; (b) Regular well; (c) Flip well.

Fig. 6: Read and Hold SNM Distribution of the different well type 7T SRAM at Vdd = 300 mV. (a) Single well; (b) Regular well; (c) Flip well.

Fig. 6 shows the distribution of the read and hold static noise margin of the different well type 7T SRAMs for Vdd = 300 mV at typical temperature for 1000 Monte Carlo simulations [22] in the presence of both process and mismatch variations. The SNM has been calculated based on the method defined in [23].

The SNM distribution is lognormal in accordance with [24]. The SNM variability for different well types is tabulated in Table I. The SNM variability has been calculated in terms of  $\delta$ (standard deviation)/  $\mu$ (mean) [25].

In [22] it was shown that a high yield SRAM cell has  $\mu(\text{mean})/\delta(\text{standard deviation})$  greater than 5.5 for 1k Monte Carlo simulations. As shown in Table I, this ratio is larger than 5.5 for all three well types (7.81 to 11.12).
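The yield criterion can be checked directly against the $\mu/\delta$ column of Table I. The sketch below expresses each ratio as a headroom factor over the 5.5 bound; the helper name `yield_headroom` is illustrative.

```python
YIELD_THRESHOLD = 5.5  # bound on mu/delta quoted from [22]

# mu/delta column of Table I (Vdd = 300 mV).
mu_over_delta = {"Regular well": 11.12, "Flip well": 10.0, "Single well": 7.81}

def yield_headroom(ratio, threshold=YIELD_THRESHOLD):
    """Return how many times the ratio exceeds the yield bound."""
    return ratio / threshold

headroom = {cell: yield_headroom(r) for cell, r in mu_over_delta.items()}
all_pass = all(h > 1.0 for h in headroom.values())
tightest = min(headroom, key=headroom.get)
```

The cell with the smallest headroom is the one closest to the yield bound.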

As can be seen from Table I, the read and hold SNM variability for the single well is larger than that of the flip and regular wells. The robustness against variability is best for the regular well. The average SNM for the flip well is higher than that of the other two.

The SNM of the single, regular and flip well at the supply voltage of 240 mV is positive for 1000 Monte Carlo simulations in the presence of both mismatch and process variations.

#### B. SRAM Write Ability

In contrast to the conventional 6T SRAM cell, during a write operation there are no pull-down transistors to fight with the access transistors. Hence, the write ability of the 4T SRAM cells is significantly better than that of 6T based SRAM cells. When the WL is asserted, the storage nodes will charge through BL/BLB.

To have a high write ability, the on current of the access transistors should be higher than that of the driver transistors.

Fig. 7 shows the write margin of the three well types 7T SRAM versus supply voltage at typical temperature.

TABLE I: Read and Hold SNM variability for different well types SRAM at Vdd = 300 mV

| Cell Type    | Mean (V) | Std (V) | Variability | μ/δ   |
|--------------|----------|---------|-------------|-------|
| Regular well | 0.089    | 0.008   | 0.098       | 11.12 |
| Flip well    | 0.090    | 0.010   | 0.111       | 10    |
| Single well  | 0.086    | 0.011   | 0.127       | 7.81  |

TABLE II: Performance metrics of the 7T pull-up SRAM

| Cell Type    | Read Delay (ns) | Read Freq (MHz) | Leakage Power/bit (pW) | Energy/bit/op (aJ) |
|--------------|-----------------|-----------------|------------------------|--------------------|
| Regular well | 25              | 40              | 16.7                   | 184.1              |
| Flip well    | 9               | 111             | 1360                   | 734.4              |
| Single well  | 40              | 25              | 12.3                   | 248.6              |

#### C. Performance Evaluation


Different performance metrics including static power, delay, read frequency and energy per bit per operation for supply voltage Vdd = 300 mV at 27 °C are summarized in Table II. The leakage power in this table is presented for a bit cell. The energy per bit per operation is calculated for both read and write operations.
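The read frequency entries in Table II are consistent with taking the reciprocal of the read delay. The short sketch below shows that conversion; it assumes this is how the frequency column was obtained, and the function name is hypothetical.

```python
def read_frequency_mhz(read_delay_ns):
    """Convert a read delay in nanoseconds to a read frequency in MHz.

    Assumes one read per delay period, i.e. f = 1 / t_read.
    """
    if read_delay_ns <= 0:
        raise ValueError("read delay must be positive")
    return 1e3 / read_delay_ns


if __name__ == "__main__":
    # Read delays (ns) as listed in Table II.
    for delay_ns in (25, 9, 40):
        print(f"{delay_ns} ns -> {read_frequency_mhz(delay_ns):.1f} MHz")
```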

#### IV. DISCUSSION

As can be seen from Table II, the regular well SRAM consumes less energy per bit per operation than the single and flip well SRAMs. For the single well, the long delay caused by the reverse body bias in the read buffer leads to a large active energy. The regular well SRAM cell is 1.35X and 3.98X more energy efficient than its single and flip well counterparts, respectively.

Even though the single well SRAM consists of HVT and LVT transistors, it shows lower read frequency compared to the regular well SRAM. This is due to the fact that in the single well SRAM, PMOS and NMOS transistors share the well bias, and here, the selected well bias for Vdd = 300 mV to improve both read and write is  $1.5 \times Vdd$ . This means the NMOS transistors (LVT transistors ) in the read operation. Furthermore, the speed of the flip well SRAM is considerably higher than that of the single and regular well SRAMs because the SLVT and LVT transistors with low threshold voltage are used.

Comparing the leakage power per bit cell, the single well SRAM cell has the lowest leakage power of 12.3 pW, thanks to the reverse body bias of the shared well. The leakage power advantage of the single well SRAM cell over the regular and flip well cells is 1.36X and 110.56X, respectively.
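The energy and leakage ratios quoted in this discussion follow directly from Table II. The sketch below recomputes them; it assumes the rows of Table II are ordered regular, flip, single well (the assignment that reproduces the quoted ratios), and the dictionary and function names are hypothetical.

```python
# Table II values, assuming the row order regular / flip / single well.
TABLE_II = {
    "regular": {"leakage_pw": 16.7, "energy_aj": 184.1},
    "flip": {"leakage_pw": 1360.0, "energy_aj": 734.4},
    "single": {"leakage_pw": 12.3, "energy_aj": 248.6},
}


def ratio(metric, cell, reference):
    """Return metric(cell) / metric(reference) for two cell types."""
    return TABLE_II[cell][metric] / TABLE_II[reference][metric]


if __name__ == "__main__":
    # Energy advantage of the regular well cell.
    print("energy single/regular:", round(ratio("energy_aj", "single", "regular"), 2))
    print("energy flip/regular:  ", round(ratio("energy_aj", "flip", "regular"), 2))
    # Leakage advantage of the single well cell.
    print("leakage regular/single:", round(ratio("leakage_pw", "regular", "single"), 2))
    print("leakage flip/single:   ", round(ratio("leakage_pw", "flip", "single"), 2))
```

The recomputed values agree with the quoted 1.35X, 3.98X, 1.36X and 110.56X to within rounding of the last digit.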

The 22 nm FDSOI technology enables a wide range of back gate voltages to improve the read stability and write ability of the SRAMs and, hence, their minimum operating voltage. Based on the Monte Carlo simulations, the read and hold SNM in presence of both process and mismatch variations for 240 mV are positive which means the cells can retain the data at this supply voltage.

#### V. CONCLUSION

This study presented the performance metric analysis of different well type SRAM cells at subthreshold supply voltages. The 7T pull-up loadless SRAM with decoupled read and write ports has been selected as a case study. Among all devices (HVT, RVT, LVT, and SLVT) available in a commercial 22 nm FDSOI technology, the best combination for minimizing energy per access is HVT devices as the driver transistors and RVT devices for the rest of the transistors. The single well SRAM has the lowest leakage per bit cell compared to its regular and flip well counterparts. The regular well type has the lowest static noise margin (SNM) variability. The 22 nm FDSOI technology enables a wide range of back gate bias voltages to improve the read stability and write ability of the SRAMs and, hence, their minimum operating voltage.

#### REFERENCES

- [1] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A high-density subthreshold sram with data-independent bitline leakage and virtual ground replica scheme," in 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pp. 330–606, IEEE, 2007.
- [2] A. Wang and A. Chandrakasan, "A 180mv fft processor using subthreshold circuit techniques," in 2004 IEEE International Solid-State Circuits Conference (IEEE Cat. No. 04CH37519), pp. 292–529, IEEE, 2004.
- [3] T. Skotnicki, C. Fenouillet-Beranger, C. Gallon, F. Boeuf, S. Monfray, F. Payet, A. Pouydebasque, M. Szczap, A. Farcy, F. Arnaud, et al., "Innovative materials, devices, and cmos technologies for low-power mobile multimedia," *IEEE Transactions on Electron Devices*, vol. 55, no. 1, pp. 96–130, 2007.
- [4] N. Verma and A. P. Chandrakasan, "A 256 kb 65 nm 8t subthreshold sram employing sense-amplifier redundancy," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, 2008.
- [5] B. H. Calhoun and A. Chandrakasan, "A 256kb sub-threshold sram in 65nm cmos," in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, pp. 2592–2601, IEEE, 2006.
- [6] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2 v, 480 kb subthreshold sram with 1 k cells per bitline for ultra-low-voltage computing," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, 2008.
- [7] M.-F. Chang, S.-W. Chang, P.-W. Chou, and W.-C. Wu, "A 130 mv sram with expanded write and read margins for subthreshold applications," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 2, pp. 520–529, 2010.
- [8] A. Biswas and A. P. Chandrakasan, "A 0.36 v 128kb 6t sram with energy-efficient dynamic body-biasing and output data prediction in 28nm fdsoi," in *ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference*, pp. 433–436, IEEE, 2016.
- [9] M. E. Sinangil, H. Mair, and A. P. Chandrakasan, "A 28nm high-density 6t sram with optimized peripheral-assist circuits for operation down to 0.6 v," in 2011 IEEE International Solid-State Circuits Conference, pp. 260–262, IEEE, 2011.

- [10] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi, and Y. Kihara, "A 28-nm dual-port sram macro with active bitline equalizing circuitry against write disturb issue," in 2010 Symposium on VLSI Circuits, pp. 99–100, IEEE, 2010.
- [11] P. Liu, J. Wang, M. Phan, M. Garg, R. Zhang, A. Cassier, L. Chua-Eoan, B. Andreev, S. Weyland, S. Ekbote, et al., "A dual core oxide 8t sram cell with low vccmin and dual voltage supplies in 45nm triple gate oxide and multi vt cmos for very high performance yet low leakage mobile soc applications," in 2010 Symposium on VLSI Technology, pp. 135–136, IEEE, 2010.
- [12] C. Diaz, K. Young, J. Hsu, J. Lin, C. Hou, C. Lin, J. Liaw, C. Wu, C. Su, C. Wang, et al., "A 0.18 µm cmos logic technology with dual gate oxide and low-k interconnect for high-performance and low-power applications," in 1999 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No. 99CH36325), pp. 11–12, IEEE, 1999.
- [13] B. Wang, J. Zhou, and T. T.-H. Kim, "Sram devices and circuits optimization toward energy efficiency in multi-vth cmos," *Microelectronics Journal*, vol. 46, no. 3, pp. 265–272, 2015.
- [14] X. Deng and T. W. Houston, "Loadless 4t sram cell with pmos drivers," May 4 2004. US Patent 6,731,533.
- [15] H. Shimizu, K. Ijitsu, H. Akiyoshi, K. Aoyama, H. Takatsuka, K. Watanabe, R. Nanjo, and Y. Takao, "A 1.4 ns access 700 mhz 288 kb sram macro with expandable architecture," in 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No. 99CH36278), pp. 190–191, IEEE, 1999.
- [16] K. Takeda, Y. Aimoto, N. Nakamura, H. Toyoshima, T. Iwasaki, K. Noda, K. Matsui, S. Itoh, S. Masuoka, T. Horiuchi, et al., "A 16-mb 400-mhz loadless cmos four-transistor sram macro," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 11, pp. 1631–1640, 2000.
- [17] E. Låte, T. Ytterdal, and S. Aunet, "A loadless 6t sram cell for sub-& near-threshold operation implemented in 28 nm fd-soi cmos technology," *Integration*, vol. 63, pp. 56–63, 2018.
- [18] Y. H. Tseng, Y. Zhang, L. Okamura, and T. Yoshihara, "A new 7-transistor sram cell design with high read stability," in 2010 International Conference on Electronic Devices, Systems and Applications, pp. 43–47, IEEE, 2010.
- [19] K. Osada, J. L. Shin, M. Khan, Y.-d. Liou, K. Wang, K. Shoji, K. Kuroda, S. Ikeda, and K. Ishibashi, "Universal-v/sub dd/0.65-2.0v 32-kb cache using a voltage-adapted timing-generation scheme and a lithographically symmetrical cell," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 11, pp. 1738–1744, 2001.
- [20] D. Halupka and A. Sheikholeslami, "Cross-coupled bit-line biasing for 22-nm sram," in 2009 Ph.D. Research in Microelectronics and Electronics, pp. 104–107, IEEE, 2009.
- [21] M. Khayatzadeh, F. Frustaci, D. Blaauw, D. Sylvester, and M. Alioto, "A reconfigurable sense amplifier with 3x offset reduction in 28nm fdsoi cmos," in 2015 Symposium on VLSI Circuits (VLSI Circuits), pp. C270– C271, IEEE, 2015.
- [22] F. Olivera and A. Petraglia, "Static noise margin trade-offs for 6t-sram cell sizing in 28 nm utbb fd-soi cmos technology," *Microelectronics Journal*, vol. 78, pp. 94–100, 2018.
- [23] E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of mos sram cells," *IEEE Journal of solid-state circuits*, vol. 22, no. 5, pp. 748–754, 1987.
- [24] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proceedings of the* 2005 international symposium on Low power electronics and design, pp. 20–25, 2005.
- [25] S. Nassif, K. Bernstein, D. J. Frank, A. Gattiker, W. Haensch, B. L. Ji, E. Nowak, D. Pearson, and N. J. Rohrer, "High performance cmos variability in the 65nm regime and beyond," in *Electron Devices Meeting, 2007. IEDM 2007. IEEE International*, pp. 569–571, IEEE, 2007.

# 4.7 Paper VII: Subthreshold power PC and NAND race free flip-flops in frequency divider applications.

## Subthreshold Power PC and Nand Race-Free Flip-Flops in Frequency Divider Applications

Somayeh Hossein Zadeh, Trond Ytterdal, Snorre Aunet

Department of Electronic Systems, Faculty of Information Technology and Electrical Engineering

Norwegian University of Science and Technology (NTNU)

O.S. Bragstads plass 2a, Trondheim, 7491, Norway

somayeh.h.zadeh@ntnu.no, trond.ytterdal@ntnu.no, snorre.aunet@ntnu.no

Abstract-This study compares two subthreshold flip-flop architectures in frequency divider applications, implemented and fabricated in a 130 nm CMOS process technology: the Power PC (Performance Computing) and Nand race-free flip-flops. The aim is to identify a reliable and power efficient flip-flop for use in a frequency divider at ultra-low supply voltages, and the results have been verified by measurements. Simulation results based on a netlist extracted from layout show that upsizing the Power PC flip-flop increases its reliability while it may still provide lower power consumption than the Nand race-free flip-flop. Based on measurements of ten chip samples, both frequency dividers have demonstrated functionality down to a $V_{dd}$ of 135 mV. The Power PC flip-flop based frequency divider is 24% more energy efficient than the Nand race-free counterpart at an ultra-low supply voltage of 160 mV. The energy per operation for the Power PC and Nand race-free frequency dividers at the minimum energy point (MEP) of 250 mV, and maximum operating frequency, is 12.2 and 12.5 fJ, respectively.

Index Terms—subthreshold, Power PC flip-flop, Nand race-free flip-flop, frequency divider, reliability, minimum supply voltage.

#### I. INTRODUCTION

Voltage scaling is the most effective technique to reduce static and dynamic power consumption. However, it increases the delay and hence lowers the maximum operating frequency [1].

Many IoT applications such as implantable biomedical devices operate in the kHz range, and power consumption is the primary concern in such applications [2], [3]. However, the supply voltage required by most implantable electronic devices is 2-3 V [3]. The output voltage of the most recent in vivo energy harvesters (IVEHs) is 150 mV and below [3], [4], which suits the low supply voltages of the circuits presented here and saves energy by avoiding the energy-costly DC-DC conversion needed for higher supply voltages. Therefore, subthreshold circuits operating at supply voltages lower than the absolute value of the threshold voltage of the transistors might be the best option for such applications.

978-1-7281-2769-9/19/\$31.00 ©2021 IEEE

In the subthreshold regime, the drain current depends exponentially on the threshold and gate-source voltages [5]. The two main challenges in subthreshold digital circuit design are the threshold voltage variations, caused mainly by random dopant fluctuations (RDF), and the degradation of the Ion to Ioff ratio. Both affect the functional yield of the digital gates.
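The exponential dependence mentioned above can be illustrated with the textbook weak-inversion current model, $I_D = I_0 \exp\left((V_{GS}-V_{th})/(nV_T)\right)\left(1-\exp(-V_{DS}/V_T)\right)$. The sketch below is not taken from the paper: the pre-factor `i0`, the slope factor `n` and the threshold voltage are hypothetical placeholder values chosen only to show the trend.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE


def subthreshold_current(v_gs, v_ds, v_th, i0=1e-7, n=1.4, temp_kelvin=300.0):
    """Textbook weak-inversion drain current (amperes).

    i0 and n are illustrative placeholders, not extracted device parameters.
    """
    v_t = thermal_voltage(temp_kelvin)
    return i0 * math.exp((v_gs - v_th) / (n * v_t)) * (1.0 - math.exp(-v_ds / v_t))


def on_off_ratio(vdd, v_th=0.4, n=1.4, temp_kelvin=300.0):
    """Ratio of the on current (V_GS = Vdd) to the off current (V_GS = 0)."""
    i_on = subthreshold_current(vdd, vdd, v_th, n=n, temp_kelvin=temp_kelvin)
    i_off = subthreshold_current(0.0, vdd, v_th, n=n, temp_kelvin=temp_kelvin)
    return i_on / i_off


if __name__ == "__main__":
    for vdd in (0.15, 0.25, 0.50):
        print(f"Vdd = {vdd:.2f} V: Ion/Ioff = {on_off_ratio(vdd):.3g}")
```

In this model the on/off ratio reduces to $\exp(V_{dd}/(nV_T))$, so it shrinks exponentially as the supply voltage is lowered, and a small threshold voltage shift changes the current by a large factor.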

Storage cells such as flip-flops, SRAM, and latches are the first blocks that reduce the functional yield of the circuits when reducing the supply voltage. This is because they are more vulnerable to the PVT variations compared to the logic gates.

Furthermore, the storage elements often dominate the area of a system. Hence, they contribute significantly to its power consumption and energy dissipation [6], [7]. Thus, the design of reliable flip-flops in the subthreshold regime is a key task for digital designers. Many studies in the literature compare different flip-flop architectures in the subthreshold regime [8]–[12].

The topology of the logic gates has an impact on the robustness and the performance metrics of the total design in ultra-low voltage subthreshold circuits. The most robust choice is standard CMOS logic. Additionally, subthreshold transmission gate (TG) logic offers promising results in subthreshold circuits [13].

In this paper, we have selected and designed reliable Nand race-free and Power PC flip-flop architectures to compare and implement in the frequency divider application.

Based on the simulation results for a low supply voltage and the ultra-low frequency of 1 kHz, the Power PC flip-flop is more power efficient than the Nand race-free flip-flop.

The Power PC and Nand race-free flip-flops designed in this study are 1.52X and 1.34X smaller in area, respectively, than the flip-flops in [10], which were designed using 65 nm technology.

According to the measured results for ten sample chips, the Power PC frequency divider is 24% more energy efficient than the Nand race-free counterpart at an ultra-low supply voltage of 160 mV and the maximum operating frequency. The energy per operation for the Power PC and Nand race-free frequency dividers at the minimum energy point (MEP) of 250 mV and maximum operating frequency is 12.2 and 12.5 fJ, respectively. The minimum functional supply voltage for both frequency dividers is 135 mV.

To the knowledge of the authors, the lowest supply voltage for the memory cells is 62 mV based on the Schmitt-Trigger logic cells reported in [14], but Schmitt-Trigger logic has more




Fig. 1: The schematic of the Nand race-free flip-flop



Fig. 2: The schematic of the Power PC flip-flop

transistors and hence consumes more power and area compared to standard static CMOS. The minimum supply voltage for memory cells based on standard static CMOS is 132 mV, reported in [11]. The minimum supply voltage for our flip-flops, based on both standard static CMOS and transmission gates, is 135 mV, which is comparable with the minimum reported voltage for standard static CMOS.

The rest of the paper is arranged as follows. In Section 2, the sizing strategy used for the subthreshold regime is explained. Section 3 deals with the architectures of the flip-flops and frequency dividers. Simulation and measurement results are presented and discussed in Sections 4 and 5, respectively. The paper is concluded in Section 6.

#### II. SIZING STRATEGY FOR SUBTHRESHOLD

For low frequency systems that are leakage dominated, power reduction requires leakage current reduction. We have selected channel length upsizing as the leakage reduction technique. This technique not only reduces the leakage current but also increases the robustness of the gates against process, voltage, and temperature (PVT) variations. The channel lengths of the transistors in both flip-flops have been kept constant throughout the design to reduce mismatch. The widths of the PMOS and NMOS transistors have been selected to balance the PMOS and NMOS transistors [15]. The authors of [16] show that the energy is minimized by balancing the leakage currents through the pull-up and pull-down networks. The other factor in selecting the widths of the NMOS and PMOS transistors is achieving a high functional yield for the flip-flops at ultra-low





Fig. 4: The layout of the Nand race-free flip-flop  $13.8 \times 6.15 = 84.8 \ \mu m^2$.



Fig. 5: The layout of the Power PC flip-flop  $12.2 \times 6.15 = 75.0 \ \mu m^2$ .

supply voltages down to 140 mV. The widths of the PMOS and NMOS transistors have been selected as 1.8  $\mu m$  and 300 nm, respectively. In the Power PC flip-flop, we have increased the size of the transmission gate transistors to improve the robustness [17]. The transmission gate transistors play an important role in passing the data and should therefore be faster.

To reduce the leakage current, all the gates in the library have been designed with low leakage transistors (high threshold voltage transistors).

#### III. FLIP-FLOP AND FREQUENCY DIVIDER ARCHITECTURES

Two different flip-flop architectures have been selected and implemented. The Nand race-free and Power PC flip-flops have been selected among the different designs because their static nature makes them promising candidates for reliable subthreshold operation.

1: The Nand race-free flip-flop used in [18] is a reliable and simple choice for ultra-low supply voltages. It is constructed from Nand and inverter gates. The Nand-based flip-flop has several advantages for ultra-low voltage libraries compared to other flip-flops. Its single-phase clock signal makes it more robust at subthreshold supply voltages. Its regular structure, and hence regular layout, is another benefit. The flip-flop is also contention free: because the clock signal is single phase, there is no overlap between true and inverted clock signals that could cause contention in its feedback loops.





Fig. 7: Setup structure for calculating maximum operating frequency





Fig. 9: The socket containing one of the measured chips.

2: The Power PC flip-flop, which has been used in the Power PC microprocessor [19], is the second flip-flop. This flip-flop has been used in previous studies reported in [8]–[10]. In this flip-flop, the first stage (transparent-low latch) is driven by the clock signal, and the second stage (transparent-high latch) is driven by the inverted clock signal.

Fig. 1 and Fig. 2 show the schematic of the Nand race-free and Power PC flip-flops, respectively.

#### A. Layout and Area of the Flip-flops

In this study, both flip-flops are implemented for a supply voltage of 140 mV and with low leakage transistors (high threshold voltage transistors). Fig. 4 and Fig. 5 show the layouts of the Nand race-free and Power PC flip-flops, respectively. Two metal layers have been used for both layout implementations to make a fair comparison. The layout area depends on the structure complexity, the transistor sizes, and the transistor counts, which are 26 and 18 (two transistors for inverter) for the Nand



Fig. 10: Input-clock delay vs. clock-output delay for Power PC flip-flop based on simulations.



based and Power PC flip-flops, respectively. To have a fair comparison, the QB node is also provided for the Power PC flip-flop.

The layout areas of the Power PC and the Nand race-free flip-flops are  $12.2\times6.15=75.0~\mu m^2$  and  $13.8\times6.15=84.8~\mu m^2$, respectively. The layout area of the Nand-based flip-flop is 1.13X larger than that of the Power PC flip-flop. The netlists of the flip-flops have been extracted from the layouts with the QRC parasitic extraction tool.

The structure of the frequency dividers has been synthesized automatically at the gate level by the Cadence Genus tool using our full custom standard cell library [15]. The cell library has been designed for subthreshold supply voltages. The layouts of the frequency dividers have been generated automatically by the Cadence Innovus place and route tool. The schematic of the divide-by-three frequency divider is shown in Fig. 3.

#### B. Simulation and Measurement Setup

The functionality of the flip-flops is also tested. The output voltage of the flip-flop has been compared to the minimum allowable high voltage  $(0.75 \times V_{dd})$  and maximum allowable low voltage  $(0.25 \times V_{dd})$  for 1000 Monte Carlo simulations considering both mismatch and process variations.
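The pass/fail criterion described above can be written as a small check over Monte Carlo samples. The sketch below is a hedged illustration only: the function names and the sample data are hypothetical, and in practice the output voltages would come from the simulator results.

```python
def output_is_valid(v_out, expected_high, vdd, high_frac=0.75, low_frac=0.25):
    """Check one sampled output voltage against the logic-level limits.

    A logic high must reach at least 0.75 * Vdd and a logic low must stay
    at or below 0.25 * Vdd, as stated in the text.
    """
    if expected_high:
        return v_out >= high_frac * vdd
    return v_out <= low_frac * vdd


def functional_yield(samples, vdd):
    """Fraction of Monte Carlo samples that meet the logic-level limits.

    `samples` is a sequence of (v_out, expected_high) pairs.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    passed = sum(output_is_valid(v, high, vdd) for v, high in samples)
    return passed / len(samples)


if __name__ == "__main__":
    # Made-up example voltages at Vdd = 150 mV, for illustration only.
    demo = [(0.140, True), (0.010, False), (0.100, True), (0.050, False)]
    print("functional yield:", functional_yield(demo, vdd=0.150))
```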

The flip-flops have been tested and characterized in a standard 130 nm CMOS process technology using the Spectre based Virtuoso simulator. The supply voltage has been tested between 150 and 500 mV.



Fig. 12: Maximum operating frequency (Hz) versus supply voltages for both Nand race-free and Power PC flip-flops based on simulations.



Fig. 13: The distribution of the minimum functional supply voltage for ten sample chips based on measurements.

For the flip-flop simulations, each flip-flop has been placed between input and output buffers to account for the current drawn from the previous stage and to emulate realistic loading conditions, respectively. The test bench is shown in Fig. 6. Fig. 7 shows the setup used to determine the maximum operating frequency. The clock frequency has been increased linearly until the flip-flop can no longer latch the data.
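The linear frequency sweep described above can be sketched as a simple loop. This is only an outline under stated assumptions: `latches_correctly` is a hypothetical callback standing in for one simulation or measurement run at a given clock frequency, and the start, step and limit values are placeholders.

```python
def max_operating_frequency(latches_correctly, f_start_hz, f_step_hz, f_limit_hz):
    """Increase the clock frequency linearly until the flip-flop fails.

    Returns the highest tested frequency at which the data was still
    latched, or None if the flip-flop already fails at f_start_hz.
    """
    if f_step_hz <= 0:
        raise ValueError("frequency step must be positive")
    f_max = None
    f = f_start_hz
    while f <= f_limit_hz:
        if not latches_correctly(f):
            break  # first failing frequency ends the sweep
        f_max = f
        f += f_step_hz
    return f_max


if __name__ == "__main__":
    # Toy stand-in: pretend the flip-flop works up to 2.5 kHz.
    toy_model = lambda f_hz: f_hz <= 2500
    print(max_operating_frequency(toy_model, 1000, 500, 10000))
```

A linear sweep keeps the procedure identical to the one in the text; its resolution is set by the chosen step size.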

The power supply voltage of the designed chip is generated by a Rigol DP832A. The clock signal is created by an Agilent 33522A function generator. A Keithley 6485 picoammeter is used to measure the current. A ROHDE & SCHWARZ RTE 1022 oscilloscope is used to display the input and output waveforms. The chip prototype is shown in Fig. 8. The socket containing one of the measured chips is shown in Fig. 9.

#### IV. SIMULATION AND MEASUREMENT RESULTS

First, we have compared different performance metrics of the two flip-flops based on the netlist from the extracted view. Second, to verify the simulation results regarding the functionality of the circuits at ultra-low supply voltages, measurement results for the frequency dividers based on the two flip-flops have been compared.



Fig. 14: The mean energy at the maximum operating frequency for both frequency dividers based on measurements for ten chip samples.

The timing parameters including setup time, hold time, and clock to Q delay have been simulated and reported for both flip-flops. The setup and hold times have been calculated based on the method in [20]. The setup and hold times have been defined as the input-clock/clock-input delay when the clock-output delay is 5 percent of its nominal value. Fig. 10 and Fig. 11 illustrate the input-clock delay versus the clock-output delay for two flip-flops.

The average power consumption of both flip-flops has been calculated at a supply voltage of 150 mV and a frequency of 1 kHz. The power of the clock inverter in the Power PC flip-flop has been included in the power calculations for this flip-flop.

The maximum operating frequency versus the supply voltages has been illustrated for both flip-flops in Fig. 12.

The voltage distribution of the minimum functional supply for ten chip samples is shown in Fig. 13. The minimum functional supply voltage for both frequency dividers is 135 mV.

The energy per operation versus frequency at different supply voltages has been measured for both frequency dividers and is shown in Fig. 14. The measured energy is the mean energy for ten chip samples. The supply voltage ranges from 160 mV to 500 mV. All ten samples are functional at 160 mV. The energy per operation at the maximum operating frequency for the Nand race-free and Power PC frequency dividers at the supply voltage of 160 mV is 40.3 and 32.5 fJ, respectively.
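The 24% efficiency figure quoted for 160 mV corresponds to the ratio of the two measured energies. The sketch below works through that arithmetic; the function names are hypothetical, and the relative saving is shown only to make the difference between the two ways of expressing the comparison explicit.

```python
# Mean measured energy per operation at 160 mV (fJ), from the text.
ENERGY_FJ_160MV = {"nand_race_free": 40.3, "power_pc": 32.5}


def efficiency_gain(reference_fj, candidate_fj):
    """How many times more operations per joule the candidate delivers."""
    return reference_fj / candidate_fj


def relative_saving(reference_fj, candidate_fj):
    """Fraction of the reference energy saved by the candidate."""
    return (reference_fj - candidate_fj) / reference_fj


if __name__ == "__main__":
    nand = ENERGY_FJ_160MV["nand_race_free"]
    ppc = ENERGY_FJ_160MV["power_pc"]
    print(f"efficiency gain: {efficiency_gain(nand, ppc):.2f}x")
    print(f"relative saving: {100 * relative_saving(nand, ppc):.1f}%")
```

The ratio 40.3/32.5 is about 1.24, which matches the stated 24% higher energy efficiency; expressed as a saving relative to the Nand race-free divider, the same numbers give roughly 19%.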

MEPs for both frequency dividers are at 250 mV. The energy per operation at the MEPs for the Nand race-free and Power PC frequency dividers is 12.5 and 12.2 fJ, respectively.

#### V. DISCUSSION

Based on the simulation results, a comparison of the two flip-flops at the same supply voltage and at the low frequency of 1 kHz shows that the Power PC flip-flop has lower power consumption than the Nand race-free counterpart. At 1 kHz, the Power PC flip-flop is 1.34X more power efficient than the Nand race-free counterpart.
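The 1.34X figure follows from the simulated power values in Table I (150 mV, 1 kHz). A minimal sketch of the calculation is given below; the variable and function names are hypothetical.

```python
# Simulated average power at 150 mV and 1 kHz (pW), from Table I.
POWER_PW = {"power_pc": 57.2, "nand_race_free": 76.8}


def power_ratio(reference_pw, candidate_pw):
    """Ratio of the reference power to the candidate power."""
    return reference_pw / candidate_pw


if __name__ == "__main__":
    gain = power_ratio(POWER_PW["nand_race_free"], POWER_PW["power_pc"])
    print(f"Nand race-free / Power PC power: {gain:.2f}x")
```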

TABLE I: Comparison of two flip-flops at the same supply voltage of 150 mV at 1 kHz frequency

| Flip-flop Type | CLK-Q delay (µs) | Setup time (µs) | Hold time (µs) | Power (pW) |
|----------------|------------------|-----------------|----------------|------------|
| Power PC       | 1.59             | 0.678           | 0.551          | 57.2       |
| Nand race-free | 2.14             | 0.559           | 0.525          | 76.8       |

The areas of our Power PC and Nand race-free flip-flops designed in 130 nm are  $12.2\times6.15=75.0~\mu m^2$  and  $13.8\times6.15=84.8~\mu m^2$, respectively. The area reported for these flip-flops designed using 65 nm technology is 114  $\mu m^2$. Our flip-flops are thus 1.52X and 1.34X smaller in area than the flip-flops in [10], respectively.
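The area figures and the comparison with [10] can be reproduced from the layout dimensions. The sketch below does so; the names are hypothetical, and the computed areas differ from the quoted ones only in the rounding of the last digit.

```python
def area_um2(width_um, height_um):
    """Rectangular layout area in square micrometres."""
    return width_um * height_um


POWER_PC_AREA = area_um2(12.2, 6.15)   # Power PC flip-flop layout
NAND_AREA = area_um2(13.8, 6.15)       # Nand race-free flip-flop layout
REFERENCE_AREA = 114.0                 # area quoted for the 65 nm flip-flops in [10]


def area_ratio(larger_um2, smaller_um2):
    """How many times larger the first area is than the second."""
    return larger_um2 / smaller_um2


if __name__ == "__main__":
    print(f"Power PC area:       {POWER_PC_AREA:.2f} um^2")
    print(f"Nand race-free area: {NAND_AREA:.2f} um^2")
    print(f"Nand / Power PC:     {area_ratio(NAND_AREA, POWER_PC_AREA):.2f}x")
    print(f"[10] / Power PC:     {area_ratio(REFERENCE_AREA, POWER_PC_AREA):.2f}x")
    print(f"[10] / Nand:         {area_ratio(REFERENCE_AREA, NAND_AREA):.2f}x")
```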

According to the mean measured energy per operation for ten samples, the frequency divider based on the Power PC flip-flop is 24% more energy efficient than the frequency divider based on the Nand race-free flip-flop at the ultra-low supply voltage of 160 mV.

The MEPs for both frequency dividers are at 250 mV. The energy per operation at the MEPs for the Nand race-free and Power PC frequency dividers is 12.5 and 12.2 fJ, respectively.

The mean energy for the Nand race-free and Power PC frequency dividers at the MEP is improved by 1.99X and 2.02X, respectively, compared to that at a supply voltage of 500 mV.

The minimum functional supply voltage for the frequency dividers reported in [11] and [21] are 132, 137 and 160 mV, respectively. The minimum functional supply voltage in our work is 135 mV.

Our results agree well with those of [10], where the authors report that the Power PC flip-flop is the power efficient choice among five flip-flops, including the Nand race-free flip-flop, in the subthreshold regime at the same supply voltage.

#### VI. CONCLUSION

Two different subthreshold flip-flop architectures in frequency divider applications are implemented and compared using a 130 nm CMOS process. The measurement results show that both frequency dividers are functional at supply voltages as low as 135 mV for some samples. The simulation results show that upsizing the transmission gates in the Power PC flip-flop increases its reliability while it still has lower power consumption than the Nand race-free flip-flop. The Power PC frequency divider is 24% more energy efficient than the Nand race-free counterpart at an ultra-low supply voltage of 160 mV. The energy per operation for the Power PC and Nand race-free frequency dividers at the MEP of 250 mV and maximum operating frequency is 12.2 and 12.5 fJ, respectively.

#### REFERENCES

- [1] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, et al., "Energy-efficient subthreshold processor design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 8, pp. 1127–1137, 2009.
- [2] S. R. Sridhara, "Ultra-low power microcontrollers for portable, wearable, and implantable medical electronics," in 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), pp. 556–560, IEEE, 2011.
- [3] B. Shi, Z. Li, and Y. Fan, "Implantable energy-harvesting devices," Advanced Materials, vol. 30, no. 44, p. 1801511, 2018.

- [4] A. Wang, Z. Liu, M. Hu, C. Wang, X. Zhang, B. Shi, Y. Fan, Y. Cui, Z. Li, and K. Ren, "Piezoelectric nanofibrous scaffolds as in vivo energy harvesters for modifying fibroblast alignment and proliferation in wound healing," *Nano Energy*, vol. 43, pp. 63–71, 2018.
- [5] A. G. Andreou, K. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, K. Strohbehn, et al., "Current-mode subthreshold mos circuits for analog vlsi neural systems," *IEEE Transactions on neural networks*, vol. 2, no. 2, pp. 205–213, 1991.
- [6] Y. Ueda, H. Yamauchi, M. Mukuno, S. Furuichi, M. Fujisawa, F. Qiao, and H. Yang, "6.33 mw mpeg audio decoding on a multimedia processor," in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, pp. 1636–1645, IEEE, 2006.
- [7] M. Yamaoka, Y. Shinozaki, N. Maeda, Y. Shimazaki, K. Kato, S. Shimada, K. Yanagisawa, and K. Osada, "A 300-mhz 25 µA/Mb-leakage on-chip sram module featuring process-variation immunity and low-leakage-active mode for mobile-phone application processor," *IEEE journal of solid-state circuits*, vol. 40, no. 1, pp. 186–194, 2005.
- [8] H. P. Alstad and S. Aunet, "Seven subthreshold flip-flop cells," in Norchip 2007, pp. 1–4, IEEE, 2007.
- [9] E. Låte, A. A. Vatanjou, T. Ytterdal, and S. Aunet, "Extended comparative analysis of flip-flop architectures for subthreshold applications in 28 nm fd-soi," *Microprocessors and Microsystems*, vol. 48, pp. 11–20, 2017.
- [10] M. Voernes, T. Ytterdal, and S. Aunet, "Performance comparison of 5 subthreshold cmos flip-flops under process-, voltage-, and temperature variations, based on netlists from layout," in 2014 NORCHIP, pp. 1–6, IEEE, 2014.
- [11] A. A. Vatanjou, T. Ytterdal, and S. Aunet, "4 sub-/near-threshold flip-flops with application to frequency dividers," in 2015 European Conference on Circuit Theory and Design (ECCTD), pp. 1–4, IEEE, 2015.
- [12] H. Mostafa, M. Anis, and M. Elmasry, "Comparative analysis of power yield improvement under process variation of sub-threshold flip-flops," in *Proceedings of 2010 IEEE International Symposium on Circuits and Systems*, pp. 1739–1742, IEEE, 2010.
- [13] N. Reynders and W. Dehaene, "Variation-resilient building blocks for ultra-low-energy sub-threshold design," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 59, no. 12, pp. 898–902, 2012.
- [14] N. Lotze and Y. Manoli, "A 62 mv 0.13 cmos standard-cell-based design technique using schmitt-trigger logic," *IEEE journal of solid-state circuits*, vol. 47, no. 1, pp. 47–60, 2011.
- [15] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Ultra-low voltage subthreshold binary adder architectures for iot applications: Ripple carry adder or kogge stone adder," in 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pp. 1–7, IEEE, 2019.
- [16] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, *et al.*, "Exploring variability and performance in a sub-200-mv processor," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 881–891, 2008.
- [17] J. Kwong and A. P. Chandrakasan, "Variation-driven device sizing for minimum energy sub-threshold circuits," in *Proceedings of the 2006 international symposium on Low power electronics and design*, pp. 8–13, 2006.
- [18] C. Piguet, J.-M. Masgonty, and C. Arm, "D-type master-slave flip-flop," Nov. 27 2001. US Patent 6,323,710.
- [19] G. Gerosa, S. Gary, C. Dietz, D. Pham, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito, T. Ngo, S. Litch, et al., "A 2.2 w, 80 mhz superscalar risc microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 12, pp. 1440–1454, 1994.
- [20] D. Markovic, B. Nikolic, and R. Brodersen, "Analysis and design of low-energy flip-flops," in *Proceedings of the 2001 international symposium on Low power electronics and design*, pp. 52–55, 2001.

- [21] S. Aunet and A. Hasanbegovic, "Memory elements based on minority-3 gates and inverters implemented in 90 nm cmos," in 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, pp. 267–272, IEEE, 2010.

# 4.8 Manuscript VIII: Energy efficiency of serial versus parallel adders.

This paper has been submitted for publication and is therefore not included.

### 4.9 Appendix:



## IEEE Nordic Circuits and Systems Conference 2020 (Oslo, Norway) VIRTUAL October 27-28, 2020

## **Certificate on Nomination**

Multi-threshold voltage and dynamic body biasing techniques for energy efficient ultra low voltage subthreshold adders

by Somayeh Zadeh, Trond Ytterdal, and Snorre Aunet, NTNU, Norway

was nominated as one of the three Best Paper Award candidates in IEEE NorCAS 2020

In Tampere, Finland, on October 28, 2020


Prof. Jari Nurmi IEEE NorCAS 2020 General Chair



#### Program

#### 10th Annual Workshop of the Norwegian PhD Network on Nanotechnology for Microsystems

**Clarion Hotel the Edge**, 17–19 June 2019

#### Oral session 3: Nanomaterials science II

Chair: Justin Wells, NTNU

| Start  | End      | Speaker and talk title                                                                             |
|--------|----------|----------------------------------------------------------------------------------------------------|
| 09:00  | 09:30    | Yuri Suzuki, Stanford University                                                                   |
|        |          | Controlling spin: Giant Rashba splitting, spin currents and the future of spintronics              |
| 09:30  | 09:45    | Sam Sloetjes, NTNU                                                                                 |
|        |          | Exchange explosions of topological edge defects in a square micromagnet                            |
| 09:45  | 10:00    | Suraj Singh, NTNU                                                                                  |
|        |          | Tailoring the magnetodynamic properties of dipole coupled 1D magnonic crystals by shape anisotropy |
| 10:00  | 10:15    | Stephanie Burgmann, NTNU                                                                           |
|        |          | Enabling nucleation phenomena studies of ALD deposited films by In-situ high-resolution TEM        |
| 10:15  | 10:30    | Xin Song, UiO                                                                                      |
|        |          | Metallization on thermoelectric material ZnSb                                                      |
| 10:30  | 11:00    | Coffee break                                                                                       |
| 11:00  | 11:30    | Erik Folven, NTNU                                                                                  |
|        |          | Long-range order in a magnetic metamaterial                                                        |
| 11:30  | 11:45    | Kristoffer Kjærnes, NTNU                                                                           |
|        |          | Antiferromagnetic spin engineering in LaFeO3                                                       |
| 11:45  | 12:00    | Sihai Luo, NTNU                                                                                    |
|        |          | Plasmonic nanogaps through SAM-assisted adhesion lithography                                       |
| 12:00  | 12:15    | Feng Wang, NTNU                                                                                    |
|        |          | Phase transformable slippery liquid infused porous surfaces with durable anti-icing performance    |
| 12:15  | 12:30    | Martin Greve, UiB                                                                                  |
|        |          | Wide-field microscopy for magnetic field imaging using nitrogen vacancies in diamond               |
| 12:30  | 12:45    | Alex Schenk, NTNU                                                                                  |
|        |          | Unexpected 3D electronic structure in ultrathin doped diamond films                                |
| 12:45  | 13:00    | Leg stretch                                                                                        |
| 13:00  | 14:00    | Lunch                                                                                              |

#### Oral session 4: Devices and circuits

Chair: Snorre Aunet, NTNU

| Start | End   | Speaker and talk title                                                                |
|-------|-------|---------------------------------------------------------------------------------------|
| 14:00 | 14:30 | Alan Seabaugh, University of Notre Dame                                               |
|       |       | Energy harvesting with piezoelectric films                                            |
| 14:30 | 14:45 | Somayeh Hossein Zadeh, NTNU                                                           |
|       |       | Low energy CMOS building blocks for IoT                                               |
| 14:45 | 15:00 | Harald Garvik, NTNU                                                                   |
|       |       | A low-power noise-shaping SAR ADC in 28 nm FDSOI – challenges, solutions and results. |
| 15:00 | 15:15 | Mehdi Azadmehr, USN                                                                   |
|       |       | Colpitts oscillators                                                                  |
| 15:15 | 15:45 | Coffee break                                                                          |

#### **Oral session 5: Bionano and microfluidics**

Chair: Krishna Agarwal, UiT

| Start  | End       | Speaker and talk title                                                                                   |
|--------|-----------|----------------------------------------------------------------------------------------------------------|
| 15:45  | 16:00     | Jakob Vinje, NTNU                                                                                        |
|        |           | Surfaces made by Electron Beam Lithography for cellular studies                                          |
| 16:00  | 16:15     | Anowarul Habib, UiT                                                                                      |
|        |           | Fabrication of gold nanostructures for surface-enhanced Raman scattering                                 |
| 16:15  | 16:30     | Fredrik Kristoffer Mürer, NTNU                                                                           |
|        |           | Studying the 3D structures of bone and cartilage with modern X-ray tomography techniques                 |
| 16:30  | 16:45     | Azeem Ahmad, UiT                                                                                         |
|        |           | Quantitative phase microscopy in biomedical imaging                                                      |
| 16:45  | 17:00     | David André Coucheron, UiT                                                                               |
|        |           | Integration of optical nanoscopy and quantitative phase microscopy for investigation of liver sinusoidal endothelial cell fenestrations |
| 17:00  | 17:30     | Ralph Bernstein, Kjeller Innovasjon AS                                                                   |
|        |           | Commercialization of R&D – models, methodology, and cases                                                |

#### **Tuesday 18 June**

## References

- [1] N. Lotze and Y. Manoli. Ultra-sub-threshold operation of always-on digital circuits for IoT applications by use of Schmitt trigger gates. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 64(11):2920–2933, 2017.
- [2] M.I.T. Department of Electrical Engineering and Computer Science. 32-bit ALU. https://computationstructures.org/exercises/alu/lab.html, 2015.
- [3] M. Afghahi. A robust single phase clocking for low power, high-speed VLSI applications. *IEEE Journal of Solid-State Circuits*, 31(2):247–254, 1996.
- [4] S. Aunet. On the reliability of ultra low voltage circuits built from minority-3 gates. In 20th European Conference on Circuit Theory and Design (ECCTD), pages 540–543. IEEE, 2011.
- [5] S. Aunet and A. Hasanbegovic. Memory elements based on minority-3 gates and inverters implemented in 90 nm CMOS. In *13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems*, pages 267–272. IEEE, 2010.
- [6] M. Bamal, E. Grossar, M. Stucchi, and K. Maex. Interconnect width selection for deep submicron designs using the table lookup method. In Proceedings of the 2004 International Workshop on System Level Interconnect Prediction, pages 41–44, 2004.
- [7] V. Beiu, A. Djupdal, and S. Aunet. Ultra low-power neural inspired addition: When serial might outperform parallel architectures. In *International Work-Conference on Artificial Neural Networks*, pages 486–493. Springer, 2005.
- [8] H. K. O. Berge, A. Hasanbegović, and S. Aunet. Muller c-elements based on minority-3 functions for ultra low voltage supplies. In 14th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems, pages 195–200. IEEE, 2011.
- [9] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat. Analysis and minimization of practical energy in 45nm subthreshold logic circuits. In 2008 IEEE International Conference on Computer Design, pages 294–300. IEEE, 2008.

- [10] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat. Interests and limitations of technology scaling for subthreshold logic. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 17(10):1508–1519, 2009.
- [11] B. H. Calhoun and A. P. Chandrakasan. A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation. *IEEE Journal of Solid-State Circuits*, 42(3):680–688, 2007.
- [12] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy. A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 44(2):650–658, 2009.
- [13] J. Chen, L. T. Clark, and Y. Cao. Robust design of high fan-in/out subthreshold circuits. In 2005 International Conference on Computer Design, pages 405–410. IEEE, 2005.
- [14] J. Chen, L. T. Clark, and T.-H. Chen. An ultra-low-power memory with a subthreshold power supply voltage. *IEEE Journal of Solid-State Circuits*, 41(10):2344–2353, 2006.
- [15] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang. 40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 61(9):2578–2585, 2014.
- [16] K. M. Chu and D. L. Pulfrey. A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic. *IEEE Journal of Solid-State Circuits*, 22(4):528–532, 1987.
- [17] H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger. Power challenges may end the multicore era. *Communications of the ACM*, 56(2):93–102, 2013.
- [18] N. Ghobadi, M. Mehran, A. Afzali-Kusha, et al. Low power 4-bit full adder cells in subthreshold regime. In 18th Iranian Conference on Electrical Engineering, pages 362–367. IEEE, 2010.
- [19] S. Ghosh, D. Mohapatra, G. Karakonstantis, and K. Roy. Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 18(9):1301–1309, 2009.

- [20] W. Gosney. Subthreshold drain leakage currents in MOS field-effect transistors. *IEEE Transactions on Electron Devices*, 19(2):213–219, 1972.
- [21] K. Granhaug and S. Aunet. Six subthreshold full adder cells characterized in 90 nm CMOS technology. In 2006 IEEE Design and Diagnostics of Electronic Circuits and Systems, pages 25–30. IEEE, 2006.
- [22] X. Guo, V. Verma, P. Gonzalez-Guerrero, S. Mosanu, and M. R. Stan. Back to the future: Digital circuit design in the finfet era. *Journal of Low Power Electronics*, 13(3):338–355, 2017.
- [23] N. H. E. Weste and D. Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley, 2011.
- [24] S. Hanson, M. Seok, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw. A low-voltage processor for sensing applications with Picowatt standby mode. *IEEE Journal of Solid-State Circuits*, 44(4):1145–1155, 2009.
- [25] J. R. Hauser. Noise margin criteria for digital logic circuits. IEEE Transactions on Education, 36(4):363–368, 1993.
- [26] J. Hu and X. Yu. Near-threshold full adders for ultra low-power applications. In 2010 Second Pacific-Asia Conference on Circuits, Communications and System, volume 1, pages 300–303. IEEE, 2010.
- [27] J. Keane, H. Eom, T.-H. Kim, S. Sapatnekar, and C. Kim. Stack sizing for optimal current drivability in subthreshold circuits. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 16(5):598–602, 2008.
- [28] S. Kerzenmacher, J. Ducrée, R. Zengerle, and F. Von Stetten. Energy harvesting by implantable abiotically catalyzed glucose fuel cells. *Journal of Power Sources*, 182(1):1–17, 2008.
- [29] T.-H. Kim, J. Keane, H. Eom, and C. H. Kim. Utilizing reverse short-channel effect for optimal subthreshold circuit design. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 15(7):821–829, 2007.
- [30] P. M. Kogge and H. S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. *IEEE Transactions on Computers*, C-22(8):786–793, 1973.
- [31] J. Koomen. Investigation of the MOST channel conductance in weak inversion. *Solid-State Electronics*, 16(7):801–810, 1973.

- [32] J. P. Kulkarni, K. Kim, and K. Roy. A 160 mV robust Schmitt trigger based subthreshold SRAM. *IEEE Journal of Solid-State Circuits*, 42(10):2303–2313, 2007.
- [33] J. Kwong and A. P. Chandrakasan. Variation-driven device sizing for minimum energy sub-threshold circuits. In *Proceedings of the 2006 International Symposium on Low Power Electronics and Design*, pages 8–13, 2006.
- [34] E. Låte, T. Ytterdal, and S. Aunet. A loadless 6T SRAM cell for sub- & near-threshold operation implemented in 28 nm FD-SOI CMOS technology. *Integration*, 63:56–63, 2018.
- [35] C.-H. Lo and S.-Y. Huang. PPN based 10T SRAM cell for low-leakage and resilient subthreshold operation. *IEEE Journal of Solid-State Circuits*, 46(3):695–704, 2011.
- [36] P. Meinerzhagen, S. Y. Sherazi, A. Burg, and J. N. Rodrigues. Benchmarking of standard-cell based memories in the Sub-Vt domain in 65nm CMOS technology. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, 1(2):173–182, 2011.
- [37] S. Narendra, V. De, D. Antoniadis, A. Chandrakasan, and S. Borkar. Scaling of stack effect and its application for leakage reduction. In *Proceedings of the 2001 International Symposium on Low Power Electronics and Design*, pages 195–200, 2001.
- [38] C. Neau and K. Roy. Optimal body bias selection for leakage improvement and process compensation over different technology generations. In Proceedings of the 2003 International Symposium on Low Power Electronics and Design, pages 116–121, 2003.
- [39] P. Ng, P. T. Balsara, and D. Steiss. Performance of CMOS differential circuits. *IEEE Journal of Solid-State Circuits*, 31(6):841–846, 1996.
- [40] C. Piguet. Low-power CMOS Circuits: Technology, Logic Design and CAD tools. CRC Press, 2005.
- [41] Y. Pu, H. Corporaal, Y. Ha, et al. VT balancing and device sizing towards high yield of sub-threshold static logic gates. In *Proceedings of the 2007 International Symposium on Low Power Electronics and Design (ISLPED'07)*, pages 355–358. IEEE, 2007.
- [42] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolić. Digital Integrated Circuits: A Design Perspective.

- [43] Y. K. Ramadass and A. P. Chandrakasan. A battery-less thermoelectric energy harvesting interface circuit with 35 mV startup voltage. *IEEE Journal of Solid-State Circuits*, 46(1):333–341, 2010.
- [44] N. Reynders and W. Dehaene. A 190 mV supply, 10 MHz, 90 nm CMOS, pipelined sub-threshold adder using variation-resilient circuit techniques. In *IEEE Asian Solid-State Circuits Conference 2011*, pages 113–116. IEEE, 2011.
- [45] E. Seevinck, F. J. List, and J. Lohstroh. Static-noise margin analysis of MOS SRAM cells. *IEEE Journal of Solid-State Circuits*, 22(5):748–754, 1987.
- [46] M. Seok, S. Hanson, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw. The Phoenix processor: A 30 pW platform for sensor applications. In 2008 IEEE Symposium on VLSI Circuits, pages 188–189. IEEE, 2008.
- [47] B. Shi, Z. Li, and Y. Fan. Implantable energy-harvesting devices. Advanced Materials, 30(44):1801511, 2018.
- [48] R. M. Swanson and J. D. Meindl. Ion-implanted complementary MOS transistors in low-voltage circuits. *IEEE Journal of Solid-State Circuits*, 7(2):146–153, 1972.
- [49] A. Teman, L. Pergament, O. Cohen, and A. Fish. A 250 mV 8 KB 40 nm ultra-low power 9T supply feedback SRAM (SF-SRAM). *IEEE Journal of Solid-State Circuits*, 46(11):2713–2726, 2011.
- [50] Y. Tsividis. Eric Vittoz and the strong impact of weak inversion circuits. *IEEE Solid-State Circuits Society Newsletter*, 13(3):56–58, 2008.
- [51] M.-H. Tu, J.-Y. Lin, M.-C. Tsai, C.-Y. Lu, Y.-J. Lin, M.-H. Wang, H.-S. Huang, K.-D. Lee, W.-C. Shih, S.-J. Jou, et al. A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing. *IEEE Journal of Solid-State Circuits*, 47(6):1469–1482, 2012.
- [52] Y. Ueda, H. Yamauchi, M. Mukuno, S. Furuichi, M. Fujisawa, F. Qiao, and H. Yang. 6.33 mW MPEG audio decoding on a multimedia processor. In 2006 IEEE International Solid-State Circuits Conference-Digest of Technical Papers, pages 1636–1645. IEEE, 2006.

- [53] A. A. Vatanjou, T. Ytterdal, and S. Aunet. 4 sub-/near-threshold flip-flops with application to frequency dividers. In 2015 European Conference on Circuit Theory and Design (ECCTD), pages 1–4. IEEE, 2015.
- [54] N. Verma and A. P. Chandrakasan. A 65 nm 8T sub-Vt SRAM employing sense-amplifier redundancy. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pages 328–606. IEEE, 2007.
- [55] N. Verma and A. P. Chandrakasan. A 256 KB 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy. *IEEE Journal of Solid-State Circuits*, 43(1):141–149, 2008.
- [56] E. A. Vittoz. The electronic watch and low-power circuits. *IEEE Solid-State Circuits Society Newsletter*, 13(3):7–23, 2008.
- [57] M. Voernes, T. Ytterdal, and S. Aunet. Performance comparison of 5 subthreshold CMOS flip-flops under process-, voltage-, and temperature variations, based on netlists from layout. In 2014 NORCHIP, pages 1–6. IEEE, 2014.
- [58] A. Wang and A. Chandrakasan. A 180-mV subthreshold FFT processor using a minimum energy design methodology. *IEEE Journal of Solid-State Circuits*, 40(1):310–319, 2005.
- [59] A. Wang, A. P. Chandrakasan, and S. V. Kosonocky. Optimal supply and threshold scaling for subthreshold CMOS circuits. In *Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002*, pages 7–11. IEEE, 2002.
- [60] A. Wang, B. H. Calhoun, and A. P. Chandrakasan. Sub-threshold design for ultra low-power systems, volume 95. Springer, 2006.
- [61] A. Wang, Z. Liu, M. Hu, C. Wang, X. Zhang, B. Shi, Y. Fan, Y. Cui, Z. Li, and K. Ren. Piezoelectric nanofibrous scaffolds as in vivo energy harvesters for modifying fibroblast alignment and proliferation in wound healing. *Nano Energy*, 43:63–71, 2018.
- [62] H. Yamauchi. Embedded SRAM design in nanometer-scale technologies. In *Embedded Memories for Nano-Scale VLSIs*, pages 39–88. Springer, 2009.
- [63] S. H. Zadeh, T. Ytterdal, and S. Aunet. Comparison of ultra low power full adder cells in 22 nm FDSOI technology. In 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pages 1–5. IEEE, 2018.

- [64] S. H. Zadeh, T. Ytterdal, and S. Aunet. Exploring optimal back bias voltages for ultra low voltage CMOS digital circuits in 22 nm FDSOI technology. In 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pages 1–6. IEEE, 2019.
- [65] S. H. Zadeh, T. Ytterdal, and S. Aunet. Ultra-low voltage subthreshold binary adder architectures for IoT applications: Ripple carry adder or Kogge Stone adder. In 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), pages 1–7. IEEE, 2019.
- [66] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester. Analysis and mitigation of variability in subthreshold design. In *Proceedings of the 2005 International Symposium on Low Power Electronics and Design*, pages 20–25, 2005.
- [67] B. Zhai, R. G. Dreslinski, D. Blaauw, T. Mudge, and D. Sylvester. Energy efficient near-threshold chip multi-processing. In *Proceedings of the 2007 International Symposium on Low Power Electronics and Design (ISLPED'07)*, pages 32–37. IEEE, 2007.
- [68] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, et al. Energy-efficient subthreshold processor design. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 17(8):1127–1137, 2009.
- [69] R. Zimmermann and W. Fichtner. Low-power logic styles: CMOS versus pass-transistor logic. *IEEE Journal of Solid-State Circuits*, 32(7):1079–1090, 1997.




