# Christian Steinsland 

# Design and Implementation of a Digital Standard Cell Library for 28 nm Technology 

Master's thesis in Electronic System Design
Supervisor: Trond Ytterdal, Snorre Aunet
June 2021

Norwegian University of Science and Technology
Faculty of Information Technology and Electrical Engineering
Department of Electronic Systems



#### Abstract

A digital standard cell library has been designed and implemented in a 28 nm technology. The library has been designed and optimized for a supply voltage of 300 mV, and to be compatible with a standard design flow. Each cell has been characterized with extracted parasitic components. Combinatorial logic gates, including compound logic gates, and sequential cells were implemented with SLVT (Super Low VT) transistors. The library has been used to synthesize a functional RISC-V architecture (PicoRV32), in order to verify the functionality of the standard cell library and to obtain quantitative results for its performance. The minimum energy point for the CPU (at room temperature in the TT corner) was found at a supply voltage of 500 mV and a frequency of 20 MHz. By increasing the supply voltage to 600 mV, the CPU supports a 50 MHz clock. The highest simulated frequency was 250 MHz at 1 V.


## Preface

This report is written as an assignment for TFE4930, the master's thesis in the Electronic Systems Design and Innovation program at NTNU Trondheim. The goal of the project is to design and characterize a digital standard cell library in a 28 nm technology, with a supply voltage of 300 mV. All work presented in this report is a continuation of a previous project (presented in [1]), which is assumed to be known to the reader. Where necessary, some information is revisited for clarification. Although the content is similar, the implementation of some of the cells has been changed.

I want to acknowledge and thank my supervisors:

Prof. Trond Ytterdal, for always being available for questions and for being an astounding resource for insight into the tools related to the design of CMOS circuits.

Prof. Snorre Aunet, for his deep insight into digital design and for providing vital guidance on which solutions and results were beneficial to pursue.

I would also like to thank Fredrik Feyling. By allowing me to contribute to the development of PADE ${ }^{1}$, he gave me the possibility to develop testbenches in a much simpler manner than before. PADE was used to simulate some simple synthesized designs (presented in Section IV-C), and without it I would have needed much more time to set everything up.

Asbjørn Djupdal helped me set up the top module for the PicoRV32 CPU, presented in Section IV-D, and the testbench of the CPU, presented in Section IV-D2. I would not have had the time to implement and simulate a circuit of this size without his help, for which I am very grateful.

[^0]

## Table of Contents

- List of Figures
- List of Tables
- I Introduction
  - I-A Outline
- II Background
  - II-A Design flow
    - II-A1 Cell design
  - II-B Digital cells and logic gates
    - II-B1 Compound gates
- III Methodology
  - III-A Tools
    - III-A1 Design of standard cells
    - III-A2 Library characterization
    - III-A3 Synthesis
  - III-B Testbench
- IV Implementation
  - IV-A Digital Cells in standard cell library
    - IV-A1 Transistors
    - IV-A2 Simple logic gates
    - IV-A3 Compound logic gates
    - IV-A4 Buffers
    - IV-A5 Multiplexer
    - IV-A6 D-type flip-flop
    - IV-A7 D-type latches
    - IV-A8 Full Adder
    - IV-A9 Filler
  - IV-B Library characterization
  - IV-C Synthesis of digital designs
    - IV-C1 Full adder
    - IV-C2 Counter
  - IV-D PicoRV32
    - IV-D1 P&R
    - IV-D2 Testbench
- V Results
  - V-A Library characterization
  - V-B Full Adder
  - V-C Counter
  - V-D PicoRV32
    - V-D1 Synthesis and P&R
    - V-D2 Simulation
- VI Discussion
  - VI-A Library characterization
    - VI-A1 Library characterization for different operating conditions
  - VI-B Synthesized Designs
    - VI-B1 Full Adder
    - VI-B2 8-bit Counter
    - VI-B3 PicoRV32
  - VI-C Area
- VII Conclusion
  - VII-A Using SLVT transistors
  - VII-B Drive Strength
  - VII-C PicoRV32
- Bibliography
- Appendix
  - A Presentation of standard cells
  - B Datasheets
  - C Library characterization for different operating conditions

## List of Figures

- Fig. 1: Overview of standard cell library design [1, Fig. 4]
- Fig. 2: Design flow of digital cells [1, Fig. 5]
- Fig. 3: AOI22, an example of logic gate of type And-Or-Invert
- Fig. 4: OAI22, an example of logic gate of type Or-And-Invert
- Fig. 5: Layout for the PMOS transistors
- Fig. 6: Layout for the NMOS transistors
- Fig. 7: Layout for FILLER
- Fig. 8: Simulation results of full adder
- Fig. 9: Simulation results of 8-bit counter
- Fig. 10: Layout of PicoRV32 after P&R
- Fig. 11: Area as a function of supply voltage and frequency
- Fig. 12: Logic gates included in the synthesized results
- Fig. 13: Supported frequencies
- Fig. 14: Power consumption
- Fig. 15: Energy consumption
- Fig. 16: Standard cell: INV1X1
- Fig. 17: Standard cell: INV1X4
- Fig. 18: Standard cell: BUFF1X1
- Fig. 19: Standard cell: NAND2X1
- Fig. 20: Standard cell: AND2X1
- Fig. 21: Standard cell: NOR2X1
- Fig. 22: Standard cell: OR2X1
- Fig. 23: Standard cell: XNOR2X1
- Fig. 24: Standard cell: XOR2X1
- Fig. 25: Standard cell: AOI12X1
- Fig. 26: Standard cell: AOI22X1
- Fig. 27: Standard cell: AOI112X1
- Fig. 28: Standard cell: AOI212X1
- Fig. 29: Standard cell: AOI222X1
- Fig. 30: Standard cell: OAI12X1
- Fig. 31: Standard cell: OAI22X1
- Fig. 32: Standard cell: OAI211X1
- Fig. 33: Standard cell: OAI222X1
- Fig. 34: Standard cell: MUX2X1
- Fig. 35: Standard cell: DFFX1
- Fig. 36: Standard cell: DFFX4
- Fig. 37: Standard cell: DLX1
- Fig. 38: Standard cell: FAX1

## List of Tables

- Table I: Tools used in the design flow of the standard cell library
- Table II: Digital cells implemented in the standard cell library
- Table III: Area of the implemented cells
- Table IV: Library characterization results for different operating conditions
- Table V: Supported pairs of frequency and voltages after synthesis
- Table VI: Comparison of synthesis results with compound gates included or excluded
- Table VII: Simulation results
- Table VIII: Leakage power for different drive strengths
- Table IX: Delay for rising and falling output for a selection of cells
- Table X: Digital results, representing the values from Fig. 8
- Table XI: Digital results, representing the values from Fig. 9
- Table XII: Library characterization results for nominal conditions ($25^{\circ} \mathrm{C}$, TT-corner)
- Table XIII: Library characterization results for $-20^{\circ} \mathrm{C}$, in the SS-corner
- Table XIV: Library characterization results for $85^{\circ} \mathrm{C}$, in the FF-corner

## Glossary

| Term | Definition |
| :--- | :--- |
| FF | Fast-Fast, where both PMOS and NMOS are in the fast process corner |
| library characterization | The process of characterizing the behavior of each cell in a standard cell library |
| main netlist | The netlist synthesized for 300 mV and 3.2 MHz, with parasitic components extracted |
| PADE | Python aided Analog Design Environment |
| PicoRV32 | An open-source, size-optimized RISC-V CPU implementation |
| poly | Short name for polysilicon, which is made of small crystalline regions of silicon |
| PVT | Process-Voltage-Temperature variations. Inaccuracies across variations due to process corners, voltage and/or temperature |
| RISC-V | Open source standard instruction set architecture based on Reduced Instruction Set Computer (RISC) principles |
| SS | Slow-Slow, where both PMOS and NMOS are in the slow process corner |
| standard cell library | A library containing pre-designed and pre-verified technology-dependent digital cells, available when synthesizing a technology-independent RTL design |
| synthesis | Gate-level synthesis, the process of mapping a technology-independent RTL description to technology-dependent CMOS logic using a library of pre-characterized standard cells |
| SystemVerilog | A Hardware Description Language (HDL) and Hardware Verification Language, based on Verilog with some extensions |
| TT | Typical-Typical, where both PMOS and NMOS are in the typical process corner |
| Verilog | A Hardware Description Language (HDL) used to model digital electronic systems |

## Acronyms

| Acronym | Meaning |
| :--- | :--- |
| AOI | And-Or-Invert |
| ASIC | Application-Specific Integrated Circuit |
| CMOS | Complementary Metal-Oxide Semiconductor |
| CPU | Central Processing Unit |
| DRC | Design Rule Check |
| HDL | Hardware Description Language |
| lef | Library Exchange Format |
| LVS | Layout Versus Schematic |
| NMOS | Negative-channel Metal-Oxide Semiconductor |
| OAI | Or-And-Invert |
| P&R | Place And Route |
| PDK | Process Design Kit |
| PMOS | Positive-channel Metal-Oxide Semiconductor |
| POS | Product-Of-Sums |
| RISC | Reduced Instruction Set Computer |
| RTL | Register-Transfer-Level |
| SLVT | Super Low VT |
| SOP | Sum-Of-Products |
| VT | Threshold voltage ($V_{t}$) |

## I. Introduction

The implementation of modern digital designs involves an increasing amount of complexity, and the requirements for computer performance continue to increase. Although full-custom ASIC design allows more control over the optimization, it has major drawbacks regarding design time and the skill required of designers. A major factor in the rapid growth of integrated circuits is the use of standard cell libraries [2]. By using pre-designed and pre-verified standard cells to perform various system functions, time can instead be spent working at the Register-Transfer-Level (RTL). A tool may then take the technology-independent RTL description and map it to technology-dependent CMOS logic using a library of pre-characterized standard cells. This process is called gate-level synthesis.

In recent years, as complexity has increased and the size of transistors has decreased, there has been an increased focus on the energy efficiency of integrated circuits. Patrick P. Gelsinger predicted in 2001 that, with the development of integrated circuits at the time, power consumption would become higher than what is practically possible [3]. His predictions imply that the power density would reach that of a nuclear reactor by 2005, a rocket nozzle by 2010, and the surface of the sun by 2015. As this is practically impossible, power consumption needs to be addressed. Often this can be achieved by decreasing the supply voltage, decreasing the clock frequency, or by various power-reducing implementation techniques in the Hardware Description Language (HDL). Decreasing the supply voltage to subthreshold or near-threshold values is often necessary to realize self-powered systems.

This report will present an implementation of a standard cell library, which consists of digital cells that perform a sequential or combinatorial function, and can then be used by a synthesis tool to generate a digital design. An implementation of a RISC-V CPU in HDL has been used to test the library. As the main focus is the library itself, there have not been any efforts made to optimize or improve the RTL design. The chosen processor is a PicoRV32, which is a size-optimized RISC-V CPU, presented in Section IV-D.

The technology is a commercially available 28 nm CMOS technology. All transistors were designed with the minimum length allowed by the Process Design Kit (PDK) and a width of 200 nm [1]. The size of the transistors was equal in every cell in the standard cell library to simplify the design process. SLVT transistors were used to maximize the speed.

Because of the importance of energy efficiency, the library has been designed to use a supply voltage of 300 mV, which is a near-threshold voltage. Some analysis has been done to see how varying the supply voltage affects the overall speed and energy efficiency of the CPU.

There were no requirements for the frequency, as the supported frequency varies with the RTL design that is implemented. However, the following quantities were sought in order to characterize the potential of the library:

- The maximum possible frequency for operation with a supply voltage of 300 mV.
- The minimum possible supply voltage for operation with a frequency of 50 MHz, for applications that require higher speed.
- The minimum energy point for performing a set of instructions, expressed as a pair of frequency and supply voltage.

The standard cell library has been implemented with various combinatorial and sequential cells. Both two-input logic gates and compound logic gates of And-Or-Invert (AOI)- and Or-And-Invert (OAI)-type are present in the library. A D-type flip-flop has been implemented to allow clocked digital designs to be synthesized.

Additionally, a D-type latch and a tri-state buffer have been implemented. Although not required or utilized in any synthesized designs presented in this report, some digital designs may require these cells for the RTL-code to be synthesizable.

The work presented in this report is based on previous work presented in [1]. There were some problems with the previous implementation that have since been resolved. The biggest problem was a compatibility issue in Place And Route (P&R) related to the Library Exchange Format (lef) file. The routing grid was previously chosen to suit the design. However, to resolve issues that surfaced during P&R, the routing grid has been changed to comply with the restrictions of the PDK.

The library was functional for the synthesized designs. The reports generated by library characterization and the simulation results confirmed correct functionality. Synthesized designs passed the Design Rule Check (DRC) and Layout Versus Schematic (LVS) checks, which implies that the standard cells are compatible with the standard design flow.

The PicoRV32 CPU was verified to function with a clock frequency of 2.1 MHz at 300 mV. When the supply voltage was increased to 600 mV, the CPU supported a clock frequency of 50 MHz. The maximum supported frequency tested by simulation was 250 MHz, with a supply voltage of 1 V.

The lowest amount of energy consumed for executing the testbench was found to be approximately 32.3 pJ at 500 mV and 20 MHz .

## A. Outline

The report contains the following chapters, and is meant to be read in the following way:

- Background (Section II) - Background theory, written in a generalized manner, necessary to be familiar with to better understand the implementation and results
- Methodology (Section III) - The tools and methods used in the implementation, and how results were obtained
- Implementation (Section IV) - How the standard cell library and the synthesized designs were implemented
- Results (Section V) - Results obtained by library characterization, synthesis, P\&R and simulation of synthesized design
- Discussion (Section VI) - The results are discussed and evaluated
- Conclusion (Section VII) - The conclusion of the discussion


## II. Background

This section will describe some theory that is necessary to be familiar with to understand the implementation of the standard cell library. The design flow used for the implementation is presented in Section II-A. Tools that are used in the design flow are presented in Section III-A.

## A. Design flow

An overview of the design flow is given in Fig. 1.


Fig. 1: Overview of standard cell library design [1, Fig. 4]

The initial step is designing the cells themselves. This step is explained in further detail in Section II-A1.

A library characterization tool simulates each cell to characterize the power consumption and timing information. The tool imports the netlists (containing parasitic capacitances) for each cell and generates a lib file containing all information about the cells. Information about area may optionally be included to allow the synthesis tool to optimize for area. As the netlists do not contain geometrical information, the area is not included automatically. The lib file also includes the logic functionality of the cells.

The synthesis tool is responsible for reading an HDL file and generating a netlist containing the cells that are available in the library. As the lib file contains the truth tables, power consumption, and timing for each cell, the synthesis tool has the necessary information to synthesize with correct functionality and meet the timing requirements (if possible). The netlist generated by the synthesis tool can be used by the Place And Route (P&R) tool to create the floorplan with the cells placed and connected as stated in the netlist.

For the P&R tool to be able to place the cells and route between them without creating shorts, it needs information about the layout of the cells. However, not all the information from the layout is needed. The required subset can be provided in a Library Exchange Format (lef) file, which contains the following information [4]:

- Technology: layer, design rules, via definitions, metal capacitance
- Site: Site extension
- Macros: cell descriptions, cell dimensions, layout of pins and blockages, capacitances

An important part of the design flow is to include the lef file for the technology in use. By using a tool, one can generate another lef file containing the macros. These macros contain the necessary information about the cells, which allows the P&R tool to use the cells without any information about the internal netlist. Not shown in Fig. 1 is that the lef files may also be included in the synthesis tool.

Including these files means that information about the area is known, which allows the tool to minimize the area and produce more accurate reports. Additionally, the lib file can be included in the P&R tool to analyze the necessary setup and hold times in the design.

When the P&R tool has finished placing all cells and routing the design, the layout and Verilog netlist can be exported for further use. The layout is streamed out to a binary .gds file, which is the file used to fabricate the integrated circuit. The Verilog netlist contains all the digital cells (from the standard cell library) that are placed and routed by the P&R tool. It is also necessary to ensure that the design passes the DRC and LVS checks. They are explained in further detail in Section II-A1.

Additional steps for finalizing the design for tape-out are regarded as out of the scope of this report. However, the exported layout after P&R can be used for further simulation with parasitic components extracted (extraction is explained in Section II-A1).

1) Cell design: Fig. 2 shows an overview of the design flow for a single digital cell. The initial stage is to design the cell on the schematic level. The cell can then be verified and analyzed in simulation and redesigned if necessary.


Fig. 2: Design flow of digital cells [1, Fig. 5]

When the cell works as intended with satisfactory results, the next step is to design the layout. The layout is related to the physical design and contains the geometrical information of the layers that are intended to be in the integrated circuit. When drawing the layout, the design must obey the design rules given by the Process Design Kit (PDK). To ensure that these are followed, the layout must pass the Design Rule Check (DRC) to achieve an overall high yield and reliability [5]. In addition to DRC, the design must pass Layout Versus Schematic (LVS). This tool verifies that the layout represents the same circuit as the schematic (same number of transistors, nets connected correctly, etc.).

When both DRC and LVS pass, one more step is necessary to complete the layout. Transistors contain parasitic capacitors [6], which impact the behavior of the cells. In addition, adding metal wires in the cell adds parasitic capacitance. The parasitic components can be extracted by using a tool that generates a netlist containing the parasitic components in addition to the circuit. Including the extracted components makes the simulation models more accurate.

The final step is to reevaluate the behavior of the cell. If the simulation results are not satisfactory after parasitic components are accounted for, the cell must be redesigned.

## B. Digital cells and logic gates

The Boolean functionality of the most common logic gates, flip-flops, and latches is assumed to be known by the reader. The following cells must be known, and are not explained in detail in this report:

- Inverters
- Tri-state buffers
- NAND, NOR, AND and OR
- XOR and XNOR
- D type flip-flops
- D type latches
- Multiplexers

Additionally, a basic understanding of CMOS transistors and of how PMOS and NMOS transistors construct the pull-up and pull-down circuitry of a logic cell is assumed.

1) Compound gates: One can achieve any combinatorial function by only using NAND-gates and inverters. By connecting one input of a NAND-gate to VDD, one can even use only NAND-gates, assuming the synthesis tool supports it. However, this has multiple drawbacks. By creating more logic gates (e.g. NOR, XOR, and XNOR), one can reduce the total number of required transistors. This increases overall speed and reduces both the area and the power consumption of the synthesized result.
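As a minimal sketch of the claim that NAND-gates alone are sufficient, the following Python model builds an inverter, AND, OR and XOR purely from a two-input NAND. The function names and the Boolean modelling of VDD are illustrative assumptions, not part of the library.

```python
def nand(a: bool, b: bool) -> bool:
    """Two-input NAND, the only primitive used below."""
    return not (a and b)

VDD = True  # logic high, used to tie off an unused NAND input

def inv(a: bool) -> bool:
    # Inverter built from a NAND with one input tied to VDD.
    return nand(a, VDD)

def and2(a: bool, b: bool) -> bool:
    # AND is a NAND followed by an inverter.
    return inv(nand(a, b))

def or2(a: bool, b: bool) -> bool:
    # De Morgan: A + B = NOT(NOT A AND NOT B)
    return nand(inv(a), inv(b))

def xor2(a: bool, b: bool) -> bool:
    # XOR built from four NAND-gates.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Exhaustive check against Python's own Boolean operators.
for a in (False, True):
    for b in (False, True):
        assert inv(a) == (not a)
        assert and2(a, b) == (a and b)
        assert or2(a, b) == (a or b)
        assert xor2(a, b) == (a != b)
```

The NAND-only XOR above needs four gates, i.e. 16 transistors when each two-input CMOS NAND uses four transistors, which illustrates why dedicated cells can reduce the transistor count.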

Expanding on this, it is possible to create cells that perform more complex logic functions in a single stage of logic by using a combination of parallel and serial connections of PMOS and NMOS transistors [6]. Examples of this are AOI (And-Or-Invert) and OAI (Or-And-Invert) cells. How they can be derived is described in the following paragraphs. Note that the approach can also be used to analyze simple NAND- and NOR-gates with two or more inputs. For clarity, a short description of them is provided.

a) NAND gates: The output of a NAND-gate is low only when every input is high. This implies that when at least one of the inputs is low, the output must be high. The PMOS circuitry is responsible for pulling the output to the value of the VDD rail (the PMOS circuitry may be referred to as the pull-up network). As the NAND-gate produces a high output even when only a single input is low, the PMOS transistors must be connected in parallel. When the gate voltage is low on one of the inputs, the corresponding transistor connects ${ }^{2}$ the output to VDD. Similarly, for the NMOS circuitry, a single low input must disconnect ${ }^{3}$ the output from VSS. Only when all inputs are high must the output be connected to VSS. This implies that the NMOS transistors are connected in series.

b) NOR gates: The output of a NOR gate is high only when both inputs are low. Following the same approach as for the NAND gate, one can see that both PMOS transistors must conduct for the output to be driven high. This implies that the PMOS transistors must be connected in series. The NMOS transistors should connect the output to VSS when at least one transistor has a high gate voltage, which implies that the NMOS transistors are connected in parallel.

[^1]

c) AOI cells: AOI cells perform Sum-Of-Products (SOP) expressions. This means that the output depends on the sum of two or more products. An example is shown in eq. (1). Note that the output is inverted.
$$
\begin{equation*}
Y=\overline{A B+C D} \tag{1}
\end{equation*}
$$

The function in eq. (1) can be implemented directly with two AND-gates (connected to $A$ and $B$, and to $C$ and $D$, respectively) feeding a two-input NOR-gate. However, this can be simplified.

If we first regard the pull-down network, consisting of NMOS transistors, one can see that for the (inverted) output to be pulled low, either $A B$ or $C D$ must be true. For each product, both inputs must be high for its transistors to conduct between the output and VSS. This implies that the NMOS transistors of a given product must be connected in series. The output is pulled low as soon as one of the products is true, i.e. as soon as one of the series branches conducts. It follows that the series branches are connected in parallel with each other.

The pull-up network consists of PMOS transistors, which conduct when the gate voltage is low. Pulling the output voltage high requires both products to be false (given that the output is inverted). For the product $A B$ to be false, $A$ or $B$ must be low, so the PMOS transistors for $A$ and $B$ are connected in parallel. The same applies to $C$ and $D$. As both products must be false for the output to be driven high, the parallel pair for $A B$ and the parallel pair for $C D$ must be connected in series.
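The same pull-up structure can be read directly from eq. (1) by applying De Morgan's theorem twice, as a short check of the reasoning above:

$$
Y=\overline{A B+C D}=\overline{A B} \cdot \overline{C D}=(\bar{A}+\bar{B})(\bar{C}+\bar{D})
$$

Each sum of complemented inputs corresponds to two PMOS transistors in parallel (a PMOS transistor conducts on a low input), and the product of the two sums corresponds to placing the two parallel pairs in series.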

The resulting schematic and symbol that evaluates the function in eq. (1) is given in Fig. 3. The name for this specific cell is AOI22.

(a) Schematic for AOI22 cell

(b) Symbol for AOI22 cell

Fig. 3: AOI22, an example of logic gate of type And-Or-Invert

Using the same approach for different Boolean functions, one can create schematics for logic gates to evaluate other Boolean functions, with more or fewer inputs.

d) OAI cells: OAI cells are similar to AOI cells. However, OAI cells are used to calculate a Product-Of-Sums (POS) expression instead of a SOP expression. Fig. 4 shows the schematic and symbol for the logic gate that represents the equation shown in eq. (2).

$$
\begin{equation*}
Y=\overline{(A+B)(C+D)} \tag{2}
\end{equation*}
$$

Using the same approach as for the AOI cell, one can create the schematic by using a combination of parallel and serial connections of the transistors. However, the first step is to identify each sum instead of each product. As inputs $A$ and $B$ are ORed together, the NMOS transistors must be connected in parallel and PMOS transistors in series. As the sums are ANDed together, the NMOS circuitry for each sum must be connected in series, and the PMOS circuitry must be connected in parallel. From this follows that the OAI cells can easily be created by switching the parallel and serial connections from the complementary AOI cells.

An important thing to notice with AOI- and OAI-cells is that each input requires one NMOS and one PMOS transistor. This implies that the total number of transistors is twice the number of inputs.


Fig. 4: OAI22, an example of logic gate of type Or-And-Invert
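The series/parallel reasoning for AOI and OAI cells can be checked with a small switch-level sketch. The network representation and all function names below are illustrative assumptions; the model only captures whether a network conducts, not any electrical behaviour. It builds the AOI22 pull-down network, derives the pull-up network and the OAI22 cell by swapping series and parallel connections, and compares the results with eq. (1) and eq. (2).

```python
from itertools import product

# A network is either an input name (one transistor) or a tuple
# ("series" | "parallel", [sub-networks]).

def series(*parts):
    return ("series", list(parts))

def parallel(*parts):
    return ("parallel", list(parts))

def conducts(net, gate_on):
    """True if the network conducts, given which transistors are on."""
    if isinstance(net, str):
        return gate_on[net]
    kind, parts = net
    results = [conducts(p, gate_on) for p in parts]
    return all(results) if kind == "series" else any(results)

def dual(net):
    """Swap series and parallel connections throughout the network."""
    if isinstance(net, str):
        return net
    kind, parts = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, [dual(p) for p in parts])

def transistor_count(net):
    if isinstance(net, str):
        return 1
    return sum(transistor_count(p) for p in net[1])

def cell_output(pull_down, inputs):
    """Evaluate a static CMOS cell whose pull-up is the dual of its pull-down."""
    nmos_on = inputs                                 # NMOS conducts on a high gate
    pmos_on = {k: not v for k, v in inputs.items()}  # PMOS conducts on a low gate
    down = conducts(pull_down, nmos_on)
    up = conducts(dual(pull_down), pmos_on)
    assert up != down, "exactly one network must conduct"
    return up  # the output is high when the pull-up network conducts

AOI22_PULL_DOWN = parallel(series("A", "B"), series("C", "D"))
OAI22_PULL_DOWN = dual(AOI22_PULL_DOWN)  # series(parallel(A, B), parallel(C, D))

def truth_table(pull_down):
    return {bits: cell_output(pull_down, dict(zip("ABCD", bits)))
            for bits in product((False, True), repeat=4)}

aoi22 = truth_table(AOI22_PULL_DOWN)
oai22 = truth_table(OAI22_PULL_DOWN)
for (a, b, c, d), y in aoi22.items():
    assert y == (not ((a and b) or (c and d)))   # eq. (1)
for (a, b, c, d), y in oai22.items():
    assert y == (not ((a or b) and (c or d)))    # eq. (2)

# One NMOS and one PMOS per input: 2 * 4 = 8 transistors per cell.
total = transistor_count(AOI22_PULL_DOWN) + transistor_count(dual(AOI22_PULL_DOWN))
assert total == 8
```

The internal assertion in `cell_output` also confirms that the pull-up and pull-down networks are complementary for every input combination, so the output is never floating or shorted in this idealized model.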

## III. Methodology

## A. Tools

Table I presents the tools and describes what they were used for.

TABLE I: Tools used in the design flow of the standard cell library

| Tool | Version | Description |
| :--- | :--- | :--- |
| Virtuoso | 6.1.7-64b | Design of schematic and layout |
| Calibre | v2020.3_24.16 | Design Rule Check (DRC) and Layout Versus Schematic (LVS) |
| Quantus Extraction | 20.1.1-s233 | Extraction of parasitic capacitance |
| Liberate | 19.21.472 | Library characterization of cells (lib file) |
| Genus | 19.15.000 | Gate-level synthesis |
| Abstract | 6.1.7-64b | Abstract view generation, to generate Library Exchange Format (lef) files |
| Innovus | v19.16-s053_1 | Place And Route (P&R) |
| irun | 15.20-s084 | Simulation of processor with Verilog testbench |

1) Design of standard cells: The Schematic Suite XL and Layout Suite XL in Virtuoso were used to design the schematic and layout of cells. Some simulations were done with ADE Explorer to verify the functionality of the cells.

2) Library characterization: The characterization of the library was done using Liberate from Cadence. All cells were characterized for a nominal temperature of $25^{\circ} \mathrm{C}$ in the TT process corner. To accurately characterize the behavior of the digital cells, the input slews and load capacitances must be defined. The characterization was done for the following input slews and load capacitances:

- Input slew: $0.5 \mathrm{~ns}, 1 \mathrm{~ns}, 3 \mathrm{~ns}, 7 \mathrm{~ns}, 10 \mathrm{~ns}$
- Load capacitance: $0.5 \mathrm{fF}, 1 \mathrm{fF}, 2 \mathrm{fF}, 3 \mathrm{fF}, 5 \mathrm{fF}$
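The characterization grid spanned by these values can be enumerated with a short sketch. The variable and function names are illustrative assumptions; only the index values are taken from the list above.

```python
from itertools import product

# Index values as listed in the text.
INPUT_SLEWS_NS = [0.5, 1, 3, 7, 10]
LOAD_CAPS_FF = [0.5, 1, 2, 3, 5]

# Every (slew, load) pair that is characterized.
grid = list(product(INPUT_SLEWS_NS, LOAD_CAPS_FF))

def table_index(slew_ns, load_ff):
    """Row/column of a characterized point in a slew-by-load table."""
    return INPUT_SLEWS_NS.index(slew_ns), LOAD_CAPS_FF.index(load_ff)

print(len(grid))          # 25 points per characterized table
print(table_index(3, 2))  # (2, 2)
```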

The library contains results for all combinations of input slews and load capacitances. Values were chosen based on initial simulation results and measurements of the capacitance of various cells.

3) Synthesis: The gate-level synthesis was performed with Genus from Cadence. To allow more accurate area optimization and reporting, the lef files for the technology and the generated lef files for the library were included in the script run by Genus. The synthesis was performed with a high effort on redundancy removal and optimization for timing, area, and power.

The power estimations were performed with default settings. The leakage power is calculated from values given by the lib files, and the dynamic power is calculated as follows: if a pin is associated with a clock, the default toggle rate is $10 \%$ of the frequency; if a pin is not associated with a clock, the default toggle rate is $1 \%$ of the frequency.

## B. Testbench

The implementation of the testbench is presented in Section IV-D2. In order to find the most energy-efficient supply voltage, some metrics must be defined.

The average power consumption, $P_{\text{avg}}$, is found by measuring the current delivered by the power supply and multiplying the average current, $I_{\text{avg}}$, by the supply voltage, $V_{VDD}$, as shown in eq. (3). From this, the total energy consumed when simulating the testbench can be calculated by multiplying the average power consumption by the time, $T$, as shown in eq. (4). As the number of clock cycles is known, the total time required can be expressed in terms of the frequency $f$, as shown in eq. (5).

$$
\begin{align*}
& P_{\text{avg}}=I_{\text{avg}} \cdot V_{VDD}  \tag{3}\\
& E=P_{\text{avg}} \cdot T  \tag{4}\\
& E=P_{\text{avg}} \cdot \frac{\text{cycles}}{f} \tag{5}
\end{align*}
$$
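As a minimal sketch of how eqs. (3) to (5) are applied, the following Python functions compute the average power and the energy per testbench run. The numeric operating point is purely hypothetical and chosen only to illustrate the arithmetic; it is not a result from this work.

```python
def average_power(i_avg_a: float, vdd_v: float) -> float:
    """Eq. (3): average power in watts."""
    return i_avg_a * vdd_v

def energy(p_avg_w: float, cycles: int, f_hz: float) -> float:
    """Eq. (5): energy in joules for a run of `cycles` clock cycles."""
    return p_avg_w * cycles / f_hz

# Hypothetical operating point (assumed values, not measured results).
i_avg = 2e-6    # 2 uA average supply current (assumed)
vdd = 0.3       # 300 mV supply voltage
cycles = 1000   # testbench length in clock cycles (assumed)
f = 2e6         # 2 MHz clock frequency (assumed)

p_avg = average_power(i_avg, vdd)    # about 0.6 uW
e_total = energy(p_avg, cycles, f)   # about 300 pJ
print(f"P_avg = {p_avg * 1e6:.2f} uW, E = {e_total * 1e12:.1f} pJ")
```

Because the number of cycles is fixed by the testbench, eq. (5) shows that the energy per run scales with $P_{\text{avg}} / f$, so comparing operating points requires both the measured power and the clock frequency.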

## IV. Implementation

In this chapter, the implementation of the standard cell library is presented. All digital cells were implemented using Super Low VT (SLVT) transistors to maximize speed. The trade-off is a higher leakage current through the transistors. Each transistor was implemented with a width of 200 nm and the minimum gate length. The width was chosen by initial experimentation to achieve balanced rise and fall times for the output of the inverter. Every transistor was implemented with equal width to simplify the design process. As explained in Section IV-A1, higher drive strengths were obtained by using multiple transistors in parallel.

PicoRV32, a RISC-V CPU architecture presented in Section IV-D, was synthesized with a supply voltage of 300 mV and a frequency of 3.2 MHz. The synthesized design was used in Place And Route (P&R) to generate a netlist where parasitic components were extracted. The resulting netlist will be referred to as the main netlist in this report. All simulation results are based on simulation of the main netlist.

## A. Digital Cells in standard cell library

All digital cells have been designed to function optimally with a supply voltage of 300 mV . The cells that have been implemented are presented in Table II. The schematics and layouts for all cells are presented in Appendix A.

Each cell has a fixed height of $1.3\,\mu\mathrm{m}$. The width is a multiple of 130 nm for compatibility with the Process Design Kit (PDK).
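Since the height is fixed and the width is quantized, a cell area maps directly to a whole number of 130 nm width steps. The sketch below checks this for the two inverter areas listed in Table III; the function and constant names are illustrative assumptions.

```python
CELL_HEIGHT_NM = 1300  # fixed cell height, 1.3 um
WIDTH_STEP_NM = 130    # cell width is a multiple of this

def width_in_steps(area_um2: float) -> int:
    """Number of 130 nm width steps implied by a cell area in um^2."""
    area_nm2 = round(area_um2 * 1_000_000)
    width_nm, rem_h = divmod(area_nm2, CELL_HEIGHT_NM)
    steps, rem_w = divmod(width_nm, WIDTH_STEP_NM)
    assert rem_h == 0 and rem_w == 0, "area is not on the 130 nm grid"
    return steps

print(width_in_steps(0.507))  # INV1X1: 3 steps, i.e. 390 nm wide
print(width_in_steps(1.690))  # INV1X4: 10 steps, i.e. 1300 nm wide
```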

TABLE II: Digital cells implemented in the standard cell library

| Name | Drive strength | Description |
| :--- | :--- | :--- |
|  |  |  |
| INV1 | X1 (Fig. 16), X4 (Fig. 17) | Single-input inverter |
| BUFF1 | X1 (Fig. 18) | Tri-state buffer |
| NAND2 | X1 (Fig. 19) | Two-input NAND |
| AND2 | X1 (Fig. 20) | Two-input AND |
| NOR2 | X1 (Fig. 21) | Two-input NOR |
| OR2 | X1 (Fig. 22) | Two-input OR |
| XNOR2 | X1 (Fig. 23) | Two-input XNOR |
| XOR2 | X1 (Fig. 24) | Two-input XOR |
| AOI12 | X1 (Fig. 25) | Two-input AND to two-input NOR |
| AOI22 | X1 (Fig. 26) | Double two-input AND to two-input NOR |
| AOI112 | X1 (Fig. 27) | Two-input AND to three-input NOR |
| AOI212 | X1 (Fig. 28) | Double two-input AND to three-input NOR |
| AOI222 | X1 (Fig. 29) | Triple two-input AND to three-input NOR |
| OAI12 | X1 (Fig. 30) | Two-input OR to two-input NAND |
| OAI22 | X1 (Fig. 31) | Double two-input OR to two-input NAND |
| OAI211 | X1 (Fig. 32) | Two-input OR to three-input NAND |
| OAI222 | X1 (Fig. 33) | Triple two-input OR to three-input NAND |
| MUX2 | X1 (Fig. 34) | 2:1 multiplexer |
| DFF | X1 (Fig. 35), X4 (Fig. 36) | D-type flip-flop |
| DL | X1 (Fig. 37) | D-type latch |
| FA | X1 (Fig. 38) | Full adder |

The area for the layout of each cell is presented in Table III.

1) Transistors: As shown in Table II, there are two available drive strengths in the library. The width of all transistors is 200 nm (for both NMOS and PMOS). As the height should be equal for every cell in the library [1], the same transistor width is used for both the X1- and X4-versions of the cells. The higher drive strength is realized by placing multiple transistors in parallel. The given technology requires the bulk to be connected to the VDD/VSS rails, so the bulk connections are themselves part of the rails. Fig. 5 presents the layout for PMOSX1 and PMOSX4. The NMOSX1 and NMOSX4 layouts are presented in Fig. 6.

TABLE III: Area of the implemented cells

| Cell | Area |
| :--- | :--- |
| INV1X1 | $0.507 \mu \mathrm{~m}^{2}$ |
| INV1X4 | $1.690 \mu \mathrm{~m}^{2}$ |
| BUFF1X1 | $1.521 \mu \mathrm{~m}^{2}$ |
| NAND2X1 | $0.676 \mu \mathrm{~m}^{2}$ |
| AND2X1 | $1.014 \mu \mathrm{~m}^{2}$ |
| NOR2X1 | $0.676 \mu \mathrm{~m}^{2}$ |
| OR2X1 | $1.014 \mu \mathrm{~m}^{2}$ |
| XNOR2X1 | $2.366 \mu \mathrm{~m}^{2}$ |
| XOR2X1 | $2.366 \mu \mathrm{~m}^{2}$ |
| AOI12X1 | $1.183 \mu \mathrm{~m}^{2}$ |
| AOI22X1 | $1.183 \mu \mathrm{~m}^{2}$ |
| AOI112X1 | $1.014 \mu \mathrm{~m}^{2}$ |
| AOI212X1 | $1.352 \mu \mathrm{~m}^{2}$ |
| AOI222X1 | $1.690 \mu \mathrm{~m}^{2}$ |
| OAI12X1 | $1.183 \mu \mathrm{~m}^{2}$ |
| OAI22X1 | $1.352 \mu \mathrm{~m}^{2}$ |
| OAI211X1 | $1.183 \mu \mathrm{~m}^{2}$ |
| OAI222X1 | $1.859 \mu \mathrm{~m}^{2}$ |
| MUX2X1 | $3.549 \mu \mathrm{~m}^{2}$ |
| DFFX1 | $2.535 \mu \mathrm{~m}^{2}$ |
| DFFX4 | $9.126 \mu \mathrm{~m}^{2}$ |
| DLX1 | $2.535 \mu \mathrm{~m}^{2}$ |
| FAX1 | $7.943 \mu \mathrm{~m}^{2}$ |
| FILLER | $0.169 \mu \mathrm{~m}^{2}$ |
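
Because every cell shares the $1.3 \mu \mathrm{~m}$ height and has a width that is a multiple of 130 nm, each area in Table III should correspond to a whole number of grid columns of $0.169 \mu \mathrm{~m}^{2}$ (the area of FILLER). The following Python sketch checks this for a subset of the cells. The helper name `grid_columns` and the dictionary are illustrative assumptions and not part of the design flow.

```python
# Sketch: express cell areas from Table III as a number of 130 nm grid columns.
# Assumes area = cell height x (columns x pitch); all names are illustrative.
CELL_HEIGHT_UM = 1.3    # fixed cell height
GATE_PITCH_UM = 0.130   # width granularity required by the PDK

AREAS_UM2 = {  # subset of Table III
    "FILLER": 0.169,
    "INV1X1": 0.507,
    "NAND2X1": 0.676,
    "AND2X1": 1.014,
    "DFFX1": 2.535,
    "DFFX4": 9.126,
}


def grid_columns(area_um2):
    """Return the cell width in units of the 130 nm pitch."""
    return round(area_um2 / (CELL_HEIGHT_UM * GATE_PITCH_UM))


columns = {name: grid_columns(area) for name, area in AREAS_UM2.items()}
for name, cols in columns.items():
    print(f"{name}: {cols} columns, {cols * GATE_PITCH_UM * 1000:.0f} nm wide")
```

Under these assumptions, AND2X1 occupies 6 columns, one fewer than a separate NAND2X1 (4) and INV1X1 (3) placed side by side.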



The main difference that must be accounted for is that the transistors are connected on another metal layer when dealing with multiple transistors in parallel. As mentioned, the bulk connections are all connected to the rails that are over the PMOS transistors and under the NMOS transistors. The rails are designed to be easy to stack side by side. In addition, the symmetrical rails allow similar transistors to be mirrored over/under the rails. To be compliant with the restrictions of the PDK and reduce mismatch and PVT-variations, the gates have a fixed pitch of 130 nm. Non-active poly is included for the reasons presented in [1].
2) Simple logic gates: Implementation of the most basic inverting logic cells (INV1, NAND2, NOR2) is very intuitive. Section II-B presents some information about how NAND2 and NOR2 can be implemented. Only two-input versions of the cells are included in this standard cell library.

AND2 and OR2 have been designed by placing a NAND2/NOR2 and an inverter in series. The area has been reduced slightly by letting the two sub-cells overlap, which gives a smaller cell than the synthesis tool could obtain by placing the two cells side by side.

XNOR2 and XOR2 have been designed using 8 transistors in a static CMOS configuration [6]. Inverters are included in the cells to provide the complements of the input signals within the cells. Note that, unlike the other logic gates presented in this section, the inverting function can be achieved without including an inverter on the output of the complementary cell.

INV1 is one of the few gates that has both an X1- and an X4-version. As this is one of the most used gates, the overall synthesized design can benefit by having more than one available drive strength.
3) Compound logic gates: How to draw the schematic for compound logic gates is presented in Section II-B1. All implemented compound cells are of And-Or-Invert (AOI) and Or-And-Invert (OAI) type. The non-inverting complements of the cells can be realized by the synthesis tool using an inverter on the output, and have therefore not been implemented in the current cell library. The following cells, with the given Boolean functions, have been implemented:

- AOI12X1: $Y=\overline{A B+C} \quad$ (Presented in Fig. 25)
- AOI22X1: $Y=\overline{A B+C D} \quad$ (Presented in Fig. 26)
- AOI112X1: $Y=\overline{A B+C+D} \quad$ (Presented in Fig. 27)
- AOI212X1: $Y=\overline{A B+C D+E} \quad$ (Presented in Fig. 28)
- AOI222X1: $Y=\overline{A B+C D+E F} \quad$ (Presented in Fig. 29)
- OAI12X1: $Y=\overline{(A+B) C} \quad$ (Presented in Fig. 30)
- OAI22X1: $Y=\overline{(A+B)(C+D)} \quad$ (Presented in Fig. 31)
- OAI211X1: $Y=\overline{(A+B) C D} \quad$ (Presented in Fig. 32)
- OAI222X1: $Y=\overline{(A+B)(C+D)(E+F)}$ (Presented in Fig. 33)
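
The Boolean functions listed above can be written as small reference models, which is one way to cross-check the truth tables in the datasheets. The Python sketch below covers four of the cells. The function names and the `truth_table` helper are assumptions made for illustration only.

```python
# Sketch: reference models for some of the AOI/OAI cells, following the
# Boolean functions in the list above. Inputs and outputs are 0 or 1.
from itertools import product


def aoi12(a, b, c):
    return int(not ((a and b) or c))


def aoi22(a, b, c, d):
    return int(not ((a and b) or (c and d)))


def oai12(a, b, c):
    return int(not ((a or b) and c))


def oai211(a, b, c, d):
    return int(not ((a or b) and c and d))


def truth_table(fn, n_inputs):
    """Return (inputs, output) pairs for every input combination."""
    return [(bits, fn(*bits)) for bits in product((0, 1), repeat=n_inputs)]


for bits, y in truth_table(aoi12, 3):
    print(bits, y)
```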

Inspecting the schematic of one of the larger compound logic gates shows that the maximum number of transistors in series between the output and the VDD/VSS rails can be higher than what would be required when using multiple two-input logic gates (for example, up to three transistors in AOI222X1, Fig. 29b, or OAI222X1, Fig. 33b).
4) Buffers: For the synthesis tool to be able to amplify or delay signals (given timing constraints), there should be at least one buffer in the cell library [7]. For several buffers to be attached to a data bus or similar, the buffer should be a tri-state buffer, so that only one buffer drives the bus at a time. In the current library, a tri-state buffer with an enable signal $E$ has been implemented. When $E$ is high, the output $Y$ is equal to the input $A$. When $E$ is low, the output is $Z$ (high-impedance), regardless of the value of the input.
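
The behaviour of the tri-state buffer on a shared bus can be sketched as follows. This is a simplified behavioural model in Python, not the cell implementation. The `resolve_bus` helper and the `"X"` value for conflicting drivers are assumptions of the sketch.

```python
# Sketch: behavioural model of the tri-state buffer and a shared bus.
# "Z" means high impedance. "X" (conflicting drivers) is an assumption
# of this model and is not a value produced by the cell itself.
def tristate_buffer(a, e):
    """Output follows input A when E is high, otherwise high impedance."""
    return a if e else "Z"


def resolve_bus(drivers):
    """Combine the outputs of all buffers attached to the same bus."""
    active = [value for value in drivers if value != "Z"]
    if not active:
        return "Z"
    if all(value == active[0] for value in active):
        return active[0]
    return "X"


bus = resolve_bus([tristate_buffer(1, 1), tristate_buffer(0, 0)])
print(bus)
```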

Due to time restrictions for the project, a digital buffer (without an enable signal), or an inverting tri-state buffer, has not been implemented.
5) Multiplexer: The $2:1$ multiplexer that has been implemented is presented in Fig. 34. The multiplexer passes either the value of input $A$ or the value of input $B$ to the output, depending on the value of the select signal, SEL. This is realized by ANDing the inputs with the select signal (inverted in the case of $A$). Because of the AND gates, only one input can be passed to the OR gate at a time.
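
The AND-OR structure described above corresponds to $Y = A \cdot \overline{SEL} + B \cdot SEL$. A minimal Python sketch of this function, with an illustrative function name, is:

```python
# Sketch: logic function of the 2:1 multiplexer, Y = A*!SEL + B*SEL.
def mux2(a, b, sel):
    not_sel = 1 - sel          # inverted select, used for input A
    return (a & not_sel) | (b & sel)


for sel in (0, 1):
    print(f"SEL={sel}: A=1,B=0 -> Y={mux2(1, 0, sel)}")
```
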
6) D-type flip-flop: For the D-type flip-flop, two drive strengths have been implemented (X1 and X4). As the flip-flop is one of the most timing-critical parts of the library, it is beneficial to have a faster version of the cell available. The implemented design is based on the Pass Gate DFF presented in [8]. This design was chosen as it achieved the best overall score for low-voltage implementations.

The DFFX1 is presented in Fig. 35, and DFFX4 is presented in Fig. 36. Note that flip-flops are necessary to synthesize sequential digital designs.
7) D-type latches: Some designs written in HDL require a latch. Various HDL designs were synthesized for testing purposes, and some of them could not be synthesized without a latch. Although the D-type latch is not required for the work presented in this report, it was implemented so that the library can also be used for such designs.

As the D-type latch is not a high priority for this project, only a single drive strength has been implemented. The DLX1, presented in Fig. 37, is an Active High Transparent Latch, with a non-inverted output.
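
The behaviour of an active-high transparent latch can be illustrated with a small Python model. The class and the port name `enable` are assumptions made for this sketch. The actual pin names of DLX1 are given in Fig. 37.

```python
# Sketch: behavioural model of an active-high transparent D-type latch
# with a non-inverted output. Names are illustrative, not the cell pins.
class DLatch:
    def __init__(self, q=0):
        self.q = q

    def update(self, d, enable):
        """Transparent while enable is high, holds the value otherwise."""
        if enable:
            self.q = d
        return self.q


latch = DLatch()
print(latch.update(1, 1))  # transparent: output follows D
print(latch.update(0, 0))  # opaque: output holds the previous value
```
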
8) Full Adder: A full adder may be implemented by the synthesis tool. However, by implementing a full-custom cell, one can exploit known optimizations, which may improve the overall performance of the synthesized design. The full adder (FAX1) that is implemented in this library is inspired by a FA implementation, using XNOR gates and a multiplexer, presented in [9]. A comparative study of multiple full adders was done in [10], where the XNOR based implementation obtained good results for both energy-efficiency and speed with a sub-threshold implementation. The full adder is presented in Fig. 38.
9) Filler: Although not an active part of the digital library, filler cells need to be implemented so that the $\mathrm{P} \& \mathrm{R}$ tool can fill empty space between cells. This ensures that the layout after P\&R does not contain any DRC errors caused by such gaps. There are a couple of things to keep in mind when designing the filler cell. Similar to other cells, the filler cell should contain symmetrical rails. This keeps the rails uniform across the layout, reducing mismatch and PVT-variations. Dummy poly is included for the same reasons. As the cell does not contain any components, a schematic is not required. The layout for the filler cell, named FILLER, is presented in Fig. 7. Some overlapping layers (N-well regions etc.) are included to remove any possible DRC errors due to arbitrary distances between the layers.


Fig. 7: Layout for FILLER

## B. Library characterization

The method used for the library characterization process is described in Section III-A2. As mentioned, the characterization was done for nominal temperature in the TT corner. The presented results therefore do not take PVT-variations into account.

However, it can be beneficial to analyze the robustness of the various cells. To achieve this, the different process corners were characterized over three temperatures: $25^{\circ} \mathrm{C},-20^{\circ} \mathrm{C}$ and $85^{\circ} \mathrm{C}$.

The most interesting operating conditions were found to be the following:

- $25^{\circ} \mathrm{C}$, TT-corner, Nominal conditions
- $85^{\circ} \mathrm{C}$, FF-corner, Highest speed and highest power consumption
- $-20^{\circ} \mathrm{C}$, SS-corner, Lowest speed and lowest power consumption

The results from comparisons of the different operating conditions are presented in Section V-A.

## C. Synthesis of digital designs

Before synthesizing a larger digital design, some smaller designs were used for some intermediate verification of the standard cell library. As the motivation was to verify the functionality of the standard cell library, the frequency was kept fairly low. Both designs were synthesized for 300 mV and 1 MHz .

These two designs were synthesized:

- A full adder, synthesized with the FAX1 cell excluded.
- An 8-bit counter with support for enable and loading data.

After synthesis, the designs went through $\mathrm{P} \& \mathrm{R}$ and extraction of parasitic components. Simulations were performed, by using PADE, on the netlists with parasitic components extracted.

1) Full adder: The full adder is a simple clocked adder, with carry-in and carry-out signals. The RTL code, written in SystemVerilog, is presented in Listing 1. At every positive clock edge, the carry-out, $C O$, and sum, $S$, are calculated as the sum of the input values ($A$, $B$ and $C I$).
```
module adder (
    input logic A,
    input logic B,
    input logic CI,
    input logic clk,
    output logic CO,
    output logic S
    );
    always_ff @(posedge clk) begin
        {CO,S} <= A + B + CI;
    end
endmodule
```

Listing 1: Implementation of the full adder

The synthesis was performed with the full adder cell, FAX1, excluded. If the synthesis tool had produced all the combinatorial logic with a single cell, it would have been more difficult to verify that the standard cell library is compatible with the whole design flow.

The simulation looped through the possible combinations of the inputs, and the output values were measured and inspected visually. For the simulation, the supply voltage was 300 mV and the frequency was 1 MHz . The results are presented in Section V-B.
2) Counter: The 8-bit counter is written in Verilog, as presented in Listing 2.

The counter can be reset with the reset signal. If the enable signal is high, out is incremented every clock cycle. The load signal can be used to load a value from data to the counter.

```
module top( out, data, load, enable, clk, reset);
output [7:0] out;
input [7:0] data;
input load, enable, clk, reset;
reg [7:0] out;
always @(posedge clk)
    if (reset) begin
        out <= 8'b0;
    end else if (load) begin
        out <= data;
    end else if (enable) begin
        out <= out + 1;
    end
endmodule
```

Listing 2: Implementation of the Counter

The testbench resets the counter for one clock cycle and enables it immediately after the reset. The enable signal goes low at $8 \mu \mathrm{~s}$ for one clock cycle before going high again. At the same time, the load signal goes high for a single clock cycle, while $0x55$ is written to data. The counter should then resume operation, counting up from $0x55$.
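
The expected behaviour of the counter can be sketched with a cycle-level Python model that mirrors the priority of reset, load and enable in Listing 2. The stimulus below only loosely follows the testbench description: the number of counting cycles before the load and the value before reset are assumptions, not values taken from the actual testbench.

```python
# Sketch: cycle-level model of the 8-bit counter in Listing 2.
def counter_step(out, data, load, enable, reset):
    """Return the counter value after one rising clock edge."""
    if reset:
        return 0
    if load:
        return data & 0xFF
    if enable:
        return (out + 1) & 0xFF   # wraps around at 8 bits
    return out


# Illustrative stimulus: one reset cycle, six counting cycles, one cycle
# with enable low and load high (data = 0x55), then two counting cycles.
stimulus = [dict(reset=1, load=0, enable=0, data=0)]
stimulus += [dict(reset=0, load=0, enable=1, data=0)] * 6
stimulus += [dict(reset=0, load=1, enable=0, data=0x55)]
stimulus += [dict(reset=0, load=0, enable=1, data=0)] * 2

out = 0xAA  # arbitrary value before reset (assumption)
trace = []
for s in stimulus:
    out = counter_step(out, s["data"], s["load"], s["enable"], s["reset"])
    trace.append(out)
print([hex(value) for value in trace])
```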

## D. PicoRV32

PicoRV32 ${ }^{4}$ is an open-source RISC-V CPU implementation written in Verilog. The CPU may be configured as a RV32E, RV32I, RV32IC, RV32IM or RV32IMC core. To simplify the design, the CPU was implemented as shown in Listing 3 (A. Djupdal, personal communication, May 12, 2021). This configuration reduces the number of registers, disables interrupts and 64-bit counters. Additionally, it allows instructions for comparisons and arithmetic operations to use two clock cycles, which relaxes timing requirements.

The motivation for implementing a RISC-V processor was to verify the functionality of the standard cell library, and obtain quantitative results of the performance of the library. The CPU was synthesized for 300 mV , over a range of frequencies. The maximum frequency that met timing restrictions after synthesis was found to be 3.2 MHz . Using the synthesized netlist for 300 mV and 3.2 MHz , the layout of the CPU was obtained after P\&R. The layout was verified to pass DRC and LVS, and parasitic components were extracted for the layout of the whole CPU. Extraction of the parasitic components was done similar to the process for extraction for each standard cell, explained in Section II-A1. All simulations were done on the main netlist obtained after the parasitic extraction. This implies that the CPU is optimized for 300 mV and 3.2 MHz by the synthesis tool.

1) $P \& R$ : Power rings were added around the CPU to allow the $P \& R$ tool to route the power rails. This was necessary to pass LVS and ensure that the netlist had consistent VDD/VSS rails. P\&R was done with a core utilization of $70 \%$. As the main focus was to verify that the standard cell library worked correctly, no efforts were made to increase the core utilization from the default value. Because of this, necessary filler cells were added to ensure that no DRC errors occurred.
2) Testbench: The testbench that has been used to simulate the CPU is shown in Listing 4 (A. Djupdal, personal communication, May 18, 2021). It works by executing multiple instructions that are given in memimage.hex.

Initially, the CPU is held in reset for eight clock cycles before processing the instructions. The simulation of the testbench finishes when the execution of the instructions in the memory is completed, or on a timeout if the CPU behaves incorrectly (i.e. when the frequency is higher than the CPU supports).

The content of memimage.hex executes the program presented in Listing 5 (A. Djupdal, personal communication, May 18, 2021). With eight clock cycles for resetting the CPU, the whole execution takes 82 clock cycles.

The CPU was synthesized for various values of supply voltage and clock frequency to obtain an estimate of the maximum possible frequency for each supply voltage. However, the testbench was simulated only on the netlist that was synthesized for 300 mV.

```
module top (
    input clk,
    input resetn,
    output wire mem_valid,
    output wire mem_instr,
    input mem_ready,
    output wire [31:2] mem_addr,
    output wire [31:0] mem_wdata,
    output wire [ 3:0] mem_wstrb,
    input [31:0] mem_rdata
);
    wire [31:0] mem_addr_i;
    parameter [31:0] STACKADDR = 32'h 0000_0400;
    parameter [31:0] PROGADDR_RESET = 32'h 0000_0000;
    assign mem_addr = mem_addr_i[31:2];
    picorv32 #(
        .ENABLE_COUNTERS64 (0),
        .ENABLE_REGS_16_31 (0),
        .ENABLE_REGS_DUALPORT (0),
        .LATCHED_MEM_RDATA (1),
        .CATCH_MISALIGN (0),
        .CATCH_ILLINSN (0),
        .TWO_STAGE_SHIFT (0),
        .TWO_CYCLE_COMPARE (1),
        .TWO_CYCLE_ALU (1),
        .STACKADDR (STACKADDR),
        .PROGADDR_RESET (PROGADDR_RESET),
        .ENABLE_IRQ (0)
    ) cpu (
        .clk (clk),
        .resetn (resetn),
        .mem_valid (mem_valid),
        .mem_instr (mem_instr),
        .mem_ready (mem_ready),
        .mem_addr (mem_addr_i),
        .mem_wdata (mem_wdata),
        .mem_wstrb (mem_wstrb),
        .mem_rdata (mem_rdata)
    );
endmodule
```

Listing 3: Top implementation of PicoRV32 (A. Djupdal, personal communication, May 12, 2021)

```
`timescale 1 ns / 1 ps
module tb_picorv32;
    reg clk;
    reg resetn;
    wire mem_valid;
    wire mem_instr;
    wire [31:2] mem_addr;
    wire [31:0] mem_wdata;
    wire [ 3:0] mem_wstrb;
    reg [31:0] mem_rdata;
    reg [31:0] mem [0:255];
    integer cyclecounter;
    initial begin
        $readmemh("memimage.hex", mem);
        cyclecounter = 0;
        clk = 0;
        resetn = 0;
        #8000 resetn = 1;
    end
    always #500 clk=~clk;
    // stop if timeout
    always @(posedge clk) begin
        cyclecounter = cyclecounter + 1;
        if(cyclecounter >= 100) begin
                $display("Error, timeout");
                $stop;
        end
    end
    // memory
    always @(*) begin
        if (mem_valid) begin
            mem_rdata <= mem[mem_addr];
                if (mem_wstrb[0]) mem[mem_addr][ 7: 0] <= mem_wdata[ 7: 0];
                if (mem_wstrb[1]) mem[mem_addr][15: 8] <= mem_wdata[15: 8];
                if (mem_wstrb[2]) mem[mem_addr][23:16] <= mem_wdata[23:16];
                if (mem_wstrb[3]) mem[mem_addr][31:24] <= mem_wdata[31:24];
        end
    end
    // exit when firmware exits
    always @(*) begin
        if( (mem_valid) &&
                (mem_addr == 30'h0000_080) &&
                (mem_wstrb == 4'hf) &&
                (mem_wdata == 32'h0000_00ad)
        ) begin
                $display("Test program ended correctly");
                $stop;
        end
    end
    top chip (
        .clk (clk),
        .resetn (resetn),
        .mem_valid (mem_valid),
        .mem_instr (mem_instr),
        .mem_ready (1),
        .mem_addr (mem_addr),
        .mem_wdata (mem_wdata),
        .mem_wstrb (mem_wstrb),
        .mem_rdata (mem_rdata)
    );
endmodule
```

Listing 4: Testbench for simulation of PicoRV32 (A. Djupdal, personal communication, May 18, 2021)

```
#define MEM_RESULT 512
int main(int argc, char *argv[]) {
    int volatile *res = (int*)MEM_RESULT;
    *res = 0xad;
    return 0;
}
```

Listing 5: Program executed when running the instructions in memimage.hex (A. Djupdal, personal communication, May 18, 2021)
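
The relation between the C program and the exit condition in the testbench can be checked with a few lines of arithmetic. The Python sketch below assumes that `mem_addr` carries bits [31:2] of the byte address, as in Listing 3, and reads the timing constants from Listing 4. The variable names are illustrative.

```python
# Sketch: link the byte address in Listing 5 to the word address that the
# testbench in Listing 4 waits for, and derive the reset length.
MEM_RESULT = 512                  # byte address written by the C program
word_address = MEM_RESULT >> 2    # mem_addr holds bits [31:2]
print(hex(word_address))          # prints 0x80, i.e. 30'h0000_080

RESET_DELAY_NS = 8000             # "#8000 resetn = 1" with a 1 ns time unit
CLOCK_PERIOD_NS = 2 * 500         # "always #500 clk=~clk"
reset_cycles = RESET_DELAY_NS // CLOCK_PERIOD_NS
clock_mhz = 1000 / CLOCK_PERIOD_NS
print(reset_cycles, clock_mhz)
```

With these constants the testbench as listed runs at 1 MHz, so the clock period has to be changed when simulating other frequencies.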

## V. Results

The library characterization produces datasheets for each cell, presented in Appendix B. Additional results and further elaboration of the results regarding the library characterization are presented in Section V-A.

The simulations of the 8-bit counter and full adder were intended to verify the functionality of the library. The results are presented in Section V-B and V-C.

Synthesis results of the PicoRV32 CPU are presented in Section V-D1. These results are based on reports generated by the synthesis tool.

Section V-D2 presents the results that are obtained by simulating the testbench of the PicoRV32 processor (presented in Section IV-D2). All simulations were done on the same netlist, synthesized for 300 mV and 3.2 MHz, after $\mathrm{P} \& \mathrm{R}$ and extraction of parasitic capacitance. As mentioned, this netlist will be referred to as the main netlist.

## A. Library characterization

The library characterization produced a library file and a Verilog file containing Verilog descriptions of the cells. Additionally, the characterization produced datasheets of each cell. All datasheets characterized for 300 mV are presented in Appendix B. As mentioned in Section IV-B, the characterization was performed for nominal circumstances, which is $25^{\circ} \mathrm{C}$ in the TT-corner. These are the results exactly as extracted from the library characterization, only edited for formatting. Notice that area and process corners are absent from the datasheets. The correct area of the cells is presented in Table III.

The cells were also characterized with other operating conditions, as mentioned in Section IV-B. Some key results are presented in Table IV. The $\Delta$ delay and $\Delta$ power columns give the maximum delay and leakage power relative to the nominal conditions ($25^{\circ} \mathrm{C}$, TT-corner). They are calculated as shown in eq. (6), where Delay is the maximum delay as shown in the table, and Leakage is the leakage power. $\text{Delay}_{nom}$ and $\text{Leakage}_{nom}$ are the values from the nominal operating conditions ($25^{\circ} \mathrm{C}$, TT-corner).

$$
\begin{equation*}
\Delta \text { delay }=\frac{\text { Delay }}{\text { Delay }_{\text {nom }}}, \quad \Delta \text { power }=\frac{\text { Leakage }}{\text { Leakage }_{\text {nom }}} \tag{6}
\end{equation*}
$$

As the library contains many cells, only a selection is presented here. The characterization results for all cells are presented in Appendix C.

TABLE IV: Library characterization results for different operating conditions

| Operating conditions | Cell | Maximum delay | $\Delta$ delay | Leakage power | $\Delta$ power |
| :--- | :--- | :--- | :--- | :--- | :--- |
|  | INV1X1 | 10.89 ns | 1.0 | 0.14 nW | 1.0 |
|  | INV1X4 | 7.28 ns | 1.0 | 0.57 nW | 1.0 |
|  | NAND2X1 | 16.64 ns | 1.0 | 0.29 nW | 1.0 |
|  | AND2X1 | 16.03 ns | 1.0 | 0.35 nW | 1.0 |
| $25^{\circ} \mathrm{C}$, TT-corner | XNOR2X1 | 25.9 ns | 1.0 | 0.46 nW | 1.0 |
|  | AOI12X1 | 21.53 ns | 1.0 | 0.29 nW | 1.0 |
|  | AOI112X1 | 32.35 ns | 1.0 | 0.29 nW | 1.0 |
|  | AOI222X1 | 36.83 ns | 1.0 | 0.29 nW | 1.0 |
|  | OAI222X1 | 28.02 ns | 1.0 | 0.43 nW | 1.0 |
|  |  |  |  |  |  |
|  | INV1X1 | 48.8 ns | 4.48 | 0.01 nW | 0.06 |
|  | INV1X4 | 20.83 ns | 2.86 | 0.03 nW | 0.06 |
|  | NAND2X1 | 89.9 ns | 5.4 | 0.02 nW | 0.06 |
|  | AND2X1 | 89.7 ns | 5.6 | 0.02 nW | 0.06 |
| $-20{ }^{\circ} \mathrm{C}$, SS-corner | XNOR2X1 | 175.43 ns | 6.77 | 0.03 nW | 0.06 |
|  | AOI12X1 | 145.14 ns | 6.74 | 0.02 nW | 0.06 |
|  | AOI112X1 | 241.41 ns | 7.46 | 0.02 nW | 0.06 |
|  | AOI222X1 | 280.24 ns | 7.61 | 0.02 nW | 0.06 |
|  | OAI222X1 | 176.2 ns | 6.29 | 0.02 nW | 0.06 |
|  |  |  |  |  |  |
|  | INV1X1 | 4.05 ns | 0.37 | 2.89 nW | 20.27 |
|  | INV1X4 | 2.17 ns | 0.3 | 11.56 nW | 20.27 |
|  | NAND2X1 | 4.5 ns | 0.27 | 5.45 nW | 18.67 |
|  | AND2X1 | 3.73 ns | 0.23 | 8.44 nW | 23.89 |
| $85^{\circ} \mathrm{C}$, FF-corner | XNOR2X1 | 5.84 ns | 0.23 | 11.14 nW | 24.17 |
|  | AOI12X1 | 5.92 ns | 0.27 | 6.03 nW | 20.58 |
|  | AOI112X1 | 7.08 ns | 0.22 | 9.07 nW | 31.22 |
|  | AOI222X1 | 7.54 ns | 0.2 | 8.72 nW | 30.21 |
|  | OAI222X1 | 6.23 ns | 0.22 | 8.0 nW | 18.42 |
|  |  |  |  |  |  |
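
The $\Delta$ delay column can be reproduced from the maximum delays in Table IV by dividing each value by the nominal one. The Python sketch below does this for three cells. It assumes that the second group of rows is the $-20^{\circ} \mathrm{C}$, SS-corner and the third group is the $85^{\circ} \mathrm{C}$, FF-corner. The leakage values are not recomputed, since the table lists them rounded to two decimals.

```python
# Sketch: reproduce the delta-delay column of Table IV.
# Tuples hold the maximum delay in ns for (nominal, slow group, fast group).
MAX_DELAY_NS = {
    "INV1X1": (10.89, 48.8, 4.05),
    "INV1X4": (7.28, 20.83, 2.17),
    "AOI222X1": (36.83, 280.24, 7.54),
}


def delta(value, nominal):
    """Ratio to the nominal value, as used for the delta columns."""
    return value / nominal


for cell, (nom, slow, fast) in MAX_DELAY_NS.items():
    print(f"{cell}: slow {delta(slow, nom):.2f}, fast {delta(fast, nom):.2f}")
```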

## B. Full Adder

The synthesis and P\&R for the full adder completed without any errors. Both DRC and LVS checks passed. The simulation results for the full adder are presented in Fig. 8. By visually inspecting the graph, it is clear that the clock period is $1 \mu \mathrm{~s}$.


Fig. 8: Simulation results of full adder

## C. Counter

The 8-bit counter was synthesized, placed, and routed without any reported errors. DRC and LVS did not report errors.

The simulation results of the counter are presented in Fig. 9. Note that the data signal has been excluded from the graph to simplify the figure.


Fig. 9: Simulation results of 8-bit counter

## D. PicoRV32

1) Synthesis and $P \& R$ : The synthesis of the main netlist produced the following results:

- Supply voltage: 300 mV
- Frequency: 3.2 MHz
- Area: $30219 \mu \mathrm{~m}^{2}$
- Number of gates: 17555
- Total power consumption: $8.82 \mu \mathrm{~W}$
- Dynamic power consumption: $4.26 \mu \mathrm{~W}$
- Leakage power consumption: $4.56 \mu \mathrm{~W}$
- Timing slack: 11 ps

After P\&R, the measured area was $32604 \mu \mathrm{~m}^{2}$, which is an increase of approximately $7.9 \%$.
The layout is shown in Fig. 10. By inspecting the layout, one can see that the digital logic does not fill the entire layout. However, the area that does not contain digital logic is not empty but has been filled by the filler cells.

On the left and right sides of the design, horizontal wires can be seen between the power rings and the core. These are the results of routing the power nets.

The pairs of frequencies and voltages presented in Table V were synthesizable with a positive timing slack. All simulation results presented in Section V-D2 were obtained for the main netlist, but the synthesis results were used to determine which frequencies to use in simulation.


Fig. 10: Layout of PicoRV32 after P\&R

TABLE V: Supported pairs of frequency and voltages after synthesis

| Supply voltage | Frequency | Timing slack |
| :--- | :--- | :--- |
| 250 mV | 1.25 MHz | 421 ps |
| 300 mV | 3.2 MHz | 11 ps |
| 350 mV | 5 MHz | 2644 ps |
| 400 mV | 10 MHz | 3425 ps |
| 450 mV | 20 MHz | 37 ps |
| 500 mV | 50 MHz | 0 ps |
| 550 mV | 100 MHz | 1 ps |
| 1 V | 250 MHz | 4 ps |

a) Effect of implementing compound gates: Section IV-A3 presents the implementation of compound logic gates of AOI- and OAI-type. To quantify their effect, the CPU was synthesized for 300 mV and 2 MHz with the compound gates both included and excluded. The differences in the results are presented in Table VI. Note that the frequency differs from that of the main netlist, so the resulting netlists differ from the main netlist as well.

TABLE VI: Comparison of synthesis results with compound gates included or excluded

| Compound gates | Area | Power Consumption | Leakage Power | Dynamic Power | Timing slack |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Included | $31709 \mu \mathrm{~m}^{2}$ | $8.23 \mu \mathrm{~W}$ | $4.84 \mu \mathrm{~W}$ | $3.40 \mu \mathrm{~W}$ | 389 ps |
| Excluded | $33229 \mu \mathrm{~m}^{2}$ | $8.39 \mu \mathrm{~W}$ | $5.13 \mu \mathrm{~W}$ | $3.26 \mu \mathrm{~W}$ | 629 ps |
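
To make the comparison in Table VI easier to read, the relative change obtained by including the compound gates can be computed from the tabulated values. The Python sketch below uses the netlist without compound gates as the reference. The helper name and dictionary keys are illustrative.

```python
# Sketch: relative change in the synthesis results of Table VI when the
# compound gates are included, using the "excluded" netlist as reference.
def relative_change(included, excluded):
    return (included - excluded) / excluded


RESULTS = {  # (included, excluded)
    "area_um2": (31709, 33229),
    "total_power_uW": (8.23, 8.39),
    "leakage_power_uW": (4.84, 5.13),
    "dynamic_power_uW": (3.40, 3.26),
}

changes = {name: relative_change(*pair) for name, pair in RESULTS.items()}
for name, change in changes.items():
    print(f"{name}: {100 * change:+.1f} %")
```

With these values, area, total power and leakage power decrease when the compound gates are included, while the dynamic power increases.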

b) Area: The area reported for the main netlist is approximately $30219 \mu \mathrm{~m}^{2}$. However, the area varies with the frequency and supply voltage, as shown in Fig. 11.


Fig. 11: Area as a function of supply voltage and frequency

The various synthesized results utilized different gates and drive strengths to produce the results. Fig. 12 presents how many INV1X1, INV1X4, DFFX1 and DFFX4 gates were included in the synthesized netlist for the following pairs of frequency and voltage:

- $V D D=300 \mathrm{mV}, f=3.2 \mathrm{MHz}$ (main netlist)
- $V D D=500 \mathrm{mV}, f=50 \mathrm{MHz}$
- $V D D=1 \mathrm{~V}, \quad f=250 \mathrm{MHz}$


Fig. 12: Logic gates included in the synthesized results

2) Simulation: The pairs of frequencies and supply voltages presented in Table V were results after synthesis and had to be adjusted when simulating the main netlist. In most cases, the supply voltage had to be increased to support the given frequency. Table VII presents the pairs of supply voltages and frequencies that were functional, with the average power consumption through the testbench. Additionally, the maximum frequency that was supported for the supply voltage of 300 mV was found to be 2.1 MHz.

The supported frequencies and the power consumption are presented as functions of supply voltage in Fig. 13 and Fig. 14, plotted on a logarithmic axis.

The energy consumed by the testbench as a function of supply voltage is shown in Fig. 15. Note that the frequency is not constant, but varies with the supply voltage as listed in Table VII. The total energy consumption is calculated as explained in Section III-B.

TABLE VII: Simulation results

| Supply voltage | Frequency | Average Power Consumption |
| :--- | :--- | :--- |
| 270 mV | 1.25 MHz | $1.2 \mu \mathrm{~W}$ |
| 300 mV | 2.1 MHz | $1.5 \mu \mathrm{~W}$ |
| 350 mV | 3.2 MHz | $2.2 \mu \mathrm{~W}$ |
| 400 mV | 5 MHz | $3.0 \mu \mathrm{~W}$ |
| 450 mV | 10 MHz | $4.7 \mu \mathrm{~W}$ |
| 500 mV | 20 MHz | $7.9 \mu \mathrm{~W}$ |
| 600 mV | 50 MHz | $19.9 \mu \mathrm{~W}$ |
| 650 mV | 100 MHz | $45.9 \mu \mathrm{~W}$ |
| 1 V | 250 MHz | $255.9 \mu \mathrm{~W}$ |
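
The energy per testbench run follows from eq. 5 and the values in Table VII. The Python sketch below assumes 82 clock cycles per run, as stated for the testbench, and uses the rounded table values, so the results may differ slightly from Fig. 15. The names are illustrative.

```python
# Sketch: energy per testbench run from Table VII, using eq. 5.
CYCLES = 82  # clock cycles for the whole testbench, including reset

# (supply voltage in V, frequency in Hz, average power in W)
SIMULATION_RESULTS = [
    (0.27, 1.25e6, 1.2e-6),
    (0.30, 2.1e6, 1.5e-6),
    (0.35, 3.2e6, 2.2e-6),
    (0.40, 5e6, 3.0e-6),
    (0.45, 10e6, 4.7e-6),
    (0.50, 20e6, 7.9e-6),
    (0.60, 50e6, 19.9e-6),
    (0.65, 100e6, 45.9e-6),
    (1.00, 250e6, 255.9e-6),
]


def energy_joule(p_avg, frequency, cycles=CYCLES):
    """Eq. 5: E = P_avg * cycles / f."""
    return p_avg * cycles / frequency


energies = [(vdd, energy_joule(p, f)) for vdd, f, p in SIMULATION_RESULTS]
for vdd, energy in energies:
    print(f"{vdd:.2f} V: {energy * 1e12:.1f} pJ")

vdd_min, e_min = min(energies, key=lambda item: item[1])
print(f"minimum energy point: {vdd_min:.2f} V")
```

With the tabulated values, the minimum lies at 500 mV and 20 MHz, with the 600 mV point only marginally higher.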



Fig. 13: Supported frequencies


Fig. 14: Power consumption


Fig. 15: Energy consumption

## VI. DISCUSSION

The implemented standard cell library has been tested through various steps. The first step was to verify that the library characterization worked as intended. Inspection of the truth tables in the datasheets generated by Liberate shows that all cells perform the desired function.

The next step was to verify the functionality of synthesized designs. Before working with the PicoRV32, two smaller designs were tested: A full adder and an 8-bit counter. By verifying and fixing issues with these smaller designs, finding bugs in the standard cell library was an easier process. These designs are discussed in Section VI-B1 and VI-B2.

When the adder and the counter were functional, the PicoRV32 CPU was synthesized. The implementation and results are discussed in Section VI-B3.

## A. Library characterization

As mentioned, the datasheets presented in Appendix B contain logical functions and truth tables for each cell. Some of the cells have functions written in a form that is not necessarily the intuitive one. For example, the function for NAND2X1 is given as $(!A)+(!B)$, where one might expect $!(A * B)$. In these cases, one can verify that the logic functions are equivalent by using De Morgan's theorems [11]. However, for more complex logical functions (for example, AOI222X1, OAI222X1, or FAX1), it might be simpler to verify the functionality by inspecting the truth tables in the datasheets.
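
For a small cell such as NAND2X1, the equivalence of the two forms can also be confirmed by exhaustive comparison. The following Python sketch is only an illustration of that check. The function names are assumptions.

```python
# Sketch: exhaustive check that (!A)+(!B) equals !(A*B) for all inputs.
from itertools import product


def datasheet_nand2(a, b):
    return (not a) or (not b)   # form given in the datasheet


def expected_nand2(a, b):
    return not (a and b)        # intuitive NAND form


equivalent = all(
    datasheet_nand2(a, b) == expected_nand2(a, b)
    for a, b in product((False, True), repeat=2)
)
print(equivalent)
```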

As the logic cells are functional, the next topic to discuss is the behavior of the cells.
All cells were implemented with Super Low VT (SLVT) transistors. By having the lowest possible threshold voltage, the speed of the transistors is maximized. However, this has a drawback, which is an increased leakage current. This can result in high static power consumption. As seen in the datasheets, the leakage power of the cells can range from 0.1425 nW (INV1X1) to 2.5677 nW (DFFX4).

The leakage power for the cells with multiple drive strengths is shown in Table VIII. For both cells, the leakage power of the X4-version is approximately 4 times that of the X1-version. This suggests that the leakage power increases approximately linearly with the number of transistors in parallel.

TABLE VIII: Leakage power for different drive strengths

| Cell | INV1X1 | INV1X4 | DFFX1 | DFFX4 |
| :--- | :--- | :--- | :--- | :--- |
| Leakage | 0.1425 nW | 0.5701 nW | 0.6419 nW | 2.5677 nW |
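
The factor of approximately 4 can be read directly from the values in Table VIII, as the short Python sketch below shows. The variable names are illustrative.

```python
# Sketch: leakage ratio between the X4- and X1-versions in Table VIII.
LEAKAGE_NW = {
    "INV1X1": 0.1425,
    "INV1X4": 0.5701,
    "DFFX1": 0.6419,
    "DFFX4": 2.5677,
}

inv_ratio = LEAKAGE_NW["INV1X4"] / LEAKAGE_NW["INV1X1"]
dff_ratio = LEAKAGE_NW["DFFX4"] / LEAKAGE_NW["DFFX1"]
print(f"INV1: {inv_ratio:.3f}, DFF: {dff_ratio:.3f}")
```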

The width of both PMOS and NMOS transistors was chosen to be 200 nm . The reason for this is that the rise time and fall time were balanced for the inverter.

Inspecting the delay of various combinatorial cells shows that the rise time and fall time are not necessarily balanced. The maximum delay for rising and falling output for a selection of cells is presented in Table IX.

TABLE IX: Delay for rising and falling output for a selection of cells

| Cell | Delay, rising output | Delay, falling output |
| :--- | :--- | :--- |
| INV1X1 | 10.89 ns | 10.04 ns |
| INV1X4 | 7.28 ns | 7.13 ns |
| NAND2X1 | 11.04 ns | 16.64 ns |
| NOR2X1 | 20.62 ns | 10.11 ns |
| AOI222X1 | 36.83 ns | 18.89 ns |
| OAI222X1 | 23.80 ns | 28.02 ns |
| DFFX1 | 14.17 ns | 69.21 ns |
| DFFX4 | 10.58 ns | 44.07 ns |

Both inverters (X1- and X4-version) have an approximately balanced rise time and fall time, with a marginally higher rise time. NAND2X1 has a higher fall time, while NOR2X1 has a higher rise time.

The schematic in Fig. 19b shows that NAND2X1 has two NMOS transistors between the output and VSS, and one PMOS transistor between the output and VDD. Similarly, the schematic in Fig. 21b shows that NOR2X1 has only one NMOS transistor between the output and VSS, but two PMOS transistors between the output and VDD.

Transistors in series give a weaker path than a single transistor of the same width. This can explain why there is an opposite imbalance in the rise time and fall time for the two cells. By the same reasoning, the same can be observed for AOI222X1 in Fig. 29b and OAI222X1 in Fig. 33b.

The most unbalanced cell in the library is the D-type flip-flop. As shown in Table IX, the delay for falling output is over four times higher than the delay for rising output for both X1- and X4-versions of the cell. As mentioned in Section IV-A6, the design was inspired by [8], where it is stated that the n-channel and p-channel pass gates make the rise and fall times highly asymmetrical. Together with the fact that the transistor sizes were not balanced for this cell in the given technology and implementation, this can explain the imbalance seen in these results.
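The imbalance for each cell can be quantified from Table IX. The sketch below computes which output edge is slower and by what ratio; the names `DELAYS_NS` and `imbalance` are illustrative only.

```python
# Maximum delays in ns, copied from Table IX: (rising output, falling output).
DELAYS_NS = {
    "INV1X1": (10.89, 10.04),
    "INV1X4": (7.28, 7.13),
    "NAND2X1": (11.04, 16.64),
    "NOR2X1": (20.62, 10.11),
    "AOI222X1": (36.83, 18.89),
    "OAI222X1": (23.80, 28.02),
    "DFFX1": (14.17, 69.21),
    "DFFX4": (10.58, 44.07),
}


def imbalance(cell: str):
    """Return the slower output edge and the slow/fast delay ratio."""
    rise, fall = DELAYS_NS[cell]
    slower = "rising" if rise > fall else "falling"
    return slower, max(rise, fall) / min(rise, fall)


if __name__ == "__main__":
    for cell in DELAYS_NS:
        slower, ratio = imbalance(cell)
        print(f"{cell:9s} slower edge: {slower:7s} ratio: {ratio:.2f}")
```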

1) Library characterization for different operating conditions: As mentioned in Section V-A, the library was characterized for 300 mV, under nominal operating conditions ($25^{\circ} \mathrm{C}$ in the TT-corner). To estimate the effect of PVT-variations, the library was also characterized for different operating conditions. The results presented in Table IV and Appendix C raise some points worth noting.

When comparing INV1X1 and INV1X4, the change in delay is lower for the higher drive strength in cold conditions.

The larger combinatorial cells (the AOI- and OAI-cells) have a higher variation in delay than most of the smaller cells. The exceptions are NOR2X1 and OR2X1, which are more comparable to the compound gates. It can be challenging to assess the severity of this variation when considering only individual cells. Section V-D1a shows that the synthesis results improved with regard to power consumption, area, and timing by including compound gates. However, the benefits acquired must be balanced against the increase in PVT-variations.

The D-type flip-flops experienced a different variation in delay in cold conditions than the inverters. DFFX1 and DFFX4 had an increase in delay by a factor of 10.81 and 10.41, respectively. In hot conditions, the delay was reduced by a factor of exactly 10. Overall, the flip-flops are the cells whose delay varies the most with PVT-variations.

## B. Synthesized Designs

The synthesis and Place And Route (P\&R) were performed for three digital designs, each increasingly complex. By increasing the complexity gradually, the process of discovering and understanding the root of problems in the implementation was much easier. When transitioning to a more complex design, there was not an overwhelming number of bugs, as most had been fixed while working with the simpler designs.

1) Full Adder: The first design that was synthesized was a full adder. Even though the design is quite simple, several steps had to be performed correctly to produce any meaningful simulation results. The first step was to ensure that the library files were readable and compatible with the synthesis tool. Secondly, the P\&R tool needed correct Library Exchange Format (LEF) files to be able to place the cells as they were intended in the design phase. Additionally, the routing had to be performed, which also required a correct LEF file. The layout and Verilog netlist had to be exported correctly, and the layout had to pass Design Rule Check (DRC) and Layout Versus Schematic (LVS) checks. When this was finished, the parasitic components could be extracted, and simulation could be performed.

The simulation results are presented in Section V-B. A representation of the results has been prepared in Table X to make the results more readable. Notice that the time in the table is on each positive clock edge. The outputs ( $S$ and $C O$ ) are measured by the value that appears shortly after the positive clock edge.

By comparing these results to a truth table for a full adder (for example, in the datasheet for FAX1 in Appendix B, or in [6]), it is clear that the full adder is functional. This indicates that the synthesis, P\&R and parasitic extraction provided a netlist that represents a functional full adder.

TABLE X: Digital results, representing the values from Fig. 8

| Time | A | B | CI | S | CO |
| :--- | :--- | :--- | :--- | :--- | :--- |
| $1 \mu \mathrm{~s}$ | 0 | 0 | 0 | 0 | 0 |
| $2 \mu \mathrm{~s}$ | 1 | 0 | 0 | 1 | 0 |
| $3 \mu \mathrm{~s}$ | 0 | 1 | 0 | 1 | 0 |
| $4 \mu \mathrm{~s}$ | 1 | 1 | 0 | 0 | 1 |
| $5 \mu \mathrm{~s}$ | 0 | 0 | 1 | 1 | 0 |
| $6 \mu \mathrm{~s}$ | 1 | 0 | 1 | 0 | 1 |
| $7 \mu \mathrm{~s}$ | 0 | 1 | 1 | 0 | 1 |
| $8 \mu \mathrm{~s}$ | 1 | 1 | 1 | 1 | 1 |

When inspecting the output signals, $S$ and $C O$, it is clear that the slew for a rising edge is much steeper than the falling edge. A probable cause for this is the long fall time on the output of the DFFX1 and DFFX4 cells, as discussed in Section VI-A.
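The comparison of Table X against the expected full-adder behavior can be sketched as follows. The reference model `full_adder` is a generic arithmetic description, not the netlist or the FAX1 cell itself, and the names are illustrative only.

```python
# Rows of Table X as (A, B, CI, S, CO), in the order they appear.
TABLE_X = [
    (0, 0, 0, 0, 0),
    (1, 0, 0, 1, 0),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 1),
    (0, 0, 1, 1, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 1, 0, 1),
    (1, 1, 1, 1, 1),
]


def full_adder(a: int, b: int, ci: int):
    """Reference model: returns (sum bit, carry-out bit)."""
    total = a + b + ci
    return total & 1, total >> 1


def mismatches(rows):
    """Return every row whose (S, CO) differs from the reference model."""
    return [row for row in rows if full_adder(*row[:3]) != row[3:]]


if __name__ == "__main__":
    print(mismatches(TABLE_X))
```

An empty list means that every simulated row agrees with the reference model.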
2) 8-bit Counter: The second digital design that was implemented was an 8-bit counter, presented in Section IV-C2. It is still a simple design, but it utilized more cells than the full adder. Although many issues in the standard cell library were corrected when implementing the full adder, the increased complexity of the counter allowed more problems to be discovered. With an increased number of utilized cells, some edge cases had to be taken into account.

When the necessary steps had been completed correctly, the counter was simulated to produce the results presented in Section V-C. The simulation results are presented in Table XI for readability.

TABLE XI: Digital results, representing the values from Fig. 9

| Time | reset | enable | load | data[7:0] (HEX) | out[7:0] (BIN) | out[7:0] (HEX) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $1 \mu \mathrm{~s}$ | 1 | 0 | 0 | 0x00 | 00000000 | 0x00 |
| $2 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000001 | 0x01 |
| $3 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000010 | 0x02 |
| $4 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000011 | 0x03 |
| $5 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000100 | 0x04 |
| $6 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000101 | 0x05 |
| $7 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 00000110 | 0x06 |
| $8 \mu \mathrm{~s}$ | 0 | 0 | 1 | 0x55 | 01010101 | 0x55 |
| $9 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01010110 | 0x56 |
| $10 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01010111 | 0x57 |
| $11 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01011000 | 0x58 |
| $12 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01011001 | 0x59 |
| $13 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01011010 | 0x5A |
| $14 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01011011 | 0x5B |
| $15 \mu \mathrm{~s}$ | 0 | 1 | 0 | 0x00 | 01011100 | 0x5C |

When the counter is enabled at time $2 \mu \mathrm{~s}$, the counter starts incrementing the output. The output is incremented every clock cycle until the enable signal goes low at time $8 \mu \mathrm{~s}$. At this time, 0x55 is written to data and load is set high. This results in 0x55 on the output. When enable is high again, the output is incremented every clock cycle, starting from 0x55.

The output of the counter works as described by the RTL file, which further verifies that the standard cell library is functional.
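The behavior seen in Table XI can be reproduced with a small behavioral model. This is a sketch only: the RTL file is not reproduced here, so the priority order (reset over load over enable) is an assumption chosen to match the table, and all names are illustrative.

```python
def counter_step(out: int, reset: int, enable: int, load: int, data: int) -> int:
    """Next 8-bit counter value after one positive clock edge.

    Assumed priority: reset, then load, then enable.
    """
    if reset:
        return 0
    if load:
        return data & 0xFF
    if enable:
        return (out + 1) & 0xFF  # wraps around after 0xFF
    return out


# Stimuli per clock edge as (reset, enable, load, data), following Table XI.
STIMULI = (
    [(1, 0, 0, 0x00)]
    + [(0, 1, 0, 0x00)] * 6
    + [(0, 0, 1, 0x55)]
    + [(0, 1, 0, 0x00)] * 7
)

# Output column of Table XI.
EXPECTED = [0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
            0x55, 0x56, 0x57, 0x58, 0x59, 0x5A, 0x5B, 0x5C]


def simulate(stimuli, start: int = 0):
    """Apply the stimuli in order and record the output after each edge."""
    out, trace = start, []
    for reset, enable, load, data in stimuli:
        out = counter_step(out, reset, enable, load, data)
        trace.append(out)
    return trace


if __name__ == "__main__":
    print(simulate(STIMULI) == EXPECTED)
```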

The same difference in fall time and rise time can be observed for the counter as for the full adder.
3) PicoRV32: The layout of the CPU after P\&R, shown in Fig. 10, shows three important points.

Firstly, the design utilized the filler cells correctly. The area that does not contain digital logic was filled as intended.

Secondly, the horizontal wires connect the VDD/VSS-rails through the cells to the power rings around the core as intended.

The last point requires closer attention to the layout. By inspecting the labels in the lower right corner, one can see that the inner power ring is connected to VDD, and the outer ring is connected to VSS. The horizontal wires for the power routing are connected in an alternating fashion to the power rings. Every second rail is connected to VDD, and every other to VSS. As there is no gap between the rails, the figure shows that the cells are placed in an alternating fashion, where every second row is flipped. This implies that the P\&R tool can utilize the symmetry of the cells as intended. There were no reported errors from DRC and LVS.

It should be mentioned that the core utilization of the CPU was $70 \%$, which is the default value. For products intended for production, this would probably be regarded as sub-optimal. As the wasted area is directly correlated to an increase in cost, the core utilization should be increased. However, it was kept fairly low as it was beneficial to verify that the filler cells worked correctly.
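The relation between core utilization and area can be made explicit. As a sketch, assuming core utilization $U$ is defined as the ratio of the total standard cell area $A_{\mathrm{cells}}$ to the core area $A_{\mathrm{core}}$ (symbols introduced here for illustration):

$$
A_{\mathrm{core}} = \frac{A_{\mathrm{cells}}}{U}, \qquad U = 0.70 \;\Rightarrow\; A_{\mathrm{core}} \approx 1.43\, A_{\mathrm{cells}}
$$

Under this definition, about $30 \%$ of the core area is occupied by filler cells rather than logic.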

Table VI shows synthesis results with AOI- and OAI-cells included and excluded. By utilizing the compound logic gates, there was an improvement in speed, area, and static power consumption. The improvement was expected, as this was the motivation to implement the compound logic gates. However, the improvement was only approximately $5 \%$ regarding the area and approximately $6 \%$ regarding leakage. As the improvement is noticeable, it would be hard to argue against including the compound logic gates. Nevertheless, as mentioned in Section V-A, the compound logic gates had slightly higher PVT-variations, which should be taken into account.

Simulation results, presented in Table VII, demonstrate combinations of frequencies and supply voltages that were supported in simulation. The synthesis results, presented in Table V, present other pairs of frequency and voltage that are supported. This difference can be attributed to the following points:

1) Simulations are performed on a netlist containing parasitic components.
2) The netlists generated by the synthesis tool are different from the main netlist.

The second point is only relevant for the simulations of other supply voltages than 300 mV. However, the simulation results show that the supply voltage had to be increased to 350 mV to support a 3.2 MHz clock. For this reason, the circuit was simulated at 300 mV to find the maximum clock frequency at the nominal supply voltage. This was found to be 2.1 MHz.

Fig. 13 and 14 show that there is a logarithmic increase in both supported maximum frequency and power consumption with an increase in supply voltage.

The estimation for energy consumption, presented in Fig. 15, shows that the testbench consumes the least amount of energy with a supply voltage between 500 mV and 600 mV. This estimation was higher than expected, especially as the circuit is optimized for 300 mV. It is important to mention that the time required for each simulation varies with the frequency. Consequently, a high leakage power will have a higher impact at low frequencies. The results therefore suggest that the library has substantial leakage, as the minimum energy point occurs at a higher frequency.
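This trade-off can be written as a first-order energy model. It is a standard textbook approximation and not a result from the simulations; the symbols $\alpha$ (switching activity), $C_{\mathrm{eff}}$ (effective switched capacitance), $P_{\mathrm{leak}}$ (leakage power), and $f_{\mathrm{clk}}$ (clock frequency) are introduced here for illustration:

$$
E_{\mathrm{cycle}} \approx \alpha C_{\mathrm{eff}} V_{DD}^{2} + \frac{P_{\mathrm{leak}}(V_{DD})}{f_{\mathrm{clk}}}
$$

The first term decreases with the supply voltage, while the second term can grow when a lower supply voltage forces a much lower clock frequency. With high leakage, the second term dominates at low voltages, which moves the minimum energy point toward a higher supply voltage.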

## C. Area

Fig. 11 shows that the area can change drastically by synthesizing for higher supply voltages. When comparing the synthesis results for 1 V and 250 mV, the decreased supply voltage results in an increase in area by a factor of approximately 2.5.

One possible reason is that by decreasing the supply voltage, more cells with higher drive strength are required. Additionally, the synthesis may add buffers to enforce that the timing requirements are met. As mentioned in Section IV-A4, only tri-state buffers were implemented in the library. Therefore, digital buffers must be constructed by the synthesis tool by using two inverters instead.

Fig. 12 shows a clear trend across the different supply voltages:

- The total number of inverters increased for a lower supply voltage.
- For a lower supply voltage, more cells with higher drive strength were utilized.

In every case, there were a total of 1013 D-type flip-flops. However, for 300 mV, not a single DFFX1 cell was utilized. Conversely, for 1 V, only DFFX1 cells were used. For a supply voltage of 500 mV, a combination of the two drive strengths was utilized.

As the area seems to be correlated with the utilized drive strengths, it might be beneficial to implement more versions. For example, X2- and X3-versions could allow the synthesis tool to optimize a little further for area.

As shown in Section IV-A1, the increase in drive strength is realized by using a multiplier for the transistors, which in practice places multiple transistors in parallel. However, the X4-versions have no shared nodes between these transistors. By inspecting the layout of INV1X4 in Fig. 17c, one can see that the input A is connected to every second strip of poly. This suggests that every second poly strip is inactive and simply wastes area. Connecting the inactive poly strips to the input would in practice double the number of active transistors, with neighboring transistors sharing source and drain. This would exploit the area more efficiently. The drawbacks regarding extra leakage and possible parasitic components would have to be taken into account.

The library does not have any simple digital buffers. This results in a large number of inverters being used to act as digital buffers. Although the standard cell library is functional, using two inverters is not optimized for size. By inspecting the layout of INV1X1 in Fig. 16c, one can see that there is unused space to the right of the transistors. When two inverters are placed by the P\&R tool, this area cannot be utilized.

A digital buffer could easily be implemented more compactly than two separate inverters allow, as the mentioned empty space could be utilized. Additionally, the transistors could share the source connected to the VDD/VSS rails. If this was designed to be DRC clean, each buffer could be even more compact.

This could be especially beneficial for low-voltage implementations, where a large number of extra inverters were added.

## VII. CONCLUSION

The testbenches of the various synthesized designs provide positive results. All the designs function as intended, which implies that the standard cell library has been implemented correctly and is compatible with a standard design flow. There are some possible areas of improvement with regard to the performance of the library.

The initial idea of using equal transistor size in each cell of the library was to simplify the design process and find the balanced timing for the inverter. However, when the cells became larger and had an imbalanced pull-up and pull-down network, the rise time and fall time became imbalanced as well. NOR2X1, AOI112X1, AOI212X1 and AOI222X1 had approximately twice the delay for rising output, compared to falling output. The DFFX1 and DFFX4 had a delay for falling output that was between four and five times the delay for rising output.

By increasing transistor widths in the pull-up network (pull-down in the case of DFFX1 and DFFX4), the speed could be improved. Conversely, by decreasing the widths in the pull-down network (pull-up for DFFX1 and DFFX4), the leakage could be reduced.

Combining these ideas would improve the balance of the rise time and fall time of the cells, without the full cost in power consumption. This could impact mismatch of transistors negatively, which would need to be taken into consideration.

## A. Using SLVT transistors

The standard cell library has been implemented with Super Low VT (SLVT) transistors. The low-threshold transistors result in higher speed, but also higher power consumption. Simulation of the PicoRV32 suggests that static power consumption is dominant. For the given testbench, the results suggest that lowering the leakage of the transistors would be beneficial. The simplest approach would be to use transistors with a higher threshold voltage.

By characterizing the library with other transistors, with other threshold voltages, multiple libraries could be made available. The user could then choose the library that is most suitable for the application, with the optimal balance of speed and leakage.

## B. Drive Strength

The higher drive strength for the inverter cell, INV1, has two noticeable advantages: higher speed and lower PVT-variations. This was the expected behavior. On the other hand, the DFFX4 surprisingly did not seem to have the advantage of reduced PVT-variations.

In both cases, the higher drive strength resulted in a larger area and higher power consumption. This was naturally the expected behavior.

Implementing extra possible drive strengths for all cells could benefit the flexibility for the synthesis tool to optimize for speed, area, and power consumption.

## C. PicoRV32

PicoRV32 was the largest and most complex synthesized design that the standard cell library was tested with. That the standard design flow provided a functional netlist of the CPU was a good indicator that the library is functional.

The simulation results indicate that the processor is most energy-efficient with a supply voltage between 500 mV and 600 mV, which was surprisingly high. However, as SLVT transistors are used to implement all cells, a high leakage should be expected. This could explain why a higher supply voltage, and consequently a higher clock frequency, resulted in less energy consumed.

As the leakage is high, the current implementation of the standard cell library could perform well in a system that requires higher speed or with proper power gating. If the library is intended to be used in implementations with lower power consumption, addressing the leakage should be considered.

## Bibliography

[1] C. Steinsland, "Design and implementation of a digital standard cell library for 28 nm technology with a 300 mV supply voltage," December 2020.
[2] A. Jambek, A. NoorBeg, and M. Ahmad, "Standard cell library development," in ICM'99. Proceedings. Eleventh International Conference on Microelectronics (IEEE Cat. No.99EX388), 1999, pp. 161-163.
[3] P. Gelsinger, "Microprocessors for the new millennium: Challenges, opportunities, and new frontiers," in 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177), 2001, pp. 22-25.
[4] "Standard cell library/library exchange format (LEF)," https://www.csee.umbc.edu/~cpatel2/links/641/slides/lect04_LEF.pdf, accessed: 2021-05-18.
[5] L. Scheffer, L. Lavagno, and G. Martin, EDA for IC System Design, Verification, and Testing (Electronic Design Automation for Integrated Circuits Handbook). USA: CRC Press, Inc., 2006.
[6] N. Weste and D. Harris, Integrated Circuit Design. Pearson, 2011.
[7] E. Janson and T. Johansson, "Creation of standard cell libraries in sub-micron processes," http://www.diva-portal.org/smash/get/diva2:20175/FULLTEXT01.pdf, 2005, accessed: 2021-06-02.
[8] E. Låte, A. A. Vatanjou, T. Ytterdal, and S. Aunet, "Comparative analysis of flip-flop architectures for subthreshold applications in 28 nm FDSOI," in 2015 Nordic Circuits and Systems Conference (NORCAS): NORCHIP International Symposium on System-on-Chip (SoC), 2015, pp. 1-4.
[9] P. Balasubramanian and N. Mastorakis, "High speed gate level synchronous full adder designs," WSEAS Transactions on Circuits and Systems, vol. 8, pp. 290-300, Jan. 2009.
[10] S. Fini, "Sub-threshold design of arithmetic circuits: when serial might overcome parallel architectures," 2019.
[11] P. J. Hurley, A Concise Introduction to Logic, 12th ed. Open SUNY Textbooks, 2017.

## APPENDIX

## A. Presentation of standard cells

In this section, all standard cells are presented with symbol, schematic, and layout.


Fig. 16: Standard cell: INV1X1

(c) Layout of INV1X4

Fig. 17: Standard cell: INV1X4


(a) Symbol for BUFF1X1

(b) Schematic of BUFF1X1

(c) Layout of BUFF1X1

Fig. 18: Standard cell: BUFF1X1

(a) Symbol for NAND2X1

(b) Schematic of NAND2X1

(c) Layout of NAND2X1

Fig. 19: Standard cell: NAND2X1

(a) Symbol for AND2X1

(c) Layout of AND2X1

Fig. 20: Standard cell: AND2X1

(a) Symbol for NOR2X1


Fig. 21: Standard cell: NOR2X1


(a) Symbol for OR2X1

(b) Schematic of OR2X1

(c) Layout of OR2X1

Fig. 22: Standard cell: OR2X1

(a) Symbol for XNOR2X1

(c) Layout of XNOR2X1

Fig. 23: Standard cell: XNOR2X1

(a) Symbol for XOR2X1

(c) Layout of XOR2X1

Fig. 24: Standard cell: XOR2X1


Fig. 25: Standard cell: AOI12X1

(a) Symbol for AOI22X1

(b) Schematic of AOI22X1

(c) Layout of AOI22X1

Fig. 26: Standard cell: AOI22X1

(a) Symbol for AOI112X1


Fig. 27: Standard cell: AOI112X1

(a) Symbol for AOI212X1


Fig. 28: Standard cell: AOI212X1

(a) Symbol for AOI222X1

(b) Schematic of AOI222X1

(c) Layout of AOI222X1

Fig. 29: Standard cell: AOI222X1

(a) Symbol for OAI12X1

(b) Schematic of OAI12X1

(c) Layout of OAI12X1

Fig. 30: Standard cell: OAI12X1

(a) Symbol for OAI22X1


Fig. 31: Standard cell: OAI22X1


Fig. 32: Standard cell: OAI211X1

(a) Symbol for OAI222X1

(b) Schematic of OAI222X1

(c) Layout of OAI222X1

Fig. 33: Standard cell: OAI222X1

(a) Symbol for MUX2X1

(c) Layout of MUX2X1

Fig. 34: Standard cell: MUX2X1


(a) Symbol for DFFX1

(b) Schematic of DFFX1

(c) Layout of DFFX1

Fig. 35: Standard cell: DFFX1

(a) Symbol for DFFX4

(b) Schematic of DFFX4

(c) Layout of DFFX4

Fig. 36: Standard cell: DFFX4


Fig. 37: Standard cell: DLX1

(a) Symbol for FAX1

(b) Schematic of FAX1

(c) Layout of FAX1

Fig. 38: Standard cell: FAX1

## B. Datasheets




END Cell Group INV1X1

Cell Group INV1X4 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table


Footprint:


Leakage


Pin Capacitance

Delays(ns) to Y rising:

|  |  | Delay (ns) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Timing Arc (Dir) | first | mid | last |
| INV1X4 | $\mathrm{A} \rightarrow \mathrm{Y}(\mathrm{FR})$ | 1.0398 | 2.8032 | 7.2760 |

Delays(ns) to Y falling:


Power
Internal switching power( pJ ) to Y rising :


Internal switching power( pJ ) to Y falling :


END Cell Group INV1X4

Cell Group BUFF1X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table


Footprint:



Passive Power
Hidden power(pJ) for A rising:
Conditional


Hidden power(pJ) for A falling:
Conditional


Hidden power(pJ) for E rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| BUFF1X1 | 0.0000 | 0.0000 | 0.0000 |
| BUFF1X1 | 0.0001 | 0.0000 | 0.0000 |
| BUFF1X1 | 0.0000 | 0.0000 | 0.0000 |
| BUFF1X1 | 0.0001 | 0.0001 | 0.0000 |

Hidden power( pJ ) for E falling:
Conditional


END Cell Group BUFF1X1

Cell Group NAND2X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  | Output |
| :---: | :---: | :---: |
| A | B | Y |
| 0 | X | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Footprint:


Leakage


Pin Capacitance


Delay
Delays(ns) to Y rising:


Delays(ns) to Y falling:


Power

Internal switching power( pJ ) to Y rising:


Internal switching power(pJ) to Y falling :


Hidden power ( pJ ) for A falling:
Conditional


Hidden power( pJ ) for B rising:
Conditional


Hidden power( pJ ) for B falling:
Conditional


END Cell Group NAND2X1

Cell Group AND2X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  | Output |
| :---: | :---: | :---: |
| A | B | Y |
| 0 | X | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Footprint:


Leakage


Pin Capacitance


Delay
Delays(ns) to Y rising:


Delays(ns) to Y falling:



Hidden power( pJ ) for $B$ falling :
Conditional


END Cell Group AND2X1

Cell Group NOR2X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  | Output |
| :---: | :---: | :---: |
| A | B | Y |
| 0 | 0 | 1 |
| x | 1 | 0 |
| 1 | x | 0 |

Footprint:


Pin Capacitance


Delay
Delays(ns) to Y rising:



Hidden power( pJ ) for B falling :
Conditional


END Cell Group NOR2X1


Delays(ns) to Y falling :


Power
Internal switching power(pJ) to Y rising:


Internal switching power( pJ ) to Y falling:


Passive Power
Hidden power(pJ) for A rising:
Conditional


Hidden power(pJ) for A falling:
Conditional


Hidden power(pJ) for B rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OR2X1 | -0.0000 | -0.0000 | -0.0000 |
| OR2X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power( pJ ) for B falling:
Conditional


END Cell Group OR2X1



Cell Group XOR2X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Delay
Delays(ns) to Y rising:
Conditional


Delays(ns) to Y falling:
Conditional



Cell Group AOI12X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  |  | Output |
| :---: | :---: | :---: | :---: |
| A | B | C | Y |
| 0 | x | 0 | 1 |
| x | X | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | X | 0 |

Footprint:




Hidden power(pJ) for A falling:
Conditional


Hidden power(pJ) for $B$ rising:
Conditional


Hidden power( pJ ) for B falling:
Conditional


Hidden power ( pJ ) for C rising:
Conditional

|  | Power ( pJ ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI12X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI12X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power ( pJ ) for C falling:
Conditional


END Cell Group AOI12X1

Cell Group AOI22X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table


Footprint:



Passive Power
Hidden power(pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power(pJ) for A falling:
Conditional

Hidden power(pJ) for $B$ rising:
Conditional


Hidden power ( pJ ) for B falling:
Conditional


Hidden power(pJ) for C rising:
Conditional

Hidden power(pJ) for C falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI22X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI22X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power(pJ) for D rising:
Conditional

Hidden power(pJ) for D falling:
Conditional



END Cell Group AOI22X1

Cell Group AOI112X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function



Truth Table

| Input |  |  |  | Output |
| :---: | :---: | :---: | :---: | :---: |
| A | B | C | D | Y |
| 0 | X | 0 | 0 | 1 |
| 0 | x | X | 1 | 0 |
| x | x | 1 | x | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | x | 1 | 0 |
| 1 | 1 | x | x | 0 |

Footprint:


Leakage


Pin Capacitance


Delay
Delays(ns) to Y rising:


Delays(ns) to Y falling:

|  |  | Delay (ns) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Timing Arc (Dir) | first | mid | last |
| AOI112X1 | $\mathrm{A} \rightarrow \mathrm{Y}(\mathrm{RF})$ | 3.2276 | 7.3092 | 16.3073 |
| AOI112X1 | $\mathrm{B} \rightarrow \mathrm{Y}(\mathrm{RF})$ | 3.1257 | 6.8646 | 15.3259 |
| AOI112X1 | $\mathrm{C} \rightarrow \mathrm{Y}(\mathrm{RF})$ | 1.4740 | 4.0055 | 10.2053 |
| AOI112X1 | $\mathrm{D} \rightarrow \mathrm{Y}(\mathrm{RF})$ | 1.4302 | 3.9202 | 10.0664 |

Power

Internal switching power( pJ ) to Y rising:


Internal switching power( pJ ) to Y falling :

Passive Power
Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for A falling:
Conditional

|  | Power ( pJ ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI112X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI112X1 | 0.0000 | 0.0000 | 0.0000 |




Cell Group AOI212X1 from Library 28 nm _slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  |  |  |  | Output |
| :---: | :---: | :---: | :---: | :---: | :---: |
| A | B | C | D | E | Y |
| 0 | x | 0 | X | 0 | 1 |
| 0 | x | x | x | 1 | 0 |
| 0 | x | 1 | 0 | 0 | 1 |
| x | x | 1 | 1 | x | 0 |
| 1 | 0 | 0 | X | 0 | 1 |
| 1 | 0 | x | x | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | X | X | X | 0 |



Internal switching power (pJ) to Y falling:

|  |  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Input | first | mid | last |
| AOI212X1 | A | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | A | 0.0001 | 0.0001 | 0.0001 |
| AOI212X1 | B | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | B | 0.0001 | 0.0001 | 0.0001 |
| AOI212X1 | C | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | C | 0.0001 | 0.0001 | 0.0001 |
| AOI212X1 | D | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | D | 0.0001 | 0.0001 | 0.0001 |
| AOI212X1 | E | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | E | 0.0000 | 0.0000 | 0.0000 |

Passive Power
Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for A falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI212X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI212X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for B rising:
Conditional


Hidden power (pJ) for B falling:
Conditional


Hidden power (pJ) for C rising:
Conditional




Cell Group AOI222X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table


Footprint:

| Cell | Area |
| :---: | :---: |
| AOI222X1 | 0.0000 |

Leakage

|  | Leakage (nW) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | Min | Avg | Max |
| AOI222X1 | 0.0000 | 0.1722 | 0.2887 |

Pin Capacitance


Delay
Delays(ns) to Y rising:

| Cell |
| :--- |


Delays(ns) to Y falling:

|  |  | Delay (ns) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Timing Arc(Dir) | first | mid | last |
| AOI222X1 | (RF) | 5.0132 | 9.3996 | 18.8914 |
| AOI222X1 | (RF) | 4.8809 | 8.9040 | 17.8061 |
| AOI222X1 | (RF) | 4.7945 | 9.1177 | 18.5286 |
| AOI222X1 | (RF) | 4.6128 | 8.5577 | 17.3951 |
| AOI222X1 | (RF) | 4.1872 | 8.4518 | 17.8059 |
| AOI222X1 | (RF) | 4.0339 | 7.9404 | 16.7250 |

Power
Internal switching power (pJ) to Y rising:

|  |  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Input | first | mid | last |
| AOI222X1 | A | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | A | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | B | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | B | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | C | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | C | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | D | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | D | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | E | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | E | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | F | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | F | 0.0000 | 0.0000 | 0.0000 |

Internal switching power (pJ) to Y falling:

|  |  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Input | first | mid | last |
| AOI222X1 | A | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | A | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | B | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | B | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | C | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | C | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | D | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | D | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | E | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | E | 0.0001 | 0.0001 | 0.0001 |
| AOI222X1 | F | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | F | 0.0001 | 0.0001 | 0.0001 |

Passive Power
Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |




| Cell | first | mid | last |
| :---: | :---: | :---: | :---: |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for C falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for D rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |



| Cell | first | mid | last |
| :---: | :---: | :---: | :---: |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for E falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for F rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |
| AOI222X1 | 0.0000 | 0.0000 | 0.0000 |
| AOI222X1 | -0.0000 | -0.0000 | -0.0000 |



Cell Group OAI12X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  |  | Output |
| :---: | :---: | :---: | :---: |
| A | B | C | Y |
| 0 | 0 | x | 1 |
| x | 1 | 0 | 1 |
| x | 1 | 1 | 0 |
| 1 | x | 0 | 1 |
| 1 | x | 1 | 0 |

Footprint:


Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for A falling:
Conditional

Hidden power (pJ) for B rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for B falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for C rising:
Conditional


Hidden power (pJ) for C falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI12X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI12X1 | -0.0000 | -0.0000 | -0.0000 |

END Cell Group OAI12X1

Cell Group OAI22X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function

| Pin Name | Function |
| :---: | :---: |

Truth Table


Footprint:


Leakage



Passive Power
Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power(pJ) for A falling:
Conditional

| Cell |
| :--- |
| OAI22X1 |
| OAI22X1 |
| OAI22X1 |

Hidden power (pJ) for B rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for B falling:
Conditional

Hidden power (pJ) for C rising:
Conditional

Hidden power (pJ) for C falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for D rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for D falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI22X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI22X1 | -0.0000 | -0.0000 | -0.0000 |

END Cell Group OAI22X1

Cell Group OAI211X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30


Footprint:


Pin Capacitance


Delay
Delays(ns) to Y rising:

| Cell |
| :--- |


Delays(ns) to Y falling:

|  |  | Delay (ns) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Timing Arc(Dir) | first | mid | last |
| OAI211X1 | (RF) | 5.7734 | 11.6137 | 23.9734 |
| OAI211X1 | (RF) | 5.2936 | 11.0401 | 23.2420 |
| OAI211X1 | (RF) | 5.4448 | 11.0183 | 22.9878 |
| OAI211X1 | (RF) | 5.0535 | 10.3382 | 21.6955 |

Power
Internal switching power (pJ) to Y rising:

|  |  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Input | first | mid | last |
| OAI211X1 | A | 0.0001 | 0.0001 | 0.0001 |
| OAI211X1 | A | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | B | 0.0001 | 0.0001 | 0.0001 |
| OAI211X1 | B | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | C | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | C | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | D | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | D | 0.0000 | 0.0000 | 0.0000 |

Internal switching power (pJ) to Y falling:

|  |  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Input | first | mid | last |
| OAI211X1 | A | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | A | 0.0001 | 0.0001 | 0.0001 |
| OAI211X1 | B | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | B | 0.0001 | 0.0001 | 0.0001 |
| OAI211X1 | C | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | C | 0.0001 | 0.0001 | 0.0001 |
| OAI211X1 | D | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | D | 0.0001 | 0.0001 | 0.0001 |

Passive Power
Hidden power (pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for A falling:
Conditional


Hidden power (pJ) for B rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI211X1 | -0.0000 | -0.0000 | 0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for B falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |

Hidden power (pJ) for C rising:
Conditional

Hidden power (pJ) for C falling:
Conditional

Hidden power (pJ) for D rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI211X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI211X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power(pJ) for D falling:
Conditional


END Cell Group OAI211X1

Cell Group OAI222X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  |  |  |  |  | Output |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| A | B | C | D | E | F | Y |
| 0 | 0 | x | x | x | x | 1 |
| x | 1 | 0 | 0 | x | x | 1 |
| x | 1 | x | 1 | 0 | 0 | 1 |
| x | 1 | x | 1 | x | 1 | 0 |
| x | 1 | x | 1 | 1 | x | 0 |
| x | 1 | 1 | x | 0 | 0 | 1 |
| x | 1 | 1 | x | x | 1 | 0 |
| x | 1 | 1 | x | 1 | x | 0 |
| 1 | x | 0 | 0 | x | x | 1 |
| 1 | x | x | 1 | 0 | 0 | 1 |
| 1 | x | x | 1 | x | 1 | 0 |
| 1 | x | x | 1 | 1 | x | 0 |
| 1 | x | 1 | x | 0 | 0 | 1 |
| 1 | x | 1 | x | x | 1 | 0 |
| 1 | x | 1 | x | 1 | x | 0 |

Footprint:

| Cell | Area |
| :---: | :---: |
| OAI222X1 | 0.0000 |

Leakage

|  | Leakage (nW) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | Min | Avg | Max |
| OAI222X1 | 0.0000 | 0.1825 | 0.4345 |

Pin Capacitance


Delay
Delays(ns) to Y rising:

| Cell |
| :--- |



Power
Internal switching power(pJ) to Y rising:


Internal switching power( pJ ) to Y falling :


Passive Power
Hidden power(pJ) for A rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| OAI222X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI222X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI222X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI222X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI222X1 | -0.0000 | -0.0000 | -0.0000 |
| OAI222X1 | 0.0000 | 0.0000 | 0.0000 |
| OAI222X1 | -0.0000 | -0.0000 | -0.0000 |








Cell Group MUX2X1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  |  | Output |
| :---: | :---: | :---: | :---: |
| A | B | SEL | Y |
| 0 | 0 | x | 0 |
| 0 | 1 | 0 | 0 |
| x | 1 | 1 | 1 |
| 1 | x | 0 | 1 |
| 1 | 0 | 1 | 0 |

Footprint:


Internal switching power (pJ) to Y falling:
Conditional

Passive Power
Hidden power(pJ) for A rising:
Conditional

Hidden power(pJ) for A falling:
Conditional

Hidden power(pJ) for B rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| MUX2X1 | -0.0000 | -0.0000 | -0.0000 |
| MUX2X1 | 0.0000 | 0.0000 | 0.0000 |

Hidden power (pJ) for B falling:
Conditional


Hidden power (pJ) for SEL rising:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| MUX2X1 | 0.0002 | 0.0002 | 0.0001 |
| MUX2X1 | 0.0002 | 0.0002 | 0.0001 |
| MUX2X1 | 0.0000 | 0.0000 | 0.0000 |
| MUX2X1 | 0.0001 | 0.0001 | 0.0001 |

Hidden power(pJ) for SEL falling:
Conditional


END Cell Group MUX2X1

Cell Group DFFX1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function


Truth Table

| Input |  | Output |
| :---: | :---: | :---: |
| D | CLK | Q |
| 0 | R | 0 |
| 1 | R | 1 |
| x | x | IQ |

Footprint:


Leakage




Hidden power ( pJ ) for CLK falling:
Conditional

|  | Power (pJ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| DFFX1 | 0.0000 | 0.0000 | 0.0000 |
| DFFX1 | -0.0000 | -0.0000 | -0.0000 |
| DFFX1 | 0.0001 | 0.0001 | 0.0000 |
| DFFX1 | 0.0001 | 0.0001 | 0.0000 |
| DFFX1 | 0.0000 | 0.0000 | 0.0000 |
| DFFX1 | -0.0000 | -0.0000 | -0.0000 |
| DFFX1 | 0.0000 | 0.0000 | 0.0000 |
| DFFX1 | -0.0000 | -0.0000 | -0.0000 |

END Cell Group DFFX1

Cell Group DFFX4 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function

| Pin Name | Function |
| :---: | :---: |
| Q | IQ |

Truth Table


Footprint:


Pin Capacitance

Delays(ns) to Q rising:

|  |  | Delay (ns) |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Cell | Timing Arc(Dir) | first | mid | last |
| DFFX4 | CLK -> Q (RR) | 4.0330 | 5.8973 | 10.5754 |

Delays(ns) to Q falling:

Constraint
Constraints(ns) for D rising:

Constraints(ns) for D falling:

| Cell | Check | Ref Pin(Trans) | Reference Slew Rate (ns) |
| :---: | :---: | :---: | :---: |

Constraints(ns) for CLK rising:
Conditional

| Cell | Check | Ref Pin(Trans) | first | mid | last | when |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| DFFX4 | min_pulse_width | CLK | 2.5316 | 2.5740 | 2.6399 | D |
| DFFX4 | min_pulse_width | CLK | 35.0781 | 35.1025 | 35.1270 | !D |

Constraints (ns) for CLK falling:
Conditional



Hidden power ( pJ ) for CLK falling:
Conditional

|  | Power ( pJ ) |  |  |
| :---: | :---: | :---: | :---: |
| Cell | first | mid | last |
| DFFX4 | 0.0001 | 0.0001 | 0.0001 |
| DFFX4 | -0.0001 | -0.0001 | -0.0001 |
| DFFX4 | 0.0002 | 0.0002 | 0.0001 |
| DFFX4 | 0.0003 | 0.0003 | 0.0001 |
| DFFX4 | 0.0002 | 0.0001 | 0.0001 |
| DFFX4 | -0.0001 | -0.0001 | -0.0001 |
| DFFX4 | 0.0001 | 0.0001 | 0.0001 |
| DFFX4 | -0.0000 | -0.0001 | -0.0001 |

END Cell Group DFFX4

Cell Group DLX1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30

Function

| Pin Name | Function |
| :---: | :---: |
| Q | IQ |


Truth Table

| Input |  | Output |
| :---: | :---: | :---: |
| D | CLK | Q |
| x | 0 | IQ |
| 0 | 1 | 0 |
| 1 | 1 | 1 |

Footprint:


Pin Capacitance



Delays(ns) to Q falling:


Constraint
Constraints(ns) for D rising:


Constraints(ns) for D falling:


Constraints (ns) for CLK rising:
Conditional


Power
Internal switching power( pJ ) to Q rising:



Cell Group FAX1 from Library 28nm_slvt, Process corner, Temp 25.00, Voltage 0.30
Function


Pin Capacitance


Delays(ns) to CO falling:



Internal switching power( pJ ) to S falling : Conditional


END Cell Group FAX1

## C. Library characterization for different operating conditions

Section V-A presented the characterization results under different operating conditions for a selection of cells. The results for all cells in the library are listed in the following tables:

- $25^{\circ} \mathrm{C}$, TT-corner, Table XII
- $-20^{\circ} \mathrm{C}$, SS-corner, Table XIII
- $85^{\circ} \mathrm{C}$, FF-corner, Table XIV

TABLE XII: Library characterization results for nominal conditions ($25^{\circ} \mathrm{C}$, TT-corner)

| Operating conditions | Cell | Maximum delay | $\Delta$ delay | Leakage power | $\Delta$ power |
| :--- | :--- | :--- | :--- | :--- | :--- |
|  | INV1X1 | 10.89 ns | 1.0 | 0.14 nW | 1.0 |
|  | INV1X4 | 7.28 ns | 1.0 | 0.57 nW | 1.0 |
|  | BUFF1X1 | 22.99 ns | 1.0 | 0.34 nW | 1.0 |
|  | NAND2X1 | 16.64 ns | 1.0 | 0.29 nW | 1.0 |
|  | AND2X1 | 16.03 ns | 1.0 | 0.35 nW | 1.0 |
|  | NOR2X1 | 20.63 ns | 1.0 | 0.15 nW | 1.0 |
|  | OR2X1 | 16.82 ns | 1.0 | 0.27 nW | 1.0 |
|  | XNOR2X1 | 25.90 ns | 1.0 | 0.46 nW | 1.0 |
|  | XOR2X1 | 25.52 ns | 1.0 | 0.54 nW | 1.0 |
|  | AOI12X1 | 21.53 ns | 1.0 | 0.29 nW | 1.0 |
|  | AOI22X1 | 22.04 ns | 1.0 | 0.30 nW | 1.0 |
| $25^{\circ} \mathrm{C}$, TT-corner | AOI112X1 | 32.35 ns | 1.0 | 0.29 nW | 1.0 |
|  | AOI212X1 | 34.08 ns | 1.0 | 0.22 nW | 1.0 |
|  | AOI222X1 | 36.83 ns | 1.0 | 0.29 nW | 1.0 |
|  | OAI12X1 | 20.47 ns | 1.0 | 0.29 nW | 1.0 |
|  | OAI22X1 | 21.68 ns | 1.0 | 0.30 nW | 1.0 |
|  | OAI211X1 | 23.97 ns | 1.0 | 0.45 nW | 1.0 |
|  | OAI222X1 | 28.02 ns | 1.0 | 0.43 nW | 1.0 |
|  | MUX2X1 | 22.79 ns | 1.0 | 0.90 nW | 1.0 |
|  | DFFX1 | 69.21 ns | 1.0 | 0.64 nW | 1.0 |
|  | DFFX4 | 44.07 ns | 1.0 | 2.57 nW | 1.0 |
|  | DLX1 | 24.47 ns | 1.0 | 0.51 nW | 1.0 |
|  | FAX1 | 49.03 ns | 1.0 | 1.73 nW | 1.0 |

TABLE XIII: Library characterization results for $-20^{\circ} \mathrm{C}$, in the SS-corner

| Operating conditions | Cell | Maximum delay | $\Delta$ delay | Leakage power | $\Delta$ power |
| :--- | :--- | :--- | :--- | :--- | :--- |
|  | INV1X1 | 48.8 ns | 4.48 | 0.01 nW | 0.06 |
|  | INV1X4 | 20.83 ns | 2.86 | 0.03 nW | 0.06 |
|  | BUFF1X1 | 151.56 ns | 6.59 | 0.02 nW | 0.06 |
|  | NAND2X1 | 89.90 ns | 5.4 | 0.02 nW | 0.06 |
|  | AND2X1 | 89.70 ns | 5.6 | 0.02 nW | 0.06 |
|  | NOR2X1 | 138.57 ns | 6.72 | 0.01 nW | 0.06 |
|  | OR2X1 | 101.45 ns | 6.03 | 0.01 nW | 0.05 |
|  | XNOR2X1 | 175.43 ns | 6.77 | 0.03 nW | 0.06 |
|  | XOR2X1 | 172.99 ns | 6.78 | 0.03 nW | 0.06 |
|  | AOI12X1 | 145.14 ns | 6.74 | 0.02 nW | 0.06 |
| $-20^{\circ} \mathrm{C}$, SS-corner | AOI22X1 | 149.42 ns | 6.78 | 0.02 nW | 0.06 |
|  | AOI112X1 | 241.41 ns | 7.46 | 0.02 nW | 0.06 |
|  | AOI212X1 | 256.52 ns | 7.53 | 0.01 nW | 0.06 |
|  | AOI222X1 | 280.24 ns | 7.61 | 0.02 nW | 0.06 |
|  | OAI12X1 | 137.98 ns | 6.74 | 0.02 nW | 0.06 |
|  | OAI22X1 | 147.44 ns | 6.8 | 0.02 nW | 0.06 |
|  | OAI211X1 | 146.75 ns | 6.12 | 0.03 nW | 0.06 |
|  | OAI222X1 | 176.20 ns | 6.29 | 0.02 nW | 0.06 |
|  | MUX2X1 | 146.06 ns | 6.41 | 0.05 nW | 0.06 |
|  | DFFX1 | 748.08 ns | 10.81 | 0.04 nW | 0.06 |
|  | DFFX4 | 458.77 ns | 10.41 | 0.14 nW | 0.06 |
|  | DLX1 | 162.34 ns | 6.63 | 0.03 nW | 0.06 |
|  | FAX1 | 375.03 ns | 7.65 | 0.10 nW | 0.05 |

TABLE XIV: Library characterization results for $85^{\circ} \mathrm{C}$, in the FF-corner

| Operating conditions | Cell | Maximum delay | $\Delta$ delay | Leakage power | $\Delta$ power |
| :--- | :--- | :--- | :--- | :--- | :--- |
|  | INV1X1 | 4.05 ns | 0.37 | 2.89 nW | 20.27 |
|  | INV1X4 | 2.17 ns | 0.3 | 11.56 nW | 20.27 |
|  | BUFF1X1 | 4.10 ns | 0.18 | 7.97 nW | 23.43 |
|  | NAND2X1 | 4.50 ns | 0.27 | 5.45 nW | 18.67 |
|  | AND2X1 | 3.73 ns | 0.23 | 8.44 nW | 23.89 |
|  | NOR2X1 | 5.76 ns | 0.28 | 5.92 nW | 40.74 |
|  | OR2X1 | 4.60 ns | 0.27 | 8.73 nW | 32.49 |
|  | XNOR2X1 | 5.84 ns | 0.23 | 11.14 nW | 24.17 |
|  | XOR2X1 | 6.00 ns | 0.24 | 10.95 nW | 20.17 |
|  | AOI12X1 | 5.92 ns | 0.27 | 6.03 nW | 20.58 |
|  | AOI22X1 | 5.93 ns | 0.27 | 6.14 nW | 20.76 |
| $85^{\circ} \mathrm{C}$, FF-corner | AOI112X1 | 7.08 ns | 0.22 | 9.07 nW | 31.22 |
|  | AOI212X1 | 7.20 ns | 0.21 | 8.88 nW | 39.73 |
|  | AOI222X1 | 7.54 ns | 0.2 | 8.72 nW | 30.21 |
|  | OAI12X1 | 5.57 ns | 0.27 | 5.94 nW | 20.31 |
|  | OAI22X1 | 5.93 ns | 0.27 | 6.08 nW | 20.31 |
|  | OAI211X1 | 5.57 ns | 0.23 | 8.24 nW | 18.48 |
|  | OAI222X1 | 6.23 ns | 0.22 | 8.00 nW | 18.42 |
|  | MUX2X1 | 4.39 ns | 0.19 | 22.97 nW | 25.53 |
|  | DFFX1 | 6.58 ns | 0.1 | 15.85 nW | 24.69 |
|  | DFFX4 | 4.58 ns | 0.1 | 63.39 nW | 24.69 |
|  | DLX1 | 5.79 ns | 0.24 | 13.99 nW | 27.49 |
|  | FAX1 | 8.22 ns | 0.17 | 44.83 nW | 25.94 |
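
The $\Delta$ delay columns in Tables XIII and XIV appear to be the corner delay divided by the nominal (TT, $25^{\circ} \mathrm{C}$) delay of Table XII; this reading is an inference from the numbers (for example, 48.8 ns / 10.89 ns ≈ 4.48 for INV1X1), not something stated in the tables. The sketch below recomputes that ratio for three cells. The dictionary and function names are illustrative only. Leakage ratios are left out because the tabulated leakage values are rounded too coarsely to reproduce the $\Delta$ power column exactly.

```python
# Maximum delays in ns, copied from Tables XII, XIII and XIV for three cells.
NOMINAL_NS = {"INV1X1": 10.89, "INV1X4": 7.28, "DFFX1": 69.21}
SS_COLD_NS = {"INV1X1": 48.8, "INV1X4": 20.83, "DFFX1": 748.08}
FF_HOT_NS = {"INV1X1": 4.05, "INV1X4": 2.17, "DFFX1": 6.58}


def relative_to_nominal(corner, nominal):
    """Return the corner/nominal delay ratio per cell, rounded to two decimals."""
    return {cell: round(corner[cell] / nominal[cell], 2) for cell in corner}


ss_ratio = relative_to_nominal(SS_COLD_NS, NOMINAL_NS)
ff_ratio = relative_to_nominal(FF_HOT_NS, NOMINAL_NS)
print(ss_ratio)
print(ff_ratio)
```

For these three cells the recomputed ratios match the tabulated $\Delta$ delay values to two decimals, which supports the assumed definition.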



[^0]:    ${ }^{1}$ PADE is available on GitHub.
    The version used in the implementation is commit 37c8cfedbf4f09642ae71d8d4ca8a19c13f1f9ce. The repository is not open to the public at the time of publication, as some of the information is confidential. This is subject to change.

[^1]:    ${ }^{2}$ By providing a low-impedance path
    ${ }^{3}$ By having a high impedance path

[^2]:    ${ }^{4}$ The full Verilog code for the CPU can be found on GitHub. The version used in the implementation is commit f9b1beb4cfd6b382157b54bc8f38c61d5ae7d785.
    A version that was forked on May 27, 2021 is available here

