# Digital Implementation



**Clock Tree Synthesis** 

After completing this unit, you should be able to:

- List the status of the design prior to CTS
- Set up the design for clock tree synthesis
- Identify implicit clock tree start/end points and when explicit modifications are needed
- Control the constraints and targets used by CTS
- Describe the three different skew optimization methods
- Execute the recommended clock tree synthesis and optimization flow
- Analyze timing and clock specifications post-CTS














Use IC Compiler II to perform placement, DFT, CTS, routing and optimization, achieving timing closure for designs with moderate to high design challenges.

3/3/2024



# The American University in Cairo Target Audience

ASIC, back-end or layout designers with experience in standard cell-based automatic Place&Route.





# High-Level IC Compiler Flow





### Design Status, Start of CTS Phase

- Placement completed
- Power and ground nets prerouted
- Estimated congestion acceptable
- Estimated timing acceptable (~0 ns slack)
- Estimated max cap/transition: no violations
- High fanout nets:
  - Reset, Scan Enable synthesized with buffers
  - Clocks are still not buffered

Check the design before CTS with `check_design -checks pre_clock_tree_stage`.



Why are there no buffers on clock nets?



#### Understand Your Clock Structure First

- **Questions you should be asking:** 
  - What are the different clock trees and their roots?
  - What are the stop (sink) and exclude pins?
  - Are there any preexisting clock cells or trees?
  - Are the generated clocks defined correctly?
  - Are there converging or overlapping clock trees?
  - What are the requirements between clocks?



### Understand Your Clock Tree Goals

#### **Skew Goal**

- What are the skew requirements for your design?
- Are there different skew targets for small and large clocks?

#### **Insertion Delay Goal**

- What are the insertion delay specs for your block?
- What is a reasonable target based on the size and floorplan of your block/chip?
- Nondefault rules to prevent SI problems
- **DRC Requirements** 
  - Are signal net DRCs different from clock net DRCs?
- Find out the order of significance or importance of all the clocks in the design





# Understand Your Clock Tree Specs

- Check SDC constraints and verify them against the specification
  - Pay close attention to any generated clocks and muxed clocks, and understand how `set_case_analysis` switches between modes of operation
  - `create_clock` (clock source)
  - `create_generated_clock` (derived clock)
  - `set_clock_latency` (insertion delay)
  - `set_clock_uncertainty` (skew + jitter + margin)
  - `set_clock_transition` (transition for sync)
  - `set_case_analysis` (case analysis)
- Find out which clocks need false paths between them, so that paths between unrelated logic are not optimized



# Clock Delay Problems

- All clock pins are driven by a single clock source
- The clock pins sit at varying physical distances from that source



#### **CTS Problem**

- CTS is the process of distributing clock signals to clock pins based on physical/layout information
- The clock tree is synthesized after cell placement
- The clock tree is balanced by adding buffers
- Clock tree optimization is performed after routing





# Clock Tree Goal and Metrics

- Goal
  - Basic connectivity
- **Metrics** 
  - Skew
  - Power
  - Area
  - Slew rates







# **General Concepts: Clock Skew: Definition, Causes and Effects**

Clock Skew is the time difference between the arrival of the same edge of a clock signal at the Clock pin of the capture flop and launch flop

$$\text{Skew} = d_1 - d_2$$

Zero skew: $d_1 = d_2$

Useful skew: $d_1 - d_2 = \delta_{12}$



# Clock Skew Types

- Global
  - Recommended; fastest runtime
- Local
  - Longer runtime
- Useful
  - Used to fix small timing violations

© Ahmed Abdelazeem



### Clock Skew Types: Global



**Global skew is recommended – fastest runtime** 

(may add unnecessary buffers)



# Clock Skew Types: Local



Only related FFs are balanced for skew

Possibly fewer buffers – much longer runtime
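To make the difference between global and local skew concrete, here is a minimal Python sketch. The sink names, latency values, and the list of related pairs are made-up illustration data, not output from any tool.

```python
def global_skew(latency):
    """Largest clock latency difference across all sinks of the clock."""
    values = list(latency.values())
    return max(values) - min(values)


def local_skew(latency, related_pairs):
    """Largest latency difference among sink pairs that share a data path."""
    return max(abs(latency[a] - latency[b]) for a, b in related_pairs)


# Hypothetical clock arrival times (ns) at four flip-flop clock pins.
latency = {"FF1": 1.00, "FF2": 1.05, "FF3": 1.30, "FF4": 1.32}
# Assumed launch/capture pairs: only these flops talk to each other.
related_pairs = [("FF1", "FF2"), ("FF3", "FF4")]

print(round(global_skew(latency), 2))
print(round(local_skew(latency, related_pairs), 2))
```

In this data the global skew is much larger than the local skew, because the two groups of related flip-flops sit at different latencies but never exchange data. Global balancing would spend buffers closing a gap that no timing path sees.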



# Clock Skew Types: Useful

Useful skew: if the clock is skewed intentionally to resolve violations, it is called useful skew.
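A small worked example may help show why intentional skew can resolve a setup violation. The sketch below uses the standard single-cycle setup check; all delay values are hypothetical.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, data_delay, setup_time):
    """Setup slack (ns) of a single-cycle register-to-register path."""
    arrival = launch_latency + clk_to_q + data_delay
    required = period + capture_latency - setup_time
    return required - arrival


# Hypothetical path: 2.0 ns clock with 1.95 ns of logic between the flops.
path = dict(period=2.0, clk_to_q=0.10, data_delay=1.95, setup_time=0.05)

zero_skew = setup_slack(launch_latency=0.0, capture_latency=0.0, **path)
useful_skew = setup_slack(launch_latency=0.0, capture_latency=0.15, **path)

print(round(zero_skew, 2))    # negative: the path fails with a balanced clock
print(round(useful_skew, 2))  # positive: delaying the capture clock fixes it
```

The time gained here is borrowed: delaying the capture clock shortens the time available to the path that starts at the capture flop, and it tightens the hold check on the path shown.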






# Clock Skew Variations

- Process variations
- Power supply noise
- Temperature variations











All clock pins are driven by a single clock source.



# Clock Tree Synthesis (CTS) (1/2)



A buffer tree is built to balance the loads and minimize the skew.




## Clock Tree Synthesis (CTS) (2/2)



A "delay line" is added to meet the minimum insertion delay.



# Clock Tree Synthesis






# Clock Trees Are Built in Two Phases

- Phase 1: Clock Tree Synthesis (CTS)
  - Builds an initial load-balanced, DRC-clean clock tree
- Phase 2: Clock Tree Optimization (CTO)
  - Performs cell sizing, relocation, buffer insertion
  - Tries to achieve clock tree targets

#### Phase 1: CTS

- Meet the clock tree Design Rule Constraints (DRC):
  - Maximum transition delay (0.5 ns, by default)
  - Maximum load capacitance (0.6 pF, by default)
  - Maximum fanout (2000, by default)
  - Maximum buffer levels (50, by default)

#### Phase 2: CTO

- Meet the clock tree targets:
  - Maximum skew (0 ns, by default)
  - Minimum insertion delay (0 ns, by default)



Constraints are upper-bound goals. If constraints are not met, violations will be reported.

Targets are "nice to have" goals. If targets are not met, no violations will be reported.
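The distinction between constraints and targets can be sketched as a small check. This is only an illustration of the reporting rule described above; the dictionary keys and the measured values are hypothetical, and the limits are the defaults quoted in this unit.

```python
# Default constraint and target values quoted in this unit.
CONSTRAINTS = {"max_transition_ns": 0.5, "max_capacitance_pf": 0.6}
TARGETS = {"skew_ns": 0.0}


def review(measured):
    """Split misses into reported violations and silently missed targets."""
    violations = [k for k, limit in CONSTRAINTS.items() if measured[k] > limit]
    missed_targets = [k for k, goal in TARGETS.items() if measured[k] > goal]
    return violations, missed_targets


# Hypothetical post-CTS measurements for one clock.
measured = {"max_transition_ns": 0.42, "max_capacitance_pf": 0.65, "skew_ns": 0.08}
violations, missed_targets = review(measured)

print("reported violations:", violations)
print("missed targets (not reported):", missed_targets)
```

With a default skew target of 0 ns, any real tree misses the target, which is why a missed target on its own is not treated as an error.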



### Clock Tree Synthesis Goal and Flows

#### Placement should be completed

- Acceptable congestion, setup timing, and logical DRCs
- High fanout signal nets (reset, scan enable, etc.) are buffered
- The goal of the clock tree synthesis design phase is to:
  - Build the clock tree buffer structure
  - Route the clock nets
  - Optimize datapath logic for setup and hold timing, and DRCs
- Two CTS flows are supported:
  - Classic CTS flow: CTS first, followed by data path optimization
  - Concurrent Clock & Data flow (CCD): CTS and data path optimization performed concurrently
    - Recommended for timing-critical designs



### Comparing Classic CTS versus CCD

#### **Classic CTS**



- Clock tree is built while ignoring the data paths
- Goal: Minimize skew
  - Allows full clock-cycle reg-to-reg timing
- Data path is optimized post-CTS
  - Clock tree remains untouched

#### **Concurrent Clock & Data**



- Clock tree is built with full knowledge of data path timing
- Goal: Meet setup/hold timing
  - Reg-to-reg timing may be larger or smaller than a clock cycle
- Data path is optimized along with incremental clock tree modifications (green buffer)

What are the pros and cons of setting a tight constraint for `target_skew`?

What are the pros and cons of setting a relaxed constraint for `max_transition`?



## Where does the Clock Tree Begin and End?



To see the source(s) of a clock, you can either run report\_clocks, or use the sources attribute:

```
icc2_shell> get_attribute [get_clocks PCI_CLK] sources
{pclk}
```



# Balance Points: Sink and Ignore Pins

#### **Sink Pins:**

- CTS optimizes for DRC and clock tree targets (skew, insertion delay)
- Optionally, can consider internal insertion delay

#### **Ignore Pins:**

- CTS adds a small buffer "guide buffer" to isolate all pins
  - Only DRCs fixed after the guide buffer
- CTS ignores skew and insertion delay targets





#### Generated and Gated Clocks



Master and generated clocks are part of the same clock domain. Skew is balanced globally (across all clock pins) within each clock domain.



### User-defined or Explicit Sink Pins

**Scenario:** If the clock pin inside a macro cell is correctly defined, CTS will treat that pin as an implicit Sink pin. In this example the clock pin is not defined. What is the problem here?

The macro's clock pin is marked as an implicit exclude "Ignore" pin – no skew optimization!





### Defining an Explicit Sink Pin

Defining an explicit **Sink** pin allows CTS to optimize for skew and insertion delay targets.

CTS has no knowledge of the IP-internal clock delay – it can only "see" up to the stop pin!

```
set_clock_balance_points \
    -consider_for_balancing true \
    -balance_points [get_pins IP/IP_CLK]
```





# Defining an Explicit Sink Pin With Delay





## Defining an Explicit Ignore (or Exclude) Pin

Explicit ignore pins cause CTS to ignore the pin and its downstream sinks for skew balancing.

```
set_clock_balance_points \
    -clock Clock \
    -consider_for_balancing false \
    -balance_points [get_pins AN2/A]
```

→ CTS inserts a guide buffer; CTS engine fixes DRCs only along the clock path past ignore pins





## Defining a Clock Skew Group



`create_clock_skew_group -mode TEST -objects {FF1/CLK FF2/CLK}`



### Default Clock Tree Targets

**Target** 

- The default target skew and target latency are both 0 ns
  - SDC uncertainty and network latency constraints are ignored
- **Relax clock skew targets for non-timing critical clocks** 
  - Reduces overall buffer count, power and run time
- **Specify network latency targets to help post-CTS timing (see below)** 
  - An alternative method using automatically-derived balance points will also be shown





### CCD: Skew and Latency Considerations

- For the classic CTS flow, it is important to specify accurate skew and latency targets
- For the CCD flow:
  - Target skew is not relevant, because the goal is to intentionally introduce skew to meet timing
  - Target latency can be applied if needed (for chip-level inter-block timing considerations, for example), but keep in mind that the final latencies have larger skews compared to classic CTS



#### User-Defined Clock Tree Targets

```
remove_clock_tree_options -all -target_skew -target_latency
set_clock_tree_options -target_skew 0.2 -corners [all_corners]
# -> max skew target of 0.2 applied to all corners, all clocks, all modes
current_corner C1
set_clock_tree_options -target_skew 0.1
# -> max skew target of 0.1 applied in C1, all clocks, all modes
set_clock_tree_options -clocks CLK1 -target_latency 1.4
set_clock_tree_options -clocks CLK2 -target_latency 2.4
# -> latency target of 1.4 applied in C1, to CLK1 in current mode
# -> latency target of 2.4 applied in C1, to CLK2 in current mode
report_clock_tree_options
```

Targets apply to:

- All clocks if no clock is specified
- Current mode if clock is specified, otherwise all modes
- Current corner if no corner is specified



# Balancing Multiple Synchronous Clocks



By default CTS does not perform inter-clock skew balancing

→ May result in timing violations



#### Create Balancing Groups

- There are two ways to balance the clock groups:
  - Manual balancing constraints:

```
foreach mode {m1 m2} {
    current_mode $mode
    create_clock_balance_group -name grp1 \
        -objects [get_clocks "CLOCK1 CLOCK2"]
}
```

  - Auto-derived balancing constraints:

```
derive_clock_balance_constraints -slack_less_than -0.3
report_clock_balance_groups
```

- Constraints will be derived for all modes
- Only clocks with cross-clock paths meeting the `-slack_less_than` specification will be selected
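The selection criterion for auto-derived constraints can be pictured as a simple filter on cross-clock slack. The Python sketch below models only that criterion as described above; the clock names and slack values are hypothetical, and the tool's actual derivation is not described in this unit.

```python
def clocks_to_balance(cross_clock_slack, slack_less_than):
    """Return clock pairs whose worst cross-clock slack is below the threshold."""
    return sorted(
        pair for pair, slack in cross_clock_slack.items()
        if slack < slack_less_than
    )


# Hypothetical worst slack (ns) of the paths crossing between each clock pair.
cross_clock_slack = {
    ("CLOCK1", "CLOCK2"): -0.45,
    ("CLOCK1", "CLOCK3"): -0.10,
    ("CLOCK2", "CLOCK3"): 0.20,
}

selected = clocks_to_balance(cross_clock_slack, slack_less_than=-0.3)
print(selected)
```

Only the CLOCK1/CLOCK2 pair falls below the -0.3 ns threshold, so only that pair would be considered for a balance group in this model.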



# Balancing Occurs During clock\_opt

Balancing occurs during the `build_clock` stage





# Inter-clock Balancing is Important for CCD

- If inter-clock balancing is not enabled for the CCD flow, CCD optimization will try to meet timing by skewing individual sinks (instead of the root)
  - Much longer runtime
  - Inserts many more buffers





#### Control CTS Cell Selection

In the following example, only \$cts\_libcells are used to build the clock trees. Modify as needed:

```
set cts_libcells [get_lib_cells \
    "*/INVX*_LVT */BUFX8_LVT */BUFX16_LVT \
    */MUX*LVT */A0* */CG* */FF*"]
set_lib_cell_purpose -exclude cts [get_lib_cells]
set_lib_cell_purpose -include cts $cts_libcells
set_dont_touch $cts_libcells false
```

- In addition to buffers/inverters, include logically equivalent cells of any gates (MUX, AND, ICG, etc.) and flip-flops in the clock tree
  - Allows these to be resized, if needed
- Always-on buffers should be added, to allow always-on CTS for MV designs
  - If a clock branch needs to feed through a shut-down voltage area, always-on CTS inserts AO buffers; otherwise, the branch must go around it



# Pre-existing Clock Buffers Are Removed

Pre-existing clock buffers/inverters may negatively affect CTS QoR. By default, CTS removes pre-existing clock buffers/inverters and builds a balanced clock tree with fewer buffers.

Figure annotation: CTS will only build this part of the tree.

This behavior is controlled by the application option `cts.compile.remove_existing_clock_trees` (default: **true**).



# Preserving Pre-Existing Clock Trees



May create unnecessary additional buffers. Reported global skew is only as good as pre-existing logic skew.



#### Non-Default Clock Routing

Physical Constraints "NDR"

- ICC II can route the clocks using non-default routing rules (NDR), e.g. double-spacing, double-width, shielding
  - NDR rules are treated as physical DRCs and are reported as violations if not met
- Non-default rules are often used to "harden" the clock, e.g. to make the clock routes less sensitive to crosstalk or EM effects





## Defining Non-Default Routing and Via Rules

**Define the NDR rules:** 

```
create_routing_rule 2xS_2xW_CLK_RULE \
    -widths {M1 0.11 M2 0.11 M3 0.14 M4 0.14 M5 0.14} \
    -spacings {M1 0.4 M2 0.4 M3 0.48 M4 0.48 M5 1.1} \
    -cuts { ... \
        {VIA3 {Vrect 1}} \
        {VIA5 {Vrect 1}} }
```

Each `-cuts` entry has the form `{layer_name {cutNameTbl_name minimum_number_of_cuts}}`.

- With the -cuts option you use "symbolic" via names defined in the technology file as cutNameTbl
- This simplifies the NDR definition because one symbolic via can match many specific vias and arrays



#### Example Technology File

```
Layer "VIA3" {
  fatTblThreshold         = (0, 0.13, 0.26)
  fatTblFatContactNumber  = ("2,3,4", "5,6,20", "5,6,20")
  fatTblFatContactMinCuts = ("1,1,1", "1,1,2", "2,2,4")
  cutNameTbl              = (Vsq, Vrect)
  cutWidthTbl             = (0.05, 0.05)
  cutHeightTbl            = (0.05, 0.13)
}
contactCode "VIA34_LH" {
  contactCodeNumber = 5
  cutWidth          = 0.13
  cutHeight         = 0.05
  . . .
}
contactCode "VIA34_LV" {
  contactCodeNumber = 6
  cutWidth          = 0.05
  cutHeight         = 0.13
}
contactCode "VIA34_P" {
  contactCodeNumber = 20
  cutWidth          = 0.05
  cutHeight         = 0.05
}
```



# Non-Default Via Rules

- Clock nets are often hardened further by requiring 100% via optimization to enhance reliability
  - Minimum size single cut vias are replaced with:
    - Multiple-cut via arrays and/or
    - Larger "square" or "bar" cuts
- This is also defined using clock NDR rules



### Defining Via NDRs with -vias

- You may also use the `-vias` option
- This requires you to specify the exact contactCode names defined in the technology file
- This example is the equivalent of using `-cuts { {VIA3 {Vrect 1}} }`, shown earlier

```
create_routing_rule 2xS_2xW_CLK_RULE \
    -widths {M1 0.11 M2 0.11 M3 0.14 M4 0.14 M5 0.14} \
    -spacings {M1 0.4 M2 0.4 M3 0.48 M4 0.48 M5 1.1} \
    -vias { \
        {VIA34_LH 1x1 R} {VIA34_LH 1x1 NR} {VIA34_LV 1x1 R} \
        {VIA34_LV 1x1 NR} {VIA34_LH 1x2 R} {VIA34_LH 1x2 NR} \
        {VIA34_LH 2x1 R} {VIA34_LH 2x1 NR} {VIA34_LV 1x2 R} \
        {VIA34_LV 1x2 NR} {VIA34_LV 2x1 R} {VIA34_LV 2x1 NR} \
    }
# The VIA45 and VIA56 definitions also belong inside the -vias list.
```



# Applying Non-Default Routing Rules

**Configure clock tree routing:** 

```
create_routing_rule 2xS_2xW_CLK_RULE ...
set_clock_routing_rules -rule 2xS_2xW_CLK_RULE \
    -min_routing_layer M4 \
    -max_routing_layer M5
```

- Note that the NDR was specified also for M1-M3
  - Although the clocks will be primarily routed on M4/M5 as shown above, they need to route on lower layers to connect to the standard cell pins, therefore should be covered by an NDR as well!

#### NDRs on Clock Nets

- ICC II CTS supports various types of NDR specifications when building the clock tree:
  1. Net-specific NDR from `set_routing_rule <net_list>`: NDR will not get propagated to newly created nets
  2. Net-specific NDR from `set_clock_routing_rules -nets`: NDR will get propagated to newly created nets
  3. Clock-specific NDR from `set_clock_routing_rules -clocks`: Supports separate NDRs for root, sink and internal nets which are clock-specific
  4. Global NDR from `set_clock_routing_rules`: Supports separate NDRs for root, sink, and internal nets
- The order of priority is 1 > 2 > 3 > 4
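The priority order can be read as a first-match lookup over the four NDR sources. The Python sketch below illustrates that reading only; the rule name `CLK_SPECIFIC_RULE` is hypothetical, and the function is not part of any tool API.

```python
# NDR sources in priority order, highest first, as listed above.
PRIORITY = [
    "set_routing_rule",
    "set_clock_routing_rules -nets",
    "set_clock_routing_rules -clocks",
    "set_clock_routing_rules",
]


def effective_ndr(rules_by_source):
    """Return the rule from the highest-priority source that defines one."""
    for source in PRIORITY:
        if source in rules_by_source:
            return rules_by_source[source]
    return None  # no NDR applies to this net


# Hypothetical net covered by a global NDR and a clock-specific NDR.
rules = {
    "set_clock_routing_rules": "2xS_2xW_CLK_RULE",
    "set_clock_routing_rules -clocks": "CLK_SPECIFIC_RULE",
}
print(effective_ndr(rules))
```

Here the clock-specific rule wins over the global one because it appears earlier in the priority list.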





#### Different NDRs for Root, Internal, Sink Nets



The highlighted options allow you to specify less aggressive NDR rules for clock tree "leaf" nets, while allowing more aggressive NDRs for the other nets. This helps to prevent routing DRC violations as well as SI issues.



## Are all Clock Drivers and Loads Specified?

Ensure that all clock input ports have a slew constraint needed for accurate clock delay calculation during and after CTS

**Constraints** 





### Defining CTS-Specific DRC Values

- Max transition and max capacitance design rules can be specified in two ways: Library and SDC
  - ICC II will always use the smallest value
- You can define your own CTS-specific DRC values:

```
set_max_transition 0.5 -clock_path [all_clocks]
set_max_capacitance 0.6 -clock_path [all_clocks]
```

Design rule constraints can be selectively applied per clock and per scenario (applies to current scenario, by default):

```
set_max_transition 0.2 -clock_path \
    -scenarios "S1 S4" \
    [get_clocks SYS_CLK]
```



#### Remove Skew from Uncertainty

SDC constraints usually include the following, applied to each clock:

```
set_clock_uncertainty -setup | -hold <number>
```

- Models estimated effects of clock skew, jitter, and additional timing margin on setup/hold timing, pre-CTS
- After clock tree propagation, to avoid pessimistic timing analysis, remove or reduce uncertainty by the estimated skew:

```
set_clock_uncertainty -scenarios s1 -setup | -hold <SMALLER #> CLK1
set_clock_uncertainty -scenarios {s2 s3} -setup | -hold <SMALLER #> CLK2
# OR
remove_clock_uncertainty [all_clocks] -scenarios [all_scenarios]
```
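As a worked example of the arithmetic, the sketch below treats uncertainty as the sum of estimated skew, jitter, and margin, and then drops the skew term once the clock tree is propagated. The numbers are hypothetical.

```python
def clock_uncertainty(skew, jitter, margin):
    """Uncertainty budget (ns) as the sum of its three components."""
    return skew + jitter + margin


# Hypothetical budget, in ns.
pre_cts = clock_uncertainty(skew=0.15, jitter=0.05, margin=0.05)
# After CTS the real skew is in the propagated clock network, so it is removed.
post_cts = clock_uncertainty(skew=0.0, jitter=0.05, margin=0.05)

print(round(pre_cts, 2))
print(round(post_cts, 2))
```

Keeping the pre-CTS value after clock propagation would count the skew twice: once in the real clock arrival times and once in the uncertainty.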



#### Reporting Settings Summary

```
# Report clock tree max tran/cap/references/... in all clocks+modes:
report_clock_settings
    [-clock CLOCKS]
    [-type type]
    # type can be: configurations, routing_rules, references, spacing_rules, all
# Report clock tree target skew/latency constraints:
report_clock_tree_options
# Report explicit balance points (sink/exclude pins), and groups:
report_clock_balance_points
report_clock_balance_groups
# Report non-default routing rules in a more compact way
report_clock_routing_rules
```

- ICC II builds clock trees for all clocks in all active setup and/or hold scenarios
- CTS will also perform clock net logical DRC fixing on scenarios enabled for max\_transition and/or max\_capacitance
- If you do not want a scenario to be used during CTS:

```
set_scenario_status -active false {s1}
```

- Hold fixing occurs in all scenarios that are enabled for hold:

```
set_scenario_status { s2 } -hold true
```

Control what cells are used to fix hold violations:

Control the effort for hold fixing:

```
set_app_options -list {
    clock_opt.hold.effort none|low|medium|high}
```



#### Minimize Hold Time Violations in Scan Paths

- **Enable scan chains reordering to minimize branch crossings**
- Can reduce hold time violations in the scan chain

```
set_app_options -list {opt.dft.clock_aware_scan_reorder true}
```



Without clock tree based reordering

With clock tree based reordering



#### Clock Reconvergence Pessimism



- To reduce OCV effects, clock trees try to share as many buffers as possible
- Remove the pessimism from the timing calculation caused by shared clock paths (late/early for launch/capture calculation)

`set_app_options -name time.remove_clock_reconvergence_pessimism -value true`



### Clock Reconvergence Pessimism Removal

The pessimism is removed for setup timing by adding the CRP in the capture path, and by subtracting for hold

Example setup timing report: Added CRP in capture path
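The setup case can be worked through numerically. In the sketch below the launch clock is analyzed with late delays and the capture clock with early delays, so the shared clock segment is counted with two different delays; the CRP is that difference, added back on the capture side. All delay values are hypothetical.

```python
def clock_reconvergence_pessimism(common_late, common_early):
    """Delay difference (ns) that late/early analysis puts on the shared segment."""
    return common_late - common_early


def setup_slack(period, launch_clock_late, capture_clock_early,
                data_delay, setup_time, crp=0.0):
    """Setup slack (ns); the CRP credit is added to the capture (required) side."""
    arrival = launch_clock_late + data_delay
    required = period + capture_clock_early + crp - setup_time
    return required - arrival


# Hypothetical delays (ns): a shared clock segment followed by separate branches.
common_late, common_early = 0.55, 0.45
timing = dict(
    period=2.0,
    launch_clock_late=common_late + 0.33,
    capture_clock_early=common_early + 0.27,
    data_delay=1.0,
    setup_time=0.05,
)

crp = clock_reconvergence_pessimism(common_late, common_early)
without_crpr = setup_slack(**timing)
with_crpr = setup_slack(crp=crp, **timing)

print(round(without_crpr, 2), round(with_crpr, 2))
```

The slack improves by exactly the CRP, which is the pessimism that came from treating the same physical buffers as both slow and fast at once.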





### Congestion-Aware Initial CTS

- Initial clock tree synthesis (`build_clock`) is not congestion aware!
  - Based on virtual routing, by default
- On large complex-floorplan designs, this may lead to:
  - Pre- vs. post-route clock skew, latency, and logical DRC degradation
  - Congestion hotspots
  - Route DRCs



Enable global routing for congestion estimation and congestion-aware clock tree construction during the `build_clock` stage:

```
set_app_options -list {cts.compile.enable_global_route true}
```



### Local Skew Optimization



- **Performs timing-aware clustering (as opposed to location-based)**
- Optimizes local skew directly for setup and hold timing critical pairs
- Can relax skew target where possible (if slack is positive) to save clock buffer area



### Local Skew Optimization

#### **Timing aware clustering:**

Clusters timing-related clock tree nodes together to improve path sharing without degrading latency.

#### **Automatic derivation of target skew:**

- Derives target skew for clocks based on an early estimation of timing QoR. As a safeguard, the derived value is limited to within 3% to 10% of the clock period. CTS works to meet the derived skew instead of trying to achieve "0" skew, which reduces clock buffer/cell area.
- Local skew optimization ensures that timing-related registers have minimal skew.

#### **Local skew optimization during CTO:**

- After (traditional) global skew optimization stage in CTO, it now looks at timing QoR, and optimizes local skew for timing critical pairs to improve timing QoR.
- Makes sure that skew/latency/DRCs across scenarios are not degraded.



#### Enable Local Skew CTS and CTO

- When using CCD, local skew CTS and CTO are enabled by default
  - Because timing is taken into account for the initial clock tree, this reduces the work that CCD has to do to further optimize the clock tree to meet timing
- To use local skew for classic CTS, you need to enable it specifically:

```
set_app_options -list {
    cts.compile.enable_local_skew true
    cts.optimize.enable_local_skew true }
```

By default, local skew optimization will calculate relaxed skew targets. You might see messages like:

```
Computing global skew target ...
Setting target skew for clock: SYS_2x_CLK as 0.240000
```

CTS does this to save clock area. If, after CTS, hold timing degrades more than you are comfortable with, consider turning this feature off using the application option `cts.common.enable_auto_skew_target_for_local_skew`.



# Restricting the Amount of CCD Skewing

- By default, CCD skews as much as needed to meet the timing
- You can limit the amount of skewing if needed, for example:

```
set_app_options -name ccd.max_prepone -value 0.2
set_app_options -name ccd.max_postpone -value 0.4
```

- `ccd.max_prepone`: max latency reduction (advance) allowed to a sink
- `ccd.max_postpone`: max latency increase (delay) allowed to a sink
- Setting a limit can impact the timing QoR as this restricts CCD
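The effect of the two limits can be pictured as clamping each sink's requested latency change. The sketch below is a simplified model, not the tool's algorithm; the sign convention (negative means advance, positive means delay) is an assumption made for the illustration.

```python
def clamp_latency_offset(requested, max_prepone, max_postpone):
    """Limit a requested latency change (ns) to the allowed skewing window.

    Assumed convention: negative values advance the clock (prepone),
    positive values delay it (postpone).
    """
    return max(-max_prepone, min(requested, max_postpone))


# Limits from the example above: 0.2 ns advance, 0.4 ns delay.
for requested in (-0.35, 0.10, 0.55):
    print(requested, "->", clamp_latency_offset(requested, 0.2, 0.4))
```

A request that falls inside the window passes through unchanged, while requests outside it are cut back to the limit. The cut-back requests are where restricting CCD can cost timing QoR.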



# Skipping Path Groups during CCD

- You can configure CCD to skip the clock pins that belong to particular path groups during clock latency adjustment
- Data path optimization will still be performed on these path groups

```
set_app_options -name ccd.skip_path_groups \
    -value { pathgroup1 {scenarioA pathgroup2} }
```

- If you specify a path group name without a scenario name, all path groups with the name specified from all scenarios are skipped
- If you specify a path group name with a scenario name, only the path group in the specified scenario is skipped



# Ignore I/O Timing for Boundary Register Skewing

- It might be desirable to prevent CCD skewing based on I/O timing
  - I/O timing inaccurate or too pessimistic
  - To ignore timing on some or all I/Os for CCD:

```
group_path -name MY_IN -from [get_ports In*]
set_app_options -name ccd.skip_path_groups -value {MY_IN}
clock_opt
```

This allows CCD to skew the FF1 boundary register to help Path A's timing





# CCD and Boundary Registers (1 of 2)



- By default, CCD skews all registers (including boundary FFs) for timing
- To prevent latency adjustment on boundary registers, use:

`ccd.optimize_boundary_timing` → set to `false`

Example log messages:

- Boundary CK pin: en launch/CK because of port din
- Boundary CK pin: ff l l/CK because of port din
- Optimize boundary timing is set to false, 2 registers will not be optimized, 20 registers will be optimized.



# CCD and Boundary Registers (2 of 2)



- Primary Ports are used to differentiate between boundary and internal FFs
- Ignore ports for boundary identification (reset, scan\_en, ...) using:

```
ccd.ignore_scan_reset_for_boundary_identification → set to false
```

To ignore specific ports for boundary identification, use:

```
set_app_options \
    -name ccd.ignore_ports_for_boundary_identification \
    -value {my_en en_A}
```



#### Hold Criticality during CCD

- Although CCD setup optimizations are hold-aware, hold violations can be introduced to improve setup timing
  - Usually, data path optimizations are able to address these hold violations
- You may want to prioritize hold during CCD if
  - Fixing hold in your design is difficult / not possible
  - Your design is area-critical, and the hold-buffer area increase is to be avoided
- To specify hold criticality during CCD, use:

```
# Default: low
ccd.hold_control_effort
```

- Set to medium or high to reduce hold degradation
- Use only if hold timing is critical, since setup optimizations may degrade
- Only affects `final_opto`, as well as CCD optimization during `route_opt`

#### Classic CTS and CCD Execution

Clock tree synthesis, clock tree routing, and data path optimization are all executed by `clock_opt`.

- CCD is enabled using the application option `clock_opt.flow.enable_ccd`
- The four stages of `clock_opt` are:

```
build_clock
route_clock
final_opto
global_route_opt
```

- You can control which stages are executed with `-from`/`-to`
  - For example, to perform only clock tree synthesis (CTS+CTO), stop after the `build_clock` stage



# Classic CTS Flow Stages

| Command | What does it do? |
|---------|------------------|
| `clock_opt -to build_clock` | Builds GR clock trees\*, and performs inter-clock balancing, for minimum skew |
| `clock_opt -from route_clock -to route_clock` | Routes the clock nets and updates I/O latencies |
| `clock_opt -from final_opto` | Performs data path optimization to meet timing and DRCs |

\* After the build\_clock phase, the clock trees are global-routed. The global routing occurs during the "optimization" (CTO) part.
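The `-from`/`-to` stage selection can be pictured as slicing an ordered list of stages. The Python sketch below is only a mental model of that behavior, using the stage names from this unit; it is not how the tool is implemented.

```python
# Stage order of clock_opt as listed in this unit.
STAGES = ["build_clock", "route_clock", "final_opto", "global_route_opt"]


def stages_to_run(from_stage=None, to_stage=None):
    """Return the stages executed for a given -from/-to combination."""
    start = STAGES.index(from_stage) if from_stage else 0
    end = STAGES.index(to_stage) if to_stage else len(STAGES) - 1
    return STAGES[start:end + 1]


print(stages_to_run(to_stage="build_clock"))
print(stages_to_run(from_stage="route_clock", to_stage="route_clock"))
print(stages_to_run(from_stage="final_opto"))
```

Omitting `-from` starts at the first stage and omitting `-to` runs through the last one, which is why the three commands in the table together cover the whole flow without overlap.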



# Classic CTS Flow: clock\_opt vs. Atomic

#### To analyze intermediate results, you can either:

- Run clock\_opt stages individually using -from/-to options
- Use atomic commands

| clock_opt stage | Atomic commands |
|-----------------|-----------------|
| `clock_opt -to build_clock` | `synthesize_clock_trees`<br>`balance_clock_groups` |
| `clock_opt -from route_clock -to route_clock` | `route_group -all_clock_nets -reuse_existing_global_route true`<br>`compute_clock_latency` |
| `clock_opt -from final_opto` | |



## Post-CTS: Post-route Clock Tree Optimization

#### **Problems:**

- Possibility of clock tree QoR degradation (skew, latency, logical DRC) due to miscorrelation between global route (build\_clock) and detail route (route\_clock)
- Clock tree QoR can further degrade due to coupling capacitance and crosstalk effects
- **Solution: Post-route clock tree optimization** 
  - Works on detail-routed clock trees to address these degradations
  - Optimizations performed include on-route buffer insertion, buffer sizing, and gate sizing

```
synthesize_clock_trees -postroute
```

- Post-route CTO can also be performed after signal routing
- If CCD is enabled, the command will only fix logical DRCs



#### CCD Power or Area Recovery

- The CCD engine can recover power or area from the clock network
  - Power/area recovered without any timing QoR impact
  - Optimization includes repeater removal, clock cell and register sizing.
  - Applies to either the classic CTS or CCD flow
  - Runs during the `final_opto` stage of clock\_opt



- Power: Should use accurate SAIF files for accurate dynamic power calculation
- Area: Use if accurate SAIF files are not available or if area is an optimization goal





#### Effects of Clock Tree Synthesis

- Clock buffers added
- Congestion may increase
- Non-clock cells may have been moved to less ideal locations
- Can introduce new timing and max tran/cap violations





How do you handle new violations?

```
report_clock_qor \
    [-type area|balance_groups|drc_violators|latency|local_skew|power|robustness|structure|summary] \
    [-histogram_type latency|transition|level|...] \
    [-modes ...] [-corners ...] ...
```

Reports max global skew (by default), late/early insertion delay, clock DRC violations, number of clock tree references (buffers), structure of the clock tree, robustness, histogram...



### Analyzing CTS Results: Clock Timing Report

```
report_clock_timing
    -type summary|transition|...
    -modes {m1 m2}
    -corners {c1 c2} ...
```

- Reports actual, relevant skew, latency, inter-clock latency, etc. for paths that are related
- Includes effects of early/late timing derates
  - report\_clock\_qor does not consider derates
- **Example:**

```
report_clock_timing -type skew
```



#### Analysis Using the CTS GUI

- CTS browser
  - **Identify clock tree object** properties and attributes
  - Traverse clock tree levels
- CTS schematic
  - **Trace clock paths**
  - **Cross-probe** with layout view
- **Clock tree-levelized graph**
- Clock tree latency graph per corner



# Clock Tree Final View






### Thank You