Place and Route

Useful Skew-Based Optimization

Clock trees are an integral part of any chip, and simply making them do their basic job is now far less than what designers expect when they build one. Traditionally, clock trees are built to distribute clocks from the clock generator or clock port to the flip-flops or sink elements in the most efficient way; i.e., with minimal delay and without degrading the clock characteristics.

But now, the demands on clock trees are entirely different. They not only need to deliver the clock efficiently from source to sink points, but also use fewer cells to do it, consume less power, and help the datapath meet timing. Add to that the basic requirements of meeting latency and skew targets in multiple operating corners and across multiple operating modes.

Since there is a lot of buzz around useful skew-based timing closure, I am going to focus in this post on using CTS (clock tree synthesis) to help meet datapath timing by tuning the arrival or required time of the clock tree. Some companies call this clock optimization or CTS-based optimization; I will refer to it as useful skew. While most of the commercial place-and-route tools on the market support useful skew, they only do it after the fact; i.e., after the clock tree is built. More recently, some companies have claimed to do this during CTS, which can enable post-CTS timing closure. I started looking more into this and realized that this is still a half-hearted solution.

If you are trying to use useful skew to achieve timing closure, you need to do it throughout the entire flow. Doing it only during CTS, or only at any other single step, is a half-baked solution, since you are not taking advantage of all the engines. Also, if all the engines in your tool are not aware of the changes you are attempting during CTS, it may backfire and give you complete garbage results. If you are planning to allow useful skew in the design, you should have the freedom of doing it right from the placement step. The placement and optimization engine before CTS should be aware of this, and should automatically come up with the necessary useful skew budgets for paths whose timing is hard to meet even with an ideal clock. Doing this relieves the placer and optimizer of needlessly cranking on hard-to-meet paths that will never achieve timing closure without help from the clock tree.
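To make the idea concrete, here is a minimal Python sketch of the setup-slack arithmetic behind useful skew. Everything in it is an assumption for illustration: the three registers A, B and C, the 1000 ps period, the 50 ps setup time and the path delays are made-up numbers, not output from any tool, and hold checks are ignored. It shows a path into B that cannot meet timing with an ideal clock, and how delaying the clock at B borrows time from the easier path that follows.

```python
# Hypothetical numbers, all in picoseconds (integers avoid rounding noise).
PERIOD_PS = 1000
SETUP_PS = 50

# (launch register, capture register, worst-case datapath delay)
PATHS = [("A", "B", 1100), ("B", "C", 700)]


def setup_slacks(clock_arrival_ps):
    """Return setup slack per path for the given clock arrival times.

    slack = period + (capture arrival - launch arrival) - data delay - setup
    """
    slacks = {}
    for launch, capture, delay in PATHS:
        skew = clock_arrival_ps[capture] - clock_arrival_ps[launch]
        slacks[(launch, capture)] = PERIOD_PS + skew - delay - SETUP_PS
    return slacks


# Ideal clock: every register sees the clock edge at the same time.
ideal = setup_slacks({"A": 0, "B": 0, "C": 0})

# Useful skew: deliberately delay the clock at B by 200 ps.
skewed = setup_slacks({"A": 0, "B": 200, "C": 0})

print("ideal :", ideal)
print("skewed:", skewed)
```

With the ideal clock, A to B fails by 150 ps while B to C has 250 ps to spare; delaying B's clock by 200 ps leaves both paths with 50 ps of positive slack. That 200 ps is the kind of budget a pre-CTS optimizer would need to hand over to CTS.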

Now, when doing CTS, the tool should be able to build the clock tree while honoring the budgets set by the placement-based optimization engine (place_opt). This helps build a correct-by-construction tree that is useful skew-aware. Post-CTS, there has to be one more optimization step that is aware of this tree and performs the necessary datapath optimization as well as local useful skew to achieve timing closure. Claiming to meet datapath timing while building the clock tree is marketing talk; in reality you need a clean-up step to get an accurate, timing-closed post-CTS design.

But it doesn’t end here, as most other companies would like it to. The router and post-route optimization have to be an integral part of this as well. The router needs to understand the exact topology for routing the clock tree to really achieve the skew/latency claims made while building the clock tree, and the post-route optimization steps also need to do useful skew-based optimization to achieve complete timing closure. Without this crucial step, you can achieve very good results during and after CTS, but once you route the design, routing topology changes and SI (signal integrity) effects can degrade your datapath timing further than place_opt can predict. The only way to achieve true timing closure is to allow post-route optimization to make further changes to the clock tree. This again requires accurate and tightly coupled engines that can make the right changes, or else one small change to the clock tree can result in tens or possibly hundreds of timing violations.

Now add to this MCMM (multi-corner, multi-mode)-based useful skew. This complicates the question of how much useful skew can be used, because skew applied to fix timing in one corner has implications for the other operating corners. The only way to address this is an engine that can understand the effect that applying useful skew in one corner has on the other corners and make the necessary adjustments.
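A rough Python sketch of why the corners interact is below. It is purely illustrative: the two corners, their path delays, the 1000 ps period and the 50 ps setup time are invented, hold checks are ignored, and it assumes the same clock offset in picoseconds applies in every corner, whereas a real inserted delay would itself vary with the corner. For a register B with one path coming in and one going out, each corner allows only a window of clock offsets at B, and the usable offset has to sit inside every window at once.

```python
# Hypothetical numbers, all in picoseconds.
PERIOD_PS = 1000
SETUP_PS = 50

# Worst-case delay of the path into B and the path out of B, per corner.
CORNERS = {
    "corner_1": {"into_b": 1100, "out_of_b": 700},
    "corner_2": {"into_b": 1050, "out_of_b": 780},
}


def offset_window(delays):
    """Range of clock delay at B that keeps both setup checks non-negative."""
    # Path into B needs: period + offset - delay - setup >= 0
    lowest = delays["into_b"] + SETUP_PS - PERIOD_PS
    # Path out of B needs: period - offset - delay - setup >= 0
    highest = PERIOD_PS - delays["out_of_b"] - SETUP_PS
    return lowest, highest


def common_window(corners):
    """Intersect the per-corner windows; None means no single offset works."""
    windows = [offset_window(d) for d in corners.values()]
    lowest = max(w[0] for w in windows)
    highest = min(w[1] for w in windows)
    return (lowest, highest) if lowest <= highest else None


per_corner = {name: offset_window(d) for name, d in CORNERS.items()}
shared = common_window(CORNERS)

print("per corner:", per_corner)
print("shared    :", shared)
```

In this example corner_1 on its own would accept any offset from 150 to 250 ps, but corner_2 caps it at 170 ps, so a 200 ps offset chosen by looking only at corner_1 would break the setup check on the path out of B in corner_2.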

To summarize, unless a tool understands and applies useful skew across the full place-and-route flow, you may not be taking full advantage of useful skew. I will discuss other CTS-related topics in later posts. Please leave your comments and let me know how you liked this article, as well as topics you would like me to discuss in future posts. Also, if you have real-world experience with some tool, let me know; I can post it as “anonymous” if you prefer.

Discussion

One comment for “Useful Skew-Based Optimization”

  1. Interesting article, well done..
    I think you are almost on the right track..

    Here are some things to ponder:

    What we need to think about is not just ‘useful skew’ but ‘useless skew’ as well.
    i.e. skew is not relevant and has no physical meaning, when registers/sequential elements don’t communicate..

    The process of determining the ‘skew requirements’ for each reg-reg pair is called ‘clock scheduling’..
    Which is a challenging computationally intensive problem..

    To make a clock tree that reflects the design intent, it is required to build a clock-schedule, and then implement this..

    Enclosed is a link to an article that describes this, if you are interested:
    http://www.eetimes.com/showArticle.jhtml?articleID=181500545

    Regards
    Sandeep Srinivasan

    Posted by sandeep srinivasan | June 18, 2009, 12:22 pm
