<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Unlock the Low-Power Design Puzzle with Algorithmic Synthesis</title>
	<atom:link href="http://www.techguri.com/2009/09/14/unlock-the-low-power-design-puzzle-with-algorithmic-synthesis/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.techguri.com/2009/09/14/unlock-the-low-power-design-puzzle-with-algorithmic-synthesis/</link>
	<description>Technical blog EDA, semiconductor industry</description>
	<lastBuildDate>Wed, 26 May 2010 09:37:12 -0700</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Marc Swinnen</title>
		<link>http://www.techguri.com/2009/09/14/unlock-the-low-power-design-puzzle-with-algorithmic-synthesis/comment-page-1/#comment-83</link>
		<dc:creator>Marc Swinnen</dc:creator>
		<pubDate>Fri, 18 Sep 2009 17:49:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.techguri.com/?p=554#comment-83</guid>
		<description>Fernando is right that power saving opportunities must be exploited at all levels, and he is also correct that clock gating has proven to be one of the most practical and successful techniques.

Fernando is wrong, however, when he claims that automatic clock gating is not possible after RTL coding is done. Azuro&#039;s PowerCentric clock tree synthesis product has the ability to add a significant number of clock gates at the gate level, during CTS.

The benefits of this technique are:
(1.) 20% to 40% additional clock power savings (verified by TSMC and included in their Ref Flow 10)
(2.) Complete formal equivalence with RTL
(3.) Gate enable paths meet timing because PowerCentric synthesizes and places the enable logic together with the clock tree during CTS, so it has full visibility into the placed-gate timing (clock and logic) and ensures its correctness.

Azuro&#039;s advanced clock gating technology is one of the reasons PowerCentric has been adopted by 4 of the top 5 semiconductor vendors in the world.</description>
		<content:encoded><![CDATA[<p>Fernando is right that power saving opportunities must be exploited at all levels, and he is also correct that clock gating has proven to be one of the most practical and successful techniques.</p>
<p>Fernando is wrong, however, when he claims that automatic clock gating is not possible after RTL coding is done. Azuro&#8217;s PowerCentric clock tree synthesis product has the ability to add a significant number of clock gates at the gate level, during CTS.</p>
<p>The benefits of this technique are:<br />
(1.) 20% to 40% additional clock power savings (verified by TSMC and included in their Ref Flow 10)<br />
(2.) Complete formal equivalence with RTL<br />
(3.) Gate enable paths meet timing because PowerCentric synthesizes and places the enable logic together with the clock tree during CTS, so it has full visibility into the placed-gate timing (clock and logic) and ensures its correctness.</p>
<p>Azuro&#8217;s advanced clock gating technology is one of the reasons PowerCentric has been adopted by 4 of the top 5 semiconductor vendors in the world.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc Swinnen</title>
		<link>http://www.techguri.com/2009/09/14/unlock-the-low-power-design-puzzle-with-algorithmic-synthesis/comment-page-1/#comment-81</link>
		<dc:creator>Marc Swinnen</dc:creator>
		<pubDate>Thu, 17 Sep 2009 17:13:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.techguri.com/?p=554#comment-81</guid>
		<description>Just to add some more technical background on my previous comment:

Clock gates are typically added before or during synthesis. This is a good thing, but it has 2 fundamental limitations. The first limitation is that automatic clock gating techniques can only identify a limited set of gating opportunities - basically they find explicit recirculation muxes and replace them with clock gates. No other gating opportunities are exploited, which leaves a lot of power savings on the table.
The second limitation of clock gate insertion at the RTL level or higher is that design timing is unknown, or only very vaguely known (remember: most delay is in the wires, and the wires are unknown before placement). This means that nobody can tell whether the clock gate enable timing is feasible. This is not a minor point. The timing on a clock gate enable signal is always very problematic and is one of the key factors limiting the effectiveness of clock gating. For maximum power savings, you want the clock gate to be as high up the tree as possible, where it shuts off the largest part of the clock tree. But for timing closure it is better to push the clock gate as low down the clock tree as possible (closer to the FFs). Finding the optimal position for the clock gate is a classic engineering trade-off problem that cannot be solved at the RTL level.

My point is that both these limitations are overcome by adding and optimizing clock gates at the gate level during CTS. The reasons are:
(a.) many more clock gating opportunities become visible at the gate level that are not visible at the RTL level.
(b.) the optimal placement of clock gates can only be done during CTS because there is no clock tree before then (duh!), and there is not enough timing information before then to determine the feasibility of the gate.
There are more issues that have to do with power and activity, but that gets us into deeper waters than I have time for in this comment.
In summary: Clock gating is indeed important, but RTL clock gating is only half the story and the other half can only be done at the gate level during CTS.</description>
		<content:encoded><![CDATA[<p>Just to add some more technical background on my previous comment:</p>
<p>Clock gates are typically added before or during synthesis. This is a good thing, but it has 2 fundamental limitations. The first limitation is that automatic clock gating techniques can only identify a limited set of gating opportunities &#8211; basically they find explicit recirculation muxes and replace them with clock gates. No other gating opportunities are exploited, which leaves a lot of power savings on the table.<br />
The second limitation of clock gate insertion at the RTL level or higher is that design timing is unknown, or only very vaguely known (remember: most delay is in the wires, and the wires are unknown before placement). This means that nobody can tell whether the clock gate enable timing is feasible. This is not a minor point. The timing on a clock gate enable signal is always very problematic and is one of the key factors limiting the effectiveness of clock gating. For maximum power savings, you want the clock gate to be as high up the tree as possible, where it shuts off the largest part of the clock tree. But for timing closure it is better to push the clock gate as low down the clock tree as possible (closer to the FFs). Finding the optimal position for the clock gate is a classic engineering trade-off problem that cannot be solved at the RTL level.</p>
<p>My point is that both these limitations are overcome by adding and optimizing clock gates at the gate level during CTS. The reasons are:<br />
(a.) many more clock gating opportunities become visible at the gate level that are not visible at the RTL level.<br />
(b.) the optimal placement of clock gates can only be done during CTS because there is no clock tree before then (duh!), and there is not enough timing information before then to determine the feasibility of the gate.<br />
There are more issues that have to do with power and activity, but that gets us into deeper waters than I have time for in this comment.<br />
In summary: Clock gating is indeed important, but RTL clock gating is only half the story and the other half can only be done at the gate level during CTS.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
