<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>TechGuri &#187; Place and Route</title>
	<atom:link href="http://www.techguri.com/category/place-and-route/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.techguri.com</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Wed, 10 Mar 2010 09:44:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Myth # 8 : Crosstalk aware timing fixing can be done in post-route stage only.</title>
		<link>http://www.techguri.com/2009/09/08/myth-8-crosstalk-aware-timing-fixing-can-be-done-in-post-route-stage-only/</link>
		<comments>http://www.techguri.com/2009/09/08/myth-8-crosstalk-aware-timing-fixing-can-be-done-in-post-route-stage-only/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 20:56:33 +0000</pubDate>
		<dc:creator>Alpesh Kothari</dc:creator>
		
		<category><![CDATA[Place and Route]]></category>

		<category><![CDATA[65nm]]></category>

		<category><![CDATA[crosstalk]]></category>

		<category><![CDATA[CTS]]></category>

		<category><![CDATA[ECOs]]></category>

		<category><![CDATA[EDA]]></category>

		<category><![CDATA[P&R tool]]></category>

		<category><![CDATA[placement-based optimization]]></category>

		<category><![CDATA[post-clock optimization]]></category>

		<guid isPermaLink="false">http://www.techguri.com/?p=538</guid>
		<description><![CDATA[One of the most convincing stories you can hear from any big EDA marketing person is that crosstalk is something which can only show up after the routes are laid down and thus needs to be fixed as a post-process step after routing.]]></description>
			<content:encoded><![CDATA[<p>Myth # 8 : Crosstalk aware timing fixing can be done in post-route stage only.</p>
<p>One of the most convincing stories you can hear from any big EDA marketing person is that crosstalk is something which can only show up after the routes are laid down and thus needs to be fixed as a post-process step after routing. In theory, this looks reasonable: without real knowledge of the wires, how can you predict coupling between them? But if you step back and look at this from a 1,000-foot view, the same argument applies to wire delays. Without routing, how can you predict wire delay? By that logic, you would need to route the design before you could estimate wire delays at all. But wait, then how are we doing placement, placement-based optimization, CTS and post-clock optimization?</p>
<p>I recently surveyed a few of my customers and P&#038;R designers to check how the big “three” handle SI in their flows.</p>
<p>Here is what I found/heard:</p>
<p><strong>Company A :</strong> Detail-route the design. Fix all the regular timing. Turn on SI-aware timing and do SI-based timing optimization.</p>
<p><strong>Company B :</strong> Route the design and later do SI-based timing analysis and fix the regular and SI-based timing. All this may be bundled under one name, but if you look at the logfile you can see the obvious!</p>
<p><strong>Company C :</strong> They try to do some SI prevention during global routing based on heuristics, then some quick optimization following track assignment. After detail routing, if you still have SI-based timing violations left, you need to run post-route timing optimization.</p>
<p>The question anyone might ask is: what is wrong with the above approaches? These are perfectly valid approaches at 90nm or 130nm, where SI was not significant in the designs. At 90nm, I know designers who’ll add 100ps of extra margin to account for SI and not really do SI-based timing analysis in the implementation tool. They will just do it in the sign-off tool and later run a few ECOs to fix timing on outlier paths. There were several reasons for not doing SI-based timing optimization in the P&#038;R tool:</p>
<p>a.	The P&#038;R tool’s SI numbers won’t correlate with the sign-off tool’s SI prediction.<br />
b.	P&#038;R tools run really slowly in post-route mode, especially the timer and extractor. So, if you want to do SI-based timing optimization during post-route, you are talking about days.<br />
c.	SI effects were not significant at 90nm and 130nm. So, you can add extra margin and/or run a few ECOs to close timing.</p>
<p>All this is changing at 65nm and 40nm. The crosstalk-based timing effects are really significant. I have seen a 500ps to 1ns difference in timing between regular and crosstalk-aware analysis. This is huge and cannot be solved by doing ECOs or adding 100ps or so of margin.</p>
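<p>To make the arithmetic concrete, here is a minimal Python sketch of setup slack with a flat margin versus a real crosstalk delta. The 100ps margin and 500ps delta are the figures mentioned above; the clock period, arrival time, setup time and the function name <code>setup_slack_ps</code> are made-up values for illustration only and do not come from any real design or tool.</p>
<pre>
```python
def setup_slack_ps(period_ps, arrival_ps, setup_ps, si_delta_ps=0.0, margin_ps=0.0):
    """Setup slack in ps; a negative value means the path fails timing.

    si_delta_ps models the extra delay crosstalk adds to the data path.
    margin_ps models a flat guard band subtracted from the clock period.
    """
    required_ps = period_ps - margin_ps - setup_ps
    return required_ps - (arrival_ps + si_delta_ps)


# Hypothetical path: 2000ps clock, 1800ps data arrival, 50ps setup time.
# Implementation view: no SI analysis, just a flat 100ps margin.
margined = setup_slack_ps(2000, 1800, 50, margin_ps=100)
# Sign-off view: SI analysis finds a 500ps crosstalk delta on the same path.
signoff = setup_slack_ps(2000, 1800, 50, si_delta_ps=500)

print(f"slack with 100ps margin, no SI: {margined:+.0f}ps")
print(f"slack with 500ps SI delta:      {signoff:+.0f}ps")
```
</pre>
<p>With these assumed numbers, the path looks closed with 50ps to spare under the flat margin, yet misses by 350ps once the crosstalk delta is applied. A fixed margin only helps when the delta is smaller than the margin.</p>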
<p>So, all the designers who are working at 65nm and below are forced to run a crosstalk-aware timing closure flow in the P&#038;R tool. The result, be it company A, B or C: runtimes are loooong, with no guarantee of complete SI-based timing closure or correlation to sign-off tools (see my other post, Myth # 9 : I need to tune R and C factors to get good correlation to sign-off tools and achieve predictable timing closure). While talking to one of my friends, I found that post-route SI-aware timing optimization took 3.5 days in one tool and the design was still not clean. They then did several (more than 10) ECOs over the next 2 weeks to get timing closure!</p>
<p>This brings me back to my initial question: is crosstalk-aware optimization possible in the post-route stage only? The answer is no. Any company that wants you to believe otherwise is trying to hide the fundamental limitations of its P&#038;R software in handling this effect appropriately.</p>
<p>Let me talk about how AtopTech’s Aprisa has tried to address this very issue. Instead of treating SI as a post-route issue, Aprisa handles it in-line in the tool, like any other step such as CTS or placement. Crosstalk prevention and fixing start as soon as the global route is laid out. Here, the actual optimization engine is called to optimize the design to fix and prevent crosstalk-based timing effects. None of this is based on heuristics; it uses actual timing windows etc. to accurately fix the problems where they will be reported by the sign-off tool. To do all this, the P&#038;R tool needs ultra-fast timer and extraction engines. Not only that, the timer and extraction engines need to be incremental to get to the next violating path faster. Both of these are achieved in Aprisa with the help of a multi-threaded extractor and timer. In addition, after global route and track assignment based SI fixing, the same is done during and after detail routing. Again, the fast timer/extractor makes it possible to achieve timing closure in real time rather than waiting for days to figure out if the design will ultimately be timing and routing clean.</p>
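<p>For readers unfamiliar with timing windows, the general idea can be sketched in a few lines of Python: an aggressor net can only disturb a victim net if the two can switch at overlapping times. This is a simplified illustration of the concept, not Aprisa’s algorithm or any sign-off tool’s. The <code>Window</code> type, the function names and all numbers below are assumptions invented for the example.</p>
<pre>
```python
from typing import NamedTuple


class Window(NamedTuple):
    """Earliest and latest time (ps) at which a net can switch."""
    early_ps: float
    late_ps: float


def windows_overlap(a, b):
    """True if the two switching windows share at least one instant."""
    return min(a.late_ps, b.late_ps) >= max(a.early_ps, b.early_ps)


def active_coupling_ff(victim, aggressors):
    """Sum the coupling cap (fF) of aggressors that can switch with the victim.

    aggressors is a list of (Window, coupling_cap_ff) pairs.
    """
    return sum(cap for win, cap in aggressors if windows_overlap(victim, win))


# Hypothetical victim and three aggressors.
victim = Window(100.0, 200.0)
aggressors = [
    (Window(150.0, 250.0), 3.0),  # overlaps the victim window
    (Window(400.0, 500.0), 5.0),  # switches long after the victim settles
    (Window(50.0, 100.0), 1.5),   # touches the victim window at 100ps
]

worst_case = sum(cap for _, cap in aggressors)
filtered = active_coupling_ff(victim, aggressors)
print(f"all aggressors assumed active: {worst_case} fF")
print(f"timing-window filtered:        {filtered} fF")
```
</pre>
<p>In this made-up case, filtering by timing windows drops the effective coupling from 9.5 fF to 4.5 fF, which is why window-based analysis avoids fixing problems that the sign-off tool would never report.</p>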
<p>In summary, if you are still struggling with timing closure on your chip, I strongly recommend challenging the methodologies set up for crosstalk-aware timing closure…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.techguri.com/2009/09/08/myth-8-crosstalk-aware-timing-fixing-can-be-done-in-post-route-stage-only/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Myth #9: I need to tune R and C factors to get good correlation to sign-off tools and achieve predictable timing closure.</title>
		<link>http://www.techguri.com/2009/06/17/myth-9-i-need-to-tune-r-and-c-factors-to-get-good-correlation-to-sign-off-tools-and-achieve-predictable-timing-closure/</link>
		<comments>http://www.techguri.com/2009/06/17/myth-9-i-need-to-tune-r-and-c-factors-to-get-good-correlation-to-sign-off-tools-and-achieve-predictable-timing-closure/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 16:40:38 +0000</pubDate>
		<dc:creator>Alpesh Kothari</dc:creator>
		
		<category><![CDATA[Place and Route]]></category>

		<category><![CDATA[EDA place-and-route]]></category>

		<guid isPermaLink="false">http://www.techguri.com/?p=384</guid>
		<description><![CDATA[Lately, I have been hearing from a lot of customers using some of the big EDA place-and-route tools that they need to tune resistance and capacitance factors to achieve good timing correlation to sign-off tools for 65 and 40 nm designs. This started to make me wonder why this is needed.]]></description>
			<content:encoded><![CDATA[<p>Lately, I have been hearing from a lot of customers using some of the big EDA place-and-route tools that they need to tune resistance and capacitance factors to achieve good timing correlation to sign-off tools for 65 and 40 nm designs. This started to make me wonder why this is needed.</p>
<p>I dug deeper with a few of the current and potential customers to find that things were O.K. at 90 nm and above, but, with 65 nm and below designs, the prediction by P&#038;R tools of timing critical paths vastly differs from that of sign-off tools. Let’s first examine various things that may differ between implementation and sign-off:</p>
<p><strong>a.	R/C extraction</strong><br />
<strong>b.	Timers (NLDM-based non-SI timing analysis)</strong><br />
<strong>c.	Delta-delay (SI)-based timing analysis</strong><br />
<strong>d.	Settings between implementation and sign-off tools</strong></p>
<p>Let’s talk about this in more detail:</p>
<p><strong>a.	R/C extraction</strong></p>
<p>Lower geometry nodes demand more from extraction engines because more wires are packed into a given square micron of area than at higher geometry nodes. In addition, foundries demand that more effects be considered during extraction. Typically, sign-off extractors are 3D and can model these effects most precisely. With implementation tools, the extractors are 2D, or 2.5D at the most, and it becomes a challenge to make up for that last 0.5D of inaccuracy. One way to make up for this inaccuracy is by tuning R/C factors in the implementation tool.</p>
<p>In addition, extracting the right coupling caps is also important, since they directly contribute to crosstalk effects. I once heard a customer say that a certain company just told them that the difference in coupling caps is a given and that you need to rely on the sign-off tool’s extracted data (mostly SPEF) to get the right correlation!</p>
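<p>To show what “tuning R/C factors” amounts to in practice, here is a small Python sketch that fits one global capacitance scale factor by least squares and then looks at the error left on each net. It is only an illustration under made-up numbers; the function names and the capacitance values are hypothetical and are not taken from any tool’s flow.</p>
<pre>
```python
def fit_scale_factor(pnr_values, signoff_values):
    """Least-squares k minimizing sum((signoff - k * pnr) ** 2)."""
    num = sum(p * s for p, s in zip(pnr_values, signoff_values))
    den = sum(p * p for p in pnr_values)
    return num / den


def residuals(pnr_values, signoff_values, k):
    """Per-net error left over after applying the single global factor."""
    return [s - k * p for p, s in zip(pnr_values, signoff_values)]


# Hypothetical total net capacitances in fF for three nets.
pnr_cap = [10.0, 20.0, 30.0]
signoff_cap = [11.0, 22.0, 45.0]  # third net sees much more coupling at sign-off

k = fit_scale_factor(pnr_cap, signoff_cap)
errors = residuals(pnr_cap, signoff_cap, k)
print(f"fitted factor: {k:.3f}")
print("per-net error after scaling (fF):", [round(e, 2) for e in errors])
```
</pre>
<p>In this invented data set, one net with extra coupling drags the fitted factor up, so the two well-behaved nets end up over-estimated while the coupled net is still under-estimated. A single global factor can shift the average, but it cannot repair per-net extraction differences.</p>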
<p><strong>b.	Timers</strong> </p>
<p>Most P&#038;R tools seem to have nailed down this part well, and I mostly see it as a problem of a generation ago (i.e., designs at 130 nm or 90 nm). So, assuming you have the right R/C numbers plugged in, the design can be timed accurately.</p>
<p><strong>c.	Delta-delay (SI)-based timing analysis</strong></p>
<p>Your implementation tool needs to predict the right crosstalk (delta-delay) values on each timing path based on the sign-off tool you are using. Mostly, people use PTSI or Celtic to compute the SI effects. These two tools use different heuristics, and most of the P&#038;R tools in the market are tuned to work with one of them. So, what if you own the sign-off tool from one company and the implementation tool from another? Most likely, you are hosed! This is like the fox guarding the hen house, since you are forced to buy both sign-off and implementation tools from the same company to achieve results correlated to sign-off. Having said that, there are designers out there who have found a way to address this. One of the ways is by tuning the R/C factors. I won’t even touch the topic of implementation and sign-off tools from the same company still not achieving the desired timing closure. I simply consider that a bug in that tool!</p>
<p><strong>d.	Settings between implementation and sign-off tools</strong></p>
<p>A mostly overlooked part of the correlation process is that your sign-off tool runs at one setting to get better accuracy, but your implementation tool runs at a different setting to get better runtime. Often, there are settings in your sign-off tool that are missing in the implementation tool. There is no way for you to compensate for those missing knobs, and your only resort is to make some adjustments to R/C factors or add some more margin in the design.</p>
<p>This brings me back to my original question: is it possible to achieve correlated results to sign-off without really tuning the R/C factors? The answer to this is yes, it’s possible. Using Aprisa, we have demonstrated to multiple semiconductor companies that it is possible to get correlated timing closure at 65 nm and below. What I am hearing and have seen with other tools is that their architecture won’t allow the desired scalability to address some of the issues I discussed here, since when they were designed, they were designed to solve different problems that existed at those tech nodes. </p>
<p>I won’t talk about how it is done in Aprisa, but if you are interested, you can invite the sales team to talk about it. ☺</p>
]]></content:encoded>
			<wfw:commentRss>http://www.techguri.com/2009/06/17/myth-9-i-need-to-tune-r-and-c-factors-to-get-good-correlation-to-sign-off-tools-and-achieve-predictable-timing-closure/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Useful Skew-Based Optimization</title>
		<link>http://www.techguri.com/2009/05/28/useful-skew-based-optimization/</link>
		<comments>http://www.techguri.com/2009/05/28/useful-skew-based-optimization/#comments</comments>
		<pubDate>Fri, 29 May 2009 03:19:05 +0000</pubDate>
		<dc:creator>Alpesh Kothari</dc:creator>
		
		<category><![CDATA[Place and Route]]></category>

		<category><![CDATA[Clock Tree Synthesis]]></category>

		<category><![CDATA[MCMM]]></category>

		<guid isPermaLink="false">http://www.techguri.com/?p=312</guid>
		<description><![CDATA[Clock trees are an integral part of any chip, and simply making them do their basic job is far less than what designers now expect when they build clock trees. Traditionally, clock trees are built to distribute clocks from the clock generator or clock port to the flip-flops or sink elements in the most efficient [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">Clock trees are an integral part of any chip, and simply making them do their basic job is far less than what designers now expect when they build clock trees. Traditionally, clock trees are built to distribute clocks from the clock generator or clock port to the flip-flops or sink elements in the most efficient way; i.e., with minimal delay and without degrading the clock characteristics.</p>
<p class="MsoNormal">But now, the demand on clock trees is entirely different. They not only need to efficiently deliver the clock from source to sink points, but also use fewer cells to do it, consume less power and help the datapath meet timing. Add to that the basic requirements of latency and skew across multiple operating corners and multiple operating modes.</p>
<p class="MsoNormal">Since there is a lot of buzz around useful skew-based timing closure, in this post, I am going to focus on using CTS to help meet datapath timing by tuning the arrival or required time of the clock tree. Some companies try to claim this as clock optimization or CTS-based optimization, which I will refer to as useful skew. While most of the commercial place-and-route tools in the market support useful skew, they only do it after the fact; i.e., after the clock tree is built. More recently, some companies have claimed to do this during CTS, which can enable post-CTS timing closure. I started looking more into this and realized that this is still a half-hearted solution.</p>
<p class="MsoNormal">If you are trying to make use of useful skew to achieve timing closure, you need to do this during the entire flow. Claiming to do it during CTS or at any other single step is a half-baked pie, since you are not taking advantage of all the engines to do this. Also, if all the engines in your tool are not aware of the changes you are attempting during CTS, it may backfire and give you complete garbage results. If you are planning to allow useful skew in the design, you should have the freedom of doing it right from the placement step. The placement and optimization engine before CTS should be aware of this, and should automatically come up with the necessary budgets to allocate for useful skew on paths for which timing is hard to meet even with an ideal clock. Doing this spares the placer and optimizer from unnecessarily cranking on hard-to-meet paths that will never achieve timing closure without help from the clock tree.</p>
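<p class="MsoNormal">The budgeting idea can be illustrated with a minimal Python sketch of setup slack across a three-flop pipeline A, B, C. Delaying the clock at B gives the A-to-B path more time and takes the same amount away from the B-to-C path. This is only a textbook-style illustration that ignores hold checks; the function names and all delay numbers are assumptions made up for the example, not how any particular tool computes its budgets.</p>
<pre>
```python
def setup_slack(period, launch_latency, capture_latency, data_delay, setup):
    """Setup slack in ps for one flop-to-flop path."""
    return period + capture_latency - launch_latency - data_delay - setup


def skew_budget(slack_in, slack_out):
    """Clock delay (ps) to add at the middle flop.

    Enough to fix the incoming path, but never more than the
    outgoing path can afford to give away.
    """
    needed = max(0.0, -slack_in)
    available = max(0.0, slack_out)
    return min(needed, available)


PERIOD, SETUP = 1000.0, 50.0      # hypothetical clock period and setup time
DELAY_AB, DELAY_BC = 1050.0, 650.0  # hypothetical datapath delays

# Ideal clock: every flop has zero latency.
slack_ab = setup_slack(PERIOD, 0.0, 0.0, DELAY_AB, SETUP)
slack_bc = setup_slack(PERIOD, 0.0, 0.0, DELAY_BC, SETUP)

# Budget a delay on flop B's clock and re-time both paths.
b_latency = skew_budget(slack_ab, slack_bc)
new_ab = setup_slack(PERIOD, 0.0, b_latency, DELAY_AB, SETUP)
new_bc = setup_slack(PERIOD, b_latency, 0.0, DELAY_BC, SETUP)

print(f"ideal clock: A-to-B {slack_ab:+.0f}ps, B-to-C {slack_bc:+.0f}ps")
print(f"B delayed by {b_latency:.0f}ps: A-to-B {new_ab:+.0f}ps, B-to-C {new_bc:+.0f}ps")
```
</pre>
<p class="MsoNormal">With these invented numbers, the A-to-B path fails by 100ps under an ideal clock no matter how hard the optimizer works on it, while a 100ps budget at flop B closes it and still leaves 200ps on the B-to-C path.</p>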
<p class="MsoNormal">Now, when doing CTS, the tool should be able to build the clock tree while considering the budgets set by the placement-based optimization engine (place_opt). This will help build a correct tree by construction, which is useful skew-aware. Post-CTS, there has to be one more optimization step that is aware of this tree and makes necessary datapath optimization as well as doing local useful skew to achieve timing closure. Claiming to meet datapath timing while building the clock tree is marketing stuff, and in reality you need a clean-up step to get an accurate post-CTS timing closed design.</p>
<p class="MsoNormal">But it doesn’t end here, as most other companies would like it to. Router and post-route optimization have to be an integral part of this as well. The router needs to understand the exact topology for routing the clock tree to really achieve the skew/latency claims made while building the clock tree, and the post-route optimization steps also need to do useful skew-based optimization to achieve complete timing closure. Without this crucial step, you can achieve very good results during and after CTS, but once you route the design, due to routing topology changes and SI, your datapath timing can degrade even further than place_opt can predict. The only way to achieve true timing closure is to allow post-route optimization to make more changes to the clock tree. This again requires accurate and tightly coupled engines that can make the right changes, or else one small change to the clock tree can result in tens or possibly hundreds of timing violations.</p>
<p class="MsoNormal">Now add to this MCMM (multi-corner and multi-mode)-based useful skew. This will add to the complexity of how much useful skew can be used, and has possible implications on other operating corners. The only way to address this is an engine that can understand the effect of applying useful skew in one corner on other corners and make necessary adjustments.</p>
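<p class="MsoNormal">One way to picture the multi-corner problem is as an intersection of per-corner limits on how much clock delay a flop can take. The Python sketch below is a simplified illustration under stated assumptions: the inserted delay is assumed to scale by a fixed factor per corner, only one flop is considered, and the <code>CornerView</code> type, its field names and every number are hypothetical.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class CornerView:
    name: str
    delay_scale: float      # how the inserted clock delay scales in this corner
    setup_slack_in: float   # ps, path ending at the flop (helped by the delay)
    setup_slack_out: float  # ps, path starting at the flop (hurt by the delay)
    hold_slack_in: float    # ps, hold on the incoming path (hurt by the delay)


def feasible_skew_range(corners):
    """Return (lo, hi) nominal clock delay in ps valid in all corners, or None."""
    lo, hi = 0.0, float("inf")
    for c in corners:
        # Delay needed to fix the incoming setup path in this corner.
        need = max(0.0, -c.setup_slack_in) / c.delay_scale
        # Delay this corner can tolerate before setup-out or hold-in breaks.
        room = max(0.0, min(c.setup_slack_out, c.hold_slack_in)) / c.delay_scale
        lo = max(lo, need)
        hi = min(hi, room)
    if lo > hi:
        return None
    return lo, hi


corners = [
    CornerView("slow", 1.3, -130.0, 390.0, 500.0),
    CornerView("fast", 0.7, 50.0, 400.0, 140.0),
]
print("feasible nominal delay range (ps):", feasible_skew_range(corners))

# Same design, but with less hold slack in the fast corner.
tight = [corners[0], CornerView("fast", 0.7, 50.0, 400.0, 60.0)]
print("with tighter fast-corner hold:", feasible_skew_range(tight))
```
</pre>
<p class="MsoNormal">In the first made-up case, the slow corner needs roughly 100ps of nominal delay and the fast corner tolerates roughly 200ps, so a common solution exists. In the second, the delay that fixes setup in the slow corner would break hold in the fast corner, so no single value works and the function returns None.</p>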
<p>To summarize, unless a tool understands and applies useful skew across the full place-and-route flow, you may not be taking full advantage of useful skew. I will discuss other CTS-related topics in later posts. Please leave your comments and let me know how you liked this article, as well as topics you would like me to discuss in future posts. Also, if you have real-world experience with some tool, let me know; I can post it as “anonymous” if you want it that way.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.techguri.com/2009/05/28/useful-skew-based-optimization/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Join the discussion on Place and Route</title>
		<link>http://www.techguri.com/2009/05/07/join-the-discussion-on-place-and-route/</link>
		<comments>http://www.techguri.com/2009/05/07/join-the-discussion-on-place-and-route/#comments</comments>
		<pubDate>Thu, 07 May 2009 17:23:49 +0000</pubDate>
		<dc:creator>TechGuri Administration</dc:creator>
		
		<category><![CDATA[Place and Route]]></category>

		<guid isPermaLink="false">http://www.techguri.com/?p=222</guid>
		<description><![CDATA[Find the industry’s latest and most relevant discussion about Place and Route.]]></description>
			<content:encoded><![CDATA[<p>Find the industry’s latest and most relevant discussion about Place and Route.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.techguri.com/2009/05/07/join-the-discussion-on-place-and-route/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
