Sunday, November 7, 2010

Chasing the wave in EDA research


Many voices have been raised recently saying that EDA (or VLSI CAD) is becoming a fading research area: all the easy tasks have been solved, and the hard bones are left untouched (EDA researchers are smart, they don't touch things like P=?NP). Things get worse when the future of the entire semiconductor industry is clouded by the uncertainties of technology scaling. To keep a steady stream of publications, however hopeless things may look for EDA researchers, we've got to come up with something new (solving old problems in new ways, or "creating" new problems.... I know it sounds pathetic...).

Exotic topics like bio and mobile have started to show up in conventional EDA conferences like DAC and ICCAD. Yep, people like to chase the wave of new and hot stuff, partly because that's where the big money comes from. Although my Ph.D. advisor told me "don't chase the wave, stick with what you're good at," I'm a big fan of wave chasing, actually more like surfing on the front of the wave of new concepts. It keeps me awake and less bored.

This time we brought one of the hottest technologies -- cloud computing -- into EDA. We solved an old problem in logic synthesis -- Boolean matching -- using SaaS 2.0. We did propose a couple of interesting engineering ideas in this work, such as a Bloom filter and cache optimization in a key-value-based database.
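The paper has the full details; as a rough illustration of how a Bloom filter can sit in front of a key-value store (the class names, sizes, and hashing scheme below are my own placeholders, not the paper's implementation), here is a minimal sketch: a client-side Bloom filter answers "definitely not in the library" cheaply, so only likely matches pay for a lookup in the (remote) store of canonical signatures.

```python
# Minimal sketch (not the paper's code): a Bloom filter as a client-side
# pre-filter in front of a key-value store of canonical Boolean signatures.
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-1 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lookup(signature, bloom, kv_store):
    """Return the matched cell or None; skip the slow store on a Bloom miss."""
    if not bloom.may_contain(signature):
        return None                  # definitely not in the library
    return kv_store.get(signature)   # possible match (or rare false positive)


# Usage sketch: signatures stand in for canonical forms of library cells.
library = {"sig(AND2)": "AND2", "sig(XOR2)": "XOR2"}
bloom = BloomFilter()
for sig in library:
    bloom.add(sig)
print(lookup("sig(XOR2)", bloom, library))   # hits the store
print(lookup("sig(NAND3)", bloom, library))  # filtered out locally
```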



For further reading, please refer to the following paper:

Chun Zhang, Yu Hu, Lei He, Lingli Wang and Jiarong Tong, Engineering a Scalable Boolean Matching Based on EDA SaaS 2.0, ICCAD 2010.

Power and fault-tolerance: strangers or fraternal twins?


I had been working with the UCLA folks on FPGA fault tolerance for a while. Before I moved to UofA, my work was primarily focused on logic-level optimization for fault tolerance. On a cold February night in 2010, while walking from my office to my apartment in -20°C Edmonton, I was thinking about sending something on fault tolerance to ICCAD 2010. That afternoon I had been going through old proceedings of the FPGA symposium and recent papers from some big names in FPGA. While I was walking, Steve Wilton's ICCAD 2003 paper (an empirical study of power-aware physical synthesis for FPGAs, from mapping to routing) came to mind. Why not create the same thing for fault tolerance? Since we already had all the infrastructure for fault-tolerant synthesis, it would be easy for us to build a fault-tolerant physical synthesis flow simply by following Steve's methodology!

Steve's method was simple but effective. They took VPR, which was originally optimized solely for timing, and changed its cost function to account for power. The simulated-annealing-based placement and negotiation-based routing are both very flexible, letting people plug in different cost functions for various optimization objectives. Steve's work targeted power optimization, and they basically appended power factors to the cost function (which originally included only timing and wirelength factors). I thought we could do exactly the same thing: append fault-tolerance factors to the cost function! Manu Jose, a quick study at UCLA, worked out the initial implementation, and it turned out to work amazingly well. The toy sketch below gives the flavor of the idea.
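To make "appending a term" concrete (this is a toy, self-contained sketch, not VPR's actual code; the net attributes, weights, and the fault-exposure proxy are my own placeholders): the annealer scores every candidate placement with a single scalar cost, so a new objective is added simply by summing in one more weighted term.

```python
# Toy sketch of a VPR-style placement cost with an appended fault-tolerance
# term (not the real VPR cost function; all names here are placeholders).

def hpwl(net, placement):
    """Half-perimeter wirelength of one net, the usual placement wire estimate."""
    xs = [placement[b][0] for b in net["blocks"]]
    ys = [placement[b][1] for b in net["blocks"]]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placement_cost(placement, nets, w_wire=1.0, w_timing=1.0, w_fault=1.0):
    # Baseline terms: total wirelength plus a crude timing proxy.
    wire_cost = sum(hpwl(n, placement) for n in nets)
    timing_cost = max(n["timing_crit"] * hpwl(n, placement) for n in nets)

    # The appended term: a proxy for fault exposure -- routing a fault-critical
    # net over a longer distance uses more configuration bits whose upsets matter.
    fault_cost = sum(n["fault_crit"] * hpwl(n, placement) for n in nets)

    return w_wire * wire_cost + w_timing * timing_cost + w_fault * fault_cost


# Tiny usage example: three blocks on a grid, two nets.
placement = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [
    {"blocks": ["a", "b"], "timing_crit": 0.9, "fault_crit": 0.2},
    {"blocks": ["b", "c"], "timing_crit": 0.4, "fault_crit": 0.8},
]
print(placement_cost(placement, nets))
```

The simulated-annealing engine itself is untouched; only the scalar it minimizes changes, which is exactly why the same flow can chase power, fault tolerance, or both.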

While enjoying this preliminary success, we started to think a bit more deeply along this line: why did it work? In our initial implementation we designed and tested several cost functions for fault tolerance, and it turned out that the cost-function template Steve designed in their power-optimization work worked best in terms of the fault-tolerance and delay tradeoff. This phenomenon led to a natural hypothesis: there must be some sort of connection between power and fault tolerance in physical synthesis!

Not long after forming this hypothesis, we realized that the power and fault-tolerance optimization problems essentially share the same structure. Think about it: dynamic power ∝ capacitance × switching_activity (scaled by the supply voltage squared and the clock frequency), while fault_rate ∝ number_config_bits × Σ_i criticality_i, where number_config_bits is the number of configuration bits in an FPGA and criticality_i of bit i is the percentage of input vectors that cause wrong outputs when bit i is flipped (e.g., by a soft error). Capacitance is in fact roughly proportional to number_config_bits, and switching activity and criticality are closely related quantities. That's why power-aware and fault-tolerant synthesis can likely be used to optimize for fault tolerance and power, respectively. An important implication is that existing power-aware physical synthesis CAD tools can be used to optimize fault tolerance with little or no change! A small sketch of the analogy follows.
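Here is the back-of-the-envelope model from the paragraph above written out as two tiny functions (the variable names and constants are mine, not the paper's notation); the point is that the two objectives are products of a "size" factor and an "activity/criticality" factor, which is what makes one cost function a good stand-in for the other.

```python
# Simplified restatement of the two formulas above (my own notation, not the
# paper's full model).

def dynamic_power(capacitance, switch_activity, vdd=1.0, freq=1.0):
    # Dynamic power ~ V^2 * f * C * alpha.
    return vdd ** 2 * freq * capacitance * switch_activity


def fault_rate(num_config_bits, criticalities):
    # Fault rate ~ number of configuration bits times the summed per-bit
    # criticality (fraction of input vectors giving a wrong output when that
    # bit is flipped).
    return num_config_bits * sum(criticalities)


# The structural analogy: capacitance grows roughly with the number of
# configuration bits a resource uses, and switching activity correlates with
# bit criticality, so a cost tuned for one quantity also tracks the other.
print(dynamic_power(capacitance=2.0, switch_activity=0.3))
print(fault_rate(num_config_bits=1024, criticalities=[0.2, 0.05, 0.4]))
```

Below is one of many experimental results we obtained in this project: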



For further reading, please refer to the following paper:

Manu Jose, Yu Hu and Rupak Majumdar, On Power and Fault-Tolerant Optimization in FPGA Physical Synthesis, ICCAD 2010.

The source code of this work is here:

http://robust-fpga.mpi-sws.org