I have been working with folks at UCLA on FPGA fault-tolerance for a while. Before I moved to UofA, my work focused primarily on logic-level optimization for fault-tolerance. On a cold night in February 2010, walking from my office to my apartment in -20 degree Edmonton, I was thinking about sending something on fault-tolerance to ICCAD 2010. That afternoon, I had been going through old proceedings of the FPGA symposium and recent papers by some big names in FPGA research. While I was walking, Steve Wilton's ICCAD 2003 paper (an empirical study of power-aware physical synthesis for FPGAs, from mapping to routing) came to mind. Why not do the same thing for fault-tolerance? Since we already had all the infrastructure for fault-tolerant synthesis, it would be easy for us to build a fault-tolerant physical synthesis flow just by following Steve's methodology!
Steve's method was simple but effective. They took VPR, which was optimized solely for timing, and changed its cost function to account for power. The simulated-annealing-based placement and negotiation-based routing in VPR are both flexible enough that people can plug in different cost functions for various optimization objectives. Steve's work targeted power optimization, and they basically appended power factors to the cost function (which originally included only timing and wire length factors). I thought we could do exactly the same thing: append fault-tolerance factors to the cost function! Manu Jose, a quick study at UCLA, worked out the initial implementation. It turned out to work amazingly well!
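To give a feel for what "appending a factor to the cost function" means, here is a minimal sketch of an annealing-style placement cost with one extra weighted term. It is not VPR's code and not the exact formula from Steve's paper or ours: the names `CostTerms`, `delta_cost`, `accept`, the weights `lam` and `gamma`, and the normalization by the previous cost are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CostTerms:
    """Cost components of one placement (all names here are illustrative)."""
    timing: float   # timing cost, e.g. criticality-weighted delay
    wiring: float   # wire length cost, e.g. a bounding-box estimate
    extra: float    # appended objective: power or fault-tolerance cost


def delta_cost(prev: CostTerms, new: CostTerms,
               lam: float = 0.5, gamma: float = 0.3) -> float:
    """Normalized cost change of a proposed move.

    lam trades timing against wire length; gamma is the (assumed) weight
    of the appended objective. gamma = 0 ignores the appended term.
    Assumes all previous cost components are nonzero.
    """
    d_timing = (new.timing - prev.timing) / prev.timing
    d_wiring = (new.wiring - prev.wiring) / prev.wiring
    d_extra = (new.extra - prev.extra) / prev.extra
    base = lam * d_timing + (1.0 - lam) * d_wiring
    return (1.0 - gamma) * base + gamma * d_extra


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Standard annealing rule: always take improvements, sometimes take worse moves."""
    return delta <= 0 or rng.random() < math.exp(-delta / temperature)


# A move that leaves timing and wiring unchanged but lowers the appended cost.
before = CostTerms(timing=10.0, wiring=100.0, extra=50.0)
after = CostTerms(timing=10.0, wiring=100.0, extra=40.0)
print(delta_cost(before, after))             # negative: the move looks better
print(delta_cost(before, after, gamma=0.0))  # zero: the appended term is ignored
```

The point of the sketch is that the annealer itself does not care what `extra` measures. Swapping a power estimate for a fault-tolerance estimate only changes how that one number is computed.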
While enjoying this preliminary success, we started to think a bit deeper: why did it work? In our initial implementation, we designed and tested several cost functions for fault-tolerance, and it turned out that the cost function template Steve designed for their power-optimization work gave the best tradeoff between fault-tolerance and delay. This led to a natural hypothesis: there must be some sort of connection between power and fault-tolerance in physical synthesis!
Not long after forming this hypothesis, we realized that the power and fault-tolerance optimization problems essentially share the same structure. Think about it: power (dynamic) is proportional to V^2 * capacitance * switch_activity, and fault_rate = number_config_bits * \sum criticality_i, where "number_config_bits" is the number of configuration bits in an FPGA and "criticality_i" of a bit i is the percentage of input vectors that cause wrong outputs when bit i is flipped (e.g., by a soft error). "capacitance" is in fact roughly proportional to "number_config_bits", and "switch_activity" and "criticality_i" are two closely related values. That is why it is likely that power-aware synthesis can be used to optimize for fault-tolerance, and fault-tolerant synthesis to optimize for power. An important implication is that an existing power-aware physical synthesis CAD tool can be used to optimize fault-tolerance with little or no change! Below is one of the many experimental results we obtained in this project:
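As an aside (this is an illustration of the structural argument, not one of the experimental results), the two expressions can be written side by side. The symbols P_dyn, C, alpha and N_bits are my shorthand here for dynamic power, "capacitance", "switch_activity" and "number_config_bits"; the correspondences in the last line are the rough, qualitative ones described above, not exact identities.

```latex
\begin{align*}
P_{\text{dyn}} &\propto V^{2} \cdot C \cdot \alpha \\
\text{fault\_rate} &= N_{\text{bits}} \cdot \sum_{i} \text{criticality}_i \\
C &\;\text{roughly}\; \propto N_{\text{bits}}, \qquad
\alpha \;\longleftrightarrow\; \text{criticality}_i
\end{align*}
```

In both cases the objective is a size-like quantity (capacitance or number of configuration bits) multiplied by an activity-like quantity (switching activity or criticality), which is why a cost term tuned for one tends to help the other.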
For further reading, please refer to the following paper:
Manu Jose, Yu Hu and Rupak Majumdar, On Power And Fault-Tolerant Optimization In FPGA Physical Synthesis, ICCAD 2010.
The source code of this work is here:
http://robust-fpga.mpi-sws.org