Automatic Mapping of Khoros-based applications to Adaptive Computing Systems

S. Natarajan, B. Levine, C. Tan, D. Newport and D. Bouldin
Electrical and Computer Engineering
University of Tennessee
Knoxville, TN 37996-2100
senthil@microsys7.engr.utk.edu, bouldin@microsys7.engr.utk.edu
(423) 974-5414

ABSTRACT

Adaptive computing systems (ACS) exploit recent advances in field-programmable gate array technology to provide flexible hardware accelerators. However, the mapping of an application onto an ACS has traditionally taken months to develop and debug. The goal of the project described here is to develop and demonstrate a software design environment which will perform this mapping in just a few minutes. The environment permits high-level design entry using the popular Khoros Cantata visual programming software [5][6]. Thus, application developers need not be familiar with the details of the hardware.

Using Cantata, an image-processing designer can develop a Khoros workspace to implement an algorithm by connecting Khoros glyphs together. These glyphs can be thought of as C subroutines, while the Cantata workspace can be viewed as a main C program that invokes these subroutines and controls the data flow from one subroutine to another. Therefore, the objectives of our translation software (which we call CHAMPION) are to parse a Khoros workspace and automatically translate the design captured by the workspace into a form that can be executed on a multi-FPGA platform. The first multi-FPGA platform chosen for CHAMPION to target is a Wildforce board that contains five Xilinx FPGAs.

A library of Khoros glyphs has been developed for CHAMPION to use for its translations. This library contains glyphs that are referred to as hardware-equivalent glyphs. A Khoros user who wishes to use CHAMPION must develop his workspace using these hardware-equivalent glyphs.
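As an illustration of the view taken above, a workspace can be modeled as a directed data-flow graph whose nodes are glyph instances. The sketch below is only a hypothetical model, not CHAMPION's parser: the glyph type names (`hw_add`, `hw_threshold`, `hw_convolve`, `kfft`) and the dictionary layout are assumptions made for the example. It orders the glyphs so that every producer precedes its consumers and reports any glyph that is not in the hardware-equivalent library.

```python
from collections import deque

# Hypothetical glyph types assumed to have a hardware counterpart.
# These names are illustrative and are not taken from the Khoros library.
HARDWARE_EQUIVALENT = {"hw_add", "hw_threshold", "hw_convolve"}


def topological_order(glyphs, edges):
    """Order glyph instances so producers precede consumers (Kahn's algorithm).

    glyphs: dict mapping instance name -> glyph type
    edges:  list of (source_instance, sink_instance) data connections
    """
    indegree = {name: 0 for name in glyphs}
    successors = {name: [] for name in glyphs}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(name for name in glyphs if indegree[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in successors[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(glyphs):
        raise ValueError("workspace contains a cycle")
    return order


def unsupported_glyphs(glyphs):
    """Return the instances whose type is not in the hardware-equivalent set."""
    return sorted(
        name for name, kind in glyphs.items() if kind not in HARDWARE_EQUIVALENT
    )
```

A workspace for which `unsupported_glyphs` returns a non-empty list would, under these assumptions, have to be reworked with hardware-equivalent glyphs before it could be translated.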
A corresponding hardware glyph library of pre-compiled XNF files has also been developed for the targeted Xilinx FPGA architecture. CHAMPION will partition, place, and route these pre-compiled hardware glyphs as described in the Khoros workspace. It is therefore necessary to ensure that the hardware-equivalent glyphs and the hardware glyphs (XNF files) have identical functionality. The first step towards achieving this goal is precisely defining a glyph's parameters and its explicit functionality.

A hardware-equivalent glyph is developed using a high-level programming language such as C++. This glyph is then used to generate output vectors for a set of applied input test vectors. These input and output vectors together constitute a test bench that can be used for computer simulations of the hardware glyph. Once a hardware-equivalent glyph is completed, the hardware description language VHDL is used to model a hardware version of the glyph. By developing the hardware using VHDL, the same source code can be used to develop hardware glyphs for other target architectures such as the new Xilinx Virtex and Altera CPLDs. Using CAD tools, this code is synthesized to generate a digital circuit, which is then targeted to the programmable gate arrays using commercial place-and-route CAD software. The synthesized circuit is simulated with the previously generated test bench to ensure the validity of the hardware glyph. At this stage, the hardware glyph is ready for execution on the FPGA platform. Verification of the hardware execution entails applying the same test bench to the hardware and observing its outputs. The hardware glyph passes this final verification stage when its results agree with those of the computer-aided simulations. Once verified, the hardware glyph is characterized in terms of size, delay, power consumption, and functionality.

A target recognition algorithm has been implemented as a Khoros workspace using the hardware-equivalent glyph library [4].
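The glyph verification flow described above, in which the software glyph produces the expected outputs and the hardware glyph is checked against them, can be sketched roughly as follows. The saturating-add glyph, its 8-bit width, and the function names are all assumptions chosen for illustration; the actual glyphs are written in C++ and VHDL and are not reproduced here.

```python
def golden_saturating_add(a, b, width=8):
    """Software reference for a hypothetical saturating-add glyph."""
    return min(a + b, (1 << width) - 1)


def build_test_bench(reference, input_vectors):
    """Pair every input vector with the output the reference model produces."""
    return [(vec, reference(*vec)) for vec in input_vectors]


def mismatches(test_bench, device_under_test):
    """Return (inputs, expected, actual) for each vector where results differ.

    device_under_test stands in for either the simulated circuit or the
    hardware itself; here it is just a Python callable.
    """
    failures = []
    for vec, expected in test_bench:
        actual = device_under_test(*vec)
        if actual != expected:
            failures.append((vec, expected, actual))
    return failures


if __name__ == "__main__":
    bench = build_test_bench(golden_saturating_add, [(0, 0), (200, 100), (255, 1)])
    # A deliberately faulty stand-in that wraps around instead of saturating.
    wrapping_adder = lambda a, b: (a + b) & 0xFF
    for vec, expected, actual in mismatches(bench, wrapping_adder):
        print(f"inputs={vec} expected={expected} actual={actual}")
```

The same test bench object is reused for both the simulation and the hardware run, which mirrors the reuse of one set of vectors at both verification stages.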
The target recognition workspace has been mapped manually to execute on the Wildforce board using the pre-compiled hardware glyphs. The manual mapping of the algorithm will be used as a comparison against the automatic mapping generated by CHAMPION.

CHAMPION's processing of the workspace is two-fold. First, it partitions the design according to the specifications of the Wildforce board. The partitioning of the workspace is primarily based on the size of each hardware glyph and the net counts between adjacent FPGAs. Next, once CHAMPION achieves a partition that is both acceptable and feasible, it automatically interconnects the pre-compiled hardware glyphs to generate the bit files needed to execute the design on the Wildforce board.

Our approach to the partitioning problem is based on a variant of the standard move-based bipartition heuristics [1][3]. Multi-way partitioning is achieved by recursively applying the bipartitioning algorithm to the netlist of the design until it is split into the required number of sub-netlists. This approach is similar to the method described in [2].

For our first implementation, the FPGAs on the Wildforce board are arranged as a linear array. With this board topology, the multi-way partitioning proceeds from left to right, starting at FPGA0 and ending at FPGA4. The first bipartition splits the netlist of the Khoros workspace into two unbalanced sub-netlists such that one of them satisfies the size and pin-count constraints of the first FPGA. The same bipartition technique is then applied to the remaining sub-netlist to obtain the second partition, which is mapped to the second FPGA. The technique is applied repeatedly to obtain sub-netlists for the remaining FPGAs. After partitioning, we add the required I/O ports to the structural description of each sub-netlist.
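The left-to-right peeling order on a linear array can be illustrated with the simplified sketch below. It is not CHAMPION's algorithm: the move-based heuristics of [1][3] are replaced here by a plain greedy choice of the longest prefix of glyphs (in data-flow order) that fits, and the capacity and pin limits, the two-pin net model, and all names are assumptions. On a linear array every net that joins an already placed glyph to an unplaced one must cross the boundary to the next FPGA, so the cut is counted against everything placed so far.

```python
def cut_size(left, nets):
    """Count nets with exactly one endpoint inside the set `left`."""
    return sum((a in left) != (b in left) for a, b in nets)


def partition_linear(glyphs, nets, capacity, max_pins, num_fpgas):
    """Peel one sub-netlist per FPGA, left to right (greedy sketch).

    glyphs:   list of (name, size) in data-flow order
    nets:     list of (source, sink) two-pin connections
    capacity: size budget of each FPGA (same for all, by assumption)
    max_pins: maximum nets allowed across the boundary to the next FPGA
    """
    partitions = []
    placed = set()
    remaining = list(glyphs)

    for fpga in range(num_fpgas):
        if not remaining:
            break
        best = 0   # length of the longest feasible prefix found so far
        used = 0   # accumulated size of the current prefix
        for count, (name, size) in enumerate(remaining, start=1):
            used += size
            if used > capacity:
                break
            candidate = placed | {n for n, _ in remaining[:count]}
            if cut_size(candidate, nets) <= max_pins:
                best = count
        if best == 0:
            raise ValueError(f"no feasible partition for FPGA{fpga}")
        chunk = [n for n, _ in remaining[:best]]
        partitions.append(chunk)
        placed.update(chunk)
        remaining = remaining[best:]

    if remaining:
        raise ValueError("design does not fit on the board")
    return partitions
```

A real move-based bipartitioner would also consider moving individual glyphs across the boundary to reduce the cut, which this prefix-only sketch does not attempt.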
Each sub-netlist is then placed and routed to obtain the configuration file that is downloaded to the corresponding FPGA on the Wildforce board. A detailed comparison between the automatic and manual mapping results will be presented at the conference.

Acknowledgement: The authors gratefully acknowledge the support of DARPA grant F33615-97-C-1124.

References:
[1] C. M. Fiduccia and R. M. Mattheyses, "A Linear-Time Heuristic for Improving Network Partitions", Proc. of the 19th ACM/IEEE Design Automation Conference, pp. 175-181, 1982.
[2] S. Hauck and G. Borriello, "Logic Partition Orderings for Multi-FPGA Systems", Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, pp. 32-38, February 1995.
[3] B. W. Kernighan and S. Lin, "An Efficient Heuristic Procedure for Partitioning of Electrical Circuits", Bell System Technical Journal, Vol. 49, No. 2, pp. 291-307, February 1970.
[4] B. Levine, S. Natarajan, C. Tan, D. Newport and D. Bouldin, "Mapping of an Automated Target Recognition Application from a Graphical Software Environment to FPGA-based Reconfigurable Hardware", Proceedings of the 1999 IEEE Symposium on Field Programmable Custom Computing Machines, April 1999.
[5] J. R. Rasure and C. S. Williams, "An Integrated Data Flow Visual Language and Software Development Environment", Visual Languages and Computing, Vol. 2, pp. 217-246, 1991.
[6] J. R. Rasure and S. Kubica, "The Khoros Application Development Environment", Khoros Research Inc., Albuquerque, New Mexico, http://www.khoral.com/.