% Index Number: 1 % Abstract % The processor reconfiguration through instruction-set %metamorphosis (PRISM) general-purpose architecture, which speeds up %computationally intensive tasks by augmenting the core processor's %functionality with new operations, is described. The PRISM approach %adapts the configuration and fundamental operations of a core %processing system to the computationally intensive portions of a %targeted application. PRISM-1, an initial prototype system, is %described, and experimental results that demonstrate the benefits of %the PRISM concept are presented. @article {anthanas-silverman:93, author = {P. M. Athanas and H. F. Silverman}, title = {Processor Reconfiguration through Instruction-set Metamorphosis}, journal = {Computer}, volume = {26}, number = {3}, pages = {11-18}, year = {1993}, month = {March} } %------------------------------------------------------------------ % Index Number: 2 % Abstract % Existing FPGA architectures can be classified along two %dimensions: reprogrammable vs. one-time programmable and general- %purpose vs. domain specific. The most challenging class of FPGA %architectures to design is the reprogrammable, general-purpose %FPGA, of which Xilinx is the most well-known example. In this %paper we describe Triptych, a new FPGA architecture that addresses %two problems of current reprogrammable FPGAs: the large delays %incurred in composing large functions and the strict division %between routing and logic resources. Our studies indicate that %Triptych is more area-efficient than current architectures and has %comparable delay characteristics for a large range of circuits %that include both data-path elements and control logic. %----------------------------------------------------------------- @inproceedings{ebeling-boriello:91, author = {C. Ebeling and G. Borriello and S. A. Hauck and D. Song and E. A. Walkup}, title = {TRIPTYCH: a New {FPGA} Architecture}, booktitle = {{FPGAs}.
International Workshop on Field Programmable Logic and Applications}, year = {1991}, month = {September}, pages = {75-90} } % Index Number: 3 % Abstract % Xilinx Field Programmable Gate Arrays (FPGAs) are used to %implement reconfigurable special purpose computing hardware for %computationally intensive many-body problems in physics and %mathematics. The inexpensive PC-based design environment used for %this work is described, and the performance for several different %problems of the resulting reconfigurable hardware is compared with %that of some general purpose computers. The merits of using FPGAs %in special purpose computational hardware are outlined. %----------------------------------------------------------------- @article {monaghan-noakes:92, author = {S. Monaghan and P. D. Noakes}, title = {Reconfigurable Special Purpose Hardware for Scientific Computation and Simulation}, journal = {Computing \& Control Engineering Journal}, year = {1992}, month = {September}, pages = {225-234} } % Index Number: 4 % Abstract % This paper describes a new VLSI architecture, Configurable %Array Logic (CAL), which, at its lowest level, can be programmed %electrically to implement any circuit composed of logic gates. At %higher levels the technology provides a medium for the direct %implementation of algorithms. It particularly addresses systolic %and cellular automaton algorithms where the basic computational %elements perform computations unsuited to conventional processors. %----------------------------------------------------------------- @article {kean-gray:90, author = {T. Kean and J.
Gray}, title = {Configurable Hardware: Two Case Studies of Micro-Grain Computation}, journal = {Journal of VLSI Signal Processing}, volume = {2}, number = {1}, year = {1990}, month = {September}, pages = {9-16} } % Index Number: 5 % Abstract % Field-Programmable Gate Arrays (FPGAs) are now a recognized %technology for the implementation of digital systems, but they %suffer from reduced speed and logic density compared to Mask- %Programmed Gate Arrays. Many studies have been performed %concerning the effect of an FPGA's architecture on its speed and %density. In this paper we describe how these studies are used in %an actual implementation of a high-performance FPGA. % The architecture of the FPGA logic block was determined %through an experimental process using custom-built CAD tools. The %result is a logic block that is an asymmetric tree of four-input %lookup tables that are hardwired together. A segmented routing %architecture, also tuned using experiments, is employed to improve %the speed of the interconnect. % To address the problems of full-custom design, a novel layout %style for FPGAs is proposed. It can be likened to the technique %used in PLAs, in which a 'mini-tile' contains a portion of most %components in the logic tile. The mini-tile is optimized for %layout density and speed, and placed into a 4x4 array, where it is %then customized by adding vias to obtain the desired hardwired %connections. As well as providing ease of layout, this technique %gives the capability to easily change the hardwired connections in %the logic block architecture, and the segmentation length %distribution in the routing architecture. %------------------------------------------------------------------ @inproceedings{chow-rose:93, author = {C. Chow and S. E. Seo and K. Chung and G. Paez and J. Rose}, title = {A High-Speed {FPGA} Using Programmable Mini-tiles}, booktitle = {Research on Integrated Systems: Proceedings of the 1993 Symposium}, editor = {G. Borriello and C.
Ebeling}, year = {1993}, pages = {103-122} } % Index Number: 6 % Abstract % We present some quantitative performance measurements for the %computing power of Programmable Active Memories (PAM), as %introduced by [2]. Based on Field Programmable Gate Array (FPGA) %technology, the PAM is a universal hardware co-processor closely %coupled to a standard host computer. The PAM can speed up many %critical software applications running on the host, by executing %part of the computations through a specific hardware design. The %performance measurements presented are based on two PAM %architectures and ten specific applications, drawn from %arithmetics, algebra, geometry, physics, biology, audio and video. %Each of these PAM designs proves as fast as any reported hardware %or super-computer for the corresponding application. In cases %where we could bring some genuine algorithmic innovation into the %design process, the PAM has proved an order of magnitude faster %than any previously existing system (see [19] and [18]). %----------------------------------------------------------------- @inproceedings{bertin-roncin:93, author = {P. Bertin and D. Roncin and J. Vuillemin}, title = {Programmable Active Memories: a Performance Assessment}, booktitle = {Research on Integrated Systems: Proceedings of the 1993 Symposium}, editor = {G. Borriello and C. Ebeling}, year = {1993}, pages = {88-102} } % Index Number: 7 % Abstract % With the arrival of large Field Programmable Gate Arrays %(FPGAs) it is possible to build an entire computer using only FPGA %and memory. In this paper we share some experience from building %a highly parallel computer using this concept. Even if today's %FPGAs are of considerable size, each processor must be relatively %simple if a highly parallel computer is to be constructed from %them. 
Based on our experience of other parallel computers and %thorough studies of the intended applications, we think it is %possible to build very powerful and efficient computers using bit- %serial processing elements with SIMD (Single Instruction stream, %Multiple Data streams) control. % A major benefit of using FPGAs is the fact that different %architectural variations can easily be tested and evaluated on %real applications. In the primary application area, which is %artificial neural networks, the gains of extensions like bit- %serial multipliers or counters can quickly be found. A concrete %implementation of a processor array, using Xilinx FPGAs is %described in this paper. % To get efficient usage and high performance with the FPGA %circuits, signal flow plays an important role. As the current %implementation of the Xilinx EDA software does not support that %design issue, the signal flow design has to be made by hand. The %processing elements are simple and regular which makes it easy to %implement them with the XACT Editor. This gives high performance, %up to 40-50 MHz. %----------------------------------------------------------------- @inproceedings{linde-nordstrom:92, author = {A. Linde and T. Nordstrom and M. Taveniku}, title = {Using FPGAs to Implement a Reconfigurable Highly Parallel Computer}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, address = {Vienna, Austria}, month = {August}, pages = {199-210}, note = {Year and address are those of the conference, held 31 Aug.-2 Sept. 1992; the proceedings were published in 1993 in Berlin, Germany} } % Index Number: 8 % Abstract % There is a growing awareness that reconfigurable logic, in the %form of SRAM-based field-programmable gate arrays (FPGA's), is an %ideal vehicle for implementing a wide range of compute-intensive %algorithms.
The CLi6000 series of SRAM FPGA's from Concurrent %Logic is especially well suited to the special needs of this area. %We describe the architecture of the CLi6000 series of SRAM-based %FPGA's, emphasizing those features that support efficient %implementation of pipelined arithmetic circuits. These features %are illustrated through a massively parallel, highly pipelined %algorithm for motion estimation, an especially compute-intensive %algorithm used in digital video compression. %----------------------------------------------------------------- @inproceedings{furtek:93, author = {F. Furtek}, title = {A Field-Programmable Gate Array for Systolic Computing}, booktitle = {Research on Integrated Systems: Proceedings of the 1993 Symposium}, editor = {G. Borriello and C. Ebeling}, year = {1993}, pages = {183-199} } % Index Number: 9 % Abstract % At present there are two main methods of implementing %algorithms: interpretation of a data stream representing a program %by an active processing unit (software) and interconnection of %active logic elements (hardware). In one case the computation %performed is dependent on data stored in memory and in the other %on the interconnection between a set of physical devices %(transistors). Both paradigms can be shown, given reasonable %definitions, to be essentially equivalent in terms of the %functions they can compute (see, for example, [Savage76]). In %this paper we will make the case for a third paradigm: %Configurable Hardware in which the interconnection between active %logic elements, and hence the function computed, is dependent on %a control store. %----------------------------------------------------------------- @inproceedings{gray-kean:89, author = {J. P. Gray and T. A. 
Kean}, title = {Configurable Hardware: A New Paradigm for Computation}, booktitle = {Decennial CalTech Conference on VLSI}, year = {1989}, month = {March}, pages ={277-293}, address = {Pasadena, CA} } % Index Number: 10 % Abstract % Splash 2 is an attached special purpose parallel processor in %which the computing elements are user programmable FPGA devices. %The architecture of Splash 2 is designed to accelerate the %solution of problems which exhibit at least modest amounts of %temporal or data parallelism. Applications are developed by %writing behavioral descriptions of algorithms in VHDL, which are %then iteratively refined and debugged within the Splash 2 %simulator. Once an application is determined to be functionally %correct in simulation, it is compiled to a gate list and optimized %by logic synthesis. The gate list is then mapped onto the FPGA %architecture by automatic placement and routing tools to form a %loadable FPGA object module. A C language library and a symbolic %debugger comprise the execution environment. The Splash 2 system %has been shown to be effective on a variety of applications, %including text searching, sequence analysis, and image processing. %----------------------------------------------------------------- @inproceedings{arnold:93, author = {J. M. Arnold}, title = {The Splash 2 Software Environment}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, pages ={88-93}, address = {Napa, CA} } % Index Number: 11 % Abstract % This paper proposes a flexible, reprogrammable hardware solution %to the acceleration of text-based keyword search problems. In these %problems, a stream of input text is checked against a known list of %keywords (a dictionary) for occurrences of those keywords in the %text. 
Our solution employs an attached processor called Splash 2, %which exploits the speed and reconfigurability of Field Programmable %Gate Array technology. The Splash 2 system was designed and built at %the SRC for a wide variety of applications. A Splash 2 system is %comprised of an interface board to a Sun Sparc-2 host and up to 16 %Splash boards, each of which contains 16 Xilinx 4010 FPGAs %interconnected in a linear array and also through a 16-way full %crossbar switch. Each Xilinx chip is coupled with a 4 Mbit static %RAM through a dedicated interface. The text searching program %implemented on a one-board Splash 2 system is capable of processing %text at an estimated rate of 50 million characters per second. %----------------------------------------------------------------- @inproceedings{pryor-thistle:93, author = {D. V. Pryor and M. R. Thistle and N. Shirazi}, title = {Text Searching On Splash 2}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {172-177} } % Index Number: 12 % Abstract % The Splash attached processor board (referred to as Splash %1) was designed and built at the SRC to provide very high performance %on a range of bit-processing problems. It proved to be highly %successful; notwithstanding the known dangers of Second System %Syndrome, a follow-on system, Splash 2, is being designed and %built. This paper describes Splash 2, compares it with Splash 1 and %discusses both its programming and two algorithmic applications. %----------------------------------------------------------------- @inproceedings{arnold-buell:92, author = {J. M. Arnold and D. A. Buell and E. G.
Davis}, title = {Splash 2}, booktitle = {Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures}, year = {1992}, month = {June}, pages = {316-324} } % Index Number: 13 % Abstract % Splash 2 is an attached special purpose parallel processor in %which the computing elements are user programmable FPGA devices. %The programming environment for Splash 2 is based upon the VHSIC %Hardware Description Language (VHDL), simulation and logic %synthesis. Application programs for Splash 2 are developed by %writing behavioral descriptions of algorithms in VHDL which are %then iteratively refined and debugged within the Splash 2 %simulator. Logic synthesis and automatic placement and routing %techniques are used to compile the VHDL applications into loadable %FPGA object modules. %------------------------------------------------------------------ @inproceedings{arnold-buell:93, author = {J. M. Arnold and D. A. Buell and E. G. Davis}, title = {{VHDL} Programming on Splash 2}, booktitle = {More {FPGAs}: Proceedings of the 1993 International Workshop on Field-Programmable Logic and Applications}, year = {1993}, month = {September}, pages = {182-191}, address = {Oxford, England} } % Index Number: 14 % Abstract % This paper introduces a new design methodology for rapid %implementation of cheap high-performance ASIC's. The method %described here derives from high-level algorithm specifications or %from high-level source programs not only the target hardware, but %(in contrast to silicon compilers) at the same time also the machine %code to run it. The new method is based on a novel sequential %machine paradigm where execution is used (being orders of %magnitude more efficient) instead of simulation and where %programmers may do the design job, rather than real hardware %designers. 
The paper illustrates that for a very large class of %commercially important algorithms (DSP, graphics, image processing %and many others) this paradigm is orders of magnitude more %efficient than the von Neumann paradigm. Compared to von-Neumann- %based implementations, acceleration factors of up to more than %2000 have been obtained experimentally. The performance of ASIC's %obtained by this new methodology is mostly competitive with ASIC %designs obtained in the much slower and much more expensive %"traditional" way. As a byproduct the new methodology also %supports the automatic generation of universal accelerators for %coprocessor use in workstations, etc., such as, e.g., to %accelerate EDA tools. It is the goal of this paper to explain the %highly efficient application of the xputer paradigm, rather than %to introduce its hardware implementation. It is the goal of this %paper to illustrate the innovative power of this paradigm, and its %potential as a major step in progress toward systematically %deriving ASIC designs from algorithm specifications. %----------------------------------------------------------------- @article {hartenstein-hirschbiel:91, author = {R. W. Hartenstein and A. G. Hirschbiel and M. Riedmuller and K. Schmidt and M. Weber}, title = {A Novel ASIC Design Approach Based on a New Machine Paradigm}, journal = {IEEE Journal of Solid-State Circuits}, volume = {26}, number = {7}, year = {1991}, month = {July}, pages = {975-989} } % Index Number: 15 % Abstract % Current MIMD machines are used for coarse grain-parallelism and %also offer message passing mechanisms to deal with inter-processor %communications. But these mechanisms lack efficiency in fine- %grain parallel applications such as systolic computation. This %article presents the use of an FPGA chip to set up a fast systolic %communication agent on a linear asynchronous network of TRANSPUTER %processors; the machine is called ARMEN. 
%------------------------------------------------------------------ @inproceedings{raimbault-lavenier:93, author = {F. Raimbault and D. Lavenier and S. Rubini and B. Pottier}, title = {Fine Grain Parallelism on a MIMD Machine Using {FPGA}s}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {2-8} } % Index Number: 16 % Abstract % A processor with multiple reconfigurable execution units has %been designed and implemented. The reconfigurable execution units %are implemented using reprogrammable field programmable gate array %(FPGA) chips. The architecture and implementation of this %processor are described in detail in this paper. An example shows %that this reconfigurable processor is able to compute the new %state of 100'000'000 cells of Conway's game of life per second %with a clock speed of 6.25 MHz. %----------------------------------------------------------------- @inproceedings{iseli-sanchez:93, author = {C. Iseli and E. Sanchez}, title = {Spyder: A Reconfigurable VLIW Processor using {FPGA}s}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {17-24} } % Index Number: 17 % Abstract % Highly concurrent systems occur frequently in the physical %world. They include weather systems, traffic systems, %electrocardiac systems and integrated circuits. To better %understand such systems requires that they be rigorously described %and then simulated. How do we best perform this description? %Since such systems are inherently concurrent and do not fit well %onto sequential von Neumann architectures, what type of machine %should be used to simulate them? 
% This paper focuses on a class of systems characterised as %being highly concurrent and which are composed out of many simple %parts which interact with other parts in their locality. It %discusses how to describe these systems and introduces a cellular %automata type of architecture which is used to simulate these %systems directly in hardware, with physical concurrency being %realised by true hardware concurrency. The architecture of the %SPACE machine (Scalable Parallel Architecture for Concurrency %Experiments), which is constructed from reconfigurable FPGA logic %is introduced and it is demonstrated how to simulate road traffic %systems using it. %----------------------------------------------------------------- @inproceedings{milne-cockshott:93, author = {G. Milne and P. Cockshott and G. McCaskill and P. Barrie}, title = {Realising Massively Concurrent Systems on the SPACE Machine}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {26-32} } % Index Number: 18 % Abstract % Virtual hardware is a technique to realize a large digital %circuit with a small real hardware by using an extended Field %Programmable Gate Array (FPGA) technology. Several configuration %RAM modules are provided inside the FPGA chip, and the %configuration of the gate array can be rapidly changed by %replacing the active module. Data for configuration are %transferred from an off-chip backup RAM to an un-used %configuration RAM module. % A novel computation mechanism called the WASMII, which %executes a target dataflow graph directly, is proposed on the %basis of the virtual hardware. A WASMII chip consists of the FPGA %for virtual hardware and the additional mechanism to replace %configuration RAM modules in the data driven manner. %Configuration data are preloaded by the order which is assigned in %advance with a static scheduling preprocessor. 
By connecting a %number of WASMII chips, a highly parallel system can be easily %constructed. %----------------------------------------------------------------- @inproceedings{ling-amano:93, author = {X. P. Ling and H. Amano}, title = {{WASMII}: a Data Driven Computer on a Virtual Hardware}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {33-42} } % Index Number: 19 % Abstract % Virtual Computing is an entirely new form of supercomputing %that allows an algorithm to be implemented in hardware. Based on %the Xilinx FPGA[1] and ICubes FPID[2] the Virtual Computer is %completely reconfigurable in every respect. Computing machines %based on reconfigurable logic are hyper-scalable meaning they %scale up better than 1-1. %----------------------------------------------------------------- @inproceedings{casselman:93, author = {S. Casselman}, title = {Virtual Computing and The Virtual Computer}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {43-48} } % Index Number: 20 % Abstract % Recent developments in the design and fabrication of field %programmable logic devices (FPGA's) may well change the way in which %we design and fabricate conventional microprocessors. The use of %uncommitted logic whose function may be modified at run time makes %the prospect of dynamic application specific integrated circuits %closer to reality than ever before. % Much of the work to date on reconfigurable logic has focussed %on its application in co-processor and "glue" roles. This paper %discusses how complete processors might be fabricated with a %minimum of "fixed" or static logic. It is shown that in order to %exploit FPGAs, a processor that is radically different from %conventional architectures is required. 
The paper concludes by %considering what evolutions of current logic families would favour %this type of application. %----------------------------------------------------------------- @inproceedings{french-taylor:93, author = {P. C. French and R. W. Taylor}, title = {A Self-Reconfiguring Processor}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {50-59} } % Index Number: 21 % Abstract % This paper describes a special purpose application accelerator %using field programmable gate arrays to accelerate a range of %applications. The accelerator is designed to support applications %by allowing the user to implement a processor with an instruction %set designed for the specific application being accelerated, using %specialized instructions to implement critical fragments of the %application. A compiled-code software organization is used to %reduce overhead operations. A prototype has been built, and the %first application to be ported to it, logic simulation, is %underway. %------------------------------------------------------------------ @inproceedings{lewis-vanierssel:93, author = {D. M. Lewis and M. H. van Ierssel and D. H. Wong}, title = {A Field Programmable Accelerator for Compiled-Code Applications}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {60-67} } % Index Number: 22 % Abstract % Recently, several machines have been built using Field %Programmable Gate Array (FPGA) technology. These reconfigurable %architectures have demonstrated very high performance for a %variety of problems. The configuration of these machines %typically relies on some form of hardware specification. In this %paper we demonstrate that a more traditional software approach may %be used.
A vector based data-parallel model and its mapping to a %reconfigurable architecture are introduced. Included in the model %are parallel prefix or scan operators. The language supporting %this model is a subset of the C programming language. %----------------------------------------------------------------- @inproceedings{guccione-gonzales:93, author = {S. A. Guccione and M. J. Gonzalez}, title = {A Data-Parallel Programming Model for Reconfigurable Architectures}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {79-87} } % Index Number: 23 % Abstract % A PC-AT hosted DSP processor architecture implemented in SRAM- %based field programmable gate arrays (FPGA) and static memories is %described. Despite its simplicity, the processor circuits can be %reconfigured under software control to tackle a class of multi-bit %'pixel' processing problems of current interest in the statistical %physics of disordered materials, thereby offering some of the %problem flexibility of a general purpose processor and the %performance of custom hardware. The flexibility offered by the %FPGA implementation is discussed in detail as is a particular %application of the processor (to disordered superconductors). The %performance of the processor is shown to compare well with %similarly costing commercial DSP hardware. The low cost of the %processor means it can be replicated to obtain dedicated %supercomputer performance. %------------------------------------------------------------------ @inproceedings{monaghan-cowen:93, author = {S. Monaghan and C. P. Cowen}, title = {Reconfigurable Multi-Bit Processor for DSP Applications in Statistical Physics}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. 
Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {103-110} } % Index Number: 24 % Abstract % This paper describes the CM-2X prototype. This one-of-a-kind %machine is the result of a Supercomputing Research Center/Thinking %Machines Corporation joint effort to examine the suitability of a %hybrid combination of CM-2 architecture and Xilinx programmable %gate array technology. In addition to a description of the CM-2X %and Xilinx architecture, a simple applications example is provided %that illustrates many of the issues involved in programming the %machine. %------------------------------------------------------------------ @inproceedings{cuccaro-reese:93, author = {S. A. Cuccaro and C. F. Reese}, title = {The {CM-2X}: A Hybrid CM-2/Xilinx Prototype}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {121-130} } % Index Number: 25 % Abstract % M is a highly parallel asynchronous computer for the analysis %and control of complex systems. A complex system is a system with %many interacting components. Examples of complex systems include %applications in molecular biology, economics, and signal %processing. M asynchronous computations reproduce the structural %dynamics of a system using high fidelity behavioral modeling. %Programs are composed of an application model, an environment %model, and a distributed subsumption operation system. Processes %are implemented using position independent instructions that %operate in parallel on strings of binary data. All M FPGA fine %grained parallel processing nodes are double buffered, %asynchronous, and highly pipelined. The fiber system memory is %optically multiplexed, and asynchronous. The technology will %extend new gigabit ATM optical networks with integrated high %performance computing services. 
%------------------------------------------------------------------ @inproceedings{wood:93, author = {L. F. Wood}, title = {High Performance Analysis and Control of Complex Systems Using Dynamically Reconfigurable Silicon and Optical Fiber Memory}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {132-141} } % Index Number: 26 % Abstract % Existing FPGA-based logic emulators suffer from limited inter- %chip communication bandwidth, resulting in low gate utilization %(10-20 percent). This resource imbalance increases the number of %chips needed to emulate a particular logic design and thereby %decreases emulation speed, since signals must cross more chip %boundaries. Current emulators only use a fraction of potential %communication bandwidth because they dedicate each FPGA pin %(physical wire) to a single emulated signal (logical wire). These %logical wires are not active simultaneously and are only switched %at emulation clock speeds. % Virtual wires overcome pin limitations by intelligently %multiplexing each physical wire among multiple logical wires and %pipelining these connections at the maximum clocking frequency of %the FPGA. A virtual wire represents a connection from a logical %output on one FPGA to a logical input on another FPGA. Virtual %wires not only increase usable bandwidth, but also relax the %absolute limits imposed on gate utilization. The resulting %improvement in bandwidth reduces the need for global %interconnect, allowing effective use of low dimension inter-chip %topologies; these, coupled with the ability of virtual wires to overlap %communication with computation, can even improve emulation speeds. %We present the concept of virtual wires and describe our first %implementation, a "softwire" compiler which utilizes static %routing and relies on minimal hardware support.
Results from %compiling netlists for the 18K gate Sparcle microprocessor and the %86K gate Alewife Communications and Cache Controller indicate that %virtual wires can increase FPGA gate utilization beyond 80 percent %without a significant slowdown in emulation speed. %------------------------------------------------------------------ @inproceedings{babb-tessier:93, author = {J. Babb and R. Tessier and A. Agarwal}, title = {Virtual Wires: Overcoming Pin Limitations in {FPGA}-based Logic Emulators}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {142-151} } % Index Number: 27 % Abstract % FPGAs which are configured by static RAM can be rapidly %changed from one logic configuration to another. This raises the %possibility of configuring the logic to implement a function for %a specific set of values, i.e. folding the inputs into the logic %design. The paper discusses data folding with respect to %Algotronix FPGAs, presenting a text searching circuit as an %example. This folded circuit saves at least half the logic over %a conventional circuit, and very much more if data folding is taken as %far as possible. It also presents performance figures for the folded %circuit, and discusses other applications, and suggests features which %are desirable if data folding is to be practicable, most of which are %possessed by the Algotronix CAL array. %----------------------------------------------------------------- @inproceedings{foulk:93, author = {P. W. Foulk}, title = {Data-folding in SRAM configurable {FPGA}s}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. 
Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {163-171} } % Index Number: 28 % Abstract % In this paper, we describe two systolic arrays for computing %the edit distance between two genetic sequences using a well-known %dynamic programming algorithm. The systolic arrays have been %implemented for the Splash 2 programmable logic array, and are %intended to be used for database searching. Simulations indicate %that the faster Splash 2 implementation can search a database at %a rate of 12 million characters per second, several orders of %magnitude faster than implementations of the dynamic programming %algorithm on conventional computers. %------------------------------------------------------------------ @inproceedings{hoang:93, author = {D. T. Hoang}, title = {Searching Genetic Databases on Splash 2}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {185-191} } % Index Number: 29 % Abstract % We describe a method for speeding up divide-and-conquer %algorithms with a hardware coprocessor, using sorting as an %example. The method employs a conventional processor for the %"divide" and "merge" phases, while the "conquer" phase is handled %by a purpose-built coprocessor. It is shown how transformation %techniques from the Ruby language can be adopted in developing a %family of systolic sorters, and how one of the resulting designs %is prototyped in eight FPGAs on a PC coprocessor board known %as CHS2x4 from Algotronix. The execution of the hardware %unit is embedded in a sorting program, with the PC host merging %the sorted sequences from the hardware sorter. The performance of %this implementation is compared against various sorting algorithms %on a number of PC systems. %------------------------------------------------------------------ @inproceedings{luk-lok:93, author = {W. Luk and V. Lok and I. 
Page}, title = {Hardware Acceleration of Divide-and-Conquer Paradigms: a Case Study}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {192-201} } % Index Number: 30 % Abstract % In this paper we present an expandable digital architecture %that provides an efficient real time implementation platform for %large neural networks. The architecture makes heavy use of the %techniques of bit serial stochastic computing to carry out the %large number of required parallel synaptic calculations. In this %design all real valued quantities are encoded on to stochastic bit %streams in which the '1' density is proportional to the given %quantity. The actual digital circuitry is simple and highly %regular thus allowing very efficient space usage of fine grained %FPGAs. % Another feature of the design is that the large number of %weights required by a neural network are generated by circuitry %tailored to each of their specific values, thus saving valuable %cells. Whenever one of these values is required to change, the %appropriate circuitry must be dynamically reconfigured. This may %always be achieved in a fixed and minimum number of cells for a %given bit stream resolution. %----------------------------------------------------------------- @inproceedings{daalen-jeavons:93, author = {M. van Daalen and P. Jeavons and J. Shawe-Taylor}, title = {A Stochastic Neural Architecture that Exploits Dynamically Reconfigurable {FPGA}s}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {202-211} } % Index Number: 31 % Abstract % Reprogrammable Field-Programmable Gate Arrays (FPGAs) have %enabled the realization of high performance and affordable %reconfigurable computing engines. 
We examine the architectural %tradeoffs involved in designing general purpose FPGA-based %computing systems with field-programmable gate arrays and field- %programmable interconnects. The fact that FPGAs provide both %programmable logic and programmable interconnects raises numerous %design issues that need to be considered with care. Factors that %influence the tradeoffs are routability, rearrangeability, and %speed. %------------------------------------------------------------------ @inproceedings{chan-schlag:93, author = {P. K. Chan and M. D. F. Schlag}, title = {Architectural Tradeoffs in Field-Programmable-Device-Based Computing Systems}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1993}, month = {April}, address = {Napa, CA}, pages = {152-161} } % Index Number: 32 % Abstract % We describe a compiler which maps programs expressed in a %subset of Occam into netlist descriptions of parallel hardware. %Using Field-Programmable Gate Arrays to implement such netlists, %problem-specific hardware can be generated entirely by a software %process. Inner loops of time-consuming programs can be %implemented as hardware and the less intensively-used parts of the %program can be mapped into machine code by a conventional %compiler. Software investment is protected since the same program %can run entirely in software, entirely in hardware, or in a %mixture of both. A single program can thus result in many %implementations across a potentially wide cost-performance range. %The compilation system has been used to generate inner-loops, %hardware interfaces to real-world devices, systolic arrays, and %complete microprocessors. In the near future we hope to have a %proven version of the compiler, enabling us automatically to %generate provably correct hardware implementations, including %microprocessors, from higher-level specifications. 
%------------------------------------------------------------------ @inproceedings{page-luk:91, author = {I. Page and W. Luk}, title = {Compiling Occam into {FPGA}s}, booktitle = {{FPGAs}. International Workshop on Field Programmable Logic and Applications}, year = {1991}, month = {September}, address = {Oxford, UK}, editor = {W. R. Moore and W. Luk}, pages = {271-283} } % Index Number: 33 % Abstract % ArMen is a parallel machine in which each node is coupled to %an FPGA ring. The underlying idea is to complement an MIMD %architecture with global coprocessors providing extra control and %processing properties. The use of regular hardware patterns such as %cellular automata or pipelines allows high level definitions of the %coprocessors. The results are fast prototyping possibilities for %specific applications such as image processing or industrial control. %Basic realizations are described. Changing from an FPGA technology %to a VLSI one provides benefits with respect to cost and performance, %without any effort at the specification level. The MADMACS pattern %generator can be used to fold several FPGA configurations into the %same VLSI circuit. %------------------------------------------------------------------ @inproceedings{champeau-pape:94, author = {J. Champeau and L. Le Pape and B. Pottier and S. Rubini and E. Gautrin and L. Perraudeau}, title = {Flexible Parallel {FPGA}-Based Architectures with ArMen}, booktitle = {Proceedings of the 27th Hawaii International Conference on System Sciences}, year = {1994}, month = {January}, address = {Wailea, HI}, editor = {T. N. Mudge and B. D. Shriver}, pages = {105-113} } % Index Number: 34 % Abstract % This paper presents a methodology and a design environment %to support validation and design space exploration for embedded %systems including application specific digital signal processors %prototyping.
Our approach to heterogeneous system design is based on %rapid prototyping integrated with a set of graphical design entry, %synthesis, and analysis tools. System partitioning into a set of %software and hardware modules is done at system description level. %User guided and automated synthesis tools generate a fully functional %prototype that can be connected to real world processes to verify %system design and to estimate system performance. %------------------------------------------------------------------ @inproceedings{herpel-held:94, author = {H. J. Herpel and M. Held and M. Glesner}, title = {A Design Methodology for the Conceptual Design of Application Specific Digital Processors in Mechatronic Systems}, booktitle = {Proceedings of the 27th Hawaii International Conference on System Sciences}, year = {1994}, month = {January}, address = {Wailea, HI}, editor = {T. N. Mudge and B. D. Shriver}, pages = {78-86} } % Index Number: 35 % Abstract % New field-programmable gate array (FPGA) technologies have %increased the industrial interest in tools which map a DSP application %and a set of performance constraints to a specific VLSI %architecture. This paper presents an optimization methodology for %mapping a DSP application and a set of performance constraints into an %architecture targeted for FPGA technologies with user-programmable RAM %blocks on chip. The target architecture supports multiple register %files, multiple busses, complex types of functional units, and %multichip implementation. The optimization methodology presented in %this paper maps DSP applications to optimized register file %architectures suitable for FPGAs using a number of different integer %programming models. A new integer programming model is presented and %used to minimize the number of busses required in the %application-specific architectures. 
Results show that the optimization %methodology provides architectures with 22% fewer bus connections than %previous research in practical CPU times. For the first time this %research provides industry with 1) a high level design optimization %methodology that synthesizes application-specific DSP architectures %for implementation in new field programmable VLSI technologies, and 2) %a methodology to support fast prototyping of DSP applications using %multiple FPGA chips. %------------------------------------------------------------------ @inproceedings{gebotys-gebotys:94, author = {C. H. Gebotys and R. J. Gebotys}, title = {Application-Specific Architectures for Field-Programmable VLSI Technologies}, booktitle = {Proceedings of the 27th Hawaii International Conference on System Sciences}, year = {1994}, month = {January}, address = {Wailea, HI}, editor = {T. N. Mudge and B. D. Shriver}, pages = {124-130} } % Index Number: 36 % Abstract % Run-time reconfiguration is a way of more fully exploiting the %flexibility of reconfigurable FPGAs. The Run-Time Reconfiguration %Artificial Neural Network (RRANN) uses run-time reconfiguration to %increase the hardware density of FPGAs. The RRANN architecture %also allows large amounts of parallelism to be used and is very %scalable. RRANN divides the backpropagation algorithm into stages and configures the FPGAs %to execute only one stage at a time. The FPGAs are reconfigured %as part of normal execution in order to change stages. % Using reconfigurability in this way increases the number of %hardware neurons a single Xilinx XC3090 can implement by 500%. %Performance is affected by reconfiguration overhead, but this %overhead becomes insignificant in large networks. This overhead %is made even more insignificant with improved configuration %methods. Run-time reconfiguration is a flexible realization of %the time/space trade-off.
The RRANN architecture has been %designed and built using commercially available hardware, and its %performance has been measured. %------------------------------------------------------------------ @inproceedings{eldredge-hutchings:94, author = {J. G. Eldredge and B. L. Hutchings}, title = {Density Enhancement of a Neural Network Using {FPGA}s and Run-Time Reconfiguration}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1994}, month = {April}, address = {Napa, CA}, pages = {180-188} } % Index Number: 37 % Abstract % Reconfigurable Field-Programmable Gate Arrays (FPGAs) provide %an effective programmable resource for implementing hardware-based %Artificial Neural Networks (ANNs). They are low cost, readily %available and reconfigurable--all important advantages for ANN %applications. However, FPGAs lack the circuit density necessary %to implement large parallel ANNs with many thousands of synapses. %This paper presents an architecture that makes it feasible to %implement large ANNs with FPGAs. The architecture combines %stochastic computation techniques with a novel lookup-table-based %architecture that fully exploits the lookup-table structure of %many FPGAs. This lookup-table-based architecture is extremely %efficient: it is capable of supporting up to two synapses per %Configurable Logic Block (CLB). In addition, the architecture is %simple to implement, self-contained (weights are stored directly %in the synapse), and scales easily across multiple chips. %------------------------------------------------------------------ @inproceedings{bade-hutchings:94, author = {S. Bade and B. L. Hutchings}, title = {{FPGA}-Based Stochastic Neural Networks: Implementation}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L.
Pocek}, year = {1994}, month = {April}, address = {Napa, CA}, pages = {189-198} } % Index Number: 38 % Abstract % Reconfigurable logic systems approach the performance of %Application-Specific Integrated Circuits (ASICs) while retaining %much of the generality of conventional computing systems through %reconfiguration. Unfortunately, the development of these systems, %unlike conventional software systems, is hardware intensive, %requiring significant hardware development time. One way to %introduce a more flexible development approach is to implement a %customizable stored-program processor. For a given application, %the designer can develop customized hardware to increase %performance and then control the sequencing and operation of this %hardware with software. Development time can be significantly %reduced because conventional software development tools, e.g., %assemblers and compilers, can be used to quickly develop new %applications on the customized processor. This paper presents the %Nano Processor (nP), a fully customizable reconfigurable %processor, together with its integrated assembler, that has been %successfully implemented on the Xilinx 3000 series Field %Programmable Gate Arrays (FPGAs). %------------------------------------------------------------------ @inproceedings{wirthlin-gilson:94, author = {M. K. Wirthlin and K. Gilson and B. L. Hutchings}, title = {The Nano Processor: A Low Resource Reconfigurable Processor}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1994}, month = {April}, address = {Napa, CA}, pages = {23-30} } % Index Number: 39 %NOT IN INSPEC @inproceedings{eldredge-hutchings:94b, author = {J. G. Eldredge and B. L.
Hutchings}, title = {{RRANN}: The Run-Time Reconfiguration Artificial Neural Network}, booktitle = {Custom Integrated Circuits Conference}, year = {1994}, month = {May}, address = {San Diego, CA}, pages = {77-80} } % Index Number: 40 %NOT IN INSPEC @inproceedings{eldredge-hutchings:94c, author = {J. G. Eldredge and B. L. Hutchings}, title = {{RRANN}: A Hardware Implementation of the Backpropagation Using Reconfigurable {FPGA}s}, booktitle = {IEEE World Conference on Computational Intelligence}, year = {1994}, month = {June}, address = {Orlando, FL} } % Index Number: 41 @inbook{weste-eshraghian:93, author = {N. Weste and K. Eshraghian}, title = {Principles of {CMOS} {VLSI} Design: a Systems Perspective}, edition = {2nd}, publisher = {Addison-Wesley Publishing Co.}, pages = {399-403}, year = {1993} } % Index Number: 42 @inbook{brown-francis:92, author = {S. D. Brown and R. J. Francis and J. Rose and Z. Vranesic}, title = {Field-Programmable Gate Arrays}, publisher = {Kluwer Academic Publishers}, chapter = {1}, year = {1992} } % Index Number: 43 % Abstract % The architecture, implementation, and application of %GANGLION, a totally digital connectionist classifier, are %described. This fully interconnected feedforward net with one hidden %layer is capable of generating 4.48 billion interconnections/s. The %architecture is realized on a single 9U VME card and is built entirely %from off-the-shelf components. The very high throughput of 20 million %decisions/s is achieved by making efficient use of field-programmable %gate arrays. Specifically, the authors take advantage of the %reprogrammability of the devices to automatically generate new custom %hardware for each application of the classifier. %------------------------------------------------------------------ @article {cox-blanz:92, author = {C. E. Cox and W. E.
Blanz}, title = {GANGLION - a fast field-programmable gate array implementation of a connectionist classifier}, journal = {IEEE Journal of Solid-State Circuits}, volume = {27}, number = {3}, pages = {288-299}, year = {1992}, month = {March} } % Index Number: 44 % Abstract % A flexible neural network architecture that permits the %realization of networks of arbitrary topologies and dimensions is %presented. Furthermore, the performance of this architecture is %essentially independent of the size of the network, and permits %processing of typically 100,000 patterns per second. The key is %the representation of neuron activations and synaptic weights as %stochastic functions of time, leading to efficient implementations %of the synapses and the integrating neurons. High densities of %synapses per silicon area, exceeding even analog implementations, %have been achieved. Because the neuron activations are %represented digitally, as are the synaptic computations, DNNA can %be fabricated using a variety of standard, low-cost semiconductor %processes. We present a pair of general purpose chips, that %permit post facto construction of neural networks of arbitrary %topology and virtually unlimited dimensions. Finally, we derive %the activation function for the network, and demonstrate its %suitability for implementing a popular back-propagation learning %algorithm. %------------------------------------------------------------------ @inproceedings{tomlinson-sivilotti:91, author = {M. S. Tomlinson and M. A. Sivilotti}, title = {A high-performance scalable digital neural network architecture for {VLSI}}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, pages = {262-273}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin} } % Index Number: 45 @inproceedings{guccione-gonzales:93b, author = {S. A. Guccione and M. J. 
Gonzalez}, title = {A neural network implementation using reconfigurable architectures}, booktitle = {More {FPGAs}: Proceedings of the 1993 International workshop on field-programmable logic and applications}, year = {1993}, %date of publication: 1994 month = {September}, address = {Oxford, England}, pages = {443-451}, editor = {W. Moore and W. Luk} } % Index Number: 46 %NOT IN INSPEC @inproceedings{erdogan-hong:93, author = {S. S. Erdogan and T. H. Hong}, title = {Massively Parallel back-propagation algorithm using the reconfigurable machine}, booktitle = {World Congress on Neural Networks '93}, year = {1993}, address = {Portland, Oregon}, pages = {4:861-864} } % Index Number: 47 %NOT IN INSPEC @mastersthesis{ferrucci:94, author = {A. T. Ferrucci}, title = {A field-programmable gate array implementation of a self-adapting and scalable connectionist network}, school = {University of California, Santa Cruz}, address = {Santa Cruz, California}, month = {January}, year = {1994} } % Index Number: 48 % Abstract % This paper discusses the architecture and compiler for a %general-purpose metamorphic computing platform called PRISM-II. %PRISM-II improves the performance of many computationally- %intensive tasks by augmenting the functionality of the core %processor with new instructions that match the characteristics of %targeted applications. In essence, PRISM is a general purpose %hardware platform that behaves like an application-specific %platform. Two methods for hardware synthesis, one using the VHDL %Designer and the other using X-BLOX, are presented and synthesis %results are compared. %------------------------------------------------------------------ @inproceedings{wazlowski-agarwal:93, author = {M. Wazlowski and L. Agarwal and T. Lee and A. Smith and E. Lam and P. Athanas and H. Silverman and S. Ghosh}, title = {{PRISM-II} Compiler and Architecture}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K.
L. Pocek}, year = {1993}, month = {April}, address = {Napa, California}, pages = {9-16} } % Index Number: 49 % Abstract % PRISM, a computer architecture consisting of a general-purpose %core processor and a reconfigurable FPGA platform, was designed to %bridge the gap between general-purpose and specialized computers. %The proof-of-concept system, PRISM-I, suffers from several %limitations, principal among them being: single bus-cycle %restriction on the evaluation time of the function synthesized on %an FPGA, inability to execute loops with dynamic loop-counts, and %inefficient execution of control constructs such as "if-then- %else". This paper presents a novel execution model in PRISM-II, %that addresses the above limitations in a general manner. Also %presented is a new framework for translating a C function into an %FPGA-based custom architecture. %------------------------------------------------------------------ @inproceedings{agarwal-wazlowski:94, author = {L. Agarwal and M. Wazlowski and S. Ghosh}, title = {An asynchronous approach to efficient execution of programs on adaptive architectures utilizing {FPGA}s}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1994}, month = {April}, address = {Napa, California}, pages = {101-110} } % Index Number: 50 % Abstract % This paper describes the implementation of a parallel %algorithm for sequence comparison on the SPLASH programmable logic %array. The algorithm, originally developed for a custom VLSI %chip, has applications in molecular genetics and runs faster on %SPLASH than it does on supercomputers. I discuss details of the %problem and its systolic solution, the SPLASH architecture and %design environment, and the implementations currently running on %SPLASH. %------------------------------------------------------------------ @inproceedings{lopresti:91, author = {D. P. 
Lopresti}, title = {Rapid implementation of a genetic sequence comparator using field-programmable gate arrays}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, pages = {138-152}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin} } % Index Number: 51 @inproceedings{fawcett:93, author = {B. K. Fawcett}, title = {Applications of Reconfigurable Logic}, booktitle = {More {FPGAs}: Proceedings of the 1993 International workshop on field-programmable logic and applications}, year = {1993}, %year of publication: 1994 month = {September}, address = {Oxford, England}, pages = {57-69}, editor = {W. Moore and W. Luk} } % Index Number: 52 @proceedings{fpgas:91, title = {{FPGAs}: Proceedings of the 1991 International workshop on field-programmable logic and applications}, editor = {W. Moore and W. Luk}, publisher = {Abingdon EE and CS Books}, address = {Oxford, England}, Month = {September}, year = {1991} } % Index Number: 53 @proceedings{fpgas:93, title = {More {FPGAs}: Proceedings of the 1993 International workshop on field-programmable logic and applications}, editor = {W. Moore and W. Luk}, publisher = {Abingdon EE and CS Books}, address = {Oxford, England}, Month = {September}, year = {1993} %year of publication: 1994 } % Index Number: 54 @proceedings{fpgas:92, title = {{FPGAs}: Proceedings of the 1992 International workshop on field-programmable logic and applications}, editor = {H. Grunbacher and R. Hartenstein}, publisher = {Springer-Verlag}, address = {Vienna, Austria}, Month = {September}, year = {1992} } % Index Number: 55 @proceedings{fccm:93, title = {Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines}, year = {1993}, month = {April}, address = {Napa, CA}, editor = {D. A. Buell and K. L.
Pocek} } % Index Number: 56 @proceedings{fccm:94, title = {Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines}, year = {1994}, month = {April}, address = {Napa, CA}, editor = {D. A. Buell and K. L. Pocek} } % Index Number: 57 @inproceedings{lysaght-dunlop:93, author = {P. Lysaght and J. Dunlop}, title = {Dynamic Reconfiguration of {FPGA}s}, booktitle = {More {FPGAs}: Proceedings of the 1993 International workshop on field-programmable logic and applications}, year = {1993}, %year of publication:1994 month = {September}, address = {Oxford, England}, pages = {82-94}, editor = {W. Moore and W. Luk} } % Index Number: 58 % Abstract % During the past decade the microprocessor has become a key %commodity component for building all kinds of computational %systems. During this time frame large, reconfigurable logic %arrays have exploited the same advances in IC fabrication %technology to emerge as viable system building blocks. Looking at %both the technology prospects and application requirements, there %is compelling evidence that microprocessors with integrated %reconfigurable logic arrays will be a primary building block for %future computing systems. In this paper, we look at the role such %components can play in building high-performance and economical %systems, as well as the ripe technological outlook. We note how %the tight integration of reconfigurable logic into the processor %can overcome some of the major limitations of contemporary, %attached reconfigurable compute engines. We specifically consider %the use of integrated Dynamically Programmable Gate Array %structures for the configurable logic and examine the advantages %rapid reconfiguration provides in this application. %------------------------------------------------------------------ @inproceedings{dehon:94, author = {A. 
DeHon}, title = {{DPGA}-Coupled Microprocessors: Commodity {IC}s for the Early 21st Century}, booktitle = {Proceedings of IEEE Workshop on {FPGA}s for Custom Computing Machines}, editor = {D. A. Buell and K. L. Pocek}, year = {1994}, month = {April}, address = {Napa, CA}, pages = {31-39} } % Index Number: 59 %NOT IN INSPEC @incollection{rumelhart-hinton:86, author = {D. E. Rumelhart and G. E. Hinton and R. J. Williams}, title = {Learning internal representations by error propagation}, booktitle = {Parallel Distributed Processing}, volume = {1}, chapter = {8}, pages = {318-362}, publisher = {MIT Press}, address = {Cambridge, MA}, year = {1986}, editor = {D. Rumelhart and J. McClelland} } % Index Number: 60 % Abstract % An implementation of error backpropagation on a massively %parallel cellular array processor, AAP-2, is described. Parallel %operations of a neural network simulation on the AAP-2 are described, %including: allocation of the processors, computing the activation %value with a table-lookup method, and communication between %processors. Currently, this simulator on the AAP-2 can run at %approximately 18 MCPS (million connections per second), which is 45 %times faster at a learning phase than that of the IBM-3090. The %results indicate that fine-grained, cellular array computers %consisting of a large number of simple processors can be efficient %neural network simulators. %------------------------------------------------------------------ @inproceedings{watanabe-sugiyama:89, author = {T. Watanabe and Y. Sugiyama and T. Kando and Y. Kitamura}, title = {Neural network simulation on a massively parallel cellular array processor: AAP-2}, booktitle = {IJCNN: International Joint Conference on Neural Networks}, volume = {2}, pages = {155-161}, year = {1989}, address = {Washington, D.C.} } % Index Number: 61 % Abstract % The motivation for the X1 architecture described was to %develop inexpensive commercial hardware suitable for solving large, %real-world problems.
Such an architecture must be systems oriented and %flexible enough to execute any neural network algorithm and work %cooperatively with existing hardware and software. The early %application of neural networks must proceed in conjunction with %existing technologies, both hardware and software. Using state-of-the- %art technology and innovative architectural techniques, the author's %architecture approaches the speed and cost of analog systems while %retaining much of the flexibility of large, general-purpose parallel %machines. The author has aimed at a particular set of applications and %has made cost-performance tradeoffs accordingly. The goal is an %architecture that could be considered a general-purpose microprocessor %for neurocomputing. %------------------------------------------------------------------ @inproceedings{hammerstrom:90, author = {D. Hammerstrom}, title = {A {VLSI} architecture for high-performance, low-cost, on-chip learning}, booktitle = {IJCNN: International Joint Conference on Neural Networks}, volume = {2}, pages = {537-544}, year = {1990}, address = {San Diego, CA} } % Index Number: 62 @mastersthesis{eldredge:93, author = {J. G. Eldredge}, title = {{FPGA} Density enhancement of a neural network through run-time reconfiguration}, school = {Brigham Young University}, address = {Provo, UT}, month = {December}, year = {1993} } % Index Number: 63 % Abstract % Recent advances in configurable logic technology provide %sufficient processing density and bandwidth to directly implement %image and signal processing algorithms in digital hardware. Our %research demonstrates the feasibility of employing field %programmable gate arrays (FPGAs) to realize high-speed algorithm- %specific processing architectures for avionic signal processing %applications. 
Architectures composed of FPGAs provide a low-cost %and flexible alternative to custom hard-wired preprocessors and a %lower-cost, physically smaller alternative to massively parallel %processors (both SIMD and MIMD Machines). Algorithm segments %which require processing hundreds of millions of operations per %second have been mapped into a single FPGA device. This %technology may ultimately fill a range of processing requirements %in the areas of radar and communication processing as well as %image enhancement applications. % The application of configurable logic devices allows %realization of processing architectures to efficiently compute %low-level algorithmic functions, or segments. Reconfiguration of %FPGAs to implement several algorithm segments is analogous to %selecting subroutines to form a software algorithm suite in a %conventional processor, since it can be accomplished without %hardware modification. Specific architecture configurations %corresponding to algorithm segments can be chosen from a library %and immediately configured in hardware to realize the same %algorithm suite that could be realized in software, but with %greatly enhanced processing performance (typically two orders of %magnitude). For example, the processing architecture can be %reconfigured to realize an algorithm segment with a 5X5 filter %window instead of 3X3 window, or replace a median filter segment %with a morphological filter segment. %------------------------------------------------------------------ @inproceedings{lazarus-meyer:93, author = {R. B. Lazarus and F. M. Meyer}, title = {Realization of a Dynamically Reconfigurable Preprocessor}, booktitle = {Proceedings of the IEEE 1993 National Aerospace and Electronics Conference. {NAECON} 1993}, year = {1993}, month = {August}, %is this the right date? CONF LOCATION: Dayton, OH, USA; 24-28 May 1993 address = {Dayton, OH}, %is this the right address? 
PUBLISHER: IEEE; New York, NY, USA pages = {74-80} } % Index Number: 64 % Abstract % FPGAs need not be limited to a single fixed-size truth table %in each block. This paper discusses the utility of allowing each %block's single large table (e.g., one 5-input, 32-bit table) to %be reconfigured into smaller tables (e.g., eight 4-bit tables). %Results describing the efficiency of packing some standard %benchmark circuits into various configurations are presented and %the cost/benefits discussed. We show that a logic block %containing four lookup tables, each of which is 8-bit RAM, is the %best choice if only the area efficiency is considered. We also %show that if circuit speed is considered, a logic block %containing two lookup tables, each of which contains 16 bits of %RAM, is the best choice. %------------------------------------------------------------------ @article{hill-woo:93, author = {D. Hill and N.-S. Woo}, title = {The Benefits of Flexibility in Lookup Table-Based {FPGA}s}, journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems}, volume = {12}, number = {2}, year = {1993}, month = {February}, address = {USA}, pages = {349-353} } % Index Number: 65 % Abstract % A configurable data path processor is presented which can be %modified to optimize performance. FPGA, PLA and PAL devices %provide a great amount of flexibility to realize arbitrary control %functions. The new processor is specifically designed for %arbitrary data path operations and can be dynamically %reconfigured. %------------------------------------------------------------------ @inproceedings{maki-whitaker:91, author = {G. Maki and S. Whitaker and G.
Ganesh}, title = {A Reconfigurable Data Path Processor}, booktitle = {Proceedings of the Fourth Annual IEEE International ASIC Conference and Exhibit}, year = {1991}, month ={September}, address = {Rochester, NY}, pages = {P18-4.1-4.4} } % Index Number: 66 % Abstract % A processor array architecture based on dynamically %configurable logic cell array is designed to contain an 8X8 array %of processing units. This array is expandable to construct larger %arrays by combining chips together in a matrix. The configuration %data for the processing units is loaded parallel into an internal %configuration RAM to enable quick reconfiguration for a new task. %------------------------------------------------------------------ @inproceedings{korpiharju-viitanen:91, author = {T. Korpiharju and J. Viitanen and H. Kiminkinen and J. Takala and K. Kaski}, title = {TUTCA Configurable Logic Cell Array Architecture}, booktitle = {Proceedings of the Fourth Annual IEEE International ASIC Conference and Exhibit}, year = {1991}, month = {September}, address = {Rochester, NY}, pages = {P3-3.1-3.4} } % Index Number: 67 % Abstract % This paper describes a ten-year-long technical and social %experiment in our department to build a laboratory that designs %and fabricates microelectronic systems in support of research in %computer architecture. This experiment has been fairly successful %by several measures: We have built and demonstrated a number of %systems, some of significant size and complexity, spanning a %fairly wide range of approaches and applications. We have %validated the laboratory's original premise, demonstrating %economies of scale by sharing a system-building facility over %multiple projects. Most importantly, we have fielded experimental %systems on a long-term, maintainable basis. Our Microelectronic %Systems Laboratory (MSL) may serve as a useful model for others who %want to develop system-building capability in a university %setting. 
I'll describe the model in three parts: %1. An outline of the context, purpose, organization, and working %style of the lab, along with a brief chronology and description of %the experimental systems we have built. %2. A history of the longest-running and most ambitious project we %have so far undertaken, building several generations of Pixel- %Planes graphics systems. %3. A summary of what we believe we've done right, what we got %wrong, what we could have done better, and a few remarks about where %we may go from here. %------------------------------------------------------------------ @inproceedings{poulton:91, author = {J. Poulton}, title = {Building Microelectronic Systems in a University Environment}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin}, pages = {387-400} } % Index Number: 68 %NOT IN INSPEC % Abstract % Field-Programmable Gate Arrays (FPGAs) and Single-Instruction %Multiple-Data (SIMD) processing arrays share many architectural %features. In both architectures, an array of simple, fine- %grained logic elements is employed to provide high-speed %customizable, bit-wise computation. In this paper, we present a %unified computational array model which encompasses both FPGAs and %SIMD arrays. Within this framework, we examine the differences %and similarities between these array structures and touch upon %techniques and lessons which can be transferred between the %architectures. The unified model also exposes promising prospects %for hybrid array architectures. We introduce the Dynamically %Programmable Gate Array which combines the best features from %FPGAs and SIMD arrays into a single array architecture. %------------------------------------------------------------------
@inproceedings{bolotski-dehon:94, author = {M. Bolotski and A. DeHon and Knight, Jr., T. F.}, title = {Unifying {FPGA}s and {SIMD} Arrays}, booktitle = {FPGA '94 -- 2nd International ACM/SIGDA Workshop on FPGAs}, year = {1994}, address = {Berkeley, CA}, month = {March}, pages = {1-10} } % Index Number: 69 %NO ENTRY % Abstract %------------------------------------------------------------------ % Index Number: 70 % Abstract % Field-programmable gate arrays are frequently used to %implement system interfaces and glue logic. However, there has %been little attention given to the special problems of these types %of circuits in FPGA architectures. In this paper we describe %Montage, a Triptych-based FPGA designed for implementing %asynchronous logic and interfacing separately-clocked synchronous %circuits. Asynchronous circuits have different requirements than %synchronous circuits, which make standard FPGAs unusable for %asynchronous applications. At the same time, many asynchronous %design methodologies allow components with greatly different %performance to be substituted for one another, making a design %environment which migrates between FPGA, MPGA, and semi-custom %implementations very attractive. Similar problems also exist for %interfacing separately-clocked synchronous circuits. We discuss %these problems, and demonstrate how the Montage FPGA satisfies the %demands of these classes of circuits. %------------------------------------------------------------------ @inproceedings{hauck-borriello:92, author = {S. Hauck and G. Borriello and S. Burns and C. Ebeling}, title = {MONTAGE: An {FPGA} for Synchronous and Asynchronous Circuits}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference. Year of publication:1993 address = {Vienna, Austria}, %conference location: Vienna, Austria;publisher loc.: Berlin, Germany month = {August}, %conference dates: 31 Aug.-2 Sept.
1992 pages = {44-51} } % Index Number: 71 % Abstract %The new family of Field Programmable Gate Arrays CLI6000 from %Concurrent Logic Inc realizes the truly Cellular Logic. It has %been mainly designed for the realization of data path %architectures. However, introduced by it new universal logic cell %calls also for new logic synthesis methods based on approximate, %for the minimization of Permuted Reed-Muller Trees that are %obtained by repetitive application of Davio expansions (Shannon %expansions for EXOR gates) in all possible orders of variable in %subtrees. Such trees are particularly well matched to both the %realization of logic cell and connection structure of the CLI6000 %device. It is shown on several standard benchmarks that the %heuristic algorithm gives good quality results in much less time %than the exact algorithm. %------------------------------------------------------------------ @inproceedings{wu-perkowski:92, author = {L.-F. Wu and M. A. Perkowski}, title = {Minimization of Permuted Reed-Muller Trees for Cellular Logic Programmable Gate Arrays}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {78-87} } % Index Number: 72 % Abstract % AT&T's ORCA (Optimized Reconfigurable Cell Array) architecture %extends FPGA applicability into a larger domain than is possible %with today's parts, including datapath intensive designs such as %memory controllers, signal processing parts, and telecommunication %interfaces. Key to the suitability of the ORCA for these jobs is %the fact that each of its basic blocks is capable of processing %four bits.
So, for example, a 16 bit adder requires exactly 4 %blocks, not 9 or 16 as in other architectures. Yet the total %complexity of each block is comparable to other current parts, %thus yielding a significant improvement in functional density. %------------------------------------------------------------------ @inproceedings{hill-britton-oswald-woo-singh-chen-krambeck, author = {D. Hill and B. Britton and B. Oswald and N.-S. Woo and S. Singh and C.-T. Chen and B. Krambeck}, title = {ORCA: A New Architecture for High-Performance {FPGA}s}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {52-60} } % Index Number: 73 % Abstract % In this paper, the design of VHDL coded squarers by using %logic synthesis is considered. The square function is important %for the digital processing of signals using e.g. matched filters %and Viterbi equalizers in receivers for communication systems. %However, many arithmetical functions like the square function are %not supported by VHDL. Hence, two major drawbacks arise in the %logic synthesis of VHDL code. Firstly, the designers are forced %to implement the needed arithmetical functions in VHDL by %themselves. Secondly, when implementing arithmetical functions %such as the square function in VHDL, special care must be taken in %order to circumvent massive hardware overhead of the synthesis %results compared with manually designed architectures. In the %case of the square function, this massive hardware overhead mainly %stems from the fact that the synthesis results of squarers are as %hardware expensive as the synthesis results of multipliers.
In %the course of the present paper the authors shall demonstrate how %this hardware overhead of squarers can be reduced by using a %modified square algorithm (MSA) which was developed by the %authors. The MSA was derived based on the Dadda algorithm which %will be discussed briefly. %------------------------------------------------------------------ @inproceedings{kempa-jung:92, author = {G. Kempa and P. Jung}, title = {{FPGA} Based Logic Synthesis of Squarers Using {VHDL}}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {112-123} } % Index Number: 74 % Abstract % Chameleon is an experimental workstation based on a RISC %processor. It provides unprecedented flexibility and speed for %certain applications due to the use of RAM-configurable Field %Programmable Gate Arrays (FPGAs). FPGAs are used to replace glue %logic as well as to provide a non-dedicated computation resource. %This resource can be regarded as a general purpose coprocessor %which can be reconfigured and thus transformed into a special %purpose coprocessor in milliseconds at run-time. The coprocessor %can be used both for handling complex input/output functions as %well as to replace time critical inner loops of user programs %running on the central processing unit. Chameleon radically %relies on FPGAs for all input/output functions. It serves as a %means to probe the limits of FPGA usage while at the same time %being the development system for its own FPGA circuits. %------------------------------------------------------------------ @inproceedings{heeb-pfister:92, author = {B. Heeb and C.
Pfister}, title = {Chameleon: A Workstation of a Different Colour}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {152-161} } % Index Number: 75 % Abstract % The NSR (non-synchronous RISC) architecture is an architecture %for a general purpose processor structured as a collection of %self-timed blocks that operate concurrently and communicate over %bundled data channels in the style of micropipelines. A 16- %bit version of the NSR architecture has been implemented using %Actel field programmable gate arrays (FPGAs). Each of the major %components of the NSR is implemented using one or two Actel FPGA %chips using a library of self-timed circuit modules. This %prototype implementation is being used to gain experience with the %NSR architecture and to gather statistics about the architectural %choices. The Actel FPGAs have proven to be extremely useful in %quickly prototyping this novel computer architecture. %------------------------------------------------------------------ @inproceedings{brunvand:92, author = {E. Brunvand}, title = {Using {FPGA}s to Prototype a Self-Timed Computer}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept.
1992 pages = {192-198} } % Index Number: 76 % Abstract % The SPACE machine is introduced as a new type of computer %architecture, capable of very fast simulation of highly concurrent %systems. The machine is designed to be scalable, constructed from %a vast array of boards. The decisions made in the design of the %board are discussed, and the actual hardware (based on an array of %Field Programmable Gate Array chips) is described. It is shown %that this machine can be programmed by translating a subset of the %Occam language into asynchronous modules. Using the Circal %process algebra, a new method of formally verifying asynchronous %modules for these circuits is presented. This method allows %bounded gate delays to be included in a two-level modelling %mechanism. %------------------------------------------------------------------ @inproceedings{shaw-milne:92, author = {P. Shaw and G. Milne}, title = {A Highly Parallel {FPGA}-Based Machine and its Formal Verification}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {162-173} } % Index Number: 77 %NO ENTRY % Abstract %------------------------------------------------------------------ % Index Number: 78 %NO ENTRY % Abstract %------------------------------------------------------------------ % Index Number: 79 %NO ENTRY % Abstract %------------------------------------------------------------------ % Index Number: 80 % Abstract % Designers know how to use strings of inverters with %geometrically increasing sizes to drive large capacitive loads.
%But they are unsure how to optimize arbitrary logic networks so as %to achieve least delay without resorting to trial-and-error %circuit simulations. The method of logical effort, introduced in %this paper, is a simple method that optimizes networks for speed. % The method of logical effort shows how many stages of logic are %required for the fastest implementation of any given logic %function. The effort of computing a logic function requires %amplification stages just the same as the effort of driving large %capacitive loads. The method reveals the proper transistor sizes %in each stage to realize the fastest overall operation. It also %provides a guide that can be applied in the early "back of the %envelope" stages of design to choose among major alternative %structures without extensive simulation work. % The method assigns a logical effort to each logic function. %The logical effort of an inverter is taken to be one. The logical %effort for any other logic function describes how much worse it is %than an inverter at producing output current, given an equivalent %amount of input capacitance. The logical effort of a logic function %depends mainly on its circuit topology and slightly on the %electrical properties of the fabrication process used to build it. %In CMOS the logical effort of each input of common two-input logic %functions ranges from about 4/3 for NAND to 4 for XOR. The logical %effort of functions with more than two inputs is generally higher. % Logical efforts for individual stages of logic can be combined %to find the logical effort of networks. Where several stages of %logic drive each other in a string, the overall effort involves %the product of their individual efforts. Where several logic %devices are driven from a common source, the overall effort %involves the sum of the efforts of the driven devices.
Compound %circuits with smaller overall logical effort can be made to run %faster than logically equivalent circuits with larger logical %effort. %------------------------------------------------------------------ @inproceedings{sutherland-sproull:91, author = {I. E. Sutherland and R. F. Sproull}, title = {Logical Effort: Designing for Speed on the Back of an Envelope}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin}, pages = {1-16} } % Index Number: 81 % Abstract % This paper proposes a new parallel VLSI architecture for %computing the 2-D Fourier transform. Problems with parallel %computation of the multi-dimensional Fourier transform are the %interprocessor communication, I/O costs, and the large memory %requirements. In this paper we combine an efficient 2-D Fourier %transform design with a transposition architecture that pipelines %communication with processing. A comparison of this architecture %with other systems is presented. We show that the design has a %nearly optimal Thompson A*T measure of O(N 4 log s). %------------------------------------------------------------------ @inproceedings{kelley-madisetti:91, author = {B. Kelley and V. Madisetti}, title = {Optimal Concurrent {VLSI} Architectures for 2-D Transposition}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin}, pages = {290-306} } % Index Number: 82 % Abstract % As multiprocessor computer networks are scaled to support %thousands and millions of processors, we must exploit locality in %order to avoid uniform degradation in network performance. Fat- %tree networks offer a topology that theoretically scales %arbitrarily while allowing the exploitation of considerable %locality.
In this paper, I present a scheme for constructing %practical fat-tree networks. Integrating expanders for redundant %multipath switching networks, I incorporate fault-tolerance into %the fat-tree network. I present primitive building blocks for the %construction of these networks and describe how these building %blocks can be synthesized using current technology. I also %present organizational structures for composing these primitives %into arbitrarily large networks. This synthesis results in a %practical scheme for building large-scale, high-performance %multiprocessor computer networks. With suitable locality and %technology, a 786,432 processor network can route a message on the %first attempt with over 70% probability when the network is fully %loaded. The latency through the network from one endpoint to %another is at most 320 ns. For more local connections, the %network latency can be as small as 40 ns. %------------------------------------------------------------------ @inproceedings{dehon:91, author = {Andre DeHon}, title = {Practical Schemes for Fat-Tree Network Construction}, booktitle = {Advanced Research in {VLSI}: Proceedings of the 1991 University of California/Santa Cruz Conference}, year = {1991}, month = {March}, address = {Santa Cruz, CA}, editor = {C. Sequin}, pages = {307-322} } % Index Number: 83 % Abstract % This paper describes the design of JAPROC, an 8-bit micro %controller. JAPROC is a processor-core which is being developed %within the EUREKA project JAMIE. The design consists of %approximately 5000 gates and has been implemented in a FPGA Xilinx %X4005. % For testing purposes a PC board has been developed which %allows one to configure the FPGA, download and execute micro %controller code and compare the results to an emulator. %------------------------------------------------------------------ @inproceedings{grunbacher-jaud:92, author = {H. Grunbacher and A.
Jaud}, title = {JAPROC - An 8 bit Micro Controller Design and its Test Environment}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {146-151} } % Index Number: 84 % Abstract % This paper describes an optimized fuzzy controller (FC) %architecture and its realization with field programmable gate %arrays (FPGAs). In consideration of data dependencies and minor %user restrictions within the definition of fuzzy rules (FRs), it %is possible to develop a high speed FPGA architecture. A %prototype of the FC operates at 5 MHz and needs 50 mu s operation %time (8 bit resolution) independent of the number of %inputs/outputs with 256 fuzzy rules. A pipeline architecture is %used to achieve a high processing speed. %------------------------------------------------------------------ @inproceedings{surmann-ungering:92, author = {H. Surmann and A. Ungering and K. Goser}, title = {Optimized Fuzzy Controller Architecture for Field Programmable Gate Arrays}, booktitle = {Field-Programmable Gate Arrays: Architectures and Tools for Rapid Prototyping. Second International Workshop on Field Programmable Logic and Applications}, year = {1992}, %above is the year of the conference; year of publication:1993 address = {Vienna, Austria}, %above is the place of the conf.; place of publication:Berlin, Germany month = {August}, %date of conference: 31 Aug.-2 Sept. 1992 pages = {124-133} } % Index Number: 85 %NO ENTRY % Abstract %------------------------------------------------------------------ % Index Number: 86 % Abstract % A new approach to application specific processor design is %presented in this paper.
Existing application specific processors %are either based on existing general purpose processors or custom %designed special purpose processors. The availability of a new %technology, the Xilinx Logic Cell Array, presents the opportunity %for a new alternative. The Flexible Processor Cell is a prototype %of an extremely reconfigurable application specific processor. %Flexible processors can potentially provide the performance %advantages of special purpose processors. The flexible processor %concept opens many potential areas for future research in %processor architecture and implementation. This paper presents %the design, implementation, and preliminary performance evaluation %of an experimental flexible processor. %------------------------------------------------------------------ @inproceedings{wolfe-shen:88, author = {A. Wolfe and J. P. Shen}, title = {Flexible Processors: a Promising Application-Specific Processor Design Approach}, booktitle = {Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21}, year = {1988}, address = {San Diego, CA}, month = {November}, pages = {30-39} } % Index Number: 87 % Abstract % We present experimental results on FPGA use in special and %general purpose processors, using as case studies a computational %accelerator for gene sequence analysis, an integer implementation %of the DLX microprocessor and a real-time signal processor for %rocket telemetry. All these devices have been successfully %prototyped, and are now completely functional. We present %detailed analysis of our experience with FPGAs in these machines, %describing typically an order of magnitude improvement over %discrete IC implementations. %------------------------------------------------------------------ @article{fagin:93, author = {B. S. 
Fagin}, title = {Quantitative Measurements of FPGA Utility in Special and General Purpose Processors}, Journal = {Journal of VLSI Signal Processing}, volume = {6}, number = {2}, year = {1993}, address = {Boston, Massachusetts}, %INSPEC says the place of publication is the Netherlands month = {August}, pages = {129-137} }