KEYWORDS: Radon, Logic, Algorithms, Algorithm development, Signal processing, Visualization, Mathematics, Data processing, Current controlled current source, Performance modeling
This paper presents a new implementation of a floating-point divider unit with competitive performance and reduced area, based on proposed modifications to the recursive equations of the Goldschmidt algorithm. The Goldschmidt algorithm takes advantage of parallelism in the Newton-Raphson method while retaining the same quadratic convergence. However, the recursive equations in the Goldschmidt algorithm consist of a series of multiplications with full-precision operands, so the algorithm suffers from large area consumption. In this paper, the recursive equations are modified to replace full-precision multipliers with smaller multipliers and squarers. Implementations of a floating-point reciprocal unit and divider using the modification are presented. Synthesis results show an area reduction of around 20% to 40% compared to an implementation based on the conventional Goldschmidt algorithm.
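For orientation, the conventional Goldschmidt recurrence that the abstract takes as its baseline can be sketched in software as follows. This is a minimal illustration only, not the modified scheme proposed in the paper: the function name `goldschmidt_divide`, the iteration count, and the assumption that the divisor is already normalized to [0.5, 1) are all hypothetical choices made for this example.

```python
def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with the conventional Goldschmidt iteration.

    Assumes d is already normalized to [0.5, 1), as a floating-point
    significand would be (an assumption of this sketch).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be normalized to [0.5, 1)")
    for _ in range(iterations):
        f = 2.0 - d   # correction factor; if d = 1 - e then f = 1 + e
        n *= f        # these two multiplications are independent of each
        d *= f        # other, so hardware can run them in parallel
    # d has converged towards 1, so n now approximates the quotient
    return n


if __name__ == "__main__":
    print(goldschmidt_divide(0.3, 0.75))
```

If d = 1 - e, one step gives d = 1 - e^2, so the error is squared on every iteration. Both multiplications in the loop use full-precision operands, which is the area cost the abstract refers to.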
This paper presents a scalable Elliptic Curve Crypto-Processor (ECCP) architecture for computing the point multiplication for curves defined over the binary extension fields GF(2^n). This processor computes modular inverse and Montgomery modular multiplication using a new efficient algorithm. The scalability feature of the proposed crypto-processor allows a fixed-area datapath to handle operands of any size. Also, the word size of the datapath can be adjusted to meet the area and performance requirements. In addition, the processor is reconfigurable in the sense that the user can choose the value of the field parameter (n). Experimental results show that the proposed crypto-processor is competitive with many previous designs.
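As background, Montgomery multiplication in GF(2^n) can be sketched with the textbook bit-serial method below. This is not the new algorithm or the scalable word-based datapath described in the abstract, which is not specified here. The function names, the integer bit-mask encoding of polynomials, and the choice of the AES field polynomial for the demonstration are assumptions made for this example.

```python
def gf2_mont_mul(a, b, f, n):
    """Return a * b * x^(-n) mod f(x) over GF(2).

    Polynomials are encoded as integers (bit i is the coefficient of x^i).
    f must have degree n and constant term 1; a and b have degree < n.
    """
    c = 0
    for i in range(n):
        if (a >> i) & 1:
            c ^= b      # add a_i * b (addition in GF(2) is XOR)
        if c & 1:
            c ^= f      # make c divisible by x without changing c mod f
        c >>= 1         # exact division by x
    return c


def gf2_mul(a, b, f, n):
    """Plain shift-and-add multiplication mod f(x), used here as a reference."""
    r = 0
    for i in range(n):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> n) & 1:
            a ^= f      # reduce when the degree reaches n
    return r


if __name__ == "__main__":
    F, N = 0x11B, 8     # x^8 + x^4 + x^3 + x + 1, chosen only as an example
    m = gf2_mont_mul(0x57, 0x83, F, N)
    # Multiplying by x^8 mod f (0x1B) removes the Montgomery factor x^(-8).
    print(hex(gf2_mul(m, 0x1B, F, N)))
```

The loop never needs a trial division by f(x): each step only inspects the lowest bit, which is what makes the method attractive for hardware.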
A new class of nonlinear filters for color image processing was proposed by Lucchese and Mitra. This type of color filter processes the chromatic component of images encoded in the International Commission on Illumination (CIE) u'v' color space. Images processed by this filter do not show color shifts near edges between regions with different intensities. The filter uses linear convolution operations internally and is effective and efficient for denoising and regularizing color images. Image processing systems are computationally intensive and usually require a large amount of area in order to reach desirable levels of performance. The use of on-line arithmetic can decrease the area of the hardware implementation and still maintain a reasonable throughput. This work presents the design of the color filter as a network of on-line arithmetic modules. The network topology and some details of each arithmetic module are provided. The final implementation targets FPGAs and is compared in terms of area against an estimate of a conventional design. The throughput of this solution is sufficient to support real-time processing of common image formats.
KEYWORDS: Clocks, Field programmable gate arrays, Digital signal processing, Multiplexers, Logic, Control systems, Switching, Control systems design, Electrical engineering, Computer science
On-line division is one of the slowest of the basic arithmetic operations and naturally becomes a bottleneck in networks of on-line modules that use it. A higher-radix divider has good potential to attain higher throughput than radix-2 dividers and therefore to improve the overall throughput of networks where division is needed. The improvement in throughput when using radix 4 is not straightforward, since several components of the divider become more complex than in the radix-2 case. Previously proposed radix-4 designs were based on operand pre-scaling to simplify the selection function and reduce the critical path delay, at the cost of more complexity in the algorithm conditions and operations, plus a variable on-line delay, which is a very unattractive feature when small precision values are used (usually the case for DSP). These designs include several phases for pre-scaling and actual division. This paper proposes a design approach based on overlapped replication that results in a radix-4 on-line division module with low algorithm complexity, a single division phase, fewer restrictions on the input values, and a small and fixed on-line delay.
KEYWORDS: Clocks, Field programmable gate arrays, Digital signal processing, Computer aided design, Finite impulse response filters, Signal processing, Logic, Optical filters, Data conversion, Radon
This paper shows the design and the evaluation of on-line arithmetic modules for the most common operators used in DSP applications, using FPGAs as the target technology. The designs are highly optimized for the target technology and the common range of precision in DSP. The results are based on experimental data collected using CAD tools. All designs are synthesized for the same type of devices (Xilinx XC4000) for comparison, avoiding rough estimates of the system performance, and generating a more reliable and detailed comparison of on-line signal processing solutions with other state-of-the-art approaches, such as distributed arithmetic. We show that on-line designs are at a disadvantage in basic DSP applications that use only addition and multiplication. However, we also show that on-line designs are able to overtake other approaches as the applications become more sophisticated, e.g. when data dependencies exist, or when non-constant multiplicands restrict the use of other approaches.
The design of digital systems that make use of redundant digit sets requires a specific design methodology. These types of systems are frequently used in high-performance arithmetic algorithms. Since more than one output represents the correct result and more than one bit-vector may represent the same output value, there are many possibilities for the system's realization. An important decision involves the selection of the digit codes. This decision impacts the area and delay of the final system. After the digit codes are defined, the number of possible implementations is usually large. This paper presents a design methodology for the implementation of redundant digit systems which allows the creation of an environment for the investigation of design alternatives for such systems. The use of this methodology makes it possible to systematically select digit codes and determine the best design solution. It provides the basis for the implementation of a CAD tool that can assist the designer in the generation of minimal gate networks for redundant digit systems. Besides presenting the methodology we also illustrate its use in the design of some redundant adders, and underline rules that should be followed in order to obtain the best gate networks.
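The redundancy mentioned above can be made concrete with a small enumeration. The sketch below lists every signed-digit digit vector of a given width that represents the same integer. It illustrates the general idea only, not the paper's design methodology or its bit-level digit codes. The function names and the choice of the maximally redundant digit set {-(r-1), ..., r-1} are assumptions made for this example.

```python
from itertools import product


def sd_value(digits, radix=2):
    """Value of a signed-digit integer given most-significant digit first."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value


def sd_encodings(target, width, radix=2):
    """All digit vectors of the given width whose value equals target.

    Uses the digit set {-(radix-1), ..., radix-1}; this choice is an
    assumption of the sketch.
    """
    digit_set = range(-(radix - 1), radix)
    return [ds for ds in product(digit_set, repeat=width)
            if sd_value(ds, radix) == target]


if __name__ == "__main__":
    # With radix 2 and digits {-1, 0, 1}, the value 1 has several
    # three-digit representations.
    for ds in sd_encodings(1, 3):
        print(ds)
```

Each digit must in turn be mapped to a bit-vector code, so the number of candidate implementations grows beyond what this enumeration shows.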
KEYWORDS: Field programmable gate arrays, Digital signal processing, Clocks, Data conversion, Matrices, Algorithm development, Computer architecture, Logic, Computer engineering, Digital filtering
This paper presents the application of the Linear Sequential Array (LSA) retiming approach, developed for conventional digit-recurrence algorithms, to on-line multiplication. The result is a modular and fast pipelined structure which, due to a small constant fan-out and a cycle time independent of precision, is suitable for FPGA implementation. First we present the basics of on-line multiplication, and determine data dependencies according to the LSA design methodology. Based on these dependencies we redesign the traditional on-line multiplier to obtain the LSA structure. Since in DSP applications one of the multiplier operands is fixed for a long sequence of operations, we briefly present a parallel-serial multiplication unit that receives one of the operands in parallel and the other operand in Most-Significant-Digit-First format. Performance and area results are provided for the LSA on-line multiplier design and then compared with the conventional on-line design, using Xilinx FPGAs as the target technology.
KEYWORDS: Field programmable gate arrays, Computing systems, Binary data, Earth Viewing Camera, Very large scale integration, Computer science, Computer arithmetic, Signal processing, Voltage controlled current source, Switching
We present a design of high-radix digit-slices for the implementation of an on-line multiply-add operator (OMA). Our evaluation of performance and cost shows that speedups above 1.5 can be obtained with respect to radix 2 at a reasonable increase in cost. The design and evaluation are based on Xilinx FPGAs. We also discuss the use of OMA modules in solving linear recurrences.