Paper Title
Design and Implementation of 32-Bit Inexact Floating Point Arithmetic Unit
Abstract
In computing, a floating-point format is a formulaic representation that approximates real numbers
so as to support a trade-off between range and precision. For this reason, floating-point computation is often
used in systems that must handle both very small and very large real numbers and that require fast processing times. In general,
a floating-point format represents a number as two sequences of bits: one, called the mantissa, holds the digits of
the number, and the other, the exponent, determines the position of the radix point. The traditional
method of floating-point arithmetic computes exact results for all applications, and this exact computation
consumes considerable power. Power, however, has become a key constraint in nanoscale integrated circuit
design owing to the increasing demand for mobile computing and higher integration density. As an emerging computational
paradigm, an inexact circuit offers a promising approach to significantly reduce both static and dynamic power dissipation
for error-tolerant applications. The objective of this project is to implement an inexact 32-bit binary floating-point arithmetic unit
that includes a floating-point adder, subtractor and multiplier with improved performance. A pipelined architecture is
used to increase both the performance and the operating frequency. The associated logic
includes a normalizer and a rounder designed for the inexact mantissa and exponent parts. Floating-point arithmetic is
handled by the FPadd, FPsub and FPmul operations. FPadd adds the value in the floating-point operand to the floating-point
accumulator. FPsub subtracts the value in the floating-point operand from the floating-point accumulator. FPmul multiplies
the value in the floating-point accumulator by the floating-point operand. In this project, the proposed architecture is simulated and
synthesized using Xilinx ISE 14.7.
Keywords - Floating Point adder, Floating Point subtractor, Floating Point multiplier, Dadda multiplier
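
As an illustration of the mantissa and exponent representation described in the abstract, the following minimal Python sketch splits a 32-bit binary floating-point value into its sign, exponent and mantissa fields. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 stored mantissa bits), which the abstract does not state explicitly. The function names decompose, compose and truncate_mantissa are hypothetical, and truncate_mantissa shows only one possible way of making a mantissa inexact by zeroing its low-order bits; it is not claimed to be the scheme used in the proposed architecture.

```python
import struct


def decompose(x: float):
    """Split x, rounded to 32-bit binary floating point, into its fields.

    Assumes the IEEE 754 single-precision layout:
    1 sign bit, 8 exponent bits (bias 127), 23 stored mantissa bits.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased exponent
    mantissa = bits & 0x7FFFFF       # fraction bits, implicit leading 1 not stored
    return sign, exponent, mantissa


def compose(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a 32-bit floating-point value from its three fields."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def truncate_mantissa(x: float, dropped_bits: int) -> float:
    """Hypothetical inexact step: zero the lowest `dropped_bits` mantissa bits."""
    sign, exponent, mantissa = decompose(x)
    keep_mask = ~((1 << dropped_bits) - 1) & 0x7FFFFF
    return compose(sign, exponent, mantissa & keep_mask)
```

For example, decompose(-0.15625) yields a sign of 1, a biased exponent of 124 and a mantissa field of 0x200000, since -0.15625 equals -1.25 x 2^-3. Dropping more mantissa bits in truncate_mantissa trades precision for a narrower datapath, which is the general kind of trade-off an inexact unit exploits.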