Elementary function calculator algorithms:
a tutorial

First posted
Tuesday January 18, 2005 06:33

Updated
Friday February 25, 2005 10:01

Elementary function examples include the square root, logarithm, trigonometric function, exponentiation, ...

From 1960 or earlier until 1972 bill thought the best way to compute the square root on a computer was Newton's method.

Then along came hand-held electronic calculators some of which had a square root key.

How do these calculators compute the square root?

Newton's method assumes the real number system, floating point multiplication and divisions. So it seemed unlikely that Newton's method is used.

In 1972-73 bill and family went on sabbatical leave from Washington State University in Pullman, WA to the University of Illinois at Urbana-Campaign for the job assignment of getting ILLIAC III working.

We moved to Urbana-Campaign using white ford!

Professor James E Robertson led the team effort [there were about a half dozen hardware and software team members] to get ILLIAC III working.

It soon become apparent to all that ILLIAC III was not going to work!

Robertson commented, "It's no fun working on someone else's failing project."

So the team members did other things! No sense wasting time or effort working on some else's failing project.

The hardware was flakey. Do the lead hardware engineer and the group elected to take several weeks off while the board fingers were gold plated.

The project never resumed.

The testing computer was a pdp 8e. When the software crashed, the boot code had to be hand toggled into the 8e.

This is another reason bill was interested in a forth operating system.

Many of the Illiac III team started looking for other jobs.

The lead software engineer had a pig farm to fall back onto!

bill started working on a partial sort [find the i-th largest number without sorting all of the numbers] algorithm.

Sage advice from Henry Neugass concerning his late [deceased] girl friend Suzannah [a Stanford grad].

Stanford University teaches you how to succeed.

Stanford does not teach you how to fail.

Bill asked Robertson if he knew how elementary functions were computed in a calculator. A big smile crossed Robertson's face.

Robertson gave bill a copy of

DeLugish:1970:CAA [DeL70] Bruce Gene DeLugish. A class of algorithms for automatic evaluation of certain elementary functions in a binary computer. Technical Report 399, Department of Computer Science, University of Illinois at Urbana-Champaign,Urbana, Illinois, 1970. 191 pp.

Here's a link to another reference to DeLugish's [and Robertson's] work

2.1.6 Second step The second step consists in a more precise test for the arguments that failed during the rst step. The exponential is computed with a higher precision, chosen so that the probability that the test fails for an argument is very low. We chose a variation of De Lugish s algorithm [8] for computing the exponential (since it contains no multiplication) The computations were performed on 128 bit integers (four 32 bit integers) The algorithm was implemented in assembly language (which is, for the present purpose, simpler than in C language) 2.1.7 Improvements The algorithm ....

B. DeLugish. A class of algorithms for automatic evaluation of functions and computations in a digital computer. PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.

Bill's questions were answered.

Someone asked Robertson how fast he could make these algorithms work. Robertson responded, "How much memory do I get?"

We have megabits of flash memory today at very little cost!

The following email is too important to include just a link.

Meggitt's article is the fundamental article about digit-by-digit computations of elementary functions.

From knighten@ssd.intel.com Wed Aug 31 18:31:36 EDT 1994
Article: 61830 of sci.math
Newsgroups: sci.math
Path: galois!bloom-beacon.mit.edu!grapevine.lcs.mit.edu!uhog.mit.edu!news.kei.com!ssd.intel.com!ssdintel!knighten
From: knighten@ssd.intel.com (Robert L. Knighten)
Subject: Re: How calculators compute sin(x)
In-Reply-To: pmohseni@iastate.edu's message of 30 Aug 94 18:35:05 GMT
Message-ID: KNIGHTEN.94Aug31144644@tualatin.ssd.intel.com
Sender: usenet@SSD.intel.com
Nntp-Posting-Host: tualatin
Reply-To: knighten@ssd.intel.com (Bob Knighten)
Organization: Supercomputer System Division, Intel
References: pmohseni.778271705@eng1.iastate.edu
Date: Wed, 31 Aug 1994 21:46:42 GMT
Lines: 71
Status: RO

CORDIC and Elementary Function Computation

CORDIC stands for COordinate Rotation DIgital Computer. It was initially a special purpose digital computer for real-time aircraft navigation. It has come to stand for the method embodied in this computer. That method is based on the observation that the sin of 128 degrees is the y-coordinate of the result of rotating the vector (1,0) through 128 degrees and that can be efficiently computed as a composition of rotations through smaller angles: 128 ~ 90 + 45 - 22.5 + 11.2 + 5.7 - 2.9 + 1.5 - 0.8 and these rotations can be very efficiently computed. This was generalized to other functions by considering vectors in the complex plane and complex rotations.

The information about the CORDIC algorithm, which is used in virtually all scientific calculators and very seldom in computers, can be found in:

Jack E. Volder, The CORDIC Trignometric Computing Technique IRE Trans. Electronic Computing, EC-8 (1959) pp. 330-334

The original paper.

J. E. Meggitt, Pseudo Division and Pseudo Multiplication Processes IBM J. of Res. and Devel., (1962) pp. 210-226

Extension of the CORDIC method to division and multiplication.

R. J. Linhardt and H. S. Miller, Digit-by-Digit Transcendental-Function Computation. RCA Review, 30 (1969) pp. 209-247

This considers a somewhat different algorithm, but is a good survey article and has a nice comparison with polynomial approximations.

J. S. Walther, A Unified Algorithm for Elementary Functions Spring Joint Computer Conference (1971) pp. 379-385

The paper by Walther is particularly interesting as it describes a unified method for computing the trignometric functions, the hyperbolic functions, exp, ln and square root. This is the base for all of the Hewlett-Packard scientific calculators.

The recently developed table based methods are discussed in

R. C. Agarwal, J. W. Cooley, F. G. Gustavson, J. B. Gustavson, J. B. Shearer, G. Slishman and B. Tuckerman, New Scalar and Vector Elementary Functions for the IBM System/370. IBM J. Res. Develop., 30 (1986) pp. 126-144

S. Gal, Computing Elementary Functions: A New Approach for Achieving High Accuracy and Good Performance. Proceedings of the International Scientific Symposium of IBM, March 12-14, 1985 (Bad Neuenahr, Germany)

A good general discussion of the methods for computing elementary functions is in

W. J. Cody, Software for the Elementary Functions in MATHEMATICAL SOFTWARE, J. Rice, ed., Academic Press, Inc. (1971) pp. 171-186

and the usual reference for a discusion of common algorithms for polynomial approximation methods from a how-to-do-it perspective is

W. J. Cody and W. Waite, SOFTWARE MANUAL FOR THE ELEMENTARY FUNCTIONS Prentice-Hall, 1980

So let's look at the essential idea in digit-by-digit [Henry Briggs 2] computation. By example, of course.

The essential idea is to guess what the next digit [bit in our binary world] is, then try to confirm if the guess was correct or not.

Let's compute the square root of three bit-by-bit.

The value of the square root of 3 =1.732 is easy to remember for an American.

Born in 1732 into a Virginia planter family, he learned the morals, manners, and body of knowledge requisite for an 18th century Virginia gentleman.

Let's do the computation in binary.

3. to base 10 is 11. in binary. That's a decimal point after the 3 and a binary point after the 11.

Scaling

Putting the binary point just of the left of the most significant bit is a good idea.

11. = .11 x 10^2

x represents multiplication and ^ exponentiation.

We apologize for representing the exponent in decimal while the base is represented in binary. We could have written 11. -= .11 x 10^10 but this might be even more confusing.

Eschew obfuscation.

Game plan

The game plan for computing a square root is to estimate the value of the first bit.

A square root in general will contain about half the number of bits.

We'll guess that the next least significant bit is a one. Then we will square the estimated square root.

If, in fact, the next least significant bit is a 1, then the square will be less than 3. If, on the other hand, the square is greater than 3, the the trial bit must be a zero.

Let's start our computation guess with 1.5 as the square root of 3. In binary 1.1. And in our scaled binary .11 x 10^1.

So let's square .11 x 10^1.

In binary we use shift and add.

In these bit-by-bit algorithm implementable on, of course, an 8051 family microcontroller, of course, we will only use ADD with carry, SUBTRACT with borrow, shift right and left and compares

Here we go. 11 x 11 = 11 + 110 = 1001 times 10^2. In decimal this is 10.01 or 2.25 which is less than 3. So we guessed correct that the first two bits of the square root are 1.1.

Let's test whether the next significant bit is a one. The guess is .111 x 10^1.

Keep in mind we are shifting and adding. A one in the multiplier means we shift and add. A zero in the multiplier means we merely shift.

111 x 111 = 111 + 1110 + 11100 = 10101 + 11100 = 110001 x 10^2 or 11.0001 which is 3.0625.

Whoops!

3.0625 is bigger than 3 so the first 3 bit of the square root must be 1.10.

So let's see if the fourth bit is a 1.

1.101 = .1101 x 10^1.

Lets square. 1101 x 1101 = 1101 +110100 + 1101000 = 10000001 + 1101000 = 10101001 x 10^2 = 10.101001

.101001 is 101001. x 10^-6 in decimal equals 41 / 2^6 or 29/64 = 2.604625 which is less than 3 so the fourth bit is a 1.

We can proceed on computing bit by bit using a calculator, of course!

Let's test to see if the fifth bit is a 1.

1.1011 = .11011 x 10^1.

For this example 11011 was pushed on the the HP48G [a forth based calculator, of course] 4 times.

Multiply was keyed to calculate 1011011001.

1011011001 was DUPed [forth parlance - ENTER in HP48G parlance].

1011011001 was converted to decimal 729.

2 was pushed on to the stack. 8 was pushed on to the stack an y^x computed to yield 256.

256 was divided into 729 to give the 2.8 ....

So we can continue on and on computing the bits.

Note that the convergence to the correct value is from below. The computed square root will always be smaller than the "true" [computed to an countable infinite number of bits] value.

Long strings of ones

A trick to avoid lots of adds with carry is to replace long strings of 1's. Seven in binary is 111 or 1000 - 1. So, in this case, three shift and adds with carry are replaced by one shift of 4 followed by a subtract with borrow.

This set-up work for the above trick can be more labor than the savings.

But, hey, we may be doing more work than we need by squaring each trial square root each time!

Eliminating redundant computations

The name-of-the-game is to use as much of the computations as possible for testing whether i-th bit in a one in testing whether the i+1st bit is a one.

Let's look at

So let's see if the fourth bit is a 1.

1.101 = .1101 x 10^1.

Lets square. 1101 x 1101 = 1101 +110100 + 1101000 = 10000001 + 1101000 = 10101001 x 10^2 = 10.101001

but do the computations a bit different.

Let's compute 1.100 squared and then compute 1.101 squared from 1.100 squared.

1100 x 1100 = 110000 + 1100000 = 10010000.

(x+1)^2 = x^2 + 2x +1.

So

1101 x 1101 = 10010000 + 11000 + 1 = 10101001

The point of this is that if bit i is not a 1, then you have to have the computation available for bit i a zero to compute bit i+1. Or you can "undo" of assuming the i-th bit is a 1 with (x-1)^2 = x^2 -2x + 1.

Note that computation of -2x is is first a shift by one binary digit, then a subtract with possibly a borrow.

If one views the terminal bits of elementary functions as random numbers [WHICH LOOK LIKE RANDOM NUMBERS, AND ARE, BUT THEY ARE THE RIGHT RANDOM NUMBERS!], then the probability of the i-th bit being a one is 1/2.

The terminal bits of elementary functions are increased accuracy.

So if the i-th bit is a 1 the i+1 st then all of the succeeding bits being 1 can be eliminated.

Here's the reason. Suppose all of the remaining bits are a 1.

Let's locate the binary point to left of the first 1.

n = . 111111111 .... with ones forever

10 x n = 1.1111111 ... forever [that's one zero in binary, not 10 in decimal

10 x n - n = 1.00000000 with zeros forever.

so n = 1.

This causes a carry in the i-th position so that the i-th bit which upsets correctly computed more significant bits.

So conclusion is that one or more of the terminal bits must be zeros!

Aboulghassem [sp?] Zirakzadeh taught bill's first course in higher algebra at the University of Colorado in the summer of 1858.

Let's look at relevant text.



Rational number repeat. See red bar.

Find integers p/q = 2.7148459459459....



The square root of three doesn't have any repetitions.

The square root of 3 is irrational. See green bar.



Here's Weiss on two representations of, say, 2. See blue bar.



Bill was 12 years old when Weiss wrote Higer Algebra

Bill has the distinction of being 45 days younger than Saddam Hussein!




and was 21 when Zirakzadeh taught him some of what we are presenting on this page.

What's amazing is that the technology for hardware, and occasionally software, evaluation of elementary functions predates the work of Isaac Newton by about 100 years.

Bill got falsely accused of sending another of his tutorials to Japan.

Bill is innocent .... of this charge!

Without the above course in higher algebra taught by Iranian Zirakzadeh bill might not have authored the job assignment tutorial RSA ENCRYPTION?

Some early electronic calculators would return the answer 1.99999999 ... when you took the square root of 4.

The marketing department, of course, got after the algorithm departmen to fix the problem ... at least for the perfect square integers.

However, let's experiment taking the square root of 4 on a forth-based HP48G.

It got 2.

Then we took the square root of 2, DUPed it [ENTER in 48G parlance], then square the root of 2.

Terminal bit problems, of course.

So don't get the idea that you can compute so many bits accurately, then tack on a random number! Cheat! Fraud! Flim-Flam!

And we are going to see when we get back to terminal bits of a/d conversion that there can be a bit of fraud in products that don't digitize at the sensor.

So calculation of the i-th bit being a 0, then computing the i-th bit being a 1 is inefficient. It's best to assume that the i-th bit is a 1, then correct when it is not since it will be a 1 half of the time!

In the real world, the work involved in converting a decimal number to binary and at the end of the computation converting from binary to decimal is so great that it is frequently, if not about always, best for a calculator to work in binary coded decimal [bcd].

Here is the encoding for bcd.

decimal  bcd
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
0? 1010
1? 1011
2? 1100
3? 1101
4? 1110
5? 1111

Two bcd digits are packed per byte. 37 is represented 00110111.

For binary coding greater than 9, a question mark is added for the reason that there are different ways to handle the "invalid" representations.

The 8051 family instruction set included Decimal-adjust Accumulator of Addition to help with bcd arithmetic.





Bill and Richard J Hanson had lunch on Thursday January 27, 2005.

Bill brought our HP48G.

We discussed calculator algorithm accuracy ... among other topics.

Hanson commented that calculator algorithm computations can be greatly speeded if some computations are pre -computed and stored in memory.

Memory is the plentiful and cheap today!

The basic idea is to use a much information as possible about computing whether bit i is 1 or 0 to compute whether bit i+1 is a 1 or a 0 as opposed to recomputing from scratch.

Unsigned Division

Robertson pointed out that amount of work to compute elementary functions is on the order of the amount of work to do any unsigned divide.

In an unsinged divide a trial divisor is used to see if it is too big nor not.

In binary integer computer arithmetic, the trial divisor computation is made. If the divisor is too big, then the computation is "undone."

Let's first look at the Intel 8080 32 bit dividend and 16 bit divisor algorithm to evaluate the forth word U/ as presented by Mitch Derick and Linda Baker in FORTH Encyclopedia.







Simply stated, coding and debugging U/ is very unpleasant.

We ported 8085 forth to the 8051 at Sandia labs. The National Security Agency paid for the work.

Jerry Boutelle was paid to do the port.

However, Boutelle refused to take the contract if he was required to code and debug

1 U/
2 ENCLOSE
3 -FIND and associated (FIND)
4 the 8051 assembler.

So bill coded the above. First starting with the assembler.

Bill got fancy and wrote a syntax-checking 8051 forth assembler. Most forth assemblers do not syntax check and, as a result, can produce garbage code.

ENCLOSE is bad because there are so many cases.

-FIND was super-fun. Bill modified -FIND to run lots faster than the "real" -FIND.

U/ is so horrible to code and debug that we initially installed Bright Forth in an unused IBM PC motherboard socket, copied the code to disk, then "borrowed" U/ until bill had time to recode U/ from scratch.

On bill's initial attempt there was a bug in his U/. But the bug surfaced very infrequently.

Bill recalls that the problem involved a special case but doesn't recall ... and won't study the code for the reason that the experience of coding and debugging U/ was really horrible!

Isn't this what graduate students are for, in academia? Or contractors, in government?

As a result forth didn't work quite right!

Here's the final [see green bars]





The important point of U/ from this elementary function digit by digit computation is that U/ "backs-out" or "undoes" a mistaken 1 bit in the divisor with the 8051 ADDC.

Bit-by-bit algorithms implemented in hardware or microcode are on the two orders [x100] faster than implementation [as above 8080 and 8051 U/ examples] in software.

Hardware is usually best debugged using a hardware/software interface.

Forth is likely the ultimate tool for debugging hardware because of its interactive incremental compiling and assembler features. And reliability.

This closes the brief tutorial on digit-by-digit computation of elementary functions.

We have not done justice to table look-up in speeding computations. Maybe later.

Look what we have to work with for huge inexpensive memory!

Designing Systems with xD-Pciture Card Support

Hitting their stride
Nonvolatile-memory upstarts draw near to established leaders

EDN January 20, 2005

Forth, especially 8051 family!, really speeds interface development to such hardware

We have bigger fish to fry.

Terminal digits of a/d conversions! There is some real money here!

The contemplation of things as they are,
without error or confusion,
without substitution or imposture,
is in itself a nobler thing
than a whole harvest of invention.

Francis Bacon