Beeler, M., Gosper, R.W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. Retyped and converted to html ('Web browser format) by Henry Baker, April, 1995.

PROGRAMMING HACKS

Previous Up Next

WARNING: Numbers in this section are octal (and occasionally binary) unless followed by a decimal point. 105=69.. (And 105.=69 hexadecimal.) [PDP-10 Info]

ITEM 145 (Gosper):

Proving that short programs are neither trivial nor exhausted yet, there is the following:
0/      TLCA 1,1(1)
1/      see below
2/      ROT 1,9
3/      JRST 0
This is a display hack (that is, it makes pretty patterns) with the low 9 bits = Y and the 9 next higher = X; also, it makes interesting, related noises with a stereo amplifier hooked to the X and Y signals. Recommended variations include:
      CHANGE:         GOOD INITIAL CONTENTS OF 1:
        none            377767,,377767; 757777,,757757; etc.
        TLC 1,2(1)      373777,,0; 300000,,0
        TLC 1,3(1)      -2,,-2; -5,,-1; -6,,-1
        ROT 1,1         7,,7; A0000B,,A0000B
        ROTC 1,11       ;Can't use TLCA over data.
        AOJA 1,0

ITEM 146: MUNCHING SQUARES

Another simple display program. It is thought that this was discovered by Jackson Wright on the RLE PDP-1 circa 1962.
        DATAI 2
        ADDB 1,2
        ROTC 2,-22
        XOR 1,2
        JRST .-4
2=X, 3=Y. Try things like 1001002 in data switches. This also does interesting things with operations other than XOR, and rotations other than -22. (Try IOR; AND; TSC; FADR; FDV(!); ROT -14, -9, -20, ...)

ITEM 147 (Schroeppel):

Munching squares is just views of the graph Y = X XOR T for consecutive values of T = time.

ITEM 148 (Cohen, Beeler):

A modification to munching squares which reveals them in frozen states through opening and closing curtains: insert FADR 2,1 before the XOR. Try data switches =
4000,,4         1000,,2002      2000,,4        0,,1002
(Notation: <left half>,,<right half>)

Also try the FADR after the XOR, switches = 1001,,1.

ITEM 149 (Minsky): CIRCLE ALGORITHM

Here is an elegant way to draw almost circles on a point-plotting display:

NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X

This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.

The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.

ITEM 150 (Schroeppel):

PROBLEM: Although the reason for the circle algorithm's stability is unclear, what is the number of distinct sets of radii? (Note: algorithm is invertible, so all points have predecessors.)

ITEM 151 (Gosper):

Separating X from Y in the above recurrence,

X(N+1) = (2 - epsilon^2) * X(N) - X(N-1)
Y(N+1) = (2 - epsilon) * Y(N) - Y(N-1).

These are just the Chebychev recurrence with cos theta (the angular increment) = 1-epsilon^2/2. Thus X(N) and Y(N) are expressible in the form R cos(N theta + phi). The phi's and R for X(N) and Y(N) can be found from N=0,1. The phi's will differ by less than pi/2 so that the curve is not really a circle. The algorithm is useful nevertheless, because it needs no sine or square root function, even to get started.

X(N) and Y(N) are also expressible in closed form in the algebra of ordered pairs described under linear recurrences, but they lack the remarkable numerical stability of the "simultaneous" form of the recurrence.

ITEM 152 (Salamin):

With exact arithmetic, the circle algorithm is stable iff |epsilon| < 2. In this case, all points lie on the ellipse

X^2 - epsilon X Y + Y^2 = constant,

where the constant is determined by the initial point. This ellipse has its major axis at 45 degrees (if epsilon > 0) or 135 degrees (if epsilon < 0) and has eccentricity

sqrt(epsilon/(1 + epsilon/2)).

ITEM 153 (Minsky):

To portray a 3-dimensional solid on a 2-dimensional display, we can use a single circle algorithm to compute orbits for the corners to follow. The (positive or negative) radius of each orbit is determined by the distance (forward or backward) from some origin to that corner. The solid will appear to wobble rigidly about the origin, instead of simply rotating.

ITEM 154 (Gosper):

The myth that any given programming language is machine independent is easily exploded by computing the sum of powers of 2. By this strategy, consider the universe, or, more precisely, algebra:

let X = the sum of many powers of two = ...111111
now add X to itself; X + X = ...111110
thus, 2X = X - 1 so X = -1
therefore algebra is run on a machine (the universe) which is twos-complement.

ITEM 155 (Liknaitzky):

To subtract the right half of an accumulator from the left (as in restarting an AOBJN counter):
        IMUL A,[377777,,1]

ITEM 156 (Mitchell):

To make an AOBJN pointer when the origin is fixed and the length is a variable in A:
        HRLOI A,-1(A)
        EQVI A,ORIGIN

ITEM 157 (Freiberg):

If instead, A is a pointer to the last word
        HRLOI A,-ORIGIN(A)
        EQVI A,ORIGIN
Slightly faster: change the HRLOIs to MOVSIs and the EQVI addresses to -ORIGIN-1. These two routines are clearly adjustable for BLKOs and other fenceposts.

ITEM 158 (Gosper, Salamin, Schroeppel):

A miniature (recursive) sine and cosine routine follows.
COS:    FADR A,[1.57079632679]  ;pi/2
SIN:    MOVM B,A        ;ARGUMENT IN A
        CAMG B,[.00017] ; <= 3^(1/3) / 2^13
        POPJ P,         ;sin X = X, within 27. bits
        FDVRI A,(-3.0)
        PUSHJ P,SIN     ;sin -X/3
        FMPR B,B
        FSC B,2
        FADRI B,(-3.0)
        FMPRB A,B       ;sin X = 4(sin -X/3)^3 - 3(sin -X/3)
        POPJ P,         ;sin in A, sin or |sin| in B
;|sin| in B occurs when angle is smaller than end test
Changing both -3.0's to +3.0's gives sinh:

sinh X = 3 sinh X/3 + 4 (sinh X/3)^3.

Changing the first -3.0 to a +9.0, then inserting PUSHJ P,.+1 after PUSHJ P,SIN gains about 20% in speed and uses half the pushdown space (< 5 levels in the first 4 quadrants). PUSHJ P,.+1 is a nice way to have something happen twice. Other useful angle multiplying formulas are

tanh X = (2 tanh X/2) / (1 + (tanh X/2)^2)

tan X = (2 tan X/2) / (1 - (tan X/2)^2),

if infinity is handled correctly. For cos and cosh, one can use

cos X = 1 - 2 (sin X/2)^2, cosh X = 1 + 2 (sinh X/2)^2.

In general, to compute functions like e^X, cos X, elliptic functions, etc. by iterated application of double and triple argument formulas, it is necessary to subtract out the constant in the Taylor series and transform the range reduction formula accordingly. Thus:

F(X) = cos(X)-1
F(2 X) = 2 F*(F+2)
F(epsilon) = -epsilon^2/2

G(X) = e^X - 1
G(2 X) = G*(G+2)
G(epsilon) = epsilon

This is to prevent the destruction of the information in the range-reduced argument by the addition of a quantity near 1 upon the success of the epsilon test. The addition of such a quantity in the actual recurrences is OK since the information is restored by the multiply. In fact, a cheap and dirty test for F(epsilon) sufficiently small is to see if the addition step has no effect. People lucky enough to have a square root instruction can get natural log by iterating

X <- X/(sqrt(1+X) + 1) until 1+X = 1.

Then multiply by 2^(number of iterations). Here, a LSH or FSC would work.

ITEM 159 (Gosper, Schroeppel):

(Numbers herein are decimal.)

The correct epsilon test in such functions as the foregoing SIN are generally the largest argument for which addition of the second term has no effect on the first. In SIN, the first term is x and the second is -x^3/6, so the answer is roughly the x which makes the ratio of those terms 1/2^27; so x = sqrt(3) / 2^13. But this is not exact, since the precise cutoff is where the neglected term is the power of 2 whose 1 bit coincides with the first neglected (28th) bit of the fraction. Thus, x^3/6 = 1/2^27 * 1/2^13, so x = 3^(1/3) / 2^13.

ITEM 160 (Gosper):

Here is a way to get log base 2. A and B are consecutive. Call by PUSHJ P,LOG2 with a floating point argument in A.
LOG2:   LSHC A,-33
        MOVSI C,-201(A)
        TLC C,211000    ;Speciner's bum
        MOVI A,200      ;exponent and sign sentinel
LOGL:   LSH B,-9
REPEAT 7, FMPR B,B      ;moby flunderflo
        LSH B,2
        LSHC A,7
        SOJG A,LOGL     ;fails on 4th try
        LSH A,-1
        FADR A,C
        POPJ P,         ;answer in A
Basically you just square seven times and use the low seven bits of the exponent as the next seven bits of the log.

ITEM 161 (Gosper):

To swap the contents of two locations in memory:
        EXCH A,LOC1
        EXCH A,LOC2
        EXCH A,LOC1
Note: LOC1 must not equal LOC2! If this can happen use MOVE-EXCH-MOVEM, clobbering A.

ITEM 162 (Gosper):

To swap two bits in an accumulator:
        TRCE A,BITS
        TRCE A,BITS
        TRCE A,BITS
Note (Nelson): last TRCE never skips, and used to be a TRC, but TRCE is less forgettable. Also, use TLCE or TDCE if the bits are not in the right half.

ITEM 163 (Sussman):

To exchange two variables in LISP without using a third variable:

(SETQ X (PROG2 0 Y (SETQ Y X)))

ITEM 164 (Samson):

To take MAX in A of two byte pointers (where A and B are consecutive accumulators):
        ROTC A,6
        CAMG A,B
        EXCH A,B
        ROTC A,-6

ITEM 165 (Freiberg):

A byte pointer can be converted to a character address < 2^18 by MULI A,<# bytes/word> followed by SUBI B,1-<# b/w>(A). To get full word character address, use SUB into a magic table.

ITEM 166 (Gosper, Liknaitzky):

To rotate three consecutive accumulators N < 37. places:
        ROTC A,N
        ROT B,-N
        ROTC B,N
Thus M AC's can be ROTC'ed in 2M-3 instructions. (Stallman): For 73. > N > 35.:
        ROTC A,N-36.
        EXCH A,C
        ROT B,36.-N
        ROTC A,N-72.

ITEM 167 (Gosper, Freiberg):

;B gets 7 bit character in A with even parity
        IMUL A,[2010040201]     ;5 adjacent copies
        AND A,[21042104377]     ;every 4th bit of left 4 copies + right copy
        IDIVI A,17<-7           ;casting out 15.'s in hexadecimal shifted 7

;odd parity on 7 bits (Schroeppel)
        IMUL A,[10040201]       ;4 adjacent copies
        IOR A,[7555555400]      ;leaves every 3rd bit+offset+right copy
        IDIVI A,9<-7            ;powers of 2^3 are +-1 mod 9
;changing 7555555400 to 27555555400 gives even parity

;if A is a 9 bit quantity, B gets number of 1's (Schroeppel)
        IMUL A,[1001001001]     ;4 copies
        AND A,[42104210421]     ;every 4th bit
        IDIVI A,17              ;casting out 15.'s in hexadecimal

;if A is 6 bit quantity, B gets 6 bits reversed (Schroeppel)
        IMUL A,[2020202]        ;4 copies shifted
        AND A,[104422010]       ;where bits coincide with reverse repeated base 2^8
        IDIVI A,377             ;casting out 2^8 - 1's

;reverse 7 bits (Schroeppel)
        IMUL A,[10004002001]    ;4 copies sep by 000's base 2 (may set arith. o'flow)
        AND A,[210210210010]    ;where bits coincide with reverse repeated base 2^8
        IDIVI A,377             ;casting out 377's

;reverse 8 bits (Schroeppel)
        MUL A,[100200401002]    ;5 copies in A and B
        AND B,[20420420020]     ;where bits coincide with reverse repeated base 2^10
        ANDI A,41               ;"
        DIVI A,1777             ;casting out 2^10 - 1's

ITEM 168 (PDP-1 hackers):

foo,    lat       /DATAI switches
        adm a     /ADDB
        and (707070
        adm b
        iot 14    /output AC sign bit to a music flip-flop
        jmp foo
Makes startling chords, arpeggios, and slides, with just the sign of the AC. This translates to the PDP-6 (roughly) as:
FOO:    DATAI 2
        ADDB 1,2
        AND 2,[707070707070]     ;or 171717171717, 363636363636, 454545454545, ...
        ADDB 2,3
        LDB 0,[360600,,2]
        JRST FOO
Listen to the square waves from the low bits of 0.

ITEM 169 (in order of one-ups-manship: Gosper, Mann, Lenard, [Root and Mann]):

To count the ones in a PDP-6/10 word:
        LDB B,[014300,,A]      ;or MOVE B,A then LSH B,-1
        AND B,[333333,,333333]
        SUB A,B
        LSH B,-1
        AND B,[333333,,333333]
        SUBB A,B               ;each octal digit is replaced by number of 1's in it
        LSH B,-3
        ADD A,B
        AND A,[070707,,070707]
        IDIVI A,77             ;casting out 63.'s
These ten instructions, with constants extended, would work on word lengths up to 62.; eleven suffice up to 254..

ITEM 170 (Jensen):

Useful strings of non-digits and zeros can arise when carefully chosen negative numbers are fed to unsuspecting decimal print routines. Different sets arise from different methods of character-to-digit conversion. Example (Gosper):
DPT:    IDIVI F,12
        HRLM G,(P)      ;tuck remainder on pushdown list
        SKIPE F
        PUSHJ P,DPT
        LDB G,[220600,,(P)]   ;retrieve low 6 bits of remainder
        TRCE G,"0       ;convert digit to character
        SETOM CCT       ;that was no digit!

TYO:    .IOT TYOCHN,G   ;or DATA0 or IDPB ...
        AOS G,CCT
        POPJ P,
This is the standard recursive decimal print of the positive number in F, but with a LDB instead of a HLRZ. It falls into the typeout routine which returns in G the number of characters since the last carriage return. When called with a -36., DPT types carriage return, line feed, and resets CCT, the character position counter.

ITEM 171 (Gosper):

Since integer division can never produce a larger quotient than dividend, doubling the dividend and divisor beforehand will distinguish division by zero from division by 1 or anything else, in situations where division by zero does nothing.

ITEM 172 (Gosper):

The fundamental operation for building list structure, called CONS, is defined to: find a free cell in memory, store the argument in it, remove it from the set of free cells, return a pointer to it, and call the garbage collector when the set is empty. This can be done in two instructions:
CONS:   EXCH A,[EXCH A,[...[PUSHJ P,GC]...]]
        EXCH A,CONS
Of course, the address-linked chain of EXCH's indicated by the nested brackets is concocted by the garbage collector. This method has the additional advantage of not constraining an accumulator for the free storage pointer.
UNCONS: HRLI A,(EXCH A,)
        EXCH A,CONS
        EXCH A,@CONS
Returns cell addressed by A to free storage list; returns former cell contents in A.

ITEM 173 (Gosper):

The incantation to fix a floating number is usually
        MULI A,400            ;exponent to A, fraction to A+1
        TSC A,A               ;1's complement magnitude of excess 200 exponent
        ASH A+1,-200-27.-8(A) ;answer in A+1
If number is known positive, you can omit the TSC.

On the PDP-10

        UFA A,[+-233000,,]    ;not in PDP-6 repertoire
        TLC A+1,233000        ;if those bits really bother you
When you know the sign of A, and |A| < 2^26, you can
        FAD A,[+-233400,,]    ;or FADR for rounded fix!
        TLC A,233400          ;if those bits are relevant
where the sign of the constant must match A's. This works on both machines and doesn't involve A+1. On the 10, FADRI saves a cycle and a constant, and rounds.

ITEM 174 (Gosper, Nelson):

21963283741. = 243507216435 is a fixed point of the float function on the PDP-6/10, i.e., it is the only positive number whose floating point representation equals its fixed.

ITEM 175 (Gosper):

To get the next higher number (in A) with the same number of 1 bits: (A, B, C, D do not have to be consecutive)
        MOVE B,A
        MOVN C,B
        AND C,B
        ADD A,C
        MOVE D,A
        XOR D,B
        LSH D,-2
        IDIVM D,C
        IOR A,C

Previous Up Next