[Bug-openmcl] Fwd: ppc64 %simple-bit-boole (fwd)
Gary Byers
gb at clozure.com
Sun Jul 31 20:04:48 MDT 2005
>> From: bryan o'connor <bryan-openmcl at lunch.org>
>> Date: July 31, 2005 4:34:55 PM PDT
>> To: bug-openmcl at clozure.com
>> Subject: ppc64 %simple-bit-boole
>>
>>
>> i think the ppc64 version of %simple-bit-boole is skipping
>> the last 4 bits. i don't know enough ppc assembly to fix it.
>>
>> ; ppc64
>> ? (CCL::%SIMPLE-BIT-BOOLE 6
>> #*1111110001011011100110000101000101101011010111101011111000110010111
>> #*1111110101110010100011000111001001000010011111111010001000000011111
>> #*0000000000000000000000000000000000000000000000000000000000000000000)
>> #*1111110001010010100010000101000001000010010111101010001000000010000
>>                                                                  ^^^^
>> ; ppc32
>> (CCL::%SIMPLE-BIT-BOOLE 6
>> #*1111110001011011100110000101000101101011010111101011111000110010111
>> #*1111110101110010100011000111001001000010011111111010001000000011111
>> #*0000000000000000000000000000000000000000000000000000000000000000000)
>> #*1111110001010010100010000101000001000010010111101010001000000010111
>>                                                                  ^^^^
>
I -think- (wouldn't hurt to test this further) that there was another
reason that the last few bits were set incorrectly. Note that the length
of the bit vectors in the test case is 67 (i.e., 3 more bits than will
fit in a 64-bit word).
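To make the 67-bit split concrete, here is a minimal sketch in portable
Common Lisp. SPLIT-BIT-COUNT is a made-up name for illustration, not
anything in OpenMCL; it only separates a bit count into full 64-bit words
and leftover bits.

```lisp
;; Hypothetical helper, not part of OpenMCL: split a bit count into
;; the number of full 64-bit words and the number of leftover bits.
(defun split-bit-count (nbits)
  (values (ash nbits -6)        ; same as (floor nbits 64)
          (logand nbits 63)))   ; same as (mod nbits 64)

;; For the 67-bit vectors in the test case:
(multiple-value-bind (full-words leftover-bits) (split-bit-count 67)
  (format t "~D full word(s), ~D leftover bit(s)~%"
          full-words leftover-bits))
;; prints "1 full word(s), 3 leftover bit(s)"
```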
Here's an annotated version of my proposed fix; this might be of interest
to anyone who wants to learn a bit of PPC assembly (and it gives me a
chance to think out loud and see if the arithmetic is correct.)
I think that it is.
#+ppc64-target
(defppclapfunction %simple-bit-boole ((op 0) (b1 arg_x) (b2 arg_y) (result arg_z))
;; This function takes 4 arguments; the first ("op", a fixnum in the
;; range 0-15) was pushed on the value stack, and the last three
;; arguments were passed in the arg_x, arg_y, and arg_z registers.
;; Load the address (+ 8 vsp) into imm0; this is the caller's vsp
;; before any arguments (1) were vpushed.
;; (LA DEST DISPLACEMENT SRC) is entirely equivalent to
;; (ADDI DEST SRC DISPLACEMENT); the former may make it clearer
;; that some form of address arithmetic is going on.
(la imm0 8 vsp)
;; A LAP macro: SAVE-LISP-CONTEXT builds a frame on the control stack
;; and stores the link register value (return address), calling
;; function object (FN register), and caller's VSP in that frame.
;; It also has the effect of setting the FN to the NFN register;
;; the FN register can be used to access lisp constants, though
;; this function doesn't do that.
(save-lisp-context imm0)
;; Get the size (unboxed element count) of the RESULT bit vector
;; into the IMM4 register; use IMM4 as a temporary.
;; In Bryan's test case, IMM4 should contain 67 decimal.
;; Note that this function should only have been called if
;; all 3 bit vectors are exactly the same size.
(vector-size imm4 result imm4)
;; Set the IMM3 register to the result of (logically) shifting
;; IMM4 (the element count) right 6 bits; this is basically
;; equivalent to setting IMM3 to (FLOOR IMM4 64), i.e., it
;; computes the number of full 64-bit words in the bit vector(s).
;; As a side-effect (indicated by the "." at the end of the
;; "srdi." mnemonic), set the flags in condition register 0
;; based on comparing the result to 0. The PPC effectively
;; has 8 4-bit condition registers (cr0-cr7), each of which
;; has a "Less than", "Equal", "Greater than", and "Summary
;; Overflow" bit.
(srdi. imm3 imm4 6)
;; Set IMM4 to the result of clearing the leftmost 58 bits
;; in IMM4. This is equivalent to setting IMM4 to
;; (LOGAND IMM4 63), or (MOD IMM4 64); it yields the number
;; of bits in the last (partial) 64-bit word in the bit vector
;; (3 in our 67-bit example.)
;; (ANDI. IMM4 IMM4 63) would also work, but there's no "ANDI"
;; instruction (without the "." suffix), and we want to preserve
;; the value in cr0.
(clrldi imm4 imm4 (- 64 6))
;; Get the address of the @dispatch label into the link
;; register.
(bl @get-dispatch)
;; Set condition register CR1 to the result of comparing imm4
;; (the number of bits in the last partial word) to 0. That number
;; may be zero; we'll test the cr1[EQ] bit later.
;; Note that this was a "CMPWI", which would sign-extend the
;; low 32 bits of IMM4 and compare the result to 0. In this
;; case - where IMM4 has at most the 6 least significant bits
;; set - that wouldn't matter, but it's probably clearer to
;; use CMPDI instead of CMPWI in 64-bit code.
(cmpdi cr1 imm4 0)
;; Move the link register to the LOC-PC register. (The LOC-PC
;; register is the only general-purpose register that can contain
;; pointers -into- code vectors; the GC does extra work to handle
;; this.)
(mflr loc-pc)
;; Get the operation (op) into the temp0 register. OP is a fixnum:
;; a 61-bit integer in the upper 61 bits of a 64-bit value (the low
;; 3 bits are 0). We can add this value directly to the address
;; of the @dispatch label (now in LOC-PC); that'll give us the
;; address of a two-instruction subroutine that implements the
;; indicated boolean operation.
(ld temp0 op vsp)
(add loc-pc loc-pc temp0)
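;; Worked example of that arithmetic: for OP = 6 the tagged fixnum
;; is (ash 6 3) = 48. Each PPC instruction is 4 bytes, so each
;; two-instruction table entry is 8 bytes, and @dispatch + 48 is the
;; entry at index 6 (boole-and in the table at the end of the function).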
;; Store the address of the two-instruction subroutine in the
;; count register (ctr). On the PPC, only the CTR and the link
;; register can hold the target of an indirect branch or subroutine call.
(mtctr loc-pc)
;; Set IMM0 to the difference between where a tagged pointer
;; to a vector points and the address of the first byte of data
;; in that vector
(li imm0 ppc64::misc-data-offset)
;; Branch to the end of a loop which performs the boolean operation
;; on 64 bits at a time.
(b @testd)
@nextd
;; See if we're on the last iteration of the loop, and decrement
;; the loop counter
(cmpdi cr0 imm3 1)
(subi imm3 imm3 1)
;; Load imm1 from the b1 bit-vector and imm2 from the b2 bit-vector.
;; Imm0 is a biased (by ppc64::misc-data-offset) index that allows
;; us to access each 64-bit word.
(ldx imm1 b1 imm0)
(ldx imm2 b2 imm0)
;; Call the subroutine: set IMM1 to (op IMM1 IMM2).
(bctrl)
;; Store operation result in the result bit vector.
(stdx imm1 result imm0)
;; Increment the biased index by the size (in bytes) of a 64-bit
;; word. THIS WAS INCREMENTING BY 4 INSTEAD OF 8; I think that that
;; (yet another copy-and-paste mishap) was the real bug here.
(addi imm0 imm0 8)
@testd
;; Maybe process another word
(bne cr0 @nextd)
;; We're done if there are no bits left over in the last partial
;; word.
(beq cr1 @done)
;; Not sure if we need to make this much fuss about the partial word
;; in this simple case, but what the hell.
;; It's pretty clear that we don't really need to make this much
;; fuss; any "extra" bits in the last word of a simple-bit-vector
;; should be zero. This code takes great care to preserve them
;; (which is probably better than setting them to non-zero values
;; as a side-effect of some BOOLE op), but a few instructions
;; could be safely saved here.
(ldx imm1 b1 imm0)
(ldx imm2 b2 imm0)
(bctrl)
(ldx imm2 result imm0)
(sld imm2 imm2 imm4)
(srd imm2 imm2 imm4)
(subfic imm4 imm4 64)
(srd imm1 imm1 imm4)
(sld imm1 imm1 imm4)
(or imm1 imm1 imm2)
(stdx imm1 result imm0)
@done
;; Load the LR, FN, and VSP from the top control stack frame and discard
;; that frame, then branch to the link register
(restore-full-lisp-context)
(blr)
;; BLRL (Branch to Link Register and Link) jumps to the value
;; in the LR and places the address of the next instruction in
;; the LR. When this is called (via BL), it'll return to the
;; caller with the address of @dispatch in the LR.
@get-dispatch
(blrl)
;; A table of 16 2-instruction subroutines, each of which sets
;; IMM1 to the result of (OP IMM1 IMM2) for the defined BOOLE
;; operations. Note that the 2 instructions are basically
;; "an ALU instruction followed by a Branch to Link Register (BLR)
;; instruction".
@dispatch
(li imm1 0) ; boole-clr
(blr)
(li imm1 -1) ; boole-set
(blr)
(blr) ; boole-1
(blr)
(mr imm1 imm2) ; boole-2
(blr)
(not imm1 imm1) ; boole-c1
(blr)
(not imm1 imm2) ; boole-c2
(blr)
(and imm1 imm1 imm2) ; boole-and
(blr)
(or imm1 imm1 imm2) ; boole-ior
(blr)
(xor imm1 imm1 imm2) ; boole-xor
(blr)
(eqv imm1 imm1 imm2) ; boole-eqv
(blr)
(nand imm1 imm1 imm2) ; boole-nand
(blr)
(nor imm1 imm1 imm2) ; boole-nor
(blr)
(andc imm1 imm2 imm1) ; boole-andc1
(blr)
(andc imm1 imm1 imm2) ; boole-andc2
(blr)
(orc imm1 imm2 imm1) ; boole-orc1
(blr)
(orc imm1 imm1 imm2) ; boole-orc2
(blr))
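For anyone who would rather check the logic without reading PPC assembly,
the same steps can be sketched in portable Common Lisp. This is only an
illustrative model under stated assumptions, not the OpenMCL
implementation: MODEL-BIT-BOOLE, MERGE-PARTIAL-WORD and *WORD-MASK* are
made-up names, a bit vector is represented as a vector of 64-bit integers,
and OP is a standard BOOLE constant rather than the raw index that
%SIMPLE-BIT-BOOLE receives.

```lisp
;; Illustrative model only: these names are made up and are not part
;; of OpenMCL.  A bit vector is modelled as a vector of 64-bit words,
;; with bit 0 of the bit vector in the most significant bit of word 0
;; (the layout that the shift sequence in the LAP code implies).
(defparameter *word-mask* (1- (ash 1 64)))

(defun merge-partial-word (computed old nbits)
  "Keep the NBITS most significant bits of COMPUTED and the other
(- 64 NBITS) bits of OLD, as the SLD/SRD/OR sequence does."
  (let ((low-mask (1- (ash 1 (- 64 nbits)))))
    (logior (logand computed (logxor *word-mask* low-mask))
            (logand old low-mask))))

(defun model-bit-boole (op b1 b2 result nbits)
  "OP is a standard BOOLE constant such as BOOLE-AND, not the raw
index that %SIMPLE-BIT-BOOLE receives."
  (multiple-value-bind (full-words leftover) (floor nbits 64)
    ;; Full words: one BOOLE per 64 bits, like the @nextd loop.
    (dotimes (i full-words)
      (setf (aref result i)
            (logand *word-mask* (boole op (aref b1 i) (aref b2 i)))))
    ;; Partial last word, skipped when LEFTOVER is 0 (the cr1 test).
    (unless (zerop leftover)
      (setf (aref result full-words)
            (merge-partial-word
             (logand *word-mask*
                     (boole op (aref b1 full-words) (aref b2 full-words)))
             (aref result full-words)
             leftover)))
    result))

;; 67 bits: one full word plus 3 bits at the top of the second word.
(model-bit-boole boole-and
                 (vector #xFF00FF00FF00FF00 #xE000000000000000)
                 (vector #x0FF00FF00FF00FF0 #xE000000000000000)
                 (vector 0 0)
                 67)
;; => a vector EQUALP to #(#x0F000F000F000F00 #xE000000000000000)
```

The model indexes whole words, so it cannot reproduce the byte-offset
mistake itself; it is only meant as a reference for what the result
should be once the index advances by 8 bytes per word.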