; dividend is in BIGNUM, little-endian
; divisor is in accu
        sta DIVISOR
        lda #0
        ldx #BIGNUMBYTES
longdivmod
        ldy #8
        asl BIGNUM - 1,x
-       rol
        cmp DIVISOR
        bcc +
        sbc DIVISOR
+       rol BIGNUM - 1,x
        dey
        bne -
        dex
        bne longdivmod
; quotient is in BIGNUM, little-endian
; remainder is in accu
; div
; input:
;  - n-bytes dividend, little-endian
;  - 8-bit divisor
; output:
;  - n-bytes result stored in dividend
;  - AC: remainder
;  - XR: 0
;  - YR: 0

DIV_IN_BYTES = 20

div
        ldy #DIV_IN_BYTES * 8
        lda #0
-       clc
        ldx #-DIV_IN_BYTES & $ff
-       rol dividend + DIV_IN_BYTES - $100,x
        inx
        bmi -
        rol
        cmp divisor
        bcc +
        sbc divisor
        inc dividend
+       dey
        bne --
        rts

divisor
        .byte $00
dividend
        .byte $00, $00, $00, $00, $00, $00, $00, $00
        .byte $00, $00, $00, $00, $00, $00, $00, $00
        .byte $00, $00, $00, $00
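For anyone who wants to check results off the machine, here is a rough Python model of the byte-wise shift-and-subtract loop. It is only a sketch: the function name `longdivmod` just mirrors the label, the assembly instructions in the comments are approximate correspondences, and plain Python ints are used, so 8-bit accumulator wraparound is not modelled.

```python
def longdivmod(bignum, divisor):
    """Divide a little-endian list of bytes in place by an 8-bit divisor.

    Returns the remainder. Assumes 1 <= divisor <= 255.
    """
    remainder = 0                              # lda #0
    for x in range(len(bignum), 0, -1):        # ldx #BIGNUMBYTES ... dex / bne longdivmod
        byte = bignum[x - 1]
        for _ in range(8):                     # ldy #8 ... dey / bne -
            # shift the next dividend bit into the remainder
            remainder = (remainder << 1) | (byte >> 7)
            byte = (byte << 1) & 0xFF
            if remainder >= divisor:           # cmp DIVISOR / bcc +
                remainder -= divisor           # sbc DIVISOR
                byte |= 1                      # quotient bit rotated into the byte
        bignum[x - 1] = byte
    return remainder


number = [0xE8, 0x03]          # 1000, little-endian
rest = longdivmod(number, 10)
print(number, rest)            # [100, 0] 0
```

The quotient replaces the dividend byte by byte, most significant byte first, and the remainder carries over from one byte to the next.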
Using both cmp and sbc looks open to optimization.
        cmp DIVISOR
        bcc +
        sbc DIVISOR
+
        tax
        sbx #DIVISOR    ; undocumented opcode: X = (A and X) - DIVISOR, carry as after cmp
        bcc +
        txa
+
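A quick way to convince yourself that the two sequences agree is to model both in Python and compare them over all 8-bit inputs. This is only a sketch of the accumulator and carry behaviour; the function names are made up for illustration. Two things the model does not show: sbx takes an immediate operand, so the divisor has to be a constant, and tax clobbers X, so the replacement only fits where X is free at that point.

```python
def step_cmp_sbc(a, divisor):
    """cmp DIVISOR / bcc + / sbc DIVISOR: return (new A, carry)."""
    if a >= divisor:                  # cmp sets carry when A >= operand
        return (a - divisor) & 0xFF, 1
    return a, 0


def step_sbx(a, divisor):
    """tax / sbx #DIVISOR / bcc + / txa: return (new A, carry)."""
    x = a                             # tax
    diff = (a & x) - divisor          # sbx: X = (A and X) - immediate
    carry = 1 if diff >= 0 else 0     # carry is set as after cmp
    x = diff & 0xFF
    if carry:
        a = x                         # txa, only when the subtraction applies
    return a, carry


mismatches = [
    (a, d)
    for a in range(256)
    for d in range(256)
    if step_cmp_sbc(a, d) != step_sbx(a, d)
]
print(len(mismatches))                # 0
```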
ldx #BIGNUMBYTES
        ldx #BIGNUMBYTES + 1
-       ldy BIGNUM - 1,x
        bne longdivmod
        dex
        bne -
A correction to your routine:
        ldx #2              ; BIGNUMBYTES + 1
-       ldy BIGNUM - 1,x    ; BIGNUM + 1, BIGNUM + 0
        bne longdivmod
        dex
        bne -
To speed up the algorithm, I think it is best to skip the leading zero bytes before entering the loop. This pays off when the number of bytes is fixed but the value is usually small.
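The idea can be sketched in Python like this. It is a model only, not 6502 code, and the name `longdivmod_skip_zeros` and the second return value are invented for illustration. The scan starts at the last byte of the array and stops at the first non-zero byte; if every byte is zero, the division loop is skipped entirely.

```python
def longdivmod_skip_zeros(bignum, divisor):
    """Divide a little-endian byte list in place, skipping leading zero bytes.

    Returns (remainder, number of bytes actually processed).
    Assumes 1 <= divisor <= 255.
    """
    x = len(bignum)
    while x > 0 and bignum[x - 1] == 0:    # skip most significant zero bytes
        x -= 1
    processed = x
    remainder = 0
    while x > 0:                           # same loop as before, shorter range
        byte = bignum[x - 1]
        for _ in range(8):
            remainder = (remainder << 1) | (byte >> 7)
            byte = (byte << 1) & 0xFF
            if remainder >= divisor:
                remainder -= divisor
                byte |= 1
        bignum[x - 1] = byte
        x -= 1
    return remainder, processed


value = list((1000).to_bytes(20, "little"))   # small value in a 20-byte buffer
print(longdivmod_skip_zeros(value, 10))       # (0, 2)
```

The skipped bytes are already zero, which is also their correct quotient value, so nothing has to be written back for them.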
As I'm continuously extracting values, the dividend and quotient keep getting smaller, so the size of the byte array can be decreased whenever its most significant byte becomes zero. This should neatly halve overall execution time.
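A rough Python sketch of that shrinking-length scheme, assuming the values being extracted are decimal digits (base 10 is my assumption, as are the names `divmod_bytes`, `extract_digits` and `byte_steps`). It counts how many byte passes were needed so the saving can be compared with the fixed-length cost.

```python
def divmod_bytes(bignum, length, divisor):
    """Divide the low `length` bytes (little-endian) in place; return the remainder."""
    remainder = 0
    for x in range(length, 0, -1):
        byte = bignum[x - 1]
        for _ in range(8):
            remainder = (remainder << 1) | (byte >> 7)
            byte = (byte << 1) & 0xFF
            if remainder >= divisor:
                remainder -= divisor
                byte |= 1
        bignum[x - 1] = byte
    return remainder


def extract_digits(bignum, base=10):
    """Repeatedly divide by `base` (2..255), shrinking the active length.

    Returns (digits, least significant first; total byte passes).
    """
    length = len(bignum)
    digits = []
    byte_steps = 0
    while length > 0 and bignum[length - 1] == 0:
        length -= 1                        # initial leading zeros
    while length > 0:
        digits.append(divmod_bytes(bignum, length, base))
        byte_steps += length
        while length > 0 and bignum[length - 1] == 0:
            length -= 1                    # drop bytes that became zero
    return digits, byte_steps


n = 2**160 - 1
digits, byte_steps = extract_digits(list(n.to_bytes(20, "little")))
text = "".join(str(d) for d in reversed(digits))
print(text == str(n), byte_steps, len(digits) * 20)
```

The last two printed numbers are the byte passes with shrinking and the cost of always processing all 20 bytes.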