| |
Krill
Registered: Apr 2002 Posts: 2854 |
Stacking multicolour layers in assembly
Consider 3 single-coloured multicolour layers, such that, e.g.,
00 or 01 - layer 1
00 or 10 - layer 2
00 or 11 - layer 3 (with 00 being background or transparent).
Now, how to merge them, rendering one over/on top of the other (no "glenz"-like colour blending; the particular layer ordering isn't important as long as some kind of priority regime is preserved, and background/transparent need not be 00), using only binary arithmetic or other primitives, but no lookup tables?
With the above example, it's some kind of max operation on bitpairs, with something like
  |00 01 10 11
--+-----------
00|00 01 10 11
01|01 01 10 11
10|10 10 10 11
11|11 11 11 11
But this doesn't seem to map very well to the 6502's operations. =) |
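A minimal Python model of the operation in the table, assuming bitpair 3 (bits 7-6) is the leftmost pixel: the merge is a per-bitpair max, which with the 00/01, 00/10, 00/11 encodings gives the priority layer 3 > layer 2 > layer 1 > background. The helper names are illustrative only.

```python
def pairs(byte):
    """Split a multicolour byte into four 2-bit pixels, leftmost pixel first."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def from_pairs(pixels):
    """Pack four 2-bit pixels (leftmost first) back into a byte."""
    byte = 0
    for p in pixels:
        byte = (byte << 2) | p
    return byte


def pairwise_max(a, b):
    """The table above: per bitpair, keep the larger value."""
    return from_pairs([max(x, y) for x, y in zip(pairs(a), pairs(b))])


def merge_layers(l1, l2, l3):
    """l1 uses 00/01, l2 uses 00/10, l3 uses 00/11, so the max is the topmost layer."""
    return pairwise_max(pairwise_max(l1, l2), l3)


# Pixels, left to right: background, layer 1, layer 2 over 1, layer 3 over all.
l1 = 0b00_01_01_01
l2 = 0b00_00_10_10
l3 = 0b00_00_00_11
print(f"{merge_layers(l1, l2, l3):08b}")  # 00011011
```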
|
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
doesnt seem to make sense to do this "brute force"
table or animate or somehow cheat it.
2 layers can only merge in 256 combinations and 16 more for the final one.
ldx table_4pixels_from_layer1_4pixels_from_layer2
lda table_4pixelsfromlayer3,x
sta
if you dont use all possible combinations of pixels then it can be cheated into a single 8 bit table 3+3+2bits for example. |
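A rough Python model of the two-stage table idea, assuming each layer's byte is fully described by a 4-bit mask of which pixels are set (4 pixels per layer, so 16 × 16 = 256 combinations for layers 1 and 2). `table12`, `table3` and `lookup` are hypothetical names standing in for the two tables in the pseudocode; the second stage is modelled as 16 tables of 256 bytes, one per layer-3 mask.

```python
def expand(mask, colour):
    """4-bit pixel mask (bit 3 = leftmost pixel) -> byte with `colour` where set."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (colour if (mask >> bit) & 1 else 0)
    return byte


def stack(*layers):
    """Per pixel, the last layer with a non-zero bitpair wins."""
    out = 0
    for shift in (6, 4, 2, 0):
        pixel = 0
        for layer in layers:
            if (layer >> shift) & 0b11:
                pixel = (layer >> shift) & 0b11
        out |= pixel << shift
    return out


# Stage 1: 16 x 16 = 256 combinations of layer-1 and layer-2 masks.
table12 = [stack(expand(i & 15, 0b01), expand(i >> 4, 0b10)) for i in range(256)]

# Stage 2: one 256-byte table per layer-3 mask, indexed by the stage-1 byte.
table3 = [[stack(b, expand(m3, 0b11)) for b in range(256)] for m3 in range(16)]


def lookup(m1, m2, m3):
    x = table12[m1 | (m2 << 4)]  # roughly "ldx table12,..."
    return table3[m3][x]         # roughly "lda table3,x"


print(f"{lookup(0b0111, 0b0011, 0b0001):08b}")  # 00011011
```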
| |
Krill
Registered: Apr 2002 Posts: 2854 |
If it's not possible, i'd like to see some kind of elegant formal proof in a few sentences. =) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: If it's not possible, i'd like to see some kind of elegant formal proof in a few sentences. =)
first please notice you didnt say what exactly you are looking for. if its turing complete anything is possible.
edit: oh god damn, okay now I see it my bad.
edit2: seems like a good candidate for xy problem |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: edit2: seems like a good candidate for xy problem
It's formulated as an academic question in this thread, but the origin is..., well, just having but 2 index registers. =) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
That has to be possible with some bitfiddling...maybe perhaps :) Why no lookup table though? :) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Quoting Oswald: edit2: seems like a good candidate for xy problem
It's formulated as an academic question in this thread, but the origin is..., well, just having but 2 index registers. =)
then it really IS an XY problem :D dont think you will find a nice way, cheat it, or go around it. but lets see what the experts have to say :) |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: That has to be possible with some bitfiddling...maybe perhaps :) Why no lookup table though? :)
The question is precisely about that bitfiddling! :)
Quoting Oswald: dont think you will find a nice way, cheat it, or go around it. but lets see what the experts have to say :)
Yes, that's what i thought when i created this thread. =)
(XY problem is irrelevant in this context.) |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
<i misclicked something> |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Krill: Quoting chatGPZ: That has to be possible with some bitfiddling...maybe perhaps :) Why no lookup table though? :) The question is precisely about that bitfiddling! :)
And no tables, yeah, i hate swapping registers in and out in tight unrolled inner loops. Maybe the bitfiddling solution, if it exists, is surprisingly terse and elegant? Who knows! |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
It has to be bytes composed of 4 2bit pairs, right? |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: It has to be bytes composed of 4 2bit pairs, right?
Yes. :) Ready to be displayed by VIC.
(But if you have something on your mind that works but ignores this constraint, go ahead and say. =D) |
| |
Martin Piper
Registered: Nov 2007 Posts: 645 |
1) Large 256x256 table in cartridge. Do this lookup once per byte.
2) Isolate the two colour bits, use a 256x4x4 byte table (based on four colours and four shifted pixel positions), do this 4 times for each byte. Unrolling for each pixel position optimises the table usage.
3) Use tiny tables and small code, but loop each pixel like below:
.sprColMaskTab
!by %00000011
!by %00001100
!by %00110000
!by %11000000
; Merge from: SpriteWorkingByte
; Into: SpriteFinalByte
; Only when the final pixel is clear.
MergeTwoColourBytes
ldy #3
.l2
; Front to back ordering
lda SpriteFinalByte
and .sprColMaskTab,y
bne .no0
; Final, destination, is empty so merge in the working source pixel
lda SpriteWorkingByte
and .sprColMaskTab,y
ora SpriteFinalByte
sta SpriteFinalByte
.no0
dey
bpl .l2
4) Unroll the above, remove index registers for the sprite mask lookups use constants and immediate mode, add index registers for the sprite/char/bitmap data access.
5) Use BIT and keep the current pixel byte in A to avoid loading it back again. |
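For reference, a Python model of `MergeTwoColourBytes` above, assuming layers are merged front to back into a destination that starts out as all background (00). The names loosely mirror the listing; `stack_front_to_back` is a hypothetical helper.

```python
SPR_COL_MASK_TAB = [0b00000011, 0b00001100, 0b00110000, 0b11000000]


def merge_two_colour_bytes(final, working):
    """Copy each pixel of `working` into `final`, only where `final` is still 00."""
    for y in (3, 2, 1, 0):                  # ldy #3 ... dey / bpl .l2
        mask = SPR_COL_MASK_TAB[y]
        if (final & mask) == 0:             # lda final / and tab,y / bne .no0
            final |= working & mask         # lda working / and tab,y / ora final / sta final
    return final


def stack_front_to_back(*layers_front_first):
    final = 0
    for layer in layers_front_first:
        final = merge_two_colour_bytes(final, layer)
    return final


# Layer 3 is in front, layer 1 at the back.
print(f"{stack_front_to_back(0b00_00_00_11, 0b00_00_10_10, 0b00_01_01_01):08b}")  # 00011011
```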
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
what about this
; 00 or 01 - layer 1
; 00 or 10 - layer 2
; 00 or 11 - layer 3 (with 00 being background or transparent).
;
; dst = ((layer1 | layer2) & (~(layer2 >> 1))) | layer3
lda layer2
lsr
eor #$ff
sta tmp
lda layer1
ora layer2
and tmp
ora layer3
sta dst
(There is probably a clever way to make this faster, with illegals perhaps) |
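An exhaustive Python check of the formula, assuming the original 00/01, 00/10, 00/11 encodings; it tries all 16 × 16 × 16 pixel-mask combinations for one byte. `encode` and `expected` are illustrative helpers, not part of the 6502 code.

```python
def encode(mask, bg, fg):
    """4-bit pixel mask (bit 3 = leftmost) -> byte using bitpairs fg/bg."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (fg if (mask >> bit) & 1 else bg)
    return byte


def expected(m1, m2, m3):
    """Reference: colour 3/2/1/0 where layer 3/2/1/nothing is topmost."""
    byte = 0
    for bit in (3, 2, 1, 0):
        on = [(m >> bit) & 1 for m in (m1, m2, m3)]
        top = 3 if on[2] else 2 if on[1] else 1 if on[0] else 0
        byte = (byte << 2) | top
    return byte


def merge_gpz(layer1, layer2, layer3):
    tmp = (layer2 >> 1) ^ 0xFF                  # lda layer2 / lsr / eor #$ff / sta tmp
    return ((layer1 | layer2) & tmp) | layer3   # lda layer1 / ora layer2 / and tmp / ora layer3


ok = all(
    merge_gpz(encode(m1, 0b00, 0b01), encode(m2, 0b00, 0b10), encode(m3, 0b00, 0b11))
    == expected(m1, m2, m3)
    for m1 in range(16) for m2 in range(16) for m3 in range(16)
)
print(ok)  # True
```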
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
... and if you allow selfmod and one page table this becomes
lda layer2
sta lut_selfmod
ora layer1
lut_selfmod = *+1
and lut ; (n >> 1) ^ 0xff
ora layer3
sta dst
(suggested by Noobtracker)
edit: another suggestion by Noobtracker - that table can be located in the zeropage ($00 needs to be $ff, $01 is not used. Many other locations are unused too, so still possible to use zp variables)
Now i expect some cool routine using this from you :) |
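A quick Python look at the one-page table, assuming `lut[n] = (n >> 1) ^ $ff` as in the comment. Since layer-2 bytes only ever contain %00 or %10 bitpairs, only 16 of the 256 entries are ever read, which is why most of a zero-page copy of the table stays free. `valid_layer2` and `merge_selfmod` are illustrative names.

```python
lut = [(n >> 1) ^ 0xFF for n in range(256)]

# Layer-2 bytes only use bits of %10101010, so only these indices are ever read.
valid_layer2 = [n for n in range(256) if (n & 0x55) == 0]


def merge_selfmod(layer1, layer2, layer3):
    # lda layer2 / sta lut_selfmod / ora layer1 / and lut+layer2 / ora layer3
    return ((layer2 | layer1) & lut[layer2]) | layer3


def merge_gpz(layer1, layer2, layer3):
    # The table-free version from the previous post, for comparison.
    return ((layer1 | layer2) & ((layer2 >> 1) ^ 0xFF)) | layer3


print(len(valid_layer2), hex(lut[0x00]), 0x01 in valid_layer2)  # 16 0xff False
```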
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
... and if you can afford to use X:
lax layer2
ora layer1
and lut,x ; (n >> 1) ^ 0xff
ora layer3
sta dst
... :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1381 |
My ten minute take, without having read the replies.
layer 3 trivially ORs over the others, so the tricky part is layering layer 2 over layer 1
We need high bits on l2 to mask low bits on l1, so a trivial implementation is
lda l2
lsr
eor#$55 ; low bits are only set if high bits were clear
and l1 ; bring in l1, but only if l2 was transparent
ora l2
ora l3
If we can change the inputs to
01 or 10 - layer 1
00 or 11 - layer 2
00 or 11 - layer 3 (with 00 being background or transparent).
and ensure carry is set on entry, then this is slightly shorter, and also preserves carry
lda l1
ora l2
sbc#$55
ora l2
ora l3
|
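A Python model of both variants, assuming binary mode (decimal flag clear). `sbc` is a hypothetical helper mimicking the 6502's SBC, A - M - (1 - C), with carry set when no borrow occurs. The second variant relies on every bitpair of `l1 | l2` being at least %01, so subtracting $55 decrements each pair without borrowing from its neighbour: %01 becomes %00, %10 becomes %01 and %11 becomes %10. In this model, an additional `ora l2` after the `sbc` would turn layer-2 pixels into %11, the same as layer 3, so the sketch goes straight from the `sbc` to `ora l3`.

```python
def encode(mask, bg, fg):
    """4-bit pixel mask (bit 3 = leftmost) -> byte using bitpairs fg/bg."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (fg if (mask >> bit) & 1 else bg)
    return byte


def expected(m1, m2, m3):
    """Reference: colour 3/2/1/0 where layer 3/2/1/nothing is topmost."""
    byte = 0
    for bit in (3, 2, 1, 0):
        on = [(m >> bit) & 1 for m in (m1, m2, m3)]
        top = 3 if on[2] else 2 if on[1] else 1 if on[0] else 0
        byte = (byte << 2) | top
    return byte


def sbc(a, operand, carry):
    """Binary-mode SBC: A - operand - (1 - C); returns (A, carry out)."""
    diff = a - operand - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


def variant1(l1, l2, l3):
    """Original encodings: l1 00/01, l2 00/10, l3 00/11."""
    a = ((l2 >> 1) ^ 0x55) & l1   # lda l2 / lsr / eor #$55 / and l1
    return a | l2 | l3            # ora l2 / ora l3


def variant2(l1, l2, l3):
    """Changed encodings: l1 01/10, l2 00/11, l3 00/11; carry set on entry."""
    a = l1 | l2                   # lda l1 / ora l2
    a, carry = sbc(a, 0x55, 1)    # sbc #$55: every bitpair drops by %01
    return a | l3, carry          # ora l3


masks = [(m1, m2, m3) for m1 in range(16) for m2 in range(16) for m3 in range(16)]
print(all(variant1(encode(a, 0, 1), encode(b, 0, 2), encode(c, 0, 3)) == expected(a, b, c)
          for a, b, c in masks))  # True
print(all(variant2(encode(a, 1, 2), encode(b, 0, 3), encode(c, 0, 3)) == (expected(a, b, c), 1)
          for a, b, c in masks))  # True
```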
| |
ChristopherJam
Registered: Aug 2004 Posts: 1381 |
Quoting Krill: using only binary arithmetic or other primitives, but no lookup tables?
^^ (emphasis mine) |
| |
CyberBrain Administrator
Posts: 392 |
Very cool solutions!
Here is a solution if you don't want to use a lookup table or need the X-reg for something else, *BUT* it is allowed to change the bit-patterns that are used in layer1 and layer2 (the result will still have the correct patterns).
Then you could combine layer 1 and 2 with ADD and AND, in the same number of cycles as Groepaz's, something like this: (i hope. i'm tired)
If we use these bitpattern replacements for layer 1 and layer 2:
layer1_00 = %01 // <- in layer 1: instead of %00 we use %01
layer2_00 = %01 // <- in layer 2: --||--
layer1_01 = %00 // <- in layer 1: instead of %01 we use %00
layer2_10 = %10 // <- in layer 2: we still use %10 as %10 :'(
Then we can do: (Note that the bitpairs can never overflow, since we at most add 01 and 10)
dst = ((layer1 + layer2) & layer2) | layer3
Why this stupid crap? Because if we try it for all possible (legal) bitpair-values we get the right result:
(layer2_00 + layer1_00) & layer2_00 = (01 + 01) & 01 = 10 & 01 = 00
(layer2_00 + layer1_01) & layer2_00 = (01 + 00) & 01 = 01 & 01 = 01
(layer2_10 + layer1_00) & layer2_10 = (10 + 01) & 10 = 11 & 10 = 10
(layer2_10 + layer1_01) & layer2_10 = (10 + 00) & 10 = 10 & 10 = 10
Code:
lda layer1
adc layer2 // assume C=0. C=0 afterwards always, since no overflow
and layer2
ora layer3
sta dst
|
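An exhaustive Python check of the ADD/AND trick, assuming binary mode and C=0 on entry; `adc`, `encode` and `expected` are illustrative helpers. It also confirms that no bitpair overflows, so carry is still clear afterwards.

```python
def encode(mask, bg, fg):
    """4-bit pixel mask (bit 3 = leftmost) -> byte using bitpairs fg/bg."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (fg if (mask >> bit) & 1 else bg)
    return byte


def expected(m1, m2, m3):
    """Reference: colour 3/2/1/0 where layer 3/2/1/nothing is topmost."""
    byte = 0
    for bit in (3, 2, 1, 0):
        on = [(m >> bit) & 1 for m in (m1, m2, m3)]
        top = 3 if on[2] else 2 if on[1] else 1 if on[0] else 0
        byte = (byte << 2) | top
    return byte


def adc(a, operand, carry):
    """Binary-mode ADC: A + operand + C; returns (A, carry out)."""
    total = a + operand + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def merge_cyberbrain(layer1, layer2, layer3):
    a, carry = adc(layer1, layer2, 0)     # lda layer1 / adc layer2 (C=0)
    return (a & layer2) | layer3, carry   # and layer2 / ora layer3


# Layer 1: bg %01, fg %00.  Layer 2: bg %01, fg %10.  Layer 3: bg %00, fg %11.
print(all(
    merge_cyberbrain(encode(m1, 0b01, 0b00), encode(m2, 0b01, 0b10), encode(m3, 0b00, 0b11))
    == (expected(m1, m2, m3), 0)
    for m1 in range(16) for m2 in range(16) for m3 in range(16)
))  # True
```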
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Very nice!
Ignoring all the approaches using tables as per OP premise (thank you, CJam), CyberBrain's solution seems to be the fastest so far.
               ; fg bg
4 lda layer1   ; 00:01   00 00 01 01   <- will be bg 00:01 fg
4 adc layer2   ; 10:01   01 10 01 10   <- will be bg 00:10 fg
               ;         01 10 10 11
4 and layer2   ;         01 10 01 10
               ;         01 10 00 10
4 ora layer3
16
Can it get any more concise and elegant? :) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
one could just brute force check all logical / add commands and order of layer loads if any of the combination works :) fantastic solution, never thought it can come down to 4 instructions. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: fantastic solution, never thought it can come down to 4 instructions.
Yes!
Only way to speed it up would be to somehow get rid of the second access to layer2.
Either completely or by replacing it with some immediate operation. |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
how about:
lax layer1
axs #$ff-layer2 ;X:=A&X-#{imm}
lda table_+layer3,x
edit: ok final step not gonna fly, but food for thoughts
edit2:
lax layer1
axs #$ff-layer2 ;X:=A&X-#{imm}
txa
ora layer3 |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
No tables and no index registers, please. =) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: No tables and no index registers, please. =)
|
| |
Martin Piper
Registered: Nov 2007 Posts: 645 |
Hmm, are you sure you want to process one pixel, across three layers, at a time? Not process the whole byte and utilise optimisations processing 4 pixels in one go? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1381 |
Yes, excellent work CyberBrain!
lol @ Oswald
Martin - my and CyberBrain's solutions do process the whole byte and generate 4 pixels in one go. |
| |
CyberBrain Administrator
Posts: 392 |
Thx - same to you! My solution was based on Groepaz's/Noobtracker's very nice solutions and Groepaz's excellent breakdown of the problem, and those solutions also process whole bytes (4 pixels/bitpairs) at a time. Fun little riddle, btw - i'm sure it has no practical use whatsoever and is just a little brain teaser? :) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
I can imagine 3 layers additive like in many miggy vector fx, can be static scrolling texture, or even scroller.. and as Gunnar wants registers free maybe 3 zoomscrollers ? :P :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Ha! Cool stuff. Cjam and Cyberbrain kinda picked up where NT and me stopped, because tired :)
Now i really want to see what you make with it, Krill :=) |
| |
Martin Piper
Registered: Nov 2007 Posts: 645 |
CyberBrain...
lda layer1
adc layer2 ; Assume C = 0
and layer2
If:
layer1 = 01
layer2 = 00 (transparent)
layer3 = 00 transparent
Doesn't that produce 0, which forgets that layer 1 already has a colour? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Quote:I can imagine 3 layers additive like in many miggy vector fx, can be static scrolling texture, or even scroller.. and as Gunnar wants registers free maybe 3 zoomscrollers ? :P :)
Layered chessboard zoomers - but with Z rotator!
GOGOGO! :D |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: Quote: I can imagine 3 layers additive like in many miggy vector fx, can be static scrolling texture, or even scroller.. and as Gunnar wants registers free maybe 3 zoomscrollers ? :P :)
Layered chessboard zoomers - but with Z rotator!
GOGOGO! :D
That was EXACTLY what i had in mind before i stumbled over this problem. :)
(But it was a sidetrack so i just asked the experts rather than spending too much time on this one.) |
| |
The Syndrom
Registered: Aug 2005 Posts: 56 |
Quote: CyberBrain...
lda layer1
adc layer2 ; Assume C = 0
and layer2
If:
layer1 = 01
layer2 = 00 (transparent)
layer3 = 00 transparent
Doesn't that produce 0, which forgets that layer 1 already has a colour?
@martin piper
you probably overlooked the twisted input bits of layer 1&2:
>If we use these bitpattern replacements for layer 1 and layer 2:
>layer1_00 = %01 // <- in layer 1: instead of %00 we use %01
>layer2_00 = %01 // <- in layer 2: --||--
>layer1_01 = %00 // <- in layer 1: instead of %01 we use %00
>layer2_10 = %10 // <- in layer 2: we still use %10 as %10 :'( |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Quote:Can it get any more concise and elegant? :)
Noobtracker to the rescue :)
lda layer1 ; 10/11 (becomes 00/01)
and layer2 ; 01/10 (becomes 00/10)
ora layer3 ; 00/11
|
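The same kind of exhaustive Python check for the three-instruction version, with the bg/fg encodings from the comments. Per bitpair, layer 2's %01 (transparent) keeps only layer 1's low bit, while its %10 (set) yields %10 whatever layer 1 holds, because layer 1's high bit is always set; layer 3's %11 then overrides both. `encode` and `expected` are illustrative helpers.

```python
def encode(mask, bg, fg):
    """4-bit pixel mask (bit 3 = leftmost) -> byte using bitpairs fg/bg."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (fg if (mask >> bit) & 1 else bg)
    return byte


def expected(m1, m2, m3):
    """Reference: colour 3/2/1/0 where layer 3/2/1/nothing is topmost."""
    byte = 0
    for bit in (3, 2, 1, 0):
        on = [(m >> bit) & 1 for m in (m1, m2, m3)]
        top = 3 if on[2] else 2 if on[1] else 1 if on[0] else 0
        byte = (byte << 2) | top
    return byte


def merge_noobtracker(layer1, layer2, layer3):
    return (layer1 & layer2) | layer3     # lda layer1 / and layer2 / ora layer3


# Layer 1: bg %10, fg %11.  Layer 2: bg %01, fg %10.  Layer 3: bg %00, fg %11.
ok = all(
    merge_noobtracker(encode(m1, 0b10, 0b11), encode(m2, 0b01, 0b10), encode(m3, 0b00, 0b11))
    == expected(m1, m2, m3)
    for m1 in range(16) for m2 in range(16) for m3 in range(16)
)
print(ok)  # True
```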
| |
ChristopherJam
Registered: Aug 2004 Posts: 1381 |
Gorgeous! |
| |
CyberBrain Administrator
Posts: 392 |
Damn, that's cool! And elegant! This is one of those things, that makes you go "why th did i not see that?!" after you see it :) Well done, the quest has been solved! |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
lda layer1 ; 10/11 (becomes 00/01)
and layer2 ; 01/10 (becomes 00/10)
ora layer3 ; 00/11
I dont see it layer1 and layer2 will make everything 0 where there is a 0, while what is needed a kind of max function per bitpair. and how can an lda perform an operation ?
edit: unless it is using the bitpair scramble offered |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Now that Y is solved, i want to see X. AT X :D |
| |
JackAsser
Registered: Jun 2002 Posts: |
Quote: lda layer1 ; 10/11 (becomes 00/01)
and layer2 ; 01/10 (becomes 00/10)
ora layer3 ; 00/11
I dont see it layer1 and layer2 will make everything 0 where there is a 0, while what is needed a kind of max function per bitpair. and how can an lda perform an operation ?
edit: unless it is using the bitpair scramble offered
This works fine Oswald, very neat solution which I definitely will use at some point.
L3 L2 L1 (L1&L2|L3)
00 01 10 00
00 01 11 01
00 10 10 10
00 10 11 10
11 01 10 11
11 01 11 11
11 10 10 11
11 10 11 11
@Oswald: In layer 1 on=10, off=11, in layer 2 on=01, off=10, in layer 3 on=11, off=00 |
| |
Copyfault
Registered: Dec 2001 Posts: 467 |
First of all: nice question raised by Krill, and what a brilliant solution found by Noobtracker. I'd opt for granting him access to csdb, did never really understand why he got banned...
In order to unconfuse Oswald, the meaning of the bitpairs should be put right in your explanation, JA.
In Layer 1, a set (or foreground) pixel is encoded by %11, while a transparent one is encoded by %10.
For Layer 2, it will be %10 for a set pixel, while %01 does the job for a transparent one here.
Finally, Layer 3 is just like described: %11 is the bitpair for a set pixel, %00 for a transparent pixel.
The table given by JA is correct. It shows that Noobtracker's golden opcode trio always gives bitpairs that can be used to set the pixels in the standard encoding.
Now let's try to shorten it even more *evilgrinplusshallowlaughter*
CF |
| |
JackAsser
Registered: Jun 2002 Posts: |
Quote: First of all: nice question raised by Krill, and what a brilliant solution found by Noobtracker. I'd opt for granting him access to csdb, did never really understand why he got banned...
In order to unconfuse Oswald, the meaning of the bitpairs should be put right in your explanation, JA.
In Layer 1, a set (or foreground) pixel is encoded by %11, while a transparent one is encoded by %10.
For Layer 2, it will be %10 for a set pixel, while %01 does the job for a transparent one here.
Finally, Layer 3 is just like described: %11 is the bitpair for a set pixel, %00 for a transparent pixel.
The table given by JA is correct. It shows that Noobtracker's golden opcode trio always gives bitpairs that can be used to set the pixels in the standard encoding.
Now let's try to shorten it even more *evilgrinplusshallowlaughter*
CF
Oops! Sorry for the typo! |
| |
HCL
Registered: Feb 2003 Posts: 717 |
Are you guys trying to do what Graham did in Dawnfall (1995).. ;) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
that just shows how brilliant graham was in 1995, goddamn |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting HCL: Are you guys trying to do what Graham did in Dawnfall (1995).. ;)
No.
Afaict, Dawnfall had 2 layers, either with one of them having 2 colours (with what appears to be some temporal blur from the previous frame), or both of them blended together.
Unless i missed something, there aren't 3 independent solid stacked layers. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: Quote: Can it get any more concise and elegant? :)
Noobtracker to the rescue :)
lda layer1 ; 10/11 (becomes 00/01)
and layer2 ; 01/10 (becomes 00/10)
ora layer3 ; 00/11
Brilliant! \=D/
Wonder if there are more suitable combinations than these bitpairs... |
| |
Martin Piper
Registered: Nov 2007 Posts: 645 |
Applying a bit twiddle before the logical operations is rather similar to how hardware design makes some logical operations simpler, use fewer gates or use gates of a particular type, by introducing not gates before or after.
Interesting.
If the layer values were coming from three routines that calculate bytes at a time for each layer, this would be quite quick for a nice effect. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Quote:Wonder if there are more suitable combinations than these bitpairs...
The most obvious thing would be to invert all values, and swap AND with OR (that always works) :)
lda layer1 ; 01/00 (becomes 11/10)
ora layer2 ; 10/01 (becomes 11/01)
and layer3 ; 11/00 (becomes 11/00)
(this opens the door for storing with fixed layer3 in X and SAX) |
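A Python check of the inverted form, assuming each input is the bitwise complement of Noobtracker's encodings: by De Morgan, (~l1 | ~l2) & ~l3 = ~((l1 & l2) | l3), so the output is complemented too (background %11, layer 1 %10, layer 2 %01, layer 3 %00). `merge_sax` is a hypothetical model of keeping layer 3 in X and storing with the undocumented SAX opcode, which writes A AND X.

```python
def encode(mask, bg, fg):
    """4-bit pixel mask (bit 3 = leftmost) -> byte using bitpairs fg/bg."""
    byte = 0
    for bit in (3, 2, 1, 0):
        byte = (byte << 2) | (fg if (mask >> bit) & 1 else bg)
    return byte


def merge_noobtracker(l1, l2, l3):
    return (l1 & l2) | l3            # lda / and / ora


def merge_inverted(l1, l2, l3):
    return (l1 | l2) & l3            # lda / ora / and


def merge_sax(l1, l2, x):
    a = l1 | l2                      # lda layer1 / ora layer2
    return a & x                     # sax dst stores A AND X (layer 3 kept in X)


ok = True
for m1 in range(16):
    for m2 in range(16):
        for m3 in range(16):
            n1, n2, n3 = encode(m1, 0b10, 0b11), encode(m2, 0b01, 0b10), encode(m3, 0b00, 0b11)
            i1, i2, i3 = n1 ^ 0xFF, n2 ^ 0xFF, n3 ^ 0xFF
            ok &= merge_inverted(i1, i2, i3) == merge_noobtracker(n1, n2, n3) ^ 0xFF
            ok &= merge_sax(i1, i2, i3) == merge_inverted(i1, i2, i3)
print(ok)  # True
```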
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
and noobtracker was busy, so for the records:
; bg l1 l2 l3
; 00 < 01 < 10 < 11
;bg/fg
lda layer1 ;10/11
and layer2 ;01/10
ora layer3 ;00/11
; bg l1 l2 l3
; 00 < 10 < 01 < 11
;bg/fg
lda layer1 ;01/11
and layer2 ;10/01
ora layer3 ;00/11
; bg l1 l2 l3
; 01 < 00 < 10 < 11
;bg/fg
lda layer1 ;11/10
and layer2 ;01/10
ora layer3 ;00/11
; bg l1 l2 l3
; 10 < 00 < 01 < 11
;bg/fg
lda layer1 ;11/01
and layer2 ;10/01
ora layer3 ;00/11
; bg l1 l2 l3
; 01 < 10 < 00 < 11
;bg/fg
lda layer1 ;01/10
and layer2 ;11/00
ora layer3 ;00/11
; bg l1 l2 l3
; 10 < 01 < 00 < 11
;bg/fg
lda layer1 ;10/01
and layer2 ;11/00
ora layer3 ;00/11
(plus all the inverted versions, as said above) |
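The list above can be reproduced with a small brute-force search in Python, in the spirit of Oswald's suggestion earlier in the thread. It assumes the shape `lda layer1 / op1 layer2 / op2 layer3` with op1 and op2 taken from AND, ORA and EOR, a fixed priority layer 3 > layer 2 > layer 1 > background, and any pair of distinct bitpairs as the bg/fg encoding of each layer. Because these operations never carry between bits, checking a single bitpair is enough. Under these assumptions, none of the solutions uses EOR in either position. `colours` and `solutions` are illustrative names.

```python
from itertools import product

OPS = {
    "and": lambda a, b: a & b,
    "ora": lambda a, b: a | b,
    "eor": lambda a, b: a ^ b,
}
# Every (bg, fg) pair of distinct 2-bit values a layer could use.
ENCODINGS = [(bg, fg) for bg in range(4) for fg in range(4) if bg != fg]


def colours(op1, op2, e1, e2, e3):
    """Output bitpairs (bg, layer 1, layer 2, layer 3) for
    lda layer1 / op1 layer2 / op2 layer3, or None if the result does not
    depend only on the topmost set layer or the four colours are not distinct."""
    f1, f2 = OPS[op1], OPS[op2]
    seen = {}
    for s1, s2, s3 in product((0, 1), repeat=3):
        result = f2(f1(e1[s1], e2[s2]), e3[s3])
        top = 3 if s3 else 2 if s2 else 1 if s1 else 0
        if seen.setdefault(top, result) != result:
            return None
    out = tuple(seen[k] for k in range(4))
    return out if len(set(out)) == 4 else None


solutions = []
for op1, op2 in product(OPS, repeat=2):
    for e1, e2, e3 in product(ENCODINGS, repeat=3):
        found = colours(op1, op2, e1, e2, e3)
        if found is not None:
            solutions.append((op1, op2, e1, e2, e3, found))

# Noobtracker's original: layer 1 %10/%11, layer 2 %01/%10, layer 3 %00/%11.
print(colours("and", "ora", (0b10, 0b11), (0b01, 0b10), (0b00, 0b11)))  # (0, 1, 2, 3)
```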
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Would be cool if the bottom layers 1 and 2 could be EORed.. =) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Are you crowdsourcing your coding now? :D |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: Are you crowdsourcing your coding now? :D
Still beats AI when it comes to coding! :)
But seriously, it was more like "bummer that it won't work with EOR".
(No formal proof, but a strong gut feeling.) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Please define what exactly you want it to do (making the second op an OR and then just use EOR is trivial...) |
| |
Mixer
Registered: Apr 2008 Posts: 422 |
This reminds me of the eor fill. |
| |
JackAsser
Registered: Jun 2002 Posts: |
Just had to check how I did my 4-layer chesszoomer in Super Larsson Bros back in 2008. Totally forgot how I did it and only remembered that I used 3 layers in the chars (Stacking MC layers) and one in sprites.
In that code I can only scale down to char-sized checkers and I only move in MC resolution, hence a checker-char for one layer can be one of 8 different chars. I use 8 512-byte tables to figure out the final pixel-values indexed by A+(B<<3)+(C<<6) which combines into ((A&~B)&0x55) | (B&0xaa) | C. I have two sets of these 8 tables, one for opaque rendering and one for additive blending. It's 8 tables because of 8 combinations of odd/even checkers for each of the three layers. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting JackAsser: ... tables to figure out ...
You lost me there. :) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
"a checker-char for one layer can be one of 8 different chars."
I am lost already here. chars? |
| |
JackAsser
Registered: Jun 2002 Posts: |
Quote: "a checker-char for one layer can be one of 8 different chars."
I am lost already here. chars?
<Off-topic since this is a table based approach>
Checker-char, as the possible values for a byte in a scaled checker board line.
0: 11111111
1: 11111100
2: 11110000
3: 11000000
4: 00000000
5: 00000011
6: 00001111
7: 00111111
If you only scale down to char-size checkers with a motion in x of 2 pixels, you have only these 8 combinations.
Three of those layers yields 8*8*8 = 512 combinations to stack them, or blend, or whatever operation you wanna do in those tables.
However, using the lda/and/or trick here is probably faster but requires different scalers to produce the correct bitpairs for each of the layers. |
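A Python sketch of one such 512-entry table, built from the eight patterns above with the combining formula from the earlier post, `((A & ~B) & $55) | (B & $aa) | C`, and indexed by `a + (b << 3) + (c << 6)`. It ignores the odd/even variants, so it corresponds to just one of the eight tables; `CHECKER`, `combine` and `table` are illustrative names. Since every pattern consists of whole %00 or %11 bitpairs, the formula works out to a per-bitpair max of layer A as %01, layer B as %10 and layer C as %11.

```python
CHECKER = [
    0b11111111, 0b11111100, 0b11110000, 0b11000000,
    0b00000000, 0b00000011, 0b00001111, 0b00111111,
]


def combine(a, b, c):
    """A shows as %01, B as %10, C as %11; C over B over A."""
    return ((a & ~b) & 0x55) | (b & 0xAA) | c


table = [0] * 512
for a in range(8):
    for b in range(8):
        for c in range(8):
            table[a + (b << 3) + (c << 6)] = combine(CHECKER[a], CHECKER[b], CHECKER[c])


def pairwise_max(x, y):
    """Per-bitpair maximum, used to cross-check the formula."""
    return sum(max((x >> s) & 3, (y >> s) & 3) << s for s in (0, 2, 4, 6))


# Layer A fully set, B on the left half, C empty: left pixels %10, right pixels %01.
print(f"{table[0 + (2 << 3) + (4 << 6)]:08b}")  # 10100101
```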
| |
WVL
Registered: Mar 2002 Posts: 886 |
But dear Jackasser, with those same chars you can zoom down to 6 pixel wide chars. |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
can we stop calling it chars ? maybe a stride ? |
| |
JackAsser
Registered: Jun 2002 Posts: |
Quote: But dear Jackasser, with those same chars you can zoom down to 6 pixel wide chars.
Yes yes I know but didn't bother to update the text. :) |
| |
JackAsser
Registered: Jun 2002 Posts: |
Quote: can we stop calling it chars ? maybe a stride ?
Technically they ARE chars, but only line 7 visible. :P |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Technically they ARE chars, but only line 7 visible. :P
technically they are bytes in a table, which you read to update the "chars". |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Jackasser might be referring to FPP though :) |
| |
HCL
Registered: Feb 2003 Posts: 717 |
Again.. Dawnfall surely has 3 independent layers.. I sneaked into the code but haven't quite figured out how it is done.. except that he is *not* drawing lines and eor-filling it :).. |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
I find it hard to believe you dont know how it works, esp since Jackasser (your teammate) has released various effects showcasing the same tech and in my thinking this is common knowledge amongst the top coders :) |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting HCL: Again.. Dawnfall surely has 3 independent layers..
If it has, why is there not a single effect that makes it very clear? :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
One variant of the rotzoomer has the bars rotating "over" each other (requiring layers) and not just "temporal blur". |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: One variant of the rotzoomer has the bars rotating "over" each other (requiring layers) and not just "temporal blur".
Have you checked the code? Layers, yes, but only two. Two of the three bars are rather strongly tied together, thus not independent. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Na, can't be bothered :)
I expect you to implement it for X though :=) |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting chatGPZ: I expect you to implement it for X though :=)
It's on my TODO list, but not placed very prominently. :) |
| |
HCL
Registered: Feb 2003 Posts: 717 |
Oh @Oswald, you're doing that trick on me again.. Ok, i will check the code again, and understand it, and then i will tell you exactly how it is done :D |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting HCL: Oh @Oswald, you're doing that trick on me again.. Ok, i will check the code again, and understand it, and then i will tell you exactly how it is done :D
I may still have the "source" to E2E4K somewhere. =) |
| |
HCL
Registered: Feb 2003 Posts: 717 |
So, i've checked the code in Dawnfall again, and it turns out Krill was right!! First there is precalced "graphics" for different slopes, seems to be two pages ($200 byte) for each.. Perhaps 16 different versions of it for different zoom-factors i would guess..
Then the actual copying of gfx is different for all five versions of the effect, but *none* of them calculate more than two layers of "gfx" per iteration, and many of them reuse gfx from the last iteration.. like this:
First version (in hires):
lda gfx,x
sta ->
..
lda gfx,x
eor # <-
sta VisualBuffer,y
Second version (multicolor):
lda VisualBuffer,y
asr #$aa ; <- Effectively clears color 1 and then turns color 2 into color 1
ora gfx,x
sta VisualBuffer,y
Third version:
ldx VisualBuffer,y
lda TransferTable,x ; <- turns colors [0,1,2,3] into [0,0,1,2]
tsx
ora gfx,x
sta VisualBuffer,y
Fourth version:
ldx VisualBuffer+$780,y
lda TransferAndMirrorTable,x ; <- turns colors [0,1,2,3] into [0,0,1,2] and mirrors the byte
tsx
ora gfx,x
sta VisualBuffer,y
Fifth version:
..just like First version but multicolor.. and two versions of the gfx for color 1 and 2.
So.. sorry for interrupting this thread with something that was unrelated. Funny that i didn't figure this out earlier since that demo is almost 30 years old :P. But still, with the knowledge from this thread, we can now do it better with three independent layers!! |
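A Python sketch of the two colour-remapping steps described above, assuming ASR #imm is the undocumented 6502 opcode also known as ALR, which computes A = (A AND imm) >> 1. `asr`, `colours`, `remap` and `transfer_table` are illustrative names. In this model ASR #$aa maps bitpairs 00, 01, 10, 11 to 00, 00, 01, 01, so colour 3 also ends up as colour 1, while the table maps them to 00, 00, 01, 10.

```python
def asr(a, imm):
    """ASR/ALR #imm: AND with the immediate, then shift right one bit."""
    return (a & imm) >> 1


def colours(byte):
    """Bitpairs of a byte as colour numbers, leftmost pixel first."""
    return [(byte >> s) & 3 for s in (6, 4, 2, 0)]


def remap(byte, mapping):
    """Apply a per-pixel colour mapping to all four bitpairs of a byte."""
    out = 0
    for c in colours(byte):
        out = (out << 2) | mapping[c]
    return out


# TransferTable: per pixel, colours [0,1,2,3] become [0,0,1,2].
transfer_table = [remap(n, [0, 0, 1, 2]) for n in range(256)]

print(colours(asr(0b11100100, 0xAA)))        # [1, 1, 0, 0]
print(colours(transfer_table[0b11100100]))   # [2, 1, 0, 0]
```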
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
lda gfx,x
sta ->
..
lda gfx,x
eor # <-
sta VisualBuffer,y
this is done because there are no 3 index registers and the two lda's need different offsets, and also graham is trading speed for memory.
seeing the amount cycles wasted on this it should be no problem doing 3 or even 4 layers. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: seeing the amount cycles wasted on this it should be no problem doing 3 or even 4 layers.
Every additional layer adds considerable cost, eating into the framerate.
It's basically just another 4-and-change cycles per layer and output byte, but that's the per-output-byte hot path. =)
Edit: And how would you render a 4th layer? ANDing out brick pixels again? Use some kind of dithering? |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
sure I'm not saying extra layer has no cost, I think doing it without much slowdown is possible.
lda layer1,x
ora layer2,y
sta temp
lda temp
ora layer3,x
sta visual,y
+4 cycles, less than 1/3 of a frame, most c64 sceners wouldnt notice such slowdown on a ~ 25 fps effekt, which is where dawnfall chessrot is in its fastest form. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: sure I'm not saying extra layer has no cost, I think doing it without much slowdown is possible.
The original single hires checkerboard effect uses approx. 50% CPU on each of the two stripe layers, so a third one would make it go from, e.g., 25 FPS to 16 FPS. Quite noticeable. :)
And for 3 stacked checkerboards you can expect a third of the original speed, around 8 FPS.
The question about the 4th layer was how you'd render it, given that the 3 colours plus background are already taken. |
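The estimate above as a small Python calculation, assuming render time scales linearly with the number of stripe layers and that two stripe layers give 25 FPS (each taking about half of the CPU time). `fps` is an illustrative helper.

```python
def fps(stripe_layers, base_fps=25.0, base_layers=2):
    """Frame rate if each stripe layer costs the same CPU time per frame."""
    return base_fps * base_layers / stripe_layers


for layers in (2, 3, 6):  # one checkerboard, one extra stripe layer, three checkerboards
    print(layers, round(fps(layers), 1))
# 2 25.0
# 3 16.7
# 6 8.3
```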
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Quoting Oswald: sure I'm not saying extra layer has no cost, I think doing it without much slowdown is possible.
The original single hires checkerboard effect uses 50% CPU on each of the two stripe layers approx., so a third one would make it go from., e.g., 25 FPS to 16 FPS. Quite noticeable. :)
And for 3 stacked checkerboards you can expect a third of the original speed, around 8 FPS.
The question about the 4th layer was how you'd render it, given that the 3 colours plus background are already taken.
ok you win, so it would be horribly slow, so for god's sake please nobody code it.
already with 3 layers it doesnt really add much visually, 4th can still eor ora or whatever despite the screen having just 2 bit depth. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: so it would be horribly slow, so for god's sake please nobody code it.
Doesn't need to be in the same size and resolution as Dawnfall did, does it? :)
And 3 stacked rotating zooming checkerboards do look quite good on other platforms (where speed isn't much of an issue). |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Quoting Oswald: so it would be horribly slow, so for god's sake please nobody code it.
Doesn't need to be in the same size and resolution as Dawnfall did, does it? :)
And 3 stacked rotating zooming checkerboards do look quite good on other platforms (where speed isn't much of an issue).
you mean flying through 3 level deep rotating chessboards? 3 chessboards means 6 layers, doesnt feel its gonna fly not even in 4x4. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
Quote:3 chessboards means 6 layers
? |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Quote:3 chessboards means 6 layers
?
dawnfall chessboard is made of 2 rotating stripe layers, which are perpendicular to eachother. you need 6 load operations to make 3 chessboards this way. Krill called 1 stripe layer a layer :) |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: dawnfall chessboard is made of 2 rotating stripe layers, which are perpendicular to eachother. you need 6 load operations to make 3 chessboards this way. Krill called 1 stripe layer a layer :)
I differentiate between "stripe layers" (half checkerboards) and "pixel layers" (full checkerboards), admittedly somewhat confusingly.
Anyways, in 4x4 there'd be no speed problem, and with a Dawnfall-like 16x16 multicolour tiles square, there are also techniques to make it somewhat smoother despite a low overall framerate. |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
"3 stacked rotating zooming checkerboards "
so what does this mean? 3 chessboard or 3 stripes ?
looks good on other system? you mean smth like 2nd reality? could emulate amiga bitplane motion "blur", yeah that would look ace, but needs high fps.
4x4 is also $0800 bytes fullscreen like a 16x16 char matrix. |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: "3 stacked rotating zooming checkerboards "
so what does this mean? 3 chessboard or 3 stripes ?
Which part of "checker""board" do you not understand? Forget the silly stripes for once, okay? :)
Quoting Oswald: looks good on other system? you mean smth like 2nd reality? could emulate amiga bitplane motion "blur", yeah that would look ace, but needs high fps.
Like the classic 2.5-D flight through the holes of checkerboards, but with added rotation about the depth axis.
Quoting Oswald: 4x4 is also $0800 bytes fullscreen like a 16x16 char matrix.
$03e8 = 1000 bytes for 3+1 colours. |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
and which part of this question you did not understand?
"you mean flying through 3 level deep rotating chessboards?"
because answering "you are using confusingly stripes and chessboards" is not an answer to this yes/no question.
frankly its totally pointless and tiresome to conversate with you, you are just looking for argumentative victory points, instead of exchanging information.
I must admit Its a thing I am guilty of myself aswell, maybe I am just looking at a mirror here. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
lol |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: frankly its totally pointless and tiresome to conversate with you, you are just looking for argumentative victory points, instead of exchanging information.
You might confuse me with somebody else, and i wasn't aware it's a competition, even after you brought up "you win" a couple of posts above. :)
Anyways, as for some information (or hints thereof): Some back-of-the-envelope calculations i made a while ago seem to indicate (if i interpret them right) that it's quite possible to get a decent frame-rate, by rolling out quite a bit more code and data than Graham could afford in a one-filer, and some eye-fooling partial update techniques. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11145 |
That only increases my expectations to see this implemented in your 4k for X :o) |
| |
Oswald
Registered: Apr 2002 Posts: 5026 |
Quote: Quoting Oswald: frankly its totally pointless and tiresome to conversate with you, you are just looking for argumentative victory points, instead of exchanging information.
You might confuse me with somebody else, and i wasn't aware it's a competition, even after you brought up "you win" a couple of posts above. :)
Anyways, as for some information (or hints thereof): Some back-of-the-envelope calculations i made a while ago seem to indicate (if i interpret them right) that it's quite possible to get a decent frame-rate, by rolling out quite a bit more code and data than Graham could afford in a one-filer, and some eye-fooling partial update techniques.
I also noticced in other threads you switched into this mode, you are only looking for your argument victory points and not for a meaningful conversation.
And for those points you are right no matter what.
Just look at our last dozen posts, I proposed the doability of 3 layers you say that would be too slow, but your 6 layers, now thats perfectly doable.
LOL |
| |
Krill
Registered: Apr 2002 Posts: 2854 |
Quoting Oswald: I also noticced in other threads you switched into this mode, you are only looking for your argument victory points and not for a meaningful conversation.
Please message me next time you notice this (and keep the fuss out of public threads), thanks. |