[CSDb] - User Forums - FASTER 3D GRAPHICS


Forums > C64 Coding > FASTER 3D GRAPHICS

2004-10-26 06:24

Stingray
Account closed

Registered: Feb 2003
Posts: 117

FASTER 3D GRAPHICS

I've heard it said before that the way the VIC chip addresses memory (8x8 cells) makes it slower for rendering graphics because of the extra calculations needed. So how would you have had the Commodore engineers design an alternative addressing mode so that 3D graphics could be calculated more quickly? I would really appreciate your ideas on this.
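For reference, the standard VIC-II hires bitmap stores each 8x8 cell as 8 consecutive bytes, with 40 cells per character row (320 bytes per row). The minimal Python sketch below shows the offset calculation and the "extra calculations" the question refers to. It assumes a 320x200 hires bitmap and returns offsets relative to the bitmap base.

```python
def vic_bitmap_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y) in a 320x200 VIC-II hires bitmap."""
    char_row = y // 8        # which row of 8x8 cells
    cell = x // 8            # which cell within that row
    line_in_cell = y % 8     # which of the cell's 8 bytes
    return char_row * 320 + cell * 8 + line_in_cell


def vic_bitmask(x: int) -> int:
    """Bit for pixel x within its byte; the leftmost pixel is bit 7."""
    return 0x80 >> (x % 8)


# Moving one pixel down is +1 inside a cell, but +313 when crossing
# from line 7 of one cell row to line 0 of the next.
print(vic_bitmap_offset(0, 7), vic_bitmap_offset(0, 8))  # prints: 7 320
```

The irregular step when crossing a cell boundary is what line drawers and fillers have to special-case on the real VIC.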

2004-10-26 11:17

Oswald

Registered: Apr 2002
Posts: 5020

the best would be a column-organized screen :) bytes 0-199 make up the first char column of the screen, 200-399 the 2nd column, etc.

so the upper left byte's addy is 0, the one next to it is 200, the next one is 400, etc.

The way today's speedy linedrawers, plotters, eorfillers, etc. work begs for a mode like that.
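A minimal Python sketch of the column-organized layout described above, assuming 40 columns of 200 bytes with the leftmost pixel in bit 7. Moving down is always +1 and moving one byte column right is always +200. The `plot_vertical_line` helper is a hypothetical illustration of why line drawers like this layout: the byte pointer just counts up.

```python
COLUMN_HEIGHT = 200  # one byte per pixel row in each 8-pixel-wide column


def column_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y) in a column-organized 320x200 screen."""
    return (x // 8) * COLUMN_HEIGHT + y


def plot_vertical_line(screen: bytearray, x: int, y0: int, y1: int) -> None:
    """Set pixels (x, y0)..(x, y1); the address only ever increments by 1."""
    mask = 0x80 >> (x % 8)
    addr = column_offset(x, y0)
    for _ in range(y0, y1 + 1):
        screen[addr] |= mask
        addr += 1


screen = bytearray(40 * COLUMN_HEIGHT)   # 8000 bytes
plot_vertical_line(screen, 9, 10, 12)    # column 1, bit 6, rows 10..12
```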

is this for C= one ? :) will it have a blitter please ? :)

2004-10-26 12:47

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Thanks for the response Oswald, and no, this isn't for C=ONE. I asked because, for my final-semester project in the advanced diploma course I'm doing, I have designed a simple graphics cct for an 8051 micro (not my choice; the project had to be related to the 8051). I have been considering adding RAS and CAS refresh and Phi1 and Phi2 signals to it so it can be interfaced to a 64. (I've also got a crazy idea where the VIC chip would be mounted on my cct board: my cct could sync itself to the VIC and gate data to it for normal VIC operation, giving the 64 the extra graphics mode plus normal operation.) At the very least I plan on interfacing my cct to a 64, and I thought I would ask about 3D graphics because I would like my cct to be useful for something.

2004-10-27 04:43

White Flame

Registered: Sep 2002
Posts: 136

I'd say an easy speedup would be to make the address offset for each line a power of 2, instead of having to add multiples of 40 or 320 to move down. Either change the horizontal resolution to 256/512/whatever, or pad each line with extra data to fill out the power of 2, and use that padding space for extra sprites/fonts/palettes or something.
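A minimal Python sketch of the padded-row idea, assuming a linear row-by-row layout where each 40-byte (320-pixel) row is padded to 64 bytes. The 64-byte stride is just one choice of power of 2. Stepping down a row becomes a shift instead of a multiply by 40, and the padding leaves 24 spare bytes per row.

```python
ROW_SHIFT = 6                 # 64-byte rows: 40 bytes of pixels + 24 bytes of padding
ROW_STRIDE = 1 << ROW_SHIFT
VISIBLE_BYTES = 40


def padded_row_offset(x: int, y: int) -> int:
    """Offset of the byte holding pixel (x, y); the row step is a shift, not *40."""
    return (y << ROW_SHIFT) | (x >> 3)   # x >> 3 <= 39 < 64, so OR == ADD


def unpadded_row_offset(x: int, y: int) -> int:
    """Same pixel on a packed 40-byte-per-row layout, for comparison."""
    return y * VISIBLE_BYTES + (x >> 3)


screen_size = 200 * ROW_STRIDE                       # 12800 bytes
padding_bytes = 200 * (ROW_STRIDE - VISIBLE_BYTES)   # 4800 bytes free for other data
```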

2004-10-27 08:18

Stingray
Account closed

Registered: Feb 2003
Posts: 117

White Flame, that's an awesome idea. I had thought about doing something like that when Oswald mentioned 40 columns of 200 bytes, but only because it's easy for me to redesign my cct that way. I didn't realise it was a coding advantage and was going to go down a different path (adding 200 for every column, then subtracting 7799 at the end of each line, or something equally bizarre and difficult). That's a bonus now that you've pointed out that it's actually a coding advantage. This adds speed and makes my cct simpler (as long as I stick to reading one byte per cycle, since an image would now take up 10K out of 16K).

2004-10-27 08:25

WVL

Registered: Mar 2002
Posts: 886

Even easier would be Oswald's idea, but on a 320 (or any value) x 256 pixel screen (instead of x200).

this way a simple inc/dec of the high byte of the address will move you left/right.
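A minimal Python sketch of the 256-byte column layout, assuming 40 columns of 256 rows (10240 bytes) with column 0 starting at a page boundary. `base_page` is a hypothetical parameter. The high byte of the address selects the 8-pixel column and the low byte is the pixel row, so moving one column left or right changes only the high byte.

```python
def column256_address(base_page: int, x: int, y: int) -> int:
    """Address of the byte holding pixel (x, y) on a 320x256 column screen.

    base_page is the high byte where column 0 starts (hypothetical parameter).
    """
    high = base_page + (x >> 3)   # which 8-pixel column
    low = y & 0xFF                # which pixel row
    return (high << 8) | low


addr = column256_address(0x20, 0, 10)
right = column256_address(0x20, 8, 10)
print(hex(addr), hex(right))  # prints: 0x200a 0x210a (one column right = high byte + 1)
```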

2004-10-27 09:16

Stingray
Account closed

Registered: Feb 2003
Posts: 117

at least none of the 10K would be wasted that way.

2004-10-27 10:29

Oswald

Registered: Apr 2002
Posts: 5020

wvl: you mean 40byte*256 do u ?:) or do u want a 256 color mode ? :)

2004-10-27 10:43

Stingray
Account closed

Registered: Feb 2003
Posts: 117

I'm pretty sure he meant 320 x 256 pixels. Anyway, I'm trying to keep this simple, so there will be no 256 color mode, at least at the moment. It's taken considerable time to design the cct I've got and so far it's pretty basic. The more complex I make it, the longer it will take me to design it.

2004-10-27 10:57

hollowman

Registered: Dec 2001
Posts: 474

only 16 colors, but one byte per pixel is fine with me

2004-10-27 11:37

Stingray
Account closed

Registered: Feb 2003
Posts: 117

1 byte per pixel would be 64k and my cct will only be addressing 16k like the VIC. Or are you thinking my cct should have its own onboard RAM or something?

2004-10-27 11:39

Stingray
Account closed

Registered: Feb 2003
Posts: 117

It's a good idea and would certainly save some calculations.

2004-10-27 12:46

WVL

Registered: Mar 2002
Posts: 886

i mean 320x256 pixels :)

2004-10-27 20:31

MagerValp

Registered: Dec 2001
Posts: 1056

Yes, the C1 has a blitter.

2004-10-28 07:53

Stingray
Account closed

Registered: Feb 2003
Posts: 117

While trying to keep it simple, I have had an idea that should be easy to add to the design, but not knowing much at all about 3D graphics I would like to know if it's worth including. My idea is to use either the V-blank area or the raster lines above and below the bitmap to move memory around while the graphics cct doesn't need to access memory anyway. Transferring 1 byte takes 2 clock cycles; 63 cycles per line minus 5 for refresh leaves 58, so that's 29 bytes per raster line. If I use just the V-blank area (28 lines), it could transfer 812 bytes, which would probably be reduced to 512 or 768 when put into my design. If I use all the raster lines above and below the image (minus the 4 lines I might need for RAM access, depending on the design), it could do 3132 bytes, which would reduce to 3072 (3K) in the design. Using all the area above and below the display will give better results but may restrict some future enhancements I may wish to make to the circuit. This would actually be pretty simple to add to the design: I'm thinking a couple of addressable latches for the transfer pointers (these are the bytes the programmer would write to), one latch for the cct to read into and write from (this holds the byte being transferred), and an n-bit ripple counter.
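The transfer budget above can be checked with a few lines of Python. One assumption here is that the 3132-byte figure comes from the 312 PAL lines minus the 200 bitmap lines minus the 4 reserved lines. The post doesn't say so, but that gives exactly 108 lines. The rounding to 256- and 1024-byte multiples is also an assumption about how the design would trim the figures.

```python
CYCLES_PER_LINE = 63
REFRESH_CYCLES = 5
CYCLES_PER_BYTE = 2      # one read + one write per transferred byte
PAL_LINES = 312
BITMAP_LINES = 200
VBLANK_LINES = 28
RESERVED_LINES = 4       # lines that might be needed for RAM access (assumption)


def round_down(n: int, unit: int) -> int:
    """Largest multiple of unit that is <= n."""
    return n - n % unit


bytes_per_line = (CYCLES_PER_LINE - REFRESH_CYCLES) // CYCLES_PER_BYTE
vblank_bytes = VBLANK_LINES * bytes_per_line
border_lines = PAL_LINES - BITMAP_LINES - RESERVED_LINES
border_bytes = border_lines * bytes_per_line

print(bytes_per_line, vblank_bytes, round_down(vblank_bytes, 256))   # prints: 29 812 768
print(border_lines, border_bytes, round_down(border_bytes, 1024))    # prints: 108 3132 3072
```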

2004-10-28 09:13

WVL

Registered: Mar 2002
Posts: 886

instead of trying to make a blitter, you could rather add an option to have it clear memory. Clearing the gfx screen will CERTAINLY speed up 3d gfx a lot!

2004-10-28 10:50

Stingray
Account closed

Registered: Feb 2003
Posts: 117

That's not a bad idea. It would be almost as easy to make it fill a segment of memory with any byte, including #$00, if that would be better. So you think this clear/fill thing would be better for 3D speed than data transfer?

2004-10-28 13:39

Oswald

Registered: Apr 2002
Posts: 5020

yep, awesome idea from wvl. how about adding an eorfiller as well? that's another bottleneck of 3d gfx, and should be plain ass simple to realize.

2004-10-28 14:55

Stingray
Account closed

Registered: Feb 2003
Posts: 117

OK, data transfer is out and filler is in. I have two options with the filler.

A: Only fill 6K per frame

B: Add 30 badlines and steal about 1500 cpu cycles/frame

I don't plan on having any badlines for normal graphics access (at least not at the moment). If I did go for option B, the badlines would only be switched on for frames when filling was being carried out. Option A is simpler to design but is clearing 6k per frame acceptable?

Oswald, this could be a stupid question (I know next to nothing about 3D graphics) but do you mean for example- EOR 3K of memory with byte #$80? If so this would take 2 cycles per byte meaning 3 frames to EOR 8k using option A or 2 frames using option B. Is this good enough? You wouldn't be able to byte fill on the same frame you EOR fill (that would be pointless anyway, wouldn't it?). Just wondering, what would be more useful if I had to decide between one or the other, Byte fill or EOR fill?

I know I have asked a lot of questions and you fellows have helped me out heaps. I am starting to get an idea of what the spec will be. I must stress that this cct will be VERY low spec (at least for now), no frills at all, just quicker for 3D.

2004-10-28 17:01

WVL

Registered: Mar 2002
Posts: 886

What Oswald means is this :

lda column0

eor column1
sta column1

eor column2
sta column2

etc

eor column199
sta column199

if you run a routine like that, you only have to draw 2 pixels in a column to have it fill the lot.

the first pixel turns the pixel in the accumulator on, the 2nd pixel turns it off again => you filled a column, and pretty fast as well!
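A minimal Python model of the eor/sta loop above for a single column, assuming the accumulator starts at 0. Each byte is EORed into the running value, and the result is stored back. Two edge bytes with the same bits set fill everything from the first edge down to (but not including) the second edge.

```python
def eor_fill_column(column: bytearray, start: int = 0) -> bytearray:
    """EOR-fill one screen column in place, top to bottom.

    acc plays the role of the 6502 accumulator: each byte is EORed into it
    and the running result is stored back, like the eor/sta pairs above.
    """
    acc = start
    for i in range(len(column)):
        acc ^= column[i]
        column[i] = acc
    return column


col = bytearray(8)
col[2] = 0b00011000   # first edge: turns the pixels on
col[6] = 0b00011000   # second edge: turns them off again
eor_fill_column(col)
print([format(b, "08b") for b in col])   # rows 2..5 are 00011000, the rest 00000000
```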

eorfill and simple clearing will MAJOR speed things up.

of course you need at least 2 banks for gfx.

edit :

even better, it saves heaps of memory ;) a lot of coders have an eorfiller and a normal filler like this unrolled in memory! Also note I've only done 3d graphics once, and very lousily at that.. don't take my word for anything ;)

2004-10-29 08:14

Oswald

Registered: Apr 2002
Posts: 5020

wvl: to be more precise:

lda 0
eor 1
sta 1

eor 2
sta 2

...

eor 199
sta 199

column #2:

lda 200
eor 201
sta 201

...

eor 399
sta 399

IMHO the lda at the top of each column is negligible, and the circuit can do just eor sta eor sta.. if the design is simpler that way. OR the lda might be replaced to load the accu with a fixed value (preferably a selectable value).

How hard it would be to implement lines ? :)

2004-10-29 10:34

WVL

Registered: Mar 2002
Posts: 886

i think the first lda should be inside the eorfill itself. it makes no sense to have to wait for the exact cycle the chip has finished eorfilling one column, then do an lda, and give the chip orders to start the next column.

it should be made easier -> the chip just has to do the first lda #0 by itself..

2004-10-29 11:15

Oswald

Registered: Apr 2002
Posts: 5020

the chip could do an lda #xx itself, at the top of each column

2004-10-29 12:58

WVL

Registered: Mar 2002
Posts: 886

yes, but also the chip has to do all columns by itself. it takes too much time to tell the chip to eorfill all columns individually.
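A sketch of what WVL and Oswald describe: the circuit fills all 40 columns by itself and reloads the accumulator with a chosen value (the "lda #xx") at the top of every column. The Python below assumes the 40 x 200-byte column layout. `chip_eor_fill` and its `start` parameter are hypothetical names, not a register interface from the design.

```python
COLUMNS = 40
HEIGHT = 200   # column-organized screen: column c occupies bytes c*200 .. c*200+199


def chip_eor_fill(mem: bytearray, start: int = 0) -> None:
    """Hypothetical whole-screen EOR fill done by the circuit in one go.

    The accumulator is reloaded with `start` at the top of every column,
    so the CPU never has to restart the fill column by column.
    """
    for c in range(COLUMNS):
        acc = start & 0xFF
        base = c * HEIGHT
        for y in range(HEIGHT):
            acc ^= mem[base + y]
            mem[base + y] = acc


screen = bytearray(COLUMNS * HEIGHT)
screen[5] = 0xFF    # top edge in column 0
screen[10] = 0xFF   # bottom edge in column 0
chip_eor_fill(screen)
```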

2004-10-29 13:48

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Alright, so an EOR filler only works on vertical lines? It looks like it's only one access per cycle, so the cct could EOR one byte per cycle (same as the byte filler). This should be pretty easy to design and would probably share a lot of common logic with the byte filler; it could even be a mode of the byte filler, with a control bit that turns EOR on or off. How does that sound?

About my previous question of whether to go with option A (fill 6K per frame) or option B (add the 30 badlines and fill 8K per frame): I'm leaning towards option A simply because it's easier to design. This would mean two frames to clear a screen and another two frames to EOR fill. (It's important to remember that these fills cost no CPU cycles, since they are all done during the video cct's memory-access part of the clock cycle.) I've got no idea what frame rate we should be aiming for, or even what typical frame rates are for full-screen 3D on the C64. What I really want to know is: are 4 frames to clear and EOR fill (plus all the normal CPU cycles per frame to do whatever it is 3D graphics coders do) going to cut it?

2004-10-29 17:52

WVL

Registered: Mar 2002
Posts: 886

idea :

why not do the eor-filling during display of the gfx data? so you can toggle a switch to have a normal display mode, or an eor-type display mode.

i imagine when in eor-display mode, instead of fetching a byte from memory to display it, you eor it with a byte in your internal buffer, and only then draw.

-> saves 2 frames, not? :)

during clearing in the other 2 frames, you should have enough time to calculate and draw nice 3d shapes, so with triple buffering, 25 fps should be possible in that case..
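A minimal Python model of EOR filling on the fly, assuming the 40 x 200-byte column layout and a 40-entry internal buffer (one running value per column). The raster scans row by row, so each fetched byte is EORed into its column's buffer entry, and that result is what gets displayed. Memory itself is only read and keeps just the edges.

```python
def eor_display_frame(mem: bytes, columns: int = 40, height: int = 200):
    """Return the displayed frame as a list of rows of bytes.

    `mem` is column-organized and is never written back; the fill only
    exists in the video output.
    """
    acc = [0] * columns            # the 40-byte internal buffer, one per column
    frame = []
    for y in range(height):        # raster order: one pixel row at a time
        row = []
        for c in range(columns):
            acc[c] ^= mem[c * height + y]
            row.append(acc[c])
        frame.append(row)
    return frame


mem = bytearray(40 * 200)
mem[3] = 0x3C   # column 0, row 3: top edge
mem[7] = 0x3C   # column 0, row 7: bottom edge
frame = eor_display_frame(mem)
```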

2004-10-30 02:11

Stingray
Account closed

Registered: Feb 2003
Posts: 117

So just use 40 bytes to hold the next EOR for each column; that sounds like a better solution. Actually, I think that's one of the best ideas so far. Wow, 2 frames to clear and EOR, that does sound good. So the C64's memory won't be EOR filled but the display will be; could that be a problem for anyone?

Triple buffer? Is that like an 8K actual screen image, an 8K image being prepared for the screen and an 8K image being cleared? I'm just trying to get a better understanding of what's going on from the coder's perspective.

P.S. Just had an idea, if I made the 40 bytes addressable the coder could write to them before the screen is drawn. You can now EOR each column with whatever byte you want; you could even use interrupts to change the EOR byte at a specific point on the screen. Which brings me to my next question, what are the reasons for changing the EOR byte for different columns anyway?

2004-10-30 16:45

WVL

Registered: Mar 2002
Posts: 886

yes. one screen for displaying, one screen for drawing in and one screen for clearing.
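A minimal Python sketch of the triple-buffer rotation WVL describes. The bank names "A", "B" and "C" are placeholders. On real hardware they would be bank numbers written to some display-pointer register, which has not been designed in this thread. On each flip, the finished drawing becomes visible, the old visible screen goes off to be cleared, and the freshly cleared screen becomes the next drawing target.

```python
from collections import deque


class TripleBuffer:
    """Rotate three screen banks between the display, draw and clear roles."""

    def __init__(self, banks=("A", "B", "C")):
        self._roles = deque(banks)   # order: display, draw, clear

    @property
    def display(self):
        return self._roles[0]

    @property
    def draw(self):
        return self._roles[1]

    @property
    def clear(self):
        return self._roles[2]

    def flip(self):
        # draw -> display, clear -> draw, display -> clear
        self._roles.rotate(-1)


tb = TripleBuffer()
tb.flip()
print(tb.display, tb.draw, tb.clear)   # prints: B C A
```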

about the eor-ing : it's vital you understand that you don't EOR with a fixed byte, but with the next column-byte!

let me give an example :

original data

0000
0011
1100
0000
0000
0000
0011
1100

which after eor-ing like

lda #0
eor column1
sta column1
eor column2
sta column2

will look like

0000
0011
1111
1111
1111
1111
1100
0000

-> the area between the pixels that were set is filled with that specific pattern (11 in this case)
there's no such thing as a specific EOR byte for each column, you just eor the data in memory with the next byte to display, which fills areas for you.
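WVL's example data can be run through a straight eor-fill in Python. The 4-bit values stand in for one column of the screen, and the accumulator starts at 0, as in the `lda #0` above.

```python
def eor_fill(rows, start=0):
    """Return the EOR-filled copy of a column: each output is the running EOR."""
    acc = start
    out = []
    for r in rows:
        acc ^= r
        out.append(acc)
    return out


original = [0b0000, 0b0011, 0b1100, 0b0000, 0b0000, 0b0000, 0b0011, 0b1100]
filled = eor_fill(original)
for value in filled:
    print(format(value, "04b"))
# prints 0000, 0011, 1111, 1111, 1111, 1111, 1100, 0000 (one per line)
```

Every row between an "on" edge and the matching "off" edge is filled, and each row's fill depends only on the bytes above it in the same column.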

2004-10-31 00:26

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Yeah, What I wrote wasn't that clear. The 40 bytes are for setting what the first byte will EOR with at the start of each column. This byte will not EOR with every byte of the column but just the first byte and the 40 bytes are used to hold the result of that EOR and then the next, giving the same result as your example. Something Oswald said earlier sounded like it was desirable to be able to load the first EOR byte at the start of each column, which is what I was talking about. Did I misunderstand what Oswald was saying or is there a reason for doing this? Would this be used for filling in the background or something?

Another question: this kind of filling could be done in hardware for a screen that addresses the graphics data in rows instead of columns (it would just be a flip-flop, not an EOR). So, with this in consideration, would it be faster (simpler calculations) to have the screen made up of 200 rows rather than 40 columns?

e.g.
ROW1: byte0 byte1 byte2. . . . . . . . . . byte39
ROW2: byte64 byte65 byte66 . . . . . . . byte103
all the way down to row 200

rather than

ROW1: byte0 byte256 byte512 . . . . . . . byte9984
ROW2: byte1 byte257 byte513 . . . . . . . byte9985
all the way down to row 200

This is a rather important question and will have a major effect on which way I go with the design, so feedback would be very helpful.

Another equally important question: color information. I could make the screen out of layers of screens, e.g. two screens seen as one, giving 2 bit planes (4 colors). Or I could use 1 byte per pixel (like someone suggested earlier); of course filling would still be done in either option. It's just a question of which option is faster for coders to do their 3D calculations in. The only problem I see is that with one byte per pixel, the screen will have to be drawn a part at a time (or you use 64k for the image, not likely) and sent to the graphics cct, just as each bit plane would be sent individually to the cct.

2004-10-31 21:59

WVL

Registered: Mar 2002
Posts: 886

I think column-based memory would be faster..

2004-11-01 05:47

Stingray
Account closed

Registered: Feb 2003
Posts: 117

I've been discussing that with the fellows on CBM Hacking and they agree with you. It's also a little more memory efficient. I'm also leaning towards bit planes; 1 byte per pixel is going to get messy unless I give the cct its own memory that gets banked into the C64 memory map, in which case 1 byte per pixel could be worth doing. Which way would be better for 3D calculations?

2004-11-02 10:01

Oswald

Registered: Apr 2002
Posts: 5020

WVL's eor on the fly idea is awesome :)

column based screen is better. (you can adress 8 pixels in one column by one lda/sta using different bitmasks, you need 8 times the code to do that if row based)

"Something Oswald said earlier sounded like it was desirable to be able to load the first EOR byte at the start of each column, which is what I was talking about"

in some cases you might not want the eor filling to be started with 0. being able to choose the value adds some level of variability, and shouldn't be hard to implement imho.

I'm against bit planes. Look at the amiga HW: it was something kewl at the start, but later became the biggest bottleneck of fast 3d gfx. IMHO 4 colours ought to be enough 4 everyone :) bitpairs, or simple hires, is the fastest solution. Drawing into 2 bitplanes = twice as many write instructions, more address calculations, etc, etc. Byte based pixels are faster. (you can also forget about lda ora sta, just lda sta, while with 2 bitplanes you do 2x lda ora sta)

2004-11-02 11:59

Graham
Account closed

Registered: Dec 2002
Posts: 990

oswald, you can eor with any value if you eor the first byte of your buffer with it. you don't need any other initialization than 0.
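Graham's point can be checked in Python. Starting the fill with value `v` gives the same result as starting at 0 after EORing `v` into the first byte, because the accumulator after the first byte is `v ^ column[0]` either way. The helper names below are hypothetical.

```python
def eor_fill(column, start=0):
    """Return a new list holding the EOR-filled column."""
    acc = start
    out = []
    for b in column:
        acc ^= b
        out.append(acc)
    return out


def seeded_by_first_byte(column, value):
    """Fill starting from 0, after EORing `value` into the first byte."""
    seeded = list(column)
    seeded[0] ^= value
    return eor_fill(seeded, start=0)


col = [0x00, 0x18, 0x00, 0x00, 0x18, 0x00]
print(eor_fill(col, start=0xAA) == seeded_by_first_byte(col, 0xAA))   # prints: True
```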

2004-11-02 12:56

Stingray
Account closed

Registered: Feb 2003
Posts: 117

It definitely looks like a column-based screen is the way to go. I thought bitplanes would be OK (if using only 2 or 4 bitplanes): if you want a particular color, you just draw the same lines on the appropriate planes. I wouldn't have thought that was much more calculation, especially if clear and fill are being done in hardware.

I'm surprised you brought up bit pairs; I wonder what other people think of using bit pairs and whether losing half the horizontal resolution would bother them?

If there are enough cells on the CPLD, I would like to include both bitplane and byte per pixel (is there a proper term for this?). I know bitplanes are a bit slower, but it's going to be awkward dealing with a 64000 byte screen on the 64. The only practical way of doing it is switching between two banks while building up the screen, with 32k in each bank. Is that going to be acceptable?

Filling on a byte per pixel screen would be a bit different (just solid fills of whatever color) and wouldn't use EOR, but would still be done on the fly.

2004-11-02 15:29

Oswald

Registered: Apr 2002
Posts: 5020

great, my post is lost.. anyway

graham: true

stingray:

byte per pixel = chunky mode

writing 2-4 pixels at a time can make the fastest existing lineroutines go 2-3 times slower. thats why I say no to planar.

how about a 4bit/pixel mode ? 2 pixels in a byte ? so the 32000 byte limit is solved.
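A minimal Python sketch of a 4-bit-per-pixel chunky mode. It assumes a linear row-by-row layout with the left pixel of each pair in the high nibble; the thread hasn't fixed either choice. At 320x200 this needs 160 bytes per row, 32000 bytes in total, so it fits without bank switching.

```python
WIDTH, HEIGHT = 320, 200
BYTES_PER_ROW = WIDTH // 2              # two 4-bit pixels per byte -> 160
SCREEN_BYTES = BYTES_PER_ROW * HEIGHT   # 32000


def set_pixel_4bpp(buf: bytearray, x: int, y: int, colour: int) -> None:
    i = y * BYTES_PER_ROW + x // 2
    if x % 2 == 0:   # assumption: left pixel lives in the high nibble
        buf[i] = (buf[i] & 0x0F) | ((colour & 0x0F) << 4)
    else:
        buf[i] = (buf[i] & 0xF0) | (colour & 0x0F)


def get_pixel_4bpp(buf: bytes, x: int, y: int) -> int:
    b = buf[y * BYTES_PER_ROW + x // 2]
    return (b >> 4) if x % 2 == 0 else (b & 0x0F)


screen = bytearray(SCREEN_BYTES)
set_pixel_4bpp(screen, 0, 0, 0xA)
set_pixel_4bpp(screen, 1, 0, 0x5)
set_pixel_4bpp(screen, 319, 199, 0xF)
```

Unlike the 8-bit chunky mode, each pixel write here is a read-modify-write of one byte, because two pixels share it.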

switching between 2 banks is no problem, if the screen is halved vertically.

2004-11-02 20:48

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Chunky mode, thanks.

2 Pixels per byte sounds alright, and with chunky mode, the two banks would split the screen vertically.

I guess I could go either way, which way do you think would be fastest for coding with?

2004-11-03 07:12

Graham
Account closed

Registered: Dec 2002
Posts: 990

planar is completely useless if you have pixeldepths of 2, 4 or 8 bits per pixel. chunky is the way to go.

2004-11-03 07:35

Stingray
Account closed

Registered: Feb 2003
Posts: 117

So it sounds like bit planes are a big NO. The only choice is between 1 byte per pixel (two banks for one image) or 4 bits per pixel (and no need to bank switch). What do you think, Graham? Live with the bank switching? How much difference would it make?

That brings me to my next question, If I end up going with chunky, the graphics memory will no longer be in columns of 8bits. What is the best way to address graphics data in chunky mode?

We are slowly narrowing the design down, which is good since I have to have a pretty certain idea where the design is heading before I start redesigning the cct.

2004-11-03 09:04

Graham
Account closed

Registered: Dec 2002
Posts: 990

packing as many pixels in a byte as possible is often a good idea. if you access the screen in blocks it doesn't hurt but even helps a bit. for single pixel access of course one byte per pixel is the best solution.

2004-11-03 10:14

Oswald

Registered: Apr 2002
Posts: 5020

chunky 4bit/8bit modes are fine with me. How will be the colors defined?

in these modes the mem layout is not that big a question anymore. pc style linear is as fine as columns, but for some reason I'd still vote for columns tho.

bank switching is ok, should be just a tiny problem when coding.

but how much memory will this have ? how about doublebuffer/triplebuffer?

(how about blitter line drawing?:)

2004-11-03 10:55

Stingray
Account closed

Registered: Feb 2003
Posts: 117

How much memory? You will have plenty of banks to switch in and out of the 6510's 64k, well, that's if my idea works. I will explain my idea for the memory banking in a post very soon. Triple buffering? Not a problem at all, heaps of memory (but 64k max addressable from the 6510 at a time). You can even run code from the graphics cct's memory (when banked in).

2004-11-04 11:11

Oswald

Registered: Apr 2002
Posts: 5020

one more thing I forgot last time:

how about 2x2/4x4/8x8 modes ? we coders would love the hardware do these, instead of hacking the vic, and wasting cpu power :)

stingray, you haven't answered my question: how would colors be defined? ...16 shades of 1 color looks much nicer than 16 "random" colors in some palette order.

2004-11-04 12:58

Stingray
Account closed

Registered: Feb 2003
Posts: 117

I'm not sure about colors yet. I'm not even sure how I will design the circuitry for the colors yet, I'm guessing that VIC has an Op Amp and had different resistor values to give different frequencies for the different colors? I would like to have 256 colors available (but I'm not promising anything).

2x2, 4x4, 8x8? does that mean like doubling pixel size?

2004-11-04 19:37

Cybernator

Registered: Jun 2002
Posts: 154

> 2x2, 4x4, 8x8? does that mean like doubling pixel size?

Considering _plain_ 2x2, 4x4 or 8x8, yes it's like doubling the pixel size.

@Oswald: What kind of VIC hacking do you need for 8x8? Chars are normally 8x8, rite?

2004-11-04 23:17

JackAsser

Registered: Jun 2002
Posts: 1989

http://www.pepto.de/projects/colorvic/

This link describes the colors of the C64. At the bottom of the document is an email from the dude who helped design the color circuitry in the VIC2-chip. Check it out, it might give some hints.

2004-11-05 08:28

Oswald

Registered: Apr 2002
Posts: 5020

stingray: yep, 2x2 is a pixel sized 2 times as wide and 2 times as tall as the smallest possible one, and so on..

cybernator: yep 8x8 doesn't need hacking, was just too lazy to write "except 8x8.."

jackasser: damn, I wanted to point out that link as well :)

2004-11-05 13:31

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Thanks for the link. 2x2, 4x4 and 8x8, why would you want this? speed? effects?

Just in case you haven't noticed, I'm not in the Demo scene :)

2004-11-05 13:53

Graham
Account closed

Registered: Dec 2002
Posts: 990

speed and memory efficiency.
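The memory side of Graham's answer can be shown with a short Python sketch of a "factor x factor" pixel mode. The hardware would store a low-resolution chunky image and repeat each stored pixel into a block on output. The function name is hypothetical, and the byte counts assume a 320x200 screen at 1 byte per pixel.

```python
def expand_pixels(src, factor):
    """Show a low-resolution chunky image the way a factor x factor mode would.

    src is a list of rows; each stored pixel becomes a factor x factor block.
    """
    out = []
    for row in src:
        wide = [p for p in row for _ in range(factor)]   # repeat horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat vertically
    return out


# Stored bytes for a 320x200 chunky screen in 1x1, 2x2, 4x4 and 8x8 modes.
for f in (1, 2, 4, 8):
    print(f, (320 // f) * (200 // f))   # 64000, 16000, 4000, 1000
```

Fewer stored pixels also means fewer bytes to clear, fill and draw per frame, which is where the speed comes from.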

2004-11-05 14:37

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Thanks Graham. I had to give a preliminary demonstration of my 8051 hardware today (the final presentation will be in a couple of weeks). The cct is really low spec but should give you an idea of what I've got so far. The 8051 cct will be the basis of the C64 cct, just modified a lot.

8051 GRAPHICS CARD SPECS

OUTPUT: PAL (non-interlaced)
COLORS: 2 (B&W) Yeehhaaa
RESOLUTION: 320x200
GRAPHICS DATA: HORIZONTALLY FORMATTED
RAM: 8K
IC COUNT: 6 ICs (could be brought down to fewer but won't be)
LINES: 312
CYCLES PER LINE: 63

This graphics card is connected to an 8051 microcontroller board (with microcontroller, RAM, ROM, address decoder etc) to make a simple micro computer system. I have never heard of someone else interfacing an 8051 to a TV before, although maybe it's been done hundreds of times.

2004-11-06 11:29

Graham
Account closed

Registered: Dec 2002
Posts: 990

i had a nice idea: if you have different color depths like 1, 2 and 4 bits per pixel, it would be very nice to extend the address lines to the "bottom" (below A0) to have some kind of soft scrolling.
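Here is one reading of Graham's idea, as a Python sketch. With 1, 2 or 4 bits per pixel, a few extra address bits below A0 (3, 2 or 1 of them) pick the starting pixel inside a byte. The display fetch can then begin mid-byte and scroll one pixel at a time. The sketch models a single raster row. The function name and the MSB-first pixel order are assumptions, not part of any design in the thread.

```python
def visible_row(row_bytes: bytes, start_pixel: int, width: int, bpp: int = 1):
    """Return `width` pixel values starting at pixel address `start_pixel`.

    start_pixel // pixels_per_byte is the normal byte address (A0 and up);
    start_pixel % pixels_per_byte is the extra sub-byte part "below A0".
    """
    pixels_per_byte = 8 // bpp
    mask = (1 << bpp) - 1
    pixels = []
    for p in range(start_pixel, start_pixel + width):
        byte = row_bytes[p // pixels_per_byte]
        shift = 8 - bpp * (p % pixels_per_byte + 1)   # leftmost pixel in the top bits
        pixels.append((byte >> shift) & mask)
    return pixels


row = bytes([0b10000000, 0b00000001])
print(visible_row(row, 0, 8))   # prints: [1, 0, 0, 0, 0, 0, 0, 0]
print(visible_row(row, 8, 8))   # prints: [0, 0, 0, 0, 0, 0, 0, 1]
```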

2005-02-02 13:14

Stingray
Account closed

Registered: Feb 2003
Posts: 117

I've been on holidays the last few days, so I finally interfaced my graphics cct to my 64; now my C64 has a brand new graphics mode :) There is a small problem with the static RAM that I still have to iron out: about 1 in every 1000 writes does a partial write (invalid data) to a random address, as well as writing the correct byte at the correct address. But other than that it's working fine :)

2005-02-03 11:16

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Hey, I got the problem with that static RAM sorted out :)

I guess the next step for me is to decide what features will make it into the redesign of the video cct and then actually start the redesign. So it will probably be a while before I have anything more to post.

2005-02-14 05:58

Death Demon
Account closed

Registered: Feb 2005
Posts: 68

I'm going through this trying to figure out exactly what you want to do. The question posed is really simple enough, but the responses thus far are confusing to me.

1) Do you want to come up with a memory controller for the VIC chip that manages memory transfers better?

2) What memory chips are you using? Are you able to use more than one?

3) Can you do front/back buffer rendering?

Things you should consider:

1) Memory chips have different latencies associated with addressing different segments of memory. Figure out what those crossover points are and map them to the application you are doing. For instance, it may make sense to address memory in "tiles" versus lines or columns in order to avoid page flips.

2) Using multiple memories may aid in that you can map alternating lines and use the LSB of the line number to indicate which memory you are reading from. This also helps with tile mode rendering or any other mode. It allows you to hide memory accesses for one RAM while you are processing the request from the other RAM.

3) Try to avoid full frame blits. It's best to use a pointer that changes to point at the new frame once it's ready for display.
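A minimal Python sketch of point 2, assuming two RAM chips, 40 bytes per line and a plain linear line layout inside each chip. The LSB of the line number picks the chip, so consecutive raster lines alternate between chips, and one chip can serve the display while the other is being written.

```python
BYTES_PER_LINE = 40


def locate(x_byte: int, y: int):
    """Return (ram_chip, offset_in_chip) for byte column x_byte of line y."""
    chip = y & 1                                   # LSB of the line number picks the RAM
    offset = (y >> 1) * BYTES_PER_LINE + x_byte    # line index within that chip
    return chip, offset


for y in range(4):
    print(y, locate(0, y))   # lines alternate between chip 0 and chip 1
```

With 200 lines, each chip holds 100 lines (4000 bytes).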

2005-02-14 14:23

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Hey Death,

Q1: What I'm designing is a simple PAL (possibly NTSC later) graphics cct that will have its own RAM. The idea is to have the VIC II still running as normal. The two ccts will be synced so that the raster will be at the same point whether it is being generated by my cct or the VIC II. The programmer can switch between the VIC II (default) and my graphics cct. The main idea is to have a graphics cct in a C64 where the user would not even have to be aware of its existence. I want this cct to be transparent to the user (no additional video cables, no extra switches or anything).

Q2: At the moment I'm just using an 8K static RAM, but this is only for trialling my current design. As the design progresses there will be a very large amount of RAM for the graphics cct (it will be necessary for the graphics modes). The idea I'm thinking about for RAM is to be able to bank 8K sections of graphics RAM (which there will be a lot of) in to the 6510. Alright, I'll try to explain this as best as I can. I'm thinking of designing the cct so the programmer can, for each 8K block (banked into the 6510), choose either "Read C64 RAM / Write C64 RAM" (normal operation), "Read C64 RAM / Write Graphics RAM", "Read Graphics RAM / Write Graphics RAM" or "Read Graphics RAM / Write C64 RAM". OK, that might not have made sense; basically you can choose whether the 6510 will read from its C64 RAM or the graphics RAM, and whether it will write to C64 RAM or to the graphics RAM. You could even write to both C64 RAM and graphics RAM at the same time if you wanted to. Anyway, that's what I've got in mind; it will be very flexible for the programmer.
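A hypothetical Python model of the per-8K-block routing described in Q2. For each of the eight 8K blocks in the 6510's 64K, the CPU reads from either C64 RAM or graphics RAM, and writes to C64 RAM, graphics RAM, or both. The class names, the `gfx_bank` page-selection list and the power-on defaults are all assumptions made for illustration. The thread doesn't define any register layout.

```python
from enum import Enum, Flag

BLOCK_SIZE = 0x2000          # 8K blocks: eight of them cover the 6510's 64K


class ReadFrom(Enum):
    C64 = 0
    GFX = 1


class WriteTo(Flag):
    C64 = 1
    GFX = 2                  # WriteTo.C64 | WriteTo.GFX writes to both


class BankedBus:
    """Hypothetical model of per-8K-block read/write routing."""

    def __init__(self, gfx_size: int = 0x10000):
        self.c64 = bytearray(0x10000)
        self.gfx = bytearray(gfx_size)
        self.read_from = [ReadFrom.C64] * 8   # assumed power-on state: normal C64
        self.write_to = [WriteTo.C64] * 8
        self.gfx_bank = [0] * 8               # which 8K page of graphics RAM each block sees

    def _gfx_addr(self, addr: int) -> int:
        block = addr // BLOCK_SIZE
        return self.gfx_bank[block] * BLOCK_SIZE + addr % BLOCK_SIZE

    def read(self, addr: int) -> int:
        if self.read_from[addr // BLOCK_SIZE] is ReadFrom.GFX:
            return self.gfx[self._gfx_addr(addr)]
        return self.c64[addr]

    def write(self, addr: int, value: int) -> None:
        target = self.write_to[addr // BLOCK_SIZE]
        if WriteTo.C64 in target:
            self.c64[addr] = value
        if WriteTo.GFX in target:
            self.gfx[self._gfx_addr(addr)] = value


bus = BankedBus()
bus.write_to[2] = WriteTo.C64 | WriteTo.GFX   # $4000-$5FFF: write to both RAMs
bus.write(0x4000, 0x55)
```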

Q3: What is front/back buffer rendering?

Point 1: Could you go into a bit more detail with the tiles thing. Is it common for graphics ccts to use this method? Is the size of the tiles normally defined by the programmer or set in hardware?

Point 2: Do you mean so that something like a graphics processor can be accessing more then one RAM at a time?

Point 3: With the blitter thing, I agree with what you say, but I don't think there was a lot of demand for a blitter. How important do you think having a blitter is?

If you haven't noticed, I don't know much about 3D graphics or rendering, so sometimes I will need some stuff (even simple stuff) explained to me. Also, I want to keep this cct very basic. I don't want this to be one of those projects that never get finished because the person doing it got too carried away and set themselves an unachievable goal (that's why I'm making no promises with the specs).

Thanks Death, please keep the suggestions coming.

P.S. I've started work on the color cct, It may take me a while to complete this part of the cct since I probably won't have much spare time for a while.

2005-02-14 19:44

Death Demon
Account closed

Registered: Feb 2005
Posts: 68

The questions you're asking can't necessarily just be answered with a right or wrong answer. Mostly, it depends on your specific requirements. So I can give you some pointers for general system architecture with some caveats so that you can make an intelligent choice as to whether or not that particular application makes sense for your use or not.

If you are using an on-die 8K SRAM, then you should be able to access data per clock regardless of the address. There's really no speedup associated with paying attention to banks. You may want to look at the total number of available ports on the RAM, though. For instance, a dual ported RAM will allow write and read access on the same clock. A multi ported RAM can allow multiple addresses to be read or written on the same clock. The area increase is minimal. I'm not sure what your particular application has available though. If this is an ASIC you are designing, it should be easy enough to use whichever RAM you want.

Also, since you're using an internal SRAM, I don't see any real need to worry about how the addressing is lined up. As long as you make sure you are dealing with contiguous chunks, there's really not going to be an issue here. No slick addressing scheme is going to provide you any tricks. If you're dual ported, you may want to make sure that you have things set up so that your 8K blocks aren't in danger of being written while being read (write-thru hazard).

For the last question:

Q3) Front/back buffers are essentially two copies of the screen. You render to the back buffer while displaying the front buffer. This gives you one extra frame to render the scene. Once it's done, you flip a bit so that the back buffer "becomes" the front buffer and vice versa. Not sure if this application would be of use to you since I think you want something that's completely compatible with the C64 VIC.

1) Tile mode rendering is really a trick to help optimize the utilization of external RAMS. You set up your "tiles" in footprints that correspond either to texture formats or your rasterizer's output swath configuration as well as the bank layout for your external DRAMs. It doesn't apply to SRAMs. The goal with a tile mode design is to make sure that your rendering to memory and pulling from memory is done in a tile that fits into the fastest accessible block of memory. DRAMs have latency penalties for page faults. You're trying to avoid those. Again, none of this applies to static RAMs.

2) Yes. Most modern graphics will access multiple RAMs. You spread the frame buffer across these multiple RAMs so that you can pull X lines out of ram A up until a page fault, then start accessing lines X+Z from ram A which causes a page fault. While you're waiting for the page fault to complete, you start accessing Y lines ot of ram B up until the page fault at which point ram A should be ready. This is for optimizing external DRAM accesses, but if you are using internal RAMs, it could allow you to use single ported RAMs in which you could be rasterizing the next ten lines to RAM B while reading the first 10 lines to display out of RAM A.

3) Could be entirely unimportant. Dunno. Sprites should be done with blits though. Anything that's an "overlay", so to speak, should be done with blits. That's going to be a fairly optimal way of managing them, and should be much faster than the way the C64 originally handled them. Been a long time, but I seem to remember a lot of latency around the actual drawing of the sprite.

2005-02-15 06:06

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Death,
I'm not planning on using dual-port RAM. At the moment the design has been done in discrete logic (and a couple of GALs) on breadboards (very messy and quite large) and also in a CPLD; at some time in the future I will probably have to look at going to an FPGA or something. I would like to use SRAM but I haven't looked at prices or availability yet, so that could change. The RAM doesn't need to be fast as this cct is low performance; it only needs to run at 8 MHz (although I may change that later, but even then the dot clock will still be 8 MHz).

The cct is similar to the VIC II cct and will have a very similar output (same sync signals, same border area, same resolution, etc.), but I'm planning to have 256 colors available and a new way of formatting the screen so that 3D graphics can be rendered a little bit quicker. I'm also planning on having the graphics cct able to do things like clear portions of memory fast (in C64 terms anyway). Like the VIC II, the cct will only need to access RAM for half of every clock cycle, which will allow me to use the other half of the cycle to do the clearing and stuff like that (as long as the 6510 isn't using that same half of the cycle, which it will sometimes, but no more than 1 out of 8 pixel clock cycles anyway). Even if I don't use the other half of the cycle, I can use the cycles above and below the borders for these things, since the cct doesn't need to access RAM at that time. Another thing that seems to be in demand is for the hardware to be able to fill graphics as they are being drawn.

As for buffers, my idea is to have a pointer that points to the start of where the cct will get the screen data from in RAM. In other words you can get your screen data from anywhere in RAM just by changing the pointer (so you could build up 1 or more screens in RAM while displaying a completely different one). Is this the same as using buffers?

I should point out that none of these ideas are set in concrete.

2005-02-15 23:23

Death Demon
Account closed

Registered: Feb 2005
Posts: 68

That's what buffering the frame is, correct. Double buffering is the name given to using two buffers. Triple uses three, and so on. Front buffer is the one that's being displayed on screen, back buffer is the one that's being written to.
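A minimal Python sketch of the pointer-based double buffering described above. The class name, the 64K `bytearray` standing in for RAM and the example base addresses are assumptions for illustration only; the point is that drawing always goes to the back buffer and a "flip" only swaps two pointers, so nothing is copied.

```python
class DoubleBuffer:
    """Hypothetical model: one 64K RAM image plus a 'display pointer'."""

    def __init__(self, base_a: int, base_b: int) -> None:
        self.ram = bytearray(65536)
        self.front = base_a  # start address the video hardware scans out
        self.back = base_b   # start address the CPU is currently drawing into

    def draw_byte(self, offset: int, value: int) -> None:
        # Rendering only ever touches the back buffer.
        self.ram[self.back + offset] = value

    def displayed_byte(self, offset: int) -> int:
        # What the video hardware would read at this offset right now.
        return self.ram[self.front + offset]

    def flip(self) -> None:
        # Changing the pointer is the whole "flip"; no screen data is copied.
        self.front, self.back = self.back, self.front


db = DoubleBuffer(0x2000, 0x6000)
db.draw_byte(0, 0xFF)
print(db.displayed_byte(0))  # 0: the new frame is not visible yet
db.flip()
print(db.displayed_byte(0))  # 255: the finished frame is now on screen
```

Rotating three pointers instead of swapping two would give triple buffering in the same way.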

Your best bet is to go with what you got. It sounds like you've thought the system through. You can build it and start to see some drawbacks, things you would like to have done better. If you could model the system in C (write a software representation of what you're doing), then you could simulate the graphics engine to see if it will behave the way you think it will.

Using SRAM has the benefit that you don't have to consider refresh cycles. Timing will be a bit more friendly as well. They are going to be more expensive, but you're just doing this as a project and not something you're going to sell millions of, so that's OK.

Good luck!

2005-02-16 13:48

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Hey, thanks Death,
A question about solid filling: I was discussing this with someone on CBM Hackers and I asked about objects that are only partly on the screen, and they suggested having a larger area in memory than the area actually being displayed, e.g. the screen in memory could be 512x256 and the actual screen displayed would only be 320x200, allowing the hardware to fill the objects that only appear partly on screen. What do you think of this idea, and is this method commonly used? I don't know much about how 3D images are calculated and drawn, but there does seem to be demand for hardware filling.

2005-02-17 12:30

White Flame

Registered: Sep 2002
Posts: 136

No matter what virtual bitmap size you use (assuming you're not having 8GB of framebuffer RAM), you'll always hit cases where it's too small to include what you want to draw especially with perspective 3d renderers. You either need to clip before rasterizing, or have full-precision signed coords in the rasterizer that don't do writes when offscreen (slow but simpler).

2005-02-17 13:16

Stingray
Account closed

Registered: Feb 2003
Posts: 117

You've got a good point there. Can you explain to me what coords are? I'm guessing they're some kind of marker for where to start and stop filling?

2005-02-17 22:22

White Flame

Registered: Sep 2002
Posts: 136

coords = coordinates, (x,y) values, etc

So instead of the rasterizer calculating screen RAM addresses to fill between, it would keep the x,y coordinates of the segments it needs to fill. As it draws the raster line segments that make up the polygons, it needs to check which parts of each segment are actually onscreen, and only draw those.

This would require that the video processor have larger registers in the rasterizer, and that it would loop through and try to draw every segment of a polygon even if it's fully offscreen. But like I said, this could be easy to implement in hardware, and you wouldn't have to write a software clipper. But another 'but' would be that you still have to do near-plane clipping for 3d, so if you're already doing that, you might as well just do full view frustum clipping, and have the simpler onscreen-only rasterizer in hardware.
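To make the "full-precision signed coordinates" option concrete, here is a small Python sketch of a horizontal segment fill that accepts spans starting or ending offscreen and simply skips the offscreen pixels. It assumes a plain linear 320x200 one-bit bitmap (40 bytes per row, leftmost pixel in bit 7), which is a simplification and not the VIC-II's 8x8 cell layout; `fill_span` is a hypothetical name.

```python
WIDTH, HEIGHT = 320, 200
BYTES_PER_ROW = WIDTH // 8  # 40 bytes per row in this simplified linear layout


def fill_span(bitmap: bytearray, y: int, x0: int, x1: int) -> int:
    """Set pixels x0..x1 (inclusive) on row y, skipping anything offscreen.

    Coordinates are ordinary signed ints, so a span may start left of the
    screen or end right of it. Returns the number of pixels actually written.
    """
    if not 0 <= y < HEIGHT:
        return 0                      # the whole span is above or below the screen
    if x0 > x1:
        x0, x1 = x1, x0
    x0 = max(x0, 0)                   # clip left edge
    x1 = min(x1, WIDTH - 1)           # clip right edge
    written = 0
    for x in range(x0, x1 + 1):       # empty range if the span is fully off left/right
        bitmap[y * BYTES_PER_ROW + x // 8] |= 0x80 >> (x % 8)  # bit 7 = leftmost pixel
        written += 1
    return written


screen = bytearray(BYTES_PER_ROW * HEIGHT)
print(fill_span(screen, 10, -50, 5))    # 6: only pixels 0..5 land on screen
print(hex(screen[10 * BYTES_PER_ROW]))  # 0xfc
```

The per-pixel checks are what makes this "slow but simpler"; a software clipper would instead hand the hardware spans that are already inside the screen.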

2005-02-18 07:10

Death Demon
Account closed

Registered: Feb 2005
Posts: 68

White Flame is pretty much dead on. It sounds like you want to do more 3D processing than simply providing a simplistic 2D graphics (C64 gfx) interface that can be a little bit faster. If you're talking about doing full frustum clipping, you're going to need to rethink things a little bit. Your computation bandwidth needs to be a bit higher than what I'm getting from this post. You'll need to do high enough precision calculations to generate the clip planes based on your viewport. Then you'll need to calculate planar intersections and generate new vertices for polygons that cross the clip planes. Then you'll have to rasterize things. I would also suggest doing some visibility culling so that you don't waste time rasterizing portions of polygons that you aren't going to see.

If you're looking to do full-on 3D in hardware, there's a lot more you'll need to do than what you've described here. And it's not really simple. If you're getting into frustum culling, that sort of implies that you're doing geometry as well. At least some of the area associated with generating the planar equations and doing the clipping would be used in other geometry operations.
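Generating new vertices where a polygon crosses a clip plane is commonly done with a Sutherland-Hodgman style pass. The Python sketch below clips a convex polygon against a single near plane `z = near`; a full frustum clipper would repeat the same pass once per plane. The function name and the tuple representation of vertices are assumptions for illustration.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def clip_near(poly: List[Vec3], near: float) -> List[Vec3]:
    """Clip a convex polygon against the plane z = near, keeping z >= near.

    One Sutherland-Hodgman pass: walk every edge (prev -> cur), keep vertices
    on the visible side, and emit a new vertex wherever an edge crosses the plane.
    """
    out: List[Vec3] = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]            # i - 1 == -1 wraps around to the last vertex
        cur_in = cur[2] >= near
        prev_in = prev[2] >= near
        if cur_in != prev_in:
            # Edge crosses the plane: interpolate to the point where z == near.
            t = (near - prev[2]) / (cur[2] - prev[2])
            out.append((prev[0] + t * (cur[0] - prev[0]),
                        prev[1] + t * (cur[1] - prev[1]),
                        near))
        if cur_in:
            out.append(cur)
    return out


# A triangle with one vertex behind the near plane becomes a quad.
tri = [(0.0, 0.0, -1.0), (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]
print(clip_near(tri, 0.0))
# [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]
```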

2005-02-18 14:33

Stingray
Account closed

Registered: Feb 2003
Posts: 117

I don't want to do full-on 3D in hardware, firstly because I wouldn't know how to and secondly because I would never get the project finished. I guess I just want to do the simplest filling in hardware I can without it being completely useless. At first I envisaged the coder having a bitmap in memory filled with wire frames (with the wire frames drawn in the desired colors), and as my hardware draws the screen it fills in the wire frames. Seemed pretty simple, but then I thought: what about objects only partly on the screen? How will my hardware handle that? And now I'm starting to think about what happens when an object is partly in front of another one. Is it possible to just let the coder take care of all this by having their code draw wire frames that take all this into account, and have my hardware just do a simple filling in of the appropriate area (much like the fill in a paint program), or would this be completely useless?

2005-02-19 00:08

White Flame

Registered: Sep 2002
Posts: 136

Here's some ideas to implement, in increasing complexity:

Fill a horizontal (or vertical, whichever is easier for you to implement) segment with a solid color. Even adding just this will speed up a C64ish 3d program tremendously vs software, you can draw any shape with a software loop of these, and having this would eliminate the programmer having to convert x,y coordinates into pixel memory addresses all the time.

Fill a segment as above, but with clipping done in hardware.

Add a z-buffer, and do depth testing for each pixel on the segment. :)

Try putting a gradient or fill pattern on the segment (depends on your color depth)

Fill a triangle, by drawing (clipped) horizontal segments.

Fill a polygon, given a sorted list of vertices for both the left & right side.

You'd also want rectangle fill & screen clearing in there. Don't know if you want to bother with bitblit and stretchblit, but those are something to consider as well.
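As a sketch of the "fill a triangle by drawing (clipped) horizontal segments" idea, the Python function below turns a triangle into a list of clipped `(y, x_left, x_right)` spans, which is the work software would still do before handing each span to a hardware segment fill. The function name, the default 320x200 screen size and the rounding choice are assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Span = Tuple[int, int, int]  # (y, x_left, x_right)


def triangle_spans(a: Point, b: Point, c: Point,
                   width: int = 320, height: int = 200) -> List[Span]:
    """Break a filled triangle into horizontal spans, clipped to the screen."""
    ys = (a[1], b[1], c[1])
    y_top = max(min(ys), 0)                # clip top
    y_bottom = min(max(ys), height - 1)    # clip bottom
    spans: List[Span] = []
    for y in range(y_top, y_bottom + 1):
        xs = []
        for p, q in ((a, b), (b, c), (c, a)):
            # Every non-horizontal edge that reaches this scanline contributes one x.
            if p[1] != q[1] and min(p[1], q[1]) <= y <= max(p[1], q[1]):
                xs.append(p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]))
        if not xs:
            continue                       # degenerate (flat) triangle
        left = max(round(min(xs)), 0)             # clip left
        right = min(round(max(xs)), width - 1)    # clip right
        if left <= right:
            spans.append((y, left, right))
    return spans


for y, x0, x1 in triangle_spans((10, 0), (20, 10), (0, 10))[:3]:
    print(y, x0, x1)  # each line would become one hardware segment fill
```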

2005-06-25 06:37

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Here is a stupid question. Does the VIC II chip already have buffering in a sense? I mean in that you can change the bitmap position (or the character memory position) just by writing to one register, as long as you don't have to change what's in the color RAM.

2005-06-25 13:16

PopMilo

Registered: Mar 2004
Posts: 145

Of course. Hires bitmap and charset position in memory are controlled via a few memory locations. The bitmap can occupy any one of 8 blocks of 8Kb of memory (sounds bad :)); the charset is the same thing, just the step is 2Kb.

2005-06-26 13:42

PopMilo

Registered: Mar 2004
Posts: 145

P.S. And Char screen memory can be set in 1Kb steps...

By the way I like what you are trying to do, keep up the good work!
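For reference, a small Python sketch of how those pointers decode on a stock C64: the 16K VIC bank comes from the two low bits of $DD00 (inverted), and $D018 selects the screen in 1K steps, the charset in 2K steps and the bitmap in 8K steps within that bank. Two bitmap positions in each of four banks gives the 8 blocks mentioned above. The function name and dictionary keys are only illustrative.

```python
def vic_addresses(dd00: int, d018: int) -> dict:
    """Decode where the VIC-II fetches from, given the values in $DD00 and $D018."""
    bank = (3 - (dd00 & 0b11)) * 0x4000                    # CIA 2 port A bits 0-1, inverted
    return {
        "bank": bank,
        "screen": bank + (d018 >> 4) * 0x0400,             # bits 4-7: 1K steps
        "charset": bank + ((d018 >> 1) & 0b111) * 0x0800,  # bits 1-3: 2K steps
        "bitmap": bank + ((d018 >> 3) & 0b1) * 0x2000,     # bit 3: 8K steps (bitmap mode)
    }


# Stock setup: low bits of $DD00 = %11 (bank 0), $D018 = $15
print({k: hex(v) for k, v in vic_addresses(0x97, 0x15).items()})
# screen at $0400, charset at $1000 (where the VIC-II sees the character ROM)
```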

2005-07-26 01:50

Stingray
Account closed

Registered: Feb 2003
Posts: 117

UPDATE ON 3D CCT.
I've settled on the final specs of the cct.

Columns with 256 byte offset
Memory fill (clear)
EOR fill (on screen draw)
2 buffers
16 colors (normal c64 palette, normal 4*8 cell limitations)

The cct has completely changed. I want to make the cct as simple and cheap as possible while including the main features that ppl want. There seemed to be three main features that kept coming up in the discussions: the column formatting, fast memory clear and EOR fill. I want the cct to be simple enough that anyone can buy the parts from an electronics shop and put it together. When the cct is finished I will release the schematic so that anyone who wants to build one (not for profit) can. Also, anyone who wants to write some code for the cct might want to have a look at the schematic.

I have given my video cct the flick; the cct now just uses the VIC II and the RAM already inside the C64. In fact the cct will just be a bit of logic between the address and data bus of the C64 and the VIC II. This is a lot simpler. Basically, when the VIC goes to get character data from location xxxx, the cct jumps in and puts a different address on the bus corresponding to the column address. The memory fill will need address and data placed on the buses and a write signal during the VIC's part of the clock cycle. The EOR fill will use forty bytes and some EOR logic to put the appropriate byte on the data bus to the VIC while drawing the screen. If Commodore had been thinking about 3D graphics when they designed the VIC II, they probably would have added these simple features, but of course Commodore were not thinking about 3D graphics for the C64 back in 1982.

The column address formatting part of the cct is 90% finished.

Anyway, I would be interested to know what anyone thinks of those specs.

2005-07-26 06:44

Oswald

Registered: Apr 2002
Posts: 5020

it would have been nice to have line drawing, segment fill...

on the fly eor fill is hardcore :) won't it steal cycles from the cpu? I mean, if the cct fills the screen at each refresh, it would be nice if it didn't steal cycles every time. If that's the case, I would vote for filling only once in memory, so it won't steal cycles again in each frame (leaving more time for the cpu to calc the next frame).

it would also be nice to have more than 2 buffers. I guess it's not much extra work to make it able to use all possible bitmap locations.

2005-07-26 08:38

WVL

Registered: Mar 2002
Posts: 886

yes! EOR-on-the-fly :) that really speeds things up dude! :)

btw, could the cct clear frames in the background? (not stealing cycles from cpu?) if so, the c= only has to give orders to clear the screen, (while the screen clears it can do 3d rotations), when screen is cleared the cpu draws the lines, swaps the buffers and enables EOR-display mode..

real fast. cpu only has to do rotations and line drawing.. neat.

2005-07-26 09:43

Stingray
Account closed

Registered: Feb 2003
Posts: 117

The memory fill (screen clear) doesn't use any CPU cycles (except a few to set it up). The VIC has about 10,000 wasted cycles per screen (a bit less with sprites), which will be used by the cct to write to memory. The VIC II doesn't have a W/R output, so there are still a couple of things I've got to work around for this, but it should be very doable.

Also, there will be a control bit that you can set to make the memory fill only fill the screen memory. In other words, it will only fill 200 bytes of each column. This way, if you are clearing a screen, there is no time wasted filling the non-viewable portions of memory which exist due to the 256-byte offset of each column (so you can easily clear one buffer per frame). But if you just want to fill a large continuous chunk of memory, this will also be possible. No window fills, though.
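A rough Python model of the planned fill with the "visible only" control bit, assuming the 256-byte column layout and a 40-column screen. The function and parameter names are hypothetical; the point is the byte count: 40 x 200 = 8,000 writes for a visible-only clear versus 40 x 256 = 10,240 for the whole 10K block.

```python
COLUMNS = 40
VISIBLE_ROWS = 200
COLUMN_STRIDE = 256  # each column starts 256 bytes after the previous one


def memory_fill(ram: bytearray, base: int, value: int, visible_only: bool) -> int:
    """Hypothetical model of the planned fill; returns the number of bytes written.

    visible_only=True  -> write just the 200 displayed bytes of each column
    visible_only=False -> write the whole 40 x 256 byte block contiguously
    """
    if not visible_only:
        length = COLUMNS * COLUMN_STRIDE
        ram[base:base + length] = bytes([value]) * length
        return length
    for col in range(COLUMNS):
        start = base + col * COLUMN_STRIDE
        ram[start:start + VISIBLE_ROWS] = bytes([value]) * VISIBLE_ROWS
    return COLUMNS * VISIBLE_ROWS


ram = bytearray(b"\xaa") * 65536
print(memory_fill(ram, 0x4000, 0, visible_only=True))   # 8000
print(memory_fill(ram, 0x4000, 0, visible_only=False))  # 10240
```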

The EOR on the fly (which will also have a control bit to enable it) should be very easy to implement in hardware (that's easy for me to say now, because I haven't started designing it yet). It should just be a case of having 40 bytes. At the start of each new screen the forty bytes get cleared. As each byte of character data gets fetched by the VIC, the cct will intercept it, EOR the data with the corresponding byte for that column, draw that byte on the screen and leave the result in the corresponding byte, waiting for the next line. Does that make any sense? Will that do what you all mean by EOR fill?
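The forty-byte scheme can be modelled in a few lines of Python. This is a behavioural sketch, not the circuit: `fetches[line][col]` stands for the bitmap byte the VIC-II fetches for each raster line and 8-pixel column, and the function returns what the VIC-II would be handed instead. RAM itself only ever holds the outlines.

```python
from typing import List


def eor_on_the_fly(fetches: List[List[int]]) -> List[List[int]]:
    """Behavioural model of the proposed on-the-fly EOR fill.

    fetches[line][col] is the bitmap byte fetched from RAM for that raster
    line and 8-pixel column; the result is what the VIC-II would be handed.
    """
    acc = [0] * 40                    # the forty bytes, cleared at the start of each frame
    shown = []
    for line in fetches:
        for col, byte in enumerate(line):
            acc[col] ^= byte          # EOR the fetched byte into that column's running value
        shown.append(acc.copy())      # the running value goes to the VIC-II instead of the raw byte
    return shown


# Column 0 holds only two edge bytes (lines 1 and 3); RAM is otherwise empty.
blank = [0] * 40
edge = [0x3C] + [0] * 39
frame = eor_on_the_fly([blank, edge, blank, edge, blank])
print([hex(line[0]) for line in frame])  # ['0x0', '0x3c', '0x3c', '0x0', '0x0']
```

In this model the closing edge byte switches the fill off on its own line, so the outlines have to be drawn with that convention in mind.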

2005-07-26 10:21

Oswald

Registered: Apr 2002
Posts: 5020

stingray: yes eor fill works that way, kewl solution :)

2005-07-26 10:37

Stingray
Account closed

Registered: Feb 2003
Posts: 117

awesome, thanks.

2005-07-26 10:38

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Quote: it would have been nice to have line drawing, segment fill...

on the fly eor fill is hardcore :) wont it steal cycles from the cpu ? I mean if the cct fills the screen at each refresh, it would be nice if it wont steal each time cycles. If the case is so I would vote for only once fill in memory, then it wont again steal cycles in each frame. (leaving more time for the cpu to calc the next frame)

it would be also nice to have more than 2 buffers. guess its not much extra work to make it able to use all possible bitmap location.

Good point about the buffers. With the addressing hardware for the column formatting, the lower 8 bits give the row location (0-199) and the next 6 bits give the column (0-39), leaving two bits. So that's four buffers for the column-formatted screen (one in each bank). Well, at least that is what I thought until I started designing that part of the cct. The VIC sees the character ROM in banks 0 and 2, which means my cct does as well. So that kills two of the buffers. You can still display those buffers, but 16 of your columns will be character ROM :(
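The bit split can be checked with a quick Python sketch: bits 0-7 select the row, bits 8-13 the column and bits 14-15 the buffer, which here is simply the 16K bank it lives in (numbered 0-3 by address, not by the inverted $DD00 value). Moving one pixel row down is +1 and one column right is +256, which is what makes line drawers and fillers cheap in this layout. The function names are illustrative only.

```python
from typing import List


def column_address(buffer: int, col: int, row: int) -> int:
    """Address in the proposed column layout: buffer in bits 14-15, column in 8-13, row in 0-7."""
    assert 0 <= buffer < 4 and 0 <= col < 40 and 0 <= row < 200
    return (buffer << 14) | (col << 8) | row


def columns_over_char_rom(buffer: int) -> List[int]:
    """Columns whose 256-byte block lands where the VIC-II sees character ROM.

    The VIC-II sees the character ROM image at offset $1000-$1FFF of banks 0 and 2.
    """
    if buffer not in (0, 2):
        return []
    return [col for col in range(40)
            if 0x1000 <= (column_address(buffer, col, 0) & 0x3FFF) < 0x2000]


print(hex(column_address(1, 0, 0)), hex(column_address(1, 39, 199)))  # 0x4000 0x67c7
print(columns_over_char_rom(0))  # columns 16 to 31, i.e. 16 columns
```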

There are a couple of ways around this.

#1, add some logic between the PLA and the ram/rom control lines (at least I think that would fix it) but that could get very messy.

#2, Use 64k ram on the cct. Much simpler (this wouldn't add 64k to your C64, it would just reflect what is already in the 64's RAM). This would also make the cct a little more expensive.

If the extra work can be justified by the advantages I might consider doing it. I guess It all depends on how beneficial having the two extra buffers would be.

Oh yeh, and the EOR on the fly has no cpu overhead.

2005-07-26 10:59

Oswald

Registered: Apr 2002
Posts: 5020

leave the char rom stuff as it is. 2 visible buffers (excluding where char rom comes in) / vic bank is ok.

I thought only 2 fixed buffers were possible in the whole 64k. That would not be good when it comes to designing memory usage.

btw, attribute memory reading (0400/d800) will be compatible?

2005-07-26 11:12

Stingray
Account closed

Registered: Feb 2003
Posts: 117

It is only two usable buffers for the entire 64k (one at 16k - 26k and one at 48K - 58k each taking up 10k instead of the normal 8k).

Hey, could you explain your question about "attribute memory reading" a bit more? I don't really understand what you mean, but it sounds important.

2005-07-26 14:01

Oswald

Registered: Apr 2002
Posts: 5020

I mean: will you also translate the addresses when the vic wants to read color info from screen memory or d800?

btw, would it be too big a problem to really use only 200-byte columns? That way more buffers would fit.

and if we stay with 256, it would be nice to be able to set where the 200 lines start within that 256. How about a wraparound system, so an offset 0-255 is definable for each 256-byte column?
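The wraparound suggestion amounts to adding an 8-bit offset to the row part of the address and letting it wrap inside each 256-byte column. A hypothetical Python sketch (the function name and the `offset` parameter are assumptions, since this was only proposed): at any moment 200 of the 256 bytes are visible and the other 56 per column are offscreen.

```python
def scrolled_address(base: int, col: int, row: int, offset: int) -> int:
    """Column-format address with a vertical offset that wraps inside each 256-byte column."""
    return base + (col << 8) + ((row + offset) & 0xFF)


# Scrolling by 100 rows: visible row 199 of column 1 comes from byte 43 of that column.
print(hex(scrolled_address(0x4000, 1, 199, 100)))  # 0x412b
```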

2005-07-27 16:56

A Life in Hell
Account closed

Registered: May 2002
Posts: 204

Quote: I mean that if you will translate the adresses when the vic wants to read color info from screen memory or d800.

btw would it be too big problem to use really onle 200 byte columns ? so more buffers would fit.

and if we stay with 256, it would be nice to be able to set where the 200 lines in that 256 starts, how about a wrap around system, so an offset 0-255 is definable into each 256 byte column ?

isn't $d800 actually physically inside the vic? that is to say, if you were going to translate accesses to that, wouldn't you need to translate them as cpuwrite->vic rather than vic->memread like the otherstuff?

just a thought that is probably embarrassingly wrong :)

2005-07-27 19:21

Oswald

Registered: Apr 2002
Posts: 5020

you're a bit right, prolly Stingray won't translate d800 addresses, since those are done on a BUS dedicated to the vic. This means the vic doesn't need extra cycles to read d800, as it has its own bus to it.

2005-07-27 20:46

Graham
Account closed

Registered: Dec 2002
Posts: 990

There is only one bus to the VIC, with 14 address pins and 12 data pins.

2005-07-27 22:15

Oswald

Registered: Apr 2002
Posts: 5020

explain why it does not need extra cycles to read d800 ?

2005-07-28 00:34

Stingray
Account closed

Registered: Feb 2003
Posts: 117

All the reads from the video matrix (color RAM and screen matrix) will be done in the normal way. The cct will only convert character fetches (the 8k bitmap data) to the column format. This isn't because it's not possible, only because it's extra work which at this stage I don't see enough benefit to justify. Is only having the bitmap column-formatted going to cause any major problems?

2009-03-23 13:35

JackAsser

Registered: Jun 2002
Posts: 1989

Quote: explain why it does not need extra cycles to read d800 ?

Sorry for unburying such an old thread, but to answer your question: the bus is 12 bits wide. On a badline the VIC steals 40 cycles, and on each cycle it fetches 12 bits (8 bits of char data and 4 bits of color data from D8xx). Purely based on my assumptions of course... :)

2009-03-24 04:34

Martin Piper

Registered: Nov 2007
Posts: 634

If I remember correctly the colour RAM uses a 2114-30L which is 4-bit SRAM compared to the 8-bit DRAM main memory. Four bits of course provide the sixteen colours and are read by the VIC via a little bit of chip selection logic tied to the other address lines.

2009-04-05 11:24

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Everything going as scheduled :)
I have actually gotten some way on this cct, but only in a very very early prototype form and with very limited features so far (but it is working). I have been very very busy, but I'm hoping I will have some more time soon to finish this off.

2009-04-05 12:01

Oswald

Registered: Apr 2002
Posts: 5020

so, after FIVE years, eor on the fly, and auto clr is done ? :)

2010-01-25 12:51

Stingray
Account closed

Registered: Feb 2003
Posts: 117

@Oswald

It will be smaller when completed, so it should fit nicely inside your C64, into the VIC II socket.

Screen is the C64 start up, color memory placed at $0400 (same as character memory) but I could place it anywhere. Doesn't have to be at $0400 offsets or anything, I could have made it start at $0401 if I wanted, so it will be possible to hardware scroll.

2010-02-26 16:24

Stingray
Account closed

Registered: Feb 2003
Posts: 117

The circuit now allows you to place COLOR RAM anywhere in the 64's RAM.

This allows hardware scrolling of the COLOR data and no more 16k bank limitations (the COLOR RAM pointer is 16bit).

Because COLOR RAM is no longer 1k fixed at $D800, it is now possible (By forcing badlines) to have a unique COLOR nybble for EVERY 8*1 cell on the screen. In other words, it is possible to have images with color combinations in a single character cell that were not possible before.

Also, it is not necessary to change the COLOR RAM pointer if you are forcing badlines, as you would usually be required to do for the Video Matrix pointer when forcing badlines for an FLI. So no cycles need to be wasted if you want to FLI the COLOR RAM.

It is even possible to place COLORS on the VIC'S 4 bit color bus in the first three columns of a forced badline and for ALL CYCLES OUTSIDE THE BORDER area. However, I am not sure what effect doing these things has on what you see on the screen, as I am unsure what VIC does with what's on the color bus during the first 3 columns of a forced badline and also outside the border area.

Can anybody please tell me if there is or if they know of any advantage at all to being able to place colors on the VIC'S 4 bit color bus in the first 3 columns of a badline or outside the border area? (can this be used for anything?)

BTW, I do plan on implementing this same technique for the Video Matrix and the bitmap data. If I am successful in doing this, it will mean full hardware scrolling of Video Matrix, bitmap and COLOR RAM. It will also mean FLI etc. can be done without the CPU overhead of switching the Video Matrix etc. (you will still need to force a badline though). Also no more 16k bank limitation.

My main goals are to have a circuit that very discreetly fits in the C64 between the VIC socket and the VIC II itself, and to maintain 100.00% compatibility (i.e. the circuit is completely inactive unless invoked by software).

2010-02-28 00:17

Martin Piper

Registered: Nov 2007
Posts: 634

What happens with the sprite pointers? Hardware scrolling of screen data is nice but if there are sprite pointers in the middle of the screen data you'll get rubbish chars.

2010-02-28 05:21

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Martin, The sprites work the same they always have.

just to clarify, you would not normally be placing data on the VIC bus outside the border area, but if you want to for some reason, the circuit does make it possible to do so.

2010-02-28 12:39

Martin Piper

Registered: Nov 2007
Posts: 634

What I mean is:
The VIC has sprite pointers that are just after the visible screen data and on a known fixed boundary after the start of the screen memory.

So let's say the VIC thinks it is displaying a screen starting at $400; the sprite pointers will be $7f8-$7ff.

However, let's assume that with your magic box the screen display fetch is actually hardware scrolled to start at $0700.
As each screen line is drawn then when we eventually get to displaying chars from $7f8 the screen data will start display from the sprite pointers. This produces weird graphics for the eight sprite pointers.

Unless the sprite pointers are continuously relocated to be $3f8 after the start of hardware scrolled screen memory. In which case the eight sprite pointers will have to be updated each time a hardware scroll happens.

Or unless the sprite pointers are fetched from some other memory location not related to screen address.
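A short Python sketch of the addresses in this example. The pointer location (screen + $3F8..$3FF) and the 64-byte block per pointer value are standard VIC-II behaviour; `fetch_start` is a hypothetical stand-in for the scrolled fetch address, and wraparound is ignored.

```python
def sprite_pointer_address(screen_base: int, sprite: int) -> int:
    """Sprite pointers are the last 8 bytes of the 1K video matrix: screen + $3F8..$3FF."""
    return screen_base + 0x3F8 + sprite


def sprite_data_address(bank_base: int, pointer: int) -> int:
    """Each pointer value selects a 64-byte block inside the current 16K VIC bank."""
    return bank_base + pointer * 64


def pointers_in_display(fetch_start: int, screen_base: int) -> bool:
    """True if the 1000 displayed screen bytes would include a sprite pointer byte."""
    first = sprite_pointer_address(screen_base, 0)
    last = sprite_pointer_address(screen_base, 7)
    return fetch_start <= last and first < fetch_start + 1000


print(hex(sprite_pointer_address(0x0400, 0)))  # 0x7f8
print(pointers_in_display(0x0400, 0x0400))     # False: $0400-$07E7 stops short of $07F8
print(pointers_in_display(0x0700, 0x0400))     # True: $0700-$0AE7 covers $07F8-$07FF
```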

2010-03-01 06:35

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Sorry Martin, I thought you were referring to the sprite data fetches.

It's a good question. As the sprite pointers are loaded into the VIC the same way the sprite data is, the VIC will fetch the sprite pointers from the same location as usual (the last 8 bytes of the selected video matrix).

The color data pointer, bitmap data pointer and the Character data pointer used by the Magic Box (I really need a name for this circuit, how about "Vic Enhancer" for the moment?) are 16bit pointers that are independent of video banks and the Video Matrix pointer etc., so you could have Video Matrix at $0400 and the character data pointer at $8000 or whatever.

I will just point out that at the moment I only have this addressing scheme implemented and working for color data. I will also point out that you will not be stuck with a linear screen formatting.

2010-03-01 07:51

Martin Piper

Registered: Nov 2007
Posts: 634

Cheers. That makes sense.

As for a name "VIC-X"? :)

2010-03-01 08:00

WVL

Registered: Mar 2002
Posts: 886

How does it work?

I'm imagining that the VIC is asking for bytes from $d800, and that your circuitry changes this address and asks for bytes from colorpointer+addressasked-$d800 instead, and sends the value read back to the VIC.

So a kind of man-in-the-middle attack, where the VIC requests are being modified and 'false' data is being sent back to the VIC..

Is that how it works?

2010-03-01 08:55

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Martin, VIC-X as in VIC eXpander?

WVL, perfect analogy, the "MAN IN THE MIDDLE ATTACK"!

yeah, pretty much. The VIC-X or Vic Enhancer just sits between the motherboard and the VIC. VIC puts out the address it wants, but the VIC-X / Vic Enhancer intercepts it and puts its own address on the bus, allowing a lot of freedom in how the C64 uses the VIC.

So with the use of forced badlines, it is now possible to have a unique COLOR nybble for EVERY 8*1 cell on the screen.

My C64 is currently putting a different nybble on the color bus for every 8*1 cell, pretty cool! That's 8k (nybbles, not bytes) of just color data! I have not written any code that takes advantage of this (that's well above my coding skill, and I'm trying to convince Oswald to write an editor).

WVL, If you want me to, I can go more in depth with how the address pointer increments.

2010-03-01 11:44

WVL

Registered: Mar 2002
Posts: 886

Oh, that's OK. But now I understand how you plan to do a linear bitmap (by modifying the address) and EOR-on-the-fly (by modifying the data).

It's nice that you're still using the VIC, but now with a smart address-modifier between the VIC and the bus. Much better idea than creating a video chip by yourself.

2010-03-01 12:54

Martin Piper

Registered: Nov 2007
Posts: 634

Quoting stingray

Martin, VIC-X as in VIC eXpander?

Exactly my thinking.

2010-03-01 13:37

JackAsser

Registered: Jun 2002
Posts: 1989

Regarding 3D-graphics and EOR-filling and what you would need.

1) Ways of setting up A^B = C (i.e. place A, B and C in memory and EOR-fill it).
2) Width, Height and Stride (bytes per line) for A, B and C.
3) Vertically EOR-fill that memory.
4) Clearing

With the above it would then be possible to HW-accelerate normal filled vectors, and moreover it would support the dithered EOR-filling used in e.g. Edge of Disgrace, Natural Wonders and Andropolis.

If you implement that, I can gladly "accelerate" my portal engine for you if you wanna test it.

Example:

* Set width & height to 256x128, horizontal linear mode
* Set stride of A to $20
* Set stride of B and C to $28
* Point A to $2000 (line buffer)
* Point B to $6000 (bitmap)
* Point C to $6028 (bitmap, one row down)
* Start EOR-fill
* Clear A

This is actually exactly what you would do on an Amiga using the blitter...
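A Python model of the A^B=C operation with the setup above, under the assumption that the blitter walks rows top to bottom (the order matters, because C overlaps B one row down). `blit_eor` and its parameter names are hypothetical; the bitmap is the linear 40-bytes-per-row ("horizontal linear mode") layout from the example, not the standard VIC-II cell layout.

```python
def blit_eor(ram: bytearray, a: int, b: int, c: int, width_bytes: int, height: int,
             stride_a: int, stride_b: int, stride_c: int) -> None:
    """C = A EOR B over a width_bytes x height block, each operand with its own stride.

    Rows are processed top to bottom, so when C points one bitmap row below B,
    each result row becomes the B input of the next row: a vertical EOR fill.
    """
    for row in range(height):
        for x in range(width_bytes):
            ram[c + row * stride_c + x] = (ram[a + row * stride_a + x]
                                           ^ ram[b + row * stride_b + x])


# Non-dithered setup: 256x128 pixels = 32 bytes x 128 rows.
ram = bytearray(65536)
ram[0x2000 + 2 * 0x20] = 0xFF   # edge in line buffer row 2, first byte column
ram[0x2000 + 5 * 0x20] = 0xFF   # closing edge in row 5
blit_eor(ram, a=0x2000, b=0x6000, c=0x6028, width_bytes=32, height=128,
         stride_a=0x20, stride_b=0x28, stride_c=0x28)
print([hex(ram[0x6000 + r * 0x28]) for r in range(8)])
# ['0x0', '0x0', '0x0', '0xff', '0xff', '0xff', '0x0', '0x0']
ram[0x2000:0x3000] = bytes(0x1000)  # "Clear A" so the line buffer is ready for the next frame
```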

For dithered mode you would do:

* Set width & height to 256x64, horizontal linear mode
* Set stride of A to $40
* Set stride of B and C to $50
* Point A to $2000 (line buffer)
* Point B to $6000 (bitmap)
* Point C to $6050 (bitmap, two rows down)
* Start EOR-fill pass 1
* Point A to $2020 (line buffer, one row down)
* Point B to $6028 (bitmap, one row down)
* Point C to $6078 (bitmap, three rows down)
* Start EOR-fill pass 2
* Clear A
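Reduced to a single byte column, the two passes amount to an EOR fill with a distance of two rows, run separately for even and odd rows (the doubled strides $40/$50 and the start pointers above). The Python sketch below is a model under that reading, and the function name is hypothetical. Because each parity is filled independently, an edge plotted on only one parity fills only every other row, while edges on both parities fill solid.

```python
from typing import List


def dithered_eor_fill(line: List[int], height: int) -> List[int]:
    """One byte column of the two-pass (dithered) EOR fill.

    Pass 1 covers even rows, pass 2 odd rows, and each row is EORed into
    the row two below it.
    """
    bitmap = [0] * (height + 2)           # the first two rows act as cleared seed rows
    for first in (0, 1):                  # pass 1: even rows, pass 2: odd rows
        for r in range(first, height, 2):
            bitmap[r + 2] = line[r] ^ bitmap[r]
    return bitmap


# Edges plotted only on even rows fill only every second row...
print(dithered_eor_fill([0xFF, 0, 0, 0, 0xFF, 0, 0, 0], 8))
# [0, 0, 255, 0, 255, 0, 0, 0, 0, 0]
# ...while edges plotted on both parities give a solid fill.
print(dithered_eor_fill([0xFF, 0xFF, 0, 0, 0xFF, 0xFF, 0, 0], 8))
# [0, 0, 255, 255, 255, 255, 0, 0, 0, 0]
```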

/JackAsser

2010-03-02 13:46

Stingray
Account closed

Registered: Feb 2003
Posts: 117

@Martin
What about for VIC eXtreme?
or even VIC UNLEASHED for names?

@WVL
From what I understand (in regards to the FLI bug), the color port and data port on the VIC are actually open to receive data? The only reason why you get undesired data is just because the VIC isn't seeing the data that you would want it to, but the VIC is actually displaying what it sees on the bus during the FLI bug area?

If my understanding is correct, that means that with this circuit (intercepting the data and address bus), ppl will no longer have to worry about FLI bug. Plus you get to FLI the COLOR RAM.

2010-03-02 14:41

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Quoting JackAsser

Regarding 3D-graphics and EOR-filling and what you would need.

1) Ways of setting up A^B = C (i.e. place A, B and C in memory and EOR-fill it).
2) Width, Height and Stride (bytes per line) for A, B and C.
3) Vertically EOR-fill that memory.
4) Clearing

With the above it would then be possible to HW-accelerate normal filled vectors and more over it would support the dithered EOR-filling used in f.e. Edge of Disgrace, Natural Wonders and Andropolis.

If you implement that, I can gladly "accellerate" my portal engine for you if you wanna test it.

I checked out the demos you referenced. The Doom maze is very very impressive.

*EOR-fill (on the fly)
*memory fill
*memory clear
*memory copy
*column based screens (256 byte column offsets)
*row based screen (40 or 256 byte row offsets)
*COLOR RAM anywhere (allows FLI of COLOR RAM)
*Removal of 16k bank limitation
*Plus a couple of extra little tricks I may include as a surprise

This is pretty much the spec I'm working to, so it sounds like it will at least do part of what you need.

BTW, in regards to memory fill/clear/copy circuit, do you think this function is enough to be classed as a simple Blitter?

I have read your post several times, I almost understand it I think.

Are you using 4k at $2000 (representing 256*128) to draw the lines?

You are then EOR filling a bitmap $6000?

what is A^B=C all about? is that A to the power of B = C?

I think I kinda get it, kinda don't, could you please break it down for me a bit more?

As I said, that Doom maze was very very impressive, I would love to make this project help accelerate that kind of thing or even same speed but larger window. Is that maze using your portal engine?

2010-03-02 15:00

JackAsser

Registered: Jun 2002
Posts: 1989

It could indeed be called a simple blitter.

At $2000 is the line buffer, yes. It's typically y-linear, or column-based. That simplifies line drawing a lot.

A, B and C are simply the two input sources and the destination, and ^ here means EOR (XOR), not "to the power of". A^B=C means: at C, store A[j] eor B[k].
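A rough Python model of the A^B=C operation, with two sources A and B and a destination C, each with its own stride (bytes per line), as in the width/height/stride list quoted above. The function and argument names are made up for illustration. The example also shows one way the same blit could act as a vertical EOR-fill: point B at the line of C just above and process lines top to bottom.

```python
def eor_blit(mem: bytearray, a: int, b: int, c: int, width: int, height: int,
             stride_a: int, stride_b: int, stride_c: int) -> None:
    """C = A ^ B over a width x height block of bytes.

    a, b, c are start offsets into mem; each operand has its own stride.
    Lines are processed top to bottom, so B may point at the line of C
    just above, which turns the blit into a vertical EOR-fill.
    """
    for y in range(height):
        for x in range(width):
            mem[c + y * stride_c + x] = (mem[a + y * stride_a + x]
                                         ^ mem[b + y * stride_b + x])


# 8-line, 1-byte-wide line buffer at offset 0; destination lines at 17..24.
# The byte at 16 acts as the "line above" and holds the fill's start value (0).
mem = bytearray(32)
mem[0:3] = bytes([0b11100000, 0b00011100, 0b00000011])
eor_blit(mem, a=0, b=16, c=17, width=1, height=8,
         stride_a=1, stride_b=1, stride_c=1)
print([format(v, "08b") for v in mem[17:25]])
# first three lines: 11100000, 11111100, 11111111 (the rest stay 11111111)
```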

The maze is using the portal engine yes.

Typically an unrolled EOR-filler on the C64 looks something like this (it also converts the column-based buffer to bitmap layout directly):

.repeat 32,column
.repeat 128,row
lda linebuffer+column*128 + row ;column based
eor bitmap+column+(row&$f8)*320+row&7 ;bitmap based
sta bitmap+column+((row+1)&$f8)*320+(row+1)&7 ;bitmap based one row down
.endrep
.endrep

Now, for dithering you'd need to loop this twice, stepping two rows at a time: even rows in the first pass, then odd rows in the second pass.

2010-03-02 15:02

JackAsser

Registered: Jun 2002
Posts: 1989

row&$f8 => row/8, of course. Silly me...

2010-03-02 15:18

Stingray
Account closed

Registered: Feb 2003
Posts: 117

OK, I can see the conversion of column-based to bitmap-based.

Still don't fully get how the filler is working, you have the lines in the line buffer, you EOR with bitmap and store in the next row, right? Is the bitmap clear when you start the EOR fill?

2010-03-02 15:21

JackAsser

Registered: Jun 2002
Posts: 1989

No, I'm just tired and lost. Here's the proper example:

.repeat 32,column
lda #0
.repeat 128,row
eor linebuffer+column*128 + row ;column based
sta bitmap+column*8+(row/8)*320+(row&7) ;bitmap based (adjacent cells are 8 bytes apart)
.endrep
.endrep

:)
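The corrected unrolled filler above can be modelled in Python to check the addressing. This is a sketch, not JackAsser's code: it assumes a 32-column by 128-row column-based line buffer (the 4k buffer at $2000 for a 256*128 window) and the standard C64 hires bitmap layout, where horizontally adjacent 8x8 cells are 8 bytes apart (so the column term is column*8) and each character row is 320 bytes.

```python
COLUMNS, ROWS = 32, 128  # 256x128-pixel window: 32 byte columns of 128 rows


def bitmap_offset(column: int, row: int) -> int:
    """Standard C64 hires bitmap: 8 bytes per cell, 320 bytes per char row."""
    return column * 8 + (row // 8) * 320 + (row & 7)


def eor_fill(linebuffer: bytes, bitmap: bytearray) -> None:
    """Model of the unrolled filler: per column, A starts at 0 (lda #0),
    then each row does 'eor linebuffer' followed by 'sta bitmap'."""
    for column in range(COLUMNS):
        a = 0
        for row in range(ROWS):
            a ^= linebuffer[column * ROWS + row]   # column-based line buffer
            bitmap[bitmap_offset(column, row)] = a


lb = bytearray(COLUMNS * ROWS)
lb[0], lb[1], lb[2] = 0xE0, 0x1C, 0x03   # three edge bytes in column 0
bm = bytearray(8000)
eor_fill(lb, bm)
print(hex(bm[0]), hex(bm[1]), hex(bm[2]), hex(bm[320]))
```

Because the accumulator carries the running EOR down each column, the edge bytes only need to be plotted once per edge crossing; everything below them is filled by the loop.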

2010-03-02 15:34

Stingray
Account closed

Registered: Feb 2003
Posts: 117

ok, thanks.

So the portal engine draws the lines in a 256*128 column-based format.

The portal engine then EOR-fills it onto a bitmap.

Is that right?

2010-03-02 15:39

JackAsser

Registered: Jun 2002
Posts: 1989

Correct. In two passes for dithering: even rows and odd rows separately.

However, for a hw-assisted eor-fill I guess you don't have the luxury of an A-register to use as temp, i.e. each store has to be stateless, right?

If that is the case then you must do something similar to

lda linebuffer+column*128+row                    ;column based line buffer
eor bitmap+column*8+(row/8)*320+(row&7)          ;bitmap, current row
sta bitmap+column*8+((row+1)/8)*320+((row+1)&7)  ;bitmap, one row down

And the engine must initialize the first row of the bitmap to the init values for the eor-filler (similar to the lda #0).

2010-03-02 15:47

Stingray
Account closed

Registered: Feb 2003
Posts: 117

The EOR-filler is meant to work like this:

40 registers (in hardware), one per byte column; let's say they all start at #$00.

The 40 registers set what the first byte of each column is EORed with. After that, each register holds the running result of the EOR for its column, so the next byte down in that column is EORed with the result of the previous one.

e.g.

The byte for the first column, first row of the bitmap is EORed with register 0, and the result is stored in register 0.

The byte for the 2nd column, first row of the bitmap is EORed with register 1, and the result is stored in register 1.

And so on up to register 39; then the VIC starts on the next row, except the registers now hold values.

Repeat 200 times.

Screen done.
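A small Python model of the proposed on-the-fly EOR-filler under the reading given above (an assumption, since the hardware isn't specified in detail): one register per byte column, EORed with each fetched byte as the raster moves down. The register value is what gets displayed, and the bitmap in memory is never written. The function name is hypothetical.

```python
def display_with_eor_registers(fetched_rows, init=None):
    """Model of the proposed on-the-fly EOR-filler (assumed behaviour).

    fetched_rows: bitmap bytes as the VIC fetches them, one list of 40 bytes
    per raster row. Memory is never written; only the returned display changes.
    """
    regs = list(init) if init is not None else [0] * 40  # one register per byte column
    shown = []
    for row in fetched_rows:                  # rows top to bottom (200 of them)
        line = []
        for col, byte in enumerate(row):      # byte columns left to right
            regs[col] ^= byte                 # EOR with the column's running value
            line.append(regs[col])            # ...and display the result
        shown.append(line)
    return shown


rows = [[0] * 40 for _ in range(200)]
rows[0][0], rows[1][0], rows[2][0] = 0xE0, 0x1C, 0x03   # edges in column 0
shown = display_with_eor_registers(rows)
print(hex(shown[0][0]), hex(shown[1][0]), hex(shown[199][0]))
```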

2010-03-02 15:54

JackAsser

Registered: Jun 2002
Posts: 1989

Sounds good. And for dithered filling, the engine has to update those registers prior to the second pass.

2010-03-02 15:57

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Do you mean that you need to initialize the registers to specific numbers at the start of the screen draw?

The EOR is taking place as the screen is drawn.

The memory does not get touched. (EOR fill is done with no cpu cycles!)

2010-03-02 16:05

JackAsser

Registered: Jun 2002
Posts: 1989

Well, for dithered eor-filling you need to fill the screen in two passes. The initial state of those registers is most often 0, but for objects partly outside the screen the state can be different.

So if you wanna support dithered eor-filling in one pass directly in the screen refresh, you need to remap the addresses so that the filler uses the first 40 registers during the 100 even rows, then the second 40 registers during the 100 odd rows.
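A hedged Python sketch of the two-bank scheme described above: even raster rows use one set of registers and odd rows the other, so a dithered fill happens in a single pass. The optional start values stand in for the non-zero initial state mentioned for objects partly outside the screen. The function name and parameters are hypothetical.

```python
def display_dithered(fetched_rows, init_even=None, init_odd=None):
    """One-pass dithered EOR-fill with two register banks (assumed behaviour):
    even raster rows use bank 0, odd raster rows use bank 1."""
    width = len(fetched_rows[0])
    banks = [list(init_even) if init_even is not None else [0] * width,
             list(init_odd) if init_odd is not None else [0] * width]
    shown = []
    for y, row in enumerate(fetched_rows):
        regs = banks[y & 1]                   # pick the bank for this row's parity
        line = []
        for col, byte in enumerate(row):
            regs[col] ^= byte                 # running EOR within this bank only
            line.append(regs[col])
        shown.append(line)
    return shown


# One byte column of a dithered line buffer (2x thick lines, checker pattern).
column = [[0xA0], [0x54], [0x0A], [0x01], [0], [0], [0], [0]]
print([format(r[0], "08b") for r in display_dithered(column)])
```

Each bank only ever sees every second row, so the even and odd halves fill independently, exactly as the two separate software passes would.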

2010-03-02 16:12

Stingray
Account closed

Registered: Feb 2003
Posts: 117

OK, so a second set of 40 registers that can be used on every second row would do the trick?

Can you give me a simple explanation of dithering?

I can see the visual effect in the demo; it looks like you get different shades, but how does it do that?

2010-03-02 18:18

JackAsser

Registered: Jun 2002
Posts: 1989

Yeps, a second set of registers for the odd rows would do the trick indeed.

Normal eor filling:

Line buffer:
11100000
00011100
00000011
00000000
00000000
00000000
00000000
00000000

After eor-filling:
11100000
11111100
11111111
11111111
11111111
11111111
11111111
11111111

Dither eor-filling, line buffer (2x thick lines, odd/even with 50% checker pattern):
10100000
01010100
00001010
00000001
00000000
00000000
00000000
00000000

Eor-Filling, even lines:
10100000
01010100
10101010
00000001
10101010
00000000
10101010
00000000

Eor-filling, odd lines also:
10100000
01010100
10101010
01010101
10101010
01010101
10101010
01010101
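The tables above can be reproduced with a few lines of Python. This sketch models the software approach from earlier in the thread (run the filler twice, stepping two rows each time) on a single byte column; it is an illustration, not code from any of the demos mentioned.

```python
def eor_fill_pass(rows, start=0, step=1):
    """In-place vertical EOR-fill of one byte column, visiting rows
    start, start + step, start + 2 * step, ..."""
    acc = 0
    for y in range(start, len(rows), step):
        acc ^= rows[y]
        rows[y] = acc


def show(rows):
    return [format(r, "08b") for r in rows]


normal = [0b11100000, 0b00011100, 0b00000011, 0, 0, 0, 0, 0]
eor_fill_pass(normal)                      # plain fill: every row

dither = [0b10100000, 0b01010100, 0b00001010, 0b00000001, 0, 0, 0, 0]
eor_fill_pass(dither, start=0, step=2)     # pass 1: even rows
even_only = show(dither)
eor_fill_pass(dither, start=1, step=2)     # pass 2: odd rows

print("\n".join(show(dither)))
```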

2010-03-02 19:07

PopMilo

Registered: Mar 2004
Posts: 145

@stingray: How hard would it be to implement a graphics mode that does something like 'repeat the whole line N times'?
To make 160x100 or 80x50 possible in pure hardware (no double STAs, no CPU cycle stealing, and less memory needed).

I guess you could trick the VIC into taking data from the same memory as the last line?

It would be useful for 3D...

2010-03-04 03:20

Stingray
Account closed

Registered: Feb 2003
Posts: 117

PopMilo,
It would be possible; would you really want to use the low-res mode, though?

2010-03-04 03:25

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Quoting JackAsser

Yeps, a second set of registers for the odd rows would do the trick indeed.

So if I include the extra circuitry will you make your portal engine more awesome for it?

BTW, any chance of getting a bigger 3D area? That would be a cool Doom maze :)

2010-03-04 03:41

Stingray
Account closed

Registered: Feb 2003
Posts: 117

BTW, we really need a name for this project.

Ideas so far:

VIC X (for VIC eXpanded or VIC eXtreme)
VIC ENHANCER
VIC UNLEASHED
VIC INTERCEPTOR
VIC AAA (Awesome Addressing Add-on)
BAD VIC
or
Alien VIC!

Or we could even name it after someone, like in honor of someone who has made massive contributions to the C64. Or maybe just name parts of the circuit in honor of some ppl. Like maybe we could name the blitter circuit the ....?

2010-03-04 05:33

JackAsser

Registered: Jun 2002
Posts: 1989

Quote: Quoting JackAsser
Yeps, a second set of registers for the odd rows would do the trick indeed.

So if I include the extra circuitry will you make your portal engine more awesome for it?

BTW Any chance of getting a bigger 3D area? that would be a cool Doom maze :)

Sure, why not. However, for rapid testing I suggest making a patch to VICE to test out your ideas before actually implementing them in hardware. That would probably help you with prototyping, and it would certainly save me from coding blind. :)

2010-03-04 09:18

Frantic

Registered: Mar 2003
Posts: 1627

I vote for "BAD VIC". :)

It has a nice 80s vibe to it. :)

2010-03-04 09:36

JackAsser

Registered: Jun 2002
Posts: 1989

Maybe "Notorious V.I.C"?

2010-03-05 07:48

PopMilo

Registered: Mar 2004
Posts: 145

Quote: PopMilo,
It would be possible, would you really want to use the low res mode though?

Yes, at least for a port of Yoomp from the Atari :)

EOR filling doesn't help in that case, but repeating lines without CPU time would save thousands of cycles each frame...

2010-03-06 09:26

Krill

Registered: Apr 2002
Posts: 2847

I think what this implies is actually having display lists akin to what's implemented on Atari8. (See http://en.wikipedia.org/wiki/ANTIC)

You set a memory pointer in the VIC from which it fetches entries of register offset, write cycle number (i.e., the x position, defined as the offset to the previous write) and value (3 bytes, or less if restrictions are applied, like only writing to $d011/16/17/18, which needs just 2 bits for the register), and the VIC then writes the data given by that display list to the referenced registers at the specified x positions. (Re repeating lines: one of the extra registers would then determine which of the 8 lines per char line you want to display.)
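A minimal Python decoder for the kind of list sketched above, assuming the unrestricted 3-byte form: register offset, cycle delta from the previous write, value. The byte order and the function name are assumptions; the restricted 2-bit-register variant is not modelled.

```python
def decode_register_list(data: bytes, start_cycle: int = 0):
    """Decode a hypothetical 3-byte-per-entry list (register offset,
    cycle delta, value) into absolute (cycle, address, value) writes."""
    if len(data) % 3:
        raise ValueError("list length must be a multiple of 3")
    writes, cycle = [], start_cycle
    for i in range(0, len(data), 3):
        reg, delta, value = data[i], data[i + 1], data[i + 2]
        cycle += delta                  # x position is relative to the previous write
        writes.append((cycle, 0xD000 + reg, value))
    return writes


# Two writes: $d021 = $01 ten cycles after start, then $d020 = $02 five cycles later.
print(decode_register_list(bytes([0x21, 10, 0x01, 0x20, 5, 0x02]), start_cycle=100))
```

Storing deltas rather than absolute positions keeps each entry small, at the cost of the VIC having to accumulate the position as it walks the list.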

You could actually use the so-far unused border cycles to read it over the 4-bit colour data bus, although that'd make generating these lists a little harder and halve the maximum throughput. Alternative list modes would define a fixed register or offset outside the list, to trade off list size against access range and write density.

The benefits are obvious: Having more VIC register writes per scanline, specifically in the bad lines, would delight every coder fancying smart VIC raster trickery and open up a whole new range of oldskool effects and cpu-cycle-cheap display modes :)

2010-03-06 10:54

Oswald

Registered: Apr 2002
Posts: 5020

yes, a display list would be bloody cool.

2010-03-06 11:31

Krill

Registered: Apr 2002
Posts: 2847

Yes, and while we're at it, why not have an easy sideborderless mode where the expansion would flip the $d016 38/40 column bit at the right x positions twice each line automatically? Or the same with $d017 to stretch sprites? Or similar with $d011 for easy FLI/repeated badlines/linecrunching? Oh the possibilities! :)

2010-03-06 14:17

Martin Piper

Registered: Nov 2007
Posts: 634

I have a suspicion that having a display list writing to VIC registers during bad lines might not be so easy. This is because the VIC is controlling the whole of the memory bus and bandwidth, i.e. both phis, for the period of the bad line fetches. This is why the CPU is paused during a bad line.

2010-03-06 15:41

Krill

Registered: Apr 2002
Posts: 2847

This is true, but there are still 23 cycles left for register writes during a badline, and with something like interleaved CPU and display list register writes, more is possible than with only the CPU writes in the cycles left.

2010-03-06 16:21

PopMilo

Registered: Mar 2004
Posts: 145

"Display list" is maybe a little too much, but that is up to stingray to say :)

But those 20+ cycles could definitely be used in some way.

I thought about a few bytes somewhere in memory that would tell "VIC-III" :) how many lines to repeat.

Values would be 0, 1 or 3 and would only manipulate data on the address bus. One line down from a bad line, the VIC would think it's getting data from the next 40 bytes, while VIC-III would change those addresses back to the previous 40 bytes... and so on...
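A sketch of the repeat idea, assuming the values 0, 1 and 3 act as masks that clear the low bits of the row being fetched, so each group of 1, 2 or 4 raster rows re-reads the first row's data. In the stock bitmap layout the row within a character cell (row & 7) forms the lowest three address bits, which is why this could plausibly be done purely on the address bus. Names are hypothetical.

```python
# Proposed repeat values -> how many times each fetched row is shown.
REPEAT_FACTOR = {0: 1, 1: 2, 3: 4}


def fetched_row(raster_row: int, repeat: int) -> int:
    """Row the VIC would actually fetch when the low row bits are masked off."""
    if repeat not in REPEAT_FACTOR:
        raise ValueError("repeat must be 0, 1 or 3")
    return raster_row & ~repeat


print([fetched_row(y, 1) for y in range(6)])   # [0, 0, 2, 2, 4, 4]
print([fetched_row(y, 3) for y in range(8)])   # [0, 0, 0, 0, 4, 4, 4, 4]
```

Over 200 raster rows, repeat 1 fetches 100 distinct rows and repeat 3 fetches 50, giving the vertical half of the 160x100 and 80x50 modes without any CPU writes.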

Great idea with putting new hardware between VIC and rest of system !!! :)

I don't remember, is the VIC in a socket or soldered?
How hard will it be to insert this contraption? :)

2010-03-07 11:42

Stingray
Account closed

Registered: Feb 2003
Posts: 117

First of all, thanks for all the input; it has shaped the direction of this project and given me the inspiration to see it through to the end.

Quoting PopMilo

I don't remember, is VIC in a socket or soldered ?

The VIC is socketed on every C64 (thank you Commodore), as far as I know anyway. I believe the SID is also socketed on every C64.

Quoting PopMilo

Yes, at least in port of Yoomp from Atari :)

Eor filling doesn't help in that case but repeating lines without cpu time would save thousands of cycles each frame...

I checked out Yoomp, very cool game. Are you going to port it? If you are going to port it I will try and help with the hardware. That game would be great for C64 & with nice SID music.

Quoting Krill

I think what this implies is actually having display lists akin to what's implemented on Atari8. (See http://en.wikipedia.org/wiki/ANTIC)

Quoting Oswald

yes, a display list would be bloody cool.

I have had this in mind for some time, but I have been reluctant to include it in the spec for fear of disappointing ppl in case I don't implement it in the end.

I was leaving this as one of those things I could do at the end, after having done everything else I have already committed to doing with the project. But when I hear you guys and Krill say stuff like the following:

Quoting Krill

Yes, and while we're at it, why not have an easy sideborderless mode where the expansion would flip the $d016 38/40 column bit at the right x positions twice each line automatically? Or the same with $d017 to stretch sprites? Or similar with $d011 for easy FLI/repeated badlines/linecrunching? Oh the possibilities! :)

It really inspires me to make sure I include this kind of thing. In fact, I am now thinking I will do it, provided I have enough real estate in the CPLD and enough time (I have now imposed a deadline of sorts on myself for this project).

Quoting Martin Piper

I have a suspicion that having a display list writing to VIC registers during bad lines might not be so easy. This is because the VIC is controlling the whole of the memory bus and bandwidth, i.e. both phis, for the period of the bad line fetches. This is why the CPU is paused during a bad line.

You are 100% spot on, Martin: while AEC is low, the VIC cannot be written to.

BTW, can we call this something other than a Display List? We can come up with a better name than what those Atari guys use, can't we? I have never really thought of what I had in mind as a Display List, but more as a Direct Loader. Let's say there are 19656 cycles per screen (the first being at the very start of the first raster line). You can say, OK, load VIC register $21 with #$01 on cycle 3150, etc. Keep two things in mind: you cannot do this on a bad cycle, and you can't have the 6510 write to the VIC at the same time (if both write to the VIC at once, the 6510's write will be ignored). You will also gain a few cycles on a badline for Direct Loading, as you could still Direct Load while BA is low.

Just to put it a simpler way, you can load any value into any VIC register on any of the 19656 cycles that make up a screen, as long as it's not a bad cycle.
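The Direct Loader timing can be worked through in Python. PAL frames are 312 raster lines of 63 cycles, which gives the 19656 cycles quoted above, so cycle 3150 in the example is raster line 50, cycle 0. The badline rule used here (raster line $30-$F7 whose low three bits equal YSCROLL, with the display enabled) is the standard one in simplified form; modelling the VIC's bus ownership on a badline as 40 cycles at indices 15-54 is an approximation, and sprite DMA is ignored. Function names are hypothetical.

```python
CYCLES_PER_LINE = 63      # PAL VIC-II (6569)
LINES_PER_FRAME = 312
CYCLES_PER_FRAME = CYCLES_PER_LINE * LINES_PER_FRAME   # 19656


def split_cycle(frame_cycle: int) -> tuple[int, int]:
    """Frame cycle (0-based) -> (raster line, cycle within the line)."""
    if not 0 <= frame_cycle < CYCLES_PER_FRAME:
        raise ValueError("cycle outside the frame")
    return divmod(frame_cycle, CYCLES_PER_LINE)


def is_badline(line: int, yscroll: int = 3, display_enabled: bool = True) -> bool:
    """Simplified badline condition: raster $30-$f7 whose low 3 bits equal YSCROLL."""
    return display_enabled and 0x30 <= line <= 0xF7 and (line & 7) == yscroll


def is_bad_cycle(frame_cycle: int, yscroll: int = 3) -> bool:
    """ASSUMPTION: on a badline the VIC owns the bus for 40 cycles, modelled
    as indices 15-54 of the line. Sprite DMA cycles are not modelled."""
    line, cycle = split_cycle(frame_cycle)
    return is_badline(line, yscroll) and 15 <= cycle <= 54


print(split_cycle(3150))   # (50, 0)
print(sum(not is_bad_cycle(0x33 * CYCLES_PER_LINE + c) for c in range(CYCLES_PER_LINE)))  # 23
```

Under this model a badline leaves 63 - 40 = 23 usable cycles, the same figure Krill gives earlier in the thread.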

It would even be possible to change the background color (or any VIC register) on every cycle!! (as long as it's not a bad cycle). The more I think about it, the more we need this; in fact, I'm kinda thinking now that if I don't include this I am kind of wasting the project.

Is "Direct Loader" a good enough name for this part of the circuit?

Also will this allow PopMilo to port his game? I guess he would use the Direct Loader to FLD??

2010-03-07 11:56

Stingray
Account closed

Registered: Feb 2003
Posts: 117

VIC X (VIC eXpanded / VIC eXtreme): 2 VOTES
BAD VIC : 1 VOTE
Notorious V.I.C : 1 VOTE
Alien VIC : 1 VOTE (My own vote)
VIC ENHANCER : 0 VOTES
VIC UNLEASHED : 0 VOTES
VIC INTERCEPTOR : 0 VOTES
VIC AAA (Awesome Addressing Add-on) : 0 VOTES

VIC-X is winning!

Need a couple more ppl to vote on a name.

BTW, if I am a bit slow at responding for a few days, I'm not being rude, I am just really getting stuck into this project ;)

2010-03-07 12:27

Graham
Account closed

Registered: Dec 2002
Posts: 990

Atari8 display lists do not have the ability to write to any register. You are mixing up Amiga Copper with A8 display lists.

2010-03-07 12:29

Martin Piper

Registered: Nov 2007
Posts: 634

Quoting stingray

Is "Direct Loader" a good enough name for this part of the circuit?

It is. Or "Copper". :)

I would definitely use the system for everything related to VIC effects. Opening the borders, multiplexing really tight sprite formations, nice expanded graphics screens. Leaving the CPU free to do all the calculation, it would be marvelous.

2010-03-07 12:52

Stingray
Account closed

Registered: Feb 2003
Posts: 117

Yep, I probably am. Thanks Graham.

2010-03-07 13:25

Oswald

Registered: Apr 2002
Posts: 5020

"Is "Direct Loader" a good enough name for this part of the circuit?"

yeah, but the data the Direct Loader loads I'd still call a display list :) maybe register list, that's a bit closer to what's happening.

2010-03-07 13:37

Graham
Account closed

Registered: Dec 2002
Posts: 990

A display list is just what the name says: a list of modes to display. No registers involved. Some circuit doing register loads is not a display list.

2010-03-07 13:51

Oswald

Registered: Apr 2002
Posts: 5020

graham, for god's sake please stop being mr smartass.

1. nobody said display list can write to video regs
2. it doesn't matter if, strictly speaking, registers are involved or not; what happens is almost the same. (and I suggested register list because in stingray's implementation registers would be involved!)
3. a display list is more than JUST a list of modes. (interrupts, bitmap start addy, blank lines, scrolling)

2010-03-07 14:55

Krill

Registered: Apr 2002
Posts: 2847

While "VIC-X" sounds cool in English, it's definitely got some drawbacks if you understand German. But then there's a classic demo effect called "wanking", too.. :D

"VIC-X" is still the best name so far.. :)

2010-03-07 14:56

PopMilo

Registered: Mar 2004
Posts: 145

Quoting stingray

...Just to put it a simpler way, you can load any value into any VIC register on any of the 19656 cycles that make up a screen, as long as it's not a badcycle...

This sounds like 'keep hardware simple and let the software do all the extra stuff', and I like it :)

This would make sprite multiplexing easy :)

I would only add that if it is possible you should make more registers available for change. For example some kind of relative address offset that would be one byte and would be added to current VIC memory fetch address in every cycle.

Then it would be possible to duplicate lines by changing that offset only once per line: 200 cycles for making Nx100 resolution instead of the thousands needed so far...

Wait... If that register is internal to your device, then these kinds of 'internal writes' maybe don't need to take cycles from the CPU?

Any help from hardware will boost any existing 3d project out there and inspire new ideas :)

2010-03-07 16:35

Graham
Account closed

Registered: Dec 2002
Posts: 990

@Oswald: A lot of posts refer to doing "display lists" for register writes. Just read some of the recent posts.

2010-03-07 16:44

QuasaR

Registered: Dec 2001
Posts: 145

Me likes Notorious V.I.C. but maybe it's too long... ;)

2010-03-07 16:57

Oswald

Registered: Apr 2002
Posts: 5020

I believe people simply spared the time writing this: "(yes I know that a display list does not write registers, but I am just using loosely this term because it comes close to this thing)"

I would think Martin & Krill are smart enough to know what atari display lists are exactly.

2010-03-07 20:23

Krill

Registered: Apr 2002
Posts: 2847

Yeah, without knowing both the Atari ANTIC and Amiga Copper lists in detail, I think our register list here is something in between the two. It writes actual registers and does not just switch modes, but only on the video chip (plus DMA for the 16kB of RAM visible to it), but cannot write the registers of the other chips on the bus.

But that's an academic discussion, so we should rather discuss the best way to implement this list so the overhead required to both execute and generate it is minimal.

2010-03-07 21:04

Oswald

Registered: Apr 2002
Posts: 5020

how about this:

have a list of registers and values to write for each line, and you can specify horizontally the cycle where it should start stuffing the regs. HW would buffer up the needed regs and values one line before.

list would look like:

$horizontal,$reg,$val,$reg,$val,$ff
$horizontal,$reg,$val,$reg,$val,$ff
$horizontal,$reg,$val,$reg,$val,$ff
$ff

$ff= stop/do nothing

$horizontal each line could be skipped and just set up by the cpu once, for most stuff it would be a constant anyway I guess.

$ff could be skipped as well if we maintain a constant nr of regs&vals. some dummy color reg writes won't hurt.
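A sketch of a parser for the list format proposed above, assuming $ff is only treated as a marker where a start cycle or register number is expected (VIC registers stop at $2e, and $ff is not a valid cycle), so $ff can still be used as a value. The function name is hypothetical, and it does not check whether the writes actually fit into the free cycles of a line.

```python
END = 0xFF  # stop marker from the proposal above


def parse_register_list(data: bytes):
    """Parse [horizontal, reg, val, ..., $ff] groups, one per raster line,
    ending with a lone $ff. Returns a list of (horizontal, [(reg, val), ...])."""
    lines, i = [], 0
    try:
        while True:
            horizontal = data[i]
            i += 1
            if horizontal == END:          # $ff where a start cycle is expected: list ends
                return lines
            writes = []
            while data[i] != END:          # $ff where a register is expected: line ends
                writes.append((data[i], data[i + 1]))  # a value of $ff is fine here
                i += 2
            i += 1                         # skip the line's $ff
            lines.append((horizontal, writes))
    except IndexError:
        raise ValueError("register list is not terminated by $ff") from None


data = bytes([20, 0x20, 0x01, 0x21, 0xFF, 0xFF,   # line 1: start cycle 20, two writes
              30, 0x16, 0xC8, 0xFF,               # line 2: start cycle 30, one write
              0xFF])                              # end of list
print(parse_register_list(data))
```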

2010-03-07 21:46

Krill

Registered: Apr 2002
Posts: 2847

There is a variable number of possible write cycles though, depending on which sprites are enabled and badline or not. And the chip also needs to fetch that list. Upload it to some chip-internal RAM or DMA or colour bus?

2010-03-08 01:05

Martin Piper

Registered: Nov 2007
Posts: 634

Quoting Graham

A display list is just what the name says: a list of modes to display. No registers involved.

Not correct. A "display list" can be for example a sequence of commands to draw vector lines on a display.

Quoting Graham

Some circuit doing register loads is not a display list.

Also not correct. Graham, I come from the old days when a "display list" was any method, hardware or software, used to optimise writing to display hardware registers. On various systems writing to the display hardware registers in a short amount of time was advantageous because then the writes could be squeezed in the VBLANK or HBLANK and remove visible artifacts.

For example during my time at Argonaut Games the term "display list" was used quite a lot in this context. I particularly remember SEGA Triforce arcade hardware mentioning display lists.

So while Atari could be argued to have used the term "display list" early on the term was also used in relation to other systems during the old games programming days.

2010-03-08 01:48

Stingray
Account closed

Registered: Feb 2003
Posts: 117

VIC X (VIC eXpanded / VIC eXtreme): 3 VOTES
Notorious V.I.C : 2 VOTES
BAD VIC : 1 VOTE
VIC III: 1 VOTE
Alien VIC : 1 VOTE (My own vote)
VIC ENHANCER : 0 VOTES
VIC UNLEASHED : 0 VOTES
VIC INTERCEPTOR : 0 VOTES
VIC AAA (Awesome Addressing Add-on) : 0 VOTES

Added VIC III (I think PopMilo suggested this)

2010-03-08 04:26

Conjuror

Registered: Aug 2004
Posts: 168

I vote for VIC eXtreme

2010-03-08 09:28

Skate

Registered: Jul 2003
Posts: 491

VIC-X is nice. VIC eXpanded makes more sense than VIC eXtreme to me. So my vote goes to "VIC eXpanded".

2010-03-08 17:01

PopMilo

Registered: Mar 2004
Posts: 145

Quoting stingray

...
Added VIC III (I think PopMilo suggested this)

Yes I did, but I like "VIC eXpanded" more :)
So my vote goes to "VIC X".

2010-03-08 17:09

Shadow
Account closed

Registered: Apr 2002
Posts: 355

The chip in the Commodore 65 was called VIC-III, so perhaps better leave that name to history.

2010-03-10 17:00

Iapetus/Algarbi/Wood

Registered: Dec 2004
Posts: 71

Great stuff stingray,

My vote goes to VIC-X

2010-08-21 15:50

Stingray
Account closed

Registered: Feb 2003
Posts: 117

The prototype PCB is in production :)

2010-08-21 16:48

JackAsser

Registered: Jun 2002
Posts: 1989

What are the final specifications? Or is that up to how you program the PIC or whatever?

2010-08-22 09:03

Stingray
Account closed

Registered: Feb 2003
Posts: 117

A fair amount of the logic is done but I can't guarantee a final spec until the project is completely finished (it also comes down to how much logic I can fit), so please don't hold me to anything.

The final spec should look something like the following:

EOR FILL (on the fly, no cpu overhead!!)
EOR FILL DITHERED (on the fly, no cpu overhead!!)
COLUMN AND ROW FORMATTED SCREENS
HARDWARE SCROLL
DIRECT LOADER
TEXAN MODE (Demo Coders will love this)

Plus you will be able to use the VIC to output more colors (due to movable color RAM) in each 8*8 cell (or 4*8 cell) than ever before! The best C64 graphics are still to come!

Using the direct loader will allow you to change more VIC registers per line and also reduce CPU overhead.

Hardware scroll = brilliant bitmap graphics for games (even better when combined with FLI of the extra color RAM!).

The EOR FILLS and column-based screens are going to give you a ton more CPU cycles to do your 3D calculations with! Plus, using the Direct Loader and Texan mode will give you a few more cycles per frame than usual. We are still to see the fastest 3D on the C64!

This VIC add-on board is going to, for the first time, open up the full power of the VIC chip. When the DIRECT LOADER is used in conjunction with the TEXAN mode I think ppl are going to come up with some amazing stuff and really start to realise and unleash the potential of the VIC.

Also, remember this VIC add-on is being designed to fit discreetly inside your C64, to maintain 100% compatibility with all software and to maintain the essence of a C64 (you can think of this as the mod for C64 purists). The add-on has to be activated (by the coder); otherwise it just sits there and it's C64 as usual. This add-on is also being designed so that you will not have to hack or damage your C64's circuit board to fit it. I would also like (once the project is finished) for there to be emulation of it in VICE for everyone to use.

I have been doing this project in what little free time I have available, hence it taking so long. I will try to post some photos soon.

A very big thank you to everyone who has helped me with this, when I started to do this project I really didn't have any idea at all about how 3D graphics are done.

2010-08-23 00:41

Frantic

Registered: Mar 2003
Posts: 1627

"The best C64 graphics are still to come!"
"We are still to see the fastest 3D on the C64!"

Yeah, right.. ;) I mean, I am not trying to make you less happy about your achievements with that hardware or anything. Good work with that, I suppose! Neither am I some sort of purist (that you mention) who doesn't like it when people experiment with new hardware. I just have a slightly hard time swallowing the verbal formulation (i.e. not the project as such) that this is "C64 graphics", since the "improvements" are in fact wholly due to additional hardware. Of course the VIC is still involved, and so forth, but if any kind of hardware expansion to the c64 would still count as "C64" then I guess a C64 could be anything. ...like a space-ship, a cyborg, or a laser cannon, or why not a washing machine? Indeed, from that perspective it certainly looks like the most amazing part of the C64's life may be yet to come. :)

2010-08-23 03:58

Stingray
Account closed

Registered: Feb 2003
Posts: 117

That's right, it won't be a stock C64 it will be a C64 with a VIC-X installed. Definitely laser cannon, C64 would never be a washing machine.

2010-08-23 11:52

Skate

Registered: Jul 2003
Posts: 491

c64 IS a washing machine. it's a brain washing machine. that's why we still use it after ~30 years.

happy to see some progress in this 6-year-old project.

2010-08-23 13:48

Stingray
Account closed

Registered: Feb 2003
Posts: 117

LOL, "brain washing machine". Yes, 6 years :( Yeah, what can I say? As long as I beat the release of Pinball Dreams :)

2010-08-23 13:53

Stingray
Account closed

Registered: Feb 2003
Posts: 117

You are meant to lose the bottom line from the polygons when EOR filling right??

2010-08-23 19:07

Skate

Registered: Jul 2003
Posts: 491

afaik, that's right.

2010-08-24 21:54

PopMilo

Registered: Mar 2004
Posts: 145

Great news!

I like that you are not trying to generate a new type of palette or resolutions or something, like other 8-bit add-ons that change the spirit of the machines completely...

Tweaking address and data buses on the fly and doing blitter stuff is enough for years of experimenting :)

ps. What is TEXAN mode ?

2010-08-25 14:23

Martin Piper

Registered: Nov 2007
Posts: 634

Good luck Stingray. :)

2010-09-07 10:37

Stingray
Account closed

Registered: Feb 2003
Posts: 117

This is a shot of the EOR filler working. The first picture is without the EOR filling enabled; the second shot is the same screen but with the VIC-X doing the EOR fill.

2010-09-07 10:39

Frantic

Registered: Mar 2003
Posts: 1627

Aha! I see this is hardware is suitable for nazi propaganda! ;)

2010-09-07 10:54

Stingray
Account closed

Registered: Feb 2003
Posts: 117

LOL, inspired by Wolfenstein :)

2010-09-07 12:21

TWW

Registered: Jul 2009
Posts: 541

I'll take one. When do you expect to have it in mass-production?

;)

2010-09-07 13:05

Stingray
Account closed

Registered: Feb 2003
Posts: 117

At this rate 2020, which really sux cus I think the world is meant to end in 2012 or something? Hopefully soon but just depends on how much free time I get :)

2010-09-07 14:25

encore

Registered: Aug 2010
Posts: 61

Just came to think that this could make Stunt Car Racer look like how it did on the Amiga/Atari version (and probably increase framerate too). The Vic-X concept was new to me but sounds very interesting.

2010-09-07 14:41

Oswald

Registered: Apr 2002
Posts: 5020

stunt car does not use eor filling, it fills only the sky, and even that on a char by char basis, when it hits the road then goes pixel by pixel...

2011-05-13 07:38

Fresh

Registered: Jan 2005
Posts: 101

I followed this thread waiting to see the birth of a new piece of hardware, but it looks like too much time has passed since Stingray's last post...
It sounded very promising; I really hope this thing is still under development.
... any news about this lil beauty?

2011-05-13 09:30

WVL

Registered: Mar 2002
Posts: 886

Jackasser : how many fps could your 3d engine put out if you'd have had EOR-on-the-fly and clear-memory?

2011-05-13 09:37

JackAsser

Registered: Jun 2002
Posts: 1989

Quote: Jackasser : how many fps could your 3d engine put out if you'd have had EOR-on-the-fly and clear-memory?

WVL: dunno, to be honest; not that much, I think. When the FPS is low, most of the time is spent in math and line drawing anyway. The C128 version has a frame rate counter: stand close to a wall so that you only see one complete solid, and you basically get the EOR+CLEAR fps. When the FPS drops, it's because of line drawing and math, not filling and clearing.

2011-05-13 10:29

WVL

Registered: Mar 2002
Posts: 886

yesyes, i understand clearing and EOR take the same time always... I can code too, ya know :D

let's calc..

eor+sta takes 6 cycles per byte (if you plot inside the eor-speedcode, that is). 'dumb' clearing takes 4 cycles/byte (storing 0 in the eor-speedcode).
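To make the 6 + 4 cycle figure concrete, here is a minimal Python model of one unrolled column of such a speedcode. It assumes (an interpretation, not something stated in the thread) that the line drawer plots into the immediate operands of `EOR #imm` / `STA screen` pairs, giving 2 + 4 = 6 cycles per byte, and that the "dumb" clear is one `STA` of 0 into each operand (4 cycles). The names `run_column` and `clear_column` are hypothetical.

```python
# Hypothetical model of one unrolled EOR-fill column (one byte per screen row).
# Assumption: lines are plotted into the immediate operands of the speedcode,
#   EOR #operand   ; 2 cycles
#   STA screen,row ; 4 cycles (absolute addressing)
# and the "dumb" clear stores 0 into every operand (STA abs, 4 cycles).
# The per-column LDA #0 at the top of the column is not counted.

EOR_IMM_CYCLES = 2
STA_ABS_CYCLES = 4


def run_column(operands):
    """Run the speedcode for one column; return (screen_bytes, cycles)."""
    acc = 0  # the accumulator starts empty at the top of each column
    screen = []
    for op in operands:
        acc ^= op           # EOR #op: an edge bit toggles that pixel column on/off
        screen.append(acc)  # STA: write the running fill state to the bitmap
    cycles = len(operands) * (EOR_IMM_CYCLES + STA_ABS_CYCLES)
    return screen, cycles


def clear_column(operands):
    """'Dumb' clear: zero every plotted operand; return cycles spent."""
    for i in range(len(operands)):
        operands[i] = 0
    return len(operands) * STA_ABS_CYCLES


if __name__ == "__main__":
    # Edges of a shape plotted at rows 2 and 5 in the middle four pixels.
    column = [0x00, 0x00, 0x3C, 0x00, 0x00, 0x3C, 0x00, 0x00]
    screen, fill_cycles = run_column(column)
    print([hex(b) for b in screen], fill_cycles)
    print(clear_column(column))
```

Rows 2 to 4 come out filled, and the edge at row 5 switches the span off again. Each bit of a byte is filled independently, so one EOR handles eight hires pixels at once.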

that's 10 cycles per byte. Let's say half of the screen gets updated; then you have 40*200/2 = 4000 bytes, i.e. 40.000 cycles saved. A PAL frame is 312 lines × 63 cycles, roughly 19.700 cycles, so that's about 2 frames.
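The cycle estimate above can be checked with a few lines of Python. The sketch assumes a PAL C64 (63 cycles per raster line, 312 lines per frame) and ignores cycles stolen by badlines and sprite DMA, so the real saving measured in frames would be slightly larger. The function name `frames_saved` is hypothetical.

```python
# Back-of-envelope check of the "about 2 frames" figure (PAL C64 assumed).
PAL_CYCLES_PER_LINE = 63
PAL_LINES_PER_FRAME = 312
# Nominal cycles per frame; badlines and sprite DMA steal some of these.
PAL_FRAME_CYCLES = PAL_CYCLES_PER_LINE * PAL_LINES_PER_FRAME

BITMAP_BYTES = 40 * 200   # 40 byte columns x 200 rows
CYCLES_PER_BYTE = 6 + 4   # EOR+STA fill plus "dumb" clear, per the estimate


def frames_saved(fraction_updated):
    """Return (cycles, frames) saved if the hardware does fill + clear."""
    cycles = int(BITMAP_BYTES * fraction_updated) * CYCLES_PER_BYTE
    return cycles, cycles / PAL_FRAME_CYCLES


if __name__ == "__main__":
    cycles, frames = frames_saved(0.5)
    print(f"{cycles} cycles = {frames:.2f} frames")
```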

Now let's do some maths..

fps	frames/update	VICX frames/update	VICX fps	speed increase
1	50,0		48,0			1,0		4%
2	25,0		23,0			2,2		9%
3	16,7		14,7			3,4		14%
4	12,5		10,5			4,8		19%
5	10,0		8,0			6,3		25%
6	8,3		6,3			7,9		32%
7	7,1		5,1			9,7		39%
8	6,3		4,3			11,8		47%
9	5,6		3,6			14,1		56%
10	5,0		3,0			16,7		67%
11	4,5		2,5			19,6		79%
12	4,2		2,2			23,1		92%
13	3,8		1,8			27,1		108%
14	3,6		1,6			31,8		127%
15	3,3		1,3			37,5		150%
16	3,1		1,1			44,4		178%
17	2,9		1,0			50,0		194%
18	2,8		1,0			50,0		178%
19	2,6		1,0			50,0		163%
20	2,5		1,0			50,0		150%
21	2,4		1,0			50,0		138%
22	2,3		1,0			50,0		127%
23	2,2		1,0			50,0		117%
24	2,1		1,0			50,0		108%
25	2,0		1,0			50,0		100%

No idea how many fps it can do now though, but this should give a good idea about the increase in speed.

I'd call anything above 10 fps good for C64 purposes... considering that, it's a bit disappointing that only the 7-9 fps cases get boosted to (roughly) 10 fps or more.
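The table can be regenerated from two assumptions taken from the post: a 50 Hz PAL display and a saving of 2 frames per screen update, with the VICX case clamped at one frame per update (50 fps). A sketch in Python, where `vicx_row` is a hypothetical helper name (values print with decimal points rather than the commas used in the table):

```python
# Regenerates the table: PAL 50 Hz display, 2 frames saved per update.
FRAME_RATE = 50.0     # PAL refreshes per second
FRAMES_SAVED = 2.0    # estimated saving from hardware EOR fill + clear


def vicx_row(fps):
    """Return (frames/update, VICX frames/update, VICX fps, % increase)."""
    frames_per_update = FRAME_RATE / fps
    # An update can never take less than one frame, so clamp at 1.
    vicx_frames = max(1.0, frames_per_update - FRAMES_SAVED)
    vicx_fps = FRAME_RATE / vicx_frames
    increase_pct = (vicx_fps / fps - 1.0) * 100.0
    return frames_per_update, vicx_frames, vicx_fps, increase_pct


if __name__ == "__main__":
    print("fps  frames/update  VICX frames/update  VICX fps  increase")
    for fps in range(1, 26):
        f, vf, vfps, inc = vicx_row(fps)
        print(f"{fps:3d}  {f:13.1f}  {vf:18.1f}  {vfps:8.1f}  {inc:7.0f}%")
```

The clamp is what makes the percentage peak at 17 fps and fall off above it: from there on, the VICX version is already capped at the 50 Hz refresh rate.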

2011-05-13 13:30

chatGPZ

Registered: Dec 2001
Posts: 11119

Quote:

I followed this thread waiting to see the birth of a new hw but it looks like too much time has passed since last Stingray's post...
It sounded very promising, I really hope this thing still being under development.
... any news about this lil beauty?

i think VICX turned into Alienflash somehow: http://www.lemon64.com/forum/viewtopic.php?t=36830

2011-05-13 18:44

JackAsser

Registered: Jun 2002
Posts: 1989

Quote: yesyes, i understand clearing and EOR take the same time always... I can code too, ya know :D

The speed-up would actually be greater, since the C64 version didn't have memory for a fully unrolled filler and clearer. The C128 version has that, otoh, plus relocatable ZP and stack optimizations, so I guess with VICX it would be somewhat close to the C128 version, except for not using the 2 MHz mode.

2011-05-13 19:55

Oswald

Registered: Apr 2002
Posts: 5020

I guess you could go fullscreen without losing much speed.
