Subject: VESA SVGA - line code and info
Date: 6 Feb 1995 18:39:08 GMT

Hello everyone!

This is a mini-tutorial, and code, relating to VESA SVGA programming.  The
code includes a line procedure which is based upon Bresenham's algorithm.  It
is not blazingly fast, but hopefully it'll work on all SVGA cards with VESA
support, and it is pretty compact - no special cases for slopes.

Many people try to begin programming in SVGA modes straight from mode 13h, or
Xmode variants.  They quickly encounter the problem of only 64K of video
memory being accessible - a little short of the 300K required for
640x480x8bit!
The special address space A000h - AFFFh must be mapped to different parts of
the video memory to make use of it.  This can be done via VESA functions.  If
you don't yet have 'vesasp12.txt' (PCGPE contains this document) then I
suggest you get it from x2ftp.oulu.fi /pub/msdos/programming/specs/vesasp12.
This document details the VESA BIOS extensions used to get info on video
modes, set video modes, pan across a larger virtual screen, and set the CPU
window (A000-AFFF) to map to different places in video mem.

Even though VESA provides a common interface for SVGA cards, there are still
some specifics that have to be dealt with.  The 'granularity' of the window
is the smallest amount by which it can be moved.  A 64KB granularity with
1MB video memory means the CPU window can be mapped to one of 16 'chunks' in
this memory.  A 4KB granularity has more potential mappings - the window is
still 64KB in size, but it can be positioned on any 4K boundary in video
memory.  I know granularities of 4K, 16K, 32K, and 64K exist.  Some cards are
switchable
(actually the only chipset I'm familiar with that has this option is Cirrus
Logic - defaults to 4K, can be set to 16K.  I think this is necessary for
accessing >1MB).

I see two ways to manage this discrepancy.  Code can always assume 64K
granularity, with the 'bank-switching' routines making sure the window is
moved by this amount (with 4K gran, the window position has to be incremented
or decremented by 16 for each 64K step).  The other way is to deal with each
granularity differently - this is how the line code provided below operates.
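
The first approach can be sketched in C.  This is only an illustration, not
part of the line code below: it assumes the granularity is given in KB (the
way GetVESA reads it from the mode info block), that it is nonzero and divides
64 evenly, and the function names are made up for this example.

```c
#include <stdint.h>

/* Number of granules in one 64 KB step.
   gran_kb: window granularity in KB (assumed to be 4, 16, 32 or 64). */
static uint16_t granules_per_64k(uint16_t gran_kb)
{
    return (uint16_t)(64u / gran_kb);
}

/* Window position, in granularity units, for the n-th 64 KB bank.
   Code that always thinks in 64 KB banks would pass this value to the
   window-setting function instead of the bank number itself. */
static uint16_t bank_to_winpos(uint16_t bank, uint16_t gran_kb)
{
    return (uint16_t)(bank * granules_per_64k(gran_kb));
}
```

With 4K granularity one bank step is 16 granules, so bank 3 ends up at window
position 48; with 64K granularity the bank number is used unchanged.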

Finer granularity can speed up rendering.  Line drawing will be used to
illustrate.  The linear start address is calculated.  The low 16 bits of this
address are mappable to the 64K window.  The high order bits can be used to
locate the position of the window.  With a 64K granularity, the high word is
our window location, and the low word is the displacement into the window.
With 4K gran, the low 12 bits are the offset, and the remaining high bits are
the window location.  If a line begins near the end of a 64K aligned chunk
(linear position 123840, say), and continues down a short distance, it'll
cross a 64K
boundary.  With 64K gran, the window will have to be moved.  Using a 4K
granularity, the initial offset into a window can be kept below 4096.  So,
lines that aren't too long can always be kept within the starting window.
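
To make the arithmetic concrete, here is a small C sketch of the address
split, using the example position 123840 from above.  The struct and function
names are my own for illustration, and the granularity is assumed to be a
power of two no larger than 64K.

```c
#include <stdint.h>

/* Window position (in granularity units) plus offset into the window. */
typedef struct {
    uint16_t winpos;
    uint16_t offset;
} WinAddr;

/* Split a linear video address for a given granularity (in bytes). */
static WinAddr split_address(uint32_t linear, uint32_t gran_bytes)
{
    WinAddr a;
    a.winpos = (uint16_t)(linear / gran_bytes);
    a.offset = (uint16_t)(linear % gran_bytes);
    return a;
}

/* Bytes that can still be written before the end of the 64 KB window. */
static uint32_t bytes_left_in_window(WinAddr a)
{
    return 65536u - a.offset;
}
```

For 123840 with 64K gran this gives window 1, offset 58304, leaving 7232
bytes (about 11 scan-lines at 640 bytes each) before a window move.  With 4K
gran it gives window 30, offset 960, leaving 64576 bytes (about 100
scan-lines).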

Another advantage that fine granularity provides is easier alignment with the
edge of the screen.  With a horizontal resolution of 640, 32 lines take up
20KB, which is divisible by 4KB.  If all windowing is then limited to be
aligned on these 20KB bounds, one will never have to worry about overflowing
past the end of the window while drawing across a scan-line.  The windows
are positioned so that the 'bottom' of the window is on these 20K bounds.
Inner rendering loops that move across a horizontal line don't bother with
checking for a 'page-cross'.  The outer loop checks for overflow when it moves
down to the next scan-line.  A 64K granularity doesn't align until 512
vertical lines (320KB), which means the inner loop must check for 
page-crossings within the scan-line.
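
The 20KB and 320KB figures are just the least common multiple of the
scan-line length and the granularity.  A minimal C sketch of that calculation
follows; the function names are hypothetical, and it assumes one byte per
pixel so that the scan-line length in bytes equals the horizontal resolution.

```c
#include <stdint.h>

/* Greatest common divisor (Euclid). */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest byte count that is both a whole number of scan-lines and a
   whole number of granules: lcm(pitch, gran_bytes). */
static uint32_t align_period_bytes(uint32_t pitch, uint32_t gran_bytes)
{
    return pitch / gcd_u32(pitch, gran_bytes) * gran_bytes;
}

/* The same period expressed in scan-lines. */
static uint32_t align_period_lines(uint32_t pitch, uint32_t gran_bytes)
{
    return align_period_bytes(pitch, gran_bytes) / pitch;
}
```

For a 640 byte scan-line, 4K gran gives 20480 bytes (32 lines), which fits
inside a 64K window, while 64K gran gives 327680 bytes (512 lines), which
does not.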

Note: one way to create easy alignment with the edge is to change the length
of a scanline to a power of 2 (say 1024).  This wastes video memory, but
it can be well worth it.  Check vesasp12.txt for setting this.
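
With a 1024 byte scan-line the bank and offset fall straight out of the
coordinates, since exactly 64 scan-lines fill a 64K window.  A rough C sketch,
assuming a 1024 byte scan-line, 64K window steps, and x below 1024 (the
function names are invented for this example):

```c
#include <stdint.h>

/* 64 KB bank that scan-line y falls in (64 lines of 1024 bytes per bank). */
static uint16_t bank_of(uint16_t y)
{
    return (uint16_t)(y >> 6);
}

/* Offset of pixel (x, y) inside that bank; x is assumed to be < 1024. */
static uint16_t offset_of(uint16_t x, uint16_t y)
{
    return (uint16_t)(((y & 63u) << 10) | x);
}
```

No scan-line ever straddles a bank boundary here, so only the move to the
next scan-line needs a page-cross check.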

The line procedure, below, does take advantage of positioning the top of the
window as close to the top of the line as it can.  Thus window moving for
mid-length lines is reduced for cards that have smaller granularities.  It
does not take advantage of alignment with the screen edge.  The code is meant
to be fairly straightforward, with nothing fancy - it's just simple,
flexible, small, and I hope easy to understand.  One easy optimization to add
is to check if the endpoints lie in different window addresses - if not, a
routine without a page-cross check can be called; otherwise the standard
routine is called.
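
A possible form of that check, sketched in C.  This is a conservative variant
of the idea rather than code from the line procedure: it assumes the
endpoints are already sorted so that y1 <= y2, that the window is placed at
the granule containing the start point, and that one pixel is one byte.  The
function name and parameters are made up for illustration.

```c
#include <stdint.h>

/* Returns 1 if every pixel of the line lies inside the 64 KB window that
   starts at the granule containing (x1, y1), otherwise 0.
   pitch: bytes per scan-line, gran_bytes: window granularity in bytes. */
static int line_fits_window(uint32_t x1, uint32_t y1,
                            uint32_t x2, uint32_t y2,
                            uint32_t pitch, uint32_t gran_bytes)
{
    uint32_t start = y1 * pitch + x1;
    uint32_t base  = start / gran_bytes * gran_bytes;   /* window start   */
    uint32_t xmin  = (x1 < x2) ? x1 : x2;
    uint32_t xmax  = (x1 < x2) ? x2 : x1;
    uint32_t lo    = y1 * pitch + xmin;  /* lowest address possibly hit   */
    uint32_t hi    = y2 * pitch + xmax;  /* highest address possibly hit  */

    return lo >= base && hi < base + 65536u;
}
```

A line from (100,193) to (110,210) at 640 bytes per scan-line fits with 4K
gran but not with 64K gran, so only the latter would need the routine with
the page-cross check.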

Careful eyes may notice that lines are always rendered from top to bottom, but
I have a macro to move the CPU window UP!  A situation where this is needed:
A window begins 382 pixels across on a scan-line.  A line is started just two
pixels into the window (at 383).  The endpoint is on the far left of the screen
(0), and 5 pixels down from start.  The line is going to begin with a string of
pixels straight to the left - passing BACKWARD through the window boundary.
This occurrence requires the 'PgUp' macro.  If alignment is done with the
screen edge, this isn't necessary.


This code is provided for learning purposes, and may be used in any fashion
desired - it's free!  If the code doesn't work for you, please let me know.  I
haven't had opportunity to test it on other systems.  It didn't get a rigorous
test on mine either - paging is untested.  Conversion to other resolutions is
pretty simple.  The linear address calculation is all that has to be modified
(I think!?) - 'bx' may be too small at higher resolutions - use 'ebx'.

This can be assembled with:     tasm /m2 /ml <filename>
                                tlink /3 <filename>
Or pieces can be extracted, and interfaced to whatever you wish,
however you wish.

-Anthony Tavener 'Daoloth of MetaSentience'
-cs94169@cs.ualberta.ca (Temporary - friend's account)

---CODE BEGIN---
.486
code    segment para public use16
        assume  cs:code

PgDown  macro
        push    bx
        push    dx
        xor     bx,bx
        mov     dx,cs:winpos
        add     dx,cs:disp64k
        mov     cs:winpos,dx
        call    cs:winfunc
        pop     dx
        pop     bx
        endm

PgUp    macro
        push    bx
        push    dx
        xor     bx,bx
        mov     dx,cs:winpos
        sub     dx,1
        mov     cs:winpos,dx
        call    cs:winfunc
        add     di,cs:granmask
        inc     di
        pop     dx
        pop     bx
        endm

        mov     ax,seg stk      ;\
        mov     ss,ax           ; set up program stack
        mov     sp,200h         ;/

        call    GetVESA         ;init variables related to VESA support

        mov     ax,4f02h        ;\
        mov     bx,0101h        ; VESA mode 101h (640x480x8bit)
        int     10h             ;/

        mov     ax,0a000h
        mov     ds,ax

        mov     eax,10h         ;\
        mov     ebx,13h
        mov     ecx,20bh        ;test Lin procedure
        mov     edx,1a1h
        mov     ebp,21h
        call    Lin             ;/

        mov     ax,4c00h
        int     21h

GetVESA proc
;This is just a hack to get the window-function address for a direct call,
;and to initialize variables based upon the window granularity.
        mov     ax,4f01h                ;\
        mov     cx,0101h
        lea     di,buff                 ; use VESA mode info call to..
        push    cs                      ; get card stats for mode 101h
        pop     es
        int     10h                     ;/
        add     di,4
        mov     ax,word ptr es:[di]     ;get window granularity (in KB)
        shl     ax,0ah
        dec     ax
        mov     cs:granmask,ax          ; = granularity - 1 (in Bytes)
        not     ax
        clc
GVL1:   inc     cs:bitshift             ;\
        rcl     ax,1                    ; just a way to get vars I need :)
        jc      GVL1                    ;/
        add     cs:bitshift,0fh
        inc     ax
        mov     disp64k,ax
        add     di,8
        mov     eax,dword ptr es:[di]   ;get address of window control
        mov     cs:winfunc,eax
        ret
buff    label   byte
        db      100h dup (?)
        endp

Lin     proc
;Codesegment: Lin
;Inputs: eax: x1, ebx: y1, cx: x2, dx: y2, bp: color
;Destroys: ax, bx, cx, edx, si, edi
;Global: winfunc(dd),winpos(dw),page(dw),granmask(dw),disp64k(dw),bitshift(db)
;Assumes: eax, ebx have clear high words

        cmp     dx,bx                   ;\
        ja      LinS1                   ; sort vertices
        xchg    ax,cx
        xchg    bx,dx                   ;/

LinS1:  sub     cx,ax                   ;\
        ja      LinS2                   ; calculate deltax and
        neg     cx                      ; modify core loop based on sign
        xor     cs:xinc1[1],28h         ;/

LinS2:  sub     dx,bx                   ;deltay
        neg     dx
        dec     dx

        shl     bx,7                    ;\
        add     ax,bx                   ; calc linear start address
        lea     edi,[eax][ebx*4]        ;/

        mov     si,dx                   ;\
        xor     bx,bx
        mov     ax,cs:page      ;\
        shl     ax,2            ; pageOffset=page*5*disp64K
        add     ax,cs:page
        mul     cs:disp64k      ;/
        push    cx                      ; initialize CPU window
        mov     cl,cs:bitshift          ; to top of line
        shld    edx,edi,cl
        pop     cx
        add     dx,ax
        and     di,cs:granmask
        mov     cs:winpos,dx
        call    cs:winfunc
        mov     dx,si                   ;/

        mov     ax,bp
        mov     bx,dx

;ax:color, bx:err-accumulator, cx:deltaX, dx:vertical count,
;di:location in CPU window, si:deltaY, bp:color

LinL1:  mov     [di],al                 ;\
        add     bx,cx
        jns     LinS3
LinE1:  add     di,280h
        jc      LinR2                   ; core routine to
        inc     dx                      ; render line
        jnz     LinL1
        jmp     LinOut
LinL2:  mov     [di],al         ;\
xinc1   label   byte
LinS3:  add     di,1            ; this deals with
        jc      LinR1           ; horizontal pixel runs
LinE2:  add     bx,si
        jns     LinL2           ;/
        jmp     LinE1                   ;/

LinR1:  js      LinS7                   ;\
        PgDown                          ; move page down 64k..
        mov     ax,bp
        jmp     LinE2
LinS7:  PgUp                            ; or up by 'granularity'
        mov     ax,bp
        jmp     LinE2                   ;/

LinR2:  PgDown                          ;\
        mov     ax,bp                   ; move page down 64k
        inc     dx
        jnz     LinL1                   ;/

LinOut: mov     cs:xinc1[1],0c7h
        ret
        endp

winfunc  dd     ?       ;fullpointer to VESA setwindow function
winpos   dw     ?       ;temp storage of CPU window position
granmask dw     ?       ;masks address within window granularity
disp64k  dw     ?       ;number of 'granules' in 64k
page     dw     0       ;video page (0,1,2 for 1MB video)
bitshift db     0       ;used to extract high order address bits..
                        ;\ for setting CPU window
        ends

stk     segment para stack use16 'STACK'
        dw      100h dup (?)
        ends
        end
---CODE END---


