
              

                    Perspective Transforms

               


        By Andre Yew (andrey@gluttony.ugcs.caltech.edu)




    This is how I learned perspective transforms --- it was

intuitive and understandable to me, so perhaps it'll be to

others as well.  It does require knowledge of matrix math

and homogeneous coordinates.  IMO, if you want to write a

serious renderer, you need to know both.


   First, let's look at what we're trying to do:

               S (screen)

               |    * P (y, z)

               |   /|

               |  / |

               | /  |

               |/   |

               * R  |

             / |    |

            /  |    |

           /   |    |

   E (eye)/    |    | W

---------*-----|----*-------------

         <- d -><-z->


   E is the eye, P is the point we're trying to project, and

R is its projected position on the screen S (this is the point

you want to draw on your monitor).  Z goes into the monitor (left-

handed coordinates), with X and Y being the width and height of the

screen.  So let's find where R is:


    R = (xs, ys)


    Using similar triangles (ERS and EPW)


    xs/d = x/(z + d)

    ys/d = y/(z + d)

    (Use similar triangles to determine this)


    So,


    xs = x*d/(z + d)

    ys = y*d/(z + d)


    Express this homogeneously:


    R = (xs, ys, zs, ws).


    Make xs = x*d

         ys = y*d

         zs = 0 (the screen is a flat plane)

         ws = z + d


    and express this as a vector transformed by a matrix:


    [x y z 1][ d 0 0 0 ]

             [ 0 d 0 0 ]    =  R

             [ 0 0 0 1 ]

             [ 0 0 0 d ]


    The matrix on the right side can be called a perspective transform.

But we aren't done yet.  See the zero in the 3rd column, 3rd row of

the matrix?  Make it a 1 so we retain the z value (perhaps for some

kind of Z-buffer).  Also, this isn't exactly what we want since we'd

also like to have the eye at the origin and we'd like to specify some

kind of field-of-view.  So, let's translate the matrix (we'll call

it M) by -d to move the eye to the origin:


    [ 1 0 0  0 ][ d 0 0 0 ]

    [ 0 1 0  0 ][ 0 d 0 0 ]

    [ 0 0 1  0 ][ 0 0 1 1 ]  <--- Remember, we put a 1 in (3,3) to

    [ 0 0 -d 1 ][ 0 0 0 d ]       retain the z part of the vector.


    And we get:


    [ d 0 0  0 ]

    [ 0 d 0  0 ]

    [ 0 0 1  1 ]

    [ 0 0 -d 0 ]


    Now parametrize d by the angle PEW, which is half the field-of-view

(FOV/2).  So we now want to pick a d such that ys = 1 always and we get

a nice relationship:


    d = cot( FOV/2 )


    Or, to put it another way, using this formula, ys = 1 always.


    Replace all the d's in the last perspective matrix and multiply

through by sin's:


    [ cos 0   0    0   ]

    [ 0   cos 0    0   ]

    [ 0   0   sin  sin ]

    [ 0   0   -cos 0   ]


    With all the trig functions taking FOV/2 as their arguments.

Let's refine this a little further and add near and far Z-clipping

planes.  Look at the lower right 2x2 matrix:


   [ sin sin ]

   [-cos 0   ]


   and replace the first column by a and b:


   [ a sin ]

   [ b 0   ]

   [ b 0   ]


   Transform out near and far boundaries represented homogeneously

as (zn, 1), (zf, 1), respectively and we get:


   (zn*a + b, zn*sin) and (zf*a + b, zf*sin).


   We want the transformed boundaries to map to 0 and 1, respectively,

so divide out the homogeneous parts to get normal coordinates and equate:


    (zn*a + b)/(zn*sin) = 0 (near plane)

    (zf*a + b)/(zf*sin) = 1 (far plane)


   Now solve for a and b and we get:


   a = (zf*sin)/(zf - zn)

     = sin/(1 - zn/zf)

   b = -a*zn

   b = -a*zn


   At last we have the familiar looking perspective transform matrix:


   [ cos( FOV/2 ) 0                        0            0 ]

   [ 0            cos( FOV/2 )             0            0 ]

   [ 0            0 sin( FOV/2 )/(1 - zn/zf) sin( FOV/2 ) ]

   [ 0            0                    -a*zn            0 ]


   There are some pretty neat properties of the matrix.  Perhaps

the most interesting is how it transforms objects that go through

the camera plane, and how coupled with a clipper set up the right

way, it does everything correctly.  What's interesting about this

is how it warps space into something called Moebius space, which

is kind of like a fortune-cookie except the folds pass through

each other to connect the lower folds --- you really have to see

it to understand it.  Try feeding it some vectors that go off to

infinity in various directions (ws = 0) and see where they come

out.

