                       

                       
                       
                       
                       
                     The 80x86 Microprocessor Family Story
                     -------------------------------------
                            Written by Brad Proctor
                                  
                                  Release 2
                              
                              September 16, 1996

                            Compuserve: 71534,2302
                      Internet: 71534.2302@compuserve.com



                   []======================================[]
                               COPYRIGHT NOTICE

                       Copyright (C) 1996 by Brad Proctor.
                              All Rights Reserved.
                   []======================================[]

   





   Introduction
   ------------

   The information in this text is accurate to the best of my knowledge.  If
you do find something in it that is wrong, please e-mail me the correct
information so that I can update the text.  I would also like to know out of
curiosity.  I will be adding to the text periodically as I learn more.  My
e-mail address will change soon, and I will post the new one in the next
release.  Until then, whatever e-mail is sent to the address above will be
forwarded to me.  I hope you find this text informative and helpful. 8-P




   
   The Assembly Line
   -----------------

   There is a company named ABC that makes hats.  In the beginning, ABC had
only one worker.  This worker had to do every part of the assembly of a hat,
from getting the material off of the trucks to loading the finished hats onto
other trucks.  After the first year, ABC was able to hire a worker for each
step in the assembly line.  This sped up the production of hats considerably.
The only problem was that the worker who carried the material from the truck
to the beginning of the assembly line was not fast enough, so the other
workers ran out of work to do while the material was being unloaded.  ABC was
doing well by the end of the second year, so the company decided to find a
way of overcoming this problem.  They built a machine that took the material
off of the truck and placed it onto the assembly line automatically.  This
got rid of the problem of the workers running out of work to do.  There was
still a problem though.  Every once in a while a hat would require special
material that took longer for the machine to fetch, which still held up the
assembly line.  ABC decided that there was no way around this, but the
demands of their customers were so great that they needed a faster way to
make hats.  Their solution was to build a second assembly line.  Since the
hats had to be loaded onto the trucks in a certain order, they had to make
sure that the hats were made in the right order.  If one hat was on the first
assembly line and another hat was on the second, the two must not interfere
with each other, or they would get loaded onto the truck in the wrong order.
This did speed up their production by quite a bit.  There was a problem
though.  The assembly lines were set up so that if a special material was
needed, stopping one of the lines, the other line had to stop as well,
because the hats had to be loaded onto the truck in that certain order.
   But yet again, by the end of the year, they needed to make more hats to
meet the demands of the customers.  So ABC decided to try to come up with a
solution to this problem.  This time they added a special feature that
allowed the workers to order the special material when it was needed and to
continue working on other hats while the material was being fetched.



   CPU History
   -----------

   The story of the growth of the 80x86 microprocessor family is very closely
related to this little story.  Originally, the 8088 and 8086 were like the ABC
company with only one worker.  Each part of the execution of an instruction
was done one at a time, by only one 'worker', and each part takes at least
one clock cycle, so at clock speeds of 5MHz-10MHz this is pretty slow.  These
processors consisted of approximately 29,000 transistors.  Only the bare
minimum was built into the processor.  The lower number makes the 8086 sound
like a step down from the 8088, but the 8086 actually came first, in 1978,
and the 8088 followed in 1979.  Both are 16-bit processors on the inside.
The difference is that the 8086 has a 16-bit data bus, while the 8088 has an
8-bit data bus, which made systems built around it cheaper.  Everything else
is essentially the same.
   After the 8086 came the 80286.  This processor was similar to the second
year at the ABC company.  It had the capability to do each part of the
processing sequence separately.  This was quite an improvement over the 8088
and the 8086.  The 80286 also typically ran at 8MHz-16MHz, which helped out
quite a bit.  The 80286 had about 134,000 transistors, which is more than 4
times that of the 8088/8086.  As you will see, each new generation of
microprocessors has roughly two to four times as many transistors as the one
before it.  The 80286 was still a 16-bit microprocessor, but it had a 24-bit
address bus, which let it address up to 16Mb of memory.
   The next step up was the 80386.  This processor was similar in design to
the 80286 but was a full 32-bit processor, with 32-bit registers and a 32-bit
address bus.  This means it can work on 4 bytes at a time instead of the 2
bytes of a 16-bit processor.  The 80386 also increased clock speeds to
16MHz-40MHz.  The 80386 has about 275,000 transistors, so the count grew
once again.
   The next processor was the 80486.  The 80486 is similar to the next year
at the ABC company, when ABC added the machine to gather the material and put
it at the beginning of the assembly line.  The 80486 does a similar thing
with its cache: code and data are fetched from the slower DRAM into a small,
much faster cache memory, so the processor does not have to wait as often.
The 80486 increased clock speeds again, to 20MHz-50MHz before any clock
multiplying.  There is another special thing that was introduced with the
80486, which is clock doubling.  On special 80486 processors the internal
processing runs faster than the master clock by a factor such as 2.0 or 3.0,
so the internal speed is the master clock times the factor.  However,
whenever the processor must receive data from the DRAM, it fetches it at the
speed of the master clock.  Another special feature of the 80486 is that most
of them were built with an internal math coprocessor: SX means no
coprocessor, and DX means a coprocessor is built in.  When the processor runs
faster than the master clock, this is shown by a number at the end of the
name.  For example, an 80486 with a built-in coprocessor and clock doubling
has the name 80486dx2, and clock tripling gives the deceiving name 80486dx4.
Yes, that's dx4, not dx3.  (The dx4 can also be set to factors of 2.0 and
2.5.)  With the clock factor taken into consideration, 80486 clock speeds
vary from 20MHz-133MHz.  The 80486 has about 1,200,000 transistors, which
again is several times that of the 80386.
   The next processor is the Pentium, or 80586.  This CPU is similar to the
next year of the ABC company, when ABC created a second assembly line for
faster production.  The 80586 does things in a similar way.  It is able to
execute two instructions at a time, which is called superscalar execution.
The 80586 examines a pair of instructions and determines whether they have to
be executed in order or whether they can be executed at the same time.  If
they can be executed together, the first instruction is placed in the 'U'
pipeline and the second in the 'V' pipeline, and both are executed at the
same time.  If order does matter, the instructions are executed one at a
time through the 'U' pipeline.  The 80586 also has a method of branch
prediction.  It predicts ahead of time whether a branch will be taken or not,
and fetches instructions from whichever side of the branch was predicted.  If
the prediction turns out to be wrong, the instructions that were fetched are
thrown away and fetching restarts from the correct side of the branch.  This
is a major improvement over past processors, in which every taken branch
would stop everything while a new set of instructions was fetched.  The 80586
runs at clock speeds of 60MHz-166MHz.  It uses the same clock multiplying
technology as the 80486, and all 80586 processors have built-in coprocessors,
so the SX and DX names were dropped.  The 80586 has about 3,100,000
transistors, which again is well over double that of its predecessor.
   The Pentium Pro, or 80686, uses methods similar to the ABC company in the
final year of the ABC story.  The 80686 is the newest processor currently
available (as of 9/96), and has the capability to determine whether an
instruction can wait while the data it needs is fetched.  If it can wait,
it steps out of line while other instructions are executed, and then, when
the data is available from DRAM, the instruction is executed.  This is known
as out-of-order execution.  The 80686 builds on the superscalar methods and
the branch prediction introduced in the 80586.  The 80686 has clock speeds
of 150MHz-200MHz (as of 9/96) and about 5,500,000 transistors in the
processor itself, not counting the second-level cache that is built into the
same package.
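
   The clock multiplying described above for the 80486 comes down to one
multiplication.  Here is a minimal sketch in Python.  The function name and
the example bus speeds are my own choices for illustration, not anything
taken from Intel.

```python
def core_clock_mhz(bus_mhz, factor):
    """Internal (core) clock of a clock-multiplied CPU.

    bus_mhz is the master clock; factor is the multiplier (2 for
    clock doubling, 3 for clock tripling).  Memory is still reached
    at bus_mhz, so only internal work speeds up.
    """
    return bus_mhz * factor


# Illustrative values only: a 25 MHz master clock, doubled and tripled.
for factor in (2, 3):
    print(factor, core_clock_mhz(25, factor))
```

   Note that this only describes the internal speed.  A program that spends
most of its time waiting on DRAM sees much less than the full factor.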




   CPU Notes
   ---------


The 80486 processor breaks the processing of an instruction into five steps.
Because each step can be working on a different instruction at the same time,
the 80486 can finish up to one instruction per clock cycle.  The 80486 also
contains an on-chip cache, or 'L1' cache, that is 8k in size.  This cache is
shared between code and data, but data has priority.  The 80586 also has an
on-chip cache, with 8k for code and a separate 8k for data.  Here are the
five steps of instruction execution for the 80486 and higher CPUs.


Prefetch (PF)
-------------
   This is where the instruction is loaded into the storage area that holds
instructions before they are decoded by the CPU.  The prefetch area on the
80486 is a 16-byte by 2-deep area (32 bytes in all).

Decode (D1)
-----------
   This is where the CPU decodes the value of the instruction and decides
how to execute the instruction.

Address Generation (D2)
-----------------------
   This is where the CPU calculates the effective address and the linear
address in parallel.  This normally takes 1 clock cycle.  If an index
register is used in the address, however, the instruction stays in D2 for
2 clock cycles.

Execution (E)
-------------
   The machine operations are executed.

Write Back (WB)
---------------
   The Write Back step stores the result of the execution in whatever
register or memory location it needs to be stored in.
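
   To see why five steps can still mean one instruction per clock cycle, here
is a small Python sketch of the five steps above working as an assembly line.
It is an idealised model that I made up for illustration: every step takes
exactly one clock, and there are no stalls, cache misses or extra D2 clocks.

```python
STAGES = ["PF", "D1", "D2", "E", "WB"]


def pipeline_schedule(num_instructions):
    """List, clock by clock, which instruction sits in which stage.

    Instruction i enters PF on clock i and moves one stage per clock.
    Each entry of the result is a dict mapping stage name to the
    instruction number occupying it on that clock.
    """
    if num_instructions <= 0:
        return []
    total_clocks = num_instructions + len(STAGES) - 1
    schedule = []
    for clock in range(total_clocks):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr = clock - stage_index
            if 0 <= instr < num_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule


def clocks_needed(num_instructions):
    """Total clocks to push the instructions through the ideal pipeline."""
    return len(pipeline_schedule(num_instructions))


for clock, row in enumerate(pipeline_schedule(3)):
    print(clock, row)
```

   One instruction alone takes 5 clocks, but 10 instructions take only 14,
because after the line fills up one instruction finishes on every clock.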



   The 80586 uses this five step execution method along with the branch
prediction and superscalar technologies; the 80686 builds on the same ideas
with a longer pipeline.  When writing code for the superscalar chips, it is
best to use instructions that are pairable.  Two instructions are pairable
when neither depends on the result of the other, so that executing them
together makes no difference to the program flow.  (The CPU also limits
pairing to its simpler instructions.)  When instructions are paired, the CPU
can execute both at the same time: one instruction is loaded into the 'U'
pipeline and the other into the 'V' pipeline.  The 80586 can therefore
perform two independent additions in only one clock cycle.
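
   The idea of pairable instructions can be sketched as a simple dependency
check.  The Python below is only a rough model that I am assuming for
illustration.  It looks at register names alone, and the real pairing rules
on the chip have more conditions than this.  The tuple layout and the
function name are hypothetical.

```python
def can_pair(first, second):
    """Rough test of whether two instructions could issue together.

    Each instruction is a (destination, sources) tuple of register
    names.  Assumption: the pair is rejected when the second
    instruction reads the first one's result, or when both write the
    same register.  Nothing else is checked.
    """
    first_dest, _first_srcs = first
    second_dest, second_srcs = second
    reads_result = first_dest in second_srcs
    same_dest = first_dest == second_dest
    return not (reads_result or same_dest)


# add eax, ebx  /  add ecx, edx : independent, so they can pair
add_eax_ebx = ("eax", ("eax", "ebx"))
add_ecx_edx = ("ecx", ("ecx", "edx"))
# add eax, ecx needs the eax produced by the first add
add_eax_ecx = ("eax", ("eax", "ecx"))

print(can_pair(add_eax_ebx, add_ecx_edx))
print(can_pair(add_eax_ebx, add_eax_ecx))
```

   In practice this means interleaving work on different registers, rather
than writing a long chain where each instruction needs the one before it.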


   80x86 Registers
   ---------------

   The 8086 had 16-bit registers.  Each of the general purpose registers,
which have an 'X' at the end, is 16 bits wide and can be split into two
8-bit registers.  These have either an 'H' or an 'L' at the end.  The 'H'
means that it is the higher 8 bits, and the 'L' means that it is the lower
8 bits of the 'X' register.  Intel came up with names to describe the
intended use of the registers.  Isn't that cute.


               16 bits
      -------------------------
  AX  |    AH     |    AL     |  Accumulator
      -------------------------
  BX  |    BH     |    BL     |  Base
      -------------------------
  CX  |    CH     |    CL     |  Count
      -------------------------
  DX  |    DH     |    DL     |  Data
      -------------------------

               16 bits
      -------------------------
      |          DI           | Destination Index
      -------------------------
      |          SI           | Source Index
      -------------------------
      |          BP           | Base Pointer
      -------------------------
      |          SP           | Stack Pointer
      -------------------------

      -------------------------
      |          CS           | Code Segment 
      -------------------------
      |          DS           | Data Segment
      -------------------------
      |          ES           | Extra Segment
      -------------------------
      |          SS           | Stack Segment
      -------------------------


               16 bits
      -------------------------
      |          IP           | Instruction Pointer
      -------------------------

               16 bits
      -------------------------
      |         Flags         | Flags
      -------------------------
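
   The way an 'X' register splits into its 'H' and 'L' halves is just
shifting and masking.  Here is a minimal Python sketch of it.  The function
names are made up for this example.

```python
def split_x(value16):
    """Split a 16-bit 'X' register value into its (H, L) halves."""
    high = (value16 >> 8) & 0xFF   # upper 8 bits, e.g. AH
    low = value16 & 0xFF           # lower 8 bits, e.g. AL
    return high, low


def join_x(high, low):
    """Build the 16-bit 'X' value back from its H and L halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split_x(0x1234)
print(hex(ah), hex(al))        # 0x12 0x34
print(hex(join_x(ah, al)))     # 0x1234
```

   So if AX holds 1234h, then AH holds 12h and AL holds 34h, and changing
either half changes AX.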

   The 80386 and 80486 have 32-bit registers.  These registers are all
accessible by putting an 'E' on the beginning of the name.  When the 'E' is
placed on the beginning, the register must be the full 32 bits.  By this I
mean you cannot use something like 'EAH' or 'EAL'; it must be 'EAX'.  The
segment registers are the only registers that remain 16 bits.  All of the
16-bit and 8-bit registers are still usable, just like on the 8086 and
80286 processors.



                           32 bits
      ------------------------------------------------
 EAX  |                       |    AH     |    AL    | Accumulator
      ------------------------------------------------
 EBX  |                       |    BH     |    BL    | Base
      ------------------------------------------------
 ECX  |                       |    CH     |    CL    | Count
      ------------------------------------------------
 EDX  |                       |    DH     |    DL    | Data
      ------------------------------------------------
                                                         
                           32 bits
      ------------------------------------------------
      |                      EDI                     | Destination Index
      ------------------------------------------------
      |                      ESI                     | Source Index
      ------------------------------------------------
      |                      EBP                     | Base Pointer
      ------------------------------------------------
      |                      ESP                     | Stack Pointer
      ------------------------------------------------

      -------------------------
      |          CS           | Code Segment 
      -------------------------
      |          DS           | Data Segment
      -------------------------
      |          SS           | Stack Segment
      -------------------------
      |          ES           | Extra Segment
      -------------------------
      |          FS           | Extra Segment
      -------------------------
      |          GS           | Extra Segment
      -------------------------


                           32 bits
      ------------------------------------------------
      |                      EIP                     | Instruction Pointer
      ------------------------------------------------
      |                    EFlags                    | Flags
      ------------------------------------------------
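
   The diagrams above show that the 16-bit and 8-bit names are windows onto
the low part of the 32-bit register.  The Python class below is a small
model of that, written only for illustration.  The class and attribute names
are my own.  The point it shows is that writing to the 'X', 'H' or 'L' part
leaves the upper 16 bits alone.

```python
class Reg32:
    """Model of a 32-bit register such as EAX with its AX/AH/AL views."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF      # the full 32 bits (EAX)

    @property
    def x(self):                         # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                         # bits 8-15 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                         # bits 0-7 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
eax.x = 0xABCD          # like "mov ax, 0ABCDh"
print(hex(eax.e))       # 0x1234abcd
```

   There is no name for the upper 16 bits on their own, so getting at them
takes a shift or a rotate of the whole 32-bit register.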



   Real vs. Protected
   ------------------

   The 80286 processor was the first to have the capability to switch from
real mode to protected mode.  Before then, there was no other mode but real
mode.  When the 80286 was designed, the 8088/8086 was very popular and there
were many programs already written for it.  This required that the 80286 be
able to run these programs.  So Intel made the 80286 start up in real mode,
which works the way the 8088/8086 does, and gave it the capability to switch
into protected mode.

Real Mode Addressing
--------------------

   In real mode, memory is accessible up to 1Mb.  The full address is
actually 20 bits.  The physical address is calculated by taking the 16-bit
segment and shifting it left by 4 bits, which is equivalent to adding a zero
onto the end of the value if it is written in hex, and then adding the
offset.  For example, if the segment was 1002h and the offset was 3333h, then
after shifting the segment we would have 10020h, and adding the offset gives
13353h.  The 640k barrier is the limit on the amount of memory that a program
can be loaded into.  On the PC, the rest of the first 1Mb is set aside for
video memory, ROM and adapter cards.  Real mode programs can make use of
extended memory (memory above 1Mb) with the help of a memory manager, but
only to store data.  Your program code cannot be loaded there, which means
that your programs cannot be larger than 640k in size.
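
   The real mode address calculation above is short enough to write out.
This Python sketch uses a function name of my own choosing, and it assumes
the 8088/8086 behaviour where the result is cut off at 20 bits, so an address
past 1Mb wraps around to the bottom of memory.

```python
def physical_address(segment, offset):
    """Real mode physical address from a 16-bit segment and offset.

    The segment is shifted left 4 bits (multiplied by 16) and the
    offset is added.  Assumption: the result is masked to 20 bits,
    as on the 8088/8086.
    """
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(hex(physical_address(0x1002, 0x3333)))   # 0x13353
print(hex(physical_address(0x1335, 0x0003)))   # 0x13353
```

   The second line shows a side effect of this scheme: many different
segment and offset pairs point at the same physical byte.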

Protected Mode Addressing
-------------------------

   In protected mode, memory is no longer limited to 640k; you can put your
code or data in any area of memory you wish.  On the 80386 and higher, memory
is accessible up to 4Gb (the 80286 is limited to 16Mb).  Protected mode uses
descriptor tables to access memory.  A segment register no longer holds a
piece of the address itself.  Instead it holds a selector, which picks a
descriptor out of a table, and the descriptor gives the base address and the
limit (the size) of the segment.  The offset is then added to the base
address.  On the 80386 and higher, each descriptor has a 20-bit limit and a
granularity bit.  If the bit is not set, the limit is counted in 1 byte
units, with the maximum segment size being 1Mb, which is similar to the way
real mode memory is accessed.  When the bit is set, the limit is counted in
4k units, making it possible for one segment to cover up to 4Gb.  For
example, if the limit was 400h and the granularity bit was not set, the last
valid offset in the segment would be 400h, but if the granularity bit was
set, the segment would cover 401h units of 4096 bytes each, which gives
4,198,400 bytes.
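
   Here is a rough Python sketch of what the granularity bit does.  It
assumes the 80386 style of descriptor, where a 20-bit limit field gives the
last valid offset in the segment and the granularity bit chooses between
1 byte and 4k units.  The function names are made up for this example, and
the sketch leaves out everything else a descriptor holds.

```python
def effective_limit(limit_field, granularity):
    """Last valid offset in a segment, from the 20-bit limit field.

    With the granularity bit clear the field is used as is.  With it
    set, the field counts 4k units, so it is shifted left 12 bits and
    the low 12 bits are filled with ones.
    """
    limit_field &= 0xFFFFF
    if granularity:
        return (limit_field << 12) | 0xFFF
    return limit_field


def segment_size(limit_field, granularity):
    """Number of bytes the segment covers (last valid offset + 1)."""
    return effective_limit(limit_field, granularity) + 1


print(segment_size(0xFFFFF, False))   # 1048576    (1Mb)
print(segment_size(0xFFFFF, True))    # 4294967296 (4Gb)
print(segment_size(0x400, True))      # 4198400
```

   The largest limit field, FFFFFh, gives exactly the 1Mb and 4Gb maximums
mentioned above.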

------------------------------------------------------------------------------
Copyright (C) 1996 by Brad Proctor





