Central processing unit

A central processing unit (CPU) refers to part of a computer that interprets and executes instructions and data contained in software. The more generic term processor is sometimes used to refer to a CPU as well; see processor (disambiguation) for other uses of this term. Microprocessors are CPUs that are manufactured on integrated circuits, often as a single-chip package. Since the mid-1970s, these single-chip microprocessors have become the most common and prominent implementations of CPUs, and today the term is almost always applied to this form.

The term "Central processing unit" is, in general terms, a functional description of a certain class of programmable logic machines. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term and its acronym have been used since the early 1960s.

History

Figure: IBM 603 vacuum tube multiplier. Similar units were included as part of early electronic computers.
Main article: History of computing hardware

Prior to the advent of machines that resemble today's CPUs, computers such as ENIAC had to be physically rewired to perform different tasks. These machines are often referred to as "fixed program computers" since they had to be physically reconfigured to run a different program. Since the term "CPU" is generally defined as a software (program) executing device, the earliest devices that could rightly be called CPUs came with the advent of the stored program computer. The idea of a stored program computer was already present during the design of ENIAC, but was not used in that computer due to speed considerations. On June 30, 1945, before ENIAC was even completed, mathematician John Von Neumann published the paper entitled First Draft of a Report on the EDVAC, which outlined the design of a stored program computer that would eventually be completed in August 1949. [1] EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high speed computer memory, rather than being specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC: the large amount of time and effort it took to reconfigure the computer to perform a new task. With Von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory.

It should be noted that while Von Neumann is most often credited with the design of the stored program computer due to his design of EDVAC, others before him such as Konrad Zuse had suggested similar ideas. Additionally, the so-called Harvard architecture of the Harvard Mark I, which was completed before EDVAC, also utilized a stored-program design using punched paper tape rather than electronic memory. The key difference between the Von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily Von Neumann in design, but elements of the Harvard architecture are commonly seen as well.

Being digital devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. During the height of electromechanical and electronic computers, electrical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building direct current sequential logic circuits out of relays requires additional hardware to cope with the problem of contact bounce. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning due to the slow contamination of their cathodes that occurs when the tubes are in use. Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing unit so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster, but less reliable than electromechanical (relay based) computers. Tube computers like EDVAC tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I failed very rarely. In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs. Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.

Discrete component transistor CPUs

The design and complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tubes and electrical relays. With this improvement, more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete transistor components. In 1964, IBM introduced its System/360 computer architecture, which was used in a series of computers that could run the same programs at different speeds and levels of performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram, which still sees widespread usage in modern CPUs (often called "microcode"). [2] The System/360 architecture was so popular that it dominated the mainframe computer market for the next few decades and left a legacy that is still continued by similar modern computers like the IBM zSeries. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC would later introduce the extremely popular PDP-11 line that was eventually moved to manufacture on integrated circuits once these became practical. While discrete component transistor CPUs were in heavy usage, new high performance designs like SIMD (Single Instruction Multiple Data) vector processors began to appear. These early experimental designs later gave rise to the era of specialized supercomputers like those made by Cray Inc.

Transistor based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lowered power consumption, transistors also allowed CPUs to operate at much higher speeds due to the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period.

Microprocessors

The most recent technological improvement that has affected the design and implementation of CPUs came in the 1970s with the microprocessor. Since the introduction of the first microprocessor (the Intel 4004) in 1971 and the first widely-used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other implementations. While the previous generation of CPUs was built from discrete components on one or more circuit boards, microprocessors are manufactured onto compact integrated circuits (ICs), often a single chip. The smaller transistor sizes mean faster switching times due to decreased gate capacitance. This has allowed synchronous microprocessors to utilize clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity of and number of transistors in a single CPU have increased dramatically. This trend has been observed by many and is often described by Moore's law, which has proven to be a fairly accurate model of the growth of CPU (and other IC) complexity to date.

While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function have not changed much at all. Almost all common CPUs today can be very accurately described as Von Neumann stored program machines.

As the aforementioned Moore's law continues to hold true, concerns about the limits of integrated circuit transistor technology have become much more prevalent. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer as well as expand the usage of parallelism and other methods that extend the usefulness of the classical Von Neumann model.

CPU operation

The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The discussion here concerns devices that conform to the common Von Neumann architecture described above. The program is represented by a series of numbers that are kept in some kind of computer memory. There are three steps that nearly all Von Neumann CPUs use in their operation: fetch, decode, and execute.

The first step, fetch, involves retrieving an instruction (which is a number or sequence of numbers) from program memory. The location in memory is determined by a program counter, which stores a number that identifies the current location in this sequence. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction has been fetched, the program counter is incremented by the number of memory units fetched.

Figure: Diagram showing how one MIPS32 instruction is decoded.

The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA). Often, one group of bits in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. The operands may contain a constant value in the instruction itself (called an immediate value), or a place to get a value: a register or a memory address. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is often rewritable and can be modified to change the way the CPU decodes instructions even after it has been manufactured.
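
As an illustration of the decode step, the following minimal Python sketch splits a 32-bit MIPS32 I-type instruction word into its opcode, register, and immediate fields. The function name decode_i_type is hypothetical and chosen only for this example; the field positions follow the published MIPS32 I-type format. The immediate is returned as the raw 16 bits, and the sign extension that a real ADDI performs is omitted.

```python
def decode_i_type(word):
    """Split a 32-bit MIPS32 I-type instruction word into its fields.

    Hypothetical helper for illustration only.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26: which operation to perform
        "rs": (word >> 21) & 0x1F,      # bits 25..21: source register number
        "rt": (word >> 16) & 0x1F,      # bits 20..16: destination register number
        "imm": word & 0xFFFF,           # bits 15..0: immediate (constant) value
    }


# 0x20080005 encodes "addi $t0, $zero, 5": opcode 8 is ADDI, register 8 is $t0.
print(decode_i_type(0x20080005))
# {'opcode': 8, 'rs': 0, 'rt': 8, 'imm': 5}
```

Here the opcode selects the operation, while the remaining fields name a source register, a destination register, and an immediate operand.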

Figure: Block diagram of a simple CPU.

After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are "connected" (by a switching device such as a multiplexer) so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set (see the discussion of integer precision below). Various structures can be used for providing inputs and outputs. Often, relatively fast and small memory areas called CPU registers are used when a result is temporary or will be needed again shortly. Various forms of computer memory (for example, DRAM) are also often used to provide inputs and outputs for CPU operations. These types of memory are much slower than registers, both due to physical limitations and because they require more steps to access than the internal registers. However, compared to the registers, this external memory is usually less expensive and can store much more data, and is thus still necessary for computer operation.
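
The addition case can be sketched in a few lines of Python. This is a simplified model that assumes unsigned operands and an 8-bit ALU. The function alu_add and its bits parameter are hypothetical, and real CPUs typically distinguish between carry and signed overflow, which this sketch does not.

```python
def alu_add(a, b, bits=8):
    """Add two unsigned integers the way a hypothetical fixed-width ALU would.

    Returns the truncated sum and a flag telling whether the true sum fit.
    """
    mask = (1 << bits) - 1     # largest value representable in `bits` bits
    total = a + b
    result = total & mask      # only the low `bits` bits reach the output
    overflow = total > mask    # flag is set when the true sum did not fit
    return result, overflow


print(alu_add(100, 27))   # (127, False)
print(alu_add(200, 100))  # (44, True): 300 does not fit in 8 bits
```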

Some types of instructions manipulate the program counter. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs. [3] Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later instruction to determine program flow.

After the execution of the instruction, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction due to the incremented value in the program counter. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as a "single cycle data path," which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontrollers).
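
The whole cycle, including a compare instruction that sets a flag and a conditional jump that reads it, can be sketched as a toy interpreter. Everything here is an assumption made for illustration: the five-instruction set (LOAD, ADD, CMP, JNE, HALT), the two-numbers-per-instruction encoding, and the single accumulator register do not correspond to any real ISA.

```python
# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
LOAD, ADD, CMP, JNE, HALT = range(5)


def run(memory):
    """Execute a program stored as numbers in `memory`; return the accumulator."""
    pc = 0              # program counter: address of the next instruction
    acc = 0             # single accumulator register
    equal_flag = False  # one-bit "flags register" set by CMP

    while True:
        # Fetch: read two memory units, then advance the program counter.
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2

        # Decode and execute.
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc += operand
        elif opcode == CMP:
            equal_flag = (acc == operand)
        elif opcode == JNE:
            if not equal_flag:
                pc = operand  # a jump overwrites the program counter
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode}")


program = [
    LOAD, 0,  # address 0: acc = 0
    ADD, 1,   # address 2: acc = acc + 1
    CMP, 3,   # address 4: set flag if acc == 3
    JNE, 2,   # address 6: loop back to address 2 while acc != 3
    HALT, 0,  # address 8: stop
]
print(run(program))  # 3
```

The loop runs the ADD instruction three times before the comparison succeeds and execution falls through to HALT.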

Design and implementation

Main article: CPU design

Integer precision

The way a CPU represents numbers is a design choice that affects the most basic assumptions about how the device functions. Some early digital computers used the common decimal (base ten) numeral system to internally represent numbers. Other computers have used more exotic numeral systems like ternary (base three). By far, the most common numeral system used in CPUs is the binary (base two) system. Nearly all modern CPUs represent numbers in binary form, each digit being interpreted from some physical quantity such as "high" and "low" voltage.

Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a "bit" refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "bit width," "data path width," or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2^8 or 256 discrete numbers. Integer precision can also affect the number of locations in memory the CPU can "address" (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory that CPU can address is 2^32 octets, or 4 GiB. This is a very simple view of CPU address space, and many modern designs use much more complex addressing methods in order to locate more memory with the same integer precision.
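
The arithmetic behind these two figures can be checked with a short Python sketch. The function names are hypothetical, and the model assumes the simple flat address space described above, with one octet per address.

```python
def distinct_values(bits):
    """Number of different values representable with `bits` binary digits."""
    return 2 ** bits


def addressable_gib(address_bits, octets_per_address=1):
    """Addressable memory in GiB for a flat address space (simplified model)."""
    return distinct_values(address_bits) * octets_per_address / 2 ** 30


print(distinct_values(8))   # 256
print(addressable_gib(32))  # 4.0
```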

Higher levels of integer precision require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and, generally, expense. It is not at all uncommon, therefore, to see 4-bit or 8-bit microcontrollers used in modern applications, even though CPUs with much higher precision (such as 16, 32, 64, or even 128 bits) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra precision (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit precisions, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370 used a CPU that was primarily 32-bit, but it used 128-bit precision inside its floating point units to facilitate greater accuracy and range in floating point numbers. Many later CPU designs use similar mixed bit widths, especially when the processor is meant for general purpose usage where a reasonable balance of integer and floating point capability is required.

Clock rate

Main article: Clock rate

Most CPUs, and indeed most sequential logic devices, are synchronous in nature. That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a "clock signal," usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals can take to move through the various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal. This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. By setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a transistor count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has previously been compensated for by the addition of instruction pipelining in superscalar CPUs.
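
The relationship between worst-case propagation delay and the highest usable clock rate can be sketched as follows. The function max_clock_hz, its margin parameter, and the delay figures are hypothetical values chosen only to make the arithmetic visible. They are not measurements of any real CPU.

```python
def max_clock_hz(path_delays_ns, margin=1.0):
    """Highest clock frequency whose period still covers the slowest path.

    `margin` > 1.0 stretches the period beyond the worst-case delay,
    as a designer would to leave a safety allowance.
    """
    worst_case_ns = max(path_delays_ns)  # the slowest path limits the whole CPU
    period_ns = worst_case_ns * margin   # chosen clock period
    return 1e9 / period_ns               # frequency is the inverse of the period


# Illustrative (made-up) delays for three signal paths, in nanoseconds.
delays_ns = [4.0, 1.5, 2.5]
print(max_clock_hz(delays_ns))               # 250000000.0 (250 MHz)
print(max_clock_hz(delays_ns, margin=1.25))  # 200000000.0 (200 MHz)
```

Even though two of the three paths settle in 2.5 ns or less, the 4 ns path alone determines the clock period, which is the drawback noted above.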

Pipelining alone does not solve all of the drawbacks of globally synchronous CPUs, though. For example, a clock signal is subject to the same delays as any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals rather than a single signal that would be significantly delayed if it drove all the switching elements. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch, regardless of whether or not they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
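
The link between clock rate and heat can be made concrete with the first-order approximation commonly used for switching power in CMOS circuits, P = a * C * V^2 * f. This formula is not taken from the text above. The function name and the parameter values below are hypothetical and serve only to show that, with everything else held constant, switching power scales linearly with clock frequency.

```python
def dynamic_power_watts(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power estimate: P = a * C * V^2 * f.

    activity      -- fraction of the capacitance switched per cycle (0..1)
    capacitance_f -- total switched capacitance in farads
    voltage_v     -- supply voltage in volts
    frequency_hz  -- clock frequency in hertz
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Made-up parameters: only the ratio between the two results matters here.
slow = dynamic_power_watts(0.1, 1e-9, 1.2, 1e9)
fast = dynamic_power_watts(0.1, 1e-9, 1.2, 2e9)
print(fast / slow)  # approximately 2: doubling the clock doubles switching power
```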

One method of dealing with the switching of unneeded components is a technique called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs. Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, "clockless" (or asynchronous) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire CPUs have been built without utilizing a global clock signal. Two notable examples of this are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous. For example, asynchronous ALUs can be used in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers.

Scalar, superscalar, and pipelining

SIMD and vector processors

Parallelism

Main article: Parallel computing

See also

Notes

  1. Some early computers like the Harvard Mark I did not support any kind of "jump" instruction, effectively limiting the complexity of the programs they could run. It is largely for this reason that these computers are often not considered to contain a CPU proper, despite their close similarity as stored program computers.

References

  1. von Neumann, John (1945). "First Draft of a Report on the EDVAC." Moore School of Electrical Engineering, University of Pennsylvania.
  2. Amdahl, G. M., Blaauw, G. A., and Brooks, F. P. Jr. (1964). "Architecture of the IBM System/360." IBM Research.
  3. MIPS Technologies, Inc. (2005). "MIPS32® Architecture For Programmers Volume II: The MIPS32® Instruction Set." MIPS Technologies, Inc.
