Affordable kilo-instruction processors

Doctoral thesis by Miquel Pericàs Gleim

Several factors explain the slowdown in the development of high-performance single-thread processors. On the one hand, aggressive techniques such as superpipelining or out-of-order execution have a considerable impact on power consumption and design complexity. On the other hand, the increase in processor frequencies has led to a large disparity between processor speed and memory access time. Although cache memories considerably reduce the number of accesses to main memory, the remaining accesses introduce latencies large enough to decrease performance considerably. Conventional techniques such as out-of-order execution, while effective at hiding L2 cache accesses, cannot hide latencies this large. Queues of hundreds of entries and thousands of registers would be necessary to prevent execution from stalling in the event of an L2 cache miss. Unfortunately, current technology cannot efficiently implement such structures monolithically, as access latency, power consumption, and area would all increase considerably. In this thesis we study techniques that allow the processor to continue processing instructions in the event of main memory accesses. For such a processor to be implementable, it should be based on structures of conventional size and feature simple control logic. The challenge lies in designing a distributed processor with simple control. The design of this processor has been approached by analyzing the behavior of a processor with infinite resources. We have observed that execution follows a pattern based on execution locality. In numerical codes, over 70% of all instructions do not depend on memory accesses. This shows that there is always a large portion of instructions that can be executed shortly after decode, which allows us to propose a new kind of processor with two execution units.
The first unit, the cache processor, processes memory-independent instructions at high speed. The second unit, the memory processor, processes instructions that depend on main memory accesses, but uses relaxed scheduling logic, which allows it to scale to thousands of in-flight instructions. This proposal, called the decoupled kilo-instruction processor (D-KIP), has several advantages. On the one hand, it allows a kilo-instruction processor to be built from conventional structures; on the other hand, it simplifies the design, as the interaction between the two execution units is minimal. In this thesis two implementations of this kind of processor are presented: the original D-KIP and the flexible heterogeneous multicore (FMC). The performance of these proposals is analyzed and compared to other proposals that increase memory-level parallelism, such as prefetching or runahead execution. The FMC processor is observed to perform at the same level as a conventional processor with a window of around 1500 instructions. Further, the integration of the FMC processor into a multicore/multiprogrammed environment is studied. This thesis concludes with the proposal of a two-level load/store queue for this kind of processor.
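
The split between the two execution units can be illustrated with a toy model. The sketch below is not the mechanism described in the thesis; it is a simplified illustration that assumes each instruction has one destination register, a list of source registers, and a hypothetical flag `misses_to_memory` marking loads that go all the way to main memory. An instruction is assigned to the memory side if any of its sources was produced, directly or transitively, by such a load; everything else stays on the cache side. All names (`Instr`, `partition`, `EXAMPLE_TRACE`) are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Instr:
    dest: str                        # destination register
    srcs: Tuple[str, ...] = ()       # source registers
    misses_to_memory: bool = False   # hypothetical flag: load served by main memory


def partition(trace: List[Instr]) -> Tuple[List[int], List[int]]:
    """Return (cache_side, memory_side) lists of instruction indices.

    Toy rule (an assumption of this sketch): an instruction goes to the
    memory side if any source register currently holds a value that
    depends on a main memory access.
    """
    tainted: Set[str] = set()        # registers whose value awaits main memory
    cache_side: List[int] = []
    memory_side: List[int] = []
    for i, ins in enumerate(trace):
        depends = any(s in tainted for s in ins.srcs)
        (memory_side if depends else cache_side).append(i)
        if depends or ins.misses_to_memory:
            tainted.add(ins.dest)      # result is only available after the miss
        else:
            tainted.discard(ins.dest)  # register overwritten with a ready value
    return cache_side, memory_side


# Small made-up trace: instruction 0 is a load that misses to main memory.
EXAMPLE_TRACE = [
    Instr("r1", ("r0",), misses_to_memory=True),  # 0: load r1 <- [r0]
    Instr("r2", ("r1",)),                         # 1: uses the missing load
    Instr("r3", ("r0",)),                         # 2: independent of the miss
    Instr("r4", ("r2", "r3")),                    # 3: depends on 1, hence on the miss
    Instr("r1", ("r3",)),                         # 4: overwrites r1 with a ready value
    Instr("r5", ("r1",)),                         # 5: uses the new r1
]

if __name__ == "__main__":
    cache_side, memory_side = partition(EXAMPLE_TRACE)
    print("cache side:", cache_side)
    print("memory side:", memory_side)
```

In this made-up trace only instructions 1 and 3 wait on the miss; the remaining four can proceed without it, which is the kind of locality the abstract describes.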

 

Academic details of the doctoral thesis "Affordable kilo-instruction processors"

  • Thesis title:  Affordable kilo-instruction processors
  • Author:  Miquel Pericàs Gleim
  • University:  Politécnica de Catalunya
  • Thesis defense date:  09/12/2008

 

Supervision and committee

  • Thesis supervisor
    • Mateo Valero Cortés
  • Committee
    • Committee chair: Eduard Ayguadé Parra
    • Theo Ungerer (member)
    • Gurindar Singh Sohi (member)
    • Osman Unsal García (member)

 
