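This post was last edited by lizoyu on 2010-11-24 21:51
Echelon: code name for the ExaFLOPS system architecture that NVIDIA is developing together with other vendors
FLOPS: floating-point operations per second, usually quoted as a peak rate
HPC: high-performance computing
Orders of magnitude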
Name | FLOPS
yottaFLOPS | 10^24
zettaFLOPS | 10^21
exaFLOPS | 10^18
petaFLOPS | 10^15
teraFLOPS | 10^12
gigaFLOPS | 10^9
megaFLOPS | 10^6
kiloFLOPS | 10^3
The current FLOPS leader is China's Tianhe-1, at 2.507 PFlops.
So Echelon is clearly designed for HPC in the ExaFlops era.
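To put that gap in numbers, here is a small Python sketch that converts prefixed figures to plain FLOPS and compares them. The prefix table and the helper name `to_flops` are illustrative, not part of any library; the 2.507 PFlops figure is the one quoted above.

```python
# Decimal prefixes used for FLOPS figures (powers of ten).
PREFIX_EXPONENT = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}

def to_flops(value, prefix):
    """Convert e.g. (2.507, 'peta') to plain FLOPS (hypothetical helper)."""
    return value * 10 ** PREFIX_EXPONENT[prefix]

tianhe = to_flops(2.507, "peta")   # figure quoted in the post
exa = to_flops(1, "exa")           # one ExaFLOPS
gap = exa / tianhe                 # ExaFLOPS expressed in Tianhe-1 units

print(f"1 EFlops is about {gap:.0f}x Tianhe-1")
```

In other words, an ExaFLOPS machine would be roughly 400 times faster than the fastest system at the time of the post.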
The result would be a thousand-core graphics chip with each core capable of handling four double precision floating-point operations per clock cycle—the equivalent of 20 teraflops on a chip. A chip with just eight of the cores would someday power a handset, Dally said.
The Echelon chip packs just twice as many cores as today's high-end Nvidia GPUs. However, today's cores handle just one double precision floating-point operation per cycle, compared to four for the Echelon chip.
Many of the advances in the chip come from its use of memory. The Echelon chip will use 256 Mbytes of SRAM memory that can be dynamically configured to meet the needs of an application.
For example, the SRAM could be broken up into as many as six levels of cache, each of a variable size. At the lowest level each core would have its own private cache.
The goal is to get data as close to processing elements as possible to reduce the need to move data around the chip, wasting energy. Thus SMs would have a hierarchy of processor registers that could be matched to locations in cache levels. In addition, the chip would have broadcast mechanisms so that the results of one task could be shared with any nodes that needed that data.
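As a rough illustration of what "dynamically configured" SRAM could mean, the sketch below models a per-application cache layout as a list of level sizes and checks it against the two limits mentioned in the article: at most six levels and 256 Mbytes in total. The function name `validate_layout` and the example sizes are hypothetical; the article does not describe how the real hardware is configured.

```python
TOTAL_SRAM_MB = 256   # on-chip SRAM quoted in the article
MAX_LEVELS = 6        # "as many as six levels of cache"

def validate_layout(level_sizes_mb):
    """Return the unused SRAM in MB, or raise if the layout is not allowed.

    level_sizes_mb lists cache sizes from the lowest level (closest to
    the cores) upward. Purely illustrative; not a real NVIDIA interface.
    """
    if not 1 <= len(level_sizes_mb) <= MAX_LEVELS:
        raise ValueError("layout must have between 1 and 6 levels")
    if any(size <= 0 for size in level_sizes_mb):
        raise ValueError("every level needs a positive size")
    used = sum(level_sizes_mb)
    if used > TOTAL_SRAM_MB:
        raise ValueError(f"layout uses {used} MB, only {TOTAL_SRAM_MB} MB available")
    return TOTAL_SRAM_MB - used

# Example: three levels with made-up sizes that use all 256 MB.
spare = validate_layout([16, 48, 192])
print(spare)
```

One application might prefer a few large levels, another many small ones; the only fixed constraints in this model are the level count and the total capacity.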
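Echelon's floating-point throughput: 20 TFlops.
Echelon has 128 SMs with 8 cores each, about a thousand cores (1,024) in total, twice as many as the current Fermi.
However, each Echelon core has 4 times the double-precision throughput of a Fermi core.
The chip carries 256 MB of on-chip SRAM that can be split into levels per application, each level with its own size; the level closest to the SPs can serve as private memory.
Echelon's broadcast mechanism lets the result of a task be shared with any node that needs it.
It will use a CPU ISA.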
(Reposted from the GZ top-level discussion board, lightly edited)
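The figures above can be cross-checked with some quick arithmetic. The sketch below multiplies out the core count and works out what clock rate the 20 TFlops figure would imply. No clock speed is given anywhere in the post, so the implied frequency is only a derived value, and the option of counting each operation as a fused multiply-add (two flops) is an assumption made here.

```python
SMS = 128                 # streaming multiprocessors
CORES_PER_SM = 8
DP_OPS_PER_CYCLE = 4      # per core, as stated above
TARGET_FLOPS = 20e12      # 20 TFlops

cores = SMS * CORES_PER_SM
ops_per_cycle = cores * DP_OPS_PER_CYCLE

def implied_clock_ghz(flops_per_op=1):
    """Clock in GHz needed to reach TARGET_FLOPS.

    flops_per_op=2 assumes each operation is an FMA counted as two flops.
    """
    return TARGET_FLOPS / (ops_per_cycle * flops_per_op) / 1e9

chips_per_exaflops = 1e18 / TARGET_FLOPS

print(cores)                            # 1024
print(ops_per_cycle)                    # 4096
print(round(implied_clock_ghz(1), 2))   # 4.88
print(round(implied_clock_ghz(2), 2))   # 2.44
print(int(chips_per_exaflops))          # 50000
```

So at 20 TFlops per chip, an ExaFLOPS system would need on the order of 50,000 such chips, before counting any efficiency losses.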
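Note: the 256 MB is SRAM, i.e. what is usually called cache, not ordinary DRAM (main memory).
Building 256 MB of SRAM takes roughly 12.8 billion transistors (at about 6 transistors per bit).
How much is 12.8 billion transistors? Here are some numbers for comparison.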
Radeon HD 6870: about 1.7 billion transistors. GTX 580: about 3 billion transistors.
And that 12.8 billion covers the SRAM alone.
Still, given what the chip is meant to do, 256 MB of SRAM is a reasonable amount.
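The 12.8 billion estimate is easy to reproduce. The sketch below assumes a conventional six-transistor (6T) SRAM cell and counts only the data bits, ignoring tags, decoders and other overhead; both are assumptions made here, since the post does not say how the number was obtained. The comparison value is the GTX 580 count given above.

```python
SRAM_BYTES = 256 * 1024 * 1024      # 256 MB, taken as 2**28 bytes
BITS_PER_BYTE = 8
TRANSISTORS_PER_BIT = 6             # assumed 6T SRAM cell

sram_transistors = SRAM_BYTES * BITS_PER_BYTE * TRANSISTORS_PER_BIT
gtx580_transistors = 3_000_000_000  # figure quoted in the post

print(sram_transistors)                                 # 12884901888
print(round(sram_transistors / gtx580_transistors, 1))  # 4.3
```

Under these assumptions the SRAM alone would need more than four times the transistor budget of an entire GTX 580.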