NVIDIA GPU Deep Learning: Development Environment Setup and Optimization Information


NVIDIA^® GPU Deep Learning Development Environment Setup Information

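This page outlines how to build the development environment needed on a PC (workstation or server) used for NVIDIA^® Deep Learning development, and collects information that is useful when setting it up.
It provides links to the relevant sites and outlines how to download and install NVIDIA^® CUDA, the CUDA GPU driver, NVIDIA^® DIGITS, cuDNN, and frameworks (Caffe, Theano, Torch, BIDMach).
We hope it is useful both to those who already have a Deep Learning development environment and to those who are considering setting one up.
For information on Deep Learning development with Intel^® Xeon^® processors or Xeon^® Phi™ x200 processors (Knights Landing), please see this page.
(Updated 2017/04/08)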

GPU Hardware Requirements	Turing Architecture GPUs	Processor	Driver	CUDA Toolkit	DIGITS	cuDNN
Frameworks
Caffe	Theano	Torch	BIDMach	Keras	Other Frameworks

Arcbrain Deep Learning GWS Middle Tower Chassis

Arcbrain Co., Ltd. sells custom-built workstations (1 to 4 GPUs) and servers (1 to 4 GPUs) for Deep Learning research.

Powered by Intel^® Xeon^® Scalable Processors
Arcbrain original servers and workstations: latest product lineup

A Deep Learning development environment can also be built without a GPU, using Intel^® Xeon^® / Core™ processors, Intel^® Parallel Studio, the DNN (Deep Neural Network) primitives of Intel^® MKL (Math Kernel Library), and the Intel^® Distribution for Python^®.

Please feel free to contact us for a quote.
We will customize the configuration to match your required specifications.

NVIDIA^® Accelerated Computing Developer Program
	To obtain the NVIDIA^® Deep Learning development tools (NVIDIA^® CUDA, NVIDIA^® DIGITS™, cuDNN, etc.), you must first register for the Accelerated Computing Developer Program. https://developer.nvidia.com/accelerated-computing-developer

GPU Hardware Requirements

NVIDIA^® GPU Compute Capability

The hardware requirements for NVIDIA^® GPUs (Graphics Processing Units) are as follows. Running DIGITS, cuDNN, and Caffe for Deep Learning requires CUDA 7.0 or later and a GPU of the Kepler microarchitecture or newer with a Compute Capability of 3.0 or higher.
Fermi-architecture GPUs have a Compute Capability of 2.0 / 2.1, so unfortunately they cannot be used for Deep Learning.
The Tesla C2075 / C2070 / C2050 are also Fermi-architecture GPUs with a Compute Capability of 2.0, so they cannot be used for Deep Learning either.
GeForce GPUs with Kepler or higher architecture (CUDA 7.5 Installation Guide)
CUDA 7.0 and a GPU of compute capability 3.0 or higher are required. (cudnn_install.txt)
Kepler GPUs have a Compute Capability of 3.0 to 3.7 and Maxwell GPUs 5.0 or higher, so both can of course be used for Deep Learning development.
Pascal-architecture GPUs have a Compute Capability of 6.0 or higher.
Turing-architecture GPUs have a Compute Capability of 7.5 (7.0 is Volta).
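The Compute Capability rule above can be checked mechanically. The following Python snippet is a minimal sketch, not an NVIDIA tool. The helper names (`parse_cc`, `architecture`, `meets_requirement`) are hypothetical, and the mapping covers only the architectures discussed on this page: 2.x Fermi, 3.x Kepler, 5.x Maxwell, 6.x Pascal, 7.0 Volta, and 7.5 Turing.

```python
def parse_cc(cc: str) -> tuple:
    """Split a compute capability string such as "6.1" into (major, minor)."""
    major, minor = cc.split(".")
    return int(major), int(minor)


def architecture(cc: str) -> str:
    """Map a compute capability to an architecture name (scope of this page only)."""
    major, minor = parse_cc(cc)
    names = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}
    if major == 7:
        # 7.0 (and 7.2) are Volta-generation parts; 7.5 is Turing
        return "Turing" if minor >= 5 else "Volta"
    return names.get(major, "unknown")


def meets_requirement(cc: str, minimum: tuple = (3, 0)) -> bool:
    """True if the GPU satisfies the 'Compute Capability 3.0 or higher' rule."""
    return parse_cc(cc) >= minimum


if __name__ == "__main__":
    for cc in ["2.1", "3.0", "3.7", "5.2", "6.1", "7.0", "7.5"]:
        status = "OK" if meets_requirement(cc) else "not supported"
        print(f"{cc}: {architecture(cc)} -> {status}")
```

Tuple comparison makes the check correct for values such as 3.7 versus 5.0. Comparing the strings as floats would also work here, but it breaks down for a hypothetical two-digit minor version.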

https://developer.nvidia.com/cuda-gpus

http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capability

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

※ Examples of GPUs supported by the NVIDIA^® Deep Learning development environment
Tesla K20 (3.5) / K40 (3.5) / K80 (3.7) / M40 (5.2)
Quadro 410 (3.0) / K420 (3.0) / K600 (3.0)
Quadro K2000 (3.0) / K4000 (3.0) / K4200 (3.0) / K5000 (3.0) / K5200 (3.5) / K6000 (3.5)
Quadro K620 (5.0) / K1200 (5.0) / K2200 (5.0)
Quadro M2000 (5.2) / M4000 (5.2) / M5000 (5.2) / M6000 (5.2)
Quadro NVS 510 (3.0) / 810 (5.0)
Jetson TK1 (3.2) / Tegra K1 (3.2) / Tegra X1 (5.3)
GT 640 [GDDR5] (3.5) / GTX 650 (3.0) / GTX 660 (3.0) / GTX 670 (3.0)
GeForce GTX 750 (5.0) / GTX 760 (3.0) / GTX 770 (3.0) / GTX 780 (3.5)
GeForce GTX 950 (5.2) / GTX 960 (5.2) / GTX 970 (5.2) / GTX 980 (5.2)
GeForce GTX TITAN (3.5) / GTX TITAN Black (3.5) / GTX TITAN Z (3.5) / GTX TITAN X (5.2)
NVIDIA Quadro M6000 24GB (5.2, 24GB GDDR5, 3072 Core, 384bit, 317GB/s, 250W, ECC support, DVI-Ix1, DP1.4x4)

● NVIDIA^® Turing™ Microarchitecture RTX GPU Performance Comparison
RTX GPU Specification Comparison List (as of July 22, 2019) [Turing Microarchitecture details]

Model	GPU Number	Memory	RTX-OPS (T) / ROPs	SM Count	CUDA Core (=SM*64 =FP32 Core =INT32 Core)	RT Core (=SM)	Tensor Core (=SM*8)	FP64 Core (=SM*2)	GPU Clock (MHz)	Bus Width (bits)	Bandwidth (GB/s)	FP64 double (GFLOPS)	FP32 float (GFLOPS)	FP16 half (GFLOPS)	FP32/16 Tensor (TFLOPS)	INT32 (TIPS)	INT8 (TIPS)	INT4 (TIPS)	TDP (W) [PIN]	NVLink SLI Bridge (GB/s)	Price (US$)
・NVIDIA^® GeForce^® RTX GPU https://www.nvidia.com/en-us/geforce/20-series/ TECHPOWERUP https://www.techpowerup.com/gpu-specs/ (in order of release)
TITAN RTX	TU102	24GB GDDR6	96	72 12*6	4608 [64]*72[SM]	72	576	144?	1350 - 1770 (OC)	384 32*12	672	509.8	16,312	32,625	130.5	16.3	261	522	280 [8+8]	✔ 100	2,499
GeForce RTX 2080 Ti Founders Edition	TU102	11GB GDDR6	78 ～ 88	68 12*6-4	4352 [64]*68[SM]	68	544	136?	1350 - 1635 (OC)	352 32*(12-1)	616	420.2	13,448	26,895	56.9	14.2	227.7	455.4	260 [8+8]	✔ 100	1,199
GeForce RTX 2080 Ti	TU102	11GB GDDR6	76 ～ 88	68 12*6-4	4352 [64]*68[SM]	68 12*6-4	544	136?	1350 - 1545	352 32*(12-1)	616	420.2	13,448	26,895	53.8	13.4	215.2	430.3	250 [8+8]	✔ 100	1,199
GeForce RTX 2080 Super	TU104	8GB GDDR6	64	48 8*6	3072 [64]*48[SM]	48 8*6	384	96?	1650 - 1815	256 32*8	495.9	348.5	11,150	22,300	44.6	11.2	180.2	360.5	215 [6+8]		699
GeForce RTX 2080 Founders Edition	TU104	8GB GDDR6	64	46 8*6-2	2944 [64]*46[SM]	46	368	92?	1515 - 1800 (OC)	256 32*8	448	314.6	10,068	20,137	42.4	10.6	169.6	339.1	225 [6+8]	✔ 50	799 - 999
GeForce RTX 2080	TU104	8GB GDDR6	64	46 8*6-2	2944 [64]*46[SM]	46 8*6-2	368	92?	1515 - 1710	256 32*8	448	314.6	10,068	20,137	40.3	10.0	161.1	322.2	215 [6+8]	✔ 50	799 - 999
GeForce RTX 2070 Super	TU104	8GB GDDR6	64	40 12*4-8	2560 [64]*40[SM]	40	320	80?	1605 - 1770	256 32*8	448	283.2	9,062	18,120	36.0	9.1	144	288	215 [6+8]	-	499
GeForce RTX 2070 Founders Edition	TU106	8GB GDDR6	45 ～ 64	36 12*3	2304 [64]*36[SM]	36	288	72?	1410 - 1710 (OC)	256 32*8	448	233.3	7,465	14,930	31.5	7.9	126	252.1	185 [8]	-	499 - 599
GeForce RTX 2070	TU106	8GB GDDR6	42 ～ 64	36 12*3	2304 [64]*36[SM]	36	288	72?	1410 - 1620	256 32*8	448	233.3	7,465	14,930	29.9	7.5	119.4	238.9	175 [8]	-	499 - 599
GeForce RTX 2060 Super	TU106	8GB GDDR6	64	34 12*3-2	2176 [64]*34[SM]	34	272	68?	1470 - 1650	256 32*(12-4)	448	224.4	7,181	14,360	28.7	7.2	144.8	249.6	160 [8]	-	399
GeForce RTX 2060	TU106	6GB GDDR6	37 ～ 48	30 12*3-6	1920 [64]*30[SM]	30	240	60?	1365 - 1680	192 32*(12-6)	336	201.6	6,451	12,902	25.8	6.5	103	206	160 [8]	-	349
GeForce RTX 2050	TU106	4GB GDDR6	32	14 12*1+2	896 [64]*14[SM]	14	112	28?	1515 - 1695	128 32*(12-8)	224	94.52	3,037	6,075	12.1	3.0	48	97	75 ～ 100? [8?]	-	200 - 250?
・NVIDIA^® Quadro^® RTX GPU http://www.nvidia.co.jp/object/quadro-jp.html TECHPOWERUP https://www.techpowerup.com/gpu-specs/ (in order of release)
Quadro RTX 8000	TU102	48GB GDDR6	86～96	72 12*6	4608 [64]*72[SM]	72	576	144?	1440 - 1730	384 32*12	672	509.8	16,312	32,625	130.5	16.3	261	522	250? [8+8?]	✔ 100	9,999
Quadro RTX 6000	TU102	24GB GDDR6	84 ～96	72 12*6	4608 [64]*72[SM]	72	576	144?	1440 - 1730	384 32*12	576	509.8	16,312	32,625	130.5	16.3	261	522	295 [8+8]	✔ 100	6,299
Quadro RTX 5000	TU104	16GB GDDR6	64	48 8*6	3072 [64]*48[SM]	48	384	96?	1620 - 1815	256 32*8	448	348.5	11,151	22,303	89.2	11.2	178.4	356.8	265 [6+8]	✔ 50	2,299
Quadro RTX 4000	TU104	8GB GDDR6	43	36 8*4	2304 [64]*36[SM]	36	288	72?	1215 - 1710	256 32*8	416	246.2	7,880	15,759	60	7.9	120	240	160 [8]	-	899
・NVIDIA^® Tesla^® (Turing™) GPU https://www.nvidia.com/en-us/data-center/tesla/ TECHPOWERUP https://www.techpowerup.com/gpu-specs/ (Turing™, in order of release)
Tesla T4 (Passive)	TU104	16GB GDDR6	64	40 8*(6-1)	2560 [64]*40[SM]	40	320	80?	585	256 32*8	320	254.4	8,141	65,126	65	8.1	130	260	70 [8?]	-	3,500?
・NVIDIA^® Tesla^® (Volta) GPU https://www.nvidia.com/en-us/data-center/tesla/ TECHPOWERUP https://www.techpowerup.com/gpu-specs/ (Volta, in order of release)
Tesla V100 PCIe 32GB (Passive)	GV100	32GB HBM2	128 (ROPs)	80	5120 [64]*80[SM]	-	640	2560	1280 - 1380	4096	897.0	7,066	14,131	28,262	112	28	130	260	250 [8+8]	✔ 200 2.0 4 bricks	12,000～
NVIDIA Turing Microarchitecture Whitepaper
The Compute Capability of Turing™ architecture GPUs is 7.5.
RTX-OPS formula = (TENSOR × 20%) + (FP32 × 80%) + (RTOPS × 40%) + (INT32 × 28%)
(For the RTX 2080 Ti: RTX-OPS = 77.92 = (14 × 80%) + (14 × 28%) + (100 × 40%) + (114 × 20%), i.e. FP32 = 14, INT32 = 14, RTOPS = 100, TENSOR = 114)
To use NVIDIA^® SLI over NVLink™, you need NVLink-capable GPUs, an SLI-capable motherboard (X299 or similar, with an SLI-enabled BIOS), application support, and SLI settings in the driver.
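The RTX-OPS weighting and the FP32 column of the table can both be reproduced with simple arithmetic. This sketch uses the weights quoted above and assumes the usual convention that each CUDA core performs one fused multiply-add (2 FLOPs) per clock. `rtx_ops` and `peak_fp32_gflops` are illustrative names, not NVIDIA APIs.

```python
def rtx_ops(tensor_tflops: float, fp32_tflops: float,
            rt_tflops: float, int32_tips: float) -> float:
    """RTX-OPS as a weighted sum, using the weights quoted from the whitepaper."""
    return (tensor_tflops * 0.20
            + fp32_tflops * 0.80
            + rt_tflops * 0.40
            + int32_tips * 0.28)


def peak_fp32_gflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 throughput: each CUDA core retires one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * boost_mhz / 1000.0


if __name__ == "__main__":
    # RTX 2080 Ti example above: Tensor 114, FP32 14, RT 100, INT32 14
    print(round(rtx_ops(114, 14, 100, 14), 2))
    # TITAN RTX row of the table: 4608 CUDA cores at a 1770 MHz boost clock
    print(round(peak_fp32_gflops(4608, 1770), 1))
```

Run as a script, it should print 77.92, which matches the RTX 2080 Ti example, and 16312.3, which matches the 16,312 GFLOPS listed for the TITAN RTX.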

● NVIDIA^® Volta™ Architecture GPUs
TITAN V (7.0, 12GB HBM2, CUDA 5120 Core, Tensor 640 Core, 1200 - 1455 MHz, 3072 bit, 652.8 GB/s, TDP 250W)

● NVIDIA^® Pascal™ Architecture GPUs
・NVIDIA^® GeForce^® GPU http://www.nvidia.co.jp/object/geforce_family_jp.html
To get fast, optimized CUDA performance, the latest CUDA release and a driver with Pascal Architecture support are ideal.
GeForce GTX 1050 (6.1, 4,2GB GDDR5, 768,640 Core, 1354 - 1445 MHz, TDP 75W～)
GeForce GTX 1060 (6.1, 6,3,2GB GDDR5, 1280,1152,1024Core, 1506 - 1708 MHz, TDP 120W～)
~~GeForce GTX 1070 (6.1, 8GB GDDR5, 1920 Core, 1506 - 1708 MHz, TDP 150W～200W) 1809 EOL~~
~~GeForce GTX 1080 (6.1, 8GB GDDR5, 2560 Core, 1607 - 1733 MHz, TDP 180～200W) 1809 EOL~~
~~NVIDIA TITAN X (6.1, 12GB GDDR5X, 3584 Core, 1417 - 1531 MHz, TDP 約250W～) 1809 EOL~~
(Note that the GTX TITAN X, with "GTX" in its name, is a Maxwell-architecture GPU, not a Pascal one.)

NVIDIA TITAN Xp (6.1, 12GB GDDR5X, 3840 Core, 1,582 MHz, 384bit, 547.7GB/s, 12TFLOPS, 7680x4320@60Hz, TDP 250W)
GeForce GTX 1080 Ti (6.1, 11GB GDDR5, 3584 Core, 1480 - 1582 MHz, 352bit, 484GB/s, TDP 250W)
GeForce GTX 1050 Ti (6.1, 4GB GDDR5, 768 Core, 1290 - 1390 MHz, 128bit, TDP 75W)

・NVIDIA^® Quadro^® GPU http://www.nvidia.co.jp/object/quadro-jp.html
NVIDIA Quadro GV100 (Volta, 7.0, 32GB HBM2, 5120 Core, 4096bit, 870GB/s, 250W, ECC support, DP 1.4x4)
NVIDIA Quadro GP100 (6.0, 16GB HBM2, 3584 Core, 4096bit, 732GB/s, 235W, ECC support, DVI-Ix1, DP 1.4x4)
NVIDIA Quadro P6000 (6.1, 24GB GDDR5X, 3840 Core, 384bit, 433GB/s, 250W, ECC support, DVI-Ix1, DP 1.4x4)
NVIDIA Quadro P5000 (6.1, 16GB GDDR5X, 2560 Core, 256bit, 288GB/s, 180W, ECC support, DVI-Ix1, DP 1.4x4)
NVIDIA Quadro P4000 (6.1, 16GB GDDR5, 1792 Core, 256bit, 243GB/s, 105W, ECC support, DP 1.4x4)
NVIDIA Quadro P2000 (6.1, 5GB GDDR5, 1024 Core, 160bit, 140GB/s, 75W, ECC support, DP 1.4x4)
NVIDIA Quadro P1000 (6.1, 4GB GDDR5, 640 Core, 128bit, 80GB/s, 47W, ECC support, Mini DP 1.4x4)
~~NVIDIA Quadro P600 (6.1, 2GB GDDR5, 384 Core, 128bit, 64GB/s, 40W, ECC support, Mini DP 1.4x4) EOL~~
NVIDIA Quadro P400 (6.1, 2GB GDDR5, 256 Core, 64bit, 32GB/s, 30W, ECC support, Mini DP 1.4x3)

・NVIDIA^® Tesla™ GPU http://www.nvidia.co.jp/object/quadro-jp.html
~~Tesla V100 （7.0, 5120 Core, 7TFLOPS(DP), 14TFLOPS(SP), 18.7TFLOPS(HP), 112TFLOPS(DL), 16GB CoWoS HBM2 with ECC, 4096 bit, 1245-1380MHz, 900 GB/s) EOL~~
Tesla V100 （7.0, 5120 Core, 7TFLOPS(DP), 14TFLOPS(SP), 18.7TFLOPS(HP), 112TFLOPS(DL), 32GB CoWoS HBM2 with ECC, 4096 bit, 1230-1380MHz, 900 GB/s）
~~Tesla P100 for PCIe-Based (6.0, 4.7TFLOPS(DP), 9.3TFLOPS(SP), 18.7TFLOPS(HP), 16 or 12GB, 720 or 540 GB/s, ECC support) EOL~~
~~Tesla P100 for NVLink-Optimized (6.0, 5.3TFLOPS(DP), 10.6TFLOPS(SP), 21.2TFLOPS(HP), 16GB, 720GB/s, ECC support)~~
Tesla P40 (6.1, 12TFLOPS(SP), 47TOPS(INT8), 24GB, 346GB/s, 250W, ECC support)
Tesla P4 (6.1, 5.5TFLOPS(SP), 22TOPS(INT8), 8GB, 192GB/s, 50/75W, ECC support)
(SP: Single Precision, DP: Double Precision, HP: Half Precision)

※ Examples of GPUs not supported by the NVIDIA^® Deep Learning development environment
Tesla C2050 (2.0) / C2070 (2.0) / C2075 (2.0) / M20xx (2.0)
Quadro Plex 7000 (2.0)
Quadro NVS 310 (2.0) / NVS 315 (2.0) / NVS 4200M (2.1) / NVS 5200M (2.1)
GeForce GT 430 (2.1) / GT 440 (2.1) / GTS 450 (2.1) / GTX 460 (2.1)
GeForce GTX 550 Ti (2.1) / GTX 560 Ti (2.1) / GTX 570 (2.0) / GTX 580 (2.0) / GTX 590 (2.0)
GeForce GT 610 (2.1) / GT 620 (2.1) / GT 630 (2.1) / GT 640 [GDDR3] (2.1)
(The NVIDIA^® website listed the GeForce GT 640 GDDR3 as 2.1, but an ELSA GeForce GT 640 LP 2GB GD640-2GERGL reported 3.0 when checked with the CUDA "deviceQuery" sample.)
GeForce GT 730 [DDR3,128bit] (2.1)

The type of NVIDIA^® GPU installed can be checked with the following commands. If your machine has an NVIDIA^® GPU, try them.

$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
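To script the `lspci` check, you can filter its output in Python. This sketch assumes the one-line-per-device format shown above. The regular expression and the `nvidia_devices` helper are illustrative and are not part of any NVIDIA tool. On a real machine, you could pass in `subprocess.run(["lspci"], capture_output=True, text=True).stdout` (Python 3.7 or later) instead of the sample text.

```python
import re

# One line of `lspci` output, e.g.
# 01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
LSPCI_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[^:]+):\s+NVIDIA Corporation\s+(?P<desc>.*?)"
    r"(?:\s+\(rev \w+\))?$"
)

SAMPLE = """\
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
"""


def nvidia_devices(lspci_output: str) -> list:
    """Return (slot, device class, description) for every NVIDIA device line."""
    devices = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append((match["slot"], match["cls"], match["desc"]))
    return devices


if __name__ == "__main__":
    for slot, cls, desc in nvidia_devices(SAMPLE):
        print(slot, cls, desc, sep=" | ")
```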

$ nvidia-smi                      (requires the NVIDIA driver, e.g. as installed with CUDA)
Thu Sep 22 22:23:07 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 640      Off  | 0000:01:00.0     N/A |                  N/A |
| 30%   35C    P0    N/A /  N/A |      0MiB /  1997MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
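The boxed `nvidia-smi` table is awkward to parse in scripts, but `nvidia-smi` also provides a CSV query mode (`--query-gpu=name,driver_version,memory.total --format=csv,noheader`). The sketch below parses that output. The sample line is an illustrative assumption built from the values in the table above (GeForce GT 640, driver 367.44, 1997 MiB), not captured output. `parse_gpu_query` is a hypothetical helper.

```python
import csv
import io

# Assumed output of:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
SAMPLE = "GeForce GT 640, 367.44, 1997 MiB\n"


def parse_gpu_query(text: str) -> list:
    """Turn one CSV row per GPU into a dict with name, driver and memory (MiB)."""
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, driver, memory = (field.strip() for field in row)
        gpus.append({
            "name": name,
            "driver": driver,
            "memory_mib": int(memory.split()[0]),  # "1997 MiB" -> 1997
        })
    return gpus


if __name__ == "__main__":
    print(parse_gpu_query(SAMPLE))
```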

$ ./deviceQuery                (requires the CUDA Samples to be installed and built)
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8110 MBytes (8504279040 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 
                                                 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device 
	 simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime 
Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS
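You can also check the deviceQuery report automatically. The sketch below assumes the exact wording of the CUDA 8.0 sample output shown above; other CUDA versions may word these lines differently. It uses a hypothetical `parse_device_query` helper to extract the device name, the Compute Capability, and the core counts. It then confirms that 20 SMs × 128 cores/SM equals the 2560 cores reported.

```python
import re

# Excerpt of the deviceQuery output shown above
SAMPLE = '''
Device 0: "GeForce GTX 1080"
  CUDA Capability Major/Minor version number:    6.1
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
'''


def parse_device_query(text: str) -> dict:
    """Extract name, compute capability and core counts for the first device."""
    name = re.search(r'Device \d+: "([^"]+)"', text).group(1)
    cc = re.search(r"Major/Minor version number:\s+(\d+)\.(\d+)", text)
    mp = re.search(
        r"\((\d+)\) Multiprocessors, \((\d+)\) CUDA Cores/MP:\s+(\d+) CUDA Cores",
        text,
    )
    sms, cores_per_sm, total = (int(g) for g in mp.groups())
    return {
        "name": name,
        "cc": (int(cc.group(1)), int(cc.group(2))),
        "sms": sms,
        "cores_per_sm": cores_per_sm,
        "cuda_cores": total,
    }


if __name__ == "__main__":
    info = parse_device_query(SAMPLE)
    # Sanity checks: cores = SMs x cores/SM, and CC >= 3.0 for Deep Learning
    assert info["sms"] * info["cores_per_sm"] == info["cuda_cores"]
    print(info["name"], info["cc"], "OK" if info["cc"] >= (3, 0) else "too old")
```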

CPU Central Processing Unit	■ Number of processor cores and threads
A faster processor is of course ideal, but most of the heavy computation runs on the GPU, so a reasonably capable CPU is enough. Even in a multi-GPU system, one thread per GPU will just about work, and two threads per GPU should be sufficient.
When building the development environment, you may also compile sources in parallel on multiple cores, so an Intel^® Core™ i7-class processor is ideal if possible. An Intel^® Core™ i5-class processor is by no means unusable, but the thread count drops from 8 to 4, so compilation may take up to twice as long.
Intel^® Core™ i7 processors come in two lines: the mainstream desktop 6th-generation Core™ i7-6700 series (Skylake) and the high-end desktop Core™ i7-6800K to i7-6950X (Broadwell-E).
For desktop servers there is also the Intel^® Xeon^® Processor E3 v5 Family (Skylake, E3-12xx v5). Its main difference from the 6th-generation desktop Core™ i7-6700 series (Skylake) is support for ECC UDIMMs (Error-Correcting Code Unbuffered Dual Inline Memory Modules), which makes memory more reliable. Its processing performance is almost the same.

■ Maximum number of PCI Express lanes of the processor
Note that mainstream desktop processors have a maximum of only 16 PCI Express lanes.
Some motherboards can hold two to four GPUs, but with four GPUs installed each one runs at PCI Express x4 (x4 × 4 GPUs = x16).
PCI Express Gen. (Generation) 3.0 provides roughly 1 GB/s per lane in each direction, so an x16 link gives about 16 GB/s.
Even if a motherboard has two or more PCI Express Gen. 3.0 x16 slots, installing two GPUs, or adding any PCI Express card other than a GPU, drops the link to x8 (the motherboard configures this automatically at boot). Some slots are also physically x16 but electrically only x8, so check that the GPU is installed in a slot that actually runs at x16. At x8, the data transfer rate between the processor and the GPU is halved.
By contrast, the high-end desktop Intel^® Core™ i7 processors, except for the lowest model, have 40 PCI Express lanes, the same as Intel^® Xeon^® processors (Broadwell-E / EP). They also support quad-channel memory (modules installed in sets of four), which gives twice the memory bandwidth of the dual-channel memory (modules installed in pairs) used by desktop processors. As a result, up to two GPUs can each run at x16.
Note, however, that the lowest model, the Core™ i7-6800K, has only 28 PCI Express lanes, so the second GPU can only run at PCI Express x8.
The ASUS X99-E WS motherboard, used in the NVIDIA DIGITS DEVBOX (the reference PC for Deep Learning development), has four PCI Express 3.0 x16 slots. With four GPUs installed, however, they run in x8 mode rather than x16 (40 - (8 × 4) = 8 lanes remaining).
● Comparison of high-end desktop Intel^® Core™ i7 processors http://ark.intel.com/ja/compare/94456,94196,94188,94189 http://ark.intel.com/ja/products/codename/80341/Broadwell-E
In the Intel^® Xeon^® Processor E5 v4 Family, even the lowest models, the E5-1620 v4 and E5-2603 v4, have 40 PCI Express lanes. http://ark.intel.com/ja/products/family/91287/Intel-Xeon-Processor-E5-v4-Family#@Server
A DP (Dual Processor) workstation or server has 40 × 2 = 80 lanes, so all four GPUs can run in PCI Express x16 mode (80 - (16 × 4) = 16 lanes remaining).
The Intel^® Xeon^® Processor E5 v4 Family also supports ECC RDIMMs (Registered DIMMs). Each processor can take as much as 1536 GB of memory (128 GB 3DS LRDIMM × 12), and a two-CPU system up to 3072 GB.

■ Processor cache size
A larger processor cache helps data transfer between the CPU and GPU. There are several cache levels: L1 (around 32 KB), L2 (around 256 KB), and L3 (around 8 to 55 MB). "Cache" usually refers to the largest of these, the L3 cache.
The 6th-generation desktop Core™ i7-6700 series (Skylake) has only 8 MB of cache, and the same is true of the Intel^® Xeon^® Processor E3 v5 Family (Skylake, E3-12xx v5).
The high-end desktop Core™ i7 (Broadwell-E) processors have 15 MB even on the lowest model, the i7-6800K (the i7-6850K also has 15 MB). The i7-6900K has 20 MB, and the very expensive top model, the i7-6950X, has 25 MB.
Among Intel^® Xeon^® processors (Broadwell-EP), the UP (Uni Processor) Xeon^® E5-1620 v4 / E5-1630 v4 have only 10 MB, so if possible choose the E5-1650 v4 with 15 MB, or the E5-1660 v4 or higher with 20 MB.
Among the DP (Dual Processor) Xeon^® E5-26xx v4 processors, even the lowest model, the E5-2603 v4, has 15 MB of cache. Some, such as the E5-2623 v4, have only 10 MB. The E5-2630 v4 and higher have 25 MB (up to 55 MB), which should improve CPU-GPU throughput.

■ Processor clock frequency
Clock frequency comes last here. Faster is of course better, but Deep Learning does not make the processor perform large amounts of time-consuming floating-point arithmetic, so clock speed is not a major issue. Exchanging data quickly with the GPU matters more, so prioritize thread count and cache size.

■ Summary (processor)
GPUs must have a Compute Capability of 3.0 or higher, but there is no strict requirement for the processor. https://developer.nvidia.com/deep-learning-getting-started
Setting up the development environment takes a little longer and is far from comfortable, but in the extreme, a Deep Learning development machine can be built even with an Intel^® Core™ i3 processor or a lower-spec Pentium^® or Celeron^® processor.
A 6th-generation Core™ i7-6700 series (Skylake) or Intel^® Xeon^® Processor E3 v5 Family (Skylake, E3-12xx v5) machine therefore works perfectly well as a Deep Learning PC, as long as only one GPU is installed.
However, if you want two GPUs from the start, or may add a second GPU later, choose a high-end desktop processor with 40 PCI Express lanes (that is, not the Core™ i7-6800K or 5820K) or the Intel® Xeon® Processor E5-2600 v4 Product Family.
For three or four GPUs, you need a DP (Dual Processor) server with two Intel® Xeon® Processor E5-2600 v4 Product Family CPUs. An alternative is an MP (Multi Processor) server with four Intel® Xeon® Processor E5-4600 v4 Product Family CPUs, which is quite expensive and easily exceeds 1,000,000 yen.
※ Reference page: A Full Hardware Guide to Deep Learning (Tim Dettmers)
※ If you have no GPU but still want to try Deep Learning on a CPU alone, see the Japanese article "Getting started with Deep Learning on NVIDIA DIGITS 3 without a GPU". DIGITS 4.0 should also run without a GPU, although it must be compiled from source. Caffe, a representative Deep Learning framework, can also be built to run without a GPU (CPU_ONLY build option).
※ For information on Deep Learning development with Intel^® Xeon^® processors or Xeon^® Phi™ x200 processors (Knights Landing), please see this page.
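The lane arithmetic above can be sketched as a small, idealized model. It assumes each GPU starts at x16 and that the right-most widest link is halved until the total fits in the CPU's lanes. It also assumes roughly 0.985 GB/s per PCIe 3.0 lane per direction (8 GT/s with 128b/130b encoding). Real slot wiring is fixed by the motherboard. For example, with four GPUs on 40 lanes this model gives x16/x8/x8/x8, whereas the board described above runs all four at x8. `allocate_lanes` and `link_bandwidth_gbps` are hypothetical helpers.

```python
from __future__ import annotations

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> about 0.985 GB/s per direction
GEN3_GBPS_PER_LANE = 8 * 128 / 130 / 8


def allocate_lanes(cpu_lanes: int, num_gpus: int, max_width: int = 16) -> list[int]:
    """Idealized model: start every GPU at x16 and halve the right-most widest
    link until the total fits within the CPU's PCI Express lanes."""
    widths = [max_width] * num_gpus
    while sum(widths) > cpu_lanes:
        widest = max(widths)
        if widest == 1:
            raise ValueError("not enough lanes for one lane per GPU")
        # index of the right-most slot that currently has the widest link
        idx = len(widths) - 1 - widths[::-1].index(widest)
        widths[idx] //= 2
    return widths


def link_bandwidth_gbps(width: int) -> float:
    """Approximate one-direction PCIe 3.0 bandwidth for a link of `width` lanes."""
    return width * GEN3_GBPS_PER_LANE


if __name__ == "__main__":
    for cpu, lanes in [("Core i7-6700K", 16), ("Core i7-6800K", 28),
                       ("Core i7-6900K", 40), ("2 x Xeon E5-2600 v4", 80)]:
        for gpus in (1, 2, 4):
            widths = allocate_lanes(lanes, gpus)
            print(f"{cpu:>20} {gpus} GPU(s): " + "/".join(f"x{w}" for w in widths))
    print(f"x16 ~ {link_bandwidth_gbps(16):.2f} GB/s, x8 ~ {link_bandwidth_gbps(8):.2f} GB/s")
```

The model reproduces the cases in the text: x4 per GPU for four GPUs on a 16-lane desktop CPU, x16/x8 for two GPUs on the 28-lane i7-6800K, and x16 for all four GPUs on an 80-lane dual-Xeon system.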

NVIDIA^® CUDA GPU driver

Installing the latest CUDA GPU driver

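Before installing NVIDIA^® CUDA, always check the version of the NVIDIA^® driver.
If you use a GPU based on the Pascal Architecture (the newest architecture at the time of writing), you may not be able to accelerate and optimize Deep Learning unless you install the latest driver that supports it.
Also, right after installing Linux^® or Microsoft^® Windows^® on a machine with a new GPU such as the Pascal-based GTX 1080, the GPU driver may not be installed automatically. The desktop can then become so sluggish that even typing is difficult.
Installing NVIDIA^® CUDA reverts the driver to the slightly older version bundled with CUDA. Even so, once the OS updates are done, we recommend installing the latest GPU driver first so that the display runs smoothly.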

$ cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.35 Mon Jul 11 23:14:21 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
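To check the driver version from a script, for example against the 346-or-later requirement of DIGITS described below, you can parse the `Kernel Module` field. This is a minimal sketch that assumes the `/proc/driver/nvidia/version` format shown above. `driver_version` and `at_least` are hypothetical helpers. On a real system, read the text with `open("/proc/driver/nvidia/version").read()`.

```python
from __future__ import annotations

import re

# Output of `cat /proc/driver/nvidia/version` shown above
SAMPLE = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.35 Mon Jul 11 23:14:21 PDT 2016\n"
    "GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)\n"
)


def driver_version(proc_text: str) -> tuple[int, ...]:
    """Return the driver version as a tuple of ints, e.g. (367, 35)."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", proc_text)
    if match is None:
        raise ValueError("NVRM version line not found")
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(proc_text: str, minimum: tuple[int, ...]) -> bool:
    """Compare against a minimum such as (346,) for the DIGITS requirement."""
    return driver_version(proc_text) >= minimum


if __name__ == "__main__":
    print(driver_version(SAMPLE), at_least(SAMPLE, (346,)))
```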

The latest Long Lived Branch driver for each GPU can be downloaded from:

http://www.nvidia.co.jp/Download/index.aspx?lang=jp

● For GeForce / Quadro: Version 384.69 2017.8.22 Linux 64-bit Japanese 77.06 MB

NVIDIA-Linux-x86_64-384.69.run 2017.8.22

http://www.nvidia.co.jp/download/driverResults.aspx/123199/jp

•Added support for the following GPU(s):
　Quadro P4000 with Max-Q Design
•Fixed an intermittent hang when using Vulkan to present directly to display with the VK_KHR_display extension. SteamVR was particularly affected by that hang.
•Disabled G-SYNC in desktop environments, such as Budgie, that use libmutter-0.so.
An existing rule to disable G-SYNC for libmutter.so no longer applied after the library was renamed to libmutter-0.so.
•Updated nvidia-installer to label kernel modules with SELinux file type 'modules_object_t'. Some system SELinux policies only permit loading of kernel modules with this SELinux file type.
•Removed support for checking for and downloading updated driver packages and precompiled kernel interfaces from nvidia-installer. This functionality was limited to unencrypted ftp and http, and was implemented using code that is no longer actively maintained.

● For GeForce / Quadro: Version 375.39 2017.2.14 Linux 64-bit Japanese 73.68 MB

NVIDIA-Linux-x86_64-375.39.run 2017.2.14

http://www.nvidia.co.jp/download/driverResults.aspx/114941/jp

•Added support for the following GPU(s):
　Quadro GP100
　Quadro P4000
　Quadro P2000
　Quadro P1000
　Quadro P600
　Quadro P400
　Quadro M1200
　Quadro M2200
•Fixed a bug that caused system hangs when resuming from suspend with some GPUs.
•Fixed a regression that could cause corruption when hot-plugging displays.
•Fixed a regression that prevented systems with multiple DisplayPort monitors from resuming correctly from suspend.

● For GeForce / Quadro: Version 367.44 2016.8.23 Linux 64-bit Japanese

NVIDIA-Linux-x86_64-367.44.run 2016.8.23

http://www.nvidia.co.jp/download/driverResults.aspx/106804/jp
•Added support for the following GPUs:
TITAN X (Pascal)
GeForce GTX 1060 6GB
GeForce GTX 1060 3GB
•Fixed a regression that caused applications using indirect GLX to crash.
•Fixed a regression introduced in 367.35 that caused the first modeset of the X server to display blank if the features requested in the X configuration file enabled the X driver's composition pipeline. This would be triggered, e.g., by MetaMode tokens such as ForceCompositionPipeline, ForceFullCompositionPipeline, Rotation, Reflection, and Transform.

● For Tesla and CUDA Toolkit 7.5: Version 352.99 2016.8.1 Linux 64-bit Japanese

NVIDIA-Linux-x86_64-352.99.run 2016.8.1

http://www.nvidia.co.jp/download/driverResults.aspx/105667/jp

As stated below, a prerequisite for installing NVIDIA^® DIGITS™ is an NVIDIA driver of version 346 or later.
NVIDIA driver version 346 or later.
If you need a driver go to
http://www.nvidia.com/Download/index.aspx

Linux x86_64/AMD64/EM64T drivers
● Latest Long Lived Branch version (as of 2017/09/02): 384.69 2017.8.22
•Added support for the following GPU:
　Quadro P4000 with Max-Q Design

● Latest Long Lived Branch version (as of 2017/04/08): 375.39 2017.2.14
•Added support for the following GPU(s):
　Quadro GP100, P4000, P2000, P1000, P600, P400, M1200, M2200

● Latest Short Lived Branch version: 378.13 2017.2.3
•Added support for the following GPU(s):
　Quadro P3000
　Quadro GP100
　Quadro P4000
　Quadro P2000
　Quadro P1000
　Quadro P600
　Quadro P400
　Quadro M1200
　Quadro M2200

Linux AMD64 Display Driver Archive
http://www.nvidia.co.jp/object/linux-amd64-display-archive-jp.html
384.59 2017/07/24
375.82 2017/07/24

When installing the latest NVIDIA Graphics Driver on Ubuntu, always install it from the PPA (Personal Package Archives) repository.
The Linux kernel ships with nouveau, an NVIDIA-compatible driver (without 3D acceleration), and nouveau is used by default.
Do not install a driver such as

NVIDIA-Linux-x86_64-375.39.run

(the GeForce Linux 64-bit driver) directly in runlevel 3.
Because of nouveau, X Window may then fail to start correctly.
Ubuntu also manages graphics driver versions under [System Tools] → [System Settings] → [Additional Drivers], and installing a driver directly creates inconsistencies with that setting.

If you install from the repository as explained on the following site, the NVIDIA^® driver is installed correctly on Ubuntu.

http://ubuntuhandbook.org/index.php/2016/07/nvidia-367-35-how-to-install/

Graphics Driver Team has made the new driver release into PPA, available for Ubuntu 16.04, Ubuntu 15.10, Ubuntu 14.04, Ubuntu 12.04, and the next Ubuntu 16.10.

Follow the steps below to add PPA and install the driver:
1. Add the Graphics Drivers PPA by opening a terminal (Ctrl+Alt+T) and running the command:

$ sudo add-apt-repository ppa:graphics-drivers/ppa

2. Then run the commands below one by one in the terminal:

$ sudo apt update
$ sudo apt install nvidia-367

Finally, restart your computer, and you are done.

Finally, restart the PC. This completes the installation of the latest NVIDIA^® CUDA GPU driver.
To confirm, check the version of the installed NVIDIA driver.

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.44 Wed Aug 17 22:24:07 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)


NVIDIA^® CUDA　Toolkit https://developer.nvidia.com/cuda-toolkit
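CUDA Toolkit	A GPU (Graphics Processing Unit) was originally a dedicated rendering processor with hundreds of cores, designed to accelerate drawing through graphics libraries such as Microsoft^® DirectX^® and OpenGL^®. Using it only for graphics seemed a waste, so CUDA (Compute Unified Device Architecture) was created to let it perform general-purpose computation like an ordinary processor. CUDA 1.0 was released in June 2007 (CUDA Toolkit Archive). CUDA is NVIDIA^®'s parallel computing architecture and integrated development environment, and consists of the nvcc (NVIDIA CUDA Compiler) compiler, libraries, and other components.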
CUDA Toolkit	CUDA Toolkit 7.5 (does not support Pascal-based GPUs such as the GeForce GTX 1080) https://developer.nvidia.com/cuda-downloads NVIDIA CUDA Installation Guide for Linux http://docs.nvidia.com/cuda/cuda-installation-guide-linux/	CUDA Toolkit 8.0 https://developer.nvidia.com/cuda-downloads (supports Pascal Architecture GPUs such as the GTX 1080) •Pascal Architecture Support •Out of box performance improvements on Tesla P100, supports GeForce GTX 1080 To download beta or RC (Release Candidate) versions: https://developer.nvidia.com/cuda-release-candidate-download (requires membership in the NVIDIA® Accelerated Computing Developer Program) CUDA QUICK START GUIDE (PDF) DU-05347-301_v8.0 | September 2016 NVIDIA CUDA TOOLKIT 8.0 (PDF) RN-06722-001_v8.0 | May 2016 Release Notes for Windows, Linux, and Mac OS	CUDA Toolkit 9.0 (September 2017) https://developer.nvidia.com/cuda-downloads Libraries up to 5x faster through optimizations and heuristics. HPC applications up to 1.5x faster on Volta GPUs with NVLink and HBM2. CUDA Toolkit 9.0 Release Notes Installing the CUDA Toolkit (YouTube) Introduction to CUDA Getting Started with CUDA Discover CUDA 9 Capabilities
CUDA Toolkit supported OS	RHEL 7 / 6 CentOS 7 / 6 Ubuntu 14.04 LTS Ubuntu 15.04	RHEL 7 / 6 CentOS 7 / 6 Ubuntu 14.04 LTS (until 2019/04) Ubuntu 16.04 LTS (until 2021/04) At the time of writing, the latest 14.04 release is 14.04.4 LTS with kernel 4.2.0-42-generic (the initial 14.04 kernel was 3.13), and the latest 16.04 release is 16.04.1 LTS with kernel 4.4.0-36-generic.	RHEL 7 / 6 CentOS 7 / 6 Ubuntu 16.04 LTS (until 2021/04) At the time of writing, the latest 16.04 release is 16.04.1 LTS with kernel 4.4.0-36-generic. Ubuntu 17.04 (until 2018/01) Ubuntu 17.04 is not an LTS release, so its support period is much shorter than that of 16.04.
CUDA Toolkit notes	Installing CUDA, even CUDA 9.0, may revert the GPU driver to a slightly older version. Check the GPU driver version with $ cat /proc/driver/nvidia/version and, if it is not the latest, reinstall the latest GPU driver as shown below and then restart the PC (on Ubuntu): $ sudo apt-get install nvidia-384 --reinstall (384 was the latest driver version as of October 7, 2017) $ sudo reboot If the system misbehaves after reinstalling over the existing driver, completely uninstall the NVIDIA driver first and then install it again from scratch: $ sudo apt-get --purge remove 'nvidia-*'

NVIDIA^® DIGITS™
The NVIDIA^® Deep Learning GPU Training System

NVIDIA^® DIGITS™

[Screenshots: NVIDIA DIGITS 4 with MNIST / LeNet: job information and folder parsing (train/val); Create DB input files (train, val) before shuffling; job directory, network, solver, raw Caffe output, and dataset; loss (train/val), epochs, and accuracy; classifying a single uploaded image with visualizations and statistics; per-layer weights and activations (conv1, pool1, conv2, pool2, ip1, softmax) with total learned parameters; classifying one image with a GoogLeNet image classification model]

An open-source web application created and published by NVIDIA^® for training Deep Learning models

https://developer.nvidia.com/digits

Interactively manage data and train deep learning models for image classification without the need to write code.

Download

https://developer.nvidia.com/rdp/digits-download

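As of September 2016, DIGITS™ supports the Deep Learning frameworks NVCaffe (a fork of the original BVLC/caffe) and Torch.
Caffe must be installed before DIGITS™ can be installed. (Installing Torch is optional, but problems often occur without it, so we generally recommend installing it as well.)
A GPU is essentially required, but the development environment, including Caffe, can still be built without one, although it will not run at a practical speed. (Reference page)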

DIGITS™ 2　(2015/09/08)

DIGITS™ v2.0.0
for Ubuntu 14.04 LTS

NVIDIA driver version 346 or later.

https://github.com/NVIDIA/DIGITS/releases/tag/v2.0.0

Getting Started

https://github.com/NVIDIA/DIGITS/blob/digits-2.0/docs/GettingStarted.md

Installation Instructions

https://github.com/NVIDIA/DIGITS/blob/digits-2.0/docs/WebInstall.md

DIGITS™ 1　

2015/06/26

DIGITS™ v1.1.2
for Ubuntu 14.04 LTS

DIGITS 3　(2016/02/10)

DIGITS v3.0.0
for Ubuntu 14.04 LTS

https://developer.nvidia.com/rdp/digits-download

Getting Started

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/GettingStarted.md

Installation Instructions

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md

https://github.com/NVIDIA/DIGITS/blob/master/docs/UbuntuInstall.md

Up to DIGITS 2.0, DIGITS was installed with an installer. From DIGITS 3.0 onward, you instead register the NVIDIA CUDA 7.5 repository and the Machine Learning repository and install from them.

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md#repository-access

https://github.com/NVIDIA/DIGITS/releases/tag/v3.0.0

DIGITS™ 4　(2016/08/10)

DIGITS™ v4.0.0 for
Ubuntu 14.04 LTS (～2019/04)

DIGITS v4.0.0 was released on August 10, 2016.

NVIDIA CUDA Repository for Linux

http://developer.download.nvidia.com/compute/cuda/repos/

※ It supports only up to Ubuntu 15.04 and does not yet support Ubuntu 16.04.
※ It also supports CUDA 7.5 but does not yet support CUDA 8.0.

Building DIGITS from source

https://github.com/NVIDIA/DIGITS/blob/digits-4.0/docs/GettingStarted.md

DIGITS™ v4.0.0 for Ubuntu 14.04 LTS – Repository Access

https://github.com/NVIDIA/DIGITS/blob/digits-4.0/docs/UbuntuInstall.md#repository-access

Release Notes

https://github.com/NVIDIA/DIGITS/releases/tag/v4.0.0

Bugfixes
•Made device_query compatible with CUDA 8.0 (#890)

DIGITS and Ubuntu 16.04 LTS

Source code (zip)
Source code (tar.gz)

Building DIGITS from source code
Download the source from Git
# example location - can be customized

$ DIGITS_HOME=~/digits
$ git clone https://github.com/NVIDIA/DIGITS.git $DIGITS_HOME

Build Caffe from source (required)
Build Torch7 from source (suggested)
(Without it, DIGITS often stops with errors.)
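Before building DIGITS from source, the prerequisites listed above (the DIGITS clone, a Caffe build and, optionally, Torch7) can be checked with a small script. This is a sketch under assumptions: the environment variable names `CAFFE_ROOT` and `TORCH_ROOT` and the file `requirements.txt` are taken from the DIGITS build documents of this era, so verify them against the version you check out. `check_digits_build_env` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a DIGITS source build.
# Assumes DIGITS_HOME points at the git clone, CAFFE_ROOT at the Caffe build,
# and (optionally) TORCH_ROOT at the Torch7 installation.
check_digits_build_env() {
    result=0
    if [ ! -f "${DIGITS_HOME:-}/requirements.txt" ]; then
        echo "DIGITS_HOME does not contain requirements.txt" >&2
        result=1
    fi
    if [ -z "${CAFFE_ROOT:-}" ] || [ ! -d "$CAFFE_ROOT" ]; then
        echo "CAFFE_ROOT is not set to an existing directory (Caffe is required)" >&2
        result=1
    fi
    if [ -z "${TORCH_ROOT:-}" ] || [ ! -d "$TORCH_ROOT" ]; then
        # Torch7 is optional, so this is only a warning and does not fail the check.
        echo "warning: TORCH_ROOT is not set (Torch7 is recommended)" >&2
    fi
    return "$result"
}
```

If the check passes, the Python dependencies can then be installed with `sudo pip install -r $DIGITS_HOME/requirements.txt`.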

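※ We have confirmed that, when compiled from source, DIGITS, Caffe and Torch also run on Linux Mint 18, an Ubuntu-based distribution. However, some scripts check whether the system is Ubuntu through the $DISTRO environment variable, and those checks need to be modified.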
==> Only Ubuntu, elementary OS, Fedora, Archlinux and CentOS distributions are supported.
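One way to make a `$DISTRO`-style Ubuntu check accept derivatives such as Linux Mint is to read `ID` and `ID_LIKE` from `/etc/os-release`; on Linux Mint, `ID_LIKE` lists `ubuntu`. The sketch below is not the fix used in the DIGITS, Caffe or Torch scripts. `distro_family` is a hypothetical helper whose output could be compared with `ubuntu` wherever a script tests the distribution name.

```shell
#!/bin/sh
# Print "ubuntu" when an os-release file describes Ubuntu or a distribution derived from it
# (for example Linux Mint, whose ID_LIKE contains "ubuntu"); otherwise print the raw ID.
distro_family() {
    osrel=${1:-/etc/os-release}
    id=$(sed -n 's/^ID=//p' "$osrel" | tr -d '"')
    like=$(sed -n 's/^ID_LIKE=//p' "$osrel" | tr -d '"')
    case " $id $like " in
        *" ubuntu "*) echo ubuntu ;;
        *) echo "$id" ;;
    esac
}
```

A patched check could then read `[ "$(distro_family)" = ubuntu ]` instead of comparing the distribution name directly.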

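On CentOS 7.2-1511 you can also install from source using yum, as with Ubuntu 16.04, but the various scripts stop partway through even more often than on Linux Mint 18. Sometimes they stop with errors such as missing header files.
If you install on a distribution that is not officially supported, assume it will not work as-is and proceed at your own risk.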

DIGITS™ 5 (2017/01/01)

DIGITS™ v5 for
Ubuntu 14.04 LTS (EOL 2019/04)
Ubuntu 16.04 LTS (EOL 2021/04)

The latest version, DIGITS v5, was released on February 1, 2017.

NVIDIA CUDA Repository for Linux

http://developer.download.nvidia.com/compute/cuda/repos/

※ This is the first version to support Ubuntu 16.04.
※ It is also the first version to support CUDA 8.0.

Building DIGITS from source

https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/GettingStarted.md

DIGITS™ v5 for Ubuntu 14.04 LTS – Repository Access

https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/UbuntuInstall.md#repository-access

Release Notes

https://github.com/NVIDIA/DIGITS/releases/tag/v5.0.0

Bugfixes
•Made device_query compatible with CUDA 8.0 (#890)

DIGITS and Ubuntu 16.04 LTS

Source code (zip)
Source code (tar.gz)

Building DIGITS from source code
Download the source from Git
# example location - can be customized

$ DIGITS_HOME=~/digits
$ git clone https://github.com/NVIDIA/DIGITS.git $DIGITS_HOME

Build Caffe from source (required)
Build Torch7 from source (suggested)

■ Starting the DIGITS server
Development mode

$ DIGITS_HOME=~/digits
$ cd $DIGITS_HOME
$ ./digits-devserver
2016-09-15 09:24:00 [INFO ] Loaded 0 jobs.
  ___ ___ ___ ___ _____ ___
|   \_ _/ __|_ _|_   _/ __|
| |) | | (_ || |  | | \__ \
|___/___\___|___| |_| |___/ 4.1-dev

URL = http://localhost:5000 (default port=5000)

Production mode

$ ./digits-server

URL = http://localhost:34448 (default port=34448)
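When DIGITS is started from a script, it can help to wait until the web server answers before opening a browser. A minimal sketch, assuming curl is installed; `wait_for_digits` is a hypothetical helper, and the URLs use the default ports shown above (5000 for `digits-devserver`, 34448 for `digits-server`).

```shell
#!/bin/sh
# Succeed once an HTTP server answers at URL; give up after TRIES attempts, one second apart.
# Any HTTP response counts as "up", because curl's --fail option is deliberately not used.
wait_for_digits() {
    url=${1:-http://localhost:5000}
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl --silent --max-time 2 --output /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, start `./digits-devserver &` and then run `wait_for_digits http://localhost:5000 && echo "DIGITS is ready"`.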

DIGITS user group

https://groups.google.com/forum/#!forum/digits-users


NVIDIA^® cuDNN
CUDA Deep Learning Neural Network library

cuDNN

cuDNN is a library for Deep Neural Network (DNN) development.

Accelerate Machine Learning with the cuDNN Deep Neural Network Library

https://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/

Deep Learning for Computer Vision with Caffe and cuDNN

https://devblogs.nvidia.com/parallelforall/deep-learning-computer-vision-caffe-cudnn/

cuDNN

Download

To download the cuDNN library for each version, the "cuDNN User Guide", the "cuDNN Install Guide" and other materials, you must first register for the NVIDIA® Accelerated Computing Developer Program.

Download cuDNN

https://developer.nvidia.com/rdp/cudnn-download

Download cuDNN v7.0.3 (Sept. 28, 2017), for CUDA 9.0
Download cuDNN v7.0.3 (Sept. 28, 2017), for CUDA 8.0

Download (login required)
cuDNN v7.0.3 Library for Linux
cuDNN v7.0.3 Library for Windows 7
cuDNN v7.0.3 Library for Windows 10
cuDNN v7.0.3 Library for OSX

Download cuDNN v5 (May 27, 2016),
for CUDA 8.0

Download cuDNN v4 (Feb 10, 2016),
for CUDA 7.0 and later.

Download cuDNN v5 (May 12, 2016),
for CUDA 7.5

Download cuDNN v5.1 (August 10, 2016),
for CUDA 8.0

■ cuDNN installation steps

$ tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ ls -al cuda/include
drwxrwxr-x 3 abc abc 4096 9月 15 19:04 .
drwxrwxr-x 4 abc abc 4096 9月 15 18:37 ..
-r--r--r-- 1 abc abc 99657 7月 27 14:44 cudnn.h
drwxrwxr-x 2 abc abc 4096 9月 15 18:37 PaxHeaders.12755
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ ls -al cuda/lib64
drwxrwxr-x 3 abc abc 4096 9月 15 19:04 .
drwxrwxr-x 4 abc abc 4096 9月 15 18:37 ..
lrwxrwxrwx 1 abc abc 13 7月 27 14:55 libcudnn.so -> libcudnn.so.5
lrwxrwxrwx 1 abc abc 17 7月 27 14:55 libcudnn.so.5 -> libcudnn.so.5.1.5
-rwxrwxr-x 1 abc abc 79337624 7月 27 14:53 libcudnn.so.5.1.5
-rw-rw-r-- 1 abc abc 69756172 7月 27 14:53 libcudnn_static.a
drwxrwxr-x 2 abc abc 4096 9月 15 18:37 PaxHeaders.12755
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
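After copying the files, the installed version can be confirmed from the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros defined in `cudnn.h`. A minimal sketch with a hypothetical `cudnn_version` helper; it assumes the header layout of the cuDNN releases listed on this page, where these macros live in `cudnn.h`.

```shell
#!/bin/sh
# Print MAJOR.MINOR.PATCHLEVEL from the #define lines in a cuDNN header.
# Exits non-zero if CUDNN_MAJOR is not found.
cudnn_version() {
    header=${1:-/usr/local/cuda/include/cudnn.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}
```

With the v5.1 files shown above, the printed version should match the library file name `libcudnn.so.5.1.5`.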

Download cuDNN v6.0 (March 23, 2017),
for CUDA 8.0

-------------------------------------------------------
Available documents and libraries (file sizes in bytes)
-------------------------------------------------------
1,363 cudnn_install.txt
2,164,585 CUDNN_Library_User_Guide.pdf

201,237,786 cudnn-8.0-linux-ppc64le-v6.0.tgz
201,123,192 cudnn-8.0-linux-x64-v6.0.tgz
199,371,991 cudnn-8.0-osx-x64-v6.0.tgz
106,498,131 cudnn-8.0-windows10-x64-v6.0.zip
106,498,663 cudnn-8.0-windows7-x64-v6.0.zip
59,878,988 libcudnn6-dev_6.0.20-1+cuda8.0_amd64.deb
59,822,252 libcudnn6-dev_6.0.20-1+cuda8.0_ppc64el.deb
4,577,756 libcudnn6-doc_6.0.20-1+cuda8.0_amd64.deb
6,566,942 libcudnn6-doc_6.0.20-1+cuda8.0_ppc64el.deb
68,493,506 libcudnn6_6.0.20-1+cuda8.0_amd64.deb
68,447,948 libcudnn6_6.0.20-1+cuda8.0_ppc64el.deb
-------------------------------------------------------

■ cuDNN installation steps

$ tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
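Instead of the tar archive, the .deb packages in the list above can be installed. The sketch below installs them in the order runtime, developer, documentation, which is the order the cuDNN installation guide uses for its Debian packages. It assumes the three files for your architecture are in the current directory; `run` and `install_cudnn6_debs` are hypothetical helpers, and `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Install libcudnn6 (runtime), libcudnn6-dev (developer files) and libcudnn6-doc,
# using the 6.0.20-1+cuda8.0 file names listed above. $1: amd64 or ppc64el.
install_cudnn6_debs() {
    arch=${1:-amd64}
    case $arch in
        amd64|ppc64el) ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    ver=6.0.20-1+cuda8.0
    for pkg in libcudnn6 libcudnn6-dev libcudnn6-doc; do
        run sudo dpkg -i "${pkg}_${ver}_${arch}.deb" || return 1
    done
}
```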

Download cuDNN v5.1 (August 10, 2016),
for CUDA 7.5

Download cuDNN v6.0 (March 23, 2017),
for CUDA 7.5

Deep Learning Frameworks https://developer.nvidia.com/deep-learning-frameworks
Caffe	Deep learning framework by the BVLC (Berkeley Vision and Learning Center) http://caffe.berkeleyvision.org/
NVIDIA^® Caffe https://developer.nvidia.com/taxonomy/term/690
http://caffe.berkeleyvision.org/installation.html
Caffe is a deep learning framework made with expression, speed, and modularity in mind. Caffe is developed by the Berkeley Vision and Learning Center (BVLC), as well as community contributors and is popular for computer vision.
Caffe supports cuDNN v5 for GPU acceleration. (cuDNN v6 is not yet officially supported.)
Supported interfaces: C, C++, Python (choose Python 2 or 3 at install time), MATLAB, Command line interface
Learning Resources
•Deep learning course: Getting Started with the Caffe Framework
•Blog: Deep Learning for Computer Vision with Caffe and cuDNN

■ For Ubuntu 12.04
Download ZIP https://github.com/BVLC/caffe/
Install git
$ cd
$ sudo apt-get install git
Install caffe
$ git clone https://github.com/BVLC/caffe.git
Compilation with Make
$ cd caffe
$ cp Makefile.config.example Makefile.config
To accelerate Caffe with cuDNN, remove the leading # on line 5 of Makefile.config:
# USE_CUDNN := 1
↓
USE_CUDNN := 1
● Compilation
Compile with the make command:
$ make all
On a 6-core/12-thread high-end desktop Intel^® Core™ i7 processor (e.g. i7-6800K, i7-6850K), specifying -j12 lets make use every hardware thread, so compilation runs as fast as the processor allows:
$ make -j12 all
$ make -j12 test
$ make -j12 runtest
If all of the tests above pass, the Caffe environment setup is complete.

On Ubuntu 16.04, when building NVIDIA DIGITS 4.0 / 5.0 from source, you must first build Caffe by following the instructions on this page:
Build Caffe from source using Git
On the latest Japanese edition of Ubuntu 16.04.3, even when compilation manages to succeed, the linker ld terminates abnormally and dumps core (with both the BVLC and the NVIDIA versions). Of the two, the NVIDIA version prints fewer warnings and is the one we recommend. If ld keeps crashing in the Japanese environment, try the English edition.
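The two manual Caffe steps above (uncommenting `USE_CUDNN` in `Makefile.config` and choosing the `-j` value) can be scripted. A minimal sketch, assuming GNU sed and `nproc` from GNU coreutils as on Ubuntu; `enable_cudnn` and `build_caffe` are hypothetical helpers. Using `nproc` instead of a fixed `-j12` picks the number of logical CPUs on whatever machine runs the build.

```shell
#!/bin/sh
# Uncomment the "USE_CUDNN := 1" line of a Caffe Makefile.config in place
# (the same edit as deleting the leading "#" by hand).
enable_cudnn() {
    config=${1:-Makefile.config}
    sed -i 's/^#[[:space:]]*USE_CUDNN := 1/USE_CUDNN := 1/' "$config"
}

# Build Caffe and run its tests with one make job per logical CPU.
build_caffe() {
    njobs=$(nproc)
    make -j"$njobs" all &&
        make -j"$njobs" test &&
        make -j"$njobs" runtest
}
```

Both are meant to be run from the `caffe` directory after `cp Makefile.config.example Makefile.config`.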
Caffe2	Caffe2 is a deep learning framework enabling simple and flexible deep learning. Built on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind, allowing for a more flexible way to organize computation.
Caffe2 supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: C++, Python
Learning Resources: Caffe2 learning resources https://developer.nvidia.com/caffe2
Download Caffe2
Download cuDNN
CNTK	https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine
Microsoft^® The Computational Network Toolkit
The Microsoft Cognitive Toolkit (previously known as CNTK) is a unified deep-learning toolkit from Microsoft Research that makes it easy to train and combine popular model types across multiple GPUs and servers. Microsoft Cognitive Toolkit implements highly efficient CNN and RNN training for speech, image and text data.
Microsoft Cognitive Toolkit supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: Python, C++, C# and Command line interface
Download CNTK
Download cuDNN
TensorFlow	A software library developed by Google for deep learning, artificial intelligence, multilayer neural networks and numerical computation using data flow graphs. It was released as open source in November 2015.
https://www.tensorflow.org/install/#download-and-setup
TensorFlow is a software library for numerical computation using data flow graphs, developed by Google’s Machine Intelligence research organization.
TensorFlow supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: C++, Python
Download TensorFlow
Download cuDNN
Theano	http://deeplearning.net/software/theano/
Theano is a math expression compiler that efficiently defines, optimizes, and evaluates mathematical expressions involving multi-dimensional arrays.
Theano supports cuDNN v5 for GPU acceleration.
Supported interfaces: Python
Learning resources
•Deep learning course: Getting Started with the Theano Framework
https://pypi.python.org/pypi/Theano
Download Theano-0.8.2.tar.gz
Download Theano-0.9.0.tar.gz
Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs. Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy.
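Theano selects the GPU through the `THEANO_FLAGS` environment variable (or a `~/.theanorc` file). Theano 0.8.x uses the original CUDA backend with `device=gpu`, while Theano 0.9.0 recommends the gpuarray backend, selected with `device=cuda` (which additionally requires libgpuarray/pygpu). A sketch; `theano_gpu_flags` is a hypothetical helper and `my_script.py` is a placeholder.

```shell
#!/bin/sh
# Print a THEANO_FLAGS value that selects the GPU for a given Theano version string.
# 0.0.x-0.8.x: old CUDA backend (device=gpu); 0.9 and later: gpuarray backend (device=cuda).
theano_gpu_flags() {
    case $1 in
        0.[0-8]|0.[0-8].*) echo "device=gpu,floatX=float32" ;;
        *)                 echo "device=cuda,floatX=float32" ;;
    esac
}
```

Usage: `THEANO_FLAGS="$(theano_gpu_flags 0.9.0)" python my_script.py`.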
Torch	http://torch.ch/
Torch is a scientific computing framework that offers wide support for machine learning algorithms.
Torch supports cuDNN v5 for GPU acceleration.
Supported interfaces: C, C++, Lua
Learning resources
•Deep learning course: Getting Started with the Torch Framework
•Blog: Understanding Natural Language with Deep Neural Networks Using Torch
https://github.com/soumith/cudnn.torch

■ For Ubuntu 12.04
Download ZIP https://github.com/torch/torch7
torch7-nv has been updated from 0.9.92 to 0.9.98-1+cuda7.5.

On Ubuntu 16.04, when building NVIDIA DIGITS 4.0 / 5.0 from source, it is recommended to build torch7 first by following the instructions on this page. (Without it, DIGITS frequently fails with errors, so in practice you should always install it.)
Build torch from source using Git
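The Torch7 source build recommended above follows the torch/distro procedure published on torch.ch: clone the distro repository with its submodules, install the dependencies with `install-deps`, then run `install.sh`. A sketch; `run` and `build_torch` are hypothetical helpers, the default location `~/torch` is an assumption, and `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Clone torch/distro (with submodules) into $1 and build it.
# The body runs in a subshell so that "cd" does not change the caller's directory.
build_torch() (
    torch_root=${1:-$HOME/torch}
    run git clone https://github.com/torch/distro.git "$torch_root" --recursive &&
        run cd "$torch_root" &&
        run bash install-deps &&
        run ./install.sh
)
```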
MXNet	http://mxnet.readthedocs.io/en/latest/how_to/build.html
MXNet is a deep learning framework designed for both efficiency and flexibility that allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.
MXNet supports cuDNN v5 for GPU acceleration.
Supported interfaces: Python, R, C++, Julia
Download MXNet
Download cuDNN
Chainer	A deep learning framework developed in Japan by Preferred Networks / Preferred Infrastructure, providing a library for implementing neural networks. It is designed on the define-by-run principle, so networks can be modified at runtime and arbitrary control-flow statements can be used.
http://chainer.org/
A Powerful, Flexible, and Intuitive Framework of Neural Networks
Chainer runs on Python; CUDA is required for GPU execution.
https://github.com/pfnet/chainer
$ sudo pip install chainer
Chainer is a deep learning framework that’s designed on the principle of define-by-run. Unlike frameworks that use the define-and-run approach, Chainer lets you modify networks during runtime, allowing you to use arbitrary control flow statements.
Chainer supports cuDNN v5.1 for GPU acceleration.
Supported interfaces: Python
Download Chainer
Download cuDNN
BIDMach	Welcome to the BID Data Project! Here you will find resources for the fastest Big Data tools on the Web. See our Benchmarks on github. BIDMach running on a single GPU-equipped host holds the records for many common machine learning problems, on single nodes or clusters.
Try It! BIDMach is an interactive environment designed to make it extremely easy to build and use machine learning models. BIDMach runs on Linux, Windows 7&8, and Mac OS X, and we have a pre-loaded Amazon EC2 instance. See the instructions in the Download Section.
Develop with it. BIDMach includes core classes that take care of managing data sources, optimization and distributing data over CPUs or GPUs. It’s very easy to write your own models by generalizing from the models already included in the Toolkit.
Download http://bid2.berkeley.edu/bid-data-project/download/
Download Source https://github.com/BIDData/BIDMach
The BID Data Suite is freely available under a BSD-style license. Complete bundles for Linux, Windows and Mac are available here. BIDMat is a self-contained matrix toolkit. BIDMach uses BIDMat as a library but includes it in the bundle, so you don't need to download BIDMat as well. The basic bundles require Java 1.7, and for GPU use, CUDA 7.0. If you want to use the IScala notebooks, you need a recent version of IPython installed.
The current release is Version 1.0.3.
•BIDMach 1.0.3 for 64-bit Linux
•BIDMach 1.0.3 for 64-bit Windows
•BIDMach 1.0.3 for 64-bit Mac OSX
•BIDMat 1.0.3 for 64-bit Linux
•BIDMat 1.0.3 for 64-bit Windows
•BIDMat 1.0.3 for 64-bit Mac OSX
Full bundles which include the CUDA 7.0 runtime:
•BIDMach 1.0.3 for 64-bit Linux
•BIDMat 1.0.3 for 64-bit Linux
To use CUDA 7.5 or 8.0, you need to get the source from GitHub and compile it yourself.
Source code is available on Github here: https://github.com/BIDData/BIDMach
•BIDMat
•BIDMach
Keras	Keras is a high-level neural network library written in Python that can run on top of TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.
Keras is a good fit if you need a deep learning library that:
・allows easy and fast prototyping (through total modularity, minimalism, and extensibility)
・supports both CNNs and RNNs, as well as combinations of the two
・supports arbitrary connectivity schemes (including multi-input and multi-output training)
・runs seamlessly on CPU and GPU
Keras is compatible with Python 2.7-3.5.
https://keras.io/ (English)
https://keras.io/ja/ (Japanese)
GitHub fchollet Keras https://github.com/fchollet/keras
Download Source from GitHub https://github.com/fchollet/keras/archive/master.zip
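Keras reads its backend (TensorFlow or Theano) from the `"backend"` field of `~/.keras/keras.json`, and the `KERAS_BACKEND` environment variable overrides it for a single run. The sketch below edits that field; `set_keras_backend` is a hypothetical helper, and it assumes GNU sed and a `keras.json` that Keras has already created.

```shell
#!/bin/sh
# Set the "backend" field of a Keras config file to tensorflow or theano.
# $1: backend name, $2: config path (default ~/.keras/keras.json).
set_keras_backend() {
    backend=$1
    config=${2:-$HOME/.keras/keras.json}
    case $backend in
        tensorflow|theano) ;;
        *) echo "unsupported backend: $backend" >&2; return 1 ;;
    esac
    sed -i 's/"backend"[[:space:]]*:[[:space:]]*"[^"]*"/"backend": "'"$backend"'"/' "$config"
}
```

For a one-off run without editing the file, `KERAS_BACKEND=theano python my_model.py` can be used instead (`my_model.py` is a placeholder).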

Another Frameworks

Besides the deep learning frameworks above, there are others, for example:

Brainstorm	https://github.com/IDSIA/brainstorm Combining lessons from previous projects with new design elements, and written entirely in Python, Brainstorm has been designed to work on multiple platforms with multiple computing backends.
Kaldi	https://github.com/kaldi-asr/kaldi Kaldi Speech Recognition Toolkit
MatConvNet	http://www.vlfeat.org/matconvnet/ MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications.
MaxDNN	https://github.com/eBay/maxDNN High Efficiency Convolution Kernel for NVIDIA Maxwell GPU Architecture.
Deeplearning4j	http://deeplearning4j.org/ Deep Learning for Java. First commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Skymind is its commercial support arm.
Lasagne (Theano)	https://github.com/Lasagne/Lasagne Lightweight library to build and train neural networks in Theano.
Marvin	http://marvin.is/ A minimalist GPU-only N-dimensional ConvNet framework.
Leaf	https://github.com/autumnai/leaf Open Machine Intelligence Framework for Hackers. (GPU/CPU) http://autumnai.com/leaf/book



Self-Paced Courses for Deep Learning	https://developer.nvidia.com/deep-learning-courses
•Introduction to Deep Learning
•Getting Started with DIGITS for Image Classification
•Getting Started with the Caffe Framework
•Getting Started with the Theano Framework
•Getting Started with the Torch Framework


	Avast Software s.r.o. has changed its company name to Gen™ Digital Inc.
	Avast^® Partner: Avast^® Small Business Security, Avast^® Business Antivirus for Linux^®, Avast^® Business CloudCare™
	Intel^® software development product reseller (Intel^® Software Resellers)
	Tax Exemption Designated Store (consumption-tax-exempt store for foreign diplomatic missions)


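Arcbrain Co., Ltd.
2-47-1 Sasazuka, Shibuya-ku, Tokyo 151-0073
TEL 03-3375-8968
FAX 03-3375-8767 (09:00-18:00, excluding Saturdays, Sundays and public holidays)
For inquiries and quotation requests, please contact us here.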

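Intel®, the Intel® logo, Atom™, Core™, Xeon®, Phi™ and Pentium® are trademarks of Intel® Corporation in the U.S. and other countries. NVIDIA®, the NVIDIA® logo, GeForce and Quadro are registered trademarks of NVIDIA® Corporation in the U.S. AMD®, the AMD® Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft® (and other trademarks and registered trademark names) are registered trademarks or trademarks of Microsoft® Corporation in the U.S. and other countries. The official name of Windows® is Microsoft® Windows® Operating System. Linux® is a registered trademark of Linus Torvalds in the U.S. and other countries. RED HAT and the Shadowman logo are trademarks of Red Hat, Inc., registered in the U.S. and other countries. The CentOS name and logo are trademarks or registered trademarks of CentOS ltd. Ubuntu is a registered trademark of Canonical Ltd. Linux Mint is a trademark of the Linux Mark Institute. IMSL® is a trademark of Rogue Wave Software, Inc. in the U.S. and other countries. Avast™ is a trademark of Avast Software. AVG® is a registered trademark of AVG Technologies. Python® is a registered trademark of the PSF. All other company and product names mentioned are registered trademarks or trademarks of their respective owners.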
