TensorFlow now officially supports Windows with GPU acceleration, so I tried it out

Google has announced that TensorFlow now officially supports Windows, including GPU acceleration.
https://developers.googleblog.com/2016/11/tensorflow-0-12-adds-support-for-windows.html

Setup environment

OS) Windows 10 Pro
GPU) NVIDIA GeForce GTX 960

I'll try TensorFlow on this machine. Wherever possible, I install everything on the D drive.

Installing CUDA and cuDNN

CUDA Toolkit 8.0

https://developer.nvidia.com/cuda-downloads

Operating System: Windows
Architecture: x86_64
Version: 10
Installer Type: exe (network)

I simply ran the installer as usual.

cuDNN v5.1

I registered an NVIDIA developer account, accepted the license terms, and downloaded it.

From https://developer.nvidia.com/rdp/cudnn-download, download "cuDNN v5.1 Library for Windows 10", then copy the contents of the cuda folder inside the zip into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0.
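If you would rather script this copy than drag folders around in Explorer, here is a minimal Python sketch. It assumes the zip has already been extracted; the extraction path D:\cudnn\cuda is hypothetical, while CUDA_HOME is the install path from the step above. It merges the subfolders into the existing CUDA tree instead of replacing them. Writing under C:\Program Files normally requires an elevated (administrator) prompt.

```python
import os
import shutil

# CUDA 8.0 install path used in this article.
CUDA_HOME = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"

# Hypothetical: where the "cuda" folder from the cuDNN v5.1 zip was extracted.
CUDNN_EXTRACTED = r"D:\cudnn\cuda"


def merge_copy(src_root, dst_root):
    """Copy every file under src_root into dst_root, keeping relative paths.

    Existing directories (e.g. bin, include, lib/x64) are merged rather than
    replaced; files with the same name are overwritten. Returns the list of
    destination paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            dst = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(dirpath, name), dst)
            written.append(dst)
    return written


if __name__ == "__main__":
    if os.path.isdir(CUDNN_EXTRACTED):
        for path in merge_copy(CUDNN_EXTRACTED, CUDA_HOME):
            print("copied", path)
    else:
        print("cuDNN folder not found:", CUDNN_EXTRACTED)
```

Merging file by file (rather than shutil.copytree onto the CUDA folder) keeps the DLLs and headers the CUDA installer already put there.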

Setting up Python 3.5

(Added 2017/01/08) On Windows, using Anaconda makes things easier later on, so I recommend it instead. I describe that setup at http://tilfin.hatenablog.com/entry/2017/01/08/220556.

The download button on the python.org top page gives you the 32-bit version, so instead get "Download Windows x86-64 executable installer" from https://www.python.org/downloads/windows/ and install it. Note: I installed it to D:\Python35.
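To confirm that the interpreter you ended up with really is the 64-bit one, a quick check like the following can be run with python. The TensorFlow 0.12.0rc0 wheels installed later in this article are tagged cp35-cp35m-win_amd64, which implies 64-bit CPython 3.5. The helper names are hypothetical, and the check looks only at the version and pointer size, not at the interpreter implementation.

```python
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def matches_win_amd64_cp35(version_info, bits):
    """True if the version is 3.5 and the build is 64-bit (assumes CPython)."""
    return tuple(version_info[:2]) == (3, 5) and bits == 64


if __name__ == "__main__":
    bits = interpreter_bits()
    print("Python {}.{} ({}-bit)".format(sys.version_info[0], sys.version_info[1], bits))
    if not matches_win_amd64_cp35(sys.version_info, bits):
        print("This interpreter does not match the cp35 win_amd64 TensorFlow wheels.")
```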

The 3.5 installer also set up the environment variables and installed pip at the same time. Just to be safe, I also upgraded pip from PowerShell.

Installing virtualenv

Install virtualenv, a module that creates a separate working folder for the runtime environment.

PS D:\> pip install --upgrade virtualenv
Collecting virtualenv
  Downloading virtualenv-15.1.0-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 646kB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-15.1.0

Installing the TensorFlow packages

From PowerShell, install tensorflow and tensorflow-gpu with pip. https://pypi.python.org/pypi/tensorflow

tensorflow

PS D:\> pip install tensorflow
Processing y:\tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
Collecting six>=1.10.0 (from tensorflow==0.12.0rc0)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting protobuf==3.1.0 (from tensorflow==0.12.0rc0)
  Downloading protobuf-3.1.0-py2.py3-none-any.whl (339kB)
    100% |################################| 348kB 2.2MB/s
Collecting wheel>=0.26 (from tensorflow==0.12.0rc0)
  Using cached wheel-0.29.0-py2.py3-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow==0.12.0rc0)
  Downloading numpy-1.11.2-cp35-none-win_amd64.whl (7.6MB)
    100% |################################| 7.6MB 179kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow==0.12.0rc0)
Installing collected packages: six, protobuf, wheel, numpy, tensorflow
Successfully installed numpy-1.11.2 protobuf-3.1.0 six-1.10.0 tensorflow-0.12.0rc0 wheel-0.29.0

tensorflow-gpu

PS D:\> pip install tensorflow-gpu
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl (32.5MB)
    100% |################################| 32.5MB 40kB/s
Requirement already satisfied: wheel>=0.26 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: protobuf==3.1.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow-gpu)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-0.12.0rc0
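Before running anything, you may want to confirm that the CUDA and cuDNN DLLs the GPU build loads at startup can actually be found. The DLL names below are taken from the dso_loader messages in the logs later in this article. As a simplifying assumption, the sketch scans only the directories listed in PATH, in order; the real Windows DLL search order also checks other locations such as the application and system directories.

```python
import os

# DLL names as reported by TensorFlow 0.12's dso_loader at startup.
REQUIRED_DLLS = [
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "nvcuda.dll",
    "curand64_80.dll",
]


def find_dlls(names, search_path=None):
    """Map each DLL name to the first PATH directory containing it, or None.

    Simplification: only the directories in PATH are scanned, in order.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return {
        name: next((d for d in dirs if os.path.isfile(os.path.join(d, name))), None)
        for name in names
    }


if __name__ == "__main__":
    for name, where in find_dlls(REQUIRED_DLLS).items():
        print("{:<16} {}".format(name, where or "NOT FOUND on PATH"))
```

A DLL reported as not found here is a hint to check PATH (or the cuDNN copy step) before digging into TensorFlow itself.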

Running it

Create a working folder at D:\tensorflow with virtualenv.

PS D:\> virtualenv --system-site-packages D:\tensorflow
Using base prefix 'd:\\python35'
New python executable in D:\tensorflow\Scripts\python.exe
Installing setuptools, pip, wheel...done.
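To check which interpreter a script actually runs under, a small test like this can help. It is a sketch: legacy virtualenv (the 15.x line installed here) marks its environments with sys.real_prefix, while the standard venv module instead makes sys.base_prefix differ from sys.prefix, so it checks both. Run with D:\tensorflow\Scripts\python.exe, it should report True; run with D:\Python35's python, False.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment.

    Legacy virtualenv (< 20) sets sys.real_prefix; the stdlib venv module
    leaves sys.base_prefix pointing at the base install instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside virtualenv:", in_virtualenv())
```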

Preparing the sample programs

Run git clone --recurse-submodules https://github.com/tensorflow/tensorflow somewhere convenient. I normally run Linux in VirtualBox and share files with it over SMB, so I cloned it there. Then copy tensorflow/tensorflow/models so that it ends up at D:\tensorflow\models.
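The copy can also be done from Python. In this sketch the UNC path of the clone is hypothetical (use wherever your share actually is); D:\tensorflow\models and the two sample scripts come from this article. shutil.copytree requires that the destination does not exist yet. The function returns the sample scripts that are missing after the copy, so an empty list means everything needed below is in place.

```python
import os
import shutil

# Hypothetical UNC path of the clone on the Linux VM's SMB share.
CLONE_MODELS = r"\\linux-vm\share\tensorflow\tensorflow\models"
TARGET_MODELS = r"D:\tensorflow\models"

# Sample scripts run in this article, relative to the models folder.
SAMPLES = [
    os.path.join("image", "mnist", "convolutional.py"),
    os.path.join("image", "imagenet", "classify_image.py"),
]


def install_models(src, dst):
    """Copy the models tree to dst (which must not exist yet).

    Returns the sample scripts from SAMPLES that are missing after the copy.
    """
    shutil.copytree(src, dst)
    return [s for s in SAMPLES if not os.path.isfile(os.path.join(dst, s))]
```

Usage: install_models(CLONE_MODELS, TARGET_MODELS) copies the tree and returns [] when both samples are present.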

Trying MNIST

Let's try the handwritten-digit recognition program.

PS D:\> cd tensorflow\models\image\mnist

PS D:\tensorflow\models\image\mnist> python convolutional.py

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
Initialized!
Step 0 (epoch 0.00), 50.9 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 12.1 ms
Minibatch loss: 3.226, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.3%
Step 200 (epoch 0.23), 12.0 ms
Minibatch loss: 3.404, learning rate: 0.010000
Minibatch error: 10.9%
(omitted)
Minibatch loss: 1.609, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%

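It finished fairly quickly, at roughly 12 ms per step, so the GPU is presumably doing its job. Having confirmed that everything works, next I'll try ImageNet, whose results are easier to appreciate.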
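"Fairly quickly" can be made a bit more concrete by averaging the times on the Step lines of the output. A minimal sketch, assuming you already have the printed output as a string (how you capture it is up to you). It skips step 0, which includes start-up overhead (50.9 ms versus about 12 ms afterwards in the run above), and also picks up the final Test error line. The function name is hypothetical; the line formats are the ones shown in the log.

```python
import re

# Line formats as printed by convolutional.py in the log above.
STEP_RE = re.compile(r"^Step (\d+) \(epoch ([\d.]+)\), ([\d.]+) ms$")
TEST_ERR_RE = re.compile(r"^Test error: ([\d.]+)%$")


def summarize_mnist_log(text):
    """Return (average ms over Step lines after step 0, test error % or None)."""
    times = []
    test_error = None
    for raw in text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m and int(m.group(1)) > 0:  # step 0 includes warm-up time
            times.append(float(m.group(3)))
        m = TEST_ERR_RE.match(line)
        if m:
            test_error = float(m.group(1))
    avg_ms = sum(times) / len(times) if times else None
    return avg_ms, test_error
```

Comparing this average against a run of the CPU-only tensorflow package would be the straightforward way to confirm the GPU's effect.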

Trying ImageNet

This is the ImageNet example, which analyzes an image and guesses what it shows.

PS D:\> cd \tensorflow\models\image\imagenet

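First, the preparation: run python .\classify_image.py. On the first run it downloads the trained model (inception-2015-12-05.tgz) and classifies the sample image that comes with it, which turns out to be a panda.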

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
>> Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)

I put a photo of Mt. Fuji at M:\fuji.jpg and had it analyzed with python .\classify_image.py --image_file M:\fuji.jpg.

PS D:\tensorflow\models\image\imagenet> python .\classify_image.py --image_file M:\fuji.jpg
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)
lakeside, lakeshore (score = 0.00130)
geyser (score = 0.00077)

volcano (score = 0.91087): it recognized the photo as a volcano. For the record, the photo showed a snow-capped Mt. Fuji.
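If you want to use the result from another script rather than read it by eye, the prediction lines have a regular shape: a comma-separated list of names for one class, followed by the score. This small parser is a sketch that relies only on the printed format seen here; it is not part of classify_image.py, and the function name is hypothetical. Other log lines simply don't match the pattern and are skipped.

```python
import re

# Matches lines such as "fire screen, fireguard (score = 0.00192)".
PRED_RE = re.compile(r"^(?P<labels>.+) \(score = (?P<score>[\d.]+)\)$")


def parse_predictions(text):
    """Return [(first_label, all_labels, score), ...] for each prediction line."""
    preds = []
    for line in text.splitlines():
        m = PRED_RE.match(line.strip())
        if m:
            labels = [s.strip() for s in m.group("labels").split(",")]
            preds.append((labels[0], labels, float(m.group("score"))))
    return preds
```

For the Mt. Fuji run, the first tuple would be ("volcano", ["volcano"], 0.91087).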

W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

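That warning appeared, so more GPU memory would presumably help: the allocator tried to allocate 1.91GiB, while the log reports only 1.64GiB free out of the GTX 960's 2.00GiB.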

Anyway, it all worked without any particular snags, so give it a try yourself.