TensorFlow now officially supports Windows with GPU, so I tried it out
Google has announced that TensorFlow now officially supports Windows, including GPU acceleration.
https://developers.googleblog.com/2016/11/tensorflow-0-12-adds-support-for-windows.html
Setup environment
- OS: Windows 10 Pro
- GPU: NVIDIA GeForce GTX 960
I'll try TensorFlow in this environment. Wherever possible, I installed things on the D: drive.
Installing CUDA and cuDNN
CUDA Toolkit 8.0
https://developer.nvidia.com/cuda-downloads
I simply ran the installer as usual.
cuDNN v5.1
I registered a developer account, agreed to the terms of use, and downloaded it.
https://developer.nvidia.com/rdp/cudnn-download
Download "cuDNN v5.1 Library for Windows 10" and copy the contents of the cuda folder inside the zip into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0.
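To double-check the copy, one could confirm that the DLLs TensorFlow loads later are actually in place. The following is only a minimal sketch: it assumes the DLLs sit in the bin subfolder of the toolkit directory, the DLL names are taken from the load messages that appear later in this post, and find_missing_dlls is a hypothetical helper name, not part of any library.

```python
import os

# DLL names taken from the "successfully opened CUDA library" messages
# printed by TensorFlow later in this post. nvcuda.dll is left out because
# it is installed with the display driver rather than the toolkit.
EXPECTED_DLLS = [
    "cublas64_80.dll",
    "cudnn64_5.dll",
    "cufft64_80.dll",
    "curand64_80.dll",
]

# Assumption: the DLLs live in the "bin" subfolder of the CUDA install path.
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin"


def find_missing_dlls(bin_dir, names=EXPECTED_DLLS):
    """Return the names from `names` that are not files inside `bin_dir`."""
    return [n for n in names if not os.path.isfile(os.path.join(bin_dir, n))]


if __name__ == "__main__":
    missing = find_missing_dlls(CUDA_BIN)
    print("missing:", missing if missing else "none")
```

If cudnn64_5.dll shows up as missing, the cuDNN files were probably copied to a different folder level than intended.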
Setting up Python 3.5
On Windows, using Anaconda makes things easier down the road, so I recommend that instead. (Added 2017/01/08)
Details are in http://tilfin.hatenablog.com/entry/2017/01/08/220556.
The download offered on the top page is the 32-bit version, so get "Windows x86-64 executable installer" from https://www.python.org/downloads/windows/ and install that.
Note: I installed it to D:\Python35.
The 3.5 installer also set up the environment variables and pip for me. I upgraded pip from PowerShell as well, just to be safe.
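The TensorFlow wheels installed in this post are tagged cp35-cp35m-win_amd64, so they need a 64-bit CPython 3.5. A quick way to confirm which build ended up on the PATH is sketched below; interpreter_bits is just an illustrative helper name.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python {}.{} ({}-bit)".format(
        sys.version_info[0], sys.version_info[1], interpreter_bits()))
```

If this reports 32-bit, the installer from the top page was probably used instead of the x86-64 one.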
Installing virtualenv
Install the module that creates isolated working folders for a runtime environment.
PS D:\> pip install --upgrade virtualenv
Collecting virtualenv
  Downloading virtualenv-15.1.0-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 646kB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-15.1.0
Installing the TensorFlow packages
From PowerShell, install tensorflow and tensorflow-gpu with pip. https://pypi.python.org/pypi/tensorflow
tensorflow
PS D:\> pip install tensorflow
Processing y:\tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
Collecting six>=1.10.0 (from tensorflow==0.12.0rc0)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting protobuf==3.1.0 (from tensorflow==0.12.0rc0)
  Downloading protobuf-3.1.0-py2.py3-none-any.whl (339kB)
    100% |################################| 348kB 2.2MB/s
Collecting wheel>=0.26 (from tensorflow==0.12.0rc0)
  Using cached wheel-0.29.0-py2.py3-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow==0.12.0rc0)
  Downloading numpy-1.11.2-cp35-none-win_amd64.whl (7.6MB)
    100% |################################| 7.6MB 179kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow==0.12.0rc0)
Installing collected packages: six, protobuf, wheel, numpy, tensorflow
Successfully installed numpy-1.11.2 protobuf-3.1.0 six-1.10.0 tensorflow-0.12.0rc0 wheel-0.29.0
tensorflow-gpu
PS D:\> pip install tensorflow-gpu
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl (32.5MB)
    100% |################################| 32.5MB 40kB/s
Requirement already satisfied: wheel>=0.26 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: protobuf==3.1.0 in d:\python35\lib\site-packages (from tensorflow-gpu)
Requirement already satisfied: setuptools in d:\python35\lib\site-packages (from protobuf==3.1.0->tensorflow-gpu)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-0.12.0rc0
Getting started
Use virtualenv to create a working folder at D:\tensorflow.
PS D:\> virtualenv --system-site-packages D:\tensorflow
Using base prefix 'd:\\python35'
New python executable in D:\tensorflow\Scripts\python.exe
Installing setuptools, pip, wheel...done.
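This post does not show an activation step, so it can be useful to check which interpreter a later python command actually resolves to. The sketch below is one way to tell whether the running interpreter belongs to a virtual environment; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Classic virtualenv releases (such as 15.x) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib venv module and newer virtualenv releases instead make
    # sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtualenv:", in_virtualenv())
```

Because the environment was created with --system-site-packages, the packages installed into D:\Python35 should be importable either way, so a False result here would not by itself prevent the later steps from running.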
Preparing the sample models
Somewhere convenient, run
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
I normally run Linux in VirtualBox and share files over SMB, so I cloned it there.
Copy tensorflow/tensorflow/models so that it ends up as D:\tensorflow\models.
Trying MNIST
Let's try the handwritten digit recognition program.
PS D:\> cd tensorflow\models\image\mnist
PS D:\tensorflow\models\image\mnist> python convolutional.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Initialized!
Step 0 (epoch 0.00), 50.9 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 12.1 ms
Minibatch loss: 3.226, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.3%
Step 200 (epoch 0.23), 12.0 ms
Minibatch loss: 3.404, learning rate: 0.010000
Minibatch error: 10.9%
(omitted)
Minibatch loss: 1.609, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%
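It finished fairly quickly, and the log shows a TensorFlow device being created on /gpu:0 for the GeForce GTX 960, so the GPU appears to be in use. That confirms everything runs; next I'll try ImageNet, which gives more intuitive results.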
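Rather than judging by speed alone, the log itself can be scanned for the device-creation message. The sketch below assumes the log has been captured into a string without the terminal's line wrapping; the pattern is modeled on the message shown in this post, and gpu_devices_in_log is a hypothetical helper name.

```python
import re

# Pattern modeled on the "Creating TensorFlow device" message in the log above.
DEVICE_RE = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> \(device: \d+, name: ([^,]+),"
)


def gpu_devices_in_log(log_text):
    """Return (device, gpu_name) pairs for each GPU device the log reports."""
    return DEVICE_RE.findall(log_text)


# One line copied from the MNIST run in this post.
sample = (
    r"I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow"
    r"\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> "
    "(device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)"
)

print(gpu_devices_in_log(sample))
```

An empty list would suggest that only the CPU build was picked up, although the exact wording of the message may differ between TensorFlow versions.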
Trying ImageNet
This is the ImageNet sample, which analyzes an image and guesses what it depicts.
PS D:\> cd \tensorflow\models\image\imagenet
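First, some preparation: run python .\classify_image.py with no arguments. As the output shows, this first run downloads the Inception model (inception-2015-12-05.tgz).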
PS D:\tensorflow\models\image\imagenet> python .\classify_image.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
>> Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89233)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00859)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00264)
custard apple (score = 0.00141)
earthstar (score = 0.00107)
I put a photo of Mt. Fuji at M:\fuji.jpg.
Then classify it with
python .\classify_image.py --image_file M:\fuji.jpg
PS D:\tensorflow\models\image\imagenet> python .\classify_image.py --image_file M:\fuji.jpg
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.64GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)
lakeside, lakeshore (score = 0.00130)
geyser (score = 0.00077)
volcano (score = 0.91087): it was recognized as a volcano. For the record, the photo showed a snow-capped Mt. Fuji.
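If the predictions need to be used programmatically rather than read off the console, the "label (score = ...)" lines can be parsed. This is only a sketch that works on the printed text, assuming the output format shown in this post; parse_predictions is a hypothetical helper and not part of classify_image.py.

```python
import re

# Matches lines such as "volcano (score = 0.91087)".
SCORE_RE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_predictions(output_text):
    """Return (label, score) tuples for every prediction line in the output."""
    results = []
    for line in output_text.splitlines():
        m = SCORE_RE.match(line.strip())
        if m:
            results.append((m.group("label"), float(m.group("score"))))
    return results


# Prediction lines copied from the Mt. Fuji run in this post.
sample = """\
volcano (score = 0.91087)
fire screen, fireguard (score = 0.00192)
alp (score = 0.00162)
lakeside, lakeshore (score = 0.00130)
geyser (score = 0.00077)
"""

predictions = parse_predictions(sample)
best = max(predictions, key=lambda p: p[1])
print(best)
```

Log lines starting with I, W, or E do not match the pattern, so the full console output can be passed in as-is.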
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
This warning was printed, so more GPU memory would probably help.
In any case, it all worked without my getting stuck anywhere in particular, so give it a try yourselves.