AWS Deep Learning AMI (Ubuntu 16.04)
Amazon Web Services | Linux/Unix, Ubuntu 16.04 - 64-bit Amazon Machine Image (AMI)
pytorch_36 breaks
Also, unlike the description, the PyTorch version running there was 1.0.0, not 1.4.
Updating the torch version led to some technical issues. Terrible.
This AMI Breaks Horovod under tensorflow_p36 env
So this AMI SHOULD NOT be used for Horovod from what I can see - there are multiple versions of OpenMPI installed and it really breaks everything.
I'm not sure why there are packages installed both globally and under the conda environments; it causes big problems.
I'm happy to be proved wrong but after 12 hours of messing about with it I've decided to build my own image which I shall share!
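For anyone hitting the same wall, a quick way to see the duplicate installs the review describes is to enumerate every `mpirun` the box exposes. This is only a sketch: the `~/anaconda3/envs/*/bin` path is an assumption based on the usual DLAMI layout, not something the review confirms.

```python
# Sketch: find which mpirun wins on PATH vs. what each conda env ships.
# The anaconda3 path below is an assumption (typical DLAMI layout).
import glob
import os
import shutil

print("mpirun on PATH:", shutil.which("mpirun"))

conda_copies = glob.glob(os.path.expanduser("~/anaconda3/envs/*/bin/mpirun"))
for path in conda_copies:
    print("per-env copy:", path)
if not conda_copies:
    print("no per-env mpirun copies found here")
```

If the PATH copy and the per-env copy report different OpenMPI versions, that mismatch is a plausible source of the Horovod breakage described above.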
tensorflow_p36 doesn't work with GPU
You pretty much need to set everything up yourself. Here is what you get out of the box:
(tensorflow_p36) ubuntu@ip-172-31-6-11:~$ source activate tensorflow_p36
(tensorflow_p36) ubuntu@ip-172-31-6-11:~$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-05-01 23:37:47.936444: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
Aborted (core dumped)
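Before opening a full session like the transcript above, a lighter-weight check can tell you whether TF 1.x (the generation on this AMI) sees any GPU at all; the guarded import is mine, so the snippet degrades gracefully where TensorFlow is absent.

```python
# Sketch: list the devices TF 1.x can see, without building a Session.
# Import is guarded so this runs even where TensorFlow isn't installed.
try:
    from tensorflow.python.client import device_lib

    gpus = [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]
    print("visible GPUs:", gpus or "none - TF will fall back to CPU")
except ImportError:
    print("tensorflow not installed; activate tensorflow_p36 first")
```

An empty list here, with `nvidia-smi` still showing the card, points at a CUDA/driver mismatch rather than missing hardware.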
Doesn't work
I've used previous versions (<= v20.0) quite often and they worked okay-ish. In v20 the unattended-upgrade process hangs and doesn't terminate even if you wait a long time, so you actually have to kill a bunch of processes before you can install additional packages. But now v22.0 doesn't even start, due to a missing snapshot.
Tensorflow was not using GPU
I used this AMI for 40 hours,
and then I found out why my models were training so slowly:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
Environment: conda / Python 3.6 / TensorFlow
I used a g2 instance for nothing.
Thanks a lot for the time lost
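The log in this review is worth decoding before picking an instance type: prebuilt TensorFlow 1.x binaries refuse GPUs below CUDA compute capability 3.5, and the GRID K520 in g2 instances sits at 3.0. The table below is my own rough reference, not something from the review or AWS docs.

```python
# Rough reference (my own notes, not AWS documentation): compute
# capability of GPUs in common EC2 families around the time of this AMI.
MIN_TF_CC = 3.5  # minimum for prebuilt TensorFlow 1.x binaries

GPU_CC = {
    "g2 (GRID K520)":  3.0,  # the GPU rejected in the log above
    "p2 (Tesla K80)":  3.7,
    "g3 (Tesla M60)":  5.2,
    "p3 (Tesla V100)": 7.0,
}

for inst, cc in GPU_CC.items():
    status = "ok" if cc >= MIN_TF_CC else "too old for prebuilt TF"
    print(f"{inst}: compute capability {cc} -> {status}")
```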
Not great
This AMI did not work as I expected. I found it easier to install TensorFlow for Python from binaries myself, or to use the official TensorFlow Docker image. For context, I run TensorFlow for Python on macOS and Ubuntu. It sounds like the AMI works as expected for people using conda?
Worked out of the box
Really easy to use. I was able to start developing a Keras/Tensorflow model in under 5 minutes. Multi-gpu training worked as expected.
Works very well
Works great. And if you have any trouble with it, there are plenty of resources available. Would recommend to anyone in need of a cloud solution for training deep learning models.
Works perfectly well!
Before I found this AMI I was wasting hours trying to set up a deep learning machine on my own. It was frustrating.
This AMI works perfectly well! There are lots of Anaconda environments prepared. Create a machine, activate the environment of choice. Go! Thanks AWS!
Ubuntu Deep Learning AMI - It works
There are lots of dated reviews of these Deep Learning AMIs.
I can confirm that as of November 2017, this AMI has everything you need to stand up deep learning on a GPU in five minutes. Conda and all the goodies are installed.