
TensorFlow BatchNorm issue but otherwise good

  • By Everett Berry
  • on 11/15/2017

This is a great instance for the CUDA versions and configuration, and once I fixed the issue below my training was very fast. HOWEVER, you should be very careful using TensorFlow on this instance. It is a Frankenstein's monster of bleeding-edge TensorFlow (1.4-rc0) plus some PRs that have not even been merged to master, pulled in to take advantage of the Voltas and CUDA 9.

My issue was:
'AttributeError: can't set attribute' while using the BatchNormalization layer in TensorFlow. It relates to this PR (https://github.com/tensorflow/tensorflow/pull/13388), where a 'dtype' argument is added to BatchNorm to allow for FP16 and FP32 operations. There is an extra line in the TensorFlow included in this AMI, in /usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/normalization.py on line 145: 'self.dtype = dtype'. It causes the error above when using the normal BatchNorm API. Commenting this line out fixes the problem.
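In case it helps anyone triage, here is a minimal sketch of the kind of code that trips it. The exact graph doesn't matter, since any construction of the BatchNorm layer runs the constructor containing the offending assignment; the shape and variable names here are just placeholders I made up:

    import tensorflow as tf

    # Any use of the BatchNorm layer API constructs the layer, which on
    # this AMI executes the extra `self.dtype = dtype` line added in
    # normalization.py.
    x = tf.placeholder(tf.float32, shape=[None, 64])
    y = tf.layers.batch_normalization(x, training=True)
    # Raises: AttributeError: can't set attribute
    # (`dtype` appears to be a read-only property on the base Layer
    # class, so the plain attribute assignment fails.)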

Weirdly, this assignment on line 145 is not included in the PR (although the dates and authors match), so I think there must have been a rebase or something. Regardless, the line exists in the TensorFlow in this AMI and will cause you pain on almost any neural network, because almost all of them use BatchNorm. I couldn't figure out where to post this, because the code on GitHub does not have the problem.

Other than that, this is a fine AMI, and I'm grateful to AWS for providing it and for their continued advances in GPUs.

