Deep Learning on AWS GPU instances with Python and TensorFlow – Part 1

This multipart tutorial will show you how to:

  • Launch a GPU instance on AWS, SSH into it and set up TensorFlow.
  • Install CUDA, cuDNN, Anaconda and TensorFlow, and run an MNIST classifier on the GPU.

I will assume that you have an AWS user account with admin rights and have downloaded the accessKeys.csv file.

Setting up AWS on your local machine

First, install the AWS command line tool using the Python pip installer (for other options, see here).

 sudo pip3 install awscli 

Now you should be able to use the aws command:

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: too few arguments

In order to use the AWS account, we have to provide the right credentials. We can do so by running aws configure and entering the details from your accessKeys.csv file:

 $ aws configure
 AWS Access Key ID: <access_key_id>
 AWS Secret Access Key: <secret_access_key>
 Default region name [us-east-1]: us-east-1
 Default output format [None]: <ENTER>

Your config will be stored in ~/.aws
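Under the hood, aws configure writes two small INI-style files into ~/.aws. They look roughly like this (with your real values in place of the placeholders):

```
$ cat ~/.aws/credentials
[default]
aws_access_key_id = <access_key_id>
aws_secret_access_key = <secret_access_key>

$ cat ~/.aws/config
[default]
region = us-east-1
```

If you ever need to switch accounts, you can edit these files directly instead of re-running aws configure.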

You can test whether your user has admin rights by running:

$ aws ec2 describe-instances --output table
-------------------
|DescribeInstances|
+-----------------+

If successful, you should see output like the above. Now create a security group ‘my-sg’ and allow inbound SSH access:

$ aws ec2 create-security-group --group-name my-sg --description "My security group"

$ aws ec2 authorize-security-group-ingress --group-name my-sg \
  --protocol tcp --port 22 --cidr 0.0.0.0/0

Next, create the SSH key pair and save it to ~/.aws/my_aws_key.pem:

$ aws ec2 create-key-pair --key-name my_aws_key \
  --query 'KeyMaterial' --output text > ~/.aws/my_aws_key.pem

  chmod 400 ~/.aws/my_aws_key.pem
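The chmod step matters: ssh refuses to use a private key that other users can read. A quick local illustration of what mode 400 means (using a throwaway file, not your real key):

```shell
# Create a throwaway file (NOT your real key) and lock it down the same way.
touch /tmp/demo_key.pem
chmod 400 /tmp/demo_key.pem

# 400 = read-only for the owner, no access for group or others;
# ssh rejects private keys with looser permissions.
stat -c "%a" /tmp/demo_key.pem   # prints 400
```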

Now we are ready to launch our EC2 instance 🙂

Launch Ubuntu 14.04 GPU instance

I like to use a basic Ubuntu 14.04 instance (image-id = ami-fce3c696)

$ aws ec2 run-instances --image-id ami-fce3c696 \
  --count 1 --instance-type g2.2xlarge \
  --key-name my_aws_key --security-groups my-sg

Assuming all has gone well, you should now have an instance up and running!

Increase the size of the file system

The first time I did this, I ran into issues with not having enough space on the file system to install everything I needed. Follow the steps below to resize the file system.

  1. From the AWS console, stop the instance
  2. From the AWS console, detach the volume (note the mount point under attachment info, e.g. /dev/sda1)
  3. From the AWS console, take a snapshot of the volume
  4. From the AWS console, create a new volume from the snapshot (for my g2.2xlarge I went with 800GB)
  5. From the AWS console, attach the new volume to the original mount point /dev/sda1
  6. From the AWS console, restart the instance
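Once the instance is back up, you can confirm from inside it that the root filesystem picked up the new volume size (Ubuntu cloud images usually grow the root partition automatically at boot; if yours didn't, running resize2fs on the root device will):

```shell
# Print the size of the root filesystem; after the resize the "Size"
# column should report the new volume size (e.g. 800G).
df -h /
```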

SSH into the Instance

The command for getting the IP of all running instances is a little clunky, so it can be useful to create an alias in the ~/.bashrc file:

alias aws_get_ip='aws ec2 describe-instances --query "Reservations[*].Instances[*].PublicIpAddress" --output=text'

Now, we can SSH into the instance like so:

ssh -i ~/.aws/my_aws_key.pem ubuntu@$(aws_get_ip)

Install CUDA 7.5

sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get -y install linux-headers-$(uname -r) linux-image-extra-`uname -r`
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
rm cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

You should now reboot your machine

sudo reboot

Install cuDNN v5.1

Register with NVIDIA and download cuDNN v5.1 for CUDA 7.5 from here. Downloads require a login, so one workaround is to upload the archive to a file-sharing service (I used Dropbox) and fetch it on the instance via the shared link:

wget https://www.dropbox.com/s/.../cudnn-7.5-linux-x64-v5.1.tgz
tar xvzf cudnn-7.5-linux-x64-v5.1.tgz
rm cudnn-7.5-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include # copy library files to /usr/local/cuda
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
rm -rf ~/cuda

Install Anaconda with Python 3.5

wget http://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
bash Anaconda3-4.2.0-Linux-x86_64.sh -b -p ~/bin/anaconda3
rm Anaconda3-4.2.0-Linux-x86_64.sh
echo 'export PATH="$HOME/bin/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Set up TensorFlow

pip install tensorflow-gpu

Run an MNIST classifier and monitor the system usage

To finish the installation process, let’s run an MNIST classifier and monitor the system usage.

First, install the required packages:

sudo apt-get install htop
pip install gpustat
sudo nvidia-smi daemon  ## run daemon to make monitoring faster

Now start byobu (terminal multiplexer, similar to tmux or GNU screen):

byobu

Next, press Ctrl-F2 to split the window vertically and run htop:

htop

Press Shift-F2 to split the window horizontally and run the continuous GPU monitor gpustat:

watch --color -n1.0 gpustat -cp

With Shift-<left>, move to the left panel, download the MNIST classification script and execute it:

wget https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/mnist/convolutional.py
python convolutional.py
