This multipart tutorial will show you how to:
- Launch a GPU instance on AWS, SSH into it and set up TensorFlow.
- Install CUDA, cuDNN, Anaconda and TensorFlow on the instance, then run an MNIST classifier while monitoring CPU and GPU usage.
I will assume that you have an AWS user account with admin rights and have downloaded the accessKeys.csv file.
Setting up AWS on your local machine
First, install the AWS command line tool using the python pip installer (for other options, see here).
sudo pip3 install awscli
Now you should be able to use the aws command:
$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
aws help
aws <command> help
aws <command> <subcommand> help
aws: error: too few arguments
In order to use the AWS account, we have to provide the right credentials. We can do so by running aws configure and entering the details from your accessKeys.csv file:
$ aws configure
AWS Access Key ID: <access_key_id>
AWS Secret Access Key: <secret_access_key>
Default region name [us-east-1]: us-east-1
Default output format [None]: <ENTER>
Your config will be stored in ~/.aws
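Under the hood, aws configure writes two small files into ~/.aws; they should look roughly like this (the values shown are placeholders):
# ~/.aws/credentials
[default]
aws_access_key_id = <access_key_id>
aws_secret_access_key = <secret_access_key>
# ~/.aws/config
[default]
region = us-east-1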
You can test whether your user has admin rights by running:
$ aws ec2 describe-instances --output table
-------------------
|DescribeInstances|
+-----------------+
If successful, you should see the header above. Now create a security group ‘my-sg’ and allow inbound ssh access:
$ aws ec2 create-security-group --group-name my-sg --description "My security group"
$ aws ec2 authorize-security-group-ingress --group-name my-sg \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
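To double-check that the inbound rule was added, you can describe the group; this is purely a sanity check and not required for the rest of the setup:
$ aws ec2 describe-security-groups --group-names my-sg --output table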
Next, create the ssh access key and save it to ~/.aws/my_aws_key.pem:
$ aws ec2 create-key-pair --key-name my_aws_key \
    --query 'KeyMaterial' --output text > ~/.aws/my_aws_key.pem
$ chmod 400 ~/.aws/my_aws_key.pem
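If you want to confirm that the key pair was registered on the AWS side, describe-key-pairs will list its fingerprint:
$ aws ec2 describe-key-pairs --key-names my_aws_key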
Now we are ready to launch our EC2 instance 🙂
Launch Ubuntu 14.04 GPU instance
I like to use a basic Ubuntu 14.04 instance (image-id = ami-fce3c696)
$ aws ec2 run-instances --image-id ami-fce3c696 \
    --count 1 --instance-type g2.2xlarge \
    --key-name my_aws_key
Assuming all has gone well, you should now have an instance up and running!
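You can check on its state (and note the instance ID, which is handy for the resize steps below) with a filtered describe-instances call; the --query expression here is just one way of trimming the output:
$ aws ec2 describe-instances \
    --query "Reservations[*].Instances[*].[InstanceId,State.Name,PublicIpAddress]" \
    --output table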
Increase the size of the file system
The first time I did this, I ran into issues with not having enough space on the file system to install everything I needed. Follow the steps below to resize the filesystem; a rough CLI equivalent of the same steps is sketched after the list.
- From the AWS console, stop the instance
- From the AWS console, detach the volume (but note the mount point under attachment info, e.g. /dev/sda1)
- From the AWS console, take a snapshot of the volume
- From the AWS console, create a new volume using the snapshot (for my g2.2xlarge I went with 800GB)
- From the AWS console, attach the new volume to the original mount point /dev/sda1
- From the AWS console, restart the instance
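For reference, roughly the same sequence can be scripted with the CLI; the IDs and availability zone below are placeholders, and in practice you would wait for each step to finish (the console, or describe-volumes, shows the state) before running the next:
$ aws ec2 stop-instances --instance-ids <instance_id>
$ aws ec2 detach-volume --volume-id <volume_id>
$ aws ec2 create-snapshot --volume-id <volume_id> --description "root volume snapshot"
$ aws ec2 create-volume --snapshot-id <snapshot_id> --size 800 --availability-zone <availability_zone>
$ aws ec2 attach-volume --volume-id <new_volume_id> --instance-id <instance_id> --device /dev/sda1
$ aws ec2 start-instances --instance-ids <instance_id>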
SSH in to the Instance
The command for getting the IP of all running instances is a little clunky, so it can be useful to create an alias in the ~/.bashrc file:
alias aws_get_ip='aws ec2 describe-instances --query "Reservations[*].Instances[*].PublicIpAddress" --output=text'
Now, we can ssh in to the instance as the ubuntu user (the default user on Ubuntu AMIs), using the IP printed by aws_get_ip:
ssh -i ~/.aws/my_aws_key.pem ubuntu@<instance_ip>
Install CUDA 7.5
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get -y install linux-headers-$(uname -r) linux-image-extra-$(uname -r)
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
rm cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
You should now reboot your machine
sudo reboot
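Once the machine is back up, it is worth confirming that the driver and toolkit are visible before moving on; nvidia-smi should list the g2.2xlarge's GRID K520, and nvcc is installed under /usr/local/cuda/bin:
nvidia-smi
/usr/local/cuda/bin/nvcc --version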
Install cuDNN v5.1
Register and download cuDNN v5.1 from here. You can then put it into a cloud storage folder (Dropbox in the example below) and share the link so the instance can download it:
wget https://www.dropbox.com/s/.../cudnn-7.5-linux-x64-v5.1.tgz
tar xvzf cudnn-7.5-linux-x64-v5.1.tgz
rm cudnn-7.5-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
# move library files to /usr/local/cuda
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
rm -rf ~/cuda
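As a quick check that the files landed in the right place, the version macros near the top of cudnn.h should match the version you downloaded:
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h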
Install Anaconda with python 3.5
wget http://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
bash Anaconda3-4.2.0-Linux-x86_64.sh -b -p ~/bin/anaconda3
rm Anaconda3-4.2.0-Linux-x86_64.sh
echo 'export PATH="$HOME/bin/anaconda3/bin:$PATH"' >> ~/.bashrc
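Open a fresh shell (or source ~/.bashrc) so the new PATH takes effect, then check that the Anaconda interpreter is the one being picked up:
source ~/.bashrc
which python    ## should point at ~/bin/anaconda3/bin/python
python --version    ## Anaconda 4.2.0 ships Python 3.5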
Set up TensorFlow
pip install tensorflow
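Note that pip's tensorflow package is the CPU-only build; the GPU-enabled build is published as tensorflow-gpu, so install that one instead if the check below does not report a GPU device. As a minimal sanity check (assuming the TensorFlow 1.x Session API that this install provides), running a trivial graph should log a "Creating TensorFlow device (/gpu:0)" line before printing the result:
python -c "import tensorflow as tf; print(tf.Session().run(tf.constant('TensorFlow is working')))"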
Run an MNIST classifier and monitor the system usage
To finish the installation process, let’s run an MNIST classifier and monitor the system usage.
First, install the required packages:
sudo apt-get install htop
pip install gpustat
sudo nvidia-smi daemon ## run daemon to make monitoring faster
Now start byobu (a terminal multiplexer, similar to tmux or GNU screen):
byobu
Next, press Ctrl-F2 to split the window vertically and run htop:
htop
Press Shift-F2 to split the window horizontally and run the continuous GPU monitor gpustat:
watch --color -n1.0 gpustat -cp
With Shift-<left>, move to the left panel, download the MNIST classification script and execute it:
wget https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/mnist/convolutional.py
python convolutional.py