Useful gcloud command-line (CLI) commands

Assuming you create a remote instance named ‘fvm’ (free VM):

gcloud compute instances create fvm \
 --image ubuntu-2004-focal-v20201211 \
 --image-project ubuntu-os-cloud \
 --boot-disk-device-name=freedisk \
 --boot-disk-size=30 \
 --machine-type=f1-micro \
 --zone=us-east1-b \
 --boot-disk-type=pd-standard

Use the following startup script to install Python 3.9 on the remote machine:

echo "startup-script"

sudo apt-get update

sudo apt-get install -y locales
sudo DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
sudo ln -fs /usr/share/zoneinfo/Asia/Singapore /etc/localtime
sudo dpkg-reconfigure -f noninteractive tzdata

sudo apt -y install software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt -y install python3.9
python3.9 --version

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.9 1

You can view the startup script's output in the system log on the instance:

cat /var/log/syslog

Connect to the remote GCP instance:

gcloud compute ssh fvm

If connecting over a VPN with no public IP address:
gcloud compute ssh --internal-ip fvm

Create remote directory:

gcloud compute ssh fvm -- 'mkdir -p ~/log'

Start / Stop instances:

gcloud compute instances stop fvm
gcloud compute instances start fvm

Delete an instance:

gcloud compute instances delete fvm

Copy files back and forth between the local machine and the GCP instance:

Copy files from the local machine to the GCP instance:
gcloud compute scp --compress /Users/user/Documents/develop/python/.../*.py fvm:~/<folder>

Unzip files on the remote machine:
gcloud compute ssh fvm -- 'unzip ~/<folder>/<file>.zip -d ~/<folder>'

Compress the result files on the remote machine:
zip -r <archive>.zip <folder>

Copy the compressed files to the local machine:
gcloud compute scp --compress fvm:~/<folder>/<archive>.zip /Users/user/Documents/develop/<folder>/

If you plan to run a GPU-enabled Docker image on the VM, build it and push it to the registry first:

docker build -f tf2_gpu.dockerfile --force-rm --tag tf2.3_gpu:1.0 .
docker images -a
docker tag tf2.3_gpu:1.0 gcr.io/<GCP project name>/tf2.3_gpu
docker push gcr.io/<GCP project name>/tf2.3_gpu

Run a shell in the container (from the remote machine):

docker run -it gcr.io/<GCP project name>/tf2.3_gpu sh

Remove the previous container (if one exists):

docker rm tf23gpu

Run a Python file in the container with a filesystem shared between the GCP remote machine and the container, using the -v flag with the following format:

-v [host-src:]container-dest[:<options>]
docker run --name=tf23gpu --gpus all --runtime=nvidia -d -v ~/auto:/usr/src/app/auto gcr.io/<GCP project name>/tf2.3_gpu python3 <python filename>.py -<param1>=<value1> -<param2>=<value2> ...

Running TensorFlow 2.x GPU on Docker and GCP

If you are planning a relatively short training period (less than 24 hours), you may want to create a cheaper preemptible instance:

gcloud compute instances create gcp-instance-name \
 --preemptible \
 --image ubuntu-1804-bionic-v20200916 \
 --image-project ubuntu-os-cloud \
 --boot-disk-device-name=boot-disk-name \
 --boot-disk-size=150 \
 --machine-type=n1-highmem-4 \
 --accelerator=count=1,type=nvidia-tesla-t4 \
 --maintenance-policy TERMINATE \
 --boot-disk-type=pd-standard \
 --network-interface subnet=default-subnet \
 --metadata-from-file startup-script=<path to startup script>

The startup script will install Docker and a few more useful libraries on your newly created machine:

echo "startup-script"

echo "Set locals and timezone"
sudo locale-gen "en_US.UTF-8"
sudo dpkg-reconfigure locales
sudo timedatectl set-timezone Asia/Singapore

echo "NVIDIA Driver Installation"
sudo apt-get install -y linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -y install cuda-drivers

echo ":trying to remove docker engine (if exists)"
sudo apt-get remove -y docker docker-engine docker.io containerd runc

echo ":apt-get update"
sudo apt-get update
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

echo ":curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

echo ":apt-key fingerprint 0EBFCD88"
sudo apt-key fingerprint 0EBFCD88

echo ":sudo add-apt-repository..."
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

sudo apt-get update

echo ":sudo apt-get install -y docker-ce docker-ce-cli containerd.io"
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

echo ":using docker without sudo"
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

echo "Configure authentication to Container Registry"
VERSION=2.0.2
OS=linux
ARCH=amd64

echo ":curl -L ..."
curl -L "https://github.com/GoogleCloudPlatform/docker-credential-gcr/releases/download/v${VERSION}/docker-credential-gcr_${OS}_${ARCH}-${VERSION}.tar.gz" -o "docker-credential-gcr_${OS}_${ARCH}-${VERSION}.tar.gz"

echo ":tar xvf ..."
tar xvf "./docker-credential-gcr_${OS}_${ARCH}-${VERSION}.tar.gz"

echo ":sudo cp ./docker-credential-gcr /usr/local/bin/docker-credential-gcr"
sudo cp ./docker-credential-gcr /usr/local/bin/docker-credential-gcr

echo ":chmod +x /usr/local/bin/docker-credential-gcr"
sudo chmod +x /usr/local/bin/docker-credential-gcr

echo ":docker-credential-gcr configure-docker"
docker-credential-gcr configure-docker

echo "Install nvidia-docker"
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

sudo apt install -y zip unzip
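As a side note, the $distribution expression used in the driver steps above just concatenates the OS ID and version from /etc/os-release (with the dot stripped for the CUDA repo path, kept for the nvidia-docker list). A self-contained illustration, using a sample file in place of the real /etc/os-release:

```shell
# Sample os-release content, standing in for /etc/os-release
cat > /tmp/os-release-sample <<'EOF'
ID=ubuntu
VERSION_ID="18.04"
EOF

# CUDA repo variant: dot stripped
distribution=$(. /tmp/os-release-sample; echo $ID$VERSION_ID | sed -e 's/\.//g')
echo "$distribution"   # ubuntu1804

# nvidia-docker variant: dot kept
distribution=$(. /tmp/os-release-sample; echo $ID$VERSION_ID)
echo "$distribution"   # ubuntu18.04
```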

The Dockerfile may look like this:

# We start with specifying our base image. Use the FROM keyword to do that -
# FROM tensorflow/tensorflow:2.3.0-gpu
# FROM tensorflow/tensorflow:latest-gpu
FROM tensorflow/tensorflow:nightly-gpu

RUN apt-get install -y locales
RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && locale-gen

# First, we set a working directory and then copy all the files for our app.
WORKDIR /usr/src/app

# copy all the files to the container
# adds files from your Docker client’s current directory.
COPY . .

RUN python3 -m pip install --upgrade pip

# install dependencies
# RUN pip install -r requirements.txt

RUN pip install numpy pandas scikit-learn matplotlib pandas_gbq

RUN apt-get install -y nano

RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
RUN ln -fs /usr/share/zoneinfo/Asia/Singapore /etc/localtime
RUN dpkg-reconfigure -f noninteractive tzdata

For cases where you need to share directories between the host and the container, use:

docker run --name=docker_instance_name --gpus all -d -v ....

Based on this reference:

-v --volume=[host-src:]container-dest[:<options>]: Bind mount a volume.
-d to start a container in detached mode
--gpus GPU devices to add to the container (‘all’ to pass all GPUs)
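Putting the three flags together, a template in the shape of the earlier run command, with :ro in the options slot making the mount read-only inside the container (the image name follows the push example above; the filename is a stand-in):

```
docker run --name=tf23gpu --gpus all -d \
  -v ~/auto:/usr/src/app/auto:ro \
  gcr.io/<GCP project name>/tf2.3_gpu python3 <python filename>.py
```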