Setup for deep learning workstation

This page covers setting up a machine intended primarily for deep learning analyses. It assumes Ubuntu has already been installed.

Ubuntu installation notes:

Other tips/notes on running analyses: KerasTips

If the system restarts and updates the kernel on you, you might have to reinstall CUDA/CUDNN... see DeepLearningKernelUpdate for details.

Initial setup

Early on, presumably right after installation of OS, remember to update all packages:

sudo apt-get update
sudo apt-get upgrade

Package installation

These two are definite necessities. In particular, need to install openssh-server before basically anything else because otherwise we can't get SSH access.

sudo apt-get install openssh-server
sudo apt-get install tightvncserver

The following may not be necessary anymore -- it was for our old VNC setup. But it shouldn't hurt to install these packages anyway, just in case we want to use something like the old setup again.

sudo apt-get install ubuntu-desktop gnome-panel gnome-settings-daemon metacity nautilus gnome-terminal

OLD VNC server configuration

#!/bin/sh
[ -x /etc/vnc/startup ] && exec /etc/vnc/startup
[ -r "$HOME/.Xresources" ] && xrdb "$HOME/.Xresources"
xsetroot -solid grey
vncconfig -iconic &
x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
x-window-manager &
gnome-panel &
gnome-settings-daemon &
metacity &
nautilus

NEW VNC server configuration

Steps roughly follow https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-16-04. Exact instructions below:

Install the following packages:

sudo apt-get install xfce4 xfce4-goodies
sudo apt-get install autocutsel

Next, start up VNC:

vncserver :yourvncnumber

It will prompt you to set and confirm a password; do so. Then end the session:

vncserver -kill :yourvncnumber

This creates ~/.vnc/xstartup.

Either edit ~/.vnc/xstartup, or delete it and make a new file. IF MAKING A NEW FILE, enter this command as well:

chmod 755 ~/.vnc/xstartup

The new contents of ~/.vnc/xstartup should be:

#!/bin/bash
xrdb $HOME/.Xresources
startxfce4 &
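
The delete-and-recreate route above can be scripted. This is a minimal sketch, not part of the original instructions: the function name write_xstartup is made up for illustration, the #!/bin/bash first line is an assumption taken from the linked tutorial, and the backup step is an extra precaution.

```shell
# Hypothetical helper: write a minimal xstartup to the given path
# (default ~/.vnc/xstartup) and make it executable.
write_xstartup() {
    target="${1:-$HOME/.vnc/xstartup}"
    mkdir -p "$(dirname "$target")"
    # Keep a backup of any existing file rather than deleting it outright.
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
#!/bin/bash
xrdb $HOME/.Xresources
startxfce4 &
EOF
    chmod 755 "$target"
}

# Example:
#   write_xstartup        # writes ~/.vnc/xstartup
```

Run it only after the first vncserver start/kill cycle, so that the password and the ~/.vnc directory already exist.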

After you've edited (or deleted/recreated) xstartup, start a new VNC desktop:

vncserver :yourvncnumber -geometry 1280x800

When you open the VNC viewer, you might get a "Welcome to first start" message; select "Use default config". (You may also get an error message saying Ubuntu had a problem but it doesn't appear to cause issues.)

That should be the basic VNC setup. Other convenience functions/packages/etc below:

Adding users

sudo adduser newusername
sudo usermod -aG sudo newusername

Install additional packages

Enable copy/paste on VNC

Allows copy/paste between VNC windows and your computer. This has to be done at the beginning of every VNC session (so you should only need to do it once, unless you kill your VNC session, Agnew/Calculon/etc restart, etc).

autocutsel -fork

Enable the Tab key https://www.starnet.com/xwin32kb/tab-key-not-working-when-using-xfce-desktop/





CUDA/CUDNN/Keras/etc. setup

Install and run Anaconda

First, download the Anaconda installer from their website. (Just Google it.) We want Linux version, x86, 64-bit, Python 3.6 edition. Then:

sudo bash [name of anaconda .sh installer file]

When prompted, install into /opt/anaconda3.

Next, do the CUDA setup:

Download the CUDA installer from NVIDIA (or actually, just get it from Agnew/Calculon/etc.). For reference, the version we're running on Agnew/Calculon/Lrrr/Ndnd as of June 2017 is 8.0.44. Before we can actually install it, though, we need to shut down the display manager and blacklist Nouveau, following the instructions on the two pages linked below. See below the links for a short summary of what we actually have to do.

http://askubuntu.com/questions/788323/change-runlevel-on-16-04

http://askubuntu.com/questions/481414/install-nvidia-driver-instead-nouveau (top solution)

(First askubuntu page:) To disable starting up in graphical mode:

sudo systemctl isolate multi-user.target
sudo systemctl enable multi-user.target
sudo systemctl set-default multi-user.target

(Second askubuntu page:) Now keep nouveau from running by editing the blacklist:

sudo nano /etc/modprobe.d/blacklist.conf

Add the following lines to that blacklist file (see Agnew/Calculon's blacklist files if you want to confirm you got it right)

blacklist amd76x_edac #this might not be required for x86 32 bit users.
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
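
If you would rather not edit the file by hand, the same lines can be appended with a small script. This is a sketch under assumptions: the function name add_blacklist_entries is hypothetical, and it skips any module that already has a blacklist line so that running it twice does not create duplicates.

```shell
# Hypothetical helper: append each blacklist entry to the given file
# unless a "blacklist <module>" line is already present.
add_blacklist_entries() {
    conf="${1:-/etc/modprobe.d/blacklist.conf}"
    touch "$conf"
    for module in amd76x_edac vga16fb nouveau rivafb nvidiafb rivatv; do
        # Match the module name followed by whitespace or end of line,
        # so an existing line with a trailing comment still counts.
        if ! grep -Eq "^blacklist[[:space:]]+${module}([[:space:]]|\$)" "$conf"; then
            printf 'blacklist %s\n' "$module" >> "$conf"
        fi
    done
}

# Example (from a root shell, since the real file is root-owned):
#   add_blacklist_entries
```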

Now the following is probably not necessary (as there shouldn't be any proprietary Nvidia drivers on the system yet) but not a bad idea to run anyway:

sudo apt-get remove --purge 'nvidia*'

Now, restart the system. When restarted, you should be able to run the CUDA installer.
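
After the restart it may be worth confirming that nouveau really is gone before launching the installer. The check below is a sketch, not something from the original procedure; module_loaded is a made-up name, and it reads /proc/modules, which is the same list lsmod prints.

```shell
# Hypothetical helper: succeed if the named kernel module appears in the
# module list (default /proc/modules).
module_loaded() {
    name="$1"
    list="${2:-/proc/modules}"
    # The first field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Example:
#   if module_loaded nouveau; then
#       echo "nouveau is still loaded; recheck the blacklist file"
#   fi
```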

Running the CUDA installer

In most cases, you can just run the cuda_8.0.44_linux.run (or whatever version) installer file and accept most of the defaults. However, note the following exception:

GTX 1080Ti GPU (current as of June 2017): None of the CUDA installers currently have the right drivers for this card. So when you install CUDA, do not let it install the GPU driver! Instead, do everything else normally but don't install any driver. Then download the current driver for a 1080Ti card from the Nvidia website. (Lrrr is using NVIDIA-Linux-x86_64-381.22.run as of June 2017.) Install the driver. If it says there is already a driver installed (e.g., from a past failed CUDA installation attempt) and asks to overwrite the old driver, allow it to overwrite! Otherwise, CUDA installation and everything following it should be the same as written below.

Detour over; back to CUDA installation. When asked if you want to install samples, say yes and put them in /opt/cuda_samples.

CUDA should now be installed. Next up is CUDNN -- need to download that from Nvidia developer program or just get from Agnew/Calculon/etc. We are currently using version 5.1 on all machines (even Lrrr, with the weird driver) as of June 2017.

Unzip/untar/whatever the CUDNN files e.g. cudnn-8.0-linux-x64-v5.1.tar. Should yield a cuda directory with lib64 and include subdirectories. Copy the files in each of those to the corresponding /usr/local/cuda subdirectories (will require sudo), e.g. sudo cp lib64/* /usr/local/cuda/lib64/ (assuming you are in the cuda directory already).
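
The copy step covers both subdirectories, so it can be wrapped up as below. This is a sketch under assumptions: install_cudnn is a hypothetical name, and the use of cp -P (which keeps any symbolic links in lib64 as links instead of duplicating the library) is a choice made here, not something the original instructions specify.

```shell
# Hypothetical helper: copy cuDNN headers and libraries from an unpacked
# "cuda" directory into a CUDA install (default /usr/local/cuda).
install_cudnn() {
    src="$1"
    dest="${2:-/usr/local/cuda}"
    if [ ! -d "$src/include" ] || [ ! -d "$src/lib64" ]; then
        echo "expected include/ and lib64/ under $src" >&2
        return 1
    fi
    mkdir -p "$dest/include" "$dest/lib64"
    cp -P "$src"/include/* "$dest/include/"   # -P keeps symlinks as symlinks
    cp -P "$src"/lib64/* "$dest/lib64/"
}

# Example (from a root shell, in the directory where the archive was unpacked):
#   install_cudnn ./cuda
```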

Now we need to put CUDA in the path and set up its environment variable(s) -- should just need to add the following to each user's .bashrc file (and exit shell / re-enter shell to take effect):

export PATH="/usr/local/cuda/bin:$PATH"
export CUDA_ROOT=/usr/local/cuda

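Also, it seems you need to enter sudo ldconfig /usr/local/cuda/lib64 at some point after installing all this stuff. ldconfig rebuilds the dynamic linker's cache of shared libraries, so this is what makes the system aware of the CUDA libraries. A directory given on the command line is only used for that one run, which is probably why we seem to need to re-enter it periodically, though it's still not clear exactly when (maybe after each restart?).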
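
One possible way to avoid re-entering that command is to list the directory in a file under /etc/ld.so.conf.d/, which Ubuntu's /etc/ld.so.conf includes, so that every plain ldconfig run picks it up. This is a sketch we have not verified on these machines; the function name register_lib_dir and the file name cuda.conf are assumptions.

```shell
# Hypothetical helper: record a library directory in an ld.so.conf.d file so
# that a plain "ldconfig" includes it whenever the cache is rebuilt.
register_lib_dir() {
    lib_dir="${1:-/usr/local/cuda/lib64}"
    conf="${2:-/etc/ld.so.conf.d/cuda.conf}"
    printf '%s\n' "$lib_dir" > "$conf"
}

# Example (from a root shell):
#   register_lib_dir && ldconfig
#   ldconfig -p | grep libcudnn    # lists the cuDNN library if it was found
```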

Installing Theano and Keras, and apparently Git which isn't installed by default???

We are currently (June 2017) using old-ish versions of Keras and Theano for compatibility reasons. Use the commands below to install the right versions. Note the full path to pip is necessary even if you have Python 3 in your path, because it isn't in the super-user's path by default.

sudo /opt/anaconda3/bin/pip install theano==0.9.0
sudo /opt/anaconda3/bin/pip install keras==1.2
sudo apt-get install git

You'll need to create a .keras folder in your home directory and put the following file, named keras.json, inside it. Or just copy the .keras folder from another computer.

{
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano",
    "image_dim_ordering": "tf"
}

Critical things that are changed from the defaults are backend (default tensorflow, needs to be theano) and image_dim_ordering (default th, should be tf)
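
For a fresh account, creating the folder and file can be done in one go. The sketch below reproduces the Keras 1 file exactly as listed above; write_keras_json is a hypothetical name, and copying the .keras folder from another machine remains just as good.

```shell
# Hypothetical helper: create the .keras directory (default ~/.keras) and
# write the keras.json shown above into it.
write_keras_json() {
    dir="${1:-$HOME/.keras}"
    mkdir -p "$dir"
    cat > "$dir/keras.json" <<'EOF'
{
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano",
    "image_dim_ordering": "tf"
}
EOF
}

# Example:
#   write_keras_json
```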

UPDATE: The tf and th stuff is outdated (for Keras 1) and might not have even been right in the first place? Anyway, the image_dim_ordering parameter equivalent in Keras 2 is now called image_data_format and the value should be set to channels_first (at least, that is what we're all using -- in theory I [MRJ] think it is OK to use either as long as you know what you're doing and are consistent in how you set stuff up, but channels_first is probably safest for compatibility with the rest of the lab's work).

You'll also need to make a .theanorc file in your home directory (or copy it from another system). That file should contain the following:

[global]
floatX = float32
device = gpu0

[lib]
cnmem = 1

[dnn]
enabled = True

(Note that if you have your GPUs configured in a different order, you may want to change gpu0 to gpu1, or even to cpu if you want to live life in the slow lane.)
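
Switching devices only means changing the one device line, so it can be done without opening an editor. A minimal sketch, assuming the .theanorc layout shown above; set_theano_device is a made-up name.

```shell
# Hypothetical helper: rewrite the "device = ..." line of a .theanorc file
# (default ~/.theanorc), leaving every other line untouched.
set_theano_device() {
    device="$1"
    rc="${2:-$HOME/.theanorc}"
    [ -f "$rc" ] || { echo "no such file: $rc" >&2; return 1; }
    tmp_rc="$rc.tmp.$$"
    sed "s/^device[[:space:]]*=.*/device = $device/" "$rc" > "$tmp_rc" &&
        mv "$tmp_rc" "$rc"
}

# Examples:
#   set_theano_device gpu1
#   set_theano_device cpu
```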

At this point, you will hopefully be ready to try a sample analysis. If you feel up to it, maybe grab some Keras examples and try to run them! Best way, since we are using old code, is to steal an old Keras testing installation from Agnew/Calculon/Lrrr/Ndnd (e.g., the 'testing' folder in Matt's home directory). Go into the keras directory, then the examples directory, and try something like python mnist_mlp.py and hopefully it will run on the chosen GPU!

Mount Farnsworth

Only needs to be done once per boot. Will only unmount if we do so explicitly or if Agnew/Calculon/Lrrr/Ndnd gets rebooted (or if their network connection dies).

Install cifs-utils package: sudo apt-get install cifs-utils

Then need to make a place for the share to live: sudo mkdir /mnt/eeg_data_analysis (or whatever the share is named).

Finally, to mount it up:

sudo mount -t cifs -o ro,username=matt //farnsworth/eeg_data_analysis /mnt/eeg_data_analysis/

(Replace matt with your Farnsworth username.)

Note: MRJ edited the command above to make it read-only, which might be a good idea for Farnsworth general-purpose mounting. Probably better to keep data local with local permissions for now, just in case someone accidentally wants to wipe out a whole Farnsworth share... and make writing back to Farnsworth more of a special-occasion kind of thing.
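
Since the share only needs mounting once per boot, a quick check of whether it is already mounted can save a confusing error. The helper below is a sketch; is_mounted is a hypothetical name, and it reads /proc/mounts rather than calling any mount tool.

```shell
# Hypothetical helper: succeed if the given directory is listed as a mount
# point in the mount table (default /proc/mounts).
is_mounted() {
    dir="${1%/}"    # /proc/mounts lists paths without a trailing slash
    table="${2:-/proc/mounts}"
    # The second field of each line is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Example:
#   is_mounted /mnt/eeg_data_analysis/ || echo "Farnsworth is not mounted"
```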

Cross-mounting the other deep learning machines

First install some packages for serving and mounting NFS shares, respectively:

sudo apt-get install nfs-kernel-server portmap nfs-common
sudo apt-get install nfs-client nfs-common

Then, to allow other systems to see the current system's home folder, edit /etc/exports (e.g. sudo nano /etc/exports ) and add lines such as:

/home agnew.local(ro)
/home calculon.local(ro)
/home lrrr.local(ro)
/home ndnd.local(ro)

(Leave out the line corresponding to whichever system you're actually on -- don't need to share with yourself!)
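
The "everyone but yourself" rule is easy to get wrong by hand, so the lines can be generated instead. This is a sketch under assumptions: exports_lines is a made-up name, and it assumes each machine's short hostname is the lowercase name used in the .local addresses above.

```shell
# Hypothetical helper: print /etc/exports lines for every lab machine except
# the one named in $1 (default: this machine's short hostname).
exports_lines() {
    self="${1:-$(hostname -s)}"
    for host in agnew calculon lrrr ndnd; do
        [ "$host" = "$self" ] && continue
        printf '/home %s.local(ro)\n' "$host"
    done
}

# Example: review the output, then paste it into /etc/exports:
#   exports_lines
```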

Then run sudo exportfs -r to make the changes live. At this point you're set to mount everything up the usual way:

sudo mkdir /mnt/agnew
sudo mkdir /mnt/calculon
sudo mkdir /mnt/lrrr
sudo mkdir /mnt/ndnd

sudo mount -t nfs -o ro agnew.local:/home /mnt/agnew/
sudo mount -t nfs -o ro calculon.local:/home /mnt/calculon/
sudo mount -t nfs -o ro lrrr.local:/home /mnt/lrrr/
sudo mount -t nfs -o ro ndnd.local:/home /mnt/ndnd/

(Again, leave off the list whatever system you're actually on.)
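
The mkdir and mount commands follow one pattern per machine, so they can be generated in a loop that skips the current system. A sketch only: cross_mount_commands is a hypothetical name, it assumes short hostnames match the lowercase machine names, and it prints the commands instead of running them so they can be checked first.

```shell
# Hypothetical helper: print (dry run) the mkdir/mount commands for every lab
# machine except the one named in $1 (default: this machine's short hostname).
cross_mount_commands() {
    self="${1:-$(hostname -s)}"
    for host in agnew calculon lrrr ndnd; do
        [ "$host" = "$self" ] && continue
        printf 'sudo mkdir -p /mnt/%s\n' "$host"
        printf 'sudo mount -t nfs -o ro %s.local:/home /mnt/%s/\n' "$host" "$host"
    done
}

# Example: review the output, then run it:
#   cross_mount_commands
#   cross_mount_commands | sh
```

mkdir -p is used here so that rerunning the commands after a reboot does not fail on directories that already exist.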

Note these commands, like Farnsworth above, are defaulting to only mounting read-only... that's probably best for now.

Start VNC session

Enter in Terminal:

ssh yourusername@agnew.local    # or calculon.local, etc.
vncserver :yourvncnumber -geometry 1280x800    # or whatever geometry you prefer

Enter in VNC:

agnew.local:yourvncnumber (or calculon.local:yourvncnumber, etc.)
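
Some VNC viewers ask for a TCP port instead of a display number. By the usual VNC convention, display :N listens on port 5900 + N; the helper below (vnc_port is a made-up name) just does that arithmetic.

```shell
# Hypothetical helper: print the TCP port for a VNC display number.
# Accepts the number with or without the leading colon.
vnc_port() {
    n="${1#:}"
    echo $((5900 + n))
}

# Example:
#   vnc_port :2
```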

End VNC session

vncserver -kill :yourvncnumber

MATLAB setup

Not really deep learning per se, but relates to the deep-learning machines.

Currently we have R2016a installed on Agnew. Others coming soon.

When using the GUI (i.e., without the -nodesktop option), MATLAB will crash on startup. This is due to the following issue:

https://www.mathworks.com/support/bugreports/1297894

The solution is as specified on that page:

Summary

MATLAB crashes during startup on Ubuntu 15.04 and newer, as well as distributions derived from those versions

Description

When using Ubuntu Linux distributions 15.04 and newer, as well as distributions derived from those versions, MATLAB can crash during startup.

This crash occurs because these releases include a newer version of libstdc++.so.6 than the version shipped with MATLAB (version 6.0.17). When MATLAB loads version 6.0.17 first, the OS reaches an incompatibility that causes MATLAB to crash.

Workaround

You can force MATLAB to load the newer version of the library provided by the operating system, by following these instructions:

Identify the location where MATLAB is installed
Navigate to the sys/os/glnxa64 directory within this installation folder
Rename libstdc++.so.6 library to libstdc++.so.6.old
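
The three workaround steps amount to a single rename, sketched below. The function name disable_bundled_libstdcxx is hypothetical, and the install path in the example is an assumption; use wherever MATLAB actually lives on the machine.

```shell
# Hypothetical helper: rename MATLAB's bundled libstdc++.so.6 so that the
# operating system's newer copy is loaded instead.
disable_bundled_libstdcxx() {
    matlab_root="$1"
    lib="$matlab_root/sys/os/glnxa64/libstdc++.so.6"
    if [ -e "$lib" ] || [ -L "$lib" ]; then
        mv "$lib" "$lib.old"
    else
        echo "not found: $lib" >&2
        return 1
    fi
}

# Example (from a root shell if the install directory is root-owned):
#   disable_bundled_libstdcxx /usr/local/MATLAB/R2016a
```

To undo the change, rename libstdc++.so.6.old back to libstdc++.so.6.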