DEEP LEARNING WITH MULTIPLE GPUS
As Deep Learning models (especially LLMs) keep getting bigger, the need for more GPU memory (VRAM) is ever-increasing for developing them and using them locally. Building or obtaining a multi-GPU machine is only the first part of the challenge. Most libraries and applications use only a single GPU by default, so the machine also needs appropriate drivers and libraries that can leverage the multi-GPU setup.
This story provides a guide on how to set up a multi-GPU (Nvidia) Linux machine with the important libraries. It will hopefully save you some time on experimentation and get you started on your development.
At the end, links are provided to popular open-source libraries that can leverage the multi-GPU setup for Deep Learning.
Goal
Set up a multi-GPU Linux system with the essential libraries, such as the CUDA Toolkit and PyTorch, to get started with Deep Learning 🤖. The same steps also apply to a single-GPU machine.
We will install 1) the CUDA Toolkit, 2) PyTorch, and 3) Miniconda to get started with Deep Learning using frameworks such as exllamaV2 and torchtune.
©️ All the libraries and information mentioned in this story are open-source and/or publicly available.
Getting Started
Check the number of GPUs installed in the machine using the nvidia-smi command in the terminal. It should print a list of all the installed GPUs. If there is a discrepancy or if the command does not work, first install the Nvidia drivers for your version of Linux, and make sure the nvidia-smi command prints a list of all the GPUs installed in your machine.
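A quick way to confirm the count from the terminal (a minimal check; nvidia-smi -L lists one GPU per line):
nvidia-smi -L            # list every detected GPU, one per line
nvidia-smi -L | wc -l    # count them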
Follow this page to install the Nvidia drivers if you have not done so already:
How to install the NVIDIA drivers on Ubuntu 22.04 - Linux Tutorials - Learn Linux Configuration (Source: linuxconfig.org)
Step-1 Install the CUDA Toolkit
💡 Check for any existing CUDA folder at /usr/local/cuda-xx. If one exists, a version of CUDA is already installed. If you already have the desired CUDA toolkit installed (check with the nvcc command in your terminal), please skip to Step-2.
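A minimal sketch of that check in the terminal:
ls /usr/local/ | grep cuda   # shows any existing cuda-xx folders
nvcc --version               # prints the toolkit version currently on PATH (if any)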
Check the CUDA version needed for your desired PyTorch library: Start Locally | PyTorch (we are installing CUDA 12.1).
Go to CUDA Toolkit 12.1 Downloads | NVIDIA Developer to obtain the Linux commands to install CUDA 12.1 (choose your OS version and the corresponding “deb (local)” installer type).
The terminal commands for the base installer will appear according to your chosen options. Copy-paste and run them in your Linux terminal to install the CUDA toolkit. For example, for x86_64 Ubuntu 22.04, open a terminal in the Downloads folder and run the following commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
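Once the installation finishes, the toolkit files should land under /usr/local (a quick sanity check; the folder name matches the version you installed):
ls /usr/local/cuda-12.1      # bin/, lib64/, include/ should be present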
⚠️ While installing the CUDA toolkit, the installer may prompt for a kernel update. If any pop-up appears in the terminal asking to update the kernel, press the esc key to cancel it. Do not update the kernel during this stage; it may break your Nvidia drivers ☠️.
Restart the Linux machine after the installation. The nvcc command will still not work; you must add the CUDA installation to PATH. Open the .bashrc file using the nano editor.
nano /home/$USER/.bashrc
Scroll to the bottom of the .bashrc file and add these two lines:
export PATH="/usr/local/cuda-12.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH"
💡 Note that you may have to change cuda-12.1 to your installed CUDA version, cuda-xx, if needed in the future, with 'xx' being your CUDA version.
Save the changes and close the nano editor:
To save changes, on your keyboard press: ctrl + o --> save
enter or return key --> accept changes
ctrl + x --> close editor
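If you prefer not to reopen the terminal, you can also apply the changes to the current shell directly:
source ~/.bashrc             # reload the updated PATH and LD_LIBRARY_PATH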
Close and reopen the terminal (or use the source command above). Now the nvcc --version command should print the installed CUDA version in your terminal.
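The output should end with a release line matching the toolkit you installed, roughly like this (the exact build number will vary):
nvcc --version
# ...
# Cuda compilation tools, release 12.1, V12.1.xx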
Step-2 Install Miniconda
Before we install PyTorch, it is better to install Miniconda and then install PyTorch inside a Conda environment. It is also helpful to create a new Conda environment for each project.
Open the terminal in the Downloads folder and run the following commands:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

# initialize conda for your shell
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
Close and reopen the terminal. Now the conda command should work.
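A quick check that the installation succeeded:
conda --version              # prints the installed conda version
conda info --envs            # lists environments (only "base" at this point)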
Step-3 Install PyTorch
(Optional) Create a new conda environment for your project. You can replace <environment-name> with a name of your choice; I usually name it after my project. 💡 Use the conda activate <environment-name> command before working on your project, and conda deactivate when you are done.
conda create -n <environment-name> python=3.11

# activate the environment
conda activate <environment-name>
Install the PyTorch library for your CUDA version. The following command is for cuda-12.1, which we installed:
pip3 install torch torchvision torchaudio
The above command is taken from the PyTorch installation guide (Start Locally | PyTorch).
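If the default wheel does not match your CUDA version, PyTorch also publishes CUDA-specific wheel indexes; for example, pinning the cu121 index explicitly looks like this (check Start Locally | PyTorch for the current command for your setup):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121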
After installing PyTorch, check the number of GPUs visible to PyTorch in the terminal.
python
>>> import torch
>>> print(torch.cuda.device_count())
8

This should print the number of GPUs installed in the system (8 in my case), and it should match the number of GPUs listed by the nvidia-smi command.
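For a slightly deeper sanity check, the short script below (a minimal sketch, not part of the original setup) names each visible GPU and runs a small matrix multiplication on each one:

import torch

# enumerate every GPU PyTorch can see and briefly exercise each one
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
    x = torch.randn(1024, 1024, device=f"cuda:{i}")
    y = x @ x                  # small matmul to confirm the device works
    torch.cuda.synchronize(i)  # wait for the kernel to finish
print("all visible GPUs responded")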
Voila! You are all set to start working on your Deep Learning projects that leverage multiple GPUs 🥳.
1. 🤗 To get started, you can clone a popular model from Hugging Face (all three of these starting points are sketched below).
2. 💬 For inference (using LLM models), clone and install exllamav2 in a separate environment. It uses all your GPUs for faster inference. (Check my Medium page for a detailed tutorial.)
3. 👨🏫 For fine-tuning or training, you can clone and install torchtune. Follow the instructions to either full finetune or lora finetune your models, leveraging all your GPUs. (Check my Medium page for a detailed tutorial.)
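The commands below are a hedged sketch of those three starting points; the model repo (gpt2) is only a placeholder, and the exllamav2/torchtune steps follow the generic instructions from their GitHub/PyPI pages, so check each project's README for the current commands:

# 1. clone a public model from Hugging Face (large weight files need git-lfs)
git lfs install
git clone https://huggingface.co/gpt2

# 2. exllamav2 for multi-GPU inference (separate environment recommended)
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
cd ..

# 3. torchtune for full or LoRA fine-tuning
pip install torchtune
tune ls    # lists recipes such as full_finetune_distributed and lora_finetune_distributed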
This guide walked you through the machine setup needed for multi-GPU Deep Learning. You can now start working on any project that leverages multiple GPUs, such as torchtune, for faster development!
Stay tuned for more detailed tutorials on exllamaV2 and torchtune.