LLMs for Everybody: Running the LLaMA-13B Model and LangChain in Google Colab | by Dmitrii Eliuseev | Jan, 2024


Experimenting with Large Language Models for Free (Part 2)

Image by Glib Albovsky, Unsplash

In the first part of the story, we used a free Google Colab instance to run a Mistral-7B model and extract information using the FAISS (Facebook AI Similarity Search) database. In this part, we will go further, and I will show how to run a LLaMA 2 13B model; we will also test some extra LangChain functionality, such as making chat-based applications and using agents. In the same way as in the first part, all components are based on open-source projects and are completely free to use.

Let’s get into it!

LLaMA.cpp

LLaMA.CPP is a very interesting open-source project, originally designed to run a LLaMA model on MacBooks, but its functionality grew far beyond that. First, it is written in plain C/C++ without external dependencies and can run on any hardware (CUDA, OpenCL, and Apple Silicon are supported; it can even work on a Raspberry Pi). Second, LLaMA.CPP can be linked with LangChain, which allows us to test a lot of its functionality for free without having an OpenAI key. Last but not least, because LLaMA.CPP works everywhere, it is a good candidate to run in a free Google Colab instance. As a reminder, Google provides free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM, which can be opened from the Colab Research page. The code is opened in the web browser and runs in the cloud, so everybody can access it, even from a minimalistic budget PC.

Before using LLaMA, let’s install the library. The installation itself is easy; we only need to enable LLAMA_CUBLAS before using pip:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False

For the first test, I will be using a 7B model. Here, I also installed the huggingface-hub library, which allows us to automatically download the “Llama-2-7b-Chat” model in the GGUF format needed for LLaMA.CPP. I also installed LangChain together with the sentence-transformers library, which we will need later.
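To check that the downloaded file works, it can be loaded through LangChain’s LlamaCpp wrapper. The snippet below is only a minimal sketch: the n_gpu_layers and n_ctx values are reasonable defaults I assume here, not taken from a specific setup, and may need adjustment for larger models or different GPUs.

from langchain.llms import LlamaCpp

# Load the GGUF file downloaded by huggingface-cli above
llm = LlamaCpp(
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the Colab GPU (assumed setting)
    n_ctx=2048,        # context window size (assumed setting)
    temperature=0.1,
    verbose=True,
)

# Simple smoke test: ask the model a question and print the raw completion
print(llm.invoke("Name the planets of the Solar System."))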
