Introduction
I’ve always thought that even the best project in the world isn’t worth much if people can’t use it. That’s why it is so important to learn how to deploy Machine Learning models. In this article we focus on deploying a small large language model, TinyLlama, on an AWS EC2 instance.
List of tools I’ve used for this project:
- Deepnote: a cloud-based notebook that’s great for collaborative data science projects, good for prototyping
- FastAPI: a web framework for building APIs with Python
- AWS EC2: a web service that provides resizable compute capacity in the cloud
- Nginx: an HTTP and reverse proxy server. I use it to connect the FastAPI server to AWS
- GitHub: a hosting service for software projects
- HuggingFace: a platform to host and collaborate on unlimited models, datasets, and applications.
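To give a taste of how these pieces fit together, here is a minimal Nginx sketch that proxies incoming traffic on port 80 to a FastAPI app served by Uvicorn on port 8000. The file path, server name, and port are assumptions; adjust them to your EC2 setup:

```nginx
# /etc/nginx/sites-available/fastapi_app  (path, hostname, and port are assumptions)
server {
    listen 80;
    server_name your-ec2-public-dns;  # placeholder: your EC2 public hostname

    location / {
        # Forward every request to the Uvicorn process serving FastAPI
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

After symlinking the file into `sites-enabled` and reloading Nginx, requests to the instance’s public address reach the FastAPI server.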
About Tiny Llama
TinyLlama-1.1B is a project aiming to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. It uses the same architecture as Llama 2.
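Because TinyLlama shares Llama 2’s architecture, its chat checkpoint also expects a specific prompt layout. Below is a hedged sketch, in plain Python, of the Zephyr-style template that the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model card describes; the exact tags are an assumption here, so verify them against the checkpoint you deploy:

```python
def build_tinyllama_prompt(system: str, user: str) -> str:
    """Assemble a Zephyr-style chat prompt for TinyLlama-1.1B-Chat.

    The <|system|>/<|user|>/<|assistant|> tags are taken from the model
    card; double-check them for the exact checkpoint you deploy.
    """
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_tinyllama_prompt(
    system="You are a helpful assistant.",
    user="What is AWS EC2?",
)
print(prompt)
```

In practice you would let the tokenizer’s `apply_chat_template` method build this string for you, but seeing it spelled out makes it clear what the model actually receives.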
Today’s large language models have impressive capabilities but are extremely expensive in terms of hardware. In many settings hardware is limited: think smartphones or satellites. So there is a lot of research into creating smaller models so they can be deployed on the edge.
Here is a list of “small” models that are catching on:
- MobileVLM (Multimodal)
- Phi-2
- Obsidian (Multimodal)