Introduction
I’ve always thought that even the best project in the world isn’t worth much if people can’t use it. That’s why it is so important to learn how to deploy Machine Learning models. In this article we focus on deploying a small large language model, TinyLlama, on an AWS EC2 instance.
List of tools I’ve used for this project:
- Deepnote: a cloud-based notebook that’s great for collaborative data science projects, good for prototyping
- FastAPI: a web framework for building APIs with Python
- AWS EC2: a web service that provides resizable compute capacity in the cloud
- Nginx: an HTTP and reverse proxy server. I use it to connect the FastAPI server to AWS
- GitHub: a hosting service for software projects
- HuggingFace: a platform to host and collaborate on unlimited models, datasets, and applications.
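To give a taste of how these pieces fit together, here is a minimal Nginx sketch that proxies incoming traffic on port 80 to a FastAPI app served by Uvicorn on port 8000. The file path, server name, and port are assumptions; adjust them to your EC2 setup:

```nginx
# /etc/nginx/sites-available/fastapi_app  (path, hostname, and port are assumptions)
server {
    listen 80;
    server_name your-ec2-public-dns;  # placeholder: your EC2 public hostname

    location / {
        # Forward every request to the Uvicorn process serving FastAPI
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

After symlinking the file into `sites-enabled` and reloading Nginx, requests to the instance’s public address reach the FastAPI server.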
About Tiny Llama
TinyLlama-1.1B is a project aiming to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. It uses the same architecture as Llama 2.
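Because TinyLlama shares Llama 2’s architecture, its chat checkpoint also expects a specific prompt layout. Below is a hedged sketch, in plain Python, of the Zephyr-style template that the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model card describes; the exact tags are an assumption here, so verify them against the checkpoint you deploy:

```python
def build_tinyllama_prompt(system: str, user: str) -> str:
    """Assemble a Zephyr-style chat prompt for TinyLlama-1.1B-Chat.

    The <|system|>/<|user|>/<|assistant|> tags are taken from the model
    card; double-check them for the exact checkpoint you deploy.
    """
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_tinyllama_prompt(
    system="You are a helpful assistant.",
    user="What is AWS EC2?",
)
print(prompt)
```

In practice you would let the tokenizer’s `apply_chat_template` method build this string for you, but seeing it spelled out makes it clear what the model actually receives.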
Today’s large language models have impressive capabilities but are extremely expensive in terms of hardware. In many settings hardware is limited: think smartphones or satellites. So there is a lot of research into creating smaller models so they can be deployed on the edge.
Here is a list of “small” models that are catching on:
- MobileVLM (Multimodal)
- Phi-2
- Obsidian (Multimodal)