Mannequin Evaluations Versus Process Evaluations | by Aparna Dhinakaran

Machine Learning

Mannequin Evaluations Versus Process Evaluations | by Aparna Dhinakaran | Mar, 2024

hhhhm

2024年3月27日

Mannequin Evaluations Versus Process Evaluations | by Aparna Dhinakaran | Mar, 2024

[ad_1]

Picture created by creator utilizing Dall-E 3

Understanding the distinction for LLM purposes

For a second, think about an airplane. What springs to thoughts? Now think about a Boeing 737 and a V-22 Osprey. Each are plane designed to maneuver cargo and other people, but they serve completely different functions — another common (industrial flights and freight), the opposite very particular (infiltration, exfiltration, and resupply missions for particular operations forces). They give the impression of being far completely different as a result of they’re constructed for various actions.

With the rise of LLMs, we now have seen our first really general-purpose ML fashions. Their generality helps us in so some ways:

The identical engineering workforce can now do sentiment evaluation and structured information extraction
Practitioners in lots of domains can share information, making it attainable for the entire trade to learn from one another’s expertise
There’s a variety of industries and jobs the place the identical expertise is beneficial

However as we see with plane, generality requires a really completely different evaluation from excelling at a specific activity, and on the finish of the day enterprise worth usually comes from fixing specific issues.

It is a good analogy for the distinction between mannequin and activity evaluations. Mannequin evals are targeted on total common evaluation, however activity evals are targeted on assessing efficiency of a specific activity.