Design an Easy-to-Use Deep Learning Framework | by Haifeng Jin | Apr, 2024


The three software design principles I learned as an open-source contributor

Photo by Sheldon on Unsplash

Deep learning frameworks are extremely transitory. If you compare the deep learning frameworks people use today with what they were eight years ago, you will find the landscape is completely different. There were Theano, Caffe2, and MXNet, which all went obsolete. Today's most popular frameworks, like TensorFlow and PyTorch, had only just been released to the public.

Through all these years, Keras has survived as a high-level user-facing library supporting different backends, including TensorFlow, PyTorch, and JAX. As a contributor to Keras, I learned how much the team cares about the user experience of the software, and how they ensured a good user experience by following a few simple yet powerful principles in their design process.

In this article, I will share the three most important software design principles I learned by contributing to Keras over the past few years. They are generalizable to all types of software and can help you make an impact in the open-source community with yours.

Why user experience is important for open-source software

Before we dive into the main content, let's quickly discuss why user experience is so important. We can learn this from the PyTorch vs. TensorFlow case.

They were developed by two tech giants, Meta and Google, and reflect quite different cultural strengths. Meta is good at product, while Google is good at engineering. As a result, Google's frameworks, like TensorFlow and JAX, are the fastest to run and technically superior to PyTorch, as they support sparse tensors and distributed training well. However, PyTorch still took away half of the market share from TensorFlow because it prioritizes user experience over other aspects of the software.

Better user experience wins over the research scientists who build the models, and it propagates to the engineers who take the models from them, since the engineers do not always want to convert the models they receive from the research scientists to another framework. They build new software around PyTorch to smooth their workflow, which establishes a software ecosystem around PyTorch.

TensorFlow also made a few blunders that caused it to lose users. TensorFlow's general user experience is good. However, its installation guide for GPU support was broken for years before it was fixed in 2022. TensorFlow 2 broke backward compatibility, which cost its users millions of dollars to migrate.

So, the lesson we learned here is that, despite technical superiority, user experience decides which software open-source users choose.

All deep learning frameworks invest heavily in user experience

All the deep learning frameworks (TensorFlow, PyTorch, and JAX) invest heavily in user experience. Good evidence is that they all have a relatively high Python share in their codebases.

All the core logic of deep learning frameworks, including tensor operations, automatic differentiation, compilation, and distribution, is implemented in C++. So why would they want to expose a set of Python APIs to the users? It is simply because the users love Python, and the frameworks want to polish their user experience.

Investing in user experience has a high ROI

Imagine how much engineering effort it takes to make your deep learning framework a little bit faster than the others. A lot.

However, a better user experience is achievable as long as you follow a certain design process and a few principles. For attracting more users, your user experience is as important as the computing efficiency of your framework. So, investing in user experience has a high return on investment (ROI).

The three principles

I will share the three important software design principles I learned by contributing to Keras, each with good and bad code examples from different frameworks.

Principle 1: Design end-to-end workflows

When we think about designing the APIs of a piece of software, it may look like this.

class Model:
    def __call__(self, input):
        """The forward call of the model.

        Args:
            input: A tensor. The input to the model.
        """
        pass

Define the class and add the documentation. Now, we know all the class names, method names, and arguments. However, this does not help us understand much about the user experience.

What we should do instead is something like this.

input = keras.Input(shape=(10,))
x = layers.Dense(32, activation='relu')(input)
output = layers.Dense(10, activation='softmax')(x)
model = keras.models.Model(inputs=input, outputs=output)
model.compile(
    optimizer='adam', loss='categorical_crossentropy'
)

We want to write out the entire user workflow of using the software. Ideally, it should be a tutorial on how to use the software. It gives far more information about the user experience. It can help us spot many more UX problems during the design phase compared with just writing out the classes and methods.

Let's look at another example. This is how I discovered a user experience problem by following this principle when implementing KerasTuner.

When using KerasTuner, users can use this RandomSearch class to select the best model. We have the metrics and objectives in the arguments. By default, objective equals validation loss. So, it helps us find the model with the smallest validation loss.

class RandomSearch:
    def __init__(self, ..., metrics, objective="val_loss", ...):
        """The initializer.

        Args:
            metrics: A list of Keras metrics.
            objective: String or a custom metric function. The
                name of the metric we want to minimize.
        """
        pass

Again, it does not provide much information about the user experience. So, everything looks OK for now.

However, if we write an end-to-end workflow like the following, it exposes many more problems. The user is trying to define a custom metric function named custom_metric. The objective is not so simple to use anymore. What should we pass to the objective argument now?

tuner = RandomSearch(
    ...,
    metrics=[custom_metric],
    objective="val_???",
)

It should be just "val_custom_metric": the "val_" prefix plus the name of the metric function. That is not intuitive enough. We want to make it better instead of forcing the user to learn this. We spotted a user experience problem simply by writing out this workflow.
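To make the convention concrete, here is a minimal plain-Python sketch of how such an objective string is assembled from the metric function's name (the names are hypothetical; this is an illustration of the convention, not KerasTuner's actual internals):

```python
# Hypothetical sketch of the naming convention: the objective
# string is the "val_" prefix plus the metric function's name.
def custom_metric(y_true, y_pred):
    # Placeholder body; a real metric would compute a score here.
    return 0.0

objective = "val_" + custom_metric.__name__
print(objective)  # val_custom_metric
```

The string only works if the user already knows (or guesses) this gluing rule, which is exactly the problem.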

If you wrote the design even more comprehensively, by including the implementation of the custom_metric function, you would find that you also need to learn how to write a Keras custom metric. You have to follow the function signature to make it work, as shown in the following code snippet.

def custom_metric(y_true, y_pred):
    squared_diff = ops.square(y_true - y_pred)
    return ops.mean(squared_diff, axis=-1)

After discovering this problem, we specially designed a better workflow for custom metrics. You only need to override HyperModel.fit() to compute your custom metric and return it. No strings to name the objective. No function signature to follow. Just a return value. The user experience is much better now.

class MyHyperModel(HyperModel):
    def fit(self, trial, model, validation_data):
        x_val, y_true = validation_data
        y_pred = model(x_val)
        return custom_metric(y_true, y_pred)

tuner = RandomSearch(MyHyperModel(), max_trials=20)

One more thing to remember: we should always start from the user experience. The designed workflows backpropagate to the implementation.

Principle 2: Minimize cognitive load

Do not force the user to learn anything unless it is really necessary. Let's see some good examples.

The Keras modeling API is a good example, shown in the following code snippet. Model builders already have these concepts in mind: a model is a stack of layers, it needs a loss function, we can fit it with data or make it predict on data.

model = keras.Sequential([
    layers.Dense(10, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(loss='categorical_crossentropy')
model.fit(...)
model.predict(...)

So basically, no new concepts need to be learned to use Keras.

Another good example is PyTorch modeling. The code executes just like Python code. All tensors are just real tensors with real values. You can depend on the value of a tensor to decide your path with plain Python code.

class MyModel(nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return self.path_a(x)
        return self.path_b(x)

You can also do this with Keras on the TensorFlow or JAX backend, but it has to be written differently. All the if conditions need to be written with the ops.cond function, as shown in the following code snippet.

class MyModel(keras.Model):
    def call(self, inputs):
        return ops.cond(
            ops.sum(inputs) > 0,
            lambda: self.path_a(inputs),
            lambda: self.path_b(inputs),
        )

This is teaching the user a new op instead of the if-else clause they are familiar with, which is bad. In compensation, it brings significant improvement in training speed.

Here is the catch of the flexibility of PyTorch: if you ever need to optimize the memory and speed of your model, you have to do it yourself using the following APIs and new concepts, including the inplace arguments for the ops, the parallel op APIs, and explicit device placement. This introduces a fairly high learning curve for the users.

torch.relu(x, inplace=True)   # inplace via argument
x = torch._foreach_add(x, y)  # parallel op API
torch._foreach_add_(x, y)     # inplace parallel op
x = x.cuda()                  # explicit device placement

Some other good examples are keras.ops, tf.experimental.numpy, and jax.numpy. They are just reimplementations of the numpy API. When introducing some cognitive load, just reuse what people already know. Every framework has to provide some low-level ops. Instead of making people learn a new set of APIs, which may contain a hundred functions, they reuse the most popular existing API for it. The numpy APIs are well-documented and have tons of Stack Overflow questions and answers related to them.
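As a rough illustration (assuming numpy is installed; the keras.ops and jax.numpy equivalents are shown as comments since they require those frameworks), the call users already know carries over unchanged:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
# The familiar numpy signature:
print(np.mean(a, axis=-1))  # [1.5 3.5]

# keras.ops.mean(a, axis=-1) and jax.numpy.mean(a, axis=-1)
# mirror the same name and signature, so there is no new API
# to memorize.
```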

The worst thing you can do with user experience is to trick the users: to make them believe your API is something they are familiar with when it is not. I will give two examples, one from PyTorch and the other from TensorFlow.

What should we pass as the pad argument of the F.pad() function if we want to pad an input tensor of shape (100, 3, 32, 32) to (100, 3, 1+32+1, 2+32+2), i.e. (100, 3, 34, 36)?

import torch
import torch.nn.functional as F

# pad the 32x32 images to (1+32+1)x(2+32+2)
# (100, 3, 32, 32) to (100, 3, 34, 36)
out = F.pad(
    torch.empty(100, 3, 32, 32),
    pad=???,
)

My first intuition is that it should be ((0, 0), (0, 0), (1, 1), (2, 2)), where each sub-tuple corresponds to one of the four dimensions, and the two numbers are the padding size before and after the existing values. My guess originates from the numpy API.

However, the correct answer is (2, 2, 1, 1). There are no sub-tuples, just one plain tuple. Moreover, the dimensions are reversed: the last dimension goes first.
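For comparison, here is where that intuition comes from: numpy's pad takes one (before, after) pair per dimension, in dimension order (a sketch assuming numpy is available):

```python
import numpy as np

x = np.zeros((100, 3, 32, 32))
# One (before, after) pair per dimension, first dimension first:
out = np.pad(x, ((0, 0), (0, 0), (1, 1), (2, 2)))
print(out.shape)  # (100, 3, 34, 36)

# torch.nn.functional.pad instead flattens the pairs and reverses
# the dimension order: pad=(2, 2, 1, 1).
```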

The following is a bad example from TensorFlow. Can you guess the output of the following code snippet?

import tensorflow as tf

value = True

@tf.function
def get_value():
    return value

value = False
print(get_value())

Without the tf.function decorator, the output would be False, which is pretty straightforward. However, with the decorator, the output is True. This is because TensorFlow compiles the function, and any Python variable is compiled into a new constant. Changing the old variable's value does not affect the created constant.
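The difference can be sketched in plain Python, no TensorFlow required (the default-argument trick below stands in, loosely, for tf.function baking the variable into a constant at trace time; it is an analogy, not TensorFlow's mechanism):

```python
value = True

def get_value_late():
    return value  # plain Python looks `value` up at call time

def get_value_snapshot(value=value):
    # The default argument snapshots `value` at definition time,
    # loosely analogous to tf.function compiling it into a constant.
    return value

value = False
print(get_value_late())      # False
print(get_value_snapshot())  # True
```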

It tricks the user into believing it is the Python code they are familiar with, but actually, it is not.

Principle 3: Interaction over documentation

No one likes to read long documentation if they can figure things out just by running some example code and tweaking it themselves. So, we try to make the user workflow of the software follow the same logic.

Here is a good example, shown in the following code snippet. In PyTorch, all methods with a trailing underscore are inplace ops, while the ones without are not. From an interactive perspective, this is good: it is easy to follow, and the users do not need to check the docs whenever they want the inplace version of a method. However, of course, it introduces some cognitive load: the users need to know what inplace means and when to use it.

x = x.add(y)  # returns a new tensor
x.add_(y)     # inplace
x = x.mul(y)
x.mul_(y)

Another good example is the Keras layers. They strictly follow the same naming convention, as shown in the following code snippet. With a clear naming convention, the users can easily remember the layer names without checking the documentation.

from keras import layers

layers.MaxPooling2D()
layers.GlobalMaxPooling1D()
layers.GlobalAveragePooling3D()

Another important part of the interaction between the user and the software is the error message. You cannot expect the user to write everything correctly the very first time. We should always do the necessary checks in the code and try to print helpful error messages.

Let's look at the two examples shown in the following code snippet. The first one carries little information: it just says tensor shape mismatch. The second one contains much more useful information for the user to find the bug. It not only tells you the error is due to a tensor shape mismatch, but it also shows the expected shape and the wrong shape it received. If you did not mean to pass that shape, you now have a better idea of the bug.

# Bad example:
raise ValueError("Tensor shape mismatch.")

# Good example:
raise ValueError(
    "Tensor shape mismatch. "
    "Expected: (batch, num_features). "
    f"Received: {x.shape}"
)

The best error message directly points the user to the fix. The following code snippet shows a general Python error message. It guessed what was wrong with the code and directly pointed the user to the fix.

import math

math.sqr(4)
# AttributeError: module 'math' has no attribute 'sqr'. Did you mean: 'sqrt'?

Final words

So far, we have covered the three most valuable software design principles I learned while contributing to the deep learning frameworks. First, write end-to-end workflows to discover more user experience problems. Second, reduce cognitive load and do not teach the user anything unless necessary. Third, follow the same logic in your API design and throw meaningful error messages, so that the users can learn your software by interacting with it instead of constantly checking the documentation.

However, there are many more principles to follow if you want to make your software even better. You can refer to the Keras API design guidelines as a complete API design guide.
