Apple M2 Max GPU vs Nvidia V100 (Part 2): Big Models and Energy Efficiency | by Fabrice Daniel | Feb, 2024



Comparing Apple Silicon M2 Max GPU performance and energy efficiency to the Nvidia V100 for training big CNN models with TensorFlow

In my previous article, I compared the M2 Max GPU with the Nvidia V100, P100, and T4 on MLP, CNN, and LSTM training. The results show that the M2 Max can perform very well, exceeding Nvidia GPUs on small model training. But as stated in that article:

[…] these metrics can only be considered for similar neural network types and depths as used in this test.

So this second part tests bigger models, focusing on CNNs only and comparing the M2 Max with the most powerful GPU previously tested: the Nvidia V100.

Another point considered in this test is memory management. While the Nvidia GPU loses a lot of time in memory transfer, the M2 Max GPU has direct access to unified memory, so it requires no delay before training the model. As the results in the previous article show, this makes a big difference for small models trained over a small number of epochs, so here we remove this effect for bigger models in order to compare pure training time only.

For this purpose, we train models for ten epochs, but instead of using the total training time, we capture and average the step training duration from the second epoch to the last one. This removes the initialization and memory-transfer overhead, which is also partially reflected in the first epoch.
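The article does not publish its benchmarking code, so here is a minimal, framework-agnostic sketch of this timing scheme (class and property names are hypothetical; in TensorFlow it would typically be wrapped in a `tf.keras.callbacks.Callback`):

```python
import time


class EpochTimer:
    """Records per-epoch durations, then averages them while excluding
    the first epoch, which carries initialization and memory-transfer
    overhead on top of the pure training time."""

    def __init__(self):
        self.durations = []  # one entry per completed epoch, in seconds

    def on_epoch_begin(self):
        self._start = time.perf_counter()

    def on_epoch_end(self):
        self.durations.append(time.perf_counter() - self._start)

    @property
    def mean_epoch_ms(self):
        # Drop epoch 1, average the remaining "steady-state" epochs.
        steady = self.durations[1:]
        return 1000.0 * sum(steady) / len(steady)
```

With ten epochs, `mean_epoch_ms` averages epochs 2 through 10, so a slow first epoch (e.g. due to memory transfer on the V100) no longer skews the comparison.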

And the last, but nowadays most important, point is the energy consumed by the GPUs to train a big model. As we'll show here, this is where the M2 Max is a real game changer.

In this article, you will find the following tests:

  • Training four custom CNNs ranging from 122,570 to 1,649,482 parameters on CIFAR-10¹ with batch sizes ranging from 32 to 1024
  • Training a ResNet50 model on CIFAR-10 with batch sizes ranging from 32 to 1024
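One detail worth keeping in mind when comparing epoch durations across batch sizes: CIFAR-10 has 50,000 training images, so the number of optimizer steps per epoch shrinks as the batch size grows. A quick sketch of this arithmetic (the constant and function name are mine, not from the article):

```python
import math

CIFAR10_TRAIN_IMAGES = 50_000  # CIFAR-10 training set size


def steps_per_epoch(batch_size, n_samples=CIFAR10_TRAIN_IMAGES):
    """Number of gradient steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


for bs in (32, 64, 128, 256, 512, 1024):
    print(f"batch size {bs:>4}: {steps_per_epoch(bs):>5} steps/epoch")
```

Going from batch size 32 to 1024 cuts the step count by roughly 32x (1563 steps down to 49), which is why both GPUs tend to show shorter epochs, but different scaling behavior, at larger batch sizes.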

Then, in both cases, I'll compare:

  • the raw training performance (epoch duration in milliseconds)
  • the energy consumption per epoch
  • the energy efficiency ratio between the two GPUs
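The last two metrics follow directly from the first once the average power draw of each GPU is known: energy is power multiplied by time, and the efficiency ratio is just the quotient of the two energies. A sketch of that arithmetic (function names are illustrative, not from the article):

```python
def energy_per_epoch_wh(avg_power_watts, epoch_duration_ms):
    """Energy for one epoch in watt-hours: P (W) x t (h)."""
    hours = epoch_duration_ms / 1000.0 / 3600.0
    return avg_power_watts * hours


def efficiency_ratio(energy_gpu_a, energy_gpu_b):
    """How many times more energy GPU A needs per epoch than GPU B."""
    return energy_gpu_a / energy_gpu_b


# Example with made-up numbers: a 300 W GPU running a 60 s epoch
# versus a 30 W GPU running a 90 s epoch.
e_a = energy_per_epoch_wh(300, 60_000)  # 5.0 Wh
e_b = energy_per_epoch_wh(30, 90_000)   # 0.75 Wh
print(f"ratio: {efficiency_ratio(e_a, e_b):.2f}x")
```

Note that the faster GPU is not automatically the more efficient one: a chip drawing ten times the power can lose on energy per epoch even while winning on raw epoch duration.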
