However, using the GPU also has a downside: Metal only thinks in terms of textures and images.

Using images makes a lot of sense for convolutional networks, but you’ll still have to convert the output from your network into something you can use from Swift. And what about neural nets that work on non-image data such as audio or text?

Forge makes it much easier to convert your data to and from these textures.
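To get a feel for what such a conversion involves, here is a minimal sketch in plain Swift. It is not Forge's API: `packIntoSlices` and `unpackFromSlices` are hypothetical helper names invented for this illustration. The sketch assumes the layout that Metal Performance Shaders uses for images with more than 4 feature channels, where channels are grouped four at a time into the RGBA components of successive texture slices, and it assumes the input array is in height, width, channel order. It only rearranges arrays on the CPU and never touches a real texture.

```swift
// Hypothetical helpers for illustration only; not part of Forge or Metal.

/// Packs values stored in (height, width, channel) order into "slices" of
/// 4 channels each, padding the last slice with zeros when needed.
func packIntoSlices(_ values: [Float], width: Int, height: Int, channels: Int) -> [[Float]] {
    precondition(values.count == width * height * channels, "unexpected element count")
    let sliceCount = (channels + 3) / 4   // round up to a multiple of 4 channels
    var slices = [[Float]](repeating: [Float](repeating: 0, count: width * height * 4),
                           count: sliceCount)
    for y in 0..<height {
        for x in 0..<width {
            for c in 0..<channels {
                let src = (y * width + x) * channels + c   // position in the flat input
                let dst = (y * width + x) * 4 + (c % 4)    // RGBA component inside the slice
                slices[c / 4][dst] = values[src]
            }
        }
    }
    return slices
}

/// Reverses packIntoSlices, dropping the zero padding.
func unpackFromSlices(_ slices: [[Float]], width: Int, height: Int, channels: Int) -> [Float] {
    precondition(slices.count == (channels + 3) / 4, "unexpected slice count")
    var values = [Float](repeating: 0, count: width * height * channels)
    for y in 0..<height {
        for x in 0..<width {
            for c in 0..<channels {
                let src = (y * width + x) * 4 + (c % 4)
                values[(y * width + x) * channels + c] = slices[c / 4][src]
            }
        }
    }
    return values
}

// A non-image example: a 6-element feature vector treated as a 1x1 "image"
// with 6 channels, which needs two slices.
let features: [Float] = [1, 2, 3, 4, 5, 6]
let slices = packIntoSlices(features, width: 1, height: 1, channels: 6)
let restored = unpackFromSlices(slices, width: 1, height: 1, channels: 6)
```

Treating a plain vector as a 1×1 image with many channels is one way non-image data such as audio or text features can fit a texture-based pipeline. The padding and index arithmetic are exactly the kind of bookkeeping you would rather not write by hand.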

Source: Forge: neural network toolkit for Metal