I’ve managed to create a couple of libraries that can achieve the same speed as carefully optimized vectorized C code, whilst being concise and easy to use.

Read more