Saturday, July 2, 2016

Matrix Multiply On a Beast

The Beast - Scaling up

I spun up a beast instance on Azure:

Standard DS5_v2 (16 Cores, 56 GB memory)

First I changed the executable to run five iterations, stepping the matrix size from 1,000 x 1,000 up to 5,000 x 5,000. Then I changed the data type from float to double. That adds to the memory load, since a double takes 8 bytes rather than 4, and it can add to the processing time as well, because twice as much data has to move through the caches and memory for every multiply. So how fast did the matrix multiply run on the beast?

Test Run

----------------------------------------
Score:

1000: RowOptimized * RowOptimized = NonOptimized - 121.67
1000: RowOptimized * ColumnOptimized = NonOptimized - 125.00
1000: ColumnOptimized * ColumnOptimized = NonOptimized - 131.67
1000: ColumnOptimized * RowOptimized = NonOptimized - 169.67
2000: ColumnOptimized * RowOptimized = NonOptimized - 843.67
2000: RowOptimized * ColumnOptimized = NonOptimized - 863.33
2000: ColumnOptimized * ColumnOptimized = NonOptimized - 892.33
2000: RowOptimized * RowOptimized = NonOptimized - 988.00
3000: ColumnOptimized * RowOptimized = NonOptimized - 3,295.67
3000: RowOptimized * ColumnOptimized = NonOptimized - 3,363.33
3000: RowOptimized * RowOptimized = NonOptimized - 3,565.00
3000: ColumnOptimized * ColumnOptimized = NonOptimized - 3,784.33
4000: RowOptimized * ColumnOptimized = NonOptimized - 10,433.33
4000: ColumnOptimized * RowOptimized = NonOptimized - 10,793.33
4000: RowOptimized * RowOptimized = NonOptimized - 11,515.33
4000: ColumnOptimized * ColumnOptimized = NonOptimized - 11,567.33

----------------------------------------

The data is pretty simple to read. Take the first line:

1000 : RowOptimized * RowOptimized = NonOptimized - 121.67

  • 1000 is the size of the matrix - 1,000 by 1,000 in this case
  • RowOptimized * RowOptimized = NonOptimized
    • Indicates the matrix type of the two matrices being multiplied and of the result
    • "Row Optimized" means that it is laid out in memory to optimize access by row
    • "Column Optimized" is optimized for access by column
    • "Non Optimized" indicates that there are no optimizations if you call the Multiply method on this object. It is laid out in memory so that it would be considered row optimized, but there are no compute optimizations in the code.
  • 121.67 is the number of milliseconds that it took to complete the multiplication.

Pretty amazing, really. With a lot of horsepower you can multiply a lot of matrices, and you can multiply them with reasonable speed and efficiency.
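
To make the "Row Optimized" and "Column Optimized" labels concrete, here is a minimal sketch of the two memory layouts. It is not the code behind the scores above: the `FlatMatrix` class, `from_rows` helper and `multiply` function are hypothetical names made up for illustration, and the sketch assumes a plain triple-loop multiply over a flat array.

```python
class FlatMatrix:
    """A matrix stored in one flat list, in row-major or column-major order."""

    def __init__(self, rows, cols, row_major=True):
        self.rows = rows
        self.cols = cols
        self.row_major = row_major
        self.data = [0.0] * (rows * cols)

    def _offset(self, r, c):
        # Row-major keeps each row contiguous ("row optimized");
        # column-major keeps each column contiguous ("column optimized").
        if self.row_major:
            return r * self.cols + c
        return c * self.rows + r

    def get(self, r, c):
        return self.data[self._offset(r, c)]

    def set(self, r, c, value):
        self.data[self._offset(r, c)] = value


def from_rows(values, row_major=True):
    """Build a FlatMatrix from a list of rows (hypothetical helper)."""
    m = FlatMatrix(len(values), len(values[0]), row_major)
    for r, row in enumerate(values):
        for c, value in enumerate(row):
            m.set(r, c, value)
    return m


def multiply(a, b):
    """Plain triple-loop multiply; the result is stored row-major."""
    result = FlatMatrix(a.rows, b.cols, row_major=True)
    for i in range(a.rows):
        for j in range(b.cols):
            total = 0.0
            for k in range(a.cols):
                # The inner loop walks row i of a and column j of b.
                total += a.get(i, k) * b.get(k, j)
            result.set(i, j, total)
    return result
```

With a row-major left operand and a column-major right operand, both inner-loop walks step through neighbouring elements of the flat list. In a compiled language that is the kind of access pattern caches reward; pure Python will not show the effect, so treat this only as a picture of the layout.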

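Notice that once you get into very large matrices, the sheer volume of data being accessed slows things down quite a bit. Going from 1,000 to 4,000 makes each side of the matrix 4 times longer, but the best time goes from 121.67 ms to 10,433.33 ms, which is roughly 86 times longer.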
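
One way to separate "more arithmetic" from "more data" is to compare the measured slowdown with what the operation count alone would predict. The sketch below assumes a straightforward triple-loop multiply, which does on the order of N³ multiply-adds; whether the code behind the scores works that way is an assumption. The timings are the best score for each size from the test run above, and `scaling_report` is a hypothetical helper name.

```python
# Best time in milliseconds for each matrix size, taken from the test run.
best_ms = {1000: 121.67, 2000: 843.67, 3000: 3295.67, 4000: 10433.33}


def scaling_report(timings):
    """Compare the measured slowdown between sizes with cubic growth."""
    sizes = sorted(timings)
    report = []
    for small, large in zip(sizes, sizes[1:]):
        measured = timings[large] / timings[small]
        cubic = (large / small) ** 3  # growth in multiply-adds for a naive multiply
        report.append((small, large, measured, cubic))
    return report


for small, large, measured, cubic in scaling_report(best_ms):
    print(f"{small} -> {large}: measured x{measured:.2f}, cubic x{cubic:.2f}")
```

On these numbers the first step grows a little less than the cubic prediction, while the 2,000 to 3,000 and 3,000 to 4,000 steps both grow faster than it. That fits the idea that memory traffic, not just arithmetic, starts to dominate at the larger sizes.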

A 1,000 by 1,000 matrix using doubles is 8,000,000 bytes. Change that to a 2,000 by 2,000 matrix and the size quadruples to 32,000,000 bytes.

Matrix Size    Bytes
1,000            8,000,000
2,000           32,000,000
3,000           72,000,000
4,000          128,000,000
5,000          200,000,000

A 4,000 by 4,000 matrix takes up 128 megabytes of memory, and a multiply touches three of them: the two inputs and the result. I guess that helps explain the time it takes to multiply that beast. It's a good thing we have beasts to do the calculations on.
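
The sizes above are just N x N x 8 bytes. A small sketch of that arithmetic, assuming a dense square matrix with no per-object overhead; `matrix_bytes` and `multiply_working_set` are hypothetical helper names:

```python
import struct

# Standard sizes of IEEE 754 double and float, in bytes (8 and 4).
DOUBLE_BYTES = struct.calcsize("<d")
FLOAT_BYTES = struct.calcsize("<f")


def matrix_bytes(n, element_bytes=DOUBLE_BYTES):
    """Raw storage for a dense n x n matrix, ignoring any object overhead."""
    return n * n * element_bytes


def multiply_working_set(n, element_bytes=DOUBLE_BYTES):
    """Two input matrices plus one result matrix of the same size."""
    return 3 * matrix_bytes(n, element_bytes)


for n in (1000, 2000, 3000, 4000, 5000):
    print(f"{n:>5,} {matrix_bytes(n):>12,} {multiply_working_set(n):>12,}")
```

Passing `FLOAT_BYTES` as the second argument gives the float figures for comparison, which are half as large at every size.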

Next

Soon I may try moving this code onto graphics hardware, which I hope will be more powerful. I think it would be interesting to see what that could do for speed.