The Beast - Scaling up
I spun up a beast instance on Azure: a Standard DS5_v2 (16 cores, 56 GB of memory).
First I changed the executable to run five sizes, from 1,000 x 1,000 up to 5,000 x 5,000. Then I changed the data type from float to double. That doubles the memory load, since a double takes 8 bytes rather than 4. It adds to the processing time as well, because there is twice as much data to move and doubles can be slower to multiply. So how fast did the matrix multiply run on the beast?
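The post doesn't show the executable itself. The Python sketch below shows the kind of harness described: step through several sizes, fill two matrices of doubles, and time one multiply per size. The names `naive_multiply` and `run_benchmark` are hypothetical. The multiply is the plain triple loop, not the author's optimized classes.

```python
import time
from array import array


def naive_multiply(a, b, n):
    """C = A * B for n x n row-major matrices stored as flat arrays of doubles.

    Classic i-j-k triple loop with no layout or compute tricks (hypothetical helper).
    """
    c = array("d", [0.0]) * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def run_benchmark(sizes, multiply=naive_multiply):
    """Time one multiply per size; returns {size: elapsed milliseconds}."""
    results = {}
    for n in sizes:
        a = array("d", [1.0]) * (n * n)  # "d" = C double, 8 bytes per element
        b = array("d", [2.0]) * (n * n)
        start = time.perf_counter()
        multiply(a, b, n)
        results[n] = (time.perf_counter() - start) * 1000.0
    return results


if __name__ == "__main__":
    # The post's executable stepped from 1,000 to 5,000; pure Python is far too
    # slow for that, so this demo uses small sizes with the same loop shape.
    for n, ms in run_benchmark([25, 50, 100]).items():
        print(f"{n}: {ms:,.2f} ms")
```

Passing the multiply in as a parameter lets the same loop time different implementations side by side. This mirrors the four layout combinations in the results below.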
Test Run
----------------------------------------
Score:
1000: RowOptimized * RowOptimized = NonOptimized - 121.67
1000: RowOptimized * ColumnOptimized = NonOptimized - 125.00
1000: ColumnOptimized * ColumnOptimized = NonOptimized - 131.67
1000: ColumnOptimized * RowOptimized = NonOptimized - 169.67
2000: ColumnOptimized * RowOptimized = NonOptimized - 843.67
2000: RowOptimized * ColumnOptimized = NonOptimized - 863.33
2000: ColumnOptimized * ColumnOptimized = NonOptimized - 892.33
2000: RowOptimized * RowOptimized = NonOptimized - 988.00
3000: ColumnOptimized * RowOptimized = NonOptimized - 3,295.67
3000: RowOptimized * ColumnOptimized = NonOptimized - 3,363.33
3000: RowOptimized * RowOptimized = NonOptimized - 3,565.00
3000: ColumnOptimized * ColumnOptimized = NonOptimized - 3,784.33
4000: RowOptimized * ColumnOptimized = NonOptimized - 10,433.33
4000: ColumnOptimized * RowOptimized = NonOptimized - 10,793.33
4000: RowOptimized * RowOptimized = NonOptimized - 11,515.33
4000: ColumnOptimized * ColumnOptimized = NonOptimized - 11,567.33
----------------------------------------
The data is simple to read. Take the first line of the output:
1000: RowOptimized * RowOptimized = NonOptimized - 121.67
- 1000 is the size of the matrix, 1,000 by 1,000 in this case.
- RowOptimized * RowOptimized = NonOptimized indicates the matrix types:
  - "Row Optimized" means the matrix is laid out in memory to optimize access by row.
  - "Column Optimized" means it is laid out to optimize access by column.
  - "Non Optimized" means there are no optimizations if you call the Multiply method on this object. It is laid out in memory the same way as a row-optimized matrix, but there are no compute optimizations in the code.
- The final value is the number of milliseconds it took to complete the multiplication.
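To make the row/column distinction concrete, here is a minimal sketch of the two layouts. It assumes "row optimized" and "column optimized" mean row-major and column-major storage in a flat array. The helpers (`row_index`, `col_index`, `to_layout`, `multiply_row_by_col`) are hypothetical, not the classes from the post. Multiplying a row-major A by a column-major B lets the inner loop read both operands sequentially.

```python
from array import array


def row_index(i, j, n):
    """Offset of element (i, j) in a row-major ("row optimized") n x n matrix."""
    return i * n + j


def col_index(i, j, n):
    """Offset of element (i, j) in a column-major ("column optimized") n x n matrix."""
    return j * n + i


def to_layout(rows, index):
    """Flatten a list of lists into an array of doubles using the given layout."""
    n = len(rows)
    flat = array("d", [0.0]) * (n * n)
    for i in range(n):
        for j in range(n):
            flat[index(i, j, n)] = rows[i][j]
    return flat


def multiply_row_by_col(a_row, b_col, n):
    """C = A * B with A row-major and B column-major; C comes back row-major.

    For each C[i][j] the inner loop walks row i of A and column j of B,
    and both are contiguous in memory in these layouts.
    """
    c = array("d", [0.0]) * (n * n)
    for i in range(n):
        a_start = i * n  # row i of A begins here
        for j in range(n):
            b_start = j * n  # column j of B begins here
            total = 0.0
            for k in range(n):
                total += a_row[a_start + k] * b_col[b_start + k]
            c[row_index(i, j, n)] = total
    return c


if __name__ == "__main__":
    a = to_layout([[1.0, 2.0], [3.0, 4.0]], row_index)
    b = to_layout([[5.0, 6.0], [7.0, 8.0]], col_index)
    print(list(multiply_row_by_col(a, b, 2)))  # [19.0, 22.0, 43.0, 50.0]
```

In pure Python the interpreter overhead largely hides cache effects, so this shows the indexing rather than the speedup. The layout matters in compiled code, where memory access dominates.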
Notice that once the matrices get very large, the sheer volume of data being accessed slows things down considerably.
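Part of the slowdown is plain arithmetic. Assuming the classic triple-loop multiply, two n x n matrices take n³ multiply-adds, so doubling the size means eight times the work. You can see how much of the growth that accounts for by dividing each timing by the 1,000 result and comparing it with (n/1000)³. This Python sketch uses only the RowOptimized * RowOptimized timings from the test run. `scaling_table` is a hypothetical helper.

```python
# RowOptimized * RowOptimized timings (ms) copied from the test run above.
TIMINGS_MS = {1000: 121.67, 2000: 988.00, 3000: 3565.00, 4000: 11515.33}


def scaling_table(timings, base=1000):
    """Compare each run's slowdown against the n**3 growth of a naive multiply."""
    base_ms = timings[base]
    rows = []
    for n in sorted(timings):
        measured = timings[n] / base_ms  # how much slower than the base size
        predicted = (n / base) ** 3      # operation count grows with n cubed
        rows.append((n, measured, predicted))
    return rows


if __name__ == "__main__":
    for n, measured, predicted in scaling_table(TIMINGS_MS):
        print(f"{n}: {measured:6.2f}x measured vs {predicted:5.1f}x from n^3")
```

With these numbers, the 2,000 and 3,000 runs land close to the cubic prediction (about 8.1x and 29.3x, against 8x and 27x). The 4,000 run comes in near 94.6x against 64x. That fits the idea that memory traffic takes over once the matrices get large.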
A 1,000 by 1,000 matrix using doubles is 8,000,000 bytes. Change that to a 2,000 by 2,000 matrix and the size quadruples to 32,000,000 bytes.
| Matrix size (n x n) | Bytes per matrix (doubles) | Megabytes |
|---|---|---|
| 1,000 | 8,000,000 | 8 |
| 2,000 | 32,000,000 | 32 |
| 3,000 | 72,000,000 | 72 |
| 4,000 | 128,000,000 | 128 |
| 5,000 | 200,000,000 | 200 |
A 4,000 by 4,000 matrix takes up 128 megabytes of memory, far more than fits in the processor's caches. That goes a long way toward explaining why multiplying matrices that size takes so long. It's a good thing we have beasts to do the calculations on.
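The byte counts in the table are just n x n x 8. A short Python sketch reproduces them and also shows the total for the three matrices a multiply touches at once (two inputs plus the result). It assumes each is a separate n x n matrix of doubles, as in the post. `matrix_bytes` is a hypothetical helper.

```python
import struct


def matrix_bytes(n, fmt="d"):
    """Bytes for one n x n matrix whose elements use struct format `fmt`.

    "d" is a C double (8 bytes), "f" a C float (4 bytes).
    """
    return n * n * struct.calcsize(fmt)


if __name__ == "__main__":
    for n in range(1000, 5001, 1000):
        one = matrix_bytes(n)
        # A multiply works on three matrices at once: both inputs and the result.
        print(f"{n:,}: {one:,} bytes per matrix, {3 * one:,} bytes for A, B and C "
              f"(float would be {matrix_bytes(n, 'f'):,} per matrix)")
```

At 4,000 x 4,000 the three matrices together come to 384,000,000 bytes. Switching back to float would halve every figure.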
