Matrices are a fundamental building block in many computational applications, particularly in the realm of scientific computing (e.g. CFD – Computational Fluid Dynamics) and machine learning. The architecture of a matrix class must be carefully crafted to:
- Efficiently perform matrix functions.
- Scale without needing to significantly refactor or redesign algorithms.
In this post we’re going to create a matrix class that can perform:
In later posts, we will use this matrix class to create a simple neural network that will learn how to perform the XOR function, and then we’ll cover how we can use the GPU to perform our matrix operations in parallel.
Data Storage
So how are we going to represent the actual matrix? For now, we will only be working with M x N matrices (i.e. a 2 dimensional array), where M is the number of rows, and N is the number of columns. We could use a raw array, an std::vector
, or even more complex data types.
A one dimensional array (or vector) is often the most practical data structure to use in a matrix class. If using a one dimensional array to represent a two dimensional matrix, the index can be calculated using:
index = row * number_of_columns + column
While a raw one dimensional array would be a compact and efficient representation of a matrix, it lacks flexibility, automatic memory management and safety features. Considering this, we’ll use an std::vector
for this first example, as it provides a good balance between performance and ease of use. Moreover, when working with OpenCL, we need to allocate memory on the device, and an std::vector
can easily be converted to raw pointers when needed.
Okay, less yappin’, more tappin’ (on the keyboard, that is!)…
Implementation
For this first attempt, we’re going to shove everything into a header file (just to keep it simple). In this case, the file will be called matrix.h
:
class Matrix
{
}
Let’s add a vector
of vector
of double
to store our 2D matrix. We’ll also store the number of rows and columns whenever we instantiate a new Matrix
:
class Matrix
{
private:
std::vector<std::vector<double>> data;
size_t rows, cols;
}
And now the constructor to initialize the Matrix
class when passing in a vector
or vector
of double
(2D vector
):
class Matrix
{
private:
std::vector<std::vector<double>> data;
size_t rows, cols;
public:
// Constructor
Matrix(const std::vector<std::vector<double>> &values)
: data(values), rows(values.size()), cols(values[0].size()) {}
// Getters
size_t getRows() const { return rows; }
size_t getCols() const { return cols; }
}
Let’s now add a couple of methods to output the Matrix:
#include <iomanip>
#include <iostream>
#include <vector>
class Matrix
{
private:
std::vector<std::vector<double>> data;
size_t rows, cols;
public:
// Constructor
Matrix(const std::vector<std::vector<double>> &values)
: data(values), rows(values.size()), cols(values[0].size()) {}
// Getters
size_t getRows() const { return rows; }
size_t getCols() const { return cols; }
// Print Matrix
void print() const {
for (const auto &row : data) {
for (double val : row)
std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
std::cout << std::endl;
}
}
// Overload print method to print text above the matrix
void print(const std::string &text) const {
std::cout << text << std::endl;
this->print();
std::cout << std::endl;
}
}
Now we’re ready to implement whatever matrix functions we want/need in our Matrix
class.
Matrix Addition
From the above wiki links:
Consider Matrix A and Matrix B:
A = [ 2 3 ]
[ 1 4 ]
B = [ 5 2 ]
[ 0 1 ]
To add these two matrices:
A + B = [ (2 + 5) (3 + 2) ]
[ (1 + 0) (4 + 1) ]
The result:
A + B = [ 7 5 ]
[ 1 5 ]
Let’s implement this in our Matrix
class by overloading the “+” operator:
// Matrix Addition
Matrix operator+(const Matrix &other) const {
if (rows != other.rows || cols != other.cols)
throw std::invalid_argument("Matrix dimensions must match for addition.");
std::vector<std::vector<double>> result(rows, std::vector<double>(cols));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < cols; ++j)
result[i][j] = data[i][j] + other.data[i][j];
return Matrix(result);
}
Matrix Multiplication
Again, from the wiki links above:
Consider Matrix A and Matrix B:
A = [ 1 2 ]
[ 3 4 ]
B = [ 5 6 ]
[ 7 8 ]
A * B = [ 1 ร 5 + 2 ร 7, 1 ร 6 + 2 ร 8 ]
[ 3 ร 5 + 4 ร 7, 3 ร 6 + 4 ร 8 ]
The result:
A * B = [ 19 22 ]
[ 43 50 ]
Implementation within out Matrix
class, this time overloading the “*” operator:
// Matrix Multiplication
Matrix operator*(const Matrix &other) const {
if (cols != other.rows)
throw std::invalid_argument("Invalid dimensions for matrix multiplication.");
std::vector<std::vector<double>> result(rows, std::vector<double>(other.cols, 0));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < other.cols; ++j)
for (size_t k = 0; k < cols; ++k)
result[i][j] += data[i][k] * other.data[k][j];
return Matrix(result);
}
Matrix Transposition
Finally, let’s “transpose” a matrix:
Consider a single matrix:
A = [ 1 2 3 ]
[ 4 5 6 ]
Transpose A:
Aแต = [ 1 4 ]
[ 2 5 ]
[ 3 6 ]
Pretty straight forward:
// Matrix Transpose
Matrix transpose() const {
std::vector<std::vector<double>> result(cols, std::vector<double>(rows));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < cols; ++j)
result[j][i] = data[i][j];
return Matrix(result);
}
One more thing to do – let’s turn this (now complete) example into a template class. The complete listing of what we now have (converted to a template class):
#include <iomanip>
#include <iostream>
#include <vector>
template <typename T>
class Matrix
{
private:
std::vector<std::vector<T>> data;
size_t rows, cols;
public:
// Constructor
Matrix(const std::vector<std::vector<T>> &values)
: data(values), rows(values.size()), cols(values[0].size()) {
}
// Getters
size_t getRows() const {
return rows;
}
size_t getCols() const {
return cols;
}
// Print Matrix
void print() const {
for (const auto &row : data) {
for (T val : row)
std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
std::cout << std::endl;
}
}
// Overload print method so text can also be printed before the Matrix is printed
void print(const std::string &text) const {
std::cout << text << std::endl;
this->print();
std::cout << std::endl;
}
// ---------- The Matrix operations ---------------------------------------------------------
// Matrix Addition
Matrix operator+(const Matrix &other) const {
if (rows != other.rows || cols != other.cols)
throw std::invalid_argument("Matrix dimensions must match for addition.");
std::vector<std::vector<T>> result(rows, std::vector<T>(cols));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < cols; ++j)
result[i][j] = data[i][j] + other.data[i][j];
return Matrix(result);
}
// Matrix Multiplication
Matrix operator*(const Matrix &other) const {
if (cols != other.rows)
throw std::invalid_argument("Invalid dimensions for matrix multiplication.");
std::vector<std::vector<T>> result(rows, std::vector<T>(other.cols, 0));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < other.cols; ++j)
for (size_t k = 0; k < cols; ++k) {
result[i][j] += data[i][k] * other.data[k][j];
}
return Matrix(result);
}
// Transpose
Matrix transpose() const {
std::vector<std::vector<T>> result(cols, std::vector<T>(rows));
for (size_t i = 0; i < rows; ++i)
for (size_t j = 0; j < cols; ++j)
result[j][i] = data[i][j];
return Matrix(result);
}
};
Test Harness(es)
It’s test harness time! We could write a single test harness that tests all three matrix operations we have implemented, or we could write a separate test harness for each operation. For this example, we’ll go for the latter as it offers us more flexibility (i.e. with matrix dimensions).
You should also test the validity of the results produced by our Matrix class with an online matrix calculator, such as: https://matrixcalc.org/
Matrix Addition Test Harness
Create a file called matrix-addition.cpp
with the following:
#include "matrix.h"
int main() {
Matrix<double> A({{3, 8, 2, 1},
{5, 2, 9, 4},
{8, 2, 6, 11}});
Matrix<double> B({{13, 9, 7, 13},
{8, 7, 4, 6},
{6, 4, 0, 3}});
A.print("Matrix A:");
B.print("Matrix B:");
Matrix<double> C = A + B;
C.print("Matrix C = A + B:");
return 0;
}
To compile and run the above matrix addition test harness:
$ g++ matrix-addition.cpp
$ ./a.out
Matrix A:
3.0000 8.0000 2.0000 1.0000
5.0000 2.0000 9.0000 4.0000
8.0000 2.0000 6.0000 11.0000
Matrix B:
13.0000 9.0000 7.0000 13.0000
8.0000 7.0000 4.0000 6.0000
6.0000 4.0000 0.0000 3.0000
Matrix C = A + B:
16.0000 17.0000 9.0000 14.0000
13.0000 9.0000 13.0000 10.0000
14.0000 6.0000 6.0000 14.0000
Matrix Multiplication Test Harness
Similar to the above matrix addition test harness, except this time we create a file called matrix-multiplication.cpp
with the following:
#include "matrix.h"
int main() {
Matrix<double> A({{3, 4, 2}});
Matrix<double> B({{13, 9, 7, 13},
{8, 7, 4, 6},
{6, 4, 0, 3}});
A.print("Matrix A:");
B.print("Matrix B:");
Matrix<double> C = A * B;
C.print("Matrix C = A * B:");
return 0;
}
Again, compile and test (not forgetting to test the results with an online matrix calculator):
$ g++ matrix-multiplication.cpp
$ ./a.out
Matrix A:
3.0000 4.0000 2.0000
Matrix B:
13.0000 9.0000 7.0000 13.0000
8.0000 7.0000 4.0000 6.0000
6.0000 4.0000 0.0000 3.0000
Matrix C = A * B:
83.0000 63.0000 37.0000 69.0000
Matrix Transpose Test Harness
Last one – create a file called matrix-transpose.cpp
with the following:
#include "matrix.h"
int main() {
Matrix<double> A({{1, 2, 3, 4},
{5, 6, 7, 8},
{9, 10, 11, 12}});
A.print("Matrix A:");
Matrix<double> B = A.transpose();
B.print("Matrix A after transpose():");
Matrix<double> C = B.transpose();
C.print("After transpose() again:");
return 0;
}
Compile and run the matrix transpose test harness:
$ g++ matrix-transpose.cpp
$ ./a.out
Matrix A:
1.0000 2.0000 3.0000 4.0000
5.0000 6.0000 7.0000 8.0000
9.0000 10.0000 11.0000 12.0000
Matrix A after transpose():
1.0000 5.0000 9.0000
2.0000 6.0000 10.0000
3.0000 7.0000 11.0000
4.0000 8.0000 12.0000
After transpose() again:
1.0000 2.0000 3.0000 4.0000
5.0000 6.0000 7.0000 8.0000
9.0000 10.0000 11.0000 12.0000
Conclusion
This is a very simple example of a matrix class. It is by no means a complete/exhaustive implementation, but it should be enough to create neural networks. It’s simplicity will also serve as a clear example of how we can can use “compute” on GPUs to execute matrix operations in a highly parallel manner.
The next post in this series will use our Matrix class to implement a neural network. For now, that’s it! As a wise synthetic once said:
“Big things have small beginnings” ๐