Matrices are a fundamental building block in many computational applications, particularly in the realm of scientific computing (e.g. CFD – Computational Fluid Dynamics) and machine learning. The architecture of a matrix class must be carefully crafted to:

Efficiently perform matrix functions.
Scale without needing to significantly refactor or redesign algorithms.

In this post we’re going to create a matrix class that can perform:

In later posts, we will use this matrix class to create a simple neural network that will learn how to perform the XOR function, and then we’ll cover how we can use the GPU to perform our matrix operations in parallel.

Data Storage

So how are we going to represent the actual matrix? For now, we will only be working with M x N matrices (i.e. a 2 dimensional array), where M is the number of rows, and N is the number of columns. We could use a raw array, an std::vector, or even more complex data types.

A one dimensional array (or vector) is often the most practical data structure to use in a matrix class. If using a one dimensional array to represent a two dimensional matrix, the index can be calculated using:

index = row * number_of_columns + column

index = row * number_of_columns + column

While a raw one dimensional array would be a compact and efficient representation of a matrix, it lacks flexibility, automatic memory management and safety features. Considering this, we’ll use an std::vector for this first example, as it provides a good balance between performance and ease of use. Moreover, when working with OpenCL, we need to allocate memory on the device, and an std::vector can easily be converted to raw pointers when needed.

Okay, less yappin’, more tappin’ (on the keyboard, that is!)…

Implementation

For this first attempt, we’re going to shove everything into a header file (just to keep it simple). In this case, the file will be called matrix.h:

class Matrix
{
}

class Matrix
{
}

Let’s add a vector of vector of double to store our 2D matrix. We’ll also store the number of rows and columns whenever we instantiate a new Matrix:

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;
}

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;
}

And now the constructor to initialize the Matrix class when passing in a vector or vector of double (2D vector):

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;

  public:
  
  // Constructor
  Matrix(const std::vector<std::vector<double>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {}

  // Getters
  size_t getRows() const { return rows; }
  size_t getCols() const { return cols; }
}

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;

  public:
  
  // Constructor
  Matrix(const std::vector<std::vector<double>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {}

  // Getters
  size_t getRows() const { return rows; }
  size_t getCols() const { return cols; }
}

Let’s now add a couple of methods to output the Matrix:

#include <iomanip>
#include <iostream>
#include <vector>

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;

  public:
  
  // Constructor
  Matrix(const std::vector<std::vector<double>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {}

  // Getters
  size_t getRows() const { return rows; }
  size_t getCols() const { return cols; }

  // Print Matrix
  void print() const {
    for (const auto &row : data) {
      for (double val : row)
        std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
      std::cout << std::endl;
    }
  }

  // Overload print method to print text above the matrix
  void print(const std::string &text) const {
    std::cout << text << std::endl;
    this->print();
    std::cout << std::endl;
  }
}

#include <iomanip>
#include <iostream>
#include <vector>

class Matrix
{
  private:
  
  std::vector<std::vector<double>> data;
  size_t rows, cols;

  public:
  
  // Constructor
  Matrix(const std::vector<std::vector<double>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {}

  // Getters
  size_t getRows() const { return rows; }
  size_t getCols() const { return cols; }

  // Print Matrix
  void print() const {
    for (const auto &row : data) {
      for (double val : row)
        std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
      std::cout << std::endl;
    }
  }

  // Overload print method to print text above the matrix
  void print(const std::string &text) const {
    std::cout << text << std::endl;
    this->print();
    std::cout << std::endl;
  }
}

Now we’re ready to implement whatever matrix functions we want/need in our Matrix class.

Matrix Addition

From the above wiki links:

Consider Matrix A and Matrix B:

A =  [ 2  3 ]
     [ 1  4 ]

B =  [ 5  2 ]
     [ 0  1 ]
     
To add these two matrices:

A + B = [ (2 + 5)  (3 + 2) ]
        [ (1 + 0)  (4 + 1) ]
        
The result:

A + B = [ 7  5 ]
        [ 1  5 ]

Consider Matrix A and Matrix B:

A =  [ 2  3 ]
     [ 1  4 ]

B =  [ 5  2 ]
     [ 0  1 ]
     
To add these two matrices:

A + B = [ (2 + 5)  (3 + 2) ]
        [ (1 + 0)  (4 + 1) ]
        
The result:

A + B = [ 7  5 ]
        [ 1  5 ]

Let’s implement this in our Matrix class by overloading the “+” operator:

// Matrix Addition
Matrix operator+(const Matrix &other) const {
  if (rows != other.rows || cols != other.cols)
    throw std::invalid_argument("Matrix dimensions must match for addition.");

  std::vector<std::vector<double>> result(rows, std::vector<double>(cols));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      result[i][j] = data[i][j] + other.data[i][j];

  return Matrix(result);
}

// Matrix Addition
Matrix operator+(const Matrix &other) const {
  if (rows != other.rows || cols != other.cols)
    throw std::invalid_argument("Matrix dimensions must match for addition.");

  std::vector<std::vector<double>> result(rows, std::vector<double>(cols));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      result[i][j] = data[i][j] + other.data[i][j];

  return Matrix(result);
}

Matrix Multiplication

Again, from the wiki links above:

Consider Matrix A and Matrix B:

A = [ 1   2 ]
    [ 3   4 ]

B = [ 5   6 ]
    [ 7   8 ]
    
A * B = [ 1 × 5 + 2 × 7, 1 × 6 + 2 × 8 ]
        [ 3 × 5 + 4 × 7, 3 × 6 + 4 × 8 ]

The result:

A * B = [ 19   22 ]
        [ 43   50 ]

Consider Matrix A and Matrix B:

A = [ 1   2 ]
    [ 3   4 ]

B = [ 5   6 ]
    [ 7   8 ]
    
A * B = [ 1 × 5 + 2 × 7, 1 × 6 + 2 × 8 ]
        [ 3 × 5 + 4 × 7, 3 × 6 + 4 × 8 ]

The result:

A * B = [ 19   22 ]
        [ 43   50 ]

Implementation within out Matrix class, this time overloading the “*” operator:

// Matrix Multiplication
Matrix operator*(const Matrix &other) const {
  if (cols != other.rows)
    throw std::invalid_argument("Invalid dimensions for matrix multiplication.");

  std::vector<std::vector<double>> result(rows, std::vector<double>(other.cols, 0));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < other.cols; ++j)
      for (size_t k = 0; k < cols; ++k)
        result[i][j] += data[i][k] * other.data[k][j];

  return Matrix(result);
}

// Matrix Multiplication
Matrix operator*(const Matrix &other) const {
  if (cols != other.rows)
    throw std::invalid_argument("Invalid dimensions for matrix multiplication.");

  std::vector<std::vector<double>> result(rows, std::vector<double>(other.cols, 0));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < other.cols; ++j)
      for (size_t k = 0; k < cols; ++k)
        result[i][j] += data[i][k] * other.data[k][j];

  return Matrix(result);
}

Matrix Transposition

Finally, let’s “transpose” a matrix:

Consider a single matrix:

A = [ 1  2  3 ]
    [ 4  5  6 ]

Transpose A:

Aᵗ = [ 1  4 ]
     [ 2  5 ]
     [ 3  6 ]

Consider a single matrix:

A = [ 1  2  3 ]
    [ 4  5  6 ]

Transpose A:

Aᵗ = [ 1  4 ]
     [ 2  5 ]
     [ 3  6 ]

Pretty straight forward:

// Matrix Transpose
Matrix transpose() const {
  std::vector<std::vector<double>> result(cols, std::vector<double>(rows));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      result[j][i] = data[i][j];

  return Matrix(result);
}

// Matrix Transpose
Matrix transpose() const {
  std::vector<std::vector<double>> result(cols, std::vector<double>(rows));
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      result[j][i] = data[i][j];

  return Matrix(result);
}

One more thing to do – let’s turn this (now complete) example into a template class. The complete listing of what we now have (converted to a template class):

#include <iomanip>
#include <iostream>
#include <vector>

template <typename T>
class Matrix
{
  private:

  std::vector<std::vector<T>> data;
  size_t rows, cols;

  public:

  // Constructor
  Matrix(const std::vector<std::vector<T>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {
  }

  // Getters
  size_t getRows() const {
    return rows;
  }
  size_t getCols() const {
    return cols;
  }

  // Print Matrix
  void print() const {
    for (const auto &row : data) {
      for (T val : row)
        std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
      std::cout << std::endl;
    }
  }

  // Overload print method so text can also be printed before the Matrix is printed
  void print(const std::string &text) const {
    std::cout << text << std::endl;
    this->print();
    std::cout << std::endl;
  }

  // ---------- The Matrix operations ---------------------------------------------------------

  // Matrix Addition
  Matrix operator+(const Matrix &other) const {
    if (rows != other.rows || cols != other.cols)
      throw std::invalid_argument("Matrix dimensions must match for addition.");

    std::vector<std::vector<T>> result(rows, std::vector<T>(cols));
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        result[i][j] = data[i][j] + other.data[i][j];

    return Matrix(result);
  }

  // Matrix Multiplication
  Matrix operator*(const Matrix &other) const {
    if (cols != other.rows)
      throw std::invalid_argument("Invalid dimensions for matrix multiplication.");

    std::vector<std::vector<T>> result(rows, std::vector<T>(other.cols, 0));

    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < other.cols; ++j)
        for (size_t k = 0; k < cols; ++k) {
          result[i][j] += data[i][k] * other.data[k][j];
        }

    return Matrix(result);
  }

  // Transpose
  Matrix transpose() const {
    std::vector<std::vector<T>> result(cols, std::vector<T>(rows));
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        result[j][i] = data[i][j];

    return Matrix(result);
  }
};

#include <iomanip>
#include <iostream>
#include <vector>

template <typename T>
class Matrix
{
  private:

  std::vector<std::vector<T>> data;
  size_t rows, cols;

  public:

  // Constructor
  Matrix(const std::vector<std::vector<T>> &values)
      : data(values), rows(values.size()), cols(values[0].size()) {
  }

  // Getters
  size_t getRows() const {
    return rows;
  }
  size_t getCols() const {
    return cols;
  }

  // Print Matrix
  void print() const {
    for (const auto &row : data) {
      for (T val : row)
        std::cout << std::fixed << std::setprecision(4) << std::setw(10) << val << " ";
      std::cout << std::endl;
    }
  }

  // Overload print method so text can also be printed before the Matrix is printed
  void print(const std::string &text) const {
    std::cout << text << std::endl;
    this->print();
    std::cout << std::endl;
  }

  // ---------- The Matrix operations ---------------------------------------------------------

  // Matrix Addition
  Matrix operator+(const Matrix &other) const {
    if (rows != other.rows || cols != other.cols)
      throw std::invalid_argument("Matrix dimensions must match for addition.");

    std::vector<std::vector<T>> result(rows, std::vector<T>(cols));
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        result[i][j] = data[i][j] + other.data[i][j];

    return Matrix(result);
  }

  // Matrix Multiplication
  Matrix operator*(const Matrix &other) const {
    if (cols != other.rows)
      throw std::invalid_argument("Invalid dimensions for matrix multiplication.");

    std::vector<std::vector<T>> result(rows, std::vector<T>(other.cols, 0));

    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < other.cols; ++j)
        for (size_t k = 0; k < cols; ++k) {
          result[i][j] += data[i][k] * other.data[k][j];
        }

    return Matrix(result);
  }

  // Transpose
  Matrix transpose() const {
    std::vector<std::vector<T>> result(cols, std::vector<T>(rows));
    for (size_t i = 0; i < rows; ++i)
      for (size_t j = 0; j < cols; ++j)
        result[j][i] = data[i][j];

    return Matrix(result);
  }
};

Test Harness(es)

It’s test harness time! We could write a single test harness that tests all three matrix operations we have implemented, or we could write a separate test harness for each operation. For this example, we’ll go for the latter as it offers us more flexibility (i.e. with matrix dimensions).

You should also test the validity of the results produced by our Matrix class with an online matrix calculator, such as: https://matrixcalc.org/

Matrix Addition Test Harness

Create a file called matrix-addition.cpp with the following:

#include "matrix.h"

int main() {

  Matrix<double> A({{3, 8, 2, 1},
                    {5, 2, 9, 4},
                    {8, 2, 6, 11}});

  Matrix<double> B({{13, 9, 7, 13},
                    {8, 7, 4, 6},
                    {6, 4, 0, 3}});

  A.print("Matrix A:");
  B.print("Matrix B:");

  Matrix<double> C = A + B;
  C.print("Matrix C = A + B:");

  return 0;
}

#include "matrix.h"

int main() {

  Matrix<double> A({{3, 8, 2, 1},
                    {5, 2, 9, 4},
                    {8, 2, 6, 11}});

  Matrix<double> B({{13, 9, 7, 13},
                    {8, 7, 4, 6},
                    {6, 4, 0, 3}});

  A.print("Matrix A:");
  B.print("Matrix B:");

  Matrix<double> C = A + B;
  C.print("Matrix C = A + B:");

  return 0;
}

To compile and run the above matrix addition test harness:

$ g++ matrix-addition.cpp 
$ ./a.out 

Matrix A:
    3.0000     8.0000     2.0000     1.0000 
    5.0000     2.0000     9.0000     4.0000 
    8.0000     2.0000     6.0000    11.0000 

Matrix B:
   13.0000     9.0000     7.0000    13.0000 
    8.0000     7.0000     4.0000     6.0000 
    6.0000     4.0000     0.0000     3.0000 

Matrix C = A + B:
   16.0000    17.0000     9.0000    14.0000 
   13.0000     9.0000    13.0000    10.0000 
   14.0000     6.0000     6.0000    14.0000

$ g++ matrix-addition.cpp 
$ ./a.out 

Matrix A:
    3.0000     8.0000     2.0000     1.0000 
    5.0000     2.0000     9.0000     4.0000 
    8.0000     2.0000     6.0000    11.0000 

Matrix B:
   13.0000     9.0000     7.0000    13.0000 
    8.0000     7.0000     4.0000     6.0000 
    6.0000     4.0000     0.0000     3.0000 

Matrix C = A + B:
   16.0000    17.0000     9.0000    14.0000 
   13.0000     9.0000    13.0000    10.0000 
   14.0000     6.0000     6.0000    14.0000

Matrix Multiplication Test Harness

Similar to the above matrix addition test harness, except this time we create a file called matrix-multiplication.cpp with the following:

#include "matrix.h"

int main() {

  Matrix<double> A({{3, 4, 2}});

  Matrix<double> B({{13, 9, 7, 13},
                    {8, 7, 4, 6},
                    {6, 4, 0, 3}});

  A.print("Matrix A:");
  B.print("Matrix B:");

  Matrix<double> C = A * B;
  C.print("Matrix C = A * B:");

  return 0;
}

#include "matrix.h"

int main() {

  Matrix<double> A({{3, 4, 2}});

  Matrix<double> B({{13, 9, 7, 13},
                    {8, 7, 4, 6},
                    {6, 4, 0, 3}});

  A.print("Matrix A:");
  B.print("Matrix B:");

  Matrix<double> C = A * B;
  C.print("Matrix C = A * B:");

  return 0;
}

Again, compile and test (not forgetting to test the results with an online matrix calculator):

$ g++ matrix-multiplication.cpp 
$ ./a.out 

Matrix A:
    3.0000     4.0000     2.0000 

Matrix B:
   13.0000     9.0000     7.0000    13.0000 
    8.0000     7.0000     4.0000     6.0000 
    6.0000     4.0000     0.0000     3.0000 

Matrix C = A * B:
   83.0000    63.0000    37.0000    69.0000

$ g++ matrix-multiplication.cpp 
$ ./a.out 

Matrix A:
    3.0000     4.0000     2.0000 

Matrix B:
   13.0000     9.0000     7.0000    13.0000 
    8.0000     7.0000     4.0000     6.0000 
    6.0000     4.0000     0.0000     3.0000 

Matrix C = A * B:
   83.0000    63.0000    37.0000    69.0000

Matrix Transpose Test Harness

Last one – create a file called matrix-transpose.cpp with the following:

#include "matrix.h"

int main() {

  Matrix<double> A({{1, 2, 3, 4},
                    {5, 6, 7, 8},
                    {9, 10, 11, 12}});

  A.print("Matrix A:");
  Matrix<double> B = A.transpose();
  B.print("Matrix A after transpose():");

  Matrix<double> C = B.transpose();
  C.print("After transpose() again:");

  return 0;
}

#include "matrix.h"

int main() {

  Matrix<double> A({{1, 2, 3, 4},
                    {5, 6, 7, 8},
                    {9, 10, 11, 12}});

  A.print("Matrix A:");
  Matrix<double> B = A.transpose();
  B.print("Matrix A after transpose():");

  Matrix<double> C = B.transpose();
  C.print("After transpose() again:");

  return 0;
}

Compile and run the matrix transpose test harness:

$ g++ matrix-transpose.cpp 
$ ./a.out 

Matrix A:
    1.0000     2.0000     3.0000     4.0000 
    5.0000     6.0000     7.0000     8.0000 
    9.0000    10.0000    11.0000    12.0000 

Matrix A after transpose():
    1.0000     5.0000     9.0000 
    2.0000     6.0000    10.0000 
    3.0000     7.0000    11.0000 
    4.0000     8.0000    12.0000 

After transpose() again:
    1.0000     2.0000     3.0000     4.0000 
    5.0000     6.0000     7.0000     8.0000 
    9.0000    10.0000    11.0000    12.0000

$ g++ matrix-transpose.cpp 
$ ./a.out 

Matrix A:
    1.0000     2.0000     3.0000     4.0000 
    5.0000     6.0000     7.0000     8.0000 
    9.0000    10.0000    11.0000    12.0000 

Matrix A after transpose():
    1.0000     5.0000     9.0000 
    2.0000     6.0000    10.0000 
    3.0000     7.0000    11.0000 
    4.0000     8.0000    12.0000 

After transpose() again:
    1.0000     2.0000     3.0000     4.0000 
    5.0000     6.0000     7.0000     8.0000 
    9.0000    10.0000    11.0000    12.0000

Conclusion

This is a very simple example of a matrix class. It is by no means a complete/exhaustive implementation, but it should be enough to create neural networks. It’s simplicity will also serve as a clear example of how we can can use “compute” on GPUs to execute matrix operations in a highly parallel manner.

The next post in this series will use our Matrix class to implement a neural network. For now, that’s it! As a wise synthetic once said:

“Big things have small beginnings” 🙂

Random Linux Thoughts

Simple C++ Matrix Class