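Let's say we have a system of equations,

$$
\begin{align}
2x + 3y + 4z &= 6 \\
x + 2y + 3z &= 4 \\
3x - 4y &= 10
\end{align}
$$

and we want to solve for $$x$$, $$y$$, and $$z$$.
The first step is to transform the system of equations into a matrix by using the coefficients in front of each variable, where each row corresponds to another equation and each column corresponds to an independent variable like $$x$$, $$y$$, or $$z$$. For the system above, this looks like: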
$$
\left[
\begin{array}{ccc}
2 & 3 & 4 \\
1 & 2 & 3 \\
3 & -4 & 0
\end{array}
\right]
\left[
\begin{array}{c}
x \\ y \\ z
\end{array}
\right]
=
\left[
\begin{array}{c}
6 \\ 4 \\ 10
\end{array}
\right]
$$
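Or more simply, as an augmented matrix:

$$
\left[
\begin{array}{ccc|c}
2 & 3 & 4 & 6 \\
1 & 2 & 3 & 4 \\
3 & -4 & 0 & 10
\end{array}
\right]
$$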
At first, translating the set of equations into a matrix like this doesn't seem to help with anything, so let's think of this in another way.
Instead of the complicated mess of equations shown above, imagine if the system looked like this:
Then we could just solve for
This matrix form has a particular name: Row Echelon Form. Basically, a matrix is in row echelon form if the leading coefficient or pivot (the first non-zero element in every row when reading from left to right) is to the right of the pivot of the row above it.
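The pivot rule above can be checked mechanically. The following is a minimal sketch rather than part of the chapter's reference implementations: the names `pivot_column` and `is_row_echelon` are hypothetical, the tolerance `tol` is an assumption for floating-point input, and the requirement that all-zero rows sit at the bottom is the usual textbook convention rather than something stated above.

```python
def pivot_column(row, tol=1e-12):
    """Return the index of the first non-zero entry in row, or None if there is none."""
    for j, value in enumerate(row):
        if abs(value) > tol:
            return j
    return None


def is_row_echelon(matrix, tol=1e-12):
    """Check that every pivot lies strictly to the right of the pivot above it.

    Assumption: all-zero rows are only allowed at the bottom of the matrix.
    """
    previous = -1
    seen_zero_row = False
    for row in matrix:
        col = pivot_column(row, tol)
        if col is None:
            seen_zero_row = True
            continue
        if seen_zero_row or col <= previous:
            return False
        previous = col
    return True


print(is_row_echelon([[1, 2, 3], [0, 0, 4]]))                       # pivots in columns 0 and 2
print(is_row_echelon([[2, 3, 4, 6], [1, 2, 3, 4], [3, -4, 0, 10]])) # pivots all in column 0
```

Note that the first example is in row echelon form without being upper-triangular in the usual square sense, since its second pivot skips a column.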
This creates a matrix that sometimes resembles an upper-triangular matrix; however, that doesn't mean that all row-echelon matrices are upper-triangular. For example, all of the following matrices are in row echelon form:
The first two of these have the right dimensions to find a solution to a system of equations; however, the last two matrices are respectively under- and over-constrained, meaning they do not provide an appropriate solution to a system of equations.
That said, this doesn't mean that every matrix in the correct form can be solved either.
For example, if you translate the second matrix into a system of equations again, the last row translates into
Row echelon form is nice, but wouldn't it be even better if our system of equations looked simply like this:
Then we would know exactly what
This introduces yet another matrix configuration: *Reduced Row Echelon Form*. A matrix is in reduced row echelon form if it satisfies the following conditions:
- It is in row echelon form.
- Every pivot is 1 and is the only nonzero entry in its column.
All the following examples are in the reduced row echelon form:
Again, only the first of these (the one that looks like an identity matrix) is desirable in the context of solving a system of equations, but transforming any matrix into this form gives us an immediate and definitive answer to the question: can I solve my system of equations?
Beyond solving a system of equations, reshaping a matrix into this form makes it very easy to deduce other properties of the matrix, such as its rank, the maximum number of linearly independent columns. In reduced row echelon form, the rank is simply the number of pivots.
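To make the two conditions and the rank statement concrete, here is a small hedged sketch. The function names `pivots`, `is_reduced_row_echelon` and `rank_from_rref` are hypothetical, `tol` is an assumed floating-point tolerance, and the rank helper only gives a meaningful answer when its input is already in reduced row echelon form.

```python
def pivots(matrix, tol=1e-12):
    """Return (row, column) positions of the first non-zero entry of each non-zero row."""
    found = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if abs(value) > tol:
                found.append((i, j))
                break
    return found


def is_reduced_row_echelon(matrix, tol=1e-12):
    positions = pivots(matrix, tol)
    rows = [i for i, _ in positions]
    cols = [j for _, j in positions]
    # Row echelon form: non-zero rows come first and pivot columns strictly increase.
    if rows != list(range(len(positions))):
        return False
    if any(a >= b for a, b in zip(cols, cols[1:])):
        return False
    # Reduced form: each pivot is 1 and is the only non-zero entry in its column.
    for i, j in positions:
        if abs(matrix[i][j] - 1) > tol:
            return False
        if any(abs(matrix[k][j]) > tol for k in range(len(matrix)) if k != i):
            return False
    return True


def rank_from_rref(matrix, tol=1e-12):
    """Count the pivots; assumes matrix is already in reduced row echelon form."""
    return len(pivots(matrix, tol))


example = [[1, 2, 0], [0, 0, 1], [0, 0, 0]]
print(is_reduced_row_echelon(example), rank_from_rref(example))
```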
For now, I hope the motivation is clear: we want to convert a matrix into row echelon and then reduced row echelon form to make large systems of equations trivial to solve, so we need some method to do that. In general, the term Gaussian Elimination refers to the process of transforming a matrix into row echelon form, and the process of transforming a row echelon matrix into reduced row echelon form is called Gauss-Jordan Elimination. That said, the notation here is sometimes inconsistent. Several authors use the term Gaussian Elimination to include Gauss-Jordan elimination as well. In addition, the process of Gauss-Jordan elimination is sometimes called Back-substitution, which is also confusing because the term can also be used to mean solving a system of equations from row echelon form, without simplifying to reduced row echelon form. For this reason, we will be using the following definitions in this chapter:
- Gaussian Elimination: The process of transforming a matrix into row echelon form
- Gauss-Jordan Elimination: The process of transforming a row echelon matrix into reduced row echelon form
- Back-substitution: The process of directly solving a row echelon matrix, without transforming into reduced row echelon form
Gaussian elimination is inherently analytical and can be done by hand for small systems of equations; however, for large systems, this (of course) becomes tedious and we will need to find an appropriate numerical solution. For this reason, I have split this section into two parts. One will cover the analytical framework, and the other will cover an algorithm you can write in your favorite programming language.
In the end, reducing large systems of equations boils down to a game you play on a seemingly random matrix with 3 possible moves. You can:
- Swap any two rows.
- Multiply any row by a non-zero scale value.
- Add a multiple of any row to any other row.
That's it. Before continuing, I suggest you try to recreate the row echelon matrix we made above. That is, do the following:
There are plenty of different strategies you could use to do this, and no one strategy is better than the rest. One method is to subtract a multiple of the top row from subsequent rows below it such that all values beneath the pivot value are zero. This process might be easier if you swap some rows around first and can be performed for each pivot.
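As one possible play-through of this game, the sketch below applies the three moves to the augmented matrix of the example system, using the "subtract a multiple of the top row" strategy described above. The helper names `swap_rows`, `scale_row` and `add_multiple` are hypothetical, and `Fraction` is used only to keep the arithmetic exact. The particular sequence of moves is one choice among many, so the resulting row echelon matrix is not the only valid one.

```python
from fractions import Fraction


def swap_rows(m, i, j):
    """Move 1: swap rows i and j in place."""
    m[i], m[j] = m[j], m[i]


def scale_row(m, i, factor):
    """Move 2: multiply row i by a non-zero factor in place."""
    if factor == 0:
        raise ValueError("the scale factor must be non-zero")
    m[i] = [factor * value for value in m[i]]


def add_multiple(m, target, source, factor):
    """Move 3: add factor * (row source) to row target in place."""
    m[target] = [t + factor * s for t, s in zip(m[target], m[source])]


# Augmented matrix for 2x + 3y + 4z = 6, x + 2y + 3z = 4, 3x - 4y = 10.
A = [[Fraction(v) for v in row] for row in
     [[2, 3, 4, 6],
      [1, 2, 3, 4],
      [3, -4, 0, 10]]]

add_multiple(A, 1, 0, Fraction(-1, 2))  # clear the entry below the first pivot
add_multiple(A, 2, 0, Fraction(-3, 2))  # clear the remaining entry in column 0
add_multiple(A, 2, 1, Fraction(17))     # clear the entry below the second pivot
scale_row(A, 1, Fraction(2))            # optional tidy-up of the second row

for row in A:
    print([str(value) for value in row])
```

No row swap was needed for this particular matrix, so `swap_rows` is defined only for completeness.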
After you get a row echelon matrix, the next step is to find the reduced row echelon form. In other words, we do the following:
Here, the idea is similar to above and the same rules apply. In this case, we might start from the right-most column and subtract upwards instead of downwards.
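As a hedged illustration of this upward sweep, suppose the previous step left us with the augmented matrix on the far left below. That starting matrix is an assumption: it is one row echelon form of the example system, worked out by hand for this sketch, and a different sequence of moves would give a different but equivalent matrix. We first divide the last row by 11, then subtract multiples of it from the rows above, and finally subtract 3 times the second row from the first row and divide the first row by 2.

$$
\left[
\begin{array}{ccc|c}
2 & 3 & 4 & 6 \\
0 & 1 & 2 & 2 \\
0 & 0 & 11 & 18
\end{array}
\right]
\rightarrow
\left[
\begin{array}{ccc|c}
2 & 3 & 4 & 6 \\
0 & 1 & 2 & 2 \\
0 & 0 & 1 & \frac{18}{11}
\end{array}
\right]
\rightarrow
\left[
\begin{array}{ccc|c}
2 & 3 & 0 & -\frac{6}{11} \\
0 & 1 & 0 & -\frac{14}{11} \\
0 & 0 & 1 & \frac{18}{11}
\end{array}
\right]
\rightarrow
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & \frac{18}{11} \\
0 & 1 & 0 & -\frac{14}{11} \\
0 & 0 & 1 & \frac{18}{11}
\end{array}
\right]
$$

Substituting $$x = \frac{18}{11}$$, $$y = -\frac{14}{11}$$, and $$z = \frac{18}{11}$$ into the third equation gives $$3 \cdot \frac{18}{11} - 4 \cdot \left(-\frac{14}{11}\right) = \frac{110}{11} = 10$$, as expected.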
The analytical method for Gaussian Elimination may seem straightforward, but the computational method does not obviously follow from the "game" we were playing before.
Ultimately, the computational method boils down to two separate steps and has a complexity of $$\mathcal{O}(n^3)$$.
As a note, this process iterates through all the rows in the provided matrix.
When we say "current row" (`curr_row`), we mean the specific row iteration number we are on at that time, and as before, the "pivot" corresponds to the first non-zero element in that row.
For each element in the pivot column at or below the current row, find the value with the largest magnitude and swap its row with the current row. The pivot is then the leading element of the row that was swapped in.
For example, in this case the highest value is
After finding this value, we simply switch the row with the
In this case, the new pivot is
In code, this process might look like this:
{% method %} {% sample lang="jl" %} import:12-24, lang:"julia" {% sample lang="java" %} import:14-30, lang:"java" {% sample lang="c" %} import:5-13, lang:"c" import:19-34, lang:"c" {% sample lang="cpp" %} import:13-23, lang:"cpp" {% sample lang="hs" %} import:10-17, lang:"haskell" import:44-46, lang:"haskell" {% sample lang="js" %} import:7-23, lang:"javascript" {% sample lang="go" %} import:15-32, lang:"go" {% sample lang="py" %} import:13-19, lang:"python" {% sample lang="rs" %} import:43-76, lang:"rust" {% endmethod %}
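For readers without access to the imported sources, the pivot search can be sketched as follows. This is an illustration under assumptions, not the chapter's reference code: the function name `swap_in_pivot_row` is hypothetical, the matrix is a plain list of lists, and the comparison uses absolute values, which is the common choice for numerical stability.

```python
def swap_in_pivot_row(A, row, col):
    """Swap the row with the largest |A[i][col]| (for i >= row) into position row.

    Returns False if every candidate entry is zero, meaning this column
    offers no usable pivot.
    """
    pivot_row = max(range(row, len(A)), key=lambda i: abs(A[i][col]))
    if A[pivot_row][col] == 0:
        return False
    A[row], A[pivot_row] = A[pivot_row], A[row]
    return True


A = [[2, 3, 4, 6],
     [1, 2, 3, 4],
     [3, -4, 0, 10]]
found = swap_in_pivot_row(A, 0, 0)
print(found, A[0])  # the row starting with 3 is now on top
```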
As a note, if the highest value is
For each row beneath the current pivot row, find a fraction that corresponds to the ratio of that row's value in the pivot column to the pivot itself. After this, subtract the current pivot row multiplied by the fraction from each corresponding row element. This process essentially subtracts an optimal multiple of the current row from each row underneath (similar to the third move in the game above). Ideally, this should always create a 0 under the current row's pivot value.
For example, in this matrix, the next row is
After finding the fraction, we simply subtract
After this, repeat the process for all other rows.
Here is what it might look like in code:

{% method %} {% sample lang="jl" %} import:26-38, lang:"julia" {% sample lang="java" %} import:32-40, lang:"java" {% sample lang="c" %} import:36-41, lang:"c" {% sample lang="cpp" %} import:25-32, lang:"cpp" {% sample lang="hs" %} import:19-33, lang:"haskell" import:42-42, lang:"haskell" {% sample lang="js" %} import:25-30, lang:"javascript" {% sample lang="go" %} import:38-49, lang:"go" {% sample lang="py" %} import:21-26, lang:"python" {% sample lang="rs" %} import:62-75, lang:"rust" {% endmethod %}
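The elimination step on its own might be sketched like this. The name `eliminate_below` is hypothetical, and the sketch assumes the pivot `A[row][col]` is already non-zero (for example, because the row swap from the previous step has just been done).

```python
def eliminate_below(A, row, col):
    """Zero out column col in every row below row, using A[row][col] as the pivot."""
    pivot = A[row][col]
    for i in range(row + 1, len(A)):
        # Ratio of this row's entry in the pivot column to the pivot itself.
        fraction = A[i][col] / pivot
        for j in range(col, len(A[i])):
            A[i][j] -= fraction * A[row][j]
        A[i][col] = 0  # remove any floating-point residue


A = [[2.0, 4.0, 6.0],
     [4.0, 9.0, 1.0],
     [1.0, 1.0, 1.0]]
eliminate_below(A, 0, 0)
for r in A:
    print(r)
```

After the call, every entry under the first pivot is 0 and the first row is unchanged.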
When we put everything together, it looks like this:
{% method %} {% sample lang="jl" %} import:1-45, lang:"julia" {% sample lang="c" %} import:15-48, lang:"c" {% sample lang="cpp" %} import:8-34, lang:"cpp" {% sample lang="hs" %} import:10-36, lang:"haskell" {% sample lang="py" %} import:3-28, lang:"python" {% sample lang="java" %} import:5-47, lang:"java" {% sample lang="js" %} import:1-38, lang:"javascript" {% sample lang="go" %} import:9-53, lang:"go" {% sample lang="rs" %} import:41-77, lang:"rust" {% endmethod %}
To be clear: if the matrix is found to be singular during this process, the system of equations is either over- or under-determined and no unique solution exists. For this reason, many implementations of this method will stop the moment the matrix is found to have no unique solution. In this implementation, we allowed for the more general case and opted to simply report when the matrix is singular instead. If you intend to solve a system of equations, then it makes sense to stop the method the moment you know there is no unique solution, so some small modification of this code might be necessary!
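Putting both steps together, a self-contained sketch could look like the following. It is written for an augmented matrix stored as a list of lists of floats. The name `gaussian_elimination`, the tolerance `tol`, and the choice to return a flag instead of printing a message are all assumptions made for this illustration, not a description of the imported implementations.

```python
def gaussian_elimination(A, tol=1e-12):
    """Reduce the augmented matrix A to row echelon form in place.

    Returns True if every coefficient column produced a pivot. For a square
    system this means the coefficient matrix is non-singular.
    """
    rows, cols = len(A), len(A[0])
    row = 0
    for col in range(cols - 1):  # the last column holds the right-hand side
        if row >= rows:
            break
        # Step 1: bring the largest remaining entry of this column up to `row`.
        pivot_row = max(range(row, rows), key=lambda i: abs(A[i][col]))
        if abs(A[pivot_row][col]) <= tol:
            continue  # no pivot in this column, so move on to the next one
        A[row], A[pivot_row] = A[pivot_row], A[row]
        # Step 2: subtract multiples of the pivot row from the rows below it.
        for i in range(row + 1, rows):
            fraction = A[i][col] / A[row][col]
            for j in range(col, cols):
                A[i][j] -= fraction * A[row][j]
            A[i][col] = 0.0
        row += 1
    return row == cols - 1


A = [[2.0, 3.0, 4.0, 6.0],
     [1.0, 2.0, 3.0, 4.0],
     [3.0, -4.0, 0.0, 10.0]]
has_unique_solution = gaussian_elimination(A)
print(has_unique_solution)
for r in A:
    print(r)
```

Because of the row swaps, the row echelon matrix produced here differs from one obtained by hand without swapping, but it describes the same system of equations.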
So what do we do from here? Well, we continue reducing the matrix; however, there are two ways to do this:
- Reduce the matrix further into reduced row echelon form with Gauss-Jordan elimination
- Solve the system directly with back-substitution if the matrix allows for such solutions
Let's start with Gauss-Jordan elimination and then move on to back-substitution.
Gauss-Jordan Elimination is precisely what we said above; however, in this case, we often work from the bottom-up instead of the top-down. We basically need to find the pivot of every row and set that value to 1 by dividing the entire row by the pivot value. Afterwards, we subtract upwards until all values above the pivot are 0 before moving on to the next column from right to left (instead of left to right, like before). Here it is in code:
{% method %} {% sample lang="jl" %} import:67-93, lang:"julia" {% sample lang="c" %} import:64-82, lang:"c" {% sample lang="cpp" %} import:36-54, lang:"cpp" {% sample lang="hs" %} import:38-46, lang:"haskell" {% sample lang="py" %} import:31-49, lang:"python" {% sample lang="java" %} import:49-70, lang:"java" {% sample lang="js" %} import:57-76, lang:"javascript" {% sample lang="go" %} import:55-82, lang:"go" {% sample lang="rs" %} import:79-96, lang:"rust" {% endmethod %}
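A compact sketch of this bottom-up procedure is shown below. It assumes the input is already in row echelon form; the name `gauss_jordan` and the tolerance `tol` are hypothetical choices for this example.

```python
def gauss_jordan(A, tol=1e-12):
    """Turn a row echelon matrix A into reduced row echelon form in place."""
    for row in range(len(A) - 1, -1, -1):  # start from the bottom row
        # Locate the pivot: the first non-zero entry in this row.
        col = next((j for j, v in enumerate(A[row]) if abs(v) > tol), None)
        if col is None:
            continue  # an all-zero row has no pivot
        # Scale the row so that the pivot becomes 1.
        pivot = A[row][col]
        A[row] = [v / pivot for v in A[row]]
        # Subtract upwards so that every entry above the pivot becomes 0.
        for i in range(row):
            factor = A[i][col]
            A[i] = [a - factor * b for a, b in zip(A[i], A[row])]


A = [[2.0, 3.0, 4.0, 6.0],
     [0.0, 1.0, 2.0, 2.0],
     [0.0, 0.0, 11.0, 18.0]]
gauss_jordan(A)
for r in A:
    print(r)
```

For an augmented matrix of a system with a unique solution, the last column of the result holds the values of the unknowns.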
As a note: Gauss-Jordan elimination can also be used to find the inverse of a matrix by following the same procedure to generate a reduced row echelon matrix, but with an identity matrix on the other side instead of the right-hand side of each equation. This process is straightforward but will not be covered here, simply because there are much faster numerical methods to find an inverse matrix; however, if you would like to see this, let me know and I can add it in for completeness.
The idea of back-substitution is straightforward: we create a vector of solutions and iteratively solve for each variable, starting from the last row, by plugging in all the variables we have already solved for. For example, if our matrix looks like this:
We can quickly solve
{% method %} {% sample lang="jl" %} import:47-64, lang:"julia" {% sample lang="c" %} import:50-62, lang:"c" {% sample lang="cpp" %} import:56-72, lang:"cpp" {% sample lang="rs" %} import:98-112, lang:"rust" {% sample lang="hs" %} import:48-53, lang:"haskell" {% sample lang="py" %} import:52-64, lang:"python" {% sample lang="java" %} import:72-87, lang:"java" {% sample lang="js" %} import:40-55, lang:"javascript" {% sample lang="go" %} import:84-98, lang:"go" {% endmethod %}
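As a hedged sketch of back-substitution, the function below solves a square system directly from its row echelon augmented matrix. The name `back_substitution` is hypothetical, and the code assumes the matrix has `n` rows, `n + 1` columns, and non-zero diagonal entries, so it does not guard against singular input.

```python
def back_substitution(A):
    """Solve a square system from its row echelon augmented matrix A.

    Assumes A has n rows and n + 1 columns with non-zero diagonal entries.
    """
    n = len(A)
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):  # last equation first
        # Contribution of the variables that have already been solved for.
        known = sum(A[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (A[i][n] - known) / A[i][i]
    return solution


A = [[2, 3, 4, 6],
     [0, 1, 2, 2],
     [0, 0, 11, 18]]
x, y, z = back_substitution(A)
print(x, y, z)
```

Unlike Gauss-Jordan elimination, this leaves the matrix itself untouched and only returns the solution vector.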
We have thus far used Gaussian elimination as a method to solve a system of equations; however, there is often a much easier way to find a similar solution simply by plotting each row in our matrix.
For the case of 2 equations and 2 unknowns, we would plot the two lines corresponding to each equation and the
What, then, is the point of Gaussian elimination if we can simply plot our set of equations to find a solution? Well, this analogy breaks down quickly when we start moving beyond 3D, so it is obvious we need some method to deal with higher-dimensional systems. That said, it is particularly interesting to see what happens as we plot our matrix during Gaussian elimination for the 3D case.
As we can see in the above visualization, the planes wobble about in 3D until they reach row echelon form, where one plane is parallel to the
This visualization might have been obvious for some readers, but I found it particularly enlightening at first. By performing Gaussian elimination, we are manipulating our planes such that they can be interpreted at a glance -- which is precisely the same thing we are doing with the matrix interpretation!
And with that, we have two possible ways to reduce our system of equations and find a solution. If we are sure our matrix is not singular and that a solution exists, it's fastest to use back-substitution to find our solution. If no solution exists or we are trying to find a reduced row echelon matrix, then Gauss-Jordan elimination is best. As we said at the start, the notation for Gaussian Elimination is rather ambiguous in the literature, so we are hoping that the definitions provided here are clear and consistent enough to cover all the bases.
As for what's next... Well, we are in for a treat!
The above algorithm clearly has 3 `for` loops and has a complexity of $$\mathcal{O}(n^3)$$.
There are also plenty of other solvers that do similar things that we will get to in due time.
Here's a video describing Gaussian elimination:
{% method %} {% sample lang="jl" %} import, lang:"julia" {% sample lang="c" %} import, lang:"c" {% sample lang="cpp" %} import, lang:"cpp" {% sample lang="rs" %} import, lang:"rust" {% sample lang="hs" %} import, lang:"haskell" {% sample lang="py" %} import, lang:"python" {% sample lang="java" %} import, lang:"java" {% sample lang="js" %} import, lang:"javascript" {% sample lang="go" %} import, lang:"go" {% endmethod %}
<script> MathJax.Hub.Queue(["Typeset",MathJax.Hub]); </script>

The code examples are licensed under the MIT license (found in LICENSE.md).
The text of this chapter was written by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The animation "GEvis" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
After initial licensing (#560), the following pull requests have modified the text or graphics of this chapter:
- none