Interactive Linear Algebra

Dan Margalit, Joseph Rabinoff



6.5: The Method of Least Squares


Learning Objectives

  • Learn examples of best-fit problems.
  • Learn to turn a best-fit problem into a least-squares problem.
  • Recipe: find a least-squares solution (two ways).
  • Picture: geometry of a least-squares solution.
  • Vocabulary words: least-squares solution.

In this section, we answer the following important question:

Suppose that \(Ax=b\) does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution . We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation \(Ax=b\).

Definition \(\PageIndex{1}\): Least-Squares Solution

Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). A least-squares solution of the matrix equation \(Ax=b\) is a vector \(\hat x\) in \(\mathbb{R}^n \) such that

\[ \text{dist}(b,\,A\hat x) \leq \text{dist}(b,\,Ax) \nonumber \]

for all other vectors \(x\) in \(\mathbb{R}^n \).

Recall that \(\text{dist}(v,w) = \|v-w\|\) is the distance, Definition 6.1.2 in Section 6.1, between the vectors \(v\) and \(w\). The term “least squares” comes from the fact that \(\text{dist}(b,A\hat x) = \|b-A\hat x\|\) is the square root of the sum of the squares of the entries of the vector \(b-A\hat x\). So a least-squares solution minimizes the sum of the squares of the differences between the entries of \(A\hat x\) and \(b\). In other words, a least-squares solution solves the equation \(Ax=b\) as closely as possible, in the sense that the sum of the squares of the entries of the difference \(b-Ax\) is minimized.

Least Squares: Picture

Suppose that the equation \(Ax=b\) is inconsistent. Recall from Note 2.3.6 in Section 2.3 that the column space of \(A\) is the set of all vectors \(c\) such that \(Ax=c\) is consistent. In other words, \(\text{Col}(A)\) is the set of all vectors of the form \(Ax.\) Hence, by Note 6.3.1 in Section 6.3, the closest vector of the form \(Ax\) to \(b\) is the orthogonal projection of \(b\) onto \(\text{Col}(A)\). This is denoted \(b_{\text{Col}(A)}\text{,}\) following Definition 6.3.1 in Section 6.3.


Figure \(\PageIndex{1}\)

Note \(\PageIndex{1}\)

A least-squares solution of \(Ax=b\) is a solution \(\hat x\) of the consistent equation \(Ax=b_{\text{Col}(A)}\)

Note \(\PageIndex{2}\)

If \(Ax=b\) is consistent, then \(b_{\text{Col}(A)} = b\text{,}\) so that a least-squares solution is the same as a usual solution.

Where is \(\hat x\) in this picture? If \(v_1,v_2,\ldots,v_n\) are the columns of \(A\text{,}\) then

\[ A\hat x = A\left(\begin{array}{c}\hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_{n}\end{array}\right) = \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n. \nonumber \]

Hence the entries of \(\hat x\) are the “coordinates” of \(b_{\text{Col}(A)}\) with respect to the spanning set \(\{v_1,v_2,\ldots,v_n\}\) of \(\text{Col}(A)\). (They are honest \(\mathcal{B}\)-coordinates if the columns of \(A\) are linearly independent.)


Figure \(\PageIndex{2}\)


We learned to solve this kind of orthogonal projection problem in Section 6.3 .

Theorem \(\PageIndex{1}\)

Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The least-squares solutions of \(Ax=b\) are the solutions of the matrix equation

\[ A^TAx = A^Tb \nonumber \]

By Theorem 6.3.2  in Section 6.3 , if \(\hat x\) is a solution of the matrix equation \(A^TAx = A^Tb\text{,}\) then \(A\hat x\) is equal to \(b_{\text{Col}(A)}\). We argued above that a least-squares solution of \(Ax=b\) is a solution of \(Ax = b_{\text{Col}(A)}.\)

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

Recipe 1: Compute a Least-Squares Solution

Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). Here is a method for computing a least-squares solution of \(Ax=b\text{:}\)

  • Compute the matrix \(A^TA\) and the vector \(A^Tb\).
  • Form the augmented matrix for the matrix equation \(A^TAx = A^Tb\text{,}\) and row reduce.
  • This equation is always consistent, and any solution \(\hat x\) is a least-squares solution.

To reiterate: once you have found a least-squares solution \(\hat x\) of \(Ax=b\text{,}\) then \(b_{\text{Col}(A)}\) is equal to \(A\hat x\).
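As a quick illustration of Recipe 1 (this sketch is an addition and not part of the original text), the following NumPy snippet forms \(A^TA\) and \(A^Tb\) and solves the normal equations for the matrix and vector used in Example \(\PageIndex{1}\) below; it should print the least-squares solution \((-3, 5)\).

import numpy as np

# A and b as in Example 1 below
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

# Recipe 1: form A^T A and A^T b, then solve the (always consistent) normal equations
AtA = A.T @ A            # [[5, 3], [3, 3]]
Atb = A.T @ b            # [0, 6]
x_hat = np.linalg.solve(AtA, Atb)

print(x_hat)             # expected: [-3.  5.]
print(A @ x_hat)         # b_Col(A) = A x_hat = [ 5.  2. -1.]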

Example \(\PageIndex{1}\)

Find the least-squares solutions of \(Ax=b\) where:

\[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

What quantity is being minimized?

\[ A^TA = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) = \left(\begin{array}{cc}5&3\\3&3\end{array}\right) \nonumber \]

\[ A^T b = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{c}6\\0\\0\end{array}\right) = \left(\begin{array}{c}0\\6\end{array}\right). \nonumber \]

We form an augmented matrix and row reduce:

\[ \left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right). \nonumber \]

Therefore, the only least-squares solution is \(\hat x = {-3\choose 5}.\)

This solution minimizes the distance from \(A\hat x\) to \(b\text{,}\) i.e., the sum of the squares of the entries of \(b-A\hat x = b-b_{\text{Col}(A)} = b_{\text{Col}(A)^\perp}\). In this case, we have

\[||b-A\hat{x}||=\left|\left|\left(\begin{array}{c}6\\0\\0\end{array}\right)-\left(\begin{array}{c}5\\2\\-1\end{array}\right)\right|\right|=\left|\left|\left(\begin{array}{c}1\\-2\\1\end{array}\right)\right|\right|=\sqrt{1^2+(-2)^2+1^2}=\sqrt{6}.\nonumber\]

Therefore, \(b_{\text{Col}(A)} = A\hat x\) is \(\sqrt 6\) units from \(b.\)

In the following picture, \(v_1,v_2\) are the columns of \(A\text{:}\) 


Figure \(\PageIndex{4}\)


Example \(\PageIndex{2}\)

Find the least-squares solutions of \(Ax=b\) where:

\[ A = \left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) \qquad b = \left(\begin{array}{c}1\\0\\-1\end{array}\right). \nonumber \]

\[ A^T A = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) = \left(\begin{array}{cc}5&-1\\-1&5\end{array}\right) \nonumber \]

\[ A^T b = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{c}1\\0\\-1\end{array}\right) = \left(\begin{array}{c}2\\-2\end{array}\right). \nonumber \]

\[ \left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right). \nonumber \]

Therefore, the only least-squares solution is \(\hat x = \frac 13{1\choose -1}.\)


The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of \(A\) are linearly dependent, then \(Ax=b_{\text{Col}(A)}\) has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of Corollary 6.3.1 in Section 6.3 .

Theorem \(\PageIndex{2}\)

Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The following are equivalent:

  • \(Ax=b\) has a unique least-squares solution.
  • The columns of \(A\) are linearly independent.
  • \(A^TA\) is invertible.

In this case, the least-squares solution is

\[ \hat x = (A^TA)^{-1} A^Tb. \nonumber \]

The set of least-squares solutions of \(Ax=b\) is the solution set of the consistent equation \(A^TAx=A^Tb\text{,}\) which is a translate of the solution set of the homogeneous equation \(A^TAx=0\). Since \(A^TA\) is a square matrix, the equivalence of 1 and 3 follows from Theorem 5.1.1 in Section 5.1. The set of least-squares solutions is also the solution set of the consistent equation \(Ax = b_{\text{Col}(A)}\text{,}\) which has a unique solution if and only if the columns of \(A\) are linearly independent by Recipe: Checking Linear Independence in Section 2.5.
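As an added numerical check (not part of the original text), the closed-form formula \(\hat x = (A^TA)^{-1}A^Tb\) can be evaluated directly whenever the columns of \(A\) are linearly independent; the sketch below uses the matrix and vector of Example \(\PageIndex{2}\) above and should return \(\hat x = (1/3, -1/3)\).

import numpy as np

# A and b as in Example 2 above; the columns of A are linearly independent
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# Closed-form least-squares solution from the theorem above
x_hat = np.linalg.inv(A.T @ A) @ (A.T @ b)

# np.linalg.lstsq solves the same problem without forming A^T A explicitly
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat, x_lstsq)    # both approximately [ 0.3333 -0.3333]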

Example \(\PageIndex{3}\): Infinitely many least-squares solutions

Find the least-squares solutions of \(Ax=b\) where:

\[ A = \left(\begin{array}{ccc}1&0&1\\1&1&-1\\1&2&-3\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

\[ A^TA = \left(\begin{array}{ccc}3&3&-3\\3&5&-7\\-3&-7&11\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}6\\0\\6\end{array}\right). \nonumber \]

\[ \left(\begin{array}{ccc|c}3&3&-3&6\\3&5&-7&0\\-3&-7&11&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&1&5\\0&1&-2&-3\\0&0&0&0\end{array}\right). \nonumber \]

The free variable is \(x_3\text{,}\) so the solution set is

\[\left\{\begin{array}{rrrrr}x_1 &=& -x_3 &+& 5\\ x_2 &=& 2x_3 &-& 3\\ x_3 &=& x_3&{}&{}\end{array}\right. \quad\xrightarrow[\text{vector form}]{\text{parametric}}\quad \hat{x}=\left(\begin{array}{c}x_1\\x_2\\x_3\end{array}\right)=x_3\left(\begin{array}{c}-1\\2\\1\end{array}\right)+\left(\begin{array}{c}5\\-3\\0\end{array}\right).\nonumber\]

For example, taking \(x_3 = 0\) and \(x_3=1\) gives the least-squares solutions

\[ \hat x = \left(\begin{array}{c}5\\-3\\0\end{array}\right)\quad\text{and}\quad \hat x =\left(\begin{array}{c}4\\-1\\1\end{array}\right). \nonumber \]

Geometrically, we see that the columns \(v_1,v_2,v_3\) of \(A\) are coplanar:


Figure \(\PageIndex{7}\)

Therefore, there are many ways of writing \(b_{\text{Col}(A)}\) as a linear combination of \(v_1,v_2,v_3\).
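The non-uniqueness in this example can also be verified numerically; the short sketch below (an addition, not from the text) checks that \(A^TA\) has rank 2, so it is not invertible, and that the two least-squares solutions found above give the same vector \(A\hat x = b_{\text{Col}(A)}\).

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, -1.0],
              [1.0, 2.0, -3.0]])

# A^T A is a 3x3 matrix of rank 2, so it is not invertible
print(np.linalg.matrix_rank(A.T @ A))    # 2

# Two least-squares solutions from the parametric form (x3 = 0 and x3 = 1)
x1 = np.array([5.0, -3.0, 0.0])
x2 = np.array([4.0, -1.0, 1.0])

# Both produce the same projection b_Col(A)
print(A @ x1, A @ x2)                    # both [ 5.  2. -1.]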


As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if \(A\) is an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) then we can use the projection formula, Theorem 6.4.1 in Section 6.4, to write

\[ b_{\text{Col}(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}\,u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}\,u_2 + \cdots + \frac{b\cdot u_n}{u_n\cdot u_n}\,u_n = A\left(\begin{array}{c}(b\cdot u_1)/(u_1\cdot u_1) \\ (b\cdot u_2)/(u_2\cdot u_2) \\ \vdots \\ (b\cdot u_n)/(u_n\cdot u_n)\end{array}\right). \nonumber \]

Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent, Fact 6.4.1 in Section 6.4 .

Recipe 2: Compute a Least-Squares Solution

Let \(A\) be an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) and let \(b\) be a vector in \(\mathbb{R}^m \). Then the least-squares solution of \(Ax=b\) is the vector

\[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \ldots,\; \frac{b\cdot u_n}{u_n\cdot u_n} \right). \nonumber \]

This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.

Example \(\PageIndex{4}\)

Find the least-squares solution of \(Ax=b\) where:

\[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

Let \(u_1,u_2,u_3\) be the columns of \(A\). These form an orthogonal set, so

\[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

Compare Example 6.4.8  in Section 6.4 .
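Recipe 2 comes down to a few dot products; the sketch below (an added illustration, not part of the text) reproduces the computation of Example \(\PageIndex{4}\) in NumPy.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [-1.0, 0.0, 1.0],
              [0.0, -1.0, 1.0]])
b = np.array([0.0, 1.0, 3.0, 4.0])

# The columns u_i of A are orthogonal, so each entry of x_hat is (b . u_i)/(u_i . u_i)
x_hat = np.array([(b @ A[:, i]) / (A[:, i] @ A[:, i]) for i in range(A.shape[1])])

print(x_hat)    # expected: [-1.5 -1.5  2. ]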

Best-Fit Problems

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

Example \(\PageIndex{5}\): Best-Fit Line

Suppose that we have measured three data points

\[ (0,6),\quad (1,0),\quad (2,0), \nonumber \]

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?


Figure \(\PageIndex{9}\)

The general equation for a (non-vertical) line is

\[ y = Mx + B. \nonumber \]

If our three data points were to lie on this line, then the following equations would be satisfied:

\[\begin{align} 6&=M\cdot 0+B\nonumber \\ 0&=M\cdot 1+B\label{eq:1} \\ 0&=M\cdot 2+B.\nonumber\end{align}\]

In order to find the best-fit line, we try to solve the above equations in the unknowns \(M\) and \(B\). As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve \(Ax=b\) for

\[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad x = \left(\begin{array}{c}M\\B\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

We solved this least-squares problem in Example \(\PageIndex{1}\): the only least-squares solution to \(Ax=b\) is \(\hat x = {M\choose B} = {-3\choose 5}\text{,}\) so the best-fit line is

\[ y = -3x + 5. \nonumber \]


Figure \(\PageIndex{10}\)

What exactly is the line \(y= f(x) = -3x+5\) minimizing? The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:1}\), and

\[ A\left(\begin{array}{c}-3\\5\end{array}\right) = \left(\begin{array}{c}-3(0)+5\\-3(1)+5\\-3(2)+5\end{array}\right) = \left(\begin{array}{c}f(0)\\f(1)\\f(2)\end{array}\right). \nonumber \]

In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the line at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


Figure \(\PageIndex{11}\)

\[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}6\\0\\0\end{array}\right)-A\left(\begin{array}{c}-3\\5\end{array}\right)=\left(\begin{array}{c}1\\-2\\1\end{array}\right)}\nonumber\]

The best-fit line minimizes the sum of the squares of these vertical distances.
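For readers who want to reproduce this example in code, here is a small sketch (an addition, not from the text) that builds \(A\) and \(b\) from the three data points, recovers \(M = -3\) and \(B = 5\), and prints the vertical distances \(b - A\hat x\).

import numpy as np

# Data points (0, 6), (1, 0), (2, 0)
x = np.array([0.0, 1.0, 2.0])
y = np.array([6.0, 0.0, 0.0])

# Model y = M*x + B, so the columns of A are x and 1
A = np.column_stack([x, np.ones_like(x)])
M, B = np.linalg.lstsq(A, y, rcond=None)[0]

print(M, B)                          # expected: -3.0  5.0
print(y - A @ np.array([M, B]))      # vertical distances [ 1. -2.  1.]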

Example \(\PageIndex{6}\): Interactive: Best-fit line


Example \(\PageIndex{7}\): Best-fit parabola

Find the parabola that best approximates the data points

\[ (-1,\,1/2),\quad(1,\,-1),\quad(2,\,-1/2),\quad(3,\,2). \nonumber \]


Figure \(\PageIndex{13}\)

The general equation for a parabola is

\[ y = Bx^2 + Cx + D. \nonumber \]

If the four points were to lie on this parabola, then the following equations would be satisfied:

\[\begin{align} \frac{1}{2}&=B(-1)^2+C(-1)+D\nonumber \\ -1&=B(1)^2+C(1)+D\nonumber \\ -\frac{1}{2}&=B(2)^2+C(2)+D\label{eq:2} \\ 2&=B(3)^2+C(3)+D.\nonumber\end{align}\]

We treat this as a system of equations in the unknowns \(B,C,D\). In matrix form, we can write this as \(Ax=b\) for

\[ A = \left(\begin{array}{ccc}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}1/2 \\ -1\\-1/2 \\ 2\end{array}\right). \nonumber \]

We find a least-squares solution by multiplying both sides by the transpose:

\[ A^TA = \left(\begin{array}{ccc}99&35&15\\35&15&5\\15&5&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}31/2\\7/2\\1\end{array}\right), \nonumber \]

then forming an augmented matrix and row reducing:

\[ \left(\begin{array}{ccc|c}99&35&15&31/2 \\ 35&15&5&7/2 \\ 15&5&4&1\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&0&53/88 \\ 0&1&0&-379/440 \\ 0&0&1&-41/44\end{array}\right) \implies \hat x = \left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44 \end{array}\right). \nonumber \]

The best-fit parabola is

\[ y = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44}. \nonumber \]

Multiplying through by \(88\text{,}\) we can write this as

\[ 88y = 53x^2 - \frac{379}{5}x - 82. \nonumber \]


Figure \(\PageIndex{14}\)

Now we consider what exactly the parabola \(y = f(x)\) is minimizing. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:2}\), and

\[A\hat{x}=\left(\begin{array}{c} \frac{53}{88}(-1)^2-\frac{379}{440}(-1)-\frac{41}{44} \\  \frac{53}{88}(1)^2-\frac{379}{440}(1)-\frac{41}{44} \\ \frac{53}{88}(2)^2-\frac{379}{440}(2)-\frac{41}{44} \\ \frac{53}{88}(3)^2-\frac{379}{440}(3)-\frac{41}{44}\end{array}\right)=\left(\begin{array}{c}f(-1) \\ f(1) \\ f(2) \\ f(3)\end{array}\right).\nonumber\]

In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the parabola at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


Figure \(\PageIndex{15}\)

\[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}1/2 \\ -1 \\ -1/2 \\ 2\end{array}\right)-A\left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44\end{array}\right)=\left(\begin{array}{c}-7/220 \\ 21/110 \\ -14/55 \\ 21/220\end{array}\right)}\nonumber\]

The best-fit parabola minimizes the sum of the squares of these vertical distances.
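The parabola can be fit the same way; the sketch below (an added illustration) builds the columns \(x^2\), \(x\), \(1\) from the four data points and should recover \(B \approx 53/88\), \(C \approx -379/440\), \(D \approx -41/44\).

import numpy as np

# Data points (-1, 1/2), (1, -1), (2, -1/2), (3, 2)
x = np.array([-1.0, 1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, -0.5, 2.0])

# Model y = B*x^2 + C*x + D
A = np.column_stack([x**2, x, np.ones_like(x)])
B, C, D = np.linalg.lstsq(A, y, rcond=None)[0]

print(B, C, D)    # approximately 0.6023, -0.8614, -0.9318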


Example \(\PageIndex{8}\): Best-fit linear function

Find the linear function \(f(x,y)\) that best approximates the following data:

\[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 1 & 0 & 0 \\ 0 & 1 & 1 \\ -1 & 0 & 3 \\ 0 & -1 & 4 \end{array} \nonumber \]

The general equation for a linear function in two variables is

\[ f(x,y) = Bx + Cy + D. \nonumber \]

We want to solve the following system of equations in the unknowns \(B,C,D\text{:}\)

\[\begin{align} B(1)+C(0)+D&=0 \nonumber \\ B(0)+C(1)+D&=1 \nonumber \\ B(-1)+C(0)+D&=3\label{eq:3} \\ B(0)+C(-1)+D&=4\nonumber\end{align}\]

In matrix form, we can write this as \(Ax=b\) for

\[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

We observe that the columns \(u_1,u_2,u_3\) of \(A\) are orthogonal , so we can use Recipe 2: Compute a Least-Squares Solution:

\[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \] Alternatively, we can find a least-squares solution by multiplying both sides by the transpose:

\[ A^TA = \left(\begin{array}{ccc}2&0&0\\0&2&0\\0&0&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}-3\\-3\\8\end{array}\right). \nonumber \] The matrix \(A^TA\) is diagonal (do you see why that happened?), so it is easy to solve the equation \(A^TAx = A^Tb\text{:}\)

\[ \left(\begin{array}{ccc|c}2&0&0&-3 \\ 0&2&0&-3 \\ 0&0&4&8\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{ccc|c}1&0&0&-3/2 \\ 0&1&0&-3/2 \\ 0&0&1&2\end{array}\right)  \implies \hat x = \left(\begin{array}{c}-3/2 \\ -3/2 \\ 2\end{array}\right). \nonumber \] Therefore, the best-fit linear equation is

\[ f(x,y) = -\frac 32x - \frac32y + 2. \nonumber \]

Here is a picture of the graph of \(f(x,y)\text{:}\)


Figure \(\PageIndex{17}\)

Now we consider what quantity is being minimized by the function \(f(x,y)\). The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the right-hand side of \(\eqref{eq:3}\), and

\[A\hat{x}=\left(\begin{array}{rrrrr}-\frac32(1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(1) &+& 2\\ -\frac32(-1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(-1) &+& 2\end{array}\right)=\left(\begin{array}{c}f(1,0) \\ f(0,1) \\ f(-1,0) \\ f(0,-1)\end{array}\right).\nonumber\]

In other words, \(A\hat x\) is the vector whose entries are the values of \(f\) evaluated on the points \((x,y)\) we specified in our data table, and \(b\) is the vector whose entries are the desired values of \(f\) evaluated at those points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of the squares of these vertical distances.


All of the above examples have the following form: some number of data points \((x,y)\) are specified, and we want to find a function

\[ y = B_1g_1(x) + B_2g_2(x) + \cdots + B_mg_m(x) \nonumber \]

that best approximates these points, where \(g_1,g_2,\ldots,g_m\) are fixed functions of \(x\). Indeed, in the best-fit line example we had \(g_1(x)=x\) and \(g_2(x)=1\text{;}\) in the best-fit parabola example we had \(g_1(x)=x^2\text{,}\) \(g_2(x)=x\text{,}\) and \(g_3(x)=1\text{;}\) and in the best-fit linear function example we had \(g_1(x_1,x_2)=x_1\text{,}\) \(g_2(x_1,x_2)=x_2\text{,}\) and \(g_3(x_1,x_2)=1\) (in this example we take \(x\) to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns \(B_1,B_2,\ldots,B_m\)—once we evaluate the \(g_i\text{,}\) they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to our original data points.
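In code, this general recipe amounts to evaluating the chosen functions \(g_i\) on the data points to build the columns of \(A\) and then solving the least-squares problem. The helper below is a hedged sketch of that idea; the function name best_fit_coefficients is illustrative and not from the text.

import numpy as np

def best_fit_coefficients(gs, xs, ys):
    # Least-squares coefficients B_1, ..., B_m for y = B_1*g_1(x) + ... + B_m*g_m(x).
    # gs is a list of functions g_i; xs and ys are the data points.
    A = np.column_stack([[g(x) for x in xs] for g in gs])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(ys, dtype=float), rcond=None)
    return coeffs

# Best-fit line example: g_1(x) = x, g_2(x) = 1 with data (0, 6), (1, 0), (2, 0)
print(best_fit_coefficients([lambda x: x, lambda x: 1.0], [0, 1, 2], [6, 0, 0]))    # [-3.  5.]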

To emphasize that the nature of the functions \(g_i\) really is irrelevant, consider the following example.

Example \(\PageIndex{9}\): Best-fit trigonometric function

What is the best-fit function of the form

\[ y=B+C\cos(x)+D\sin(x)+E\cos(2x)+F\sin(2x)+G\cos(3x)+H\sin(3x) \nonumber \]

passing through the points

\[ \left(\begin{array}{c}-4\\ -1\end{array}\right),\; \left(\begin{array}{c}-3\\ 0\end{array}\right),\; \left(\begin{array}{c}-2\\ -1.5\end{array}\right),\; \left(\begin{array}{c}-1\\ .5\end{array}\right),\; \left(\begin{array}{c}0\\1\end{array}\right),\; \left(\begin{array}{c}1\\-1\end{array}\right),\; \left(\begin{array}{c}2\\-.5\end{array}\right),\; \left(\begin{array}{c}3\\2\end{array}\right),\; \left(\begin{array}{c}4 \\-1\end{array}\right)? \nonumber \]


Figure \(\PageIndex{19}\)

We want to solve the system of equations

\[\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &+& C\cos(-4) &+& D\sin(-4) &+& E\cos(-8) &+& F\sin(-8) &+& G\cos(-12) &+& H\sin(-12)\\ 0 &=& B &+& C\cos(-3) &+& D\sin(-3) &+& E\cos(-6) &+& F\sin(-6) &+& G\cos(-9) &+& H\sin(-9)\\ -1.5 &=& B &+& C\cos(-2) &+& D\sin(-2) &+& E\cos(-4) &+& F\sin(-4) &+& G\cos(-6) &+& H\sin(-6) \\ 0.5 &=& B &+& C\cos(-1) &+& D\sin(-1) &+& E\cos(-2) &+& F\sin(-2) &+& G\cos(-3) &+& H\sin(-3)\\ 1 &=& B &+& C\cos(0) &+& D\sin(0) &+& E\cos(0) &+& F\sin(0) &+& G\cos(0) &+& H\sin(0)\\ -1 &=& B &+& C\cos(1) &+& D\sin(1) &+& E\cos(2) &+& F\sin(2) &+& G\cos(3) &+& H\sin(3)\\ -0.5 &=& B &+& C\cos(2) &+& D\sin(2) &+& E\cos(4) &+& F\sin(4) &+& G\cos(6) &+& H\sin(6)\\ 2 &=& B &+& C\cos(3) &+& D\sin(3) &+& E\cos(6) &+& F\sin(6) &+& G\cos(9) &+& H\sin(9)\\ -1 &=& B &+& C\cos(4) &+& D\sin(4) &+& E\cos(8) &+& F\sin(8) &+& G\cos(12) &+& H\sin(12).\end{array}\nonumber\]

All of the terms in these equations are numbers , except for the unknowns \(B,C,D,E,F,G,H\text{:}\)

\[\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &-& 0.6536C &+& 0.7568D &-& 0.1455E &-& 0.9894F &+& 0.8439G &+& 0.5366H\\ 0 &=& B &-& 0.9900C &-& 0.1411D &+& 0.9602E &+& 0.2794F &-& 0.9111G &-& 0.4121H\\ -1.5 &=& B &-& 0.4161C &-& 0.9093D &-& 0.6536E &+& 0.7568F &+& 0.9602G &+& 0.2794H\\ 0.5 &=& B &+& 0.5403C &-& 0.8415D &-& 0.4161E &-& 0.9093F &-& 0.9900G &-& 0.1411H\\ 1&=&B&+&C&{}&{}&+&E&{}&{}&+&G&{}&{}\\ -1 &=& B &+& 0.5403C &+& 0.8415D &-& 0.4161E &+& 0.9093F &-& 0.9900G &+& 0.1411H\\ -0.5 &=& B &-& 0.4161C &+& 0.9093D &-& 0.6536E &-& 0.7568F &+& 0.9602G &-& 0.2794H\\ 2 &=& B &-& 0.9900C &+& 0.1411D &+& 0.9602E &-& 0.2794F &-& 0.9111G &+& 0.4121H\\ -1 &=& B &-& 0.6536C &-& 0.7568D &-& 0.1455E &+& 0.9894F &+& 0.8439G &-& 0.5366H.\end{array}\nonumber\]

Hence we want to solve the least-squares problem

\[\left(\begin{array}{rrrrrrr} 1 &-0.6536& 0.7568& -0.1455& -0.9894& 0.8439 &0.5366\\ 1& -0.9900& -0.1411 &0.9602 &0.2794& -0.9111& -0.4121\\ 1& -0.4161& -0.9093& -0.6536& 0.7568& 0.9602& 0.2794\\ 1& 0.5403& -0.8415 &-0.4161& -0.9093& -0.9900 &-0.1411\\ 1& 1& 0& 1& 0& 1& 0\\ 1& 0.5403& 0.8415& -0.4161& 0.9093& -0.9900 &0.1411\\ 1& -0.4161& 0.9093& -0.6536& -0.7568& 0.9602& -0.2794\\ 1& -0.9900 &0.1411 &0.9602& -0.2794& -0.9111& 0.4121\\ 1& -0.6536& -0.7568& -0.1455& 0.9894 &0.8439 &-0.5366\end{array}\right)\left(\begin{array}{c}B\\C\\D\\E\\F\\G\\H\end{array}\right)=\left(\begin{array}{c}-1\\0\\-1.5\\0.5\\1\\-1\\-0.5\\2\\-1\end{array}\right).\nonumber\]

We find the least-squares solution with the aid of a computer:

\[\hat{x}\approx\left(\begin{array}{c} -0.1435 \\0.2611 \\-0.2337\\ 1.116\\ -0.5997\\ -0.2767 \\0.1076\end{array}\right).\nonumber\]

Therefore, the best-fit function is

\[ \begin{split} y &\approx -0.1435 + 0.2611\cos(x) -0.2337\sin(x) + 1.116\cos(2x) -0.5997\sin(2x) \\ &\qquad\qquad -0.2767\cos(3x) + 0.1076\sin(3x). \end{split} \nonumber \]


Figure \(\PageIndex{20}\)

As in the previous examples, the best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to the data points.
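The phrase “with the aid of a computer” can be made concrete; the sketch below (an addition, not part of the text) builds the 9-by-7 trigonometric design matrix and solves the least-squares problem, and it should reproduce the coefficients reported above.

import numpy as np

x = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-1.0, 0.0, -1.5, 0.5, 1.0, -1.0, -0.5, 2.0, -1.0])

# Columns: 1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x)
A = np.column_stack([np.ones_like(x),
                     np.cos(x), np.sin(x),
                     np.cos(2 * x), np.sin(2 * x),
                     np.cos(3 * x), np.sin(3 * x)])
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(x_hat, 4))    # approximately [-0.1435  0.2611 -0.2337  1.116  -0.5997 -0.2767  0.1076]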


The next example has a somewhat different flavor from the previous ones.

Example \(\PageIndex{10}\): Best-fit ellipse

Find the best-fit ellipse through the points

\[ (0,2),\, (2,1),\, (1,-1),\, (-1,-2),\, (-3,1),\, (-1,-1). \nonumber \]


Figure \(\PageIndex{22}\)

The general equation for an ellipse (actually, for a nondegenerate conic section) is

\[ x^2 + By^2 + Cxy + Dx + Ey + F = 0. \nonumber \]

This is an implicit equation : the ellipse is the set of all solutions of the equation, just like the unit circle is the set of solutions of \(x^2+y^2=1.\) To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of \(x\) and \(y\text{:}\)

\[\label{eq:4} \begin{array}{rrrrrrrrrrrrl} (0)^2 &+& B(2)^2 &+& C(0)(2)&+& D(0) &+& E(2)&+& F&=& 0 \\ (2)^2 &+& B(1)^2 &+& C(2)(1)&+& D(2)&+&E(1)&+&F&=& 0 \\ (1)^2&+& B(-1)^2&+&C(1)(-1)&+&D(1)&+&E(-1)&+&F&=&0 \\ (-1)^2&+&B(-2)^2&+&C(-1)(-2)&+&D(-1)&+&E(-2)&+&F&=&0 \\ (-3)^2&+&B(1)^2&+&C(-3)(1)&+&D(-3)&+&E(1)&+&F&=&0 \\ (-1)^2&+&B(-1)^2&+&C(-1)(-1)&+&D(-1)&+&E(-1)&+&F&=&0.\end{array}\]

To put this in matrix form, we move the constant terms to the right-hand side of the equals sign; then we can write this as \(Ax=b\) for

\[A=\left(\begin{array}{ccccc}4&0&0&2&1\\1&2&2&1&1\\1&-1&1&-1&1 \\ 4&2&-1&-2&1 \\ 1&-3&-3&1&1 \\ 1&1&-1&-1&1\end{array}\right)\quad x=\left(\begin{array}{c}B\\C\\D\\E\\F\end{array}\right)\quad b=\left(\begin{array}{c}0\\-4\\-1\\-1\\-9\\-1\end{array}\right).\nonumber\]

\[ A^TA = \left(\begin{array}{ccccc}36&7&-5&0&12 \\ 7&19&9&-5&1 \\ -5&9&16&1&-2 \\ 0&-5&1&12&0 \\ 12&1&-2&0&6\end{array}\right) \qquad A^T b = \left(\begin{array}{c}-19\\17\\20\\-9\\-16\end{array}\right). \nonumber \]

\[\left(\begin{array}{ccccc|c}  36 &7& -5& 0& 12& -19\\ 7& 19& 9& -5& 1& 17\\ -5& 9& 16& 1& -2& 20\\ 0& -5& 1 &12& 0& -9\\ 12& 1& -2& 0& 6& -16\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccccc|c} 1& 0& 0& 0& 0& 405/266\\ 0& 1& 0& 0& 0& -89/133\\ 0& 0& 1& 0& 0& 201/133\\ 0& 0& 0& 1& 0& -123/266\\ 0& 0& 0& 0& 1& -687/133\end{array}\right).\nonumber\]

The least-squares solution is

\[\hat{x}=\left(\begin{array}{c}405/266\\ -89/133\\ 201/133\\ -123/266\\ -687/133\end{array}\right),\nonumber\]

so the best-fit ellipse is

\[ x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0. \nonumber \]

Multiplying through by \(266\text{,}\) we can write this as

\[ 266 x^2 + 405 y^2 - 178 xy + 402 x - 123 y - 1374 = 0. \nonumber \]


Figure \(\PageIndex{23}\)

Now we consider the question of what quantity is minimized by this ellipse. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\text{,}\) or equivalently, of \(A\hat x-b\). The vector \(-b\) contains the constant terms of the left-hand sides of \(\eqref{eq:4}\), and

\[A\hat{x}=\left(\begin{array}{rrrrrrrrr} \frac{405}{266}(2)^2 &-& \frac{89}{133}(0)(2)&+&\frac{201}{133}(0)&-&\frac{123}{266}(2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-& \frac{89}{133}(2)(1)&+&\frac{201}{133}(2)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2 &-&\frac{89}{133}(1)(-1)&+&\frac{201}{133}(1)&-&\frac{123}{266}(-1)&-&\frac{687}{133} \\ \frac{405}{266}(-2)^2&-&\frac{89}{133}(-1)(-2)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-&\frac{89}{133}(-3)(1)&+&\frac{201}{133}(-3)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2&-&\frac{89}{133}(-1)(-1)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-1)&-&\frac{687}{133}\end{array}\right)\nonumber\]

contains the rest of the terms on the left-hand side of \(\eqref{eq:4}\). Therefore, the entries of \(A\hat x-b\) are the quantities obtained by evaluating the function

\[ f(x,y) = x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} \nonumber \]

on the given data points.

If our data points actually lay on the ellipse defined by \(f(x,y)=0\text{,}\) then evaluating \(f(x,y)\) on our data points would always yield zero, so \(A\hat x-b\) would be the zero vector. This is not the case; instead, \(A\hat x-b\) contains the actual values of \(f(x,y)\) when evaluated on our data points. The quantity being minimized is the sum of the squares of these values:

\[ \begin{split} &\text{minimized} = \\ &\quad f(0,2)^2 + f(2,1)^2 + f(1,-1)^2 + f(-1,-2)^2 + f(-3,1)^2 + f(-1,-1)^2. \end{split} \nonumber \]

One way to visualize this is as follows. We can put this best-fit problem into the framework of Example \(\PageIndex{8}\) by asking to find an equation of the form

\[ f(x,y) = x^2 + By^2 + Cxy + Dx + Ey + F \nonumber \]

which best approximates the data table

\[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 0 & 2 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 0 \\ -1 & -2 & 0 \\ -3 & 1 & 0 \\ -1 & -1 & 0\rlap. \end{array} \nonumber \]

The resulting function minimizes the sum of the squares of the vertical distances from these data points \((0,2,0),\,(2,1,0),\,\ldots\text{,}\) which lie on the \(xy\)-plane, to the graph of \(f(x,y)\).
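The ellipse fit follows the same pattern once the known \(x^2\) term is moved to the right-hand side; the sketch below (an added illustration) builds \(A\) and \(b\) from the six points and should return the coefficients \(405/266\), \(-89/133\), \(201/133\), \(-123/266\), \(-687/133\).

import numpy as np

# Data points on the sought ellipse x^2 + B*y^2 + C*x*y + D*x + E*y + F = 0
pts = np.array([[0.0, 2.0], [2.0, 1.0], [1.0, -1.0],
                [-1.0, -2.0], [-3.0, 1.0], [-1.0, -1.0]])
x, y = pts[:, 0], pts[:, 1]

# Unknowns (B, C, D, E, F); the known x^2 term moves to the right-hand side
A = np.column_stack([y**2, x * y, x, y, np.ones_like(x)])
b = -x**2
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)

print(coeffs)    # approximately [ 1.5226 -0.6692  1.5113 -0.4624 -5.1654]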


Note \(\PageIndex{3}\)

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

Least Square Method

The least squares method is the process of finding a regression line, or line of best fit, described by an equation for a given data set. The method works by minimizing the sum of the squares of the residuals (the offsets of the points from the curve or line), so that the trend of the outcomes can be described quantitatively. It is the standard way of fitting an equation to data in regression analysis and curve fitting.

Let us look at a simple example. Ms. Dolma tells her class, "Students who spend more time on their assignments get better grades." A student wants to estimate his grade for spending 2.3 hours on an assignment. With the least-squares method, he can build a predictive model that estimates the grade far more accurately, and it requires nothing more than some data and perhaps a calculator.

In this section, we explore least squares: what it means, the general formula, the steps to plot it on a graph, its limitations, and some tricks we can use with it.

Least Square Method Definition

The least-squares method is a statistical method for fitting a line of best fit, given by an equation such as y = mx + b, to a set of data. The curve of this equation is called the regression line. The objective is to make the sum of the squared errors as small as possible, which is why the method is called "least squares." It is widely used in data fitting, where the best-fit result minimizes the sum of squared errors, each error being the difference between an observed value and the corresponding fitted value. The sum of squared errors also measures the variation in the observed data. For example, given 4 data points, the method produces the graph below.

Least Square Method

The two basic categories of least-square problems are ordinary or linear least squares and nonlinear least squares.

Limitations for Least Square Method

Even though the least-squares method is considered the best method to find the line of best fit, it has a few limitations. They are:

  • This method captures only the relationship between the two variables; all other causes and effects are not taken into consideration.
  • This method is unreliable when the data is not evenly distributed.
  • This method is very sensitive to outliers, which can skew the results of the least-squares analysis.

Least Square Method Graph

In the graph below, the straight line shows the potential relationship between the independent variable and the dependent variable. The goal of the method is to reduce the difference between the observed responses and the responses predicted by the regression line; smaller residuals mean a better fit. A residual is the offset of a data point from the line, and residuals can be measured vertically or perpendicularly. Vertical residuals are mostly used for polynomial and hyperplane problems, while perpendicular residuals are used in the general case, as seen in the image below.

Least Square Method Graph

Least Square Method Formula

The least-squares method finds the curve that best fits a set of observations by minimizing the sum of the squared residuals (errors). Let us assume that the given data points are (x1, y1), (x2, y2), (x3, y3), …, (xn, yn), in which all x's are values of the independent variable and all y's are values of the dependent variable. Here the method is used to find a line of the form y = mx + b, where y and x are the variables, m is the slope, and b is the y-intercept. The formulas for the slope m and the intercept b are:

m = (n∑xy - ∑x∑y) / (n∑x² - (∑x)²)

b = (∑y - m∑x)/n

Here, n is the number of data points.

Following are the steps to calculate the least square using the above formulas.

  • Step 1: Draw a table with 4 columns where the first two columns are for x and y points.
  • Step 2: In the next two columns, find xy and x².
  • Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
  • Step 4: Find the value of slope m using the above formula.
  • Step 5: Calculate the value of b using the above formula.
  • Step 6: Substitute the value of m and b in the equation y = mx + b
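As a sanity check of these formulas (the code below is an added sketch, not part of the original article), the snippet computes m and b from the sums for the points (1, 1), (-2, -1), (3, 2) used in Example 1 further down, giving m = 23/38 and b = 5/19.

import numpy as np

# Data from Example 1 below: (1, 1), (-2, -1), (3, 2)
x = np.array([1.0, -2.0, 3.0])
y = np.array([1.0, -1.0, 2.0])
n = len(x)

# m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2) and b = (Sy - m*Sx) / n
Sx, Sy, Sxy, Sxx = x.sum(), y.sum(), (x * y).sum(), (x**2).sum()
m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)
b = (Sy - m * Sx) / n

print(m, b)    # 23/38 = 0.6053... and 5/19 = 0.2632...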

Let us look at an example to understand this better.

Example: Let's say we have a data set of n = 5 points for which ∑x = 15, ∑y = 25, ∑xy = 88, and ∑x² = 55.

Solution: We will follow the steps to find the equation of the best-fit line.

Find the value of m by using the formula,

m = [(5×88) - (15×25)] / [(5×55) - (15)²]

m = (440 - 375)/(275 - 225)

m = 65/50 = 13/10

Find the value of b by using the formula,

b = (25 - 1.3×15)/5

b = (25 - 19.5)/5 = 5.5/5 = 1.1

So, the required equation of least squares is y = mx + b, that is, y = 1.3x + 1.1.

Important Notes

  • The least-squares method is used to predict the behavior of the dependent variable with respect to the independent variable.
  • The quantity being minimized, the sum of the squared errors, is also called the residual sum of squares.
  • The main aim of the least-squares method is to minimize the sum of the squared errors.


Least Square Method Examples

Example 1: Consider the set of points: (1, 1), (-2,-1), and (3, 2). Plot these points and the least-squares regression line in the same graph.

Solution: There are three points, so the value of n is 3

Now, find the value of m using the formula. Here n = 3, ∑x = 2, ∑y = 2, ∑xy = 9, and ∑x² = 14.

m = [(3×9) - (2×2)] / [(3×14) - (2)²]

m = (27 - 4)/(42 - 4) = 23/38

Now, find the value of b using the formula,

b = [2 - (23/38)×2]/3

b = [2 -(23/19)]/3

b = 15/(3×19) = 5/19

So, the required equation of least squares is y = mx + b, that is, y = (23/38)x + 5/19. The required graph is shown below:

Least Square Method Example

Therefore, the equation of regression line is y = 23/38x + 5/19.

Example 2: Consider the set of points: (-1, 0), (0, 2), (1, 4), and (k, 5). The values of slope and y-intercept in the equation of least squares are 1.7 and 1.9 respectively. Can you determine the value of k?

Solution: Here, there are four data points.

So, the value of n is 4

The slope of the least-squares line, m = 1.7

The value of y-intercept of the least-squares line, b = 1.9

Now, to evaluate the value of unknown k, substitute m = 1.7, b = 1.9, ∑x =k, and ∑y = 11 in the formula,

1.9 = (11 - 1.7k)/4

1.9×4 = 11 - 1.7k

1.7k = 11 - 7.6

k = 3.4/1.7

Therefore, the value of k is 2.

Example 3: The following data shows the sales (in million dollars) of a company.

Can you estimate the sales in the year 2020 using the regression line?

Solution: Here, there are 5 data points. So n = 5

We will make the use of substitution t = x-2015 to make the given data manageable.

Here, t represents the number of years after 2015

Find the value of m using the formula. Here ∑t = 10, ∑y = 142, ∑ty = 368, and ∑t² = 30.

m = [(5×368) - (142×10)] / [(5×30) - (10)²]

m = (1840 - 1420)/(150 - 100) = 420/50 = 8.4

Find the value of b using the formula,

b = (142 - 8.4×10)/5

b = (142 - 84)/5 = 58/5 = 11.6

So, the equation of least squares is y(t) = 8.4t + 11.6.

Now, for the year 2020, the value of t is 2020 - 2015 = 5

The estimation of the sales in the year 2020 is given by substituting 5 for t in the equation,

y(t) = 8.4t + 11.6

y(5) = 8.4×5 + 11.6

y(5) = 42 + 11.6

y(5) = 53.6

Therefore, the predicted sales for the year 2020 are $53.6 million.


FAQs on Least Square Method

What Are Ordinary Least Squares Used For?

The ordinary least squares method is used to find the predictive model that best fits our data points.

Is Least Squares the Same as Linear Regression?

No, linear regression and least-squares are not the same. Linear regression is the analysis of statistical data to predict the value of the quantitative variable. Least squares is one of the methods used in linear regression to find the predictive model.

How do Outliers Affect the Least-Squares Regression Line?

The presence of unusual data points can skew the results of the linear regression. This makes the validity of the model very critical to obtain sound answers to the questions motivating the formation of the predictive model.

What is Least Square Method Formula?

To determine the equation of the line for any data, we use the equation y = mx + b. The least-squares method finds the values of m and b using the formulas:

m = (n∑xy - ∑x∑y) / (n∑x² - (∑x)²)

b = (∑y - m∑x)/n

What is Least Square Method in Regression?

Least-squares regression calculates the line of best fit for a set of data, for example relating activity levels to the corresponding total costs. The idea behind the calculation is to minimize the sum of the squares of the vertical errors between the data points and the fitted function.

Why Least Square Method is Used?

Least squares is equivalent to maximum-likelihood estimation when the model residuals are normally distributed with mean 0.

What is Least Square Curve Fitting?

The least squares method is the process of fitting a curve to the given data. It is one of the methods used to determine the trend line for the given data.


Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

Lecture 9: Four Ways to Solve Least Squares Problems

Description

In this lecture, Professor Strang details the four ways to solve least-squares problems; a short numerical sketch comparing them appears after the list below. Solving least-squares problems comes into play in the many applications that rely on data fitting.

  • Solve \(A^{\mathtt{T}} Ax = A^{\mathtt{T}}b\) to minimize \(\Vert Ax - b \Vert^2\)
  • Gram-Schmidt \(A = QR\) leads to \(x = R^{-1} Q^{\mathtt{T}}b\).
  • The pseudoinverse directly multiplies \(b\) to give \(x\).
  • The best \(x\) is the limit of \((A^{\mathtt{T}}A + \delta I)^{-1} A^{\mathtt{T}}b\) as \(\delta \rightarrow 0\).
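The sketch below is not part of the course page; it is a small numerical check, on an over-determined 3-by-2 system chosen for illustration, that the four approaches listed above agree (with the regularized formula approaching the others as delta shrinks).

import numpy as np

# A small over-determined system (illustrative data, not from the lecture)
A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

# 1. Normal equations A^T A x = A^T b
x1 = np.linalg.solve(A.T @ A, A.T @ b)

# 2. Gram-Schmidt / QR: A = QR, x = R^{-1} Q^T b
Q, R = np.linalg.qr(A)
x2 = np.linalg.solve(R, Q.T @ b)

# 3. Pseudoinverse applied to b
x3 = np.linalg.pinv(A) @ b

# 4. Limit of (A^T A + delta*I)^{-1} A^T b for small delta
delta = 1e-10
x4 = np.linalg.solve(A.T @ A + delta * np.eye(A.shape[1]), A.T @ b)

print(x1, x2, x3, x4)    # all approximately [-3.  5.]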

Related section in textbook: II.2

Instructor: Prof. Gilbert Strang


Problems for Lecture 9 (from textbook Section II.2)

8. What multiple of \(a = \left[\begin{matrix}1\\1\end{matrix}\right]\) should be subtracted from \(b = \left[\begin{matrix}4\\0\end{matrix}\right]\) to make the result \(A_2\) orthogonal to \(a\)? Sketch a figure to show \(a\), \(b\), and \(A_2\).

9. Complete the Gram-Schmidt process in Problem 8 by computing \(q_1=a/\|a\|\) and \(q_2=A_2/\|A_2\|\) and factoring into \(QR\): $$\left[\begin{matrix}1 & 4\\ 1 & 0\end{matrix}\right] = \left[\begin{matrix}q_1 & q_2\end{matrix}\right]\left[\begin{matrix}\|a\| & ?\\ 0 & \|A_2\|\end{matrix}\right]$$ The backslash command \(A\backslash b\) is engineered to make \(A\) block diagonal when possible.


scipy.optimize.least_squares

Solve a nonlinear least-squares problem with bounds on the variables.

Given the residuals f(x) (an m-D real function of n real variables) and the loss function rho(s) (a scalar function), least_squares finds a local minimum of the cost function F(x):

F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1), subject to lb <= x <= ub

The purpose of the loss function rho(s) is to reduce the influence of outliers on the solution.

Parameters

fun : callable
Function which computes the vector of residuals, with the signature fun(x, *args, **kwargs) , i.e., the minimization proceeds with respect to its first argument. The argument x passed to this function is an ndarray of shape (n,) (never a scalar, even for n=1). It must allocate and return a 1-D array_like of shape (m,) or a scalar. If the argument x is complex or the function fun returns complex residuals, it must be wrapped in a real function of real arguments, as shown at the end of the Examples section.

x0 : array_like with shape (n,) or float
Initial guess on independent variables. If float, it will be treated as a 1-D array with one element. When method is ‘trf’, the initial guess might be slightly adjusted to lie sufficiently within the given bounds.

jac : {‘2-point’, ‘3-point’, ‘cs’, callable}, optional
Method of computing the Jacobian matrix (an m-by-n matrix, where element (i, j) is the partial derivative of f[i] with respect to x[j]). The keywords select a finite difference scheme for numerical estimation. The scheme ‘3-point’ is more accurate, but requires twice as many operations as ‘2-point’ (default). The scheme ‘cs’ uses complex steps, and while potentially the most accurate, it is applicable only when fun correctly handles complex inputs and can be analytically continued to the complex plane. Method ‘lm’ always uses the ‘2-point’ scheme. If callable, it is used as jac(x, *args, **kwargs) and should return a good approximation (or the exact value) for the Jacobian as an array_like (np.atleast_2d is applied), a sparse matrix (csr_matrix preferred for performance) or a scipy.sparse.linalg.LinearOperator .

bounds : 2-tuple of array_like or Bounds, optional
There are two ways to specify bounds:

  • Instance of Bounds class.
  • 2-tuple of array_like: lower and upper bounds on independent variables. Defaults to no bounds. Each array must match the size of x0 or be a scalar; in the latter case a bound will be the same for all variables. Use np.inf with an appropriate sign to disable bounds on all or some variables.

method : {‘trf’, ‘dogbox’, ‘lm’}, optional
Algorithm to perform minimization.

  • ‘trf’ : Trust Region Reflective algorithm, particularly suitable for large sparse problems with bounds. Generally robust method.
  • ‘dogbox’ : dogleg algorithm with rectangular trust regions, typical use case is small problems with bounds. Not recommended for problems with rank-deficient Jacobian.
  • ‘lm’ : Levenberg-Marquardt algorithm as implemented in MINPACK. Doesn’t handle bounds and sparse Jacobians. Usually the most efficient method for small unconstrained problems.

Default is ‘trf’. See Notes for more information.

ftol : float or None, optional
Tolerance for termination by the change of the cost function. Default is 1e-8. The optimization process is stopped when dF < ftol * F , and there was an adequate agreement between a local quadratic model and the true model in the last step.

If None and ‘method’ is not ‘lm’, the termination by this condition is disabled. If ‘method’ is ‘lm’, this tolerance must be higher than machine epsilon.

xtol : float or None, optional
Tolerance for termination by the change of the independent variables. Default is 1e-8. The exact condition depends on the method used:

  • For ‘trf’ and ‘dogbox’ : norm(dx) < xtol * (xtol + norm(x)) .
  • For ‘lm’ : Delta < xtol * norm(xs) , where Delta is a trust-region radius and xs is the value of x scaled according to x_scale parameter (see below).

gtol : float or None, optional
Tolerance for termination by the norm of the gradient. Default is 1e-8. The exact condition depends on a method used:

  • For ‘trf’ : norm(g_scaled, ord=np.inf) < gtol , where g_scaled is the value of the gradient scaled to account for the presence of the bounds [STIR] .
  • For ‘dogbox’ : norm(g_free, ord=np.inf) < gtol , where g_free is the gradient with respect to the variables which are not in the optimal state on the boundary.
  • For ‘lm’ : the maximum absolute value of the cosine of angles between columns of the Jacobian and the residual vector is less than gtol , or the residual vector is zero.

x_scale : array_like or ‘jac’, optional
Characteristic scale of each variable. Setting x_scale is equivalent to reformulating the problem in scaled variables xs = x / x_scale . An alternative view is that the size of a trust region along jth dimension is proportional to x_scale[j] . Improved convergence may be achieved by setting x_scale such that a step of a given size along any of the scaled variables has a similar effect on the cost function. If set to ‘jac’, the scale is iteratively updated using the inverse norms of the columns of the Jacobian matrix (as described in [JJMore] ).

loss : str or callable, optional
Determines the loss function. The following keyword values are allowed:

  • ‘linear’ (default) : rho(z) = z . Gives a standard least-squares problem.
  • ‘soft_l1’ : rho(z) = 2 * ((1 + z)**0.5 - 1) . The smooth approximation of l1 (absolute value) loss. Usually a good choice for robust least squares.
  • ‘huber’ : rho(z) = z if z <= 1 else 2*z**0.5 - 1 . Works similarly to ‘soft_l1’.
  • ‘cauchy’ : rho(z) = ln(1 + z) . Severely weakens outliers influence, but may cause difficulties in optimization process.
  • ‘arctan’ : rho(z) = arctan(z) . Limits a maximum loss on a single residual, has properties similar to ‘cauchy’.

If callable, it must take a 1-D ndarray z=f**2 and return an array_like with shape (3, m) where row 0 contains function values, row 1 contains first derivatives and row 2 contains second derivatives. Method ‘lm’ supports only ‘linear’ loss.
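For illustration only, here is a sketch of a custom loss callable that re-implements the built-in ‘soft_l1’ loss following the (3, m) convention above; the residual function residuals is a made-up toy example, not part of the original documentation.

import numpy as np
from scipy.optimize import least_squares

def soft_l1(z):
    # z is the 1-D array of squared residuals f**2
    t = 1 + z
    rho = np.empty((3, z.size))
    rho[0] = 2 * (t**0.5 - 1)    # rho(z)
    rho[1] = t**-0.5             # first derivative rho'(z)
    rho[2] = -0.5 * t**-1.5      # second derivative rho''(z)
    return rho

def residuals(x):
    # hypothetical residual vector; the constant 5.0 acts as an "outlier"
    return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0]**2), 5.0])

res = least_squares(residuals, x0=np.zeros(2), loss=soft_l1)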

f_scale : float, optional
Value of soft margin between inlier and outlier residuals; default is 1.0. The loss function is evaluated as rho_(f**2) = C**2 * rho(f**2 / C**2), where C is f_scale, and rho is determined by the loss parameter. This parameter has no effect with loss='linear', but for other loss values it is of crucial importance.

max_nfev : None or int, optional
Maximum number of function evaluations before the termination. If None (default), the value is chosen automatically:

• For ‘trf’ and ‘dogbox’ : 100 * n.
• For ‘lm’ : 100 * n if jac is callable and 100 * n * (n + 1) otherwise (because ‘lm’ counts function calls in Jacobian estimation).

diff_step : None or array_like, optional
Determines the relative step size for the finite difference approximation of the Jacobian. The actual step is computed as x * diff_step. If None (default), then diff_step is taken to be a conventional “optimal” power of machine epsilon for the finite difference scheme used [NR].

tr_solver : {None, ‘exact’, ‘lsmr’}, optional
Method for solving trust-region subproblems, relevant only for the ‘trf’ and ‘dogbox’ methods:

• ‘exact’ is suitable for not very large problems with dense Jacobian matrices. The computational complexity per iteration is comparable to a singular value decomposition of the Jacobian matrix.
• ‘lsmr’ is suitable for problems with large sparse Jacobian matrices. It uses the iterative procedure scipy.sparse.linalg.lsmr for finding a solution of a linear least-squares problem and only requires matrix-vector product evaluations.

If None (default), the solver is chosen based on the type of Jacobian returned on the first iteration.

tr_options : dict, optional
Keyword options passed to the trust-region solver:

• tr_solver='exact' : tr_options are ignored.
• tr_solver='lsmr' : options for scipy.sparse.linalg.lsmr. Additionally, method='trf' supports the ‘regularize’ option (bool, default True), which adds a regularization term to the normal equation; this improves convergence if the Jacobian is rank-deficient [Byrd] (eq. 3.4).

jac_sparsity : {None, array_like, sparse matrix}, optional
Defines the sparsity structure of the Jacobian matrix for finite difference estimation; its shape must be (m, n). If the Jacobian has only a few non-zero elements in each row, providing the sparsity structure will greatly speed up the computations [Curtis]. A zero entry means that the corresponding element in the Jacobian is identically zero. If provided, forces the use of the ‘lsmr’ trust-region solver. If None (default), then dense differencing will be used. Has no effect for the ‘lm’ method.

verbose : {0, 1, 2}, optional
Level of algorithm’s verbosity:

• 0 (default) : work silently.
• 1 : display a termination report.
• 2 : display progress during iterations (not supported by the ‘lm’ method).

args, kwargs : tuple and dict, optional
Additional arguments passed to fun and jac. Both are empty by default. The calling signature is fun(x, *args, **kwargs), and the same for jac.

Returns : OptimizeResult with the following fields defined:

• x : ndarray, shape (n,). Solution found.
• cost : float. Value of the cost function at the solution.
• fun : ndarray, shape (m,). Vector of residuals at the solution.
• jac : ndarray, sparse matrix or LinearOperator, shape (m, n). Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.
• grad : ndarray, shape (m,). Gradient of the cost function at the solution.
• optimality : float. First-order optimality measure. In unconstrained problems, it is always the uniform norm of the gradient. In constrained problems, it is the quantity which was compared with gtol during iterations.
• active_mask : ndarray of int, shape (n,). Each component shows whether a corresponding constraint is active (that is, whether a variable is at the bound): 0 means the constraint is not active, -1 means a lower bound is active, and 1 means an upper bound is active. Might be somewhat arbitrary for the ‘trf’ method, as it generates a sequence of strictly feasible iterates and active_mask is determined within a tolerance threshold.

• nfev : int. Number of function evaluations done. Methods ‘trf’ and ‘dogbox’ do not count function calls for numerical Jacobian approximation, as opposed to the ‘lm’ method.

• njev : int or None. Number of Jacobian evaluations done. If numerical Jacobian approximation is used in the ‘lm’ method, it is set to None.

• status : int. The reason for algorithm termination:
  -1 : improper input parameters status returned from MINPACK.
  0 : the maximum number of function evaluations is exceeded.
  1 : gtol termination condition is satisfied.
  2 : ftol termination condition is satisfied.
  3 : xtol termination condition is satisfied.
  4 : both ftol and xtol termination conditions are satisfied.

• message : str. Verbal description of the termination reason.

• success : bool. True if one of the convergence criteria is satisfied (status > 0).

See also:

leastsq : A legacy wrapper for the MINPACK implementation of the Levenberg-Marquardt algorithm.

curve_fit : Least-squares minimization applied to a curve-fitting problem.

Method ‘lm’ (Levenberg-Marquardt) calls a wrapper over least-squares algorithms implemented in MINPACK (lmder, lmdif). It runs the Levenberg-Marquardt algorithm formulated as a trust-region type algorithm. The implementation is based on paper [JJMore]; it is very robust and efficient, with a lot of smart tricks. It should be your first choice for unconstrained problems. Note that it doesn’t support bounds and doesn’t work when m < n.

Method ‘trf’ (Trust Region Reflective) is motivated by the process of solving a system of equations, which constitutes the first-order optimality condition for a bound-constrained minimization problem as formulated in [STIR]. The algorithm iteratively solves trust-region subproblems augmented by a special diagonal quadratic term and with trust-region shape determined by the distance from the bounds and the direction of the gradient. These enhancements help to avoid making steps directly into the bounds and to efficiently explore the whole space of variables. To further improve convergence, the algorithm considers search directions reflected from the bounds. To obey theoretical requirements, the algorithm keeps iterates strictly feasible. With dense Jacobians trust-region subproblems are solved by an exact method very similar to the one described in [JJMore] (and implemented in MINPACK). The difference from the MINPACK implementation is that a singular value decomposition of the Jacobian matrix is done once per iteration, instead of a QR decomposition and a series of Givens rotation eliminations. For large sparse Jacobians a 2-D subspace approach of solving trust-region subproblems is used [STIR], [Byrd]. The subspace is spanned by a scaled gradient and an approximate Gauss-Newton solution delivered by scipy.sparse.linalg.lsmr. When no constraints are imposed the algorithm is very similar to MINPACK and has generally comparable performance. The algorithm works quite robustly in unbounded and bounded problems, and thus it is chosen as the default algorithm.

Method ‘dogbox’ operates in a trust-region framework, but considers rectangular trust regions as opposed to conventional ellipsoids [Voglis]. The intersection of the current trust region and the initial bounds is again rectangular, so on each iteration a quadratic minimization problem subject to bound constraints is solved approximately by Powell’s dogleg method [NumOpt]. The required Gauss-Newton step can be computed exactly for dense Jacobians or approximately by scipy.sparse.linalg.lsmr for large sparse Jacobians. The algorithm is likely to exhibit slow convergence when the rank of the Jacobian is less than the number of variables. The algorithm often outperforms ‘trf’ in bounded problems with a small number of variables.

Robust loss functions are implemented as described in [BA]. The idea is to modify the residual vector and the Jacobian matrix on each iteration such that the computed gradient and Gauss-Newton Hessian approximation match the true gradient and Hessian approximation of the cost function. The algorithm then proceeds in a normal way, i.e., robust loss functions are implemented as a simple wrapper over standard least-squares algorithms.

New in version 0.17.0.

[STIR] M. A. Branch, T. F. Coleman, and Y. Li, “A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems,” SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp. 1-23, 1999.

[NR] William H. Press et al., “Numerical Recipes. The Art of Scientific Computing. 3rd edition,” Sec. 5.7.

[Byrd] R. H. Byrd, R. B. Schnabel and G. A. Shultz, “Approximate solution of the trust region problem by minimization over two-dimensional subspaces,” Math. Programming, 40, pp. 247-263, 1988.

[Curtis] A. Curtis, M. J. D. Powell, and J. Reid, “On the estimation of sparse Jacobian matrices,” Journal of the Institute of Mathematics and its Applications, 13, pp. 117-120, 1974.

[JJMore] J. J. More, “The Levenberg-Marquardt Algorithm: Implementation and Theory,” Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.

[Voglis] C. Voglis and I. E. Lagaris, “A Rectangular Trust Region Dogleg Approach for Unconstrained and Bound Constrained Nonlinear Optimization,” WSEAS International Conference on Applied Mathematics, Corfu, Greece, 2004.

[NumOpt] J. Nocedal and S. J. Wright, “Numerical Optimization, 2nd edition,” Chapter 4.

[BA] B. Triggs et al., “Bundle Adjustment - A Modern Synthesis,” Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, pp. 298-372, 1999.

In this example we find a minimum of the Rosenbrock function without bounds on independent variables.

Notice that we only provide the vector of the residuals. The algorithm constructs the cost function as a sum of squares of the residuals, which gives the Rosenbrock function. The exact minimum is at x = [1.0, 1.0] .
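The original code listing is not reproduced above; a minimal sketch consistent with the description is:

import numpy as np
from scipy.optimize import least_squares

def fun_rosenbrock(x):
    # vector of residuals; least_squares minimizes 0.5 * sum(f**2),
    # which here is (half of) the Rosenbrock function
    return np.array([10 * (x[1] - x[0]**2), 1 - x[0]])

x0_rosenbrock = np.array([2.0, 2.0])
res = least_squares(fun_rosenbrock, x0_rosenbrock)
print(res.x)      # close to [1., 1.]
print(res.cost)   # close to 0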

We now constrain the variables, in such a way that the previous solution becomes infeasible. Specifically, we require that x[1] >= 1.5, and leave x[0] unconstrained. To this end, we specify the bounds parameter to least_squares in the form bounds=([-np.inf, 1.5], np.inf).

We also provide the analytic Jacobian:
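Continuing the sketch (a reconstruction, not the original listing), the analytic Jacobian of the residual vector defined above is:

def jac_rosenbrock(x):
    # rows are the gradients of the two residuals of fun_rosenbrock
    return np.array([
        [-20 * x[0], 10.0],
        [-1.0, 0.0],
    ])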

Putting this all together, we see that the new solution lies on the bound:
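For example (again a sketch):

res_1 = least_squares(fun_rosenbrock, x0_rosenbrock, jac=jac_rosenbrock,
                      bounds=([-np.inf, 1.5], np.inf))
print(res_1.x)    # x[1] sits on the bound, i.e. x[1] == 1.5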

Now we solve a system of equations (i.e., the cost function should be zero at a minimum) for a Broyden tridiagonal vector-valued function of 100000 variables:

The corresponding Jacobian matrix is sparse. We tell the algorithm to estimate it by finite differences and provide the sparsity structure of Jacobian to significantly speed up this process.
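A sketch of this setup, assuming the Broyden tridiagonal residuals described above and a tridiagonal sparsity pattern passed through jac_sparsity (which forces the ‘lsmr’ trust-region solver):

import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import least_squares

def fun_broyden(x):
    # residuals of the Broyden tridiagonal function
    f = (3 - x) * x + 1
    f[1:] -= x[:-1]
    f[:-1] -= 2 * x[1:]
    return f

def sparsity_broyden(n):
    # the Jacobian is tridiagonal, so only mark the three diagonals
    sparsity = lil_matrix((n, n), dtype=int)
    i = np.arange(n)
    sparsity[i, i] = 1
    i = np.arange(1, n)
    sparsity[i, i - 1] = 1
    i = np.arange(n - 1)
    sparsity[i, i + 1] = 1
    return sparsity

n = 100000
x0_broyden = -np.ones(n)
res_3 = least_squares(fun_broyden, x0_broyden,
                      jac_sparsity=sparsity_broyden(n))
print(res_3.cost)   # essentially zero: the system of equations is solved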

Let’s also solve a curve fitting problem using a robust loss function to take care of outliers in the data. Define the model function as y = a + b * exp(c * t), where t is a predictor variable, y is an observation and a, b, c are parameters to estimate.

First, define the function which generates the data with noise and outliers, define the model parameters, and generate data:
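A sketch of this step (the exact generator used in the original is not reproduced here; the noise level and number of outliers below are illustrative):

import numpy as np
from scipy.optimize import least_squares

def gen_data(t, a, b, c, noise=0.0, n_outliers=0, seed=0):
    rng = np.random.default_rng(seed)
    y = a + b * np.exp(c * t)
    error = noise * rng.standard_normal(t.size)
    outliers = rng.integers(0, t.size, n_outliers)
    error[outliers] *= 10              # inflate a few residuals into outliers
    return y + error

a, b, c = 0.5, 2.0, -1.0               # "true" model parameters
t_train = np.linspace(0, 10, 15)
y_train = gen_data(t_train, a, b, c, noise=0.1, n_outliers=3)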

Define function for computing residuals and initial estimate of parameters.
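For instance (continuing the sketch):

def fun(x, t, y):
    # residuals of the model y = a + b * exp(c * t), with x = (a, b, c)
    return x[0] + x[1] * np.exp(x[2] * t) - y

x0 = np.array([1.0, 1.0, 0.0])         # rough initial estimate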

Compute a standard least-squares solution:
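Continuing the sketch:

res_lsq = least_squares(fun, x0, args=(t_train, y_train))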

Now compute two solutions with two different robust loss functions. The parameter f_scale is set to 0.1, meaning that inlier residuals should not significantly exceed 0.1 (the noise level used).
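For example, with the ‘soft_l1’ and ‘cauchy’ losses:

res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1,
                            args=(t_train, y_train))
res_cauchy = least_squares(fun, x0, loss='cauchy', f_scale=0.1,
                           args=(t_train, y_train))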

And, finally, plot all the curves. We see that by selecting an appropriate loss we can get estimates close to optimal even in the presence of strong outliers. But keep in mind that it is generally recommended to try the ‘soft_l1’ or ‘huber’ losses first (if at all necessary), as the other two options may cause difficulties in the optimization process.

[Figure: the generated data points and the fitted curves for the ‘linear’, ‘soft_l1’ and ‘cauchy’ losses.]

In the next example, we show how complex-valued residual functions of complex variables can be optimized with least_squares() . Consider the following function:

We wrap it into a function of real variables that returns real residuals by simply handling the real and imaginary parts as independent variables:

Thus, instead of the original m-D complex function of n complex variables we optimize a 2m-D real function of 2n real variables:
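The original function is not reproduced above; as an illustrative stand-in, take the hypothetical residual f(z) = z**2 - 1j. The wrapping pattern then looks like this:

import numpy as np
from scipy.optimize import least_squares

def f(z):
    # hypothetical complex-valued residual of one complex variable
    return z**2 - 1j

def f_wrap(x):
    # x[0] = Re(z), x[1] = Im(z); return real and imaginary parts as residuals
    fz = f(x[0] + 1j * x[1])
    return np.array([fz.real, fz.imag])

res = least_squares(f_wrap, x0=np.array([1.0, 1.0]))
z = res.x[0] + 1j * res.x[1]
print(z)    # approximately 0.7071 + 0.7071j, a square root of 1j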


Solve Least Squares Problems by the QR Decomposition

The least squares problem

In many real-life problems, a system of equations of the form \( A x = B \) has no exact solution \( x \). In that case we look for the least squares solution, written \( \hat x \) (“x with a hat”), which makes \( A \hat x \) as close as possible to \( B \), that is, minimizes \( \| A x - B \| \). One way to compute \( \hat x \) is through the QR decomposition of \( A \): writing \( A = QR \) with \( Q \) having orthonormal columns and \( R \) upper triangular, the least squares solution satisfies \( R \hat x = Q^T B \).

Examples with Solutions


Example 2: Use the \( QR \) decomposition to solve the least squares problem related to the inconsistent system \( A x = B \) with \( A = \begin{bmatrix} 1 & 0 & 1\\ 2 & 0 & 0 \\ 0 & -1 & 1\\ 0 & -2 & 0 \end{bmatrix} \) and \( B = \begin{bmatrix} 2 \\ 3 \\ 4 \\ 7 \end{bmatrix} \).

Solution to Example 2

Use the Gram-Schmidt process to find the matrix \( Q \) with orthonormal columns and decompose matrix \( A \) as \( A = QR \):
\( Q = \begin{bmatrix} \dfrac{\sqrt{5}}{5} & 0 & \dfrac{\sqrt{10}}{5}\\ \dfrac{2\sqrt{5}}{5} & 0 & -\dfrac{\sqrt{10}}{10}\\ 0&-\dfrac{\sqrt{5}}{5} & \dfrac{\sqrt{10}}{5}\\ 0&-\dfrac{2\sqrt{5}}{5} & -\dfrac{\sqrt{10}}{10} \end{bmatrix} \)

We now calculate the matrix \( R \). Multiply both sides of \( A = QR \) on the left by \( Q^T \), the transpose of \( Q \):
\( Q^T A = Q^T Q R \)
Since the columns of \( Q \) are orthonormal, \( Q^T Q = I \), hence the above simplifies to \( R = Q^T A \), with
\( Q^T = \begin{bmatrix} \dfrac{\sqrt{5}}{5}&\dfrac{2\sqrt{5}}{5}&0&0\\ 0&0&-\dfrac{\sqrt{5}}{5}&-\dfrac{2\sqrt{5}}{5}\\ \dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10}&\dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10} \end{bmatrix} \)

Calculate \( R \):
\( R = Q^T A = \begin{bmatrix} \dfrac{\sqrt{5}}{5}&\dfrac{2\sqrt{5}}{5}&0&0\\ 0&0&-\dfrac{\sqrt{5}}{5}&-\dfrac{2\sqrt{5}}{5}\\ \dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10}&\dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10} \end{bmatrix} \begin{bmatrix} 1 & 0 & 1\\ 2 & 0 & 0 \\ 0 & -1 & 1\\ 0 & -2 & 0 \end{bmatrix} = \begin{bmatrix} \sqrt{5} & 0 & \dfrac{1}{\sqrt{5}}\\ 0 & \sqrt{5} & -\dfrac{1}{\sqrt{5}}\\ 0 & 0 & \dfrac{2 \sqrt{2}}{\sqrt{5}} \end{bmatrix} \)

Calculate \( Q^T B \):
\( Q^T B = \begin{bmatrix} \dfrac{\sqrt{5}}{5}&\dfrac{2\sqrt{5}}{5}&0&0\\ 0&0&-\dfrac{\sqrt{5}}{5}&-\dfrac{2\sqrt{5}}{5}\\ \dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10}&\dfrac{\sqrt{10}}{5}&-\dfrac{\sqrt{10}}{10} \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \\ 7 \end{bmatrix} = \begin{bmatrix} \dfrac{8}{\sqrt{5}}\\ -\dfrac{18}{\sqrt{5}}\\ \sqrt{\dfrac{2}{5}} \end{bmatrix} \)

We now substitute \( R \) and \( Q^T B \) by their numerical values in the equation \( R \hat x = Q^T B \) and write the system
\( \begin{bmatrix} \sqrt{5} & 0 & \dfrac{1}{\sqrt{5}}\\ 0 & \sqrt{5} & -\dfrac{1}{\sqrt{5}}\\ 0 & 0 & \dfrac{2 \sqrt{2}}{\sqrt{5}} \end{bmatrix} \cdot \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} \dfrac{8}{\sqrt{5}}\\ -\dfrac{18}{\sqrt{5}}\\ \sqrt{\dfrac{2}{5}} \end{bmatrix} \)

Solve by back substitution to obtain \( x_3 = \dfrac{1}{2} \), \( x_2 = -\dfrac{7}{2} \), \( x_1 = \dfrac{3}{2} \), so that
\( \hat x = \begin{bmatrix} \dfrac{3}{2} \\ -\dfrac{7}{2}\\ \dfrac{1}{2} \end{bmatrix} \)
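As a quick numerical check of this example (not part of the original tutorial), the same computation can be reproduced in Python with NumPy and SciPy:

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0, 1.0],
              [2.0, 0.0, 0.0],
              [0.0, -1.0, 1.0],
              [0.0, -2.0, 0.0]])
B = np.array([2.0, 3.0, 4.0, 7.0])

# Reduced QR decomposition: Q has orthonormal columns, R is upper triangular.
# (np.linalg.qr may flip the signs of some columns of Q and rows of R;
# the least squares solution is unaffected.)
Q, R = np.linalg.qr(A)

# Solve R x_hat = Q^T B by back substitution.
x_hat = solve_triangular(R, Q.T @ B)
print(x_hat)                                  # approximately [ 1.5, -3.5, 0.5]

# Cross-check against a general-purpose least squares solver.
print(np.linalg.lstsq(A, B, rcond=None)[0])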


lsqlin: Solve constrained linear least-squares problems

Description

Linear least-squares solver with bounds or linear constraints.

Solves least-squares curve fitting problems of the form

\( \min_{x} \; \tfrac{1}{2} \| C \cdot x - d \|_2^2 \quad \text{such that} \quad \begin{cases} A \cdot x \le b, \\ Aeq \cdot x = beq, \\ lb \le x \le ub. \end{cases} \)

lsqlin applies only to the solver-based approach. For a discussion of the two optimization approaches, see First Choose Problem-Based or Solver-Based Approach .

x = lsqlin( C , d , A , b ) solves the linear system C*x = d in the least-squares sense, subject to A*x  ≤  b .

x = lsqlin( C , d , A , b , Aeq , beq , lb , ub ) adds linear equality constraints Aeq*x = beq and bounds lb  ≤  x  ≤  ub . If you do not need certain constraints such as Aeq and beq , set them to [] . If x(i) is unbounded below, set lb(i) = -Inf , and if x(i) is unbounded above, set ub(i) = Inf .

x = lsqlin( C , d , A , b , Aeq , beq , lb , ub , x0 , options ) minimizes with an initial point x0 and the optimization options specified in options . Use optimoptions to set these options. If you do not want to include an initial point, set the x0 argument to [] ; however, a nonempty x0 is required for the 'active-set' algorithm.

x = lsqlin( problem ) finds the minimum for problem , a structure described in problem . Create the problem structure using dot notation or the struct function. Or create a problem structure from an OptimizationProblem object by using prob2struct .

[ x , resnorm , residual , exitflag , output , lambda ] = lsqlin( ___ ) , for any input arguments described above, returns:

The squared 2-norm of the residual, resnorm \( = \| C \cdot x - d \|_2^2 \)

The residual residual = C*x - d

A value exitflag describing the exit condition

A structure output containing information about the optimization process

A structure lambda containing the Lagrange multipliers

The factor ½ in the definition of the problem affects the values in the lambda structure.

[ wsout , resnorm , residual , exitflag , output , lambda ] = lsqlin( C , d , A , b , Aeq , beq , lb , ub , ws ) starts lsqlin from the data in the warm start object ws , using the options in ws . The returned argument wsout contains the solution point in wsout.X . By using wsout as the initial warm start object in a subsequent solver call, lsqlin can work faster.


Least Squares with Linear Inequality Constraints

Find the x that minimizes the norm of C*x - d for an overdetermined problem with linear inequality constraints.

Specify the problem and constraints.

Call lsqlin to solve the problem.
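The MATLAB listing for this example is not reproduced above. As a rough Python sketch of the same kind of computation (minimize 0.5*||C*x - d||^2 subject to A*x <= b), using scipy.optimize.minimize with a linear constraint and made-up data rather than the example's original matrices:

import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(0)
C = rng.standard_normal((5, 4))    # 5 equations, 4 unknowns (overdetermined)
d = rng.standard_normal(5)
A = rng.standard_normal((3, 4))    # 3 linear inequality constraints A @ x <= b
b = rng.standard_normal(3)

def objective(x):
    r = C @ x - d
    return 0.5 * r @ r

def gradient(x):
    return C.T @ (C @ x - d)

res = minimize(objective, np.zeros(4), jac=gradient,
               method='trust-constr',
               constraints=[LinearConstraint(A, -np.inf, b)])
print(res.x)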

Least Squares with Linear Constraints and Bounds

Find the x that minimizes the norm of C*x - d for an overdetermined problem with linear equality and inequality constraints and bounds.

Linear Least Squares with Nondefault Options

This example shows how to use nondefault options for linear least squares.

Set options to use the 'interior-point' algorithm and to give iterative display.

Set up a linear least-squares problem.

Run the problem.

Return All Outputs

Obtain and interpret all lsqlin outputs.

Define a problem with linear inequality constraints and bounds. The problem is overdetermined because there are four columns in the C matrix but five rows. This means the problem has four unknowns and five conditions, even before including the linear constraints and bounds.

Set options to use the 'interior-point' algorithm.

The 'interior-point' algorithm does not use an initial point, so set x0 to [] .

Call lsqlin with all outputs.

Examine the nonzero Lagrange multiplier fields in more detail. First examine the Lagrange multipliers for the linear inequality constraint.

Lagrange multipliers are nonzero exactly when the solution is on the corresponding constraint boundary. In other words, Lagrange multipliers are nonzero when the corresponding constraint is active. lambda.ineqlin(2) is nonzero. This means that the second element in A*x should equal the second element in b , because the constraint is active.

Now examine the Lagrange multipliers for the lower and upper bound constraints.

The first two elements of lambda.lower are nonzero. You see that x(1) and x(2) are at their lower bounds, -0.1 . All elements of lambda.upper are essentially zero, and you see that all components of x are less than their upper bound, 2 .

Return Warm Start Object

Create a warm start object so you can solve a modified problem quickly. Set options to turn off iterative display to support warm start.

Create and solve the first problem. Find the solution time.

Add a linear constraint and solve again.

Input Arguments

C — Multiplier matrix: real matrix

Multiplier matrix, specified as a matrix of doubles. C represents the multiplier of the solution x in the expression C*x - d . C is M -by- N , where M is the number of equations, and N is the number of elements of x .

Example: C = [1,4;2,5;7,8]

Data Types: double

d — Constant vector: real vector

Constant vector, specified as a vector of doubles. d represents the additive constant term in the expression C*x - d . d is M -by- 1 , where M is the number of equations.

Example: d = [5;0;-12]

A — Linear inequality constraints: real matrix

Linear inequality constraints, specified as a real matrix. A is an M -by- N matrix, where M is the number of inequalities, and N is the number of variables (number of elements in x0 ). For large problems, pass A as a sparse matrix.

A encodes the M linear inequalities

A*x <= b ,

where x is the column vector of N variables x(:) , and b is a column vector with M elements.

For example, consider these inequalities:

\( x_1 + 2x_2 \le 10, \quad 3x_1 + 4x_2 \le 20, \quad 5x_1 + 6x_2 \le 30. \)

Specify the inequalities by entering the following constraints.

Example: To specify that the x components sum to 1 or less, use A = ones(1,N) and b = 1 .
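In NumPy terms (an illustration only; the original MATLAB snippet is not reproduced here), the three inequalities above correspond to:

import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b = np.array([10, 20, 30])
# the constraint is then A @ x <= b, elementwise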

b — Linear inequality constraints: real vector

Linear inequality constraints, specified as a real vector. b is an M -element vector related to the A matrix. If you pass b as a row vector, solvers internally convert b to the column vector b(:) . For large problems, pass b as a sparse vector.

b encodes the M linear inequalities

A*x <= b ,

where x is the column vector of N variables x(:) , and A is a matrix of size M -by- N .

For example, consider these inequalities:

\( x_1 + 2x_2 \le 10, \quad 3x_1 + 4x_2 \le 20, \quad 5x_1 + 6x_2 \le 30. \)

Aeq — Linear equality constraints: real matrix

Linear equality constraints, specified as a real matrix. Aeq is an Me -by- N matrix, where Me is the number of equalities, and N is the number of variables (number of elements in x0 ). For large problems, pass Aeq as a sparse matrix.

Aeq encodes the Me linear equalities

Aeq*x = beq ,

where x is the column vector of N variables x(:) , and beq is a column vector with Me elements.

For example, consider these equalities:

\( x_1 + 2x_2 + 3x_3 = 10, \quad 2x_1 + 4x_2 + x_3 = 20. \)

Example: To specify that the x components sum to 1, use Aeq = ones(1,N) and beq = 1 .

beq — Linear equality constraints: real vector

Linear equality constraints, specified as a real vector. beq is an Me -element vector related to the Aeq matrix. If you pass beq as a row vector, solvers internally convert beq to the column vector beq(:) . For large problems, pass beq as a sparse vector.

beq encodes the Me linear equalities

Aeq*x = beq ,

where x is the column vector of N variables x(:) , and Aeq is a matrix of size Me -by- N .

For example, consider these equalities:

\( x_1 + 2x_2 + 3x_3 = 10, \quad 2x_1 + 4x_2 + x_3 = 20. \)

Specify the equalities by entering the following constraints.

lb — Lower bounds: [] (default) | real vector or array

Lower bounds, specified as a vector or array of doubles. lb represents the lower bounds elementwise in lb  ≤  x  ≤  ub .

Internally, lsqlin converts an array lb to the vector lb(:) .

Example: lb = [0;-Inf;4] means x(1) ≥ 0 , x(3) ≥ 4 .

ub — Upper bounds: [] (default) | real vector or array

Upper bounds, specified as a vector or array of doubles. ub represents the upper bounds elementwise in lb  ≤  x  ≤  ub .

Internally, lsqlin converts an array ub to the vector ub(:) .

Example: ub = [Inf;4;10] means x(2) ≤ 4 , x(3) ≤ 10 .

x0 — Initial point: zeros(M,1) where M is the number of equations (default) | real vector or array

Initial point for the solution process, specified as a real vector or array.

The 'interior-point' algorithm does not use x0 .

x0 is an optional argument for the 'trust-region-reflective' algorithm. By default, this algorithm sets x0 to an all-zero vector. If any of these components violate the bound constraints, lsqlin resets the entire default x0 to a vector that satisfies the bounds. If any components of a nondefault x0 violate the bounds, lsqlin sets those components to satisfy the bounds.

A nonempty x0 is a required argument for the 'active-set' algorithm. If any components of x0 violate the bounds, lsqlin sets those components to satisfy the bounds.

Example: x0 = [4;-3]

options — Options for lsqlin: options created using optimoptions | structure such as created by optimset

Options for lsqlin , specified as the output of the optimoptions function or as a structure such as created by optimset .

Some options are absent from the optimoptions display. These options appear in italics in the following table. For details, see View Optimization Options .

All Algorithms

trust-region-reflective Algorithm Options

interior-point Algorithm Options

'active-set' Algorithm Options

problem — Optimization problem: structure

Optimization problem, specified as a structure with the following fields.

You cannot use warm start with the problem argument.

Data Types: struct

ws — Warm start object: object created using optimwarmstart

Warm start object, specified as an object created using optimwarmstart . The warm start object contains the start point and options, and optional data for memory size in code generation. See Warm Start Best Practices .

Example: ws = optimwarmstart(x0,options)

Output Arguments

x — Solution: real vector

Solution, returned as a vector that minimizes the norm of C*x-d subject to all bounds and linear constraints.

wsout — Solution warm start object: LsqlinWarmStart object

Solution warm start object, returned as a LsqlinWarmStart object. The solution point is wsout.X .

You can use wsout as the input warm start object in a subsequent lsqlin call.

resnorm — Objective value: real scalar

Objective value, returned as the scalar value norm(C*x-d)^2 .

residual — Solution residuals: real vector

Solution residuals, returned as the vector C*x-d .

exitflag — Algorithm stopping condition: integer

Algorithm stopping condition, returned as an integer identifying the reason the algorithm stopped. The following lists the values of exitflag and the corresponding reasons lsqlin stopped.

The exit message for the interior-point algorithm can give more details on the reason lsqlin stopped, such as exceeding a tolerance. See Exit Flags and Exit Messages .

output — Solution process summary: structure

Solution process summary, returned as a structure containing information about the optimization process.

See Output Structures .

lambda — Lagrange multipliers: structure

Lagrange multipliers, returned as a structure with the following fields.

See Lagrange Multiplier Structures .

For problems with no constraints, you can use mldivide (matrix left division). When you have no constraints, lsqlin returns x = C\d .
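As an unconstrained illustration in Python (reusing the example C and d values given earlier for the input arguments; np.linalg.lstsq plays the role of mldivide here):

import numpy as np

C = np.array([[1.0, 4.0], [2.0, 5.0], [7.0, 8.0]])
d = np.array([5.0, 0.0, -12.0])

x, residuals, rank, sv = np.linalg.lstsq(C, d, rcond=None)
print(x)    # minimizes ||C @ x - d||, like lsqlin with no constraints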

Because the problem being solved is always convex, lsqlin finds a global, although not necessarily unique, solution.

If your problem has many linear constraints and few variables, try using the 'active-set' algorithm. See Quadratic Programming with Many Linear Constraints .

Better numerical results are likely if you specify equalities explicitly, using Aeq and beq , instead of implicitly, using lb and ub .

The trust-region-reflective algorithm does not allow equal upper and lower bounds. Use another algorithm for this case.

If the specified input bounds for a problem are inconsistent, the output x is x0 and the outputs resnorm and residual are [] .

You can solve some large structured problems, including those where the C matrix is too large to fit in memory, using the trust-region-reflective algorithm with a Jacobian multiply function. For information, see trust-region-reflective Algorithm Options .

Trust-Region-Reflective Algorithm

This method is a subspace trust-region method based on the interior-reflective Newton method described in [1] . Each iteration involves the approximate solution of a large linear system using the method of preconditioned conjugate gradients (PCG). See Trust-Region-Reflective Least Squares , and in particular Large Scale Linear Least Squares .

Interior-Point Algorithm

The 'interior-point' algorithm is based on the quadprog 'interior-point-convex' algorithm. See Linear Least Squares: Interior-Point or Active-Set .

Active-Set Algorithm

The 'active-set' algorithm is based on the quadprog 'active-set' algorithm. For more information, see Linear Least Squares: Interior-Point or Active-Set and active-set quadprog Algorithm .

[1] Coleman, T. F. and Y. Li. “A Reflective Newton Method for Minimizing a Quadratic Function Subject to Bounds on Some of the Variables,” SIAM Journal on Optimization , Vol. 6, Number 4, pp. 1040–1058, 1996.

[2] Gill, P. E., W. Murray, and M. H. Wright. Practical Optimization , Academic Press, London, UK, 1981.

A warm start object maintains a list of active constraints from the previous solved problem. The solver carries over as much active constraint information as possible to solve the current problem. If the previous problem is too different from the current one, no active set information is reused. In this case, the solver effectively executes a cold start in order to rebuild the list of active constraints.

Alternative Functionality

The Optimize Live Editor task provides a visual interface for lsqlin .

Extended Capabilities

C/C++ Code Generation: Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

lsqlin supports code generation using either the codegen (MATLAB Coder) function or the MATLAB ® Coder™ app. You must have a MATLAB Coder license to generate code.

The target hardware must support standard double-precision floating-point computations. You cannot generate code for single-precision or fixed-point computations.

Code generation targets do not use the same math kernel libraries as MATLAB solvers. Therefore, code generation solutions can vary from solver solutions, especially for poorly conditioned problems.

When solving unconstrained and underdetermined problems in MATLAB, lsqlin calls mldivide , which returns a basic solution. In code generation, the returned solution has minimum norm, which usually differs.

lsqlin does not support the problem argument for code generation.

All lsqlin input matrices such as A , Aeq , lb , and ub must be full, not sparse. You can convert sparse matrices to full by using the full function.

The lb and ub arguments must have the same number of entries as the number of columns in C or must be empty [] .

If your target hardware does not support infinite bounds, use optim.coder.infbound .

For advanced code optimization involving embedded processors, you also need an Embedded Coder ® license.

You must include options for lsqlin and specify them using optimoptions . The options must include the Algorithm option, set to 'active-set' .

Code generation supports these options:

Algorithm — Must be 'active-set'

ConstraintTolerance

MaxIterations

ObjectiveLimit

OptimalityTolerance

StepTolerance

Generated code has limited error checking for options. The recommended way to update an option is to use optimoptions , not dot notation.

Do not load options from a file. Doing so can cause code generation to fail. Instead, create options in your code.

If you specify an option that is not supported, the option is typically ignored during code generation. For reliable results, specify only supported options.

Version History

Introduced before R2006a

See also: mldivide | lsqnonneg | quadprog | Optimize | optimwarmstart

  • Nonnegative Linear Least Squares, Solver-Based
  • Optimize Live Editor Task with lsqlin Solver
  • Jacobian Multiply Function with Linear Least Squares
  • Warm Start Best Practices
  • Least-Squares (Model Fitting) Algorithms



Veteran economists say a carbon levy would cut emissions, cut inflation and raise billions, but see little prospect of adoption


Two of the nation's most respected economists have put forward a bold plan they say can lower global carbon emissions by at least 6 per cent, super-charge a new green export industry for Australia, deliver much cheaper power bills and even dramatically cut the rate of inflation.

They also frankly admit their ambitious blueprint, due to be unveiled at the National Press Club today, is likely to be immediately shot down by the major political parties.

Professor Rod Sims and Professor Ross Garnaut have spent the past six months working on what they're calling a pathway to prosperity for Australia in a net-zero world. Their work is for The Superpower Institute, a think tank focused on the economic opportunities of climate change, founded by Professor Garnaut and now chaired by Professor Sims.

Under their plan, a "Carbon Solutions Levy" would be imposed on all fossil fuel extraction sites (around 105 coal and gas projects) and all fossil fuel imports (oil and diesel). The levy would be in place by 2030-31, set at the level of the European Union's carbon price (currently around $90 per tonne of emissions).

The pair suggest the levy would raise well over $100 billion a year, declining slowly as fossil fuels exit the market. That enormous pool of funds would then be used to over-compensate households, slashing their electricity bills by $440 a year and removing all petrol and diesel excise. They estimate this would significantly lower the Consumer Price Index by more than 1.5 percentage points.

"This is the mother of all cost-of-living events", Professor Sims told the ABC, highlighting the benefit to households. "Everyone is a winner except those fossil fuel companies", which he frankly admits "will hate it".


The sizeable funds raised by the levy could also be used to repair the budget and fund further tax cuts.

The other key pillar of the plan is to set aside at least $6 billion a year from the funds raised for a new "Superpower Innovation Incentive Scheme".

This scheme would help the first few early movers in green exports get off the ground, with grants covering up to 50 per cent of their capital costs.

The vision is for Australia to become a major manufacturer and exporter of metals and fuels made with renewable energy, including green iron, green aluminium, green polysilicon, green transport fuels and green urea.

Zero and low-emissions industry would be powered by large-scale solar and wind farms in remote areas including the Pilbara.

"We are consumed by how to reduce carbon emissions here", says Professor Sims, arguing the greater focus should be on reducing global emissions, which could be 6-9 per cent lower if Australia were to adopt this plan.

The Superpower Institute is less focused on the ongoing battles over lowering domestic emissions. It's more concerned Australia may miss out on the enormous opportunities in the great global transition to a net-zero future.

Europe has already legislated a Carbon Border Adjustment Mechanism (CBAM), which from 2026 will impose a levy on imports of carbon intensive products including aluminium, iron, steel, cement, and fertilisers.

Professor Sims says this means "green iron will be more competitive than black iron" in the European market from 2026.

He argues China will also need to import "green iron", due to its limited capacity to transition its own industry.


Professor Sims insists the plan being unveiled today is "not a Carbon Tax or an Emissions Trading Scheme". The Carbon Solutions Levy being proposed is confined to fossil fuel companies.

Nonetheless, the plan's authors acknowledge the levy will be branded a tax and will be initially rejected by the major parties, given the bruising history of climate and energy politics in Australia. Even if the scale of ambition in the plan was halved, they doubt either side of politics would embrace the idea.

"It doesn't matter", says Professor Sims. "This is a long-term game. The debate starts now. Every policy idea that involves complex economics takes years to permeate through consciousness".

But the longer Australia waits, the two economists argue, the more opportunities will slip away.


