Linear Regression Time Complexity Calculation
Nipun Batra
January 30, 2020
IIT Gandhinagar
Normal Equation
• Consider X ∈ ℝ^(N×D)
• N examples and D dimensions
• What is the time complexity of solving the normal equation θ̂ = (XᵀX)⁻¹Xᵀy?
Normal Equation
• X has dimensions N × D, so Xᵀ has dimensions D × N
• XᵀX is a product of a D × N and an N × D matrix, which is O(D²N)
• Inverting XᵀX, a D × D matrix, is O(D³)
• Xᵀy is a matrix-vector product of a D × N matrix and an N × 1 vector, which is O(DN)
• (XᵀX)⁻¹Xᵀy is a product of a D × D matrix and a D × 1 vector, which is O(D²)
• Overall complexity: O(D²N) + O(D³) + O(DN) + O(D²) = O(D²N) + O(D³)
• Scales cubically in the number of columns/features of X
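The costs above can be checked with a minimal NumPy sketch (synthetic data; the sizes and variable names are illustrative, not from the slides). Note that solving the linear system is preferred in practice over forming the inverse explicitly, at the same O(D³) cost:

```python
import numpy as np

# Synthetic data: N examples, D features (illustrative sizes).
N, D = 1000, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

# Normal equation: theta = (X^T X)^{-1} X^T y
# X.T @ X          -> O(D^2 N)
# X.T @ y          -> O(D N)
# np.linalg.solve  -> O(D^3), solving a D x D system instead of inverting
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check: theta satisfies (X^T X) theta = X^T y.
assert np.allclose(X.T @ X @ theta, X.T @ y)
```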
Gradient Descent
Start with random values of θ0 and θ1.
Till convergence:
• θ0 = θ0 − α ∂/∂θ0 (Σ εᵢ²)
• θ1 = θ1 − α ∂/∂θ1 (Σ εᵢ²)
• Question: Can you write the above for D-dimensional data in vectorised form?
• θ0 = θ0 − α ∂/∂θ0 (y − Xθ)ᵀ(y − Xθ)
  θ1 = θ1 − α ∂/∂θ1 (y − Xθ)ᵀ(y − Xθ)
  ⋮
  θD = θD − α ∂/∂θD (y − Xθ)ᵀ(y − Xθ)
• θ = θ − α ∂/∂θ (y − Xθ)ᵀ(y − Xθ)
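The per-coordinate and vectorised updates are the same computation, which can be sketched as follows (NumPy, synthetic data; all names are illustrative). Both forms evaluate the gradient at the current θ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 4
X = rng.standard_normal((N, D))  # assume any intercept column is already in X
y = rng.standard_normal(N)
theta = rng.standard_normal(D)
alpha = 0.01

# Per-coordinate form: the partial derivative of (y - X theta)^T (y - X theta)
# w.r.t. theta_j is 2 * X[:, j] @ (X @ theta - y).
grad = np.array([2 * X[:, j] @ (X @ theta - y) for j in range(D)])
theta_loop = theta - alpha * grad

# Vectorised form: one update for the whole parameter vector.
theta_vec = theta - alpha * 2 * X.T @ (X @ theta - y)

assert np.allclose(theta_loop, theta_vec)
```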
Gradient Descent
∂/∂θ (y − Xθ)ᵀ(y − Xθ)
= ∂/∂θ (yᵀ − θᵀXᵀ)(y − Xθ)
= ∂/∂θ (yᵀy − θᵀXᵀy − yᵀXθ + θᵀXᵀXθ)
= −2Xᵀy + 2XᵀXθ
= 2Xᵀ(Xθ − y)
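The closed-form gradient 2Xᵀ(Xθ − y) can be verified numerically against central finite differences (a sketch assuming NumPy and synthetic data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 30, 5
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
theta = rng.standard_normal(D)

def loss(t):
    # (y - X theta)^T (y - X theta), the sum of squared errors
    r = y - X @ t
    return r @ r

analytic = 2 * X.T @ (X @ theta - y)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros(D)
for j in range(D):
    e = np.zeros(D)
    e[j] = eps
    numeric[j] = (loss(theta + e) - loss(theta - e)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```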
Gradient Descent
We can write the vectorised update equation for each iteration as:
θ = θ − αXᵀ(Xθ − y)
For t iterations, what is the computational complexity of our gradient descent solution?
Hint: rewrite the above as θ = θ − αXᵀXθ + αXᵀy.
Complexity of computing Xᵀy is O(DN).
Complexity of computing αXᵀy, once we have Xᵀy, is O(D), since Xᵀy has D entries.
Complexity of computing XᵀX is O(D²N), and then multiplying by α is O(D²).
All of the above need only be calculated once!
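A sketch of the rewritten loop with the one-time precomputation (NumPy, synthetic data; the step size and iteration count are illustrative, chosen so the iteration converges):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 200, 10
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
alpha = 1e-3

# One-time costs, paid before the loop:
A = alpha * (X.T @ X)   # O(D^2 N) for X^T X, then O(D^2) to scale by alpha
b = alpha * (X.T @ y)   # O(D N)  for X^T y, then O(D)   to scale by alpha

theta = np.zeros(D)
for _ in range(500):               # t iterations
    theta = theta - A @ theta + b  # O(D^2) per iteration

# For a suitably small alpha this converges to the least-squares solution.
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta, theta_star, atol=1e-6)
```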
Gradient Descent
For each of the t iterations, we now need to first multiply αXᵀX with θ, a matrix multiplication of a D × D matrix with a D × 1 vector, which is O(D²).
The remaining subtraction/addition can be done in O(D) for each iteration.
What is the overall computational complexity?
O(tD²) + O(D²N) = O((t + N)D²)
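As a quick check (NumPy, synthetic data; names are illustrative), one step of the rewritten O(D²) update with the cached matrices agrees with one step of the direct update:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 100, 8
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
theta = rng.standard_normal(D)
alpha = 1e-3

A = alpha * (X.T @ X)   # cached once, O(D^2 N)
b = alpha * (X.T @ y)   # cached once, O(D N)

# Rewritten per-iteration update, O(D^2):
theta_cached = theta - A @ theta + b

# Direct update for comparison, O(N D):
theta_direct = theta - alpha * X.T @ (X @ theta - y)

assert np.allclose(theta_cached, theta_direct)
```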
Gradient Descent (Alternative)
If we do not rewrite the expression θ = θ − αXᵀ(Xθ − y), then for each iteration we have:
• Computing Xθ is O(ND)
• Computing Xθ − y is O(N)
• Computing αXᵀ is O(ND)
• Computing αXᵀ(Xθ − y) is O(ND)
• Computing θ = θ − αXᵀ(Xθ − y) is O(D), since both terms are D × 1 vectors
What is the overall computational complexity?
O(NDt)
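The direct variant can be sketched as follows (NumPy, synthetic data; step size and iteration count are illustrative). Roughly speaking, this O(NDt) form avoids the O(D²N) cost of building XᵀX, so it tends to be preferable when D is large, while the precomputed O((t + N)D²) form wins when t is large and D is small:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 200, 10
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
alpha = 1e-3

theta = np.zeros(D)
for _ in range(500):                   # t iterations
    grad_step = X.T @ (X @ theta - y)  # O(ND) per iteration
    theta = theta - alpha * grad_step  # O(D) per iteration

# For a suitably small alpha this converges to the least-squares solution.
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta, theta_star, atol=1e-6)
```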