Rosenbrock Function Minimization

Introduction

In this article, we present different approaches to minimizing the Rosenbrock function: the Newton method, the damped Newton method, and steepest descent.

The Rosenbrock Function

Rosenbrock’s function is given as

(1)   \begin{equation*}f(x_1,x_2) = b(x_2 - x_1^2) ^2 + (a-x_1)^2\end{equation*}


where a and b are fixed parameters. In this article we consider the standard choice a = 1 and b = 100, which gives

(2)   \begin{equation*}f(x_1,x_2) = 100(x_2 - x_1^2)^2 + (1-x_1)^2\end{equation*}

whose global minimum lies at (x_1,x_2) = (1,1), where f(1,1) = 0.
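The narrow, banana-shaped valley of this function is what makes it a classic test problem for optimization routines. As a quick way to visualize it, here is a minimal MATLAB sketch (the grid ranges and contour levels are arbitrary choices for illustration, not part of the methods below):

[X1,X2] = meshgrid(-2:0.05:2, -1:0.05:3);
F = 100*(X2 - X1.^2).^2 + (1 - X1).^2;   % Rosenbrock surface on the grid
contour(X1, X2, F, logspace(-1, 3, 20))  % log-spaced levels reveal the valley
hold on
plot(1, 1, 'r*')                         % global minimum at (1,1)
xlabel('x_1'); ylabel('x_2')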



Since all the methods we will use rely on the gradient and/or the Hessian, we first derive closed-form expressions for both. Let us start with the gradient of f, that is

(3)   \begin{equation*}\nabla f(x_1,x_2)=\left( \begin{array}{c}\frac{\partial}{\partial x_1} f(x_1,x_2) \\\frac{\partial}{\partial x_2} f(x_1,x_2)\end{array}\right)=\left( \begin{array}{c}-400x_1(x_2 - x_1^2) - 2 (1-x_1) \\200(x_2 - x_1^2)\end{array}\right)\end{equation*}


and let

(4)   \begin{equation*} \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{equation*}


Let us also compute the Hessian matrix

(5)   \begin{equation*}H = \begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} \\\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}\end{bmatrix}.\end{equation*}


Computing the four second-order partial derivatives gives

(6)   \begin{equation*}\begin{split}\dfrac{\partial^2 f}{\partial x_1^2}&=-400(x_2 - x_1^2) + 800x_1^2 + 2 \\\dfrac{\partial^2 f}{\partial x_2^2}&=200\\\dfrac{\partial^2 f}{\partial x_1\,\partial x_2}&=-400x_1\\\dfrac{\partial^2 f}{\partial x_2\,\partial x_1}&=-400x_1\end{split}\end{equation*}
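As a quick sanity check on these formulas, we can evaluate them at the known minimizer (1,1): the gradient should vanish and the Hessian should be positive definite there. Below is a minimal MATLAB sketch using the same anonymous-function handles that appear in the implementation section later in this article:

df = @(x1,x2) [-400*x1*(x2 - x1^2) - 2*(1-x1); 200*(x2 - x1^2)];
H  = @(x1,x2) [-400*(x2-x1^2) + 800*x1^2 + 2, -400*x1; -400*x1, 200];
disp(df(1,1))      % returns [0; 0], so (1,1) is a stationary point
disp(eig(H(1,1)))  % both eigenvalues positive, so (1,1) is a local minimum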


Minimization with Numerical Methods

Steepest Descent

Steepest descent works as follows

(7)   \begin{equation*}\mathbf{x}_k = \mathbf{x}_{k-1} - \alpha \nabla f(\mathbf{x}_{k-1})\end{equation*}


where \alpha is a given parameter called the step size. The iteration thus reads

\begin{equation*}\begin{bmatrix}x_1^{(k)} \\x_2^{(k)}\end{bmatrix}=\begin{bmatrix}x_1^{(k-1)} \\x_2^{(k-1)}\end{bmatrix}-\alpha\begin{bmatrix}-400x_1^{(k-1)}\left(x_2^{(k-1)} - \left(x_1^{(k-1)}\right)^2\right) - 2\left(1-x_1^{(k-1)}\right) \\200\left(x_2^{(k-1)} - \left(x_1^{(k-1)}\right)^2\right)\end{bmatrix}\end{equation*}


where x_1^{(k)} ,x_2^{(k)} are the values of x_1,x_2 at iteration k.
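For concreteness, here is one hand-worked iteration, assuming the starting point \mathbf{x}_0 = (-1,1)^T and \alpha = 10^{-3} (the same values used in the MATLAB section below). Since x_2 - x_1^2 = 0 at this point, the gradient simplifies considerably:

\begin{equation*}\nabla f(-1,1) = \begin{bmatrix} -2(1-(-1)) \\ 0 \end{bmatrix} = \begin{bmatrix} -4 \\ 0 \end{bmatrix}, \qquad \mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} - 10^{-3}\begin{bmatrix} -4 \\ 0 \end{bmatrix} = \begin{bmatrix} -0.996 \\ 1 \end{bmatrix}\end{equation*}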

Newton Method

Newton's method modifies steepest descent by replacing the fixed step size with the inverse of the Hessian, so that the step accounts for local curvature:

(8)   \begin{equation*}\mathbf{x}_k = \mathbf{x}_{k-1} - \mathbf{H}^{-1}(\mathbf{x}_{k-1}) \nabla f(\mathbf{x}_{k-1})\end{equation*}
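In practice one avoids forming the inverse explicitly and instead solves the linear system \mathbf{H}(\mathbf{x}_{k-1})\,\mathbf{d} = \nabla f(\mathbf{x}_{k-1}) for the step \mathbf{d}; in MATLAB this is the backslash operator. A minimal sketch of a single Newton step, assuming the df and H function handles defined later in this article:

xk = [-1; 1];                          % example current iterate
d  = H(xk(1),xk(2)) \ df(xk(1),xk(2)); % Newton direction via a linear solve
xk = xk - d;                           % Newton update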

Damped Newton Method

In contrast to Newton's method, the damped Newton method scales the Newton step by a damping factor \alpha:

(9)   \begin{equation*}\mathbf{x}_k = \mathbf{x}_{k-1} - \underbrace{\alpha}_{\text{damping factor}}\mathbf{H}^{-1}(\mathbf{x}_{k-1}) \nabla f(\mathbf{x}_{k-1})\end{equation*}



MATLAB implementation

We implement the three methods above as follows, with each function saved in its own .m file:

function [x] = steepdescent(f,df,x0,alpha,Niter)
% Steepest descent: x_k = x_{k-1} - alpha*grad f(x_{k-1}).
% f is unused here but kept so all three solvers share the same interface.
x = zeros(2,Niter);  % preallocate the iterate history
x(:,1) = x0;
for n = 2:Niter
   x1 = x(1,n-1);
   x2 = x(2,n-1);
   x(:,n) = x(:,n-1) - alpha*df(x1,x2);
end
end

function [x] = newtonmethod(f,df,H,x0,Niter)
% Newton's method: x_k = x_{k-1} - H(x_{k-1})^{-1}*grad f(x_{k-1}).
x = zeros(2,Niter);
x(:,1) = x0;
for n = 2:Niter
   x1 = x(1,n-1);
   x2 = x(2,n-1);
   % Solve the linear system rather than forming inv(H) explicitly
   x(:,n) = x(:,n-1) - H(x1,x2)\df(x1,x2);
end
end

function [x] = dampednewtonmethod(f,df,H,x0,alpha,Niter)
% Damped Newton: the Newton step scaled by a damping factor alpha.
x = zeros(2,Niter);
x(:,1) = x0;
for n = 2:Niter
   x1 = x(1,n-1);
   x2 = x(2,n-1);
   x(:,n) = x(:,n-1) - alpha*(H(x1,x2)\df(x1,x2));
end
end

Furthermore, we implement the main script with three different initializations, to test how sensitive each method's final solution is to the starting point:

Niter = 3000;
f = @(x1,x2)100*(x2 - x1^2)^2 + (1-x1)^2;
df = @(x1,x2) [-400*x1*(x2 - x1^2) - 2*(1-x1); 200*(x2 - x1^2)];
H = @(x1,x2) [-400*(x2-x1^2) + 800*x1^2 + 2, -400*x1; -400*x1, 200];

% Initializations
x0_1 = [-1;1];
x0_2 = [0;1];
x0_3 = [2;1];
% Steepest descent
[x_steepest_1] =  steepdescent(f,df,x0_1,1e-3,Niter);
[x_steepest_2] =  steepdescent(f,df,x0_2,1e-3,Niter);
[x_steepest_3] =  steepdescent(f,df,x0_3,1e-3,Niter);
% Newton
[x_newton_1] =  newtonmethod(f,df,H,x0_1,Niter);
[x_newton_2] =  newtonmethod(f,df,H,x0_2,Niter);
[x_newton_3] =  newtonmethod(f,df,H,x0_3,Niter);
% Damped Newton
[x_dampednewton_1] =  dampednewtonmethod(f,df,H,x0_1,1e-2,Niter);
[x_dampednewton_2] =  dampednewtonmethod(f,df,H,x0_2,1e-2,Niter);
[x_dampednewton_3] =  dampednewtonmethod(f,df,H,x0_3,1e-2,Niter);
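% Optional sanity check (a sketch, not part of the original script): the
% final iterates should be close to the minimizer (1,1), so f should be
% near 0 at each of them.
disp(f(x_steepest_1(1,end), x_steepest_1(2,end)))
disp(f(x_newton_1(1,end), x_newton_1(2,end)))
disp(f(x_dampednewton_1(1,end), x_dampednewton_1(2,end)))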


% Plot convergence paths
figure
subplot(3,1,1)
plot(x_steepest_1(1,:),'r','Linewidth',2)
hold on
plot(x_steepest_2(1,:),'g','Linewidth',2)
plot(x_steepest_3(1,:),'b','Linewidth',2)

plot(x_steepest_1(2,:),'--r','Linewidth',2)
plot(x_steepest_2(2,:),'--g','Linewidth',2)
plot(x_steepest_3(2,:),'--b','Linewidth',2)

xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Steepest descent with step size 10^{-3}')
grid on
grid minor

subplot(3,1,2)
plot(x_newton_1(1,:),'r','Linewidth',2)
hold on
plot(x_newton_2(1,:),'g','Linewidth',2)
plot(x_newton_3(1,:),'b','Linewidth',2)

plot(x_newton_1(2,:),'--r','Linewidth',2)
plot(x_newton_2(2,:),'--g','Linewidth',2)
plot(x_newton_3(2,:),'--b','Linewidth',2)

xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Newton method')
grid on
grid minor

subplot(3,1,3)
plot(x_dampednewton_1(1,:),'r','Linewidth',2)
hold on
plot(x_dampednewton_2(1,:),'g','Linewidth',2)
plot(x_dampednewton_3(1,:),'b','Linewidth',2)

plot(x_dampednewton_1(2,:),'--r','Linewidth',2)
plot(x_dampednewton_2(2,:),'--g','Linewidth',2)
plot(x_dampednewton_3(2,:),'--b','Linewidth',2)

xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Newton method with damping factor 10^{-2} ')
grid on
grid minor

which gives us the plot below

Convergence behavior of steepest descent, Newton, and damped Newton with different initializations

The figure above shows the convergence of all three methods under the different initializations. Newton's method converges fastest, followed by damped Newton, and then steepest descent.

Summary

In summary, we have seen how the three methods (steepest descent, Newton, and damped Newton) minimize the Rosenbrock function, and how their convergence behavior depends on the choice of initializer.