Introduction
In this article, we present different approaches to minimizing the Rosenbrock function, namely Newton's method, the damped Newton method, and steepest descent.
The Rosenbrock Function
Rosenbrock's function is given as

$$f(x_1, x_2) = (a - x_1)^2 + b\,(x_2 - x_1^2)^2 \tag{1}$$

where in this article we consider the following example, with $a = 1$ and $b = 100$:

$$f(x_1, x_2) = (1 - x_1)^2 + 100\,(x_2 - x_1^2)^2 \tag{2}$$
Since all the methods we will use are gradient-aware and/or Hessian-aware, we will spend some time computing closed-form formulas for the gradient and the Hessian. Let us start with the gradient of $f$:

$$\frac{\partial f}{\partial x_1} = -400\,x_1\,(x_2 - x_1^2) - 2\,(1 - x_1) \tag{3}$$

and

$$\frac{\partial f}{\partial x_2} = 200\,(x_2 - x_1^2) \tag{4}$$
Let us also compute the Hessian matrix

$$\mathbf{H}(x_1, x_2) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix}. \tag{5}$$

Computing the four second-order partial derivatives gives

$$\mathbf{H}(x_1, x_2) = \begin{bmatrix} -400\,(x_2 - x_1^2) + 800\,x_1^2 + 2 & -400\,x_1 \\ -400\,x_1 & 200 \end{bmatrix}. \tag{6}$$
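As a quick sanity check, not part of the original derivation, we can sketch a comparison of the closed-form gradient (3)-(4) against central finite differences of (2) in Python; the helper names below are ours for illustration:

```python
# Sketch: verify the closed-form gradient (3)-(4) against central
# finite differences of f(x1, x2) = 100*(x2 - x1^2)^2 + (1 - x1)^2.

def f(x1, x2):
    return 100 * (x2 - x1**2)**2 + (1 - x1)**2

def grad(x1, x2):
    # Equations (3) and (4)
    return [-400 * x1 * (x2 - x1**2) - 2 * (1 - x1),
            200 * (x2 - x1**2)]

def grad_fd(x1, x2, h=1e-6):
    # Central finite differences, O(h^2) accurate
    return [(f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
            (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)]

x1, x2 = -1.2, 1.0
g, g_fd = grad(x1, x2), grad_fd(x1, x2)
print(g, g_fd)  # the two gradients should agree to about 1e-4
```

The same finite-difference idea applied to the gradient entries would likewise confirm the Hessian in (6).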
Minimization with Numerical Methods
Steepest Descent
Steepest descent works as follows:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha\,\nabla f(\mathbf{x}_n) \tag{7}$$

where $\mathbf{x}_n = [x_{1,n}\;\; x_{2,n}]^T$ is the iterate at step $n$, $\alpha > 0$ is a fixed step size, and $\nabla f(\mathbf{x}_n)$ is the gradient of $f$ from equations (3) and (4), evaluated at $\mathbf{x}_n$.
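The update (7) can be sketched in a few lines. The following Python version is purely illustrative (the article's actual MATLAB implementation appears later); the step size and iteration count match the experiments below:

```python
# Illustrative sketch of steepest descent (7) on the Rosenbrock function.

def f(x1, x2):
    return 100 * (x2 - x1**2)**2 + (1 - x1)**2

def grad(x1, x2):
    return [-400 * x1 * (x2 - x1**2) - 2 * (1 - x1),
            200 * (x2 - x1**2)]

def steepest_descent(x0, alpha=1e-3, niter=3000):
    x1, x2 = x0
    for _ in range(niter):
        g1, g2 = grad(x1, x2)
        # Move against the gradient with a fixed step size alpha
        x1, x2 = x1 - alpha * g1, x2 - alpha * g2
    return x1, x2

x1, x2 = steepest_descent((-1.0, 1.0))
print(x1, x2, f(x1, x2))  # f decreases from f(-1, 1) = 4
```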
Newton Method
Newton's method can be viewed as a modified steepest descent that rescales the step by the inverse Hessian:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}^{-1}(\mathbf{x}_n)\,\nabla f(\mathbf{x}_n) \tag{8}$$
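One step of (8) amounts to solving the 2x2 linear system $\mathbf{H}\,\mathbf{d} = \nabla f$ for the Newton direction $\mathbf{d}$. An illustrative Python sketch (not the article's MATLAB code) using Cramer's rule for the 2x2 solve:

```python
# Illustrative sketch of the Newton update (8) on the Rosenbrock function,
# solving H d = grad f explicitly with Cramer's rule for a 2x2 system.

def grad(x1, x2):
    return [-400 * x1 * (x2 - x1**2) - 2 * (1 - x1),
            200 * (x2 - x1**2)]

def hess(x1, x2):
    return [[-400 * (x2 - x1**2) + 800 * x1**2 + 2, -400 * x1],
            [-400 * x1, 200]]

def newton_step(x1, x2):
    g = grad(x1, x2)
    H = hess(x1, x2)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Newton direction d solving H d = g (Cramer's rule)
    d1 = (g[0] * H[1][1] - g[1] * H[0][1]) / det
    d2 = (g[1] * H[0][0] - g[0] * H[1][0]) / det
    return x1 - d1, x2 - d2

x = (-1.0, 1.0)
for _ in range(10):
    x = newton_step(*x)
print(x)  # reaches the minimizer (1, 1) after only a few iterations
```

From this particular starting point, the iteration lands exactly on the minimizer within a couple of steps, which previews the fast convergence seen in the experiments below.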
Damped Newton Method
In contrast to Newton's method, the damped Newton method scales the Newton step by a damping factor $\alpha \in (0, 1]$:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha\,\mathbf{H}^{-1}(\mathbf{x}_n)\,\nabla f(\mathbf{x}_n) \tag{9}$$
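An illustrative Python sketch of (9) (again, separate from the article's MATLAB implementation), with the damping factor and iteration count used in the experiments below:

```python
# Illustrative sketch of the damped Newton update (9): the Newton
# direction scaled by a damping factor alpha.

def f(x1, x2):
    return 100 * (x2 - x1**2)**2 + (1 - x1)**2

def grad(x1, x2):
    return [-400 * x1 * (x2 - x1**2) - 2 * (1 - x1),
            200 * (x2 - x1**2)]

def hess(x1, x2):
    return [[-400 * (x2 - x1**2) + 800 * x1**2 + 2, -400 * x1],
            [-400 * x1, 200]]

def damped_newton(x0, alpha=1e-2, niter=3000):
    x1, x2 = x0
    for _ in range(niter):
        g = grad(x1, x2)
        H = hess(x1, x2)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        d1 = (g[0] * H[1][1] - g[1] * H[0][1]) / det
        d2 = (g[1] * H[0][0] - g[0] * H[1][0]) / det
        # Take only a fraction alpha of the full Newton step
        x1, x2 = x1 - alpha * d1, x2 - alpha * d2
    return x1, x2

x1, x2 = damped_newton((-1.0, 1.0))
print(x1, x2)  # approaches the minimizer (1, 1)
```

The damping trades speed for robustness: each step shrinks the gradient by roughly a factor $(1 - \alpha)$, so convergence is slower than full Newton but less sensitive to a bad starting point.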
MATLAB implementation
We implement the three methods above as follows.
function [x] = steepdescent(f, df, x0, alpha, Niter)
% Steepest descent (7): x_{n+1} = x_n - alpha * grad f(x_n)
x(:,1) = x0;
for n = 2:Niter
    x1 = x(1,n-1);
    x2 = x(2,n-1);
    x(:,n) = x(:,n-1) - alpha*df(x1,x2);
end
end
function [x] = newtonmethod(f, df, H, x0, Niter)
% Newton's method (8): x_{n+1} = x_n - H(x_n) \ grad f(x_n)
x(:,1) = x0;
for n = 2:Niter
    x1 = x(1,n-1);
    x2 = x(2,n-1);
    % Backslash solves the linear system; cheaper and more accurate than inv()
    x(:,n) = x(:,n-1) - H(x1,x2)\df(x1,x2);
end
end
function [x] = dampednewtonmethod(f, df, H, x0, alpha, Niter)
% Damped Newton (9): x_{n+1} = x_n - alpha * (H(x_n) \ grad f(x_n))
x(:,1) = x0;
for n = 2:Niter
    x1 = x(1,n-1);
    x2 = x(2,n-1);
    x(:,n) = x(:,n-1) - alpha*(H(x1,x2)\df(x1,x2));
end
end
Furthermore, we implement our main script with three different initializations to test the sensitivity of the final solution to the starting point:
Niter = 3000;
f = @(x1,x2)100*(x2 - x1^2)^2 + (1-x1)^2;
df = @(x1,x2) [-400*x1*(x2 - x1^2) - 2*(1-x1); 200*(x2 - x1^2)];
H = @(x1,x2) [-400*(x2-x1^2) + 800*x1^2 + 2, -400*x1; -400*x1, 200];
% Initials
x0_1 = [-1;1];
x0_2 = [0;1];
x0_3 = [2;1];
%Steepest Descent
[x_steepest_1] = steepdescent(f,df,x0_1,1e-3,Niter);
[x_steepest_2] = steepdescent(f,df,x0_2,1e-3,Niter);
[x_steepest_3] = steepdescent(f,df,x0_3,1e-3,Niter);
% Newton
[x_newton_1] = newtonmethod(f,df,H,x0_1,Niter);
[x_newton_2] = newtonmethod(f,df,H,x0_2,Niter);
[x_newton_3] = newtonmethod(f,df,H,x0_3,Niter);
%Damped Newton
[x_dampednewton_1] = dampednewtonmethod(f,df,H,x0_1,1e-2,Niter);
[x_dampednewton_2] = dampednewtonmethod(f,df,H,x0_2,1e-2,Niter);
[x_dampednewton_3] = dampednewtonmethod(f,df,H,x0_3,1e-2,Niter);
%plot
figure
subplot(3,1,1)
plot(x_steepest_1(1,:),'r','Linewidth',2)
hold on
plot(x_steepest_2(1,:),'g','Linewidth',2)
plot(x_steepest_3(1,:),'b','Linewidth',2)
plot(x_steepest_1(2,:),'--r','Linewidth',2)
plot(x_steepest_2(2,:),'--g','Linewidth',2)
plot(x_steepest_3(2,:),'--b','Linewidth',2)
xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Steepest descent with step size 10^{-3}')
grid on
grid minor
subplot(3,1,2)
plot(x_newton_1(1,:),'r','Linewidth',2)
hold on
plot(x_newton_2(1,:),'g','Linewidth',2)
plot(x_newton_3(1,:),'b','Linewidth',2)
plot(x_newton_1(2,:),'--r','Linewidth',2)
plot(x_newton_2(2,:),'--g','Linewidth',2)
plot(x_newton_3(2,:),'--b','Linewidth',2)
xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Newton method')
grid on
grid minor
subplot(3,1,3)
plot(x_dampednewton_1(1,:),'r','Linewidth',2)
hold on
plot(x_dampednewton_2(1,:),'g','Linewidth',2)
plot(x_dampednewton_3(1,:),'b','Linewidth',2)
plot(x_dampednewton_1(2,:),'--r','Linewidth',2)
plot(x_dampednewton_2(2,:),'--g','Linewidth',2)
plot(x_dampednewton_3(2,:),'--b','Linewidth',2)
xlabel('Iteration number n')
ylabel('x')
legend('x1 path initializing x0 = [-1 1]','x1 path initializing x0 = [0 1]','x1 path initializing x0 = [2 1]','x2 path initializing x0 = [-1 1]','x2 path initializing x0 = [0 1]','x2 path initializing x0 = [2 1]')
title('Newton method with damping factor 10^{-2} ')
grid on
grid minor
which gives us the plot below.

[Figure: x1 and x2 iterate paths versus iteration number for steepest descent, Newton, and damped Newton, each from the three initializations.]

The figure shows the convergence of all three methods from the different initializations. Newton's method converges the fastest, followed by the damped Newton method, and then steepest descent.
Summary
In summary, we have seen how the three methods, steepest descent, Newton, and damped Newton, minimize the Rosenbrock function, and how their convergence behavior depends on the initialization.
The algorithmic chef – Ahmad Bazzi.