A modified quadratic hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient method for unconstrained optimization problems

Article History: Received 15 December 2016; Accepted 23 February 2017; Available 15 July 2017

This article presents a modified quadratic hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient method for solving unconstrained optimization problems. Global convergence of the proposed quadratic hybrid conjugate gradient method is established under the strong Wolfe line search conditions. The new method is tested on a number of benchmark problems that have been used extensively in the literature, and the numerical results show the competitiveness of the new hybrid method.


Introduction
The nonlinear conjugate gradient method is a very powerful technique for solving large scale unconstrained optimization problems

min{f(x) : x ∈ R^n},    (1)

where f : R^n → R is a continuously differentiable function. It has an advantage over Newton and quasi-Newton methods in that it needs only first order derivatives, and hence requires less storage. It is also relatively simple to program.
Given an initial guess x_0 ∈ R^n, the nonlinear conjugate gradient method generates a sequence {x_k} for problem (1) as

x_{k+1} = x_k + α_k d_k,    (2)

where α_k is a step length determined by a line search and d_k is a descent direction of f at x_k generated as

d_0 = -g_0,    d_k = -g_k + β_k d_{k-1},  k ≥ 1,    (3)

where g_k = ∇f(x_k) is the gradient of f at x_k and β_k is a parameter.
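To make the recursion (2)-(3) concrete, the following is a minimal sketch of a generic nonlinear conjugate gradient loop. The Armijo backtracking line search, the steepest-descent restart safeguard, and all function names here are illustrative assumptions, not part of the method analysed in this paper.

```python
import numpy as np

def backtracking(f, g, x, d, alpha=1.0, mu=1e-4, shrink=0.5):
    # Simple Armijo backtracking: an illustrative stand-in for the
    # Wolfe line searches discussed below.
    fx, slope = f(x), g(x) @ d
    while f(x + alpha * d) > fx + mu * alpha * slope:
        alpha *= shrink
    return alpha

def nonlinear_cg(f, g, x0, beta_rule, tol=1e-5, max_iter=5000):
    # Generic CG loop: x_{k+1} = x_k + alpha_k d_k,
    # d_k = -g_k + beta_k d_{k-1}, with d_0 = -g_0.
    x = np.asarray(x0, dtype=float)
    gk = g(x)
    d = -gk
    for _ in range(max_iter):
        if np.linalg.norm(gk) < tol:
            break
        alpha = backtracking(f, g, x, d)
        x = x + alpha * d
        g_new = g(x)
        d = -g_new + beta_rule(g_new, gk, d) * d
        if g_new @ d >= 0:
            d = -g_new  # practical safeguard: restart with steepest descent
        gk = g_new
    return x

def beta_fr(g_new, g_old, d_old):
    # Fletcher-Reeves choice: beta_k = ||g_k||^2 / ||g_{k-1}||^2
    return (g_new @ g_new) / (g_old @ g_old)
```

For instance, minimizing the convex quadratic f(x) = ½xᵀAx − bᵀx with A = diag(1, 2, 3) drives ||g_k|| below the tolerance and recovers the minimizer x* = A⁻¹b.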
Well-known choices of β_k include the Fletcher–Reeves (FR) and Polak–Ribière–Polyak (PRP) parameters,

β_k^{FR} = ||g_k||^2 / ||g_{k-1}||^2,    β_k^{PRP} = g_k^T y_{k-1} / ||g_{k-1}||^2,

where y_{k-1} = g_k − g_{k-1}, as well as the Hestenes–Stiefel (HS) and Dai–Yuan (DY) parameters. From the literature, it is well known that the FR and DY methods have strong convergence properties. However, they may not perform well in practice. On the other hand, the PRP and HS methods are known to perform better numerically but may not converge in general. Given this, researchers try to devise new methods that have the advantages of both kinds. This has been done mostly by combining two or more β_k parameters in the same conjugate gradient method to come up with hybrid methods; hybrids thus try to combine attractive features of different algorithms. For example, Touati-Ahmed and Storey [36] proposed a hybrid method that takes advantage of the attractive convergence properties of β_k^{FR} and the numerical performance of β_k^{PRP}. Many other hybrids have been proposed by parametrically combining different parameters β_k. In Dai and Yuan [11], for instance, a one-parameter family of conjugate gradient methods is proposed as

β_k = ||g_k||^2 / (λ_k ||g_{k-1}||^2 + (1 − λ_k) d_{k-1}^T y_{k-1}),

where the parameter λ_k ∈ [0, 1]. Liu and Li [28] propose a convex combination involving the Liu–Storey (LS) [26] parameter

β_k^{LS} = g_k^T y_{k-1} / (−g_{k-1}^T d_{k-1}),

with weight γ_k ∈ [0, 1]. Other hybrid conjugate gradient methods can be found in [2, 4-8, 13, 21, 22, 24, 25, 27, 29, 35, 38, 41].

The step length α_k is often chosen to satisfy certain line search conditions. It is very important in the convergence analysis and implementation of conjugate gradient methods. The line search in conjugate gradient methods is often based on the weak Wolfe conditions

f(x_k + α_k d_k) ≤ f(x_k) + μ α_k g_k^T d_k,    (4)
g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,    (5)

or on the stronger version of the Wolfe line search conditions

f(x_k + α_k d_k) ≤ f(x_k) + μ α_k g_k^T d_k,    (6)
|g(x_k + α_k d_k)^T d_k| ≤ −σ g_k^T d_k,    (7)

where 0 < µ < σ < 1. More information on these and other line search methods can be found in the literature [9, 14, 25, 31, 34, 37, 39, 41]. In this paper, we suggest another approach to obtain a new hybrid nonlinear conjugate gradient method.
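As an illustration of the strong Wolfe conditions (6)-(7), the following is a minimal bisection-style line search sketch. The bracketing (doubling/bisection) strategy and the default parameter values are assumptions made for illustration; this is not the line search implementation used in the paper's experiments.

```python
import numpy as np

def strong_wolfe(f, grad, x, d, mu=1e-4, sigma=0.16, alpha0=1.0, max_iter=50):
    """Search for a step satisfying the strong Wolfe conditions:
        f(x + a*d) <= f(x) + mu * a * g^T d      (sufficient decrease)
        |grad(x + a*d)^T d| <= -sigma * g^T d    (curvature)
    with 0 < mu < sigma < 1; d must be a descent direction (g^T d < 0).
    """
    f0 = f(x)
    slope0 = grad(x) @ d           # g_k^T d_k < 0 for a descent direction
    lo, hi = 0.0, np.inf
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) > f0 + mu * alpha * slope0:
            hi = alpha             # sufficient decrease fails: step too long
        else:
            slope = grad(x + alpha * d) @ d
            if abs(slope) <= -sigma * slope0:
                return alpha       # both conditions hold
            if slope < 0:
                lo = alpha         # curvature says step too short
            else:
                hi = alpha
        # grow the step while unbracketed, otherwise bisect the bracket
        alpha = 2 * lo if hi == np.inf else 0.5 * (lo + hi)
    return alpha                   # sketch: may return an inexact step
```

A sanity check on f(x) = ½||x||² from x = 2 along d = −2 returns a step for which both inequalities can be verified directly.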
The rest of the paper is organised as follows. In Section 2, we present the proposed method. In Section 3, we prove that the proposed method converges globally. Section 4 presents some numerical experiments, and the conclusion is given in Section 5.

A new hybrid conjugate gradient method
We now present our proposed hybrid conjugate gradient method, which is motivated by the work of Babaie-Kafaki [4,5] and of Mo, Gu and Wei [29]. Babaie-Kafaki [4,5] suggested a quadratic hybridization of the β_k^{FR} and β_k^{PRP} methods, with the hybridization parameter obtained as a solution of a quadratic equation; this yields two methods, β_k^{HQ+} and β_k^{HQ−}. The parameter β_k^{PRP+} = max{β_k^{PRP}, 0} used there was suggested by Gilbert and Nocedal [21] to improve the convergence properties of β_k^{PRP}. In Mo, Gu and Wei [29], the authors suggest a parameter β_k^* which modifies the Touati-Ahmed and Storey method [36]. The method of Mo et al. [29] was shown to be very competitive with the other hybrids in the literature and to perform much better than the original β_k^{PRP}. Now, motivated by this suggestion (9) from [29] and the work of Babaie-Kafaki [4,5], in this work we modify Babaie-Kafaki's method by introducing β_k^S, where β_k^* is as defined in (9). This leads to our hybrid conjugate gradient method presented below.

Global convergence of the proposed method
The global convergence analysis in this section follows that of Babaie-Kafaki [4,5]. To analyze the global convergence property of our hybrid method, the following assumptions are required. These assumptions have been used extensively in the literature for the global convergence analysis of conjugate gradient methods.
Assumption 1. Let the level set

Ω = {x ∈ R^n : f(x) ≤ f(x_0)},

where x_0 is the initial guess, be bounded. That is, there exists a positive constant B such that ||x|| ≤ B for all x ∈ Ω.

Assumption 2. In some neighbourhood N of Ω, the function f is continuously differentiable and its gradient, g(x) = ∇f(x), is Lipschitz continuous; that is, there exists a constant L > 0 such that ||g(x) − g(y)|| ≤ L ||x − y|| for all x, y ∈ N.
These assumptions imply that there exists a positive constant γ̄ such that ||g(x)|| ≤ γ̄ for all x ∈ Ω. Also, under Assumptions 1 and 2, the following lemma can be established.
Lemma 1 (Zoutendijk lemma). Consider any iteration of the form x_{k+1} = x_k + α_k d_k, where d_k is a descent direction and α_k satisfies the weak Wolfe conditions (4) and (5). Suppose Assumptions 1 and 2 hold. Then

Σ_{k≥0} (g_k^T d_k)^2 / ||d_k||^2 < +∞.

It follows from Lemma 1 and the sufficient descent condition g_k^T d_k ≤ −c ||g_k||^2 with the Wolfe line search that

Σ_{k≥0} ||g_k||^4 / ||d_k||^2 < +∞.

Lemma 2. Suppose that Assumptions 1 and 2 hold. Consider any conjugate gradient method of the form (2)-(3) in which, for all k ≥ 0, the search direction d_k is a descent direction and the step length α_k is determined to satisfy the Wolfe conditions. If

Σ_{k≥0} 1 / ||d_k||^2 = +∞,

then the method converges in the sense that liminf_{k→∞} ||g_k|| = 0.

Lemma 3. Suppose that Assumptions 1 and 2 hold. Consider any conjugate gradient method of the form (2)-(3), with the conjugate gradient parameter β_k^+(θ_k) defined by (11), in which the step length α_k is determined to satisfy the strong Wolfe conditions (6) and (7). Also assume that the descent condition (18) holds and that there exists a positive constant ξ such that (19) holds. If, for a positive constant γ, we have

||g_k|| ≥ γ for all k ≥ 0,    (20)

then

Σ_{k≥0} ||u_{k+1} − u_k||^2 < +∞,

where u_k = d_k / ||d_k||.

Proof. Firstly, note that the descent condition (18) guarantees that d_k ≠ 0, so u_k is well defined. Moreover, from (20) and Lemma 2, we have

Σ_{k≥0} 1 / ||d_k||^2 < +∞,

since otherwise (17) would hold, contradicting (20). Now, we divide β_k^+(θ_k) into two parts and, for all k ≥ 0, define the corresponding decomposition of u_{k+1} from (12). Since ||u_k|| = ||u_{k+1}|| = 1, from (23) we obtain the bound (24). Because θ_k ∈ [−1, 1], we have δ_{k+1} ≥ 0. Using the condition δ_{k+1} ≥ 0, the triangle inequality and (24), we get (25). Also, from (13), (14), (19) and (20) we have (26). Now, from (22), (25) and (26), the result follows. We now define the following property, called property (*).

Definition 1 ([10]). Consider any conjugate gradient method of the form (2)-(3) and (12). Suppose that for a positive constant γ the inequality (20) holds. Under this assumption, we say that the method has property (*) if and only if there exist constants b > 1 and λ > 0 such that, for all k ≥ 0,

|β_k| ≤ b,    (28)

and

||s_{k-1}|| ≤ λ implies |β_k| ≤ 1/(2b),    (29)

where s_{k-1} = x_k − x_{k-1}.

Theorem 1. Suppose that Assumptions 1 and 2 hold. Consider any conjugate gradient method of the form (2)-(3) and (12), with the conjugate gradient parameter β_k^+(θ_k) defined by (11), in which the step length α_k is determined to satisfy the strong Wolfe conditions (6) and (7). If the search directions satisfy the descent condition (18) and there exists a positive constant η such that (30) holds, then the method converges in the sense that

liminf_{k→∞} ||g_k|| = 0.

Proof. Because of the descent condition and the strong Wolfe conditions, the sequence {x_k}_{k≥0} is a subset of the level set Ω. Also, since all the assumptions of Lemma 2 hold, the inequality (21) holds. Now, to prove convergence, it is enough to show that the method has property (*). Since θ_k ∈ [−1, 1], from (11), (14) and (20) we obtain (31). Moreover, from Assumption 2 and equations (11), (14), (20) and (30) we get (32). So, from (31) and (32), if we let b = 5γ̄^2/γ^2 and choose λ accordingly, then (28) and (29) hold and, consequently, the method has property (*).

Numerical Experiments
We now present numerical results obtained by our method on test problems chosen from Moré et al. [30] and Andrei [3] to analyse its efficiency and effectiveness. A number of these test problems are widely used in the literature for testing unconstrained optimization methods. We present these test problems in Table 1, where the columns 'Prob' and 'Dim', respectively, give the name and dimension of the test problem; the dimensions of the problems range from 2 to 20000. We compare our proposed new hybrid conjugate gradient method (β_k^S) with the quadratic hybridization β_k^{HQ−} of Babaie-Kafaki [4,5] and the method β_k^* of Mo, Gu and Wei [29]. In [4], β_k^{HQ−} was shown to be the better hybridization compared to β_k^{HQ+}, hence our comparison focuses only on β_k^{HQ−}. For all the methods, the stopping tolerance was ε = 10^{-5}; that is, the algorithms were stopped once the condition ||g_k|| < 10^{-5} was satisfied, or the maximum number of iterations of 5000 was reached. For the line search, the strong Wolfe conditions (6) and (7) were used to find the step length α_k, with µ = 0.0001 and σ = 0.16. All the methods were coded in MATLAB R2015b, and the numerical results are compared based on the number of gradient evaluations, function evaluations and CPU time.
In Table 1, we present the number of function evaluations (NFE) and gradient evaluations (NGE) obtained for the methods β_k^{HQ−}, β_k^S and β_k^*, where the best results for each problem are indicated in bold. We observe from the table that, overall, the incorporation of β_k^* in the quadratic hybridization has a positive effect on β_k^{HQ−}, even though for some problems it is worse off. We also compare the methods using the performance profiles tool suggested by Dolan and Moré [16], which, over the years, has been used extensively to judge the performance of different methods on a given set of test problems. The tool evaluates and then compares the performance of a set S of methods on a set P of test problems. That is, using the ratio

r_{p,s} = t_{p,s} / min{t_{p,s} : s ∈ S},

where t_{p,s} is the number of function evaluations, gradient evaluations or CPU time required to solve problem p by method s, the overall performance profile function is

ρ_s(τ) = (1/n_p) |{p ∈ P : r_{p,s} ≤ τ}|,

where n_p is the total number of problems in P and τ ≥ 0.
In case the method s fails to solve problem p, the ratio r_{p,s} is set to some sufficiently large number. The function ρ_s(τ) is then plotted against τ to give the performance profile. Notice that ρ_s(τ) takes values in [0, 1], and so the inequality ρ_s(τ_1) < ρ_t(τ_1) shows that method t outperforms method s at τ_1.
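The ratio r_{p,s} and the step function ρ_s(τ) above can be computed directly from a table of solver costs. The array layout and the use of np.inf to mark failures in the following sketch are assumptions for illustration.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.

    T: (n_p, n_s) array of costs t_{p,s} (e.g. function evaluations),
       with np.inf marking a failure of solver s on problem p.
    Returns rho of shape (len(taus), n_s), where
    rho[i, s] = fraction of problems with r_{p,s} <= taus[i].
    """
    T = np.asarray(T, dtype=float)
    n_p, n_s = T.shape
    best = T.min(axis=1, keepdims=True)   # min over solvers, per problem
    r = T / best                          # performance ratios r_{p,s}
    rho = np.array([[np.mean(r[:, s] <= tau) for s in range(n_s)]
                    for tau in taus])
    return rho
```

For example, with costs [[1, 2], [2, 2], [4, 1]] for two solvers on three problems, solver 1 reaches ρ = 1 already at τ = 2, while solver 0 needs τ = 4; failures (np.inf) never satisfy r_{p,s} ≤ τ and so cap a solver's profile below 1.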
We now present the plots of these performance profiles on function evaluations, gradient evaluations and CPU time as figures.

Conclusion
In this article, a modified quadratic hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient method (β_k^S) was presented. Its global convergence under the strong Wolfe line search conditions was also established. The β_k^S method was tested on a number of unconstrained problems that have been used extensively in the literature and compared to the original quadratic hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient method, β_k^{HQ−}. The numerical results show that the proposed modification has a positive effect on the performance of β_k^{HQ−}. However, the numerical results from this study also show that further research to improve the efficiency and effectiveness of β_k^{HQ−} and other conjugate gradient hybrids is still needed. A number of hybrid conjugate gradient methods have been proposed in the literature, but there are many problems that are currently not properly handled by these methods, hence the need for more research in this field.
The function evaluations performance profile is presented in Figure 1, the gradient evaluations profile in Figure 2 and the CPU time profile in Figure 3. It is clear from the figures that replacing β_k^{PRP} by β_k^* in the quadratic hybridization β_k^{HQ−} has a positive effect. From the figures, we observe that β_k^* is the best method overall. As for β_k^S and β_k^{HQ−}, we see in Figures 1 and 2 that, for τ ≤ 0.5, β_k^{HQ−} is slightly better than β_k^S. However, Figure 3 shows that in terms of CPU time there is not much difference between β_k^S and β_k^{HQ−}, with the two methods being very competitive. Overall, the figures show the benefit of using β_k^* in the quadratic hybridization in place of β_k^{PRP}.

Table 1. Results of test problems.