Abstract. In this paper we develop a trust region algorithm for constrained parabolic boundary control problems. The method is a projected form of the Steihaug trust-region-CG method with a smoothing step added at each iteration to improve performance in the global phase and provide mesh-independent sup-norm convergence in the terminal phase.

Key words. Trust region methods, inexact Newton methods, optimal control

AMS subject classifications. 49K20, 65F10, 49M15, 65J15, 65K10

1. Introduction. In this paper we show how methods that combine CG iteration and trust region globalization for optimization problems subject to simple bounds can be applied in an infinite dimensional setting to parabolic optimal control problems. This paper addresses the global convergence questions left open by our previous work [11], [13], [14], on fast multilevel algorithms for local convergence, by showing how trust region-CG algorithms can solve the coarse mesh problems needed to initialize the multilevel method in an efficient and mesh-independent way. The algorithm uses the postsmoothing step from [14] and [11] to improve the performance of the iteration.

For unconstrained problems, our approach differs from [18] only in that after a step and trust region radius have been accepted, a smoothing iteration like those in [11] and [14] is attempted. Unlike our previous work, however, an Armijo [1] line search is added to the smoothing step to ensure decrease in the objective function. This new form of the smoothing step is a scaled steepest descent algorithm. The local theory from [11] and [14] implies that full smoothing steps are taken near the solution.

The effect of this in the infinite dimensional case is to allow one to make sup-norm error estimates in the terminal phase of the iteration [11], [14]. In the constrained case, we differ from the algorithm in [8] in several ways. We use a trust region and solve unconstrained trust region problems, using the reduced Hessian at the current point to build the quadratic model. The reason for this is to make the trust region problem as easy to solve as possible and to eliminate the need to explicitly compute a generalized Cauchy point. We update the active set after the trust region step has been computed with a scaled projected gradient step (similar to [8]). The scaling serves the purpose of becoming an inexact implementation of the algorithm in [11] and [14] when full steps are taken. We then obtain fast local convergence in the norm. Our local convergence theory does not depend on identification of the active set in finitely many iterations but rather applies the measure-theoretic ideas in [14]. Hence, our trust region algorithm becomes an inexact projected Newton method in the terminal phase of the iteration with local convergence properties covered by

The research of this author was supported by National Science Foundation grant #DMS-9321938 and North Atlantic Treaty Organization grant #CRG920067. Computing was partially supported by an allocation of time from the North Carolina Supercomputing Center.

Universität Trier, FB IV – Mathematik and Graduiertenkolleg Mathematische Optimierung, 54296 Trier, Germany (sachs@uni-trier.de). The research of this author was supported by North Atlantic Treaty Organization grant #CRG920067.



the theory developed in [3], [11], and [14].

We consider the problem of minimizing


where is given and is the solution to the nonlinear parabolic problem (1.2)

In (1.1)–(1.2), is constrained to be in the set


for a.e.

for fixed , and the nonlinear function is assumed to satisfy


See [21] for examples of applications.

The gradient of in is


where is the solution of the adjoint problem


The map is completely continuous as a map on and as a map from , , to , [17]. We will make use of these compactness properties in this paper.

We base our methods on the work in [7], [8], and [18] (where and

for unconstrained problems). These methods solve the trust region problem by searching along the piecewise linear path having the CG iterates as nodes, terminating either on the trust region boundary, with an inexact Newton step, or with a direction of negative curvature. The compactness of the map ensures that the performance of the CG iteration is independent of the discretization. This is consistent with results on linear equations and CG (see [9] and [20]). Another benefit of the compactness is that the reduced Hessian of is a compact perturbation of a constant multiple of the identity, and hence no preconditioning is needed for fast convergence.

We close this section with some notation and definitions.

Let denote the inner product in or the Euclidean inner product in any finite dimensional space. We denote the -norm by and the -norm by .

We let be the projection onto , defined for any measurable on and almost every as







The nonsmooth nonlinear equation is a necessary condition for stationarity [2].
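On a discretized control (a vector of nodal values), the pointwise projection and the nonsmooth stationarity residual can be sketched as follows. The function names and the constant bounds `lo`, `hi` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def project(u, lo, hi):
    """Pointwise projection onto the feasible box {u : lo <= u <= hi},
    applied componentwise to a discretized control."""
    return np.minimum(np.maximum(u, lo), hi)

def stationarity_residual(u, grad, lo, hi):
    """Residual of the nonsmooth necessary condition u = P(u - grad f(u));
    a stationary point makes this residual vanish."""
    return u - project(u - grad, lo, hi)
```

Note that the residual vanishes both at interior points with zero gradient and at points active at a bound where the gradient points out of the feasible set.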

The map , given by


is a completely continuous map from for any [11], [14].

For , we define the active set for as


and the inactive set as . It is clear that for any


for all.

2. Algorithms. The algorithms are all based on the trust region-CG method in [18] and the general convergence analysis in [19]. The trust region problem is solved approximately by using a piecewise linear path whose nodes are the CG iterates. This approximate solution of the trust region problem is used in a standard way [10], [18], [16], to test for sufficient decrease and adjust the trust region radius. We incorporate the CG-TR method into the inexact projected Newton approach of [11] to give a superlinearly convergent algorithm.

2.1. Inexact Projected Newton Algorithm. To specify the algorithm we must define projections that correspond to the active and inactive sets. For any measurable we define the multiplication operator by


where is the characteristic function of . In particular, if and and are approximations to and , we will use



We follow [14] and [11] and approximate the active set by


and let

The parameter may be adjusted as the iteration progresses to give local superlinear convergence [11], [14].
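For a discretized control, the approximate active set can be realized as a boolean mask: the points whose values lie within a tolerance of a bound. This is a sketch under the assumption of constant bounds; the names and the form of the tolerance test are illustrative.

```python
import numpy as np

def approx_active_mask(u, lo, hi, eps):
    """Mask (characteristic function) of the approximate active set:
    points within eps of the lower or upper bound."""
    return (u <= lo + eps) | (u >= hi - eps)

def restrict_to_inactive(v, mask):
    """Multiplication operator for the approximate inactive set:
    zero out components flagged as (approximately) active."""
    return np.where(mask, 0.0, v)
```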

Note that for all we have


and hence


for all and.

In the constrained case, the necessary conditions for optimality can be expressed as a nondifferentiable compact fixed point problem



Recall from Section 1 that the map (and hence ) is a compact map from to for any . In that sense is a smoother. We will use that property to (1) improve the global convergence properties of our proposed algorithm and (2) provide a uniform norm local convergence theory as in [14] and [11].

We define the reduced Hessian at by



The inexact projected Newton algorithms from [11] and [14] have several stages. We describe the one from [11] here in terms of the transition from a current approximation to a new one. The understanding here is that the parameter in the approximation to the active set and the forcing term in the inexact Newton process change as the iteration progresses.

ALGORITHM 2.1. pnstep

1. Identification: Given and , set .

2. Error Reduction: Find Im which satisfies



to reduce the error in.

3. Postsmoothing: Set to recover convergence.

In the context of this paper, in which global convergence is the issue, Algorithm pnstep presents two problems. First, the smoothing step is a scaled gradient projection step and may lead to dramatic increases in the objective function when is far from the solution. We remedy this by adding an Armijo line search to this phase of the algorithm, but we do not demand , which may never be possible, only that decrease by a certain small amount. The results in [11] and [14] ensure that if is sufficiently near the solution, then the full smoothing step will be accepted and hence the fast local convergence (the precise speed of convergence depends on the choice of forcing term) will not be affected by the line search. Second, there is no guarantee that the reduced Hessian will be positive definite. We address this problem with an inexact trust region algorithm that will exploit any negative curvature direction that it finds.
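A minimal sketch of the safeguarded smoothing step described above: take the full postsmoothing step when it decreases the objective by a small amount, and otherwise backtrack Armijo-style. The sufficient-decrease form and the constants are assumptions for illustration; the paper's precise test appears in step 6(a) of Algorithm trmin.

```python
import numpy as np

def safeguarded_smoothing(f, u, u_smooth, alpha=1e-4, max_backtracks=10):
    """Backtracking line search on the smoothing direction d = u_smooth - u.
    Near the solution the full step (t = 1) passes the test, so the fast
    local convergence of the smoothing iteration is retained."""
    fu = f(u)
    d = u_smooth - u
    t = 1.0
    for _ in range(max_backtracks):
        trial = u + t * d
        if f(trial) <= fu - alpha * t * float(d @ d):
            return trial
        t *= 0.5
    return u  # no acceptable damped step; keep the trust region iterate
```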

As is standard, we use the measure of nonstationarity


For example, a locally convergent algorithm using Algorithm pnstep is

ALGORITHM 2.2. pnlocal


1. If , terminate the iteration.


3. Take an inexact step with Algorithm pnstep.

4. Go to step 1.

The values of and in step 2 will ensure superlinear convergence with q-order [11], [14].

Under standard assumptions [14], [11], Algorithm pnlocal will produce iterates that converge locally q-superlinearly (in the norm) to a minimizer. Q-linear convergence can be obtained if the formula for in step 2 is replaced by and is sufficiently small. The purpose of this paper is to develop a trust region globalization for this algorithm that preserves the norm local convergence in the terminal phase while converging globally in .

2.2. Solution of the Trust Region Problem. We use a standard solver from [18] for our unconstrained trust region subproblems. The inputs to Algorithm trcg, which approximately solves the trust region problem, are the current point, the objective, a preconditioner, the forcing term, the current trust region radius, and a limit on the number of iterations. The output is the approximate solution of the trust region problem. We formulate the algorithm using the preconditioned CG framework from [12].

We will assume for the present that gradients are computed exactly and that the Hessian-vector product is approximated by the difference quotient


We present the algorithm of [18] for approximate solution of the trust region problem.

In practice the action of on a vector may be inaccurate and even nonlinear, as would be the case with . However, the effects of such inaccuracy are simply that the method is equivalent to one in which the Hessian is a difference approximation based on the basis for the Krylov subspace.
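A forward-difference Hessian-vector product of the kind used here can be sketched as follows; scaling the increment by the norm of the direction is a common practical choice, and the names are illustrative.

```python
import numpy as np

def hessvec(grad, u, v, h=1e-6):
    """Forward-difference approximation of the Hessian action:
    H(u) v ~ (grad(u + h_s v) - grad(u)) / h_s, with the increment h_s
    scaled so that the perturbation h_s * v has norm h."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    hs = h / nv
    return (grad(u + hs * v) - grad(u)) / hs
```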

ALGORITHM 2.3. trcg


2. Do While



If then

Find such that






(i) If then

Find such that
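The skeleton above survives only in fragments, so the following is a compact sketch of the Steihaug–Toint trust region CG iteration that Algorithm trcg is based on, without preconditioning (which, as noted in Section 1, the compactness of the reduced Hessian makes unnecessary for fast convergence). The names and the plain Euclidean norm are illustrative; the paper's version also carries a preconditioner and an iteration limit.

```python
import numpy as np

def trcg(g, hessvec, delta, eta, maxiter):
    """Steihaug-Toint trust region CG: approximately minimize the quadratic
    model m(s) = g.s + 0.5 s.Hs subject to ||s|| <= delta.  Terminates with
    an inexact Newton step (||g + Hs|| <= eta ||g||), on the trust region
    boundary, or along a direction of negative curvature."""
    s = np.zeros_like(g)
    r = -g.copy()              # residual of the Newton system H s = -g
    p = r.copy()
    rho = float(r @ r)
    for _ in range(maxiter):
        Hp = hessvec(p)
        curv = float(p @ Hp)
        if curv <= 0.0:
            # negative curvature: follow p to the boundary
            return s + _to_boundary(s, p, delta) * p
        alpha = rho / curv
        if np.linalg.norm(s + alpha * p) >= delta:
            return s + _to_boundary(s, p, delta) * p
        s = s + alpha * p
        r = r - alpha * Hp
        rho_new = float(r @ r)
        if np.sqrt(rho_new) <= eta * np.linalg.norm(g):
            return s           # inexact Newton step accepted
        p = r + (rho_new / rho) * p
        rho = rho_new
    return s

def _to_boundary(s, p, delta):
    """Positive tau with ||s + tau p|| = delta."""
    a, b, c = float(p @ p), 2.0 * float(s @ p), float(s @ s) - delta * delta
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

Because each CG iterate stays inside the trust region until the path leaves it, the returned step always achieves at least Cauchy decrease on the model.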



Trust region algorithms for bound constrained problems have been analyzed in considerable generality in [7]. A concrete algorithm, proposed in [8], follows a piecewise linear path in a search for a generalized Cauchy point, freezes the active set at that point, and then solves the trust region problem approximately on the current active set. This process is important for the theory in [7] not only because it guarantees Cauchy decrease but also for the proof of superlinear convergence after the active set has been identified.

In the problems considered here, where there is a continuum of constraints, it is not clear how to use the method of [8] because the active set, being uncountable, will never be fully identified, and the construction of a path on which to search for a Cauchy point would lead to infinitely many knots to test. Instead we solve an unconstrained trust region problem for a reduced quadratic model and project the solution of that problem onto the active set.

Our approach to minimization of the reduced quadratic model also differs from that in [8]. In that paper, and in the convergence analysis in [7], the fact that all norms in finite dimension are equivalent was used to justify trust region bounds. We use the standard trust region and therefore do not include the constraints explicitly in the trust region. We then use a smoothing step to deal with the nonequivalence of norms and recover fast uniform convergence in the terminal phase of the iteration.

Given and , we consider the reduced quadratic model


In (2.10), the reduced Hessian is given by (2.6). Note that the action of on a function can easily be computed by differences. This is a somewhat nonstandard model in that is used in the first order part of (2.10) rather than , and instead of . The reasons for this are that this model performed better in our numerical experiments and also that it makes a smoother transition to a fast local algorithm that can be analyzed with the ideas from [11] and [14]. The nonstandard quadratic term presents no problems; however, the linear term must be accounted for in the analysis, but, as we shall show next, this is easy to do because our algorithm is a dog-leg.

TRUST REGION METHODS FOR CONTROL

The analysis of the global convergence is not affected by the linear term because, in view of (2.4), the model we use and the more standard quadratic model

agree if


for some . To see this, note that since , we must have if (2.12) holds.

Algorithm boxtr returns a trial point by using Algorithm trcg for the reduced quadratic model.

ALGORITHM 2.4. boxtr


2. Find the direction by calling



Having computed the trial point, one must decide whether to accept the new point or change the trust region radius. Both decisions are based on a comparison of the actual reduction


to a predicted reduction based on the reduced quadratic model. Here


In a typical trust region algorithm, the step is accepted if

and the radius increased ( ) if
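The acceptance and radius-adjustment logic can be sketched as follows, with both `ared` and `pred` taken as positive reductions. The threshold constants `mu_low`, `mu_high` and the scaling factors are illustrative stand-ins for the paper's unnamed parameters.

```python
def update_radius(ared, pred, delta, delta_max,
                  mu_low=1e-4, mu_high=0.75,
                  omega_down=0.5, omega_up=2.0):
    """Compare actual to predicted reduction.  Reject and shrink when the
    model overpredicts the decrease; accept and expand (capped at
    delta_max) when the model is very accurate; otherwise accept and
    keep the current radius.  Returns (accepted, new_delta)."""
    if pred <= 0.0 or ared < mu_low * pred:
        return False, omega_down * delta
    if ared >= mu_high * pred:
        return True, min(omega_up * delta, delta_max)
    return True, delta
```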



2.3. Termination. No algorithm that depends upon a measurement of decrease like is reliable if the decreases in the function are smaller than the accuracy with which the function is computed. Once we have resolved a local minimum to that point, our view is that the iteration has succeeded.

Hence we terminate the algorithm if either,or


Here and are small tolerances. We test for (2.20) every time the trust region radius is changed, and if (2.20) holds at any point during an iteration, we terminate and accept the previous iteration. This stopping criterion is used only in the implementation.

2.4. The Complete Algorithm. In the interest of clarity, we do not make the trust region algorithmic parameters, the preconditioner, and the initial trust region radius formal arguments to the algorithm. The trust region radius is limited to a maximum value .

The notation is that is the current iteration. Upon exit from the trust region phase of the algorithm, we obtain and pass that intermediate iterate to the smoother to produce . Choose a constant .

The inputs are , the bounds, and the function .

ALGORITHM 2.5. trmin


2. Test for termination:

if or

if and , terminate successfully.

3. Fix and for this iterate. Set , .

4. Find a new trial point using Algorithm boxtr.

Terminate with failure if more than iterations are taken.

(a) if or (2.18) does not hold, then ; , go to step 4.

(c) if and ,

(d) if , or and ,

(e) if and , , go to step 4.

6. (a) Find the smallest integer such that

then postsmooth, i.e., set

(b) ; ; go to step 2.

The flag is used to avoid an infinite loop of increasing and decreasing the trust region radius.

In the context of a globally convergent algorithm, attention must be paid to the postsmoothing step (6a). The line search prevents divergence in the early phase of the iteration when the approximate solutions are not accurate.


3. Convergence Results. In this section we derive global convergence results for Algorithm trmin. Recall that our notation is that (resp. ) is the current (resp. th) iteration. Upon exit from the trust region phase of the algorithm, we obtain (resp. ) and pass that intermediate iterate to the smoother to produce (resp. ).

3.1. Global Convergence. Given and the quadratic model function from (2.10), we must first show that the trust region radius can be bounded from below.

Our assumptions are


1. is twice continuously differentiable in .

2. There is such that for all and


The second assumption is a kind of second order sufficiency condition, as is common in the convergence analysis of higher order methods. The finite difference approximation (2.9) satisfies this assumption for sufficiently small , if the assumption holds for the original Hessian.

In this section we will use some notation from [2]. For , define


We begin with the lower bound for the gradient projection step from [2].

THEOREM 3.1. Let Assumption 3.1 hold and let . Then there is such that for any and ,


Proof. Since Algorithm trmin will, in the worst case of small trust region radius, take steps of the form

the algorithm will behave like the gradient projection algorithm [2], since by (1.11). In view of this, we can use known properties of the gradient projection algorithm to bound the trust region radius from below.
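For a discretized control, the scaled gradient projection trial point and the quantity appearing in Bertsekas-style decrease bounds can be sketched as below; the names and the constant bounds are illustrative.

```python
import numpy as np

def gp_step(u, grad, lam, lo, hi):
    """Gradient projection trial point u(lam) = P(u - lam * grad f(u)),
    the step the algorithm falls back to for small trust region radius."""
    return np.clip(u - lam * grad, lo, hi)

def gp_decrease_measure(u, grad, lam, lo, hi):
    """||u - u(lam)||^2 / lam: the quantity that bounds from below the
    decrease achieved by a gradient projection step in estimates of the
    kind used here."""
    d = u - gp_step(u, grad, lam, lo, hi)
    return float(d @ d) / lam
```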

Algorithm boxtr, with no preconditioner, will return a trial point of the form



we obtain from Lemma 3.2,

Therefore, since , from (3.9) we obtain

which is (2.18).

We summarize the analysis so far. We have shown that if (3.8) holds, then for some and (2.18) holds.

We now describe how the right side of (3.8) must be augmented to imply that . Assume that (3.8) holds. By Lemma 3.2 we have

and hence by the definition of (2.14)


Note that


Since we have

So, if the trust region radius ever satisfies

then all acceptance tests will be passed and a further increase will be attempted. Hence, on exit from step 5, (3.4) will hold with

which completes the proof.

Our main global convergence result is the following.

THEOREM 3.4. Let Assumption 3.1 hold and let be the sequence produced by Algorithm trmin. Then

Proof. Step 6 of Algorithm trmin, (2.13), and (2.18) yield


with defined by (2.19). The boundedness from below of on implies . Since by (2.19), Lemma 3.2 yields

Theorem 3.3 implies that either or

where the norm in (3.12) is any of , , or .

The other assumption is related to a nondegeneracy assumption on the optimal control and a generalization of the fact that for finite dimensional problems the active set is identified in finitely many steps. We refer the reader to [14] for motivation and discussion.

We define sets




ASSUMPTION 3.3. There is such that for all . Let be given by (2.3) and let be given. Then there are such that for all and all such that





In Assumption 3.3, denotes Lebesgue measure.

Under these assumptions we have the following local convergence result. The proof is, under the assumptions made above, a variation of those in [11] and [14].

THEOREM 3.5. Let Assumptions 3.1, 3.2, and 3.3 hold. Let and be given. Then if is sufficiently near in , ,


and and are given by Algorithm trmin, then satisfies (2.7) (i.e., a full inexact Newton step is taken),


4. Numerical Example. All the results reported in this section were obtained on a SUN SPARC-20 running SUN Fortran f77 version SC3.0.1 and SUN-OS 4.1.3.

Our numerical results are based on the problem posed by (1.1), (1.2), (1.3), (1.5), and (1.6). We set


We report results for both the constrained and unconstrained problems. Our constraints are given by



We used the trust region parameters


The radius of the trust region was initialized to . In the test for postsmoothing we used .

The approximate active set was computed with

Here , where is the number of nodes. The forcing term was

For a mesh of size we use the tolerances

and set the relative and absolute local truncation errors in DASPK to be

Following [14], we limited the time step in DASPK to the spatial mesh width. Our difference increment in (2.9) for the numerical reduced Hessian was

The initial iterate was

for both the constrained and unconstrained problems.

The solutions for the unconstrained (left) and constrained (right) problems, both with

are plotted in Figure 4.1. From the plot of the constrained minimizer on the right one can see that both the upper and lower bound constraints are attained on different parts of .

For all computations we tabulate the iteration counter, the value of the objective, the actual reduction (for ), the norm of the projected gradient, , the number of CG iterations required (for ), and the radius of the trust region. For the constrained problem we also tabulate , the fraction of points that are in the approximate active set. means that the steepest descent or gradient projection step either went beyond the trust region boundary or satisfied the inexact Newton condition.

The iterations for both the constrained and unconstrained problems, reported in Tables 4.1 and 4.2, terminated when . For the unconstrained problem, full smoothing steps were taken at each iteration. For the constrained problem, a full smoothing step was taken at the final iteration only.



TABLE 4.1
Unconstrained Problem, h = 1/639.

  n        f          ared      proj. grad   CG   radius
  0    9.77e+00       --         4.33e+00    --    5.00
  1    2.81e-01   -9.11e+00      2.45e-01     0    5.00
  2    2.21e-01   -5.99e-02      3.30e-02     2    5.00
  3    2.20e-01   -1.41e-03      1.32e-02     1    5.00
  4    2.19e-01   -6.98e-04      2.83e-02     2    5.00
  5    2.19e-01   -3.88e-04      1.00e-02     0    5.00
  6    2.19e-01   -3.85e-04      8.75e-04     1    5.00
  7    2.19e-01   -2.15e-06      5.93e-05     4    5.00
  8    2.19e-01   -4.44e-08      1.34e-05     3    5.00



TABLE 4.2
Constrained Problem, h = 1/639.

  n        f          ared      proj. grad   CG   radius   active frac.
  0    9.77e+00       --         4.33e+00    --    5.00        --
  1    2.60e+00   -9.11e+00      1.91e+00     0    5.00       0.209
  2    1.53e+00   -2.30e+00      1.49e+00     0    5.00       0.025
  3    7.90e-01   -1.23e+00      9.81e-01     1    5.00       0.000
  4    2.81e-01   -4.90e-01      2.63e-02     1    5.00       0.141
  5    2.78e-01   -1.88e-03      7.24e-03     1    0.61       0.223
  6    2.78e-01   -4.32e-05      2.64e-03     1    0.03       0.322
  7    2.78e-01   -1.06e-04      8.88e-03     1    0.26       0.350
  8    2.78e-01   -6.04e-05      5.87e-03     0    0.26       0.331
  9    2.78e-01   -2.16e-05      9.12e-04     1    0.02       0.375
 10    2.78e-01   -6.21e-07      6.70e-04     1    0.02       0.380
 11    2.78e-01   -3.19e-07      1.28e-05     0    0.02       0.380



[1] L. ARMIJO, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16 (1966), pp. 1–3.


[2] D. P. BERTSEKAS, On the Goldstein–Levitin–Polyak gradient projection method, IEEE Trans. Autom. Control, 21 (1976), pp. 174–184.



——, Consistent initial condition calculation for differential-algebraic systems, Tech. Report UCRL-JC-122175, Lawrence Livermore National Laboratory, August 1995.

[7] A. R. CONN, N. I. M. GOULD, AND P. L. TOINT, Global convergence of a class of trust region algorithms for optimization problems with simple bounds, SIAM J. Numer. Anal., 25 (1988), pp. 433–460.


,Iterative Methods for Linear and Nonlinear Equations,no.16in Frontiers in Applied Mathematics, SIAM,Philadelphia,1995.

[13] C. T. KELLEY AND E. W. SACHS, Fast algorithms for compact fixed point problems with inexact function evaluations, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 725–742.