
OP17 Abstracts

IP1

Rigorous Guarantees for Nonconvex Optimization?

The classical complexity barrier in continuous optimization is between convex (solvable in polynomial time) and nonconvex (intractable in the worst-case setting). However, many problems of interest are not constructed adversarially, but instead arise from statistical models of scientific phenomena. Can we provide rigorous guarantees for such random ensembles of nonconvex optimization problems? In this talk, we survey various positive answers to this question, including optimal results for sparse regression with nonconvex penalties, direct approaches to low-rank matrix recovery, and non-local guarantees for the EM algorithm. All of these results involve natural weakenings of convexity that hold for various classes of nonconvex functions, thus shifting the barrier between tractable and intractable.

Martin Wainwright
University of California Berkeley
[email protected]

Sivaraman Balakrishnan
Carnegie Mellon University
[email protected]

Yudong Chen
Cornell University
[email protected]

Po-Ling Loh
University of Wisconsin
[email protected]

Bin Yu
University of California, Berkeley
[email protected]

IP2

Innovation in Big Data Analytics

Risk and decision models and predictive analytics have long been cornerstones for advancement of business analytics in industrial, government, and military applications. In particular, multi-source data system modeling and big data analytics and technologies play an increasingly important role in the modern business enterprise. Many problems arising in these domains can be formulated into mathematical models and can be analyzed using sophisticated optimization, decision analysis, and computational techniques. In this talk, we will share some of our successes in healthcare, defense, and service sector applications through innovation in predictive and big data analytics.

Eva K. Lee
Georgia Inst of Technology
Sch of Ind & Systems Eng
[email protected]

IP3

Optimizing Today’s Communication Networks

Optimization methods have long played a major role in the design and management of the communication systems that we use today. In recent years, communication networks have grown much larger in size and have to provide a multitude of services at increasingly higher rates. These changes have presented new challenges in the design and the management of communication networks. In this talk, we will illustrate the use of modern optimization methods in the provision of large-scale heterogeneous communication networks to ensure energy efficiency and network scalability.

Zhi-Quan Luo
University of Minnesota
Minneapolis, Minnesota
[email protected]

IP4

Subgradient Methods

Subgradient methods are the most basic of algorithms in convex optimization, applicable quite generally and having simple convergence proofs, modulo assumptions such as orthogonal projections onto convex sets being readily computable. The fundamentals regarding subgradient methods have changed little in decades, perhaps in part due to a supposition by researchers that not much else can be said for algorithms relying on hardly any problem structure other than convexity and Lipschitz continuity. We explain that, to the contrary, further advances are possible. We display an elementary algorithmic device which removes the need for Lipschitz continuity and replaces orthogonal projections with line searches. Additionally, we focus on using subgradient methods for the most elemental of general optimization problems: computing a feasible point.
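For orientation, here is a minimal sketch of the classical projected subgradient method that the talk takes as its starting point. This is the textbook scheme, not the speaker's new device; all names are illustrative. The two assumptions the talk revisits are visible in the code: a Lipschitz-type bound justifying the diminishing step sizes, and a computable orthogonal projection.

```python
import numpy as np

def projected_subgradient(f, subgrad, project, x0, num_iters=1000):
    """Classical projected subgradient method (illustrative sketch).

    f        -- convex objective, used only to track the best iterate
    subgrad  -- subgrad(x) returns any subgradient of f at x
    project  -- orthogonal projection onto the feasible convex set
    """
    x, best, f_best = x0, x0, f(x0)
    for k in range(num_iters):
        t = 1.0 / np.sqrt(k + 1)       # classical diminishing step size
        x = project(x - t * subgrad(x))
        if f(x) < f_best:              # subgradient steps are not monotone,
            best, f_best = x, f(x)     # so remember the best point seen
    return best, f_best
```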

James M. Renegar
Cornell University
School of ORIE
[email protected]

IP5

Using Second-order Information in Training Large-scale Machine Learning Models

We will give a broad overview of recent developments in using deterministic and stochastic second-order information to speed up optimization methods for problems arising in machine learning. Specifically, we will show how such methods tend to perform well in the convex setting but often fail to provide improvement over simple methods, such as stochastic gradient descent, when applied to large-scale nonconvex deep learning models. We will discuss the difficulties faced by quasi-Newton methods that rely on stochastic first-order information and Hessian-free methods that use stochastic second-order information.

Katya Scheinberg
Lehigh University
[email protected]

IP6

Using Local Measurements to Infer Global Network Properties

Networks are widely used to model relational data in many applications such as social sciences, biology, finance, and marketing. Analyses of these networks can reveal scientific insights, but the formulations lead to NP-hard problems. Heuristics are commonly used without guarantees on solution quality, as even some polynomial-time algorithms become impractical due to the sheer sizes of the problems. To achieve scalability, our work has focused on trying to infer global network properties from the distributions of local patterns, such as triangles or 4-cliques. Such local measurements are scalable and provide uniquely measurable properties of the graph. This talk will show examples of our approach to network modeling and analysis.

Ali Pinar
Sandia National Laboratories
[email protected]

IP7

Recent Progress on Dual Decomposition for Stochastic Integer Programming

We will discuss two methods for efficiently computing the value of the Lagrangian dual of a stochastic integer program (SIP). First, we discuss a primal-dual algorithm, the well-known progressive hedging method; unlike previous progressive hedging approaches for SIP, our algorithm can be shown to converge to the optimal Lagrangian dual value. The key improvement in the new algorithm is an inner loop of optimized linearization steps, similar to those taken in the classical Frank-Wolfe method. Second, we will discuss our recent work on improving the performance of classical stochastic (sub)gradient methods for SIP. Enhancements include both sub-sampling and asynchronous versions that run effectively on large-scale, distributed, heterogeneous computing platforms.
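As background for the second theme, here is a schematic of classical scenario decomposition: relax nonanticipativity with multipliers and run subgradient ascent on the dual. The `solve_scenario` oracle and the `prob`/`dim` attributes are hypothetical placeholders, and the talk's contributions (Frank-Wolfe-type inner loops, sub-sampling, asynchrony) are enhancements on top of this skeleton.

```python
import numpy as np

def lagrangian_dual_bound(scenarios, solve_scenario, num_iters=100):
    """Subgradient ascent on the Lagrangian dual of a SIP obtained by
    relaxing nonanticipativity (illustrative sketch, hypothetical API).

    solve_scenario(s, lam) -- returns (x_s, val_s): the minimizer and value
                              of scenario s's subproblem under multiplier lam
    """
    lam = [np.zeros(s.dim) for s in scenarios]
    best = -np.inf
    for k in range(num_iters):
        pairs = [solve_scenario(s, l) for s, l in zip(scenarios, lam)]
        best = max(best, sum(v for _, v in pairs))   # dual value: a lower bound
        xbar = sum(s.prob * x for s, (x, _) in zip(scenarios, pairs))
        step = 1.0 / (k + 1)
        # Subgradient direction: each scenario's deviation from the average.
        lam = [l + step * s.prob * (x - xbar)
               for l, s, (x, _) in zip(lam, scenarios, pairs)]
    return best
```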

Jeffrey Linderoth
University of Wisconsin-Madison
[email protected]

Natashia Boland
Georgia Institute of Technology
[email protected]

Jeffrey Christiansen
Mathematical Sciences, School of Science, RMIT University
[email protected]

Andrew Eberhard
RMIT University
[email protected]

Fabricio Oliveira
Mathematical Sciences, School of Science, RMIT University
[email protected]

Brian C. Dandurand
Argonne National Laboratory
[email protected]

Cong Han Lim
University of Wisconsin
[email protected]

James Luedtke
University of Wisconsin-Madison
Department of Industrial and Systems Engineering
[email protected]

Stephen J. Wright
University of Wisconsin-Madison
[email protected]

SP1

SIAG/OPT Prize Lecture: Proximal Minimization Algorithms for Nonconvex and Nonsmooth Problems

We introduce a self-contained convergence analysis framework for first-order methods in the setting of nonconvex and nonsmooth optimization problems. Our approach builds on the powerful Kurdyka-Łojasiewicz property. It allows for analyzing, under mild assumptions, various classes of nonconvex and nonsmooth problems with semi-algebraic data, a property shared by many optimization models arising in various fundamental data science paradigms. We illustrate our results by deriving a new and simple proximal alternating linearized minimization algorithm (PALM). The versatility of PALM makes it possible to exploit structures and data information relevant to important applications, and paves the way to the derivation of other interesting algorithmic variants.
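A schematic of the PALM iteration for minimizing f(x) + g(y) + H(x, y), assuming the proximal operators of f and g and blockwise Lipschitz estimates for the partial gradients of H are available. This is an illustrative sketch with hypothetical callback names, not the authors' code.

```python
def palm(grad_x_H, grad_y_H, prox_f, prox_g, Lx, Ly, x, y, num_iters=500):
    """Proximal Alternating Linearized Minimization, sketched.

    prox_f(v, t) -- proximal operator of t*f at v (similarly prox_g)
    Lx(y), Ly(x) -- Lipschitz constants of x -> grad_x H(x, y), etc.
    """
    for _ in range(num_iters):
        cx = Lx(y)                                    # step size for the x-block
        x = prox_f(x - grad_x_H(x, y) / cx, 1.0 / cx)
        cy = Ly(x)                                    # step size for the y-block
        y = prox_g(y - grad_y_H(x, y) / cy, 1.0 / cy)
    return x, y
```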

Shoham Sabach
Technion–Israel Institute of Technology
Faculty of Industrial Engineering and Management
[email protected]

Jerome Bolte
Universite Toulouse Capitole
[email protected]

Marc Teboulle
Tel Aviv University
[email protected]

CP1

Robust Optimization in Adaptive Radiation Therapy Planning

Uncertainties in radiation therapy can lead to non-negligible deficiencies in treatment quality, which can be addressed by robust optimization. Adaptive radiation therapy allows for adaptation during the course of the treatment to uncertainties that cannot be accounted for during the planning process. We first present a simulation framework combining robust planning with adaptive reoptimization strategies based on stochastic minimax optimization for a series of simulated treatments on a one-dimensional phantom subjected to systematic and random uncertainties. The plan applied during the first fractions can handle anticipated uncertainties. The individual uncertainties and their impact on treatment quality are measured at fixed time points. The plan is reoptimized based on the measurements, if necessary. This adapted plan is applied during the subsequent fractions until a next adaptation becomes necessary. In the adaptive strategy proposed, the measured uncertainty scenarios and their assigned probabilities are updated to guide the robust reoptimization. We also discuss other adaptation strategies. The simulations show that robustly optimized plans perform well in the presence of uncertainties similar to the anticipated ones. In case of unpredictably large uncertainties, robust adaptive strategies manage to adjust to the unknown probability distribution. We also discuss optimal control strategies applicable in this framework.

Michelle Boeck
KTH Royal Institute of Technology
[email protected]

Anders Forsgren
KTH Royal Institute of Technology
Dept. of Mathematics
[email protected]

Kjell Eriksson, Bjorn Hardemark
RaySearch Laboratories AB
[email protected], [email protected]

CP1

Explicit Optimization of Plan Quality Measures in Intensity-Modulated Radiotherapy Treatment Planning

The aim of intensity-modulated radiotherapy is to irradiate the tumor while sparing organs and healthy tissue. This is translated into conflicting objective functions of a treatment plan multicriteria optimization model. An optimized plan is approved only if the plan quality is acceptable; if not, the plan undergoes re-optimization with adjusted objective functions. The conventional approach is to use penalty-based functions and is known to necessitate several re-optimizations. We hypothesize that objective functions with a more direct relationship to plan quality metrics would reduce the need for fine-tuning. We investigate the potential of such an explicit approach by approximating the optimization of plan quality indices using mean-tail-doses as objective functions. The result is a linear programming problem. A drawback of using mean-tail-doses is the largely increased problem size, now of the same order as the number of voxels (at least 10^5). However, by using the specific problem structure, a tailored higher-order interior-point method significantly reduces the computational cost. Each iteration only requires factorization of a matrix whose size is proportional to the number of beamlets (of order 10^3 in many cases). We present results of applying the explicit approach to optimize sliding-window treatment plans, and compare these to plans automatically generated using a commercially available algorithm for conventional planning.

Lovisa Engberg
Division of Optimization and Systems Theory
Department of Mathematics, KTH Royal Institute of Technology
[email protected]

Kjell Eriksson
RaySearch Laboratories AB
[email protected]

Anders Forsgren
KTH Royal Institute of Technology
Dept. of Mathematics
[email protected]

Bjorn Hardemark
RaySearch Laboratories AB
[email protected]

CP1

Cluster Analysis of Biological Medicine

Many name-brand biological medicines are currently facing patent expiration, thus creating an opportunity for pharmaceutical companies to develop cost-effective alternatives to their name-brand counterparts. These alternatives are typically referred to as "biosimilars", and have the potential to significantly reduce patient costs by introducing competition into the pharmaceutical marketplace. However, before biosimilars are released to consumers, they must undergo a rigorous comparison with their name-brand counterparts to ensure safety and efficacy. Scientists have recently started using Nuclear Magnetic Resonance (NMR) experiments as a basis for this comparison. The result of such an NMR experiment is a highly detailed two-dimensional image, which can reveal the chemical structure underlying the biosimilar in question. Current analysis of NMR protein images relies on a combination of principal component analysis and visual inspection. By formulating the image comparison question as an optimal transport problem, we introduce a quantitative metric for comparing protein structure. Moreover, we employ this metric in a clustering algorithm to group proteins according to their chemical structure.
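To make the optimal transport metric concrete, here is a small sketch computing the discrete OT cost between two histograms as a plain linear program. This is a toy one-dimensional illustration only; the talk's spectra are two-dimensional images, and the authors' metric and solver may differ.

```python
import numpy as np
from scipy.optimize import linprog

def ot_cost(mu, nu, C):
    """Discrete optimal transport: min <C, T> s.t. T1 = mu, T'1 = nu, T >= 0."""
    m, n = len(mu), len(nu)
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):                 # row sums of T must equal mu
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                 # column sums of T must equal nu
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Toy usage: two 1-D histograms with squared-distance ground cost.
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.0, 0.5, 0.5])
C = (np.arange(3)[:, None] - np.arange(3)[None, :]) ** 2.0
print(ot_cost(mu, nu, C))  # 1.0: both half-units of mass move one bin
```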

Ryan M. Evans, Anthony Kearsley
Applied and Computational Mathematics Division
National Institute of Standards and Technology
[email protected], [email protected]

CP1

A Markov Decision Process Approach to Optimizing Cancer Therapy Using Multiple Modalities

There are several different modalities, e.g., surgery, chemotherapy, and radiotherapy, that are currently used to treat cancer. It is common practice to use a combination of these modalities to maximize clinical outcomes, a balance between maximizing tumor damage and minimizing normal tissue toxicity due to treatment. However, multi-modality treatment policies are mostly empirical in current practice, and therefore subject to individual clinicians' experiences. We present a novel formulation of optimal multi-modality cancer management using a finite-horizon Markov decision process approach. Specifically, at each decision epoch the clinician chooses the optimal treatment modality based on the patient's observed state, which we define as a combination of tumor progression and normal tissue side effect. Treatment modalities are categorized as (1) Type 1, which has a high risk and high reward but can be implemented only once during the course of treatment; (2) Type 2, which has a lower risk and lower reward than Type 1 but may be repeated; and (3) Type 3, no treatment (surveillance), which is the only action that has the possibility of reducing normal tissue side effect at the risk of worsening tumor progression. Numerical simulations using various intuitive reward functions show the structural insights of optimal policies and the potential benefits of using a rigorous model for optimal multi-modality cancer management.
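For concreteness, a minimal backward-induction sketch for a finite-horizon MDP of this flavor. The three actions mirror the abstract's Type 1 / Type 2 / surveillance categories, but all data here is hypothetical, and the once-only restriction on Type 1 would in practice be encoded in an augmented state.

```python
import numpy as np

def backward_induction(P, R, T):
    """Finite-horizon MDP via backward induction (illustrative sketch).

    P[a] -- (S, S) transition matrix of action a
    R[a] -- (S,) immediate reward of action a in each state
    T    -- number of decision epochs
    """
    n_states = P[0].shape[0]
    V = np.zeros(n_states)                    # terminal value
    policy = np.zeros((T, n_states), dtype=int)
    for t in reversed(range(T)):
        Q = np.array([R[a] + P[a] @ V for a in range(len(P))])
        policy[t] = Q.argmax(axis=0)          # best modality per state
        V = Q.max(axis=0)
    return V, policy
```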

Kelsey Maass
University of Washington
[email protected]

Minsun Kim
Department of Radiation Oncology
University of Washington School of Medicine
[email protected]

CP1

A Parallel Implementation of the Surrogate Management Framework for Optimization in Cardiovascular Surgery

The evolution of surgical methods in cardiovascular disease has traditionally relied on trial and error, outcomes statistics, and surgeon experience. With the advent of high performance computing and patient-specific hemodynamic simulation capabilities, a systematic approach to patient-specific treatment planning, combining simulations and optimization, is now feasible. However, clinically relevant cost functions are typically expensive to compute and are often non-smooth. In this study, we employ the Surrogate Management Framework (SMF), a derivative-free optimization method, in conjunction with black-box hemodynamics simulations for cardiovascular optimization. SMF cycles between an exploratory search step that accelerates convergence and a pattern-search-based poll step that ensures convergence. To make the SMF more tractable for a time budget, we present a parallel framework for optimization using openDIEL, which allows for multiple simultaneous function evaluations in a high performance computing environment. We compare the performance of several competing infill strategies, as well as methods to address ill-conditioning of the surrogate function. We demonstrate the performance of the parallel SMF framework, compared to a standard serial implementation, on a set of benchmark problems and quantify changes in surrogate quality and performance. Finally, we demonstrate an application in shape optimization of a cardiovascular surgery for patients with congenital heart disease.

Aekaansh Verma
Department of Mechanical Engineering
Stanford University
[email protected]

Kwai L. Wong
Joint Institute for Computational Science
University of Tennessee/ORNL
[email protected]

Alison Marsden
School of Medicine
Stanford University
[email protected]

CP2

Robust Convex and Conic Quadratic Optimization with CONCAVE Uncertainty

We derive (computationally) tractable reformulations of the robust counterparts of convex quadratic and conic quadratic constraints with concave uncertainties for a broad range of uncertainty sets. Furthermore, for quadratic problems with linear uncertainty containing a mean vector and covariance matrix, we construct a natural uncertainty set using an asymptotic confidence level and show that its support function is semi-definite representable. Finally, we apply our results to a portfolio choice problem.
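For background, the flavor of such tractable reformulations in the simplest textbook case (a linear constraint under ellipsoidal uncertainty, which is classical and not among the talk's new results):

```latex
(a_0 + P u)^{\top} x \le b \quad \forall\, \|u\|_2 \le 1
\qquad \Longleftrightarrow \qquad
a_0^{\top} x + \|P^{\top} x\|_2 \le b,
```

a second-order cone constraint; the talk establishes analogous representability results for (conic) quadratic constraints with concave uncertainty.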

Dick Den Hertog, Ahmadreza Marandi, Bertrand Melenberg
Tilburg University
[email protected], [email protected], [email protected]

Aharon Ben-Tal
Technion - Israel Institute of Technology
Israel
[email protected]

CP2

Applying Robust Optimization to Convex Nonlinear Regression

In many optimization problems, some of the variables are subject to a convex nonlinear constraint, with the parameters of that constraint determined through the solution of a nonlinear regression model. We develop a framework for applying robust optimization to problems of this form, by constructing a piecewise linear approximation to the convex constraint. The parameters of the different linear pieces are connected through a unifying robust constraint. We fix the locations of the boundaries between the pieces of the function. Under this assumption, the subproblem for determining the parameters can be solved explicitly as a function of the variables of the original optimization problem. It then follows that the robust version of the original optimization problem can be formulated as a single convex integer programming problem. Various formulations of the robust constraint are considered. The approach is illustrated through application to a problem in humanitarian logistics. In this situation, part of the cost function corresponds to the deprivation costs, which are typically modeled as convex exponential functions of the time a person is deprived of a resource.

Dan Lu
Rensselaer Polytechnic Institute (RPI)
[email protected]

John E. Mitchell
Rensselaer Polytechnic Institute
110 8th Street, Troy, NY, USA
[email protected]

CP2

Robust Convex and Conic Quadratic Optimization with CONVEX Uncertainty

For convex quadratic and conic quadratic constraints with convex uncertainty, it is well known that the robust counterpart is, in general, intractable. We derive tractable inner and outer approximations of the robust counterpart of such constraints for various types of convex compact uncertainty sets. The approximations are made by replacing the quadratic terms in the uncertain parameters with suitable linear upper and lower bounds and applying our results to a quadratic constraint with linear uncertainty to derive a tractable formulation. Finally, we apply our results to a norm approximation and a regression line problem.

Ahmadreza Marandi
Tilburg University
[email protected]

Aharon Ben-Tal
Technion - Israel Institute of Technology
[email protected]

Dick Den Hertog
Tilburg University
Faculty of Economics and Business Administration
[email protected]

Bertrand Melenberg
Tilburg University
[email protected]

CP2

Robust Optimization Approach for Multivariate Sampling Allocation Problems in Stratified Sampling

A central problem in survey sampling is the optimal allocation of samples to entities of the population. The primary goal of the allocation is to minimize the variances of estimated totals of the population, which, in practice, often has to be treated as a multivariate allocation problem. The optimal solution in stratified sampling, where the strata are the entities, relies on the stratum-specific variances of the (multiple) variables of interest that are assumed to be known from earlier surveys or highly correlated variables. However, these stratum-specific variances are generally not known precisely. In order to account for this uncertainty, we formulate a robust optimization model for this multivariate stratified allocation problem. Using robust optimization, we seek an optimal allocation which is feasible for any realization of the uncertain variances. A real-life problem is considered and solved in order to validate the applicability of the model.

Mohammad Asim Nomani
Research and Training Group on Algorithmic Optimization
Department of Mathematics, University of Trier, Germany
[email protected]

Jan Pablo Burgard
Economic and Social Statistics Department
University of Trier, Germany
[email protected]

Mirjam Dur
Department of Mathematics
University of Trier
[email protected]

Ralf Munnich
Economic and Social Statistics Department
University of Trier, Germany
[email protected]

CP2

Distributed Adjustable Robust Optimization

Conventional robust optimization (RO) considers that uncertain parameters belong to some compact (uncertainty) set and that all variables must be determined before the realization of the uncertain parameters becomes known. All variables represent here-and-now decisions. Often, a subset of variables referred to as adjustable can be chosen after the realization of the uncertain data, while the others (nonadjustable) must be determined before its realization. The former variables represent wait-and-see decisions that can be made when part of the uncertain data is revealed and that can adjust to the corresponding part of the data. In this case, adjustable RO (ARO) produces robust counterparts (ARC) that are less conservative than the usual RC, though computationally harder. In the non-robust context, decomposition provides a useful method where the structure of the original problem can be exploited in order to split it into subproblems that are repeatedly solved. The major strength of decomposition methods stems from their ability to reformulate the original problem and generate subproblems (coupled through a master problem) that are significantly easier to solve than the original problem and that can be solved concurrently (if the subproblems are independent). However, the decomposable structure of the uncertain original problem (OP) may disappear when formulating its ARC. Hence, preserving the decomposability property becomes essential in order to allow distributed solving of ARCs.

Dimitri Papadimitriou
Bell Labs
[email protected]

CP2

Adjustable Robust Optimization via Fourier-Motzkin Elimination

We demonstrate how adjustable robust optimization (ARO) problems with fixed recourse can be cast as static robust optimization problems via Fourier-Motzkin elimination (FME). Through the lens of FME, we characterize the structures of the optimal decision rules for a broad class of ARO problems. A scheme based on a blending of classical FME and a simple linear programming technique that can efficiently remove redundant constraints is developed to reformulate ARO problems. This generic reformulation technique enhances the classical approximation scheme via decision rules, and enables us to solve adjustable optimization problems to optimality. We show via numerical experiments that, for small-size ARO problems, our novel approach finds the optimal solution. For moderate or large-size instances, we eliminate a subset of the adjustable variables, which improves the solutions from decision rule approximations.
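A bare-bones sketch of a single Fourier-Motzkin elimination step on a system Ax <= b. It shows both the mechanics and the quadratic constraint blow-up that motivates coupling FME with redundancy removal, as the abstract describes; this is illustrative, not the authors' implementation.

```python
import itertools

def fm_eliminate(A, b, j):
    """Eliminate variable j from A x <= b; returns the reduced system."""
    pos = [(r, c) for r, c in zip(A, b) if r[j] > 0]
    neg = [(r, c) for r, c in zip(A, b) if r[j] < 0]
    keep = [(r, c) for r, c in zip(A, b) if r[j] == 0]
    for (rp, cp), (rn, cn) in itertools.product(pos, neg):
        # Scale each pair so the x_j coefficients cancel, then add the rows.
        lp, ln = -rn[j], rp[j]                       # both positive
        keep.append(([lp * p + ln * q for p, q in zip(rp, rn)],
                     lp * cp + ln * cn))
    A2 = [[c for k, c in enumerate(row) if k != j] for row, _ in keep]
    b2 = [rhs for _, rhs in keep]
    return A2, b2
```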

Jianzhe Zhen
Faculty of Economics and Business Administration
Tilburg University
[email protected]

Dick Den Hertog
Tilburg University
Faculty of Economics and Business Administration
[email protected]

Melvyn Sim
Department of Decision Sciences, NUS Business School
National University of Singapore
[email protected]

CP3

A Distributed Jacobi Algorithm for Large-Scale Constrained Convex Optimization

We present an iterative distributed Jacobi algorithm for solving large-scale constrained convex optimization problems that come from distributed model predictive control (DMPC). The centralized problem has a convex cost function with respect to affine constraints, assuming there are couplings between subsystems in both the cost function and the constraints. The proposed distributed approach involves a decomposition based on the sparsity structure of the centralized problem; then a cooperative Jacobi parallel iteration is performed, and the new iterate is obtained by a convex combination of new local solutions and old iterates. Starting from a feasible solution, the algorithm iteratively reduces the cost function value with guaranteed feasible solutions at every iteration step; thus it is suitable for DMPC applications in which hard constraints are important. We provide conditions for convergence to centralized optimality using the distributed scheme. This proposed approach is applicable to interconnected systems where the number of subsystems is large, the coupling is sparse, and local communication is available. This work follows the work on the Jacobi algorithm for large-scale unconstrained convex optimization in: Bertsekas, D.P. and Tsitsiklis, J.N. (1989). Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Upper Saddle River, NJ.
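A sketch of one cooperative Jacobi sweep as described above, with a hypothetical `local_solve` oracle returning subsystem i's proposal for its own block. Taking alpha <= 1/(number of blocks) makes the update a convex combination of feasible candidate points (each differing from the current iterate in one block), which is one way to preserve feasibility for convex constraint sets.

```python
def jacobi_sweep(x, local_solve, alpha):
    """One distributed Jacobi iteration (illustrative sketch).

    x           -- list of block variables, one per subsystem
    local_solve -- local_solve(i, x) returns subsystem i's new block with
                   all other blocks fixed (run in parallel in practice)
    alpha       -- convex-combination weight, alpha <= 1/len(x)
    """
    proposals = [local_solve(i, x) for i in range(len(x))]
    return [(1 - alpha) * xi + alpha * zi for xi, zi in zip(x, proposals)]
```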

Dang Doan
Systems Control and Optimization Laboratory
University of Freiburg
[email protected]

CP3

Fréchet Differentiability of the Boussinesq Flow with Respect to Domain Variations

In this talk we investigate shape differentiability properties of the solution operator associated with the unsteady incompressible Boussinesq equations and the method of mapping by Murat and Simon. More precisely, by using the method of mapping we transform the Boussinesq equations defined on Ω to a reference domain Ω_ref = τ^{-1}(Ω) and show that the corresponding solution operator τ ↦ (v, θ, p)(τ) is Fréchet differentiable in a suitable Banach space setting. To this end, we propose a general analytical framework beyond the implicit function theorem to show the Fréchet differentiability of the transformation-to-state mapping conveniently. This framework can be applied to a variety of other shape optimization or optimal control problems and takes care of the usual norm discrepancy needed for nonlinear problems to show differentiability of the state equation and invertibility of the linearized operator. The analysis is carried out for Ω_ref being a Lipschitz domain and aims at minimizing the regularity assumptions on the transformation variable τ. This is in part based on joint work with F. Lindemann, M. Ulbrich and S. Ulbrich.

Michael H. Fischer
TU Darmstadt
[email protected]

Stefan Ulbrich
Technische Universitaet Darmstadt
Fachbereich Mathematik
[email protected]

CP3

Nash Equilibrium Strategy for Uncertain Markov Jump Linear Stochastic Systems with Multiple Decision Makers

A Nash equilibrium strategy for uncertain Markov jump linear stochastic systems with multiple decision makers is addressed. First, a stochastic algebraic Riccati inequality is established such that the closed-loop stochastic system is exponentially mean-square stable and has a cost bound. Second, a robust strategy set is established such that the Nash equilibrium condition is satisfied. It is shown that the necessary conditions, which are determined from the Karush-Kuhn-Tucker conditions, for the existence of a strategy set are defined by means of solutions of cross-coupled stochastic algebraic Riccati equations (CSAREs). Furthermore, Newton's method is used for solving the CSAREs. In an alternate approach designed to simplify implementation, a recursive technique based on linear matrix inequalities as a convex optimization technique is applied. Finally, a numerical example is provided to demonstrate the effectiveness of the method.

Hiroaki Mukaidani
Institute of Engineering, Hiroshima University
[email protected]

CP4

jInv - A Flexible Julia Package for Parallel PDE Parameter Estimation

jInv is a software package and framework for the solution of large-scale PDE-constrained optimization problems, implemented in the Julia programming language. It supports linear and nonlinear PDE models and provides many tools commonly used in parameter estimation problems, such as different misfit functions, regularizers, and efficient methods for numerical optimization. Also, it provides easy access to both iterative and direct linear solvers for solving linear PDEs. A main feature of jInv is the provided easy access to parallel and distributed computation supporting a variety of computational architectures: from a single laptop to large clusters of cloud computing engines. Being written in the high-level dynamic language Julia, it is easily extendable and yet fast. We demonstrate the capabilities of jInv using numerical experiments motivated by geophysical imaging.

Patrick Belliveau
University of British Columbia
[email protected]

Eldad Haber
Department of Mathematics
The University of British Columbia
[email protected]

Lars Ruthotto
Department of Mathematics and Computer Science
Emory University
[email protected]

Eran Treister
Ben Gurion University of the Negev
[email protected]

CP4

Adaptive, Inexact Optimization with Low-Rank Tensors for Optimal Control of PDEs under Uncertainty

We present an adaptive, inexact optimization method for the solution of optimal control problems with PDEs under uncertainty. These parametrized PDEs are formulated in a suitable tensor Banach space, which gives rise to a discretization with coefficients in tensor form. To avoid the curse of dimensionality, these coefficients are represented in a low-rank tensor format (Hierarchical Tucker or Tensor Train). Therefore, the developed methods are tailored to the usage of low-rank tensor arithmetics, which only offer a limited set of operations and require truncation (rounding) in between. The occurring errors are controlled to ensure global convergence of the algorithm.

Sebastian Garreis
TU Muenchen, Germany
[email protected]

Michael Ulbrich
Technische Universitaet Muenchen
Chair of Mathematical Optimization
[email protected]

CP4

Adaptive Eigenspace Method for Inverse Scattering Problems in the Frequency Domain

A nonlinear optimization method is proposed for the solution of inverse scattering problems in the frequency domain, when the scattered field is governed by the Helmholtz equation. The time-harmonic inverse medium problem is formulated as a PDE-constrained optimization problem and solved by an inexact truncated Newton-type iteration. Instead of a grid-based discrete representation, the unknown wave speed is projected to a particular finite-dimensional basis of eigenfunctions, which is iteratively adapted during the optimization. Truncating the adaptive eigenspace (AE) basis at a (small and slowly increasing) finite number of eigenfunctions effectively introduces regularization into the inversion and thus avoids the need for standard Tikhonov-type regularization. Both analytical and numerical evidence underpins the accuracy of the AE representation. Numerical experiments demonstrate the efficiency and robustness to missing or noisy data of the resulting adaptive eigenspace inversion (AEI) method.

Marcus J. Grote
Department of Mathematics and Computer Science
University of Basel, Switzerland
[email protected]

Marie Graff-Kray
Department of Earth, Ocean and Atmospheric Sciences
University of British Columbia, Vancouver BC, Canada
[email protected]

Uri Nahum
Department of Mathematics and Computer Science
University of Basel, Switzerland
[email protected]

CP4

Controlling the Footprint of Droplets via Surface Tension

The development of engineered substrates has progressed to a very advanced level, which allows for control of the shape of sessile droplets on these substrates. Controlling local droplet shape via substrate surface tensions may be useful for directing the growth of bio-films and cell cultures because it can affect the distribution of nutrients as well as the gross shape of the film. In addition, depositing a film of material onto a substrate in a particular pattern could be affected by the droplet shape. Also, droplets can act as lenses, with focal properties controlled by locally modifying substrate tensions. In this talk, we present an optimal control approach for the shape of droplets on substrates. Specifically, we wish to direct the shape of the droplet-substrate interface (i.e., the liquid-solid interface) by controlling the substrate surface tension through the minimization of an appropriate cost functional. We refer to this as droplet footprint control.

Antoine Laurain
Humboldt-University of Berlin
[email protected]

Shawn W. Walker
Louisiana State University
Department of Mathematics and CCT
[email protected]

CP5

Approximation of Stochastic Equilibrium Problems via Lopsided Convergence

In this talk, a review of an approximation scheme for solving maxinf (minsup) optimization problems is presented, with a special focus on stochastic equilibrium problems. The family of problems that can be modeled in this setting is appealingly large, ranging from linear and nonlinear complementarity problems, fixed points, variational inequalities, and equilibrium problems to optimization under stochastic ambiguity, robust optimization, risk-based optimization, and semi-infinite optimization. The mathematical programming problem is to find a maxinf (minsup) point of a given bifunction. Given this setting, a solution strategy is proposed using an approximation scheme based on the theory of lopsided convergence. The first definition of lopsided convergence was established during the eighties for extended real-valued bifunctions, then revised around 2010 for finite-valued functions over product sets. Lately, it was revisited by extending the definition to more general domains and proposing an associated metric, the lopsided distance. Finally, some numerical results are presented for three examples: 1) a general equilibrium model for an exchange economy with uncertainty; 2) a stochastic general equilibrium model with financial markets; and 3) an infrastructure planning problem for fast electric vehicle charging stations.

Julio Deride
University of California Davis
[email protected]

CP5

On Stochastic Nonlinear Minimax Optimization

We study the formulation min_{x∈X} g(x), where g(x) = max_{y∈Y} h(x,y) with h(x,y) = E_ξ[H(x,y,ξ)], when h(x,y) is concave in y for fixed x but g(x) is not convex in x. Such problems arise, for instance, in optimizing total profit by maximizing over exogenous revenue g obtained from sales x and minimizing production cost h over supply y, where the random quantity ξ may represent uncertainty in production yield, supply chain unreliability, etc. The convergence guarantees of the standard technique of sample average approximation (SAA) of a Lagrangian reformulation, provided for stochastic convex-concave minimax formulations, do not apply in this case. We propose a stochastic first-order (gradient following) recursion for the indefinite outer objective g and combine it with SAA for the concave h. An estimate for the gradient of g is obtained from the concave inner problem via parametric programming, but it is biased because of the finite-sample approximation of h. We provide conditions under which the bias in the iterations can be controlled to obtain convergence. Conditions under which the algorithm is computationally efficient, i.e., when it achieves the fastest possible convergence rate (in a certain precise sense), are specified.

Soumyadip Ghosh, Mark Squillante
IBM Research
[email protected], [email protected]

Ebisa Wollega
IE, Colorado State, Pueblo
[email protected]

CP5

Stochastic Algorithms for Minimizing Relatively Smooth Functions

Relative smoothness, a notion recently introduced by Lu, Freund and Nesterov, generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. We develop several randomized algorithms for minimizing relatively smooth functions, such as stochastic gradient descent and randomized coordinate descent, thus extending the reach of modern fast stochastic methods to a wider class of functions.
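For intuition, here is a sketch of the deterministic Bregman gradient step that relative smoothness licenses, with negative entropy as the reference function h, for which the step has a closed multiplicative form on the simplex. The talk's stochastic gradient and coordinate descent variants randomize steps of this type; L_rel denotes an (assumed known) relative smoothness constant.

```python
import numpy as np

def bregman_gradient(grad_f, x0, L_rel, num_iters=200):
    """Bregman (mirror) gradient method with entropy reference (sketch).

    Each step solves  argmin_u  <grad_f(x), u> + L_rel * D_h(u, x)
    over the simplex, where h is the negative entropy; the solution
    is the multiplicative update below.
    """
    x = x0 / x0.sum()
    for _ in range(num_iters):
        x = x * np.exp(-grad_f(x) / L_rel)
        x /= x.sum()                    # renormalize onto the simplex
    return x
```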

Filip Hanzely, Peter Richtarik
University of Edinburgh
[email protected], [email protected]

CP5

Reconfigurable Optimization

We develop a novel modeling framework for a class of optimization problems in which operational decisions are altered as a system transitions stochastically between world states. Our framework addresses the problem of minimizing the reconfiguration costs associated with changing the operational decisions while ensuring good quality solutions in each individual state. We compare this framework with existing modeling frameworks like Markov decision processes, stochastic programming, and robust optimization. We then use our framework to formulate a large real-world power systems problem and develop a Lagrangian-relaxation based algorithm to solve it.

Prateek R. Srivastava, Nediako Dimitrov
University of Texas at Austin
[email protected], [email protected]

David Alderson
Naval Postgraduate School
[email protected]

CP5

Stochastic Methods for Stochastic Variational Inequalities

We consider the stochastic variational inequality problem (SVI): solve a variational inequality over a closed convex set whose operator is only revealed through an unbiased stochastic oracle. In particular, SVI includes first-order conditions of stochastic optimization, saddle-point problems and Nash games. We propose stochastic approximation methods based on cheap first-order methods for large-scale problems, assuming only monotonicity and Lipschitz continuity. We weaken significantly the standard assumptions in stochastic approximation: (1) the feasible set and operator may be unbounded without strong monotonicity, and (2) we do not require an oracle with uniformly bounded variance (e.g., multiplicative noise). One class of methods is based on extragradient schemes. Under a mini-batch procedure, we obtain the rate O(1/k) with near-optimal sample complexity O(ε^{-2}) in the setting of (1)-(2), improving the previous rate O(1/√k) obtained assuming boundedness, strong monotonicity, or uniformly bounded variance. A second class of methods includes randomized alternating projections for when the number of constraints is too large or its components are revealed in an online fashion.
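The deterministic skeleton of the extragradient scheme behind the first class of methods; the paper's versions replace F by a stochastic oracle and add mini-batching on top of this.

```python
def extragradient(F, project, x0, step, num_iters=1000):
    """Korpelevich extragradient method for a monotone VI (sketch):
    find x* in C with <F(x*), x - x*> >= 0 for all x in C.
    step should be below 1/L for the Lipschitz constant L of F.
    """
    x = x0
    for _ in range(num_iters):
        y = project(x - step * F(x))    # predictor step at x
        x = project(x - step * F(y))    # corrector step, using F at y
    return x
```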

Philip Thompson
Center for Mathematical Modeling (CMM)
[email protected]

Alfredo N. Iusem
IMPA, Rio de Janeiro
[email protected]

Alejandro Jofre
Universidad de Chile, Santiago, Chile
[email protected]

Roberto Oliveira
IMPA, Rio de Janeiro
[email protected]

CP5

Structural Properties of Probability Constraints

A probability function results from a measure acting on a random inequality system depending on both a decision and a random vector. A probability constraint requires that such a system hold with high enough probability in order to ensure the safety of the decision. In this talk we will discuss recent insights concerning differentiability of probability functions, but also new insights on convexity of feasible sets for probability constraints. The discussed results cover nonlinear systems as well as elliptically distributed random vectors.

Wim Van Ackooij
EDF R&D, Department OSIRIS
[email protected]

Rene Henrion
Weierstrass Institute
Berlin, Germany
[email protected]

Jerome Malick
University of Grenoble
[email protected]

CP6

On the Relation Between the Cauchy and the Gradient Sampling Method

The minimization of nonsmooth functions appears in many real-world applications. Therefore, robust and efficient algorithms for this kind of problem are a necessity. Recently, a method known as Gradient Sampling (GS) has gained attention for its good numerical results and its intuitive functioning. Although one can view the GS method as a generalization of the steepest descent method, studies exploring this relation are scarce. Since the GS method finds its search direction in a nondeterministic way, it is not a trivial task to establish a connection between the descent directions taken by the Cauchy method and by the Gradient Sampling algorithm in each iteration. In order to gain a better understanding of this relation, this study presents a local convergence result for the GS method, showing how the linear rate of the steepest descent method can be obtained for the GS algorithm under special circumstances.
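For reference, the Gradient Sampling search direction is the negative of the minimum-norm element of the convex hull of gradients sampled near the current point, a small quadratic program; the sketch below solves it with a generic solver. This is illustrative only, and practical GS implementations differ in many details (sampling, line search, radius reduction).

```python
import numpy as np
from scipy.optimize import minimize

def gs_direction(grad_f, x, radius=1e-4, m=20, seed=0):
    """Gradient Sampling direction (sketch): -argmin ||g||, g in conv hull."""
    rng = np.random.default_rng(seed)
    pts = x + radius * rng.standard_normal((m, x.size))
    G = np.vstack([grad_f(x)] + [grad_f(p) for p in pts])
    n = G.shape[0]
    # Min-norm point of the convex hull: QP over simplex weights w.
    res = minimize(lambda w: np.sum((G.T @ w) ** 2), np.full(n, 1.0 / n),
                   method="SLSQP", bounds=[(0.0, 1.0)] * n,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return -(G.T @ res.x)
```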

Lucas Eduardo Azevedo Simoes
University of Campinas
[email protected]

Elias Salomao Helou
University of Sao Paulo
[email protected]

Sandra Augusta Santos
University of Campinas
[email protected]

CP6

Weak Subgradient Based Solution Algorithm in Nonconvex Unconstrained Optimization

In this work, we study a weak subgradient based solution algorithm for nonconvex unconstrained optimization. The concept of weak subgradients in nonconvex analysis, introduced by Azimov and Kasimbeyli, is one of the natural generalizations of the classic subdifferential of convex analysis. Weak subdifferentiability of a function at a given point means the existence of a supporting cone (instead of a supporting hyperplane, as in convex analysis) to the epigraph of the function at the point under consideration. Because of this property, weak subdifferentiability does not require a convexity condition. We propose a solution algorithm for the unconstrained minimization problem, where no convexity is assumed and at every iteration the weak subgradient of the objective function is used to generate a new solution. The paper investigates convergence properties of the sequence generated by the algorithm. The performance of the proposed algorithm is demonstrated on test problems from the literature.

Gulcin Dinc Yalcin, Refail Kasimbeyli
Anadolu University
[email protected], [email protected]

CP6

Optimality Conditions for Nonsmooth Constrained Optimization Problems

Nonsmoothness arises in many practical optimization problems and can often be recast in so-called abs-normal form, where every occurrence of nonsmoothness is expressed in terms of the absolute value function. Necessary and sufficient first and second order optimality conditions for this class of unconstrained nonlinear nonsmooth minimization problems have recently been developed. We discuss illustrative examples and extend the theory to nonsmooth constrained optimization.

Lisa C. Hegerhorst
Leibniz Universitat Hannover
Institute for Applied Mathematics
[email protected]

CP6

Clustering in Large Data Sets with the Limited Memory Bundle Method

Clustering is among the most important tasks in data mining. This problem in large data sets is challenging for most existing clustering algorithms. Here we introduce a new algorithm based on nonsmooth optimization techniques for solving the minimum sum-of-squares clustering problem in large data sets. The clustering problem is first formulated as a nonsmooth optimization problem. Then the limited memory bundle method [Haarala et al., Math. Prog., Vol. 109, No. 1, pp. 181-205, 2007] is modified and combined with an incremental approach to solve this problem. The algorithm is evaluated using real-world data sets with both large numbers of attributes and large numbers of data points. The new algorithm is also compared with some other optimization-based clustering algorithms. The numerical results demonstrate that the new algorithm is both efficient and accurate, and it can be used to provide real-time clustering in large data sets.

Napsu Karmitsa
Department of Mathematics
University of Turku
[email protected]

Adil Baghirov
CIAO, School of ITMS
The University of Ballarat
[email protected]

Sona Taheri
Federation University Australia
Faculty of Science and Technology
[email protected]

CP6

Nonsmooth DC Optimization Algorithm for Clusterwise Linear L1 Regression

The aim of this research is to develop an algorithm for solving the clusterwise linear least absolute deviations regression problem. This problem is formulated as a nonsmooth nonconvex optimization problem. The objective function in the problem is represented as a difference of two convex functions. In this representation the first DC component is expressed as a sum of simple nonsmooth functions. We propose to smooth the first DC component. Such an approach allows us to design algorithms for which strong convergence results can be obtained. Optimality conditions are derived using this representation. An algorithm is designed based on the difference-of-convex representation and an incremental approach. The proposed algorithm is tested using synthetic and real-world small and large scale data sets for regression analysis.

Sona Taheri
Federation University Australia
Faculty of Science and Technology
[email protected]

Adil Baghirov
CIAO, School of ITMS
The University of Ballarat
[email protected]

CP6

An SQP Trust-Region Algorithm for Constrained Optimization Without Derivatives

Constrained optimization without using derivatives is an active area of research, as there are several reasons why derivative information may not be available. Derivative-free optimization algorithms that model the objective and constraint functions by polynomials have been shown to be very efficient. We present our SQP trust-region algorithm for constrained optimization without derivatives, using different interpolation techniques. The application of variable-size models between linear and quadratic degree and of under-determined interpolation, where significantly fewer sample points are needed, is of special interest. Our implementation is compared to other well-known DFO libraries. Numerical experiments will be carried out on a subset of the CUTEst test problem library and on a real-life application.

Anke Troeltzsch
German Aerospace Center (DLR)
Cologne, Germany
[email protected]

CP7

Design and Optimization of a Cylindrical Explosive

The design of explosive devices has a wide-reaching impact in mining, demolition, and military applications. Traditionally the design of explosives is an expensive and time-consuming process, involving extensive machining, building experimental setups, and going to great lengths to ensure the safety of everyone involved. These material and labor costs can be significantly reduced by using computational tools to assist in the design of an explosive. Here a cylindrically contained high explosive was analyzed using Dakota, an optimization toolkit developed at Sandia National Laboratories, coupled with two hydrocodes: KO, a one-dimensional Lagrangian formulation, and CTH, an industry-standard Eulerian shock physics code. Initial studies were performed with KO to optimize the thickness of a cylinder for varying outer radii; the explosive was modelled as a bulk gas with pressure and velocity taken from a standard STEX test. This resulted in an intuitive parametric surface which was both smooth and convex. These results agreed with a second study of a one-dimensional CTH simulation using an HEBURN explosive TNT model. Further investigation showed a degree of similitude in the one-dimensional model, and a potential to reduce the system down to a scalar relationship. This provides a foundation to explore and simplify the three-dimensional case. The effects of shell geometry will also be explored by parameterizing the outer profile as an input to the optimizer.

Logan E. Beaver, John Borg
Marquette University
[email protected], [email protected]

Geremy Kleiser
Eglin Air Force Base
[email protected]

CP7

A Stochastic Programming Framework for Multi-Stakeholder Decision-Making and Conflict Resolution

Engineering decision-making is inherently multiobjective and involves multiple stakeholders. As such, it suffers from ambiguity and dimensionality. We propose a decision-making framework to compute compromise solutions that balance the conflicting priorities of multiple stakeholders. In our setting, we shape the stakeholder dissatisfaction distribution by minimizing a strongly monotone risk metric, such as conditional value-at-risk (CVaR) or entropic value-at-risk (EVaR). Such risk metrics are often parameterized by a probability level that shapes the tail of the dissatisfaction distribution. The proposed approach allows us to compute a family of compromise solutions and generalizes multi-stakeholder settings previously proposed in the literature. For the CVaR metric, we use the concept of the CVaR norm for geometric interpretation and to establish strong monotonicity, which guarantees Pareto-efficient compromise solutions under mild assumptions about stakeholder priorities. We discuss a broad range of potential applications that involve complex decision-making processes. We demonstrate the developments by balancing stakeholder priorities on transportation, safety, water quality, and capital costs in a biowaste facility location case study. We also discuss extensions to energy system design and operation, such as combined cooling, heat and power systems, and solar power generators with energy storage.
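For reference, the CVaR metric mentioned above admits the standard Rockafellar-Uryasev representation, in which the probability level β is exactly the tail parameter the abstract refers to (background only, not necessarily the authors' exact formulation):

```latex
\mathrm{CVaR}_{\beta}(D) \;=\; \min_{\eta \in \mathbb{R}}
\left\{ \eta + \frac{1}{1-\beta}\,\mathbb{E}\!\left[(D-\eta)_{+}\right] \right\},
```

where D is the (random) stakeholder dissatisfaction.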

Alexander W. Dowling
University of Wisconsin-Madison
Department of Chemical and Biological Engineering
[email protected]

CP7

Optimal Power Flow Based on Holomorphic Embedded Load Flow

Optimal Power Flow (OPF) is known as a difficult global optimization problem. Due to its importance in its own right and as a subproblem in other power systems optimization problems, it has attracted a lot of attention, e.g., by SDP relaxations. If solved by a standard NLP solver to local optimality, the resulting solution may not just be suboptimal but also non-physical, in that it violates bus stability criteria. The Holomorphic Embedded Load Flow (HELM) method has been suggested recently by Antonio Trias as an alternative (and efficient) method of solving the related load flow problem. The claim (observed in practice, but not formally proven) is that HELM will never return unstable solutions. We present a novel method to wrap the HELM method in an SQP framework to arrive at an OPF solver that avoids unstable solutions. This does not avoid all local solutions, but a significant number, depending on the problem in question. HELM-OPF is shown to be competitive with other local OPF solvers, but has a higher chance of finding the global solution.

Andreas Grothey
School of Mathematics
University of Edinburgh
[email protected]

Ian Wallace
University of Edinburgh
[email protected]

CP7

Geometry and Topology Optimization of Sheet Metal Profiles by Using a Branch-and-Bound Framework

We present a new approach combining well established pro-cedures from PDE-constrained and discrete optimization inorder to find an optimal design of a multi-chambered pro-file. Given a starting profile design, a load case and corre-sponding design constraints (e.g., sheet thickness, chambersizes), we want to find an optimal subdivision into a pre-defined number of chambers with optimal shape subjectto structural stiffness. Within the presented optimizationscheme a branch-and-bound tree is generated to model thetopological decisions. Therefore, each level of the tree rep-


represents one additional chamber of the profile. Before the next chamber is added, the geometry of the current profile design is optimized by an SQP method. This is followed by an application of the SIMP approach to solve a relaxation of a topology optimization problem, preparing the further subdivision of the profile. Based on this relaxation, a best fitting feasible topology subject to manufacturability conditions is determined using a new mixed integer method employing shortest paths. To improve the running time, the finite element simulations for the geometry optimization and topology relaxation are performed with different levels of accuracy. Additionally, we present numerical experiments including different starting geometries, load scenarios and mesh sizes.

Benjamin M. Horn, Hendrik Luthen, Marc Pfetsch
Research Group Optimization, Department of Mathematics
TU Darmstadt
[email protected], [email protected], [email protected]

Stefan Ulbrich
Technische Universitaet Darmstadt
Fachbereich Mathematik
[email protected]

CP7

Classification of Mass Spectra Using a Multi-Objective Control Approach

Mass spectrometry is an analytical chemistry technique commonly employed for chemical identification. The resulting mass spectrum generated during mass spectrometry can be used to elucidate structural information about the chemical sample. Identification of an unknown chemical, given its mass spectrum and a curated library of mass spectra of known compounds, can usually be accomplished through a library search, assuming that the instrument produces a sufficiently accurate spectrum and the spectrum of the unknown chemical is present in the library. Unfortunately, many applications see the synthesis of new chemical compounds occurring at a rate that exceeds the growth of curated libraries (e.g., illicit drug development). As such, identification must be supplemented with rigorous classification approaches to generate expectations of an unknown chemical sample's identity. In this presentation, we discuss the unique challenges of mass spectral data analysis and introduce a multi-objective control-based mathematical framework for improved classification and identification of chemical compounds. Methods analysis and preliminary results will be discussed in detail.

Arun S. Moorthy
National Institute of Standards and Technology
[email protected]

Anthony Kearsley
Applied and Computational Mathematics Division
National Institute of Standards and Technology
[email protected]

William E. Wallace
National Institute of Standards and Technology
[email protected]

CP7

Pricing and Clearing Combinatorial Markets with Singleton and Swap Orders

We present an algorithm that permits polynomial time market-clearing and -pricing for a certain class of combinatorial markets. The results are presented in the context of our main application: the futures opening auction problem. Futures contracts are an important tool to mitigate market risk and counterparty credit risk. The contracts can be traded with varying expiration dates and underlyings. Several exchanges also offer so-called futures contract combinations, which allow traders to swap two futures contracts with different expiration dates. In theory, the price is in both cases the difference of the two involved futures contracts. However, in particular in the opening auctions, price inefficiencies often occur due to suboptimal clearing, leading to potential arbitrage opportunities. We present a minimum cost flow formulation of the futures opening auction problem that guarantees consistent prices. The core ideas are to model orders as arcs in a network, to enforce the equilibrium conditions with the help of two hierarchical objectives, and to combine these objectives into a single weighted objective while preserving the price information of dual optimal solutions. The resulting optimization problem can be solved in polynomial time, and computational tests establish an empirical performance suitable for production environments.
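The orders-as-arcs idea can be illustrated with a toy min-cost circulation; the instance below (two expiries, a cash hub, made-up prices and quantities) is purely hypothetical and omits the hierarchical objectives and dual pricing of the actual method.

    import networkx as nx

    # Hypothetical toy market: expiries MAR and JUN plus a cash hub.
    # Each order is an arc: capacity = order quantity, weight = signed limit price.
    G = nx.DiGraph()
    for v in ("HUB", "MAR", "JUN"):
        G.add_node(v, demand=0)
    G.add_edge("HUB", "MAR", capacity=5, weight=-102)  # buy order for MAR
    G.add_edge("JUN", "HUB", capacity=5, weight=100)   # sell order for JUN
    G.add_edge("MAR", "JUN", capacity=3, weight=1)     # calendar-swap order
    # A min-cost circulation saturates the negative-cost cycle, matching 3 units.
    print(nx.min_cost_flow(G))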

Johannes C. Muller
FAU Erlangen-Nurnberg
[email protected]

Sebastian Pokutta
Georgia Tech
[email protected]

Alexander Martin
FAU Erlangen-Nurnberg
[email protected]

Susanne Pape
FAU Erlangen-Nurnberg, Germany
Discrete Optimization
[email protected]

Andrea Peter
University of Erlangen-Nurnberg
[email protected]

Thomas Winter
Eurex Frankfurt AG
[email protected]

CP8

Cybersecurity Using Interdiction

Recent cyber-attacks on private and public groups highlight the importance of a proper cybersecurity structure. Having well-structured cybersecurity decreases the vulnerability of the cyber physical system. We present an MDP to compute an optimal attack policy. The MDP has an exponential number of states, and is based on tracking the set of available attacks for each link in the network. We show that it is possible to compute values for each MDP state, and optimal attack policies, using s-t reliability.


We also present a network interdiction model to find the optimal defense strategy for a cyber physical network. Unlike many cybersecurity models, our model considers the specifics of the network structure under attack. Software resulting from the model can assess the optimal defense strategy for a cyber physical system.

Murat Karatas, Nedialko Dimitrov
The University of Texas at Austin
[email protected], [email protected]

CP8

Concavity and Bounds for the Stochastic Dynamic Programming Model on Social Networks

The problem of selecting an initial set of nodes in any network to create a cascade effect, also known as the Influence Maximization (IM) problem, has been extensively studied and has become an effective means of optimizing revenue for advertising companies via social networks. In this paper we investigate a Stochastic Dynamic Programming (SDP) approach to the IM problem, which involves dividing the problem into stages or sub-problems, computing the solutions to these sub-problems and utilizing these solutions in future stages. The objective function in the SDP approach defines the value of an optimal solution recursively by expressing it in terms of optimal solutions for smaller problems, and solutions are obtained in a bottom-up fashion. Our SDP algorithm identifies not only an optimal allocation but also its optimal value. We demonstrate through our experimental results the concavity of the objective function: it was found to be consistently concave when tested on networks with varying node degrees. By defining and obtaining a secant line on our objective function, we provide a theoretical proof that the objective function for our SDP model is concave. Consequently, we obtain an upper bound for our optimal solution.

Trisha Lawrence
The University of Saskatchewan
[email protected]

CP8

Solution Attractor of Local Search System for the Traveling Salesman Problem

Although both the TSP and local search have a huge literature, there is still a variety of open problems. The study of local search for the TSP continues to be an interesting problem in combinatorial optimization and computational mathematics. We study local search for the TSP from the perspective of dynamical systems and treat a local search system as a discrete dynamical system. The attractor theory in dynamical systems provides the necessary and sufficient theoretical foundation to study the search behavior of a local search system. In a local search system, all search trajectories converge into a small region of the solution space, called the solution attractor. We will describe a procedure for constructing the solution attractor of a local search system for the TSP. This procedure can be used to build an attractor-based search system to solve the TSP and its variations. We will also present our empirical study on some important properties of the solution attractor, including convergence of local search trajectories, the size of the solution attractor, and quality of the tours in the solution attractor.
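A minimal sketch of one plausible attractor-construction experiment (ours, not necessarily the authors' procedure): run 2-opt from many random starts and collect the union of edges of the resulting local optima.

    import itertools, random

    def tour_len(t, d):
        return sum(d[t[i]][t[(i + 1) % len(t)]] for i in range(len(t)))

    def two_opt(t, d):
        # repeatedly reverse segments while the tour improves
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(len(t)), 2):
                nt = t[:i] + t[i:j][::-1] + t[j:]
                if tour_len(nt, d) < tour_len(t, d):
                    t, improved = nt, True
        return t

    def edges(t):
        return {frozenset((t[i], t[(i + 1) % len(t)])) for i in range(len(t))}

    random.seed(0)
    n = 9
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = abs(i - j) + random.random()
    attractor = set()
    for _ in range(20):  # 20 random search trajectories
        t = list(range(n)); random.shuffle(t)
        attractor |= edges(two_opt(t, d))
    print(len(attractor), "edges touched by converged trajectories")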

Weiqi Li
University of Michigan-Flint
[email protected]

CP8

A Robust Supply Chain Network Equilibrium Model and its Analysis

Mathematical models associated with supply chains are among the most important objects in management science, and hence many researchers have studied these models. Because supply chains involve many decision-makers (players) who independently decide their behaviors, competitive situations often occur. To investigate competitive supply chain networks, Nagurney et al. (2002) proposed a supply chain network equilibrium (SCNE) model. On the other hand, particular attention has recently been paid to the risk management of supply chains. In this talk, to analyze competitive supply chain networks involving uncertainty, we propose a robust SCNE model by reconstructing the usual SCNE model. In the proposed model, each player cannot know exactly the other players' strategies, and they decide their strategy by assuming the worst case. We formulate the robust SCNE model as a variational inequality problem (VIP). The set associated with the VIP is constructed by second-order cone constraints. We show the existence and the uniqueness of the equilibrium under mild assumptions. Finally, some numerical results are given.
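For reference, the target formulation is the standard VIP over a set cut out by second-order cone constraints (generic form, our notation):

\[ \text{find } x^{*} \in K \text{ such that } \langle F(x^{*}),\, x - x^{*} \rangle \ge 0 \ \ \forall x \in K, \qquad K = \{ x : \|A_i x + b_i\| \le c_i^{T} x + d_i,\ i = 1,\dots,m \}. \]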

Yasushi Narushima
Yokohama National University
Department of Management System Science
[email protected]

Tatsuya Hirano
Yokohama National University
Faculty of International Social Sciences
[email protected]

CP8

Nonlinear Transient Optimization in Gas Networks

Non-constant operation scenarios and operation costs require non-stationary models for gas transport in pipelines. We present a transient optimization model that incorporates gas dynamics and technical network elements. Direct discretization leads to an NLP that can be solved, for example, with interior point methods. We discuss different ideas on exploitation of the problem structure and present results for simple networks and scenarios that show the potential to store highly volatile generated renewable energy in terms of pressure increase in gas networks.

Jan Thiedau
LU Hannover
Institute for Applied Mathematics
[email protected]

Marc C. Steinbach
Leibniz Universitat Hannover
Institute for Applied Mathematics
[email protected]

CP9

Geneva: Parametric Optimization in Distributed and Parallel Environments

The Geneva library of parametric optimization algorithms covers execution on parallel devices ranging from GPGPUs and many-core systems over clusters to Grids and Clouds. Four optimization algorithms, including particle swarms and evolutionary algorithms, have been implemented, and best solutions from one algorithm may be transferred to another. Parallelization is mostly transparent to the user, leaving little work to be done for execution on a wide range of devices once the optimization problems have been defined. The entire library is available as Open Source, and particularly targets very long-running, computationally expensive optimization problems, often involving simulations. The definition of optimization problems may involve constraints between multiple parameters. Geneva was originally developed for and used in science, particularly particle physics (hence the name), but is today also used commercially. The presentation covers the architecture of Geneva and introduces use-cases.

Ruediger D. Berlich, Ariel Garcia, Sven Gabriel
Gemfony scientific
[email protected], [email protected], [email protected]

CP9

Crash-Starting the Dual Simplex Method

This talk discusses how to find a good advanced basis start for the dual simplex method. In particular, it will discuss the applicability of techniques in the spirit of the primal simplex "idiot crash" implemented by Forrest in Clp. Via presolve, one-dimensional quadratic minimizations, crossover and further presolve, this unpublished technique can obtain a very good advanced basis for an LP of very much smaller dimension than the original problem. Once this reduced problem is solved using the primal simplex method, it yields a good (if not optimal) basis for the original problem, allowing it to be solved quickly using the primal simplex method. After giving an insight into the primal simplex "idiot crash" and discussing the LP problem characteristics for which the crash is effective, this talk will discuss similar techniques for crash-starting the dual simplex method.

Ivet Galabova, Julian Hall
University of Edinburgh
[email protected], [email protected]

CP9

Update on Expression Representations and Automatic Differentiation for Nonlinear AMPL Models

AMPL facilitates stating and solving nonlinear programming problems involving algebraically defined objectives and constraints. For solving such problems, the AMPL/solver interface library provides routines that compute objective functions, constraint residuals, and associated derivatives. Objectives and constraint bodies hitherto have been represented by "executable" expression graphs, in which each node points to its operands and to a function that computes the node's result. Nodes also store partial derivatives for use in computing gradients and Hessians by automatic differentiation. Storing these values makes the graphs nonreentrant. To enable several threads to evaluate the same expression at different points without having separate copies of the expression graphs, such details as variable values and partial derivatives must be stored in thread-specific arrays. I will describe and compare some expression-graph representations for use in computing function, gradient, and Hessian values, and for extracting some auxiliary problem information. This is an updated version of an earlier talk.
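The reentrancy point can be illustrated with a minimal expression-graph evaluator in which nodes are immutable and all point-dependent state is passed in per evaluation; this generic Python sketch is not the AMPL/solver interface library's actual data structure.

    import math, threading

    class Node:
        # Immutable node: per-evaluation state lives in `work`, not on the node,
        # so several threads can evaluate the same graph at different points.
        def __init__(self, op, *kids):
            self.op, self.kids = op, kids
        def eval(self, work):
            if self.op == "var":
                return work[self.kids[0]]
            vals = [k.eval(work) for k in self.kids]
            return {"+": sum, "*": math.prod}[self.op](vals)

    # x0 * (x0 + x1), evaluated concurrently at two points
    expr = Node("*", Node("var", 0), Node("+", Node("var", 0), Node("var", 1)))
    for point in ([2.0, 3.0], [1.0, 5.0]):
        threading.Thread(target=lambda p=point: print(expr.eval(p))).start()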

David M. Gay
AMPL Optimization, Inc.
[email protected]

CP9

An Open-Source High Performance Dual Simplex Solver

This talk presents the techniques underlying hsol, an open-source high performance dual simplex solver, as well as observations on the effectiveness of its presolve, advanced basis crash and dual edge weight pricing strategies. Specifically, an insight will be given into three features. Firstly, the solver's two novel update procedures [described in an article awarded the prize for the best paper of 2015 in Computational Optimization and Applications]. Secondly, its exploitation of task and data parallelism via suboptimization and, thirdly, its advanced basis crash. Historically, it has not been considered worthwhile to start the dual simplex method from an advanced basis following presolve and crash, but this talk will present results demonstrating its effectiveness for important classes of LP problems.

Julian Hall, Ivet Galabova
University of Edinburgh
[email protected], [email protected]

Qi [email protected]

CP9

Model Enhancement via Annotations: Extension of EMP for Lagrangian Modifications

In this work we present a tool that performs Lagrangian modifications, like the ones occurring in the context of Quadratic Support (QS) functions [Aravkin et al., arXiv 1301:4566] used as penalties, or risk measures like CVaR [Rockafellar, Optimization of conditional value-at-risk, 2000]. It also enables the modeler to go beyond black and white constraints [Rockafellar, Lagrange Multipliers and Optimality, 1993]. This tool is implemented within the EMP framework that is available in the GAMS modeling language. Particular attention has been paid to the ease of use: the modeler specifies the data necessary to encode the transformation. The model is then augmented/transformed programmatically, solved, and the solution is automatically reported. The user has the choice of up to 3 different schemes to solve the modified problem. An important feature of our approach is that it fully exploits the structure of the problem. We illustrate the approach with an electricity generation problem [Philpott et al., Equilibrium, uncertainty and risk in hydro-thermal electricity systems, 2016], and fitting problems. The value of having multiple formulations is highlighted by the computational results.

Olivier Huber
Wisconsin Institute for Discovery
University of Wisconsin-Madison


[email protected]

Michael C. Ferris
University of Wisconsin
Department of Computer Science
[email protected]

CP10

A Second-Order Optimality Condition with First- and Second-Order Complementarity Associated to Global Convergence of Algorithms

We develop a new notion of second-order complementarity with respect to the tangent subspace associated to second-order necessary optimality conditions by the introduction of so-called tangent multipliers. We prove that around a local minimizer, a second-order stationarity residual can be driven to zero while controlling the growth of Lagrange multipliers and tangent multipliers, which gives a new strong second-order optimality condition without constraint qualifications. We prove that second-order variants of augmented Lagrangian and interior point methods satisfy our optimality condition. Finally, we present a companion minimal constraint qualification, weaker than the ones known for second-order methods, that ensures usual global convergence results to a classical second-order stationary point.

Gabriel Haeser
University of Sao Paulo
[email protected]

CP10

Some Convergence Properties of Regularization and Penalization Schemes for MPCCs

Mathematical programs with complementarity constraints (MPCCs) have been the subject of much recent research because of their theoretical and practical applications. From a nonlinear programming point of view, MPCCs are among the most highly degenerate problems: they violate at every feasible point the Mangasarian-Fromovitz constraint qualification, which is a key ingredient for stability. As a consequence, most of the well developed theory for nonlinear programming cannot be applied directly to MPCCs. Another difficulty in dealing with MPCCs is their combinatorial nature due to the complementarity constraints, which implies that the optimality conditions for MPCCs are complex and not easy to verify. Therefore, developing an efficient algorithm for MPCCs is of great interest. In this work, we propose an approach that combines a regularization scheme with a penalty method to solve the NLP reformulation of MPCCs. The relaxation scheme is to handle the lack of regularity of these problems, and the penalty method is to penalize the equality nonlinear constraints and to update the relaxation parameter associated to the complementarity constraints. We investigate the convergence properties of this approach and we propose sufficient conditions to ensure that the cluster points of the sequence of stationary points generated by this approach are feasible for MPCCs.
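For orientation, the complementarity constraints and one common relaxation scheme (one of several in the literature; the authors' variant may differ in detail) look as follows:

\[ 0 \le G_i(x) \perp H_i(x) \ge 0 \quad \leadsto \quad G_i(x) \ge 0, \quad H_i(x) \ge 0, \quad G_i(x)\, H_i(x) \le t_k, \qquad t_k \downarrow 0, \]

while the equality constraints are handled by the penalty term.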

Abdeslam Kadrani
Institut National de Statistique et d'Economie Appliquee
[email protected]

Jean-Pierre Dussault
Universite de Sherbrooke

[email protected]

Mounir Haddou, Tangi Migot
Institut National des Sciences Appliquees de Rennes
[email protected], [email protected]

Emilie Joannopoulos

INSA de Rennes / Universite de Sherbrooke
[email protected]

CP10

On Weak Conjugacy, Augmented Lagrangians and Duality in Nonconvex Optimization

The general duality theory for convex and nonconvex optimization problems was comprehensively studied by Rockafellar and Wets. They used augmented Lagrangian functions in general forms and presented conditions for strong duality relations. The approach developed by Azimov and Kasimbeyli is based on a duality scheme which uses the so-called weak conjugate functions. This conjugacy is constructed with respect to superlinear conic functions, defined as an augmented norm term with a linear part added, instead of only the linear functions used in convex analysis. The graph of such a function is a conical surface which serves as a supporting surface for a certain class of nonconvex sets. By using the mentioned class of superlinear functions, Azimov and Kasimbeyli introduced the concept of the weak subdifferential, one of the natural generalizations of the classic subdifferential of convex analysis, and derived a collection of optimality conditions and duality relations for a wide class of nonconvex optimization problems. In this paper, we present the results obtained by both approaches. The duality results, formulated in general forms, are similar in both approaches. We present duality results obtained for particular cases, that is, for problems with equality and inequality constraints, with illustrative examples.
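As a reference point, the weak subdifferential in this line of work is usually defined as follows (our notation): a pair $(y, c) \in \mathbb{R}^n \times \mathbb{R}_{+}$ is a weak subgradient of $f$ at $\bar{x}$ if

\[ f(x) \ \ge\ f(\bar{x}) + \langle y, x - \bar{x} \rangle - c\, \|x - \bar{x}\| \qquad \text{for all } x, \]

so the supporting object is a conical surface rather than a hyperplane.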

Refail Kasimbeyli
Anadolu University
[email protected]

CP10

Mathematical Programs with Equilibrium Constraints: A Sequential Optimality Condition, New Constraint Qualifications and Algorithmic Consequences

Mathematical programs with equilibrium (or complementarity) constraints, MPECs for short, are a difficult class of constrained optimization problems. The feasible set has a very special structure and violates most of the standard constraint qualifications (CQs). Thus, the standard KKT conditions are not necessarily satisfied by minimizers, and the convergence assumptions of many standard methods for solving constrained optimization problems are not fulfilled. This makes it necessary, both from a theoretical and numerical point of view, to consider suitable optimality conditions, tailored CQs, and specially designed algorithms for solving MPECs. In this paper, we present a new sequential optimality condition useful for the convergence analysis of several relaxation methods for solving MPECs. We also introduce a companion CQ for M-stationarity that is weaker than the recently introduced MPEC relaxed constant positive linear dependence (MPEC-RCPLD). Relations between the old and new CQs as well as the


algorithmic consequences will be discussed.

Alberto Ramos
Federal University of Parana
[email protected]

CP10

Generalized Derivatives of Nonlinear Programs

Parametric sensitivities are obtained for NLPs exhibiting active set changes. By reformulating the KKT system as a nonsmooth equation system using any suitable nonlinear complementarity problem function, sensitivity information is furnished under conditions implied by, for example, a KKT point satisfying LICQ and strong second-order sufficiency. To accomplish this, a piecewise differentiable implicit function theorem that describes generalized derivative information is presented, which takes advantage of newly-found theory in the form of the lexicographic directional derivative. The sensitivity system found is a nonsmooth equation system that admits primal and dual sensitivities as its unique solution and recovers the classical results set forth by Fiacco and McCormick in the absence of active set changes. The results are computationally relevant, thanks to a recently developed nonsmooth vector forward mode of automatic differentiation; algorithms are detailed for calculating the sensitivity system's unique solution in a tractable way.
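A common choice of NCP function (one possibility; the framework allows any suitable one) is the Fischer-Burmeister function,

\[ \varphi(a, b) = \sqrt{a^2 + b^2} - a - b, \qquad \varphi(a, b) = 0 \iff a \ge 0,\ b \ge 0,\ ab = 0, \]

so each KKT complementarity pair $\lambda_i \ge 0 \perp -g_i(x) \ge 0$ becomes the equation $\varphi(\lambda_i, -g_i(x)) = 0$.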

Peter G. Stechlinski
Process Systems Engineering Laboratory
Massachusetts Institute of Technology
[email protected]

Kamil Khan
Argonne National Laboratory
Mathematics and Computer Science Division
[email protected]

Kamil Khan, Paul I. Barton
Massachusetts Institute of Technology
Department of Chemical Engineering
[email protected], [email protected]

CP10

On a Conjecture in Second-Order Optimality Conditions

In this work we deal with optimality conditions that can be verified by a nonlinear optimization algorithm, where only a single Lagrange multiplier is available. In particular, we deal with a conjecture formulated in [R. Andreani, J.M. Martinez, M.L. Schuverdt, "On second-order optimality conditions for nonlinear programming", Optimization, 56:529-542, 2007], which states that whenever a local minimizer of a nonlinear optimization problem fulfills the Mangasarian-Fromovitz Constraint Qualification and the rank of the set of gradients of active constraints increases at most by one in a neighborhood of the minimizer, a second-order optimality condition that depends on one single Lagrange multiplier is satisfied. This conjecture generalizes previous results under a constant rank assumption or under a rank deficiency of at most one. In this work we prove the conjecture under the additional assumption that the Jacobian matrix has a smooth singular value decomposition, and we review previous attempts to solve it.

Daiana S. Viana
Avenida Prof. Almeida Prado 1280, Butanta, Sao Paulo (SP)
[email protected]

Roger Behling
Federal University of Santa Catarina
[email protected]

Gabriel Haeser
University of Sao Paulo
[email protected]

Alberto Ramos
Federal University of Parana
[email protected]

CP11

Set-Valued Derivatives and Subdifferentials Based on Differences of Convex Sets

Arithmetic set operations for convex compact sets are an essential part of many set-valued numerical methods, e.g., in the calculation of reachable sets of control systems, in the interpolation of set-valued maps, or in problems of set optimization. While the Minkowski sum is commonly accepted, there is a variety of convex difference notions, e.g., the geometric difference by Hadwiger/Pontryagin and the Demyanov difference. Alternatively, the set of convex compact subsets of $\mathbb{R}^n$ can also be embedded into vector spaces via pairs of convex sets, limits of differences of support functions, and directed sets. Directed sets in $\mathbb{R}^n$ are described via a scalar continuous function and a lower-dimensional directed set, both parametrized by unit vectors. The embedding of a convex compact set into the space of directed sets suggests an obvious way for a realization in computational methods. Via directions attached to generalized supporting faces, a difference of directed sets being inverse to the embedded Minkowski sum can be introduced. From the visualized difference (either a convex or nonconvex subset of $\mathbb{R}^n$), other convex set differences can be recovered. Based on this approach, various notions of Lipschitz and differentiable set-valued maps and a new nonconvex subdifferential with exact calculus rules will be sketched. A visual comparison to the Dini, Michel-Penot, Clarke and Mordukhovich subdifferentials is given, offering a geometrical way to compute them.

Robert Baier
Department of Mathematics, University of Bayreuth, Germany
[email protected]

CP11

Extending the Applicability of the Global Search Method DIRECT

Don Jones' DIRECT algorithm for global optimization fills an interesting niche in the derivative-free optimization taxonomy, with both significant advantages and significant limitations. We present a generalized version of this algorithm that allows mixed continuous/discrete search spaces in asynchronous parallel computing environments. The method can be effective for local, global, and multi-local optimization, for finding Pareto sets associated with multiple objectives, and for problems with hidden constraints.

Richard G. Carter


DNV [email protected]

CP11

A Superlinearly Convergent Smoothing Newton Continuation Algorithm for Variational Inequalities Over Definable Sets

We use the concept of barrier-based smoothing approximations introduced by Chua and Li [C. B. Chua and Z. Li, A barrier-based smoothing proximal point algorithm for NCPs over closed convex cones, SIAM J. Optim., 23 (2013), pp. 745-769] to extend the smoothing Newton continuation algorithm of Hayashi et al. [S. Hayashi, N. Yamashita, and M. Fukushima, A combined smoothing and regularization method for monotone second-order cone complementarity problems, SIAM J. Optim., 15 (2005), pp. 593-615] to variational inequalities over general closed convex sets X. We prove that when the underlying barrier has a gradient map that is definable in some o-minimal structure, the iterates generated converge superlinearly to a solution of the variational inequality. We further prove that if X is proper and definable in the o-minimal structure $\mathbb{R}^{\mathrm{alg}}_{\mathrm{an}}$, then the gradient map of its universal barrier is definable in the o-minimal expansion $\mathbb{R}_{\mathrm{an,exp}}$. Finally, we consider the application of the algorithm to complementarity problems over epigraphs of the matrix operator norm and nuclear norm, and present preliminary numerical results.

Chek Beng Chua
Nanyang Technological University
[email protected]

Le Thi Khanh Hien
National University of Singapore
[email protected]

CP11

Basis Function Selection in Approximate Dynamic Programming

Many complex problems arising in business, health care, and transportation can be modeled as sequential decision making problems under uncertainty, meaning that a decision maker has to make decisions periodically while some random events unfold over time. These problems can be conveniently modeled in the form of dynamic programs. However, for many practical problems, this formulation suffers from the curse of dimensionality, making exact calculation of the optimal policy intractable. In order to overcome this issue, approximate dynamic programming (ADP) methods have been developed to find an approximate optimal solution. A cornerstone of many ADP algorithms is defining a set of basis functions (or an approximation architecture) for approximating the cost-to-go function. Currently, the choice of basis functions requires prior expert knowledge about the problem, and is considered more of an art than a science. We present, study, and apply a novel algorithm that automates the generation of basis functions by efficiently selecting a subset of functions from a large pool of potential basis functions, and updating this set as more information becomes known about the problem. The benefit of our algorithm is twofold: first, it reduces the burden of coming up with a well-informed set of basis functions, which requires significant prior knowledge about the problem; second, since many potential candidates are considered for basis functions, it improves the quality of the approximation.
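As a toy illustration of selecting from a pool (an OMP-style greedy least-squares fit of sampled cost-to-go values; the authors' algorithm is more sophisticated and all names here are ours):

    import numpy as np

    def greedy_basis_selection(Phi, v, k):
        # Greedily pick k columns of the candidate-basis matrix Phi
        # to fit the sampled cost-to-go values v by least squares.
        chosen, residual = [], v.copy()
        for _ in range(k):
            scores = np.abs(Phi.T @ residual)
            scores[chosen] = -np.inf          # never re-pick a column
            chosen.append(int(np.argmax(scores)))
            coef, *_ = np.linalg.lstsq(Phi[:, chosen], v, rcond=None)
            residual = v - Phi[:, chosen] @ coef
        return chosen

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((200, 50))      # 50 candidate basis functions
    v = Phi[:, [3, 17, 41]] @ np.array([2.0, -1.0, 0.5])
    print(greedy_basis_selection(Phi, v, 3))  # recovers {3, 17, 41}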

Alireza Sabouri
Haskayne School of Business
University of Calgary
[email protected]

CP11

A Primal Majorized Newton-CG Augmented Lagrangian Method for Large-Scale Linearly Constrained Convex Programming

In this paper, we propose a primal majorized semismooth Newton-CG augmented Lagrangian method for large-scale linearly constrained convex programming problems, especially for some difficult problems which are nearly degenerate. The basic idea of this method is to apply the majorized semismooth Newton-CG augmented Lagrangian method to the primal convex problem. We take two special nonlinear semidefinite programming problems as examples to illustrate the algorithm. Furthermore, we establish the global convergence and the iteration complexity of the algorithm. Numerical experiments demonstrate that our method works very well for the testing problems, especially for many ill-conditioned ones.

Chengjing Wang
Southwest Jiaotong University
[email protected]

Peipei Tang
Zhejiang University City College
[email protected]

CP11

Multi-Criteria Quadratic Programming Problems with Application to Portfolio Selection

Mathematical optimization is one of the most important tools for solving practical decision-making problems. It is widely applied in a variety of areas, including finance, network design and operation, supply chain management and engineering. It is worth noting that most problems in the real world are concerned with one or more criteria. However, there has not been much research into multi-objective quadratic programming problems with a norm regularization. In this paper, we investigate tri-criteria quadratic programming problems with a cardinality constraint for portfolio management. The cardinality constrained investment strategy arises naturally due to the presence of various forms of market friction, such as transaction costs and management fees, or even due to the consideration of mental cost. In the theoretical framework, we study the optimality conditions for the proposed tri-criteria quadratic programming problem and show that the λ-space can be partitioned into polyhedra such that on each polyhedron, the corresponding solution can be found by solving one scalarized quadratic programming problem. We will also discuss the properties of the Pareto optimal solution set of the problems. Acknowledgements: The work described in this paper was partially supported by grants from the Research Grants Council of the HKSAR, China (UGC/FDS14/P03/14 and UGC/FDS14/P02/15).
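A plausible scalarization consistent with the abstract (the weights and the particular norm are our illustrative choices, not necessarily the authors'):

\[ \min_{x}\ \lambda_1\, x^{T} \Sigma x \ -\ \lambda_2\, \mu^{T} x \ +\ \lambda_3\, \|x\|^{2} \quad \text{s.t.} \quad \mathbf{1}^{T} x = 1, \quad \|x\|_0 \le K, \]

where $\Sigma$ and $\mu$ are the return covariance and mean and $\|x\|_0 \le K$ is the cardinality constraint.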

Carisa K. W. Yu
Department of Mathematics and Statistics
Hang Seng Management College


[email protected]

CP12

Density Estimation with Total Variation Penalized Maximum Likelihood Estimation

We introduce a new method for nonparametric density estimation from empirical data, which is conducive to scenario-based stochastic optimization methods. By penalizing a maximum likelihood approach with a total variation term, we avoid overfitting and the "Dirac curse". Though the corresponding optimization problem is variational, we show that it reduces to a finite dimensional one. We investigate efficient methods for solving this convex problem, and then focus on the statistical properties of this density estimation method and its applications to stochastic programming.
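In symbols, a minimal sketch of such an estimator (our notation; the precise functional-analytic setting is part of the talk) is

\[ \max_{f \ge 0,\ \int f = 1}\ \sum_{i=1}^{n} \log f(x_i) \ -\ \lambda\, \mathrm{TV}(f), \]

where $x_1, \dots, x_n$ are the samples; the total variation term rules out the sum-of-Diracs maximizer of the unpenalized likelihood.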

Robert Bassett, James Sharpnack
University of California Davis
[email protected], [email protected]

CP12

Differential Evolution for Solving Maximum Clique Problem

The maximum clique problem (MCP) finds many important applications in different contexts. MCP is a classical NP-hard problem in combinatorial optimization, but it is still hard to get satisfactory results, especially when the number of vertices exceeds 200. In this talk, we apply the Differential Evolution (DE) algorithm to MCP. DE is a stochastic function minimizer and has received attention from many researchers due to its effectiveness. To employ the DE algorithm, we converted the MCP into a standard quadratic optimization problem over a unit simplex. When the DE algorithm generates the next generation by randomization steps, there is a possibility that the next candidate leaves the feasible region. However, the projection onto the feasible set can be done cheaply owing to the simple structure of the unit simplex. We can reduce the computation time of the projection by combining existing methods specialized for the unit simplex. Numerical results showed a favorable performance on most benchmark instances, on some of which DE got better results than other existing algorithms. By choosing better parameters, we can further improve the results. DE can be considered a very promising algorithm for solving MCP.
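The reformulation over the unit simplex is presumably of Motzkin-Straus type (a classical result, stated here for orientation):

\[ \max_{x \in \Delta} x^{T} A_G\, x \ =\ 1 - \frac{1}{\omega(G)}, \qquad \Delta = \Big\{ x \ge 0 : \sum_i x_i = 1 \Big\}, \]

where $A_G$ is the adjacency matrix of $G$ and $\omega(G)$ its clique number, so optimizing over the simplex recovers the maximum clique size.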

Hui Fang
Tokyo Institute of Technology
[email protected]

Makoto Yamashita
Tokyo Institute of Technology
Dept. of Mathematical and Computing Science
[email protected]

CP12

High-Frequency Trading in Risk-Averse Portfolio Optimization with Higher-Order Risk Measures

This study examines the application of risk-averse optimization techniques to high-frequency trading (HFT) in portfolio management. First, I develop efficient clustering methods for scenario tree construction. Then, I construct a two-stage stochastic programming problem with higher-order conditional measures of risk, which is used to re-balance the portfolio on a rolling horizon basis, with transaction costs included. Finally, I present an extensive simulation study of the methodology on both interday and high-frequency intraday real-world data.

Sitki Gulten
Stockton University
[email protected]

CP12

Improving Consistency in Intertemporal Decisions via Stochastic Optimization and Conditioning Coordination

Optimization models have long been used to support operational planning in several areas, typically in different time frames: long-term strategic decisions, mid-term tactical decisions and short-term operational decisions. In general, longer term decisions impose constraints on the decision process in shorter horizons, and one would like to compute efficient shorter term decisions. However, many times there are inconsistencies between these models, especially due to various sources of uncertainty, the dynamics of the process and different degrees of aggregation. We explore the question of how to improve consistency and address this problem using stochastic approaches and robust optimization. We also show how the implicit geometry and conditioning of the underlying problems might affect consistency, and we propose a coordination approach using condition measures instead of cost measures. We show some simulated results as well as applications to some real problems in the forest industry and also for a problem in the management of health systems.

Jorge R. Vera
Pontificia Universidad Catolica de Chile
[email protected]

Alfonso Lobos
University of California, Berkeley
[email protected]

Ana Batista
Pontificia Universidad Catolica de Chile
[email protected]

CP13

Splitting Methods in Penalized Zero-Variance Discriminant Analysis

We propose improved methods for solving the sparse zero-variance discriminant analysis problem based on the alternating direction method of multipliers. Our approach avoids the expensive calculation of the between- and within-class scatter matrices typically used in linear discriminant analysis, and instead uses only the data matrix to compute discriminative directions. We present per-iteration complexity analyses and numerical simulations which establish improvement upon classical approaches for zero-variance discriminant analysis and high-dimensional LDA, as well as convergence results for our approach.

Brendan P. Ames
University of Alabama
Department of Mathematics
[email protected]

CP13

Fast Monte Carlo Algorithms for Tensors: Approximating Tensor Multiplication and Decomposition

Motivated by applications in which the data may be formulated as a tensor, we consider methods for several common tensor factorization problems. These methods resolve many of the key algorithmic questions regarding randomized tensor multiplication, low-rank decomposition, and nuclear norm minimization using a small number of lateral and/or horizontal slices of the data tensor. These methods make more efficient use of computational resources, such as computation time, random access memory, and the number of passes over the data. Inspired by the recent development of randomized matrix factorization methods with rich theory and computational complexity, we propose the first polynomial time algorithms for low-rank (noisy) tensor approximations that come with relative error guarantees. All theoretical and practical results are presented for third order tensors. However, they can be easily extended to the order-p (p > 3) case. We conclude with some experimental results that show the promise of our approach for analyzing large-scale datasets.

George Michailidis, Davoud Ataee Tarzanagh
University of Florida
[email protected], [email protected]

CP13

Online Convolutional Dictionary Learning

Dictionary learning is a significant topic in signal processing. It is modeled by a nonconvex optimization problem. Convolutional dictionary learning, a structured dictionary learning model which deals with convolutional signals, is powerful for inverse problems in imaging, such as denoising and super-resolution. However, traditional batch convolutional dictionary learning incurs huge memory costs, since all the training images are handled simultaneously and kept in memory. Thus, the batch learning scheme can only adopt training sets of limited size. Online learning deals with a single or a small number of training images in one iteration, and then accumulates the information step by step. Consequently, it can adopt large training sets, even of infinite size. There are existing works on online dictionary learning without convolution, but online convolutional dictionary learning is novel. The key issue is that the structures of sparse coding in standard signals and convolutional signals are different, so it is not trivial to extend the online scheme from standard dictionary learning to convolutional structures. We use down-sampling and mask-simulation techniques to solve this problem and get good results. Moreover, some techniques from optimization algorithms, such as coordinate descent, are used to accelerate our method.

Jialin Liu
Mathematics Department, University of California, Los Angeles (UCLA)
[email protected]

Cristina Garcia-Cardona
CCS Division, Los Alamos National Laboratory
Los Alamos, NM
[email protected]

Brendt Wohlberg
Los Alamos National Laboratory
[email protected]

Wotao Yin
University of California, Los Angeles

[email protected]

CP13

Sparsity-Constrained Gaussian Graphical Models

Gaussian graphical models (GGM) are graph matrices used to model conditional dependences between variables in a network. Our goal is to learn a sparse GGM by enforcing cardinality constraints, either for interpretability of the model or to avoid overfitting; such models can be used in anomaly detection for sensor networks to detect faulty behaviors and avoid downtime of assets. We have developed an efficient algorithm incorporating projected gradients and an active-set method with Newton subproblems to solve the nonlinear optimization problem.
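A cardinality-constrained maximum-likelihood formulation of this type (our rendering; details such as which entries the constraint counts may differ from the talk) reads

\[ \min_{\Theta \succ 0}\ -\log\det \Theta + \operatorname{tr}(S\Theta) \quad \text{s.t.} \quad \|\mathrm{offdiag}(\Theta)\|_0 \le k, \]

where $S$ is the sample covariance and the $\ell_0$ constraint caps the number of edges in the learned graph.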

Dzung Phan
IBM T.J. Watson Research Center
[email protected]

Matt Menickelly
Lehigh University
[email protected]

CP13

Numerical Identification of Sparse Chemical Reaction Networks

We are concerned with the numerical identification of sparse chemical reaction networks from noisy time series data of the species concentrations. After choosing suitable kinetics, the dynamics of the reaction network are usually modeled by a parameter-dependent system of ordinary differential equations. In classical approaches to system identification, after a discretization in time, the unknown parameters are estimated via least-squares approximations. It is well-known that such strategies yield inferior results in the case of low network connectivity. In recent years, several recovery techniques have been proposed in the literature that promote sparse networks in the reconstructions. We report on the application of nonsmooth Tikhonov regularization with sparsity-promoting penalties to system identification problems. The optimality conditions are solved by globally convergent semismooth Newton techniques which were developed recently.

Thorsten Raasch
University of Mainz
Institute of Mathematics
[email protected]

CP13

Proximal DC Algorithm for Sparse Optimization

Many applications in signal processing, machine learning and operations research seek sparse solutions by adopting a cardinality constraint or rank constraint. We formulate such problems as DC (Difference of two Convex functions) optimization problems and apply the DC Algorithm (DCA) to them. While the DCA has been widely used for this type of problem, it often requires a large computation time to solve a sequence of convex subproblems. Our algorithm, which we call the Proximal DC Algorithm (PDCA), overcomes this issue of the ordinary DCA by employing a special DC decomposition of the objective function. In PDCA, closed-form solutions can be obtained for the convex subproblems, leading to efficient performance in numerical experiments. We also discuss the theoretical


aspects: PDCA can be viewed as a nonconvex variant of the proximal gradient methods (PGM), which provides an insight on the relation between PGM and DCAs.
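For orientation, the generic DCA iteration for $\min_x\, g(x) - h(x)$ is (standard; PDCA's special decomposition makes the subproblem a closed-form proximal step)

\[ y^{k} \in \partial h(x^{k}), \qquad x^{k+1} \in \operatorname*{argmin}_{x}\ g(x) - \langle y^{k}, x \rangle, \]

i.e., the concave part is linearized at the current iterate and a convex subproblem is solved.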

Akiko Takeda
The Institute of Statistical Mathematics
[email protected]

Jun-ya Gotoh
Chuo University
[email protected]

Katsuya Tono
The University of Tokyo
[email protected]

CP14

A Semidefinite Programming Approach for Energy Network Planning

This talk presents a mixed-integer quadratic formulation for transmission expansion planning with an AC network model. The model is solved by combining a semidefinite programming relaxation with valid inequalities in a branch-and-cut framework. Computational results are presented to show the performance of the proposed method.

Bissan Ghaddar
IBM Research
[email protected]

CP14

Acyclic Orientations and Nonlinear Flow Problems

We consider optimization problems for natural gas networks with complex topologies and many cycles, like the German one. The existence of cycles implies that the flow direction of the gas is not known beforehand, which leads to weaker relaxations for the MINLP optimization problems arising in planning and operating gas networks. To improve these models, we study the underlying nonlinear flow problem from a combinatorial perspective. In fact, the flow in gas and water networks is driven by so-called potentials, which implies that the resulting flows are acyclic in the following sense: if each network arc is oriented in the direction of flow over this arc, then there is no directed cycle in the resulting network. We exploit the interplay of this acyclicity property with the network flow structure to construct stronger models and present some computational results.
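The potential-driven coupling has the following familiar form (a common model choice, e.g., with squared pressures as potentials in gas networks; the talk may use a different variant):

\[ \pi_u - \pi_v \ =\ \lambda_a\, q_a\, |q_a| \qquad \text{for each arc } a = (u, v), \]

so flow runs from higher to lower potential $\pi$, and orienting each arc by $\mathrm{sign}(q_a)$ can never produce a directed cycle.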

Benjamin Hiller
Zuse Institute Berlin
[email protected]

CP14

Simultaneous Truss Topology and Static Output Feedback Controller Design via Nonlinear Semidefinite Programming

We consider the problem of static output feedback controller design via the H∞-control problem. The basic idea of the H∞-control problem is to find an optimal controller that minimizes the H∞-norm of the closed loop transfer function. With the help of the Bounded-Real Lemma, the H∞-control problem can be formulated as a nonlinear semidefinite programming problem. Moreover, we combine the H∞-control problem with a topology optimization for truss structures under uncertain dynamic loads. The main advantage of our approach is that we can simultaneously optimize the topology of the mechanical system and the design of the static output feedback controller. We solve the resulting nonlinear semidefinite programming problem using a sequentially semidefinite programming approach. The considerations are complemented by numerical results for truss topology design under uncertain dynamic loads.
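In one standard form (for a realization $(A, B, C, D)$ of the closed loop; once the feedback gain enters $A$, the condition below becomes a nonlinear matrix inequality), the Bounded-Real Lemma states

\[ \|C(sI - A)^{-1}B + D\|_{\infty} < \gamma \iff \exists\, P \succ 0 : \begin{pmatrix} A^{T}P + PA & PB & C^{T} \\ B^{T}P & -\gamma I & D^{T} \\ C & D & -\gamma I \end{pmatrix} \prec 0. \]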

Anja Kuttich
TU Darmstadt
[email protected]

Stefan Ulbrich
Technische Universitaet Darmstadt
Fachbereich Mathematik
[email protected]

CP14

Stochastic Planning of Environmentally Friendly Telecommunication Networks Using Column Generation

We present the network planning problem of an LTE cellular network with switchable base stations, i.e., where the configuration of the antennae is dynamic, depending on demand. This is formulated as a two-stage stochastic program under demand uncertainty with integers in both stages. We develop its Dantzig-Wolfe reformulation and solve it using column generation. Preliminary results confirm the efficiency of the approach. Interestingly, the reformulated model often has a tight LP-gap.

Joe Naoum-Sawaya
IBM Ireland
[email protected]

CP14

Gradient Multicut: A Second Derivative Potts Model for Image Segmentation Using MILP

Unsupervised image segmentation and denoising are two fundamental tasks in image processing. Usually, graph based models such as graph cuts or multicuts are used for globally optimum segmentations, and variational models are employed for denoising. Our approach addresses both problems at the same time. A (depth) image can be seen as a function y : P → R giving the (depth) intensity of pixels located on a finite 2-dimensional grid P. The segmentation problem is tackled by fitting a piecewise linear function f : P → R to y, minimizing $\sum_{p \in P} |f(p) - y(p)|$ with an additional $\ell_0$ regularization term to count the number of variations (with respect to segments) per row and column. Previous attempts are usually based on continuous or convex relaxations. We propose a novel MILP formulation of a second derivative Potts model, where binary variables are introduced to deal directly with the $\ell_0$ norm. The model approximates the values of pixels in a segment by an affine plane and incorporates multicut constraints to enforce the connectedness of segments. As a by-product, the image is denoised. Our approach could also be interpreted as nonparametric discontinuous piecewise linear fitting in 2D. To the best of our knowledge, it is the first mathematical programming model for finding globally optimum solutions. Numerical experiments demonstrate its performance.
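A minimal sketch of how the binaries capture the $\ell_0$ term along one row or column (our illustration with a big-M constant; the multicut constraints are not shown):

\[ |f(p-1) - 2f(p) + f(p+1)| \ \le\ M z_p, \quad z_p \in \{0,1\}, \qquad \min\ \sum_{p \in P} |f(p) - y(p)| + \lambda \sum_p z_p, \]

so $z_p = 1$ exactly where the discrete second derivative is allowed to be nonzero, i.e., at segment boundaries.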

Ruobing Shen
Heidelberg University
Institute of Computer Science


[email protected]

Stephane Canu
INSA de Rouen
Normandie Universite
[email protected]

Gerhard Reinelt
University of Heidelberg
Institute of Computer Science
[email protected]

CP14

Disjoint K-Path Min-Sum Approach Towards Robust Power Network Design

An electric power supplier needs to build a transmission line between two jurisdictions. Ideally, the power line configuration would optimize some utility, e.g., minimize the cost. Due to reliability concerns, the developer has to install not just 1, but 2 lines, separated by a certain distance from one another, so that if one line fails, the electricity still gets delivered. Similar situations arise in other applications like communication networks. We investigate the above problem, with emphasis on 2-path instances where the network topology representations are highly granular, resulting in large scale models. Specifically, we contrast the performance of a mixed-integer programming (MIP) model with that of a new graph-based model. On the MIP side, we observe that unlike the min-cost flow formulation of the shortest path, a similar MIP model of the disjoint 2-path min-sum problem suffers from loss of total unimodularity. On the graph-theory side, we classify the exact problem as NP-complete. At the same time, we propose a polynomial approximation scheme, which appears to avoid local-search pitfalls encountered by other heuristics. Under two mild practical assumptions, the latter scheme yields an optimal solution to the problem, and appears to be extremely efficient computationally. We substantiate our findings with numerical experiments using real-life data from the Northeast region of Alberta, Canada, with network graphs nearing a hundred thousand nodes.

Yuriy Zinchenko
University of Calgary
Department of Mathematics and Statistics
[email protected]

Haotian Song
Stern School of Business, New York University, KMC 8-154, 44 W 4th St, New York, NY
[email protected]

CP15

Copositive Optimization on Infinite Graphs

Combinatorial optimization problems can be formulated as conic convex optimization problems. In particular, NP-hard problems can be written as copositive programs; in this case the hardness is moved into the copositivity constraint. We will show how to lift this theory to infinite dimension, studying operators in Hilbert spaces instead of matrices. For that purpose we develop a new theory of semidefinite and copositive optimization in infinite dimensional spaces. In this context we discuss some properties which are equivalent to the ones in finite dimension and we also point out differences. Understanding these properties is essential for developing solution approaches for problems of that kind.

Claudia Adams
Department of Mathematics
Trier University
[email protected]

Mirjam Dur
Department of Mathematics
University of Trier
[email protected]

Leonhard Frerick
Department of Mathematics
Trier University
[email protected]

CP15

The Cone of Positive Type Measures on a Compact Group

The cone of positive type measures on a compact group can be regarded as the natural dual of the cone of continuous positive type functions, and this duality can be seen as an infinite analog of the self-duality of the cone of positive semidefinite matrices. We present some details of this duality, and as an application we explain how the usual primal and dual semidefinite programming definitions of the Lovasz theta-function can be extended to accommodate infinite Cayley graphs over compact groups. From these extensions, one can deduce the Delsarte bounds for spherical codes, the Fourier analysis upper bounds of Vallentin and Oliveira for the measurable chromatic number of Euclidean space, and the upper bounds of DeCorte and Pikhurko for the Witsenhausen problem on orthogonal-free subsets of the unit sphere. Duality for these extensions will be investigated. Even though the known strong duality proofs for the theta-function fail in this setting, we can prove zero duality gap from Fourier analysis; attainment of the optimal value sometimes fails.

Evan DeCorte
McGill University
[email protected]

CP15

A Factorization Method for Completely Positive Matrices

A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called completely positive if it can be factorized as $A = BB^T$ for some entrywise nonnegative matrix $B \in \mathbb{R}^{n \times k}_{+}$. These matrices play a role in optimization, since Burer (2009) showed in a seminal paper that any optimization problem with quadratic objective, linear constraints and binary variables can be equivalently formulated as a linear problem over the cone of completely positive matrices. Given a completely positive matrix $A$, it is highly nontrivial to actually find a factorization $A = BB^T$ of the above form. Being able to compute such a factorization would help recover the solution of the underlying quadratic or combinatorial problem. In this talk, we present a factorization method for general completely positive matrices which is heuristic in nature but in our extensive numerical tests gave very fast results for almost all test instances.

Mirjam Dur
University of Trier
[email protected]


Patrick Groetzner
University of Trier
[email protected]

CP15

Convergence Rates of the Moment-Sum-of-Squares Hierarchy for Volume Approximation of Semialgebraic Sets

We derive bounds on the convergence rate of the moment-sum-of-squares hierarchy of semidefinite programs for approximating the volume of a compact basic semialgebraic set. We show that the rate is O(1/log log d), where d is the degree of the moments or polynomial sums of squares.

Didier Henrion
LAAS-CNRS, University of Toulouse, France
[email protected]

Milan Korda
University of California at Santa Barbara
[email protected]

CP15

A Bound on the Caratheodory Number

The Caratheodory number κ(K) of a pointed closed convex cone K is the minimum integer k for which every element of K can be written as a sum of k elements belonging to extreme rays. The dimension of K is an upper bound for κ(K) as a consequence of Caratheodory's Theorem. In this talk we provide a smaller upper bound $\ell(K) - 1$ for κ(K), where $\ell(K)$ is the length of the longest chain of nonempty faces of K, thus tying the Caratheodory number to a key quantity that appears in the analysis of facial reduction algorithms. This bound is tight for several families of cones, which include symmetric cones, the so-called smooth cones, and rank one generated spectrahedral cones. We furnish a new proof of a result by Guler and Tuncel which states that the Caratheodory number of a symmetric cone is equal to its rank. We also extend Hildebrand's result for the Caratheodory number of rank one generated spectrahedral cones to Jordan algebras. We give a simple example showing that our bound can fail to be sharp. Finally, we connect our discussion to the notion of cp-rank for completely positive matrices.

Masaru Ito
Tokyo Institute of Technology
[email protected]

Bruno Lourenco
Seikei University
[email protected]

CP15

Conic Program Certificates from ADMM/Douglas-Rachford Splitting

Many practical optimization problems are formulated as conic programs, a current active area of research. One challenge with conic programs is that one often does not know, a priori, whether there is a solution to the conic program; it may be infeasible, unbounded, or have a finite but unattainable optimal value. Solvers like SeDuMi and SDPT3 provide certificates for strong infeasibility, but they struggle with weak infeasibility. To resolve this, we propose a method that finds (1) a solution when one exists, (2) a certificate of strong/weak infeasibility when the problem is strongly/weakly infeasible, (3) an unbounded direction when there is one, and (4) a sufficient condition for identifying programs with finite but unattainable optimal values. The method is based on Douglas-Rachford Splitting (ADMM), and we establish its efficacy through theoretical analysis and numerical experiments.

Yanli Liu, Ernest RyuUniversity of California, Los AngelesDepartment of [email protected], [email protected]

Wotao YinUniversity of California, Los [email protected]

CP16

Worst-Case Complexity Analysis of Convex Nonlinear Programming

We will review recent studies on the worst-case complexity analysis of derivative-based and derivative-free methods. Within the convex smooth unconstrained setting, we present a unified framework for worst-case complexity analysis of both first- and second-order methods when driving the size of the gradient below some given threshold is desired. We then present two algorithms, along with their complexity analysis, for the minimization of nonsmooth convex constrained problems.

Rohollah Garmanjani
CMUC, Department of Mathematics, University of Coimbra
[email protected]

CP16

Alternating Projection on Convex Sets and Manifolds

Finding a feasible point for an optimization problem is equivalent to finding a point in the intersection of finitely many constraint sets. For convex sets, alternating projection is a common method to solve this problem. This has recently been extended to alternating projection on manifolds. Under certain conditions, there are results on the convergence speed in terms of the angle between the sets. In this talk I will give a short survey of these results and extend the given ideas to the case of alternating projection onto different types of sets.

Patrick Groetzner
University of Trier
[email protected]

Mirjam Dur
Department of Mathematics
University of Trier
[email protected]
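
As a toy illustration of the setting (ours, not the talk's): alternating projections between a convex set (a line) and a smooth manifold (the unit circle), which intersect transversally, converge locally at a linear rate governed by the intersection angle:

```python
import numpy as np

def proj_line(x, a, b):
    """Project x onto the line {a + t*b : t in R}, b a unit vector."""
    return a + np.dot(x - a, b) * b

def proj_circle(x):
    """Project x onto the unit circle (a manifold, not a convex set)."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.array([1.0, 0.0])

a, b = np.array([0.0, 0.5]), np.array([1.0, 0.0])  # line y = 0.5
x = np.array([3.0, 2.0])
for _ in range(30):
    x = proj_circle(proj_line(x, a, b))
print(x)  # approaches an intersection point, here (sqrt(3)/2, 1/2)
```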

CP16

A Two-Phase Algorithm for Large-Scale QPLogdet Optimization Problems

In this paper, we present a two-phase algorithm to solve a large-scale nonlinear semidefinite programming problem whose objective function is a sum of a convex quadratic function and a log-determinant term (QPLogdet). In Phase I, we adopt an inexact symmetric Gauss-Seidel (sGS) technique based alternating direction method of multipliers (ADMM)-type method to obtain a moderately accurate solution or generate a reasonably good initial point. In Phase II, we design an inexact accelerated block coordinate descent (ABCD) based augmented Lagrangian method (ALM) to obtain a highly accurate solution, where the inexact sGS technique and the semismooth Newton-CG method are employed to solve the inner problem at each iteration. Numerical experiments demonstrate that our two-phase algorithm is efficient and robust.

Tang Peipei
Zhejiang University City College
[email protected]

Wang Chengjing
Southwest Jiaotong University
[email protected]

CP16

Large Gaps Between Gauss-Seidel Type Methods and Randomized Versions

A simple yet powerful idea for solving large-scale computational problems is to iteratively solve smaller subproblems; this idea appears in, e.g., Gauss-Seidel (G-S), Kaczmarz, coordinate descent (CD), and ADMM. We prove for the first time that for all these methods, there are large gaps between the deterministic cyclic versions and the randomized versions. First, we show an O(n^2) gap in the convergence rate for CD/G-S/Kaczmarz methods. There are some examples showing that cyclic CD (C-CD) can be much slower than randomized coordinate descent (R-CD), but there also exist many practical examples where C-CD is faster. We show that C-CD can indeed be O(n^2) times slower than R-CD in the worst case, by establishing a lower bound that matches the upper bound. One difficulty is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a lower bound for the convergence rate. Second, we show a gap between divergence and convergence for ADMM. In particular, although cyclic multi-block ADMM was recently found to be possibly divergent, we show that RP-ADMM (randomly permuted ADMM) converges in expectation for solving linear systems (this can be extended to quadratic objectives). Our analysis reveals that random permutation has a symmetrization effect that nicely changes the spectrum of the iteration matrix. We believe RP-ADMM is potentially a competitive algorithm for large-scale linearly constrained problems.

Ruoyu Sun
UIUC
Facebook AI Research
[email protected]
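
To get a feel for the gap on the type of instance underlying the worst-case analysis (identity plus a dense equal-off-diagonal block; the parameters below are illustrative, not those of the paper), one can compare exact cyclic and randomized coordinate minimization:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = np.eye(n) + 0.9 * np.ones((n, n))   # near worst-case quadratic for C-CD
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def cd(order_fn, epochs=50):
    x = np.zeros(n)
    for _ in range(epochs):
        for i in order_fn():
            x[i] -= (A[i] @ x - b[i]) / A[i, i]  # exact coordinate minimization
    return np.linalg.norm(x - x_star)

err_cyclic = cd(lambda: range(n))
err_random = cd(lambda: rng.integers(0, n, size=n))
print(err_cyclic, err_random)  # cyclic error typically remains far larger here
```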

CP16

Fully Adaptive ADMM for Deterministic and Stochastic Non-Smooth Optimization

Recently, the alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. A variety of ADMM algorithms with fast convergence rates (e.g., a linear rate) have been developed under smoothness, strong convexity and other regularity conditions. However, for general convex optimization problems (e.g., risk minimization with a non-smooth loss and a non-smooth regularizer), the best known iteration complexity of deterministic and stochastic ADMM is O(1/ε) and O(1/ε²), respectively. In this paper, we propose fully adaptive ADMM for general convex optimization without smoothness and strong convexity assumptions, but under a generic local error bound condition, to enjoy faster convergence. In particular, we show that the proposed deterministic ADMM enjoys an improved iteration complexity of O(1/ε^{1−θ} log(1/ε)), where θ ∈ (0, 1] is a constant in the local error bound condition (a property of the objective function), and the proposed stochastic ADMM enjoys an iteration complexity of O(1/ε^{2(1−θ)}), which is lower than the O(1/ε²) of standard stochastic ADMM. Without smoothness and strong convexity assumptions, we demonstrate that the proposed ADMMs can achieve linear convergence for minimizing a non-smooth loss with generalized lasso regularizers.

Yi Xu, Mingrui Liu, Qihang Lin, Tianbao Yang
University of Iowa
[email protected], [email protected], [email protected], [email protected]

CP17

An Algorithm with Infeasibility Detection for Linearly Constrained Optimization Problems

When solving a constrained optimization problem, there is a balance between moving toward a stationary point and satisfying the constraints. One method for minimizing violation of the constraints is to generate a sequence of iterates {x_n} by applying Newton's method to the equality constraints c(x). That is, x_{n+1} = x_n + d_n, where

d_n = arg min { ‖z − d_{n−1}‖² : ∇c(d_{n−1})(z − d_{n−1}) + c(d_{n−1}) = 0, z ≥ 0 }.

However, even when the original constrained optimization problem is feasible, the Newton step may be infeasible. In this presentation, we discuss an algorithm with infeasibility detection for

min { ‖z‖² : Az = b, z ≥ 0 },

based on recent work by Burke, Curtis, Johnson, Robinson, Wachter, and Wang.

James D. Diffenderfer
University of Florida
[email protected]

William Hager
University of Florida
Department of Mathematics
[email protected]

CP17

Convex Relaxations with Second Order Cone Constraints for Nonconvex Quadratically Constrained Quadratic Programming

We present new convex relaxations for nonconvex quadratically constrained quadratic programming (QCQP) problems. Since the basic semidefinite programming relaxation is often too loose for general QCQP, recent research has focused on strengthening convex relaxations using valid linear or second order cone (SOC) inequalities. In this paper, we construct valid second order cone constraints for nonconvex QCQP and reduce the duality gap using these valid constraints. Specifically, we decompose and relax the nonconvex constraints to two SOC constraints and then linearize the products of the SOC constraints and linear constraints to achieve some new valid constraints. Moreover, we introduce and generalize two recent techniques for generating valid inequalities to further enhance our method. We demonstrate the efficiency of our results with numerical experiments.

Rujun Jiang
Rm810C, ERB, CUHK, Shatin, N.T., Hong Kong
[email protected]

Duan Li
The Chinese University of Hong Kong
[email protected]

CP17

Differentiable McCormick Relaxations for Deterministic Global Optimization

Deterministic methods for nonconvex optimization require lower bounding information that is typically obtained using convex underestimators of the objective function and constraints. McCormick's relaxation scheme [Math. Program., 10 (1976): 147-175] is an established, efficient method for generating and computing appropriate convex underestimators of factorable functions, and was recently generalized and improved by Tsoukalas and Mitsos [J. Glob. Optim., 59 (2014): 633-662]. However, both of these schemes may yield nonsmooth underestimators, necessitating the use of nonsmooth optimization solvers to evaluate lower bounds. Such solvers exhibit poorer convergence properties and performance than their smooth counterparts. This presentation describes conditions under which the Tsoukalas-Mitsos scheme produces C1 relaxations, and shows how these conditions may be satisfied for any factorable function. The result is a C1 variant [J. Glob. Optim., in press] of McCormick's relaxations that retains the computational benefits of established schemes: the new relaxations may be evaluated automatically, cheaply, and accurately, apply even to nonsmooth functions, and converge rapidly to the relaxed function as the considered domain shrinks. Gradients may be evaluated using standard automatic differentiation techniques. A C++ implementation is discussed, along with extensions to implicit functions and dynamic systems.

Kamil Khan
Argonne National Laboratory
Mathematics and Computer Science Division
[email protected]

Kamil Khan
Massachusetts Institute of Technology
Department of Chemical Engineering
[email protected]

Harry Watson
MIT
[email protected]

Paul I. Barton
Massachusetts Institute of Technology
Department of Chemical Engineering
[email protected]

CP17

SQP Method for Constrained Vector Optimization Problems

In this paper a descent scheme is developed to solve nonlinear vector optimization problems with inequality constraints. At every iteration of the scheme, a descent direction is determined by solving a single objective quadratic subproblem using linear approximations of the objective functions as well as the constraints and a quadratic restriction. A non-differentiable penalty function (ℓ∞) is used to restrict the constraint violations. It is proved that the descent sequence generated in this process converges to a KKT point under Mangasarian-Fromovitz constraint qualifications. Global convergence of this scheme is justified under some mild assumptions. The paper contains numerical support for the theoretical developments.

Md Abu T. Ansary
Indian Institute of Technology Kharagpur
[email protected]

Geetanjali Panda
Indian Institute of Technology, Kharagpur, India
[email protected]

CP17

On the Fruitful Use of Damped Techniques Within Nonlinear Conjugate Gradient Methods

In this work we deal with the use of damped techniques within Nonlinear Conjugate Gradient (NCG) methods for large scale unconstrained optimization. Damped techniques were introduced by Powell for SQP Lagrangian BFGS methods and recently extended by Al-Baali to quasi-Newton methods [Al-Baali, Damped techniques for enforcing convergence of quasi-Newton methods, Optim Method Softw, 29, 919-936, 2014]. We consider their use for possibly improving the efficiency and the robustness of NCG methods in tackling difficult problems. In particular, we propose the use of damped techniques for a twofold aim: for modifying (unpreconditioned) NCG methods and for constructing preconditioners based on quasi-Newton updates [Caliciotti, Fasano, Roma, Novel preconditioners based on quasi-Newton updates for nonlinear conjugate gradient methods, to appear in Optim Lett], to be used in the preconditioned NCG schemes. In the first case, we obtain a novel NCG method, hence the necessity of ensuring its global convergence. In the second case, we propose a new general methodology for defining new preconditioning strategies for NCG methods. In the latter case, the damped techniques enable us to define efficient preconditioners which approximate the inverse of the Hessian matrix, while still preserving information provided by the secant equation or some of its modifications. An extensive numerical experience confirms the effectiveness of both the proposed approaches.

Massimo Roma
SAPIENZA - Universita di Roma
[email protected]

Mehiddin Al-Baali
Sultan Qaboos University
[email protected]

Andrea Caliciotti
Dip. Ingegneria Informatica, Automatica e Gest.
SAPIENZA, Universita' di Roma, Rome, Italy
[email protected]

Giovanni Fasano
Dip. Matematica Applicata
Universita' Ca' Foscari di Venezia
[email protected]

CP17

Solving Two-Stage Adaptive Nonlinear Models via Dual Formulations

In this talk we derive a dualized formulation for two-stage adaptive nonlinear optimization models. In the literature, the original primal formulations are deemed intractable due to nonlinearity in the second-stage decisions. In contrast to the primal formulation, our new dualized formulation is linear in the second-stage decisions. Therefore, we can apply a wide array of methods to solve this model, including popular techniques such as affine policies. We also show how the dual formulation can be used to provide stronger lower bounds on the performance of these affine policies. Our results are demonstrated on a variety of examples where the nonlinearity in the second-stage decisions appears naturally.

Frans de Ruiter, Dick Den Hertog
Tilburg University
[email protected], [email protected]

CP18

Efficient Combinatorial Optimization for Graph Partitioning Using Quantum Annealing

The recent availability of commercial D-Wave quantum annealers offers a new tool for solving NP-hard problems alongside existing approaches based on heuristics and exhaustive search (such as branch-and-bound). However, it is still unclear whether, and to what extent, the current state of quantum computing provides any advantage over existing classical approaches for a given problem class. In this study, we address this question by assessing the performance of the D-Wave 2X quantum computer on graph partitioning (GP), a well-studied and important NP-hard graph problem. We study several versions of the GP problem by providing formulations as quadratic unconstrained binary optimization (QUBO) problems, the type accepted by D-Wave, and by implementing those using three quantum solvers provided by D-Wave Systems (QSage, QBSolv, Sapi). We compare the performance of our quantum implementations to classical GP algorithms (METIS, simulated annealing, Gurobi) on a set of test graphs. We show that D-Wave can solve the GP problems to optimality in most cases, but so can the classical solvers, and in much shorter times. This is probably due to the restriction on graph sizes (between 6 and 20 vertices) that are compatible with current D-Wave hardware (with about 1000 qubits). We observe that the efficiency gap shrinks fast as the problem size increases, thus leading us to conjecture that for graphs with several hundred vertices a noticeable advantage of the quantum versions can be expected.

Georg Hahn
Imperial College London
[email protected]

Hristo Djidjev
Los Alamos National Laboratory
[email protected]

Guillaume Rizk
INRIA/Irisa Rennes
[email protected]

Guillaume Chapuis
Los Alamos National Laboratory
[email protected]
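
The abstract does not reproduce the QUBO encodings; a textbook encoding for two-way balanced partitioning penalizes cut edges plus an imbalance term, H(x) = Σ_{(i,j)∈E}(x_i + x_j − 2 x_i x_j) + γ(Σ_i x_i − n/2)², with γ an assumed tuning weight. A minimal sketch:

```python
import numpy as np

def qubo_graph_partition(n, edges, gamma=2.0):
    """Build Q with x^T Q x = (#cut edges) + gamma*(sum(x) - n/2)^2 - gamma*n^2/4
    for binary x, using x_i^2 = x_i to push linear terms onto the diagonal."""
    Q = np.zeros((n, n))
    for i, j in edges:
        Q[i, i] += 1.0              # x_i + x_j - 2*x_i*x_j counts edge (i,j)
        Q[j, j] += 1.0              # as cut exactly when x_i != x_j
        Q[i, j] -= 1.0
        Q[j, i] -= 1.0
    for i in range(n):
        Q[i, i] += gamma * (1.0 - n)    # balance term: diagonal part
        for j in range(n):
            if j != i:
                Q[i, j] += gamma        # balance term: cross products x_i*x_j
    return Q

# brute-force check on a 4-cycle: a contiguous balanced bipartition is optimal
Q = qubo_graph_partition(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
xs = [np.array([(m >> k) & 1 for k in range(4)], dtype=float) for m in range(16)]
print(min(xs, key=lambda x: x @ Q @ x))   # e.g. [1. 1. 0. 0.]: two cut edges
```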

CP18

Combinatorial Optimization in Survey Statistics

Many problems in statistics require the solution of large-scale combinatorial and discrete optimization problems. Oftentimes, heuristics are used to find a near-optimal solution, even though finding the global optimum is tractable in many cases. This observation is in particular addressed for the constrained nearest neighbor hot deck imputation problem. This problem arises in statistics, where almost all population surveys suffer from missing responses. To create a complete and coherent data set, usually single imputation methods are applied. The nearest neighbor hot deck imputation replaces the missing values with observed values from the closest donor with respect to a distance, e.g., the Gower distance. The constrained variant of the method additionally limits the number of donations per survey unit. This setup can be transformed to a weighted b-matching problem and solved efficiently via minimum cost flow algorithms.

Dennis Kreber, Ulf Friedrich, Jan Burgard
Trier University
[email protected], [email protected], [email protected]

Sven De Vries
Universitat Trier
[email protected]
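
A minimal sketch of the reduction described above, assuming integer-scaled distances (the scaling factor, names, and toy data are ours; requires the networkx package):

```python
import networkx as nx

def hotdeck_bmatching(dist, b):
    """Constrained nearest-neighbor hot deck as min-cost flow.

    dist[r][d]: distance (e.g. Gower) from recipient r to donor d;
    each recipient gets exactly one donor, each donor at most b donations.
    """
    R, D = len(dist), len(dist[0])
    G = nx.DiGraph()
    G.add_node("s", demand=-R)                 # supplies R units of flow
    G.add_node("t", demand=R)
    for d in range(D):
        G.add_edge("s", f"don{d}", capacity=b, weight=0)
    for r in range(R):
        G.add_edge(f"rec{r}", "t", capacity=1, weight=0)
        for d in range(D):
            # network simplex prefers integer weights: scale the distances
            G.add_edge(f"don{d}", f"rec{r}", capacity=1,
                       weight=int(round(1000 * dist[r][d])))
    flow = nx.min_cost_flow(G)
    return {r: next(d for d in range(D) if flow[f"don{d}"][f"rec{r}"] > 0)
            for r in range(R)}

print(hotdeck_bmatching([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]], b=2))
# -> {0: 0, 1: 0, 2: 1}: donor 0 donates twice, which b = 2 allows
```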

CP18

A New Approach for Solving Nonlinear Mixed Integer DC Programs Based on a Continuous Relaxation Without Integrality Gap and a Smoothing Technique

In this talk, we consider a mixed integer DC program (MIDCP) whose objective function is a DC function, that is, a function that can be represented as the difference of two convex functions. Recently, T. Maehara, N. Marumo, and K. Murota (2015) proposed a new continuous relaxation technique for solving discrete DC programs with only integral variables. Specifically, their approach does not give rise to a so-called integrality gap between the optimal values of the original discrete DC program and the continuously relaxed problem. In this study, we first extend their result to the MIDCP case. Based on this, we present a new algorithm for solving the MIDCP. We further incorporate a smoothing technique into the proposed algorithm to handle nonsmooth functions. Finally, we give some convergence results for the proposed algorithms.

Takayuki Okuno
Department of Applied Mathematics and Physics
Graduate School of Informatics, Kyoto University
[email protected]


Yoshiko Ikebe
Department of Information and Computer Technology
Tokyo University of Science
[email protected]

CP18

PIPS-SBB: A Distributed-Memory Linear-Programming-Based Branch-and-Bound Solver for Stochastic Mixed-Integer Programs

Deterministic equivalent formulations of stochastic MIPs from applications such as unit commitment (UC) can exceed available memory on a single workstation. To overcome this issue, we have developed PIPS-SBB, a distributed-memory parallel stochastic mixed-integer programming (SMIP) solver based on the distributed-memory parallel stochastic linear programming solver PIPS-S. We will overview how PIPS-SBB uses the block-angular structure of constraints in SMIPs to illustrate the parallel data structures used in our branch-and-bound algorithm. We then discuss how to adapt common algorithms used within MIP solvers to preserve both the block-angular structure of SMIPs and our parallel data layout. These algorithms, such as cutting plane generators, presolve, and primal heuristics, are used to accelerate convergence of commercial and open-source MIP solvers, and are necessary to solve large problem instances quickly. We also discuss the effects of grouping natural blocks within SMIPs into larger, process-local blocks, which enables us to trade off between using more MPI processes and generating tighter cuts that also consume more memory. Finally, we report the performance of PIPS-SBB on test problem instances in the SMIP benchmarking test set SIPLIB and on some UC instances to show that our software is competitive with existing state-of-the-art MIP solver implementations.

Geoffrey M. Oxberry
Lawrence Livermore National Laboratory
[email protected]

Lluis-Miquel Munguia
Georgia Institute of Technology
Stewart School of Industrial & Systems Engineering
[email protected]

Cosmin G. Petra
Lawrence Livermore National Laboratory
Center for Advanced Scientific Computing
[email protected]

Pedro Sotorrio, Thomas Edmunds, Deepak Rajan
Lawrence Livermore National Laboratory
[email protected], [email protected], [email protected]

CP18

Binary Variable Decompositions via ADMM in Mixed Integer Programs

We will consider a class of mixed integer programs where the problem is convex except for a binary constraint on the vector x. We will discuss heuristics for problems where the objective can be written as f(g(x, y)), for which we will replace the inner function g(x, y) with its unconstrained version, while still being able to use ADMM to identify the binary variable such that g(x, y) would be closest to its unconstrained version by only checking 2n values, rather than the exponential 2^n, where n is the dimension of the binary variable. We will compare this heuristic, both in terms of performance and complexity, to the case where only the binary vector x is replaced with its unconstrained counterpart, and the unconstrained solution is then rounded off to obtain the binary vector.

Michael Rotkowitz
Electrical and Computer Engineering
University of Maryland, College Park
[email protected]

Alborz Alavian
University of Maryland College Park
[email protected]

CP19

On Shape-Changing Trust-Region Subproblems

We propose a method for solving large-scale L-SR1 trust-region subproblems where the trust region is defined by so-called shape-changing norms that depend on the eigendecomposition of the approximation to the Hessian. These shape-changing norms simplify solving the subproblems. In particular, one of the norms results in an explicit expression for the solution, and in the other norm the solution can be characterized by a set of optimality conditions. Since the L-SR1 matrix is not guaranteed to remain positive definite, we also propose an analytic expression for the subproblem solution when the "hard case" occurs.

Johannes J. Brust
UC Merced
[email protected]

Oleg Burdakov
Linkoping University
[email protected]

Jennifer Erway
Wake Forest University
[email protected]

Roummel F. Marcia
University of California, Merced
[email protected]

Ya-Xiang Yuan
Chinese Academy of Sciences
Inst of Comp Math & Sci/Eng, AMSS
[email protected]

CP19

Scalable Adaptive Cubic Regularization Methods

Adaptive cubic regularization (ARC) methods for unconstrained optimization compute steps from linear systems with a shifted Hessian in the spirit of the modified Newton method. In the simplest case, the shift is a multiple of the identity, which is typically identified by trial and error. We propose a scalable implementation of ARC in which we solve a set of shifted systems concurrently by way of an appropriate Krylov solver.

Jean-Pierre Dussault
Universite de Sherbrooke
[email protected]

Dominique Orban
GERAD and Dept. Mathematics and Industrial Engineering
Ecole Polytechnique de Montreal
[email protected]

CP19

A Characterization of Symmetric Quasi-Newton Update Matrices for Quadratic Problems

In nonlinear optimization, a fundamental subproblem is to solve a linear system corresponding to an unconstrained quadratic problem where the Hessian is symmetric positive definite. For quadratic problems, we give explicit conditions on the quasi-Newton update matrices which yield search directions parallel to those of the conjugate gradient method, if exact linesearch is used. The conditions are given for the quasi-Newton update matrices at a given iterate and provide a characterization of a family of symmetric rank-2 update matrices. We analyze the matrices and give necessary and sufficient conditions for preserving positive definiteness of the quasi-Newton matrix. We show that by utilizing the secant condition, Broyden's family is obtained, but in general it is not necessary for the final quasi-Newton matrix to coincide with the Hessian. We also discuss and evaluate possible extensions to nonquadratic problems.

David Ek
KTH Royal Institute of Technology
[email protected]

Anders Forsgren
KTH Royal Institute of Technology
Dept. of Mathematics
[email protected]

CP19

A Solver for Nonconvex Bound-Constrained Quadratic Optimization

We present a new algorithm for nonconvex bound-constrained quadratic optimization. In the strictly convex case, our method is equivalent to the state-of-the-art algorithm by Dostal and Schoberl [Comput. Optim. Appl., 30 (2005), pp. 23-43]. Unlike their method, however, we establish a convergence theory for our algorithm that holds even when the problems are nonconvex. This is achieved by carefully addressing the challenges associated with directions of negative curvature, in particular, those that may naturally arise when applying the conjugate gradient algorithm to an indefinite system of equations. Our presentation and analysis deal explicitly with both lower and upper bounds on the optimization variables, whereas the work by Dostal and Schoberl considers only strictly convex problems with lower bounds. To handle this generality, we introduce the reduced chopped gradient, which is analogous to the reduced free gradient previously used. The reduced chopped gradient leads to a new condition that is used to determine when optimization over a given subspace should be terminated. This condition, although not equivalent, is motivated by a similar condition used by Dostal and Schoberl. Numerical results illustrate the superior performance of our method over commonly used solvers that employ gradient projection steps and subspace acceleration.

Hassan Mohy-Ud-Din
Yale University
Postdoctoral Associate
[email protected]

CP19

Global Convergence of Memoryless Quasi-Newton Methods Based on the Broyden Family for Unconstrained Optimization

Quasi-Newton methods are widely used for solving unconstrained optimization problems. For example, the BFGS and the DFP formulae are well known quasi-Newton update formulae. The Broyden family is a one-parameter family of quasi-Newton update formulae which includes the BFGS and the DFP formulae. Although quasi-Newton methods are efficient for small or medium scale problems, it is difficult to apply them directly to large-scale unconstrained optimization problems, because they need to store matrices. To overcome this difficulty, particular attention has been paid to memoryless quasi-Newton methods. In this talk, we propose memoryless quasi-Newton methods based on the Broyden family that do not need any storage for matrices. We also deal with a scaling technique. The presented methods always satisfy the sufficient descent condition. Furthermore, we prove the global convergence property of the proposed methods for general objective functions under the assumption that the Wolfe conditions hold in the line search. Finally, some numerical results are shown.

Shummin Nakayama
Tokyo University of Science
Graduate School of Science
[email protected]

Yasushi Narushima
Yokohama National University
Department of Management System Science
[email protected]

Hiroshi Yabe
Tokyo University of Science
Department of Mathematical Information Science
[email protected]

CP19

Necessary Optimality Conditions and Exact Penalization for Non-Lipschitz Nonlinear Programs

Including a non-Lipschitz term in the objective function has significantly enlarged the applicability of standard nonlinear programs. In particular, it has recently been discovered that when the objective function belongs to a certain class of non-Lipschitz functions, local minimizers are often sparse. However, when the objective function is not Lipschitz, standard constraint qualifications are no longer sufficient for Karush-Kuhn-Tucker (KKT) conditions to hold at a local minimizer, let alone to ensure an exact penalization. In this paper we extend quasi-normality and the relaxed constant positive linear dependence condition to allow for the non-Lipschitzness of the objective function and show that they are sufficient for the KKT condition to be necessary for optimality. Moreover, we also derive some exact penalty results.

Jane Ye
University of Victoria
Dept of Math & Statistics
[email protected]

Lei Guo
Shanghai Jiao Tong University
[email protected]

CP20

Solving Integer Programming Problems via Numerical Complex Integration

A novel analytic approach for the solution of integer programming problems with non-negative input is presented. The method combines generating function techniques with results from complex analysis in several variables and is primarily based on a multipath version of Cauchy's multivariate integral formula. The input data of the optimization problem is used as a parameter of an analytic function in the unit disc, which links the optimization problem to the evaluation of a complex path integral. This representation allows the formulation of an algorithm that relies on numerical quadrature. The theoretical description is supplemented with a discussion of challenges in practical implementations of the method. In particular, it is demonstrated how preprocessing with so-called path adaption methods can help to improve the condition number of the quadrature problem, whose efficient solution is essential for the algorithm. An especially promising variant of the path adaption idea solves a shortest path problem on a predefined grid inside the unit disc. This leads to a refined version of the algorithm with better numerical stability and overall performance.

Ulf Friedrich
Trier University
[email protected]

CP20

Changing Graph Structure for Performing Fast, Approximate Inference in Graphical Models

The complexity of finding exact marginal distributions in probabilistic graphical models, known as marginal inference, is exponential in the treewidth of the underlying graph. We develop a method to perform approximate inference on discrete graphical models by modifying the graph to a new graph structure having low treewidth, on which performing exact inference is tractable. Performing exact inference on the new graph gives an approximate solution to the inference on the original graph. We derive error bounds on the approximate inference solution as compared to the exact inference solution. We show that the problem of finding parameters of the new graph which give the tightest error bounds can be formulated as a Linear Program (LP). The number of constraints in the LP grows exponentially with the number of nodes in the graph. To address this issue, we develop a row generation algorithm to solve the LP. We also discuss heuristics for choosing the new graph.

Areesh Mittal
University of Texas at Austin
[email protected]

CP20

Intersection Cuts for Convex Mixed Integer Programs from Translated Cones

We develop a general framework for linear intersection cuts for convex integer programs with full-dimensional feasible regions by studying integer points of their translated tangent cones, generalizing the idea of Balas. For proper (i.e., full-dimensional, closed, convex, pointed) translated cones with fractional vertices, we show that under certain mild conditions all intersection cuts are indeed valid for the integer hull, and a large class of valid inequalities for the integer hull are intersection cuts, computable via polyhedral approximations. We also give necessary conditions for a class of valid inequalities to be tangent halfspaces of the integer hull of proper translated cones. Finally, we show that valid inequalities for non-pointed regular translated cones can be derived as intersection cuts for associated proper translated cones under some mild assumptions.

Umakanta Pattanayak
IIT Bombay
[email protected]

Vishnu Narayanan
Assistant Professor, Industrial Engineering and Operations Research, Indian Institute of Technology Bombay
[email protected]

CP20

An Approximation Algorithm for the Partial Covering 0-1 Integer Program

We deal with the partial covering 0-1 integer program. This problem is a generalization of some important problems such as the covering 0-1 integer program, the partial set multi-cover problem and the partial set cover problem. In this paper, we propose a max{f, p + 1}-approximation algorithm for the problem, where f is the largest number of non-zero coefficients in the constraints and p is the maximum number of constraints which are not satisfied.

Yotaro Takazawa, Shinji Mizuno, Tomonari Kitahara
Tokyo Institute of Technology
[email protected], [email protected], [email protected]

CP21

On the Minimization of the k-th Singular Value of a Matrix

We consider the problem of minimizing a particular singular value of a matrix variable, which is then subject to some convex constraints. Convex heuristics for this problem are discussed, including some counter-intuitive results regarding which is best, which then provide upper bounds on the value of the problem. The use of polynomial optimization formulations is considered, particularly for obtaining lower bounds on the value of the problem. We show that the main problem can also be formulated as an optimization problem with a bilinear matrix inequality (BMI), and discuss the use of this formulation.

Alborz Alavian
University of Maryland College Park
[email protected]

Michael Rotkowitz
Electrical and Computer Engineering
University of Maryland, College Park
[email protected]

CP21

A Stable Lagrangian-DNN Relaxation Algorithm for Polynomial Optimization Problems

We discuss the Lagrangian doubly nonnegative (LDNN) relaxation method (Kim, Kojima, and Toh, 2016) for a class of polynomial optimization problems (POPs). The LDNN relaxation method first formulates a DNN relaxation of a POP and then applies the Lagrangian relaxation to the DNN problem. The resulting LDNN problem has a linear equality and two cone constraints. Since the dual of the LDNN problem has a single variable, it can be solved by the bisection method; the feasibility of a point can be evaluated by first-order methods (FOMs). In theory, when the parameter of the Lagrangian relaxation increases, the lower bounds obtained by LDNN improve. In practice, however, selecting the parameter is not obvious because the bisection method often suffers from numerical instability of the FOMs when the parameter is large. In this work, we show that some of the constraints of the DNN problem can be included in the cone constraints without increasing the computational cost at each iteration of the FOMs. This allows us to avoid using the Lagrangian relaxation for such constraints and makes the bisection method stable. Numerical results are presented to demonstrate the stability of the proposed algorithm.

Naoki Ito
University of Tokyo
[email protected]

Sunyoung Kim
Ewha W. University
Department of Mathematics
[email protected]

Masakazu Kojima
Chuo University
[email protected]

Akiko Takeda
The Institute of Statistical Mathematics
[email protected]

Kim-Chuan Toh
National University of Singapore
Department of Mathematics
[email protected]

CP21

Completely Positive and Positive Semidefinite Tensor Relaxations for Polynomial Optimization

Completely positive (CP) tensors, which correspond to a generalization of CP matrices, allow one to reformulate or approximate general polynomial optimization problems (POPs) with a conic optimization problem over the cone of CP tensors. Similarly, completely positive semidefinite (CPSD) tensors, which correspond to a generalization of PSD matrices, can be used to approximate general POPs with a conic optimization problem over the cone of CPSD tensors. In this paper, we study CP and CPSD tensor relaxations for general POPs and compare them with the bounds obtained via a Lagrangian relaxation of the POP. This shows that existing results in this direction for quadratic POPs extend to general POPs. Also, we provide some tractable approximation strategies for CP and CPSD tensor relaxations. These approximation strategies show that, with a similar computational effort, bounds obtained from them for general POPs can be tighter than bounds for these problems obtained by reformulating the POP as a quadratic POP and subsequently using approximations for the quadratic problem based on CP and PSD matrices. To illustrate our results, we present numerical results for these relaxation approaches on small scale fourth-order degree POPs.

Xiaolong Kuang
PhD Candidate at Lehigh University
[email protected]

Luis Zuluaga
Lehigh University
[email protected]

CP21

Combining SOS and Moment Relaxations with Branch and Bound to Extract Solutions to Global Polynomial Optimization Problems

We consider the problem of extracting approximate solutions to Global Polynomial Optimization (GPO) using a combination of branch and bound and Sum-of-Squares (SOS)/Moment relaxations. Specifically, we use hyperplanes to branch the feasible set of the GPO problem. For each of the branches, SOS/Moment relaxations are used to find lower bounds on the optimal objective values of the GPO problems defined over the subdivided feasible sets. These bounds allow us to select branches that shrink the search domain of the GPO problem. For any given ε > 0 and η > 0, there exist l and k sufficiently large such that after l branchings with SOS/Moment relaxations of degree k, the algorithm finds a point that is within ε-distance of a feasible and η-suboptimal point of the GPO problem. For a fixed degree of relaxations, the complexity of the algorithm is linear in the number of branches and polynomial in the number of constraints. We illustrate the algorithm for a 6-variable, 6-constraint GPO problem for which the ideal generated by the equality constraints is not zero-dimensional - a case where the existing Moment-based approach for extracting the global minimizer fails.

Hesameddin Mohammadi, Matthew Peet
Arizona State University
[email protected], [email protected]

MS1

Finding the Infimum Point with Respect to the Second Order Cone

We define the notion of the infimum and supremum of a set of points with respect to the second order cone. These problems can be formulated as second order cone optimization problems and are thus solvable by interior point methods in polynomial time. We present an extension of the simplex method to solve these problems. We outline both primal and dual versions of the simplex method. We also show some applications of infimum and supremum problems.

Marta Cavaleiro
Rutgers University
[email protected]

MS1

The Conic Optimizer in MOSEK: A Status Report

We discuss recent developments in the conic optimizer in MOSEK. We present numerical results illustrating the improvements on challenging benchmark problems, and we discuss how many difficult problems can be solved more easily and reliably using model reformulations.

Joachim Dahl
MOSEK ApS
[email protected]

Erling D. Andersen
MOSEK ApS
Copenhagen, Denmark
[email protected]

MS1

Analysis of Newton's Method via Semidefinite Programming Performance Estimation Problems

We first consider the gradient (or steepest) descent method with exact line search applied to a strongly convex function with Lipschitz continuous gradient. We establish the exact worst-case rate of convergence of this scheme. We also give the tight worst-case complexity bound for a noisy variant of the gradient descent method, where exact line search is performed in a search direction that differs from the negative gradient by at most a prescribed relative tolerance. Finally, we show that this analysis may be extended to analyse the worst-case performance of Newton's method for self-concordant functions. The proofs are computer-assisted, and rely on the resolution of semidefinite programming performance estimation problems as introduced in the paper [Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2):451-482, 2014].

Etienne De Klerk
Tilburg University
[email protected]

Francois Glineur
Universite Catholique de Louvain (UCL)
Center for Operations Research and Econometrics (CORE)
[email protected]

Adrien Taylor
UCL, Belgium
[email protected]

MS1

Perturbation Analysis of Singular Semidefinite Programming via Facial Reduction

In this talk, we study the behavior of the optimal values and the solutions of semidefinite programs under perturbations of the coefficient matrices. If the feasible set of the primal problem has a nonempty interior, the optimal value changes continuously when only the constant terms (usually in the right hand sides) are perturbed. However, the optimal value can change both continuously and discontinuously when we perturb the coefficient matrices of the variables in the constraints. We will use the facial reduction algorithm to determine which kinds of perturbations do not give rise to discontinuity. We note that, although the facial reduction algorithm is formulated with SDPs in general, it can be carried out without solving SDPs if the given problem comes from matrix completion problems, polynomial optimization problems or H-infinity control problems. This is joint work with H. Waki.

Yoshiyuki Sekiguchi
Tokyo University of Marine Science and Technology
[email protected]

MS2

Polynomial Norms

We establish when the d-th root of a multivariate degree-d polynomial produces a norm. In the quadratic case, it is well known that this is the case if and only if the matrix associated with the quadratic form is positive definite. We present a generalization of this result to higher order polynomials and show that there is a hierarchy of semidefinite programs that can detect all polynomial norms. We also show that any norm can be approximated arbitrarily well with a polynomial norm.

Amir Ali Ahmadi
Princeton University
a a [email protected]

Etienne De Klerk
Tilburg University
[email protected]

Georgina Hall
Princeton University
[email protected]
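
A worked illustration (ours, not from the talk) of the quadratic criterion and of why nonnegativity alone does not suffice in higher degrees:

\[
d=2:\quad f(x) = x^\top Q x, \qquad f^{1/2} \text{ is a norm} \iff Q \succ 0;
\]
\[
d=4:\quad f(x,y) = x^4 + y^4, \qquad f^{1/4} = \|(x,y)\|_4 \text{ is a norm},
\]
whereas \(g(x,y) = (x^2 - y^2)^2\) is a nonnegative quartic form, yet \(g^{1/4} = |x^2 - y^2|^{1/2}\) vanishes on the lines \(y = \pm x\), so it is not a norm.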

MS2

Convergence Rate of SoS Based Upper Bounds in Polynomial Optimization

We consider the problem of minimizing a polynomial f over a compact region K. As shown by Lasserre (2011), hierarchies of upper bounds converging to the minimum of f over K can be obtained by searching for a probability measure μ on K with a sum-of-squares density function h minimizing the expected value ∫_K h(x) f(x) μ(dx). We will discuss several recent results about the convergence rate of these hierarchies. For a general compact set K satisfying a mild boundary condition (and selecting for μ the Lebesgue measure), one can show a convergence rate in O(1/√r), where 2r is the degree of h. When K is a convex body an improved convergence rate in O(1/r) can be shown by exploiting a connection with simulated annealing based bounds. For simple regions like the hypercube, faster convergence rates can be shown by using other choices for the measure μ, as well as a simpler hierarchy leading to bounds whose computation requires only elementary operations.

Monique Laurent
CWI, Amsterdam and Tilburg University
[email protected]

Etienne De Klerk
Tilburg University
[email protected]

Roxana Hess
LAAS CNRS Toulouse
[email protected]

Jean B. Lasserre
LAAS CNRS and Institute of Mathematics
[email protected]

Zhao Sun
Symetrics, Amsterdam
[email protected]
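
For reference, the order-r upper bound of the hierarchy discussed above can be written as

\[
\mathrm{ub}_r \;=\; \min_{h \in \Sigma[x]_{2r}} \int_K h(x)\, f(x)\, \mu(dx)
\quad \text{subject to} \quad \int_K h(x)\, \mu(dx) = 1,
\]

where \(\Sigma[x]_{2r}\) denotes sum-of-squares polynomials of degree at most 2r; any feasible h gives \(\mathrm{ub}_r \ge \min_K f\), and the rates above bound the gap \(\mathrm{ub}_r - \min_K f\).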

MS2

Symmetric Tensor Nuclear Norm

This talk discusses nuclear norms of symmetric tensors. As recently shown by Friedland and Lim, the nuclear norm of a symmetric tensor can be achieved at a symmetric decomposition. We show how to compute symmetric tensor nuclear norms, depending on the tensor order and the ground field. Lasserre relaxations are proposed for the computation. The theoretical properties of the relaxations are studied. For symmetric tensors, we can compute their nuclear norms, as well as the nuclear decomposition. The proposed methods can also be extended to nonsymmetric tensors.

Jiawang Nie
University of California, San Diego
[email protected]

MS2

Investigation of Crouzeix's Conjecture via Nonsmooth Optimization

Crouzeix's conjecture is among the most intriguing developments in matrix theory in recent years. Made in 2004 by Michel Crouzeix, it postulates that, for any polynomial p and any matrix A, ‖p(A)‖ ≤ 2 max{|p(z)| : z ∈ W(A)}, where the norm is the 2-norm and W(A) is the field of values (numerical range) of A, that is, the set of points attained by v*Av for some vector v of unit length. Crouzeix proved in 2007 that the inequality above holds if 2 is replaced by 11.08, and very recently this was greatly improved by Palencia, replacing 2 by 1 + √2. Furthermore, it is known that the conjecture holds in a number of special cases, including n = 2. We use nonsmooth optimization to investigate the conjecture numerically by attempting to minimize the "Crouzeix ratio", defined as the quotient with numerator the right-hand side and denominator the left-hand side of the conjectured inequality. We present numerical results that lead to some theorems and further conjectures, including variational analysis of the Crouzeix ratio at conjectured global minimizers. All the computations strongly support the truth of Crouzeix's conjecture.

Michael L. Overton
New York University
Courant Instit. of Math. Sci.
[email protected]

Anne Greenbaum
University of Washington
Mathematics Department
[email protected]

Adrian Lewis
Cornell University
Oper. Res. and Indust. Eng.
[email protected]
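
The authors attack the problem with nonsmooth optimization solvers; independently of their code, one can estimate the Crouzeix ratio numerically by sampling the boundary of the field of values (a sketch; the sampling resolution m and the random test matrix are arbitrary choices):

```python
import numpy as np

def polyval_matrix(p, A):
    """Horner evaluation of a polynomial (highest coefficient first) at A."""
    P = np.zeros_like(A)
    for c in p:
        P = P @ A + c * np.eye(A.shape[0])
    return P

def crouzeix_ratio(p, A, m=720):
    """max_{z in W(A)} |p(z)| / ||p(A)||_2; the conjecture says this is >= 1/2.

    |p| is subharmonic, so its maximum over W(A) is attained on the boundary,
    sampled via the rotation trick: the top eigenvector v of Re(e^{it} A)
    yields the boundary point v* A v of the field of values.
    """
    num = 0.0
    for t in np.linspace(0.0, 2 * np.pi, m, endpoint=False):
        H = (np.exp(1j * t) * A + np.conj(np.exp(1j * t) * A).T) / 2
        v = np.linalg.eigh(H)[1][:, -1]
        z = v.conj() @ A @ v
        num = max(num, abs(np.polyval(p, z)))
    return num / np.linalg.norm(polyval_matrix(p, A), 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(crouzeix_ratio([1.0, 0.5, -2.0], A))   # p(z) = z^2 + 0.5 z - 2
```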

MS3

Optimization and Control in Free Boundary Fluid-Structure Interactions

We consider optimization and optimal control problems subject to free and moving boundary fluid-structure interactions. As the coupled fluid-structure state is the solution of a system of PDEs that are coupled through continuity relations defined on the free/moving interface, the investigation (existence of optimal controls, sensitivity equations, optimality conditions, etc.) is heavily dependent on the geometry of the problem, and falls into the moving shape analysis framework.

Lorena Bociu
N. C. State University
Raleigh, NC
[email protected]

Lucas Castle
North Carolina State University
[email protected]

MS3

Traffic Regulation via Controlled Speed Limit

We study an optimal control problem for traffic regulation via a variable speed limit. The traffic flow dynamics is described by the Lighthill-Whitham-Richards (LWR) model with a Newell-Daganzo flux function. We aim at minimizing the L2 quadratic error to a desired outflow, given an inflow on a single road. We first provide existence of a minimizer and compute analytically the cost functional variations due to needle-like variations in the control policy. Then, we compare three strategies: instantaneous policy; random exploration of the control space; and steepest descent using a numerical expression of the gradient. We show that the gradient technique is able to achieve a cost within 10% of the random exploration minimum with better computational performance.

Maria Laura Delle Monache
[email protected]

Benedetto Piccoli
Rutgers University
[email protected]

Francesco Rossi
LSIS, Aix-Marseille University
[email protected]

MS3

Sensitivity Analysis for Hyperbolic Systems of Conservation Laws

Sensitivity analysis (SA) concerns the quantification of changes in the solution of Partial Differential Equations (PDEs) due to perturbations in the model input. SA has many applications, among which are uncertainty quantification, quick evaluation of close solutions, and optimization based on descent methods. Standard SA techniques for PDEs, such as the continuous sensitivity equation method, rely on the differentiation of the state variable. However, if the governing equations are hyperbolic PDEs, the state can exhibit discontinuities, yielding Dirac delta functions in the sensitivity. The aim of this work is to modify the sensitivity equations to obtain a solution without delta functions. This is motivated by several reasons: firstly, a delta function cannot be seized numerically, leading to an incorrect solution for the sensitivity in the neighborhood of the state discontinuity; secondly, for some applications like optimization and evaluation of close solutions, the peaks appearing in the numerical solution of the original sensitivity equations make the sensitivities unusable. We propose a two-step procedure: (i) definition of a correction term added to the right-hand side of the sensitivity equations; (ii) definition of a shock detector ensuring that the correction term is added only where needed. We show how this procedure can be applied to the Euler barotropic system with two different finite-volume formulations, based either on an exact Riemann solver or the approximate Roe solver.

Camilla Fiorini
University of Versailles
[email protected]

Regis Duvigneau
INRIA Sophia Antipolis
[email protected]

Christophe Chalons
Universite Versailles Saint-Quentin-en-Yvelines, France
[email protected]

MS3

A Fokker-Planck Nash Game to Model Pedestrian Motion

Fokker-Planck-Kolmogorov (FPK) equations are PDEs which govern the dynamics of the probability density function (PDF) of continuous-time stochastic processes (e.g. Ito processes). In [1], an FPK-constrained control framework, where the drift is considered as the control variable, is developed and applied to crowd motion.

We consider herein the extension of the latter framework to the case where two crowds (or pedestrian teams) are competing through a Nash game. The players' strategies are the drifts, which yield two uncoupled FPK equations for the corresponding PDFs. The interaction occurs through the cost functions: each player would prefer to avoid overcrowding (with respect to the other player, hence the coupling) and additionally to have her own preferred trajectory and obstacle avoidance. In this particular setting, we prove the existence and uniqueness of the Nash equilibrium (NE). The NE is computed by means of a fixed point algorithm, and the adjoint-state method is used to compute the pseudo-gradients. We finally present some numerical experiments to illustrate which dynamics may arise from such equilibria.

[1] S. Roy, M. Annunziato and A. Borzi, A Fokker-Planck feedback control-constrained approach for modelling crowd motion, Journal of Computational and Theoretical Transport, 45(6), 2016, 442-458.

Alfio Borzi
Institute for Mathematics
University of Wurzburg
[email protected]

Abderrahmane Habbal
University Cote d'Azur, UCA, UNS, CNRS and Inria, France
[email protected]

Souvik Roy
Institute for Mathematics
University of Wurzburg
[email protected]

MS4

Low Rank Factor Analysis

Factor Analysis (FA) is a technique that is widely used to obtain a parsimonious representation of the correlation structure among a set of variables in terms of a smaller number of common hidden factors. In this talk we revisit the classical rank-constrained FA problem and propose a flexible family of exact, smooth reformulations for this task. By coupling state-of-the-art techniques from nonlinear optimization with methods from global optimization, we show that our approach often finds certifiably optimal solutions and significantly outperforms existing publicly available methods across a variety of real and synthetic examples.

Dimitris Bertsimas
Massachusetts Institute of Technology
[email protected]

Martin S. Copenhaver
Massachusetts Institute of Technology
[email protected]

Rahul Mazumder
MIT
[email protected]

MS4

Fitting Convex Sets to Data via Matrix Factorization

High-dimensional datasets arise prominently in a range of contemporary problem domains throughout science and technology. In many of these settings, the data are often constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. Methods such as manifold learning, dictionary learning, and others aim at computationally identifying such latent low-dimensional structure. We describe a new approach to inferring the low-dimensional structure underlying a dataset by fitting a convex set with favorable facial structure to the data. The convex set identifies a regularizer that is tractable to compute, and is useful for subsequent processing tasks. Our procedure is based on computing a structured matrix factorization and it includes several previous techniques as special cases. We illustrate the utility of our method with experimental demonstrations.

Yong Sheng Soh
California Institute of Technology
[email protected]

MS4

Interpreting Latent Variables in Factor Models via Convex Optimization

Latent or unobserved phenomena pose a significant difficulty in data analysis as they induce complicated and confounding dependencies among a collection of observed variables. Factor analysis is a prominent multivariate statistical modeling approach that addresses this challenge by identifying the effects of (a small number of) latent variables on a set of observed variables. However, the latent variables in a factor model are purely mathematical objects that are derived from the observed phenomena, and they do not have any interpretation associated to them. In this talk, we describe a systematic approach for attributing semantic information to the latent variables in a factor model. Our method is based on solving computationally tractable convex optimization problems, and it can be viewed as a generalization of the minimum-trace factor analysis procedure for fitting factor models via convex optimization. We analyze the theoretical consistency of our approach in a high-dimensional setting as well as its utility in practice via experimental demonstrations with real data.

Armeen Taeb
California Institute of Technology
[email protected]

MS4

Generalized Permutohedra from Probabilistic Graphical Models

We consider the problem of learning a Bayesian network or directed acyclic graph (DAG) model. We propose the sparsest permutation algorithm, a nonparametric approach based on finding the ordering of the variables that yields the sparsest DAG. In order to compute the sparsest permutation, we introduce the DAG associahedron. This is a generalized permutohedron, the analog of the prominent graph associahedron for DAGs. We end by introducing a simplex-type algorithm on this convex polytope that provably converges to the sparsest permutation.

Caroline Uhler
Massachusetts Institute of Technology
[email protected]

MS5

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) inequality is actually weaker than the five main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of coordinate descent and stochastic gradient for many non-strongly-convex (and some non-convex) functions. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence for support vector machines and L1-regularized least squares without additional assumptions.

Hamed Karimi, Julie Nutini, Mark Schmidt
University of British Columbia
[email protected], [email protected], [email protected]
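
For reference, the standard one-line argument behind the linear rate for gradient descent with step size 1/L on an L-smooth function satisfying the PL inequality with constant μ:

\[
f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2,
\qquad
\tfrac{1}{2}\|\nabla f(x_k)\|^2 \ge \mu\,\big(f(x_k) - f^*\big),
\]
so that
\[
f(x_{k+1}) - f^* \le \Big(1 - \tfrac{\mu}{L}\Big)\big(f(x_k) - f^*\big).
\]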

MS5

Stochastic Heavy Ball Method for Solving Linear Systems

Linear systems form the backbone of most numerical codes used in academia and industry. With the advent of the age of big data, practitioners are looking for ways to solve linear systems of unprecedented sizes. In this talk we present a stochastic variant of the Heavy Ball method for solving large scale linear systems. We prove linear convergence with an accelerated rate under no assumptions on the system beyond consistency.

Nicolas Loizou
The University of Edinburgh
[email protected]

Peter Richtarik
University of Edinburgh
[email protected]
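
One concrete member of this family is randomized Kaczmarz augmented with a heavy-ball momentum term (a minimal sketch; the momentum value β = 0.4 is an arbitrary illustrative choice, not a tuned parameter from the talk):

```python
import numpy as np

def kaczmarz_momentum(A, b, beta=0.4, iters=5000, seed=0):
    """Randomized Kaczmarz with a heavy-ball momentum term,
    for a consistent linear system Ax = b."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_prev = x = np.zeros(n)
    row_norms2 = np.einsum("ij,ij->i", A, A)
    probs = row_norms2 / row_norms2.sum()      # sample rows ~ ||a_i||^2
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        step = (A[i] @ x - b[i]) / row_norms2[i] * A[i]
        x, x_prev = x - step + beta * (x - x_prev), x
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true                                 # consistent by construction
print(np.linalg.norm(kaczmarz_momentum(A, b) - x_true))
```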

MS5

Linear Convergence of the Randomized FeasibleDescent Method Under the Weak Strong Convex-ity Assumption

In this paper we generalize the framework of the FeasibleDescent Method (FDM) to a Randomized (R-FDM) anda Randomized Coordinate-wise Feasible Descent Method(RC-FDM) framework. We show that the famous SDCAalgorithm for optimizing the SVM dual problem, or thestochastic coordinate descent method for the LASSO prob-lem, fits into the framework of RC-FDM. We prove linearconvergence for both R-FDM and RC-FDM under the weakstrong convexity assumption. Moreover, we show that theduality gap converges linearly for RC-FDM, which impliesthat the duality gap also converges linearly for SDCA ap-plied to the SVM dual problem.

Chenxin Ma
Lehigh University
[email protected]

Martin Takac
Lehigh University
[email protected]

MS5

Randomized Projection Methods for Convex Feasibility Problems

Finding a point in the intersection of a (finite or infinite) collection of convex sets, that is, the convex feasibility problem, represents a modeling paradigm that has been used for many decades for posing and solving various problems arising in engineering, computer science, applied mathematics, physics and other fields. In this paper, we first discuss different representations for a collection of convex sets and derive several stochastic reformulations of the convex feasibility problem. We then introduce a new general random projection algorithmic framework that at each iteration chooses randomly N sets from this collection, projects in parallel the present iterate on each chosen set, and then computes the new update using an average of these projections with a certain step size. We derive convergence rate results for this general random projection method in both cases: sublinear convergence when we deal with general convex sets, and linear convergence when the collection of sets satisfies a certain linear regularity condition. Moreover, our analysis allows large step sizes (that is, step sizes larger than one) and we prove that the convergence rates depend directly on the number N of sets that are chosen randomly. We also show that some well-known projection methods, such as the alternating projection or the average projection schemes, result as special cases of our algorithmic framework.
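The core update is easy to state in code. Below is a minimal sketch for the special case of halfspace constraints {x : a·x <= c}, with the sample size N and the (possibly larger-than-one) step size as illustrative placeholders; the paper treats general convex collections and their stochastic reformulations:

```python
import numpy as np

def averaged_random_projections(halfspaces, x0, N=5, step=1.5, n_iter=500, seed=0):
    """Random N-set parallel projection method for convex feasibility (sketch).

    `halfspaces` is a list of pairs (a, c) encoding the sets {x : a @ x <= c}.
    Each iteration samples N sets, projects the iterate onto each in parallel,
    and moves toward the average of the projections with the given step size.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        idx = rng.choice(len(halfspaces), size=N, replace=False)
        projections = []
        for a, c in (halfspaces[i] for i in idx):
            viol = max(a @ x - c, 0.0)          # zero when already feasible
            projections.append(x - viol / (a @ a) * a)
        x = x + step * (np.mean(projections, axis=0) - x)
    return x
```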

Ion Necoara
University Politehnica Bucharest
[email protected]

Peter Richtarik
University of Edinburgh
[email protected]

Andrei Patrascu
University Politehnica of Bucharest
[email protected]

MS6

Fractional Elliptic QVI: Theory and Numerics

This talk introduces an elliptic quasi-variational inequality (QVI) problem class with fractional diffusion of order s ∈ (0, 1), studies existence and uniqueness of solutions and develops a solution algorithm. As the fractional diffusion prohibits the use of standard tools to study the QVI, we instead realize it as a Dirichlet-to-Neumann map for a nonuniformly elliptic equation posed on a semi-infinite cylinder. We first study existence and uniqueness of solutions for this extended QVI and then transfer the results to the fractional QVI. This introduces a new paradigm in the field of fractional QVIs. We truncate the semi-infinite cylinder and show that the solution to the truncated problem converges to the solution to the extended problem, under fairly mild assumptions, as the truncation parameter τ → ∞. Since the constraint set changes with the solution, we develop an argument using Mosco convergence. We state an algorithm to solve the truncated problem and show its convergence in function spaces. Finally, we conclude with several illustrative numerical examples.

Harbir Antil
George Mason University
Fairfax, VA
[email protected]

Carlos N. Rautenberg
Humboldt-University of Berlin
[email protected]

MS6

A Priori Error Estimates for Space-Time Finite Element Discretization of Parabolic Time-Optimal Control Problems

We consider the following time-optimal control problem (P) subject to a linear parabolic equation with control q and state u:

  Minimize   j(T, q) := T + (α/2) ∫₀ᵀ ‖q(t)‖²_{L²(Ω)} dt
  subject to T > 0,  q ∈ Q_ad,
             ∂ₜu + Au = Bq in (0, T) × Ω,  u(0) = u₀ in Ω,
             ‖u(T) − u_d‖_{L²(Ω)} ≤ δ,

where u_d ∈ H₀¹(Ω) is the desired state and δ > 0 is given. Moreover, Q_ad denotes the set of admissible controls and α > 0 the regularization parameter. Our aim is the numerical analysis of finite element discretizations of (P). It is worth mentioning that (P) is non-convex, possesses a nonlinear dependence on T and is subject to state constraints.

We derive second order necessary as well as sufficient optimality conditions for (P) using a critical cone that leads to a minimal gap between necessity and sufficiency. For the numerical solution we consider a discontinuous Galerkin discretization scheme in time and a continuous Galerkin scheme in space. The main novelty is a priori discretization error estimates of optimal order for T and q under the hypothesis of second order sufficient optimality conditions.

Lucas Bonifacius
Technical University of Munich
Centre for Mathematical Sciences
[email protected]

Boris Vexler
Technische Universitaet Muenchen
Centre for Mathematical Sciences
[email protected]

Konstantin Pieper
Florida State University
Dept of Scientific Computing
[email protected]

MS6

Numerical Aspects and Error Estimates for Dirichlet Boundary Control Problems

In this talk we discuss convergence results for finite element discretized Dirichlet control problems in polygonal domains. We investigate unconstrained as well as control constrained problems. In both cases we discretize the state and the control by piecewise linear and continuous functions. The error estimates we obtain mainly depend on the size of the interior angles, but also on the presence of control constraints and the structure of the underlying mesh. For instance, considering non-convex domains, the convergence rates of the discrete optimal controls in the unconstrained case can even be worse than in the control constrained case.

Johannes Pfefferer
Technische Universitat Munchen
Zentrum Mathematik
[email protected]

Thomas Apel
Universitat der Bundeswehr Munchen
[email protected]

Mariano Mateos
Universidad de Oviedo
33203 Gijon (Asturias), Spain
[email protected]

Arnd Rosch
University of Duisburg-Essen
[email protected]

MS6

Sparse Control for Inverse Point Source Location with the Helmholtz Equation

We address the problem of recovering a superposition of localized sound sources, described by a limited number of frequencies, from point-wise measurements of the corresponding acoustic pressure. To this purpose, we investigate a family of weighted, sparse, measure-valued control problems in combination with the Helmholtz equation on a bounded domain. The observation is given at a finite collection of points (representing M microphones) and solutions of the problem are finite sums of Dirac-delta functions (acoustic monopoles). To obtain well-posedness in a general setting, the weights are chosen unbounded near the observation points. Furthermore, optimality conditions and conditions for recovery in the small noise case are discussed, which motivates concrete choices of the weight. The numerical realization is based on an accelerated conditional gradient method and a finite element discretization. Numerical examples confirm that an appropriate choice of the weight significantly increases the number of cases where recovery is possible with the proposed problem formulation.

Konstantin Pieper
Florida State University
Dept of Scientific Computing
[email protected]

Bao Q. Tang, Philip Trautmann
Institute of Mathematics and Scientific Computing
University of Graz
[email protected], [email protected]

Daniel Walter
Technische Universitat Munchen
[email protected]

MS7

Shape-Changing Limited-Memory SR1 Trust-Region Methods

In this talk, we discuss methods for solving the trust-region subproblem when a limited-memory symmetric rank-one (SR1) matrix is used in place of the true Hessian. We consider two different shape-changing norms to define the trust region. In both cases, the proposed methods take advantage of the two shape-changing norms to decompose the subproblem into two separate problems, both easy to solve. The proposed solver makes use of the structure of limited-memory SR1 matrices to find solutions that solve each subproblem to high accuracy even in the so-called "hard case". This is joint work with Johannes Brust, Oleg Burdakov, Roummel Marcia, and Ya-xiang Yuan.

Jennifer Erway
Wake Forest University
[email protected]

Johannes J. Brust
UC Merced
[email protected]

Oleg Burdakov
Linkoping University
[email protected]

Roummel F. Marcia
University of California, Merced
[email protected]

Yaxiang Yuan
Laboratory of Scientific and Engineering Computing
Chinese Academy of Sciences
[email protected]

MS7

On Solving Symmetric Systems of Linear Equations Arising in Optimization

Solving systems of linear equations where the matrix is symmetric is an important part of many optimization methods intended for smooth optimization problems. These could be methods based on linear equations as such, for example active-set methods for linear and convex quadratic programming, or methods based on solving nonlinear equations by solving a sequence of linear equations, for example interior methods. It is not always so clear which equations to solve. Many methods are Newton-based, but it may be desirable to use quasi-Newton approximations of the Hessian. We discuss particular systems of symmetric linear equations arising in optimization. Some are genuinely indefinite, whereas others may be converted into positive definite matrices. A particular application is interior methods, where the indefinite augmented system may be identified with a doubly augmented system, which is positive definite.

Anders Forsgren
KTH Royal Institute of Technology
Dept. of Mathematics
[email protected]

MS7

Using Nonlinear Optimization to Adjust Protein Similarity Score Matrices for Amino Acid Composition

The algorithm in BLAST (Basic Local Alignment Search Tool) and similar algorithms align protein sequences to assess whether the sequences are significantly similar. A positive score is awarded for each position at which there is an exact match between aligned residues, and a negative score is typically assigned to each mismatch. These awards and penalties are known as similarity scores and depend on the specific pairs of amino acids aligned. Gaps within alignments are allowed but penalized with a negative score. The values for similarity scores and gap penalties may be chosen to give good results on average. However, such generic scoring systems inevitably yield biologically undesirable results on sequences with skewed amino acid composition. We describe a nonlinear information-theoretic entropy minimization problem used within BLAST to adjust protein similarity scores for amino acid composition. We show that composition adjustment maintains the retrieval efficiency of BLAST searches, while improving the accuracy of BLAST statistics.

E. Michael Gertz
National Center for Biotechnology Information
National Institutes of Health
[email protected]

MS7

Adding Primal Regularization to a Primal-Dual Augmented Lagrangian Approach

This talk focuses on the importance of adding explicit primal regularization to primal-dual augmented Lagrangian approaches. In particular, we demonstrate how doing so provides mechanisms for implicitly bounding both primal and dual Lagrange multiplier estimates. This is of particular importance for primal-dual approaches that hope to retain a one-to-one correspondence between the search direction computation and the corresponding primal-dual merit function. Without primal regularization, numerical difficulties exist when solving problems that are even moderately ill-conditioned or infeasible. The advantageous coupling of the search direction with the corresponding merit function is therefore typically abandoned as a solution is approached.

Joshua Griffin, Riadh Omheni
SAS Institute, Inc.
[email protected], [email protected]

Wenwen Zhou
SAS Institute Inc.
[email protected]

MS8

A Distributed ADMM-Like Method for Resource Sharing Under Conic Constraints Over Time-Varying Networks

In this talk, decentralized methods for solving cooperative multi-agent optimization problems over a network of agents will be discussed, where only those agents connected by an edge can directly communicate with each other. The objective is to minimize the sum of agent-specific composite convex functions, i.e., each term in the sum is a private cost function belonging to an agent, subject to a conic constraint coupling all agents' local decisions. An ADMM-like primal-dual algorithm is proposed to solve a saddle point formulation of the problem in a distributed fashion, where consensus among agents is imposed on the agents' estimates of the dual price associated with the coupling constraint. We provide convergence rates in suboptimality, infeasibility, and consensus violation in dual price estimates; examine the effect of the underlying network topology on the convergence rate of the proposed decentralized algorithm; and show how to extend this method to handle communication networks with time-varying topology and directed networks. This optimization model abstracts a number of applications in machine learning, distributed control, and estimation using sensor networks.

Necdet S. Aybat, Erfan Yazdandoost Hamedani
Penn State University
[email protected], [email protected]

MS8

Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis

We consider in this paper some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the case of no coupled constraints, we show a sublinear rate of convergence to an ε-stationary solution in the form of a variational inequality for a generalized conditional gradient method, where the convergence rate is shown to depend on the Hölder continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding ε-stationarity conditions, and propose two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization-minimization step is implemented. We show an iteration complexity bound of O(1/ε²) to reach an ε-stationary solution for both algorithms. Moreover, we show that the same iteration complexity for a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA.

Bo Jiang
School of Information Management and Engineering
Shanghai University of Finance and Economics
[email protected]

MS8

Geometric Descent Method for Composite Convex Minimization

In the talk, we discuss how to extend the geometric descent method recently proposed by Bubeck, Lee and Singh to solving composite convex minimization problems. Techniques that can potentially accelerate the algorithm are also discussed. Numerical results show that this method can be advantageous compared with Nesterov's accelerated gradient method for some applications.

Shiqian Ma, Shixiang ChenThe Chinese University of Hong [email protected], [email protected]

MS8

Gradient Sampling Methods on Riemannian Manifolds and Algebraic Varieties

Gradient sampling (GS) is a conceptually simple method for optimization of locally Lipschitz functions. The idea is to approximate the subdifferential at a given point by a convex hull of randomly sampled nearby gradients. We consider generalizations of this approach to optimization problems posed on complete Riemannian manifolds or closed algebraic varieties admitting an a-regular stratification. The latter scenario is motivated by optimization problems on low-rank matrices or tensors. The main task is to prove that the minimum norm element in a convex hull of vectors, which are obtained by transporting gradients from nearby points to the current tangent space, is a descent direction with respect to a retraction, provided that the number of sampled gradients equals at least the dimension of the manifold/variety plus one. For varieties it is further necessary to discuss what should happen at the singular points, where one may be faced with a tangent cone that is not convex. Under reasonable assumptions, we are able to prove that with probability one the limit points of our proposed GS algorithms are Clarke stationary points relative to their strata. Numerical experiments illustrate that the approach can be useful on manifolds of not too high dimension.

Andre Uschmajew, Seyedehsomayeh Hosseini
University of Bonn
[email protected], [email protected]

MS9

Fast Algorithms for Nonconvex Robust Learning

We show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set and (2) simultaneously fits the trimmed model on the remaining uncontaminated data. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through non-smooth regularization.

Aleksandr Aravkin
University of Washington
[email protected]

Damek Davis
Cornell University
[email protected]

MS9

Nonsmooth Optimization Using Taylor-Like Models: Error Bounds, Convergence, and Termination Criteria

Common algorithms rely on minimizing simple Taylor-like models of the objective function. The proximal gradient algorithm and methods of Gauss-Newton type for minimizing the composition of a convex function and a smooth map are typical examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the objective function at a nearby point. Consequently, we (1) show that the step-sizes can be reliably used to terminate the algorithm, (2) prove that as long as the step-sizes tend to zero, every limit point of the iterates is stationary, and (3) show that conditions akin to classical quadratic growth imply that the step-sizes linearly bound the distance of the iterates to the solution set. The latter so-called error bound property is typically used to establish linear (or faster) convergence guarantees. Analogous results hold when the step-size is replaced by the square root of the decrease in the model's value. All the results readily extend to the case where the models are minimized only inexactly.
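For the proximal gradient instance, the step-to-slope relationship can be read off the optimality condition of the model minimization; the following is a standard computation included here for orientation (the talk treats general Taylor-like models):

```latex
% One prox-gradient step on F = f + g, with f L-smooth and step t > 0:
x^{+} \;=\; \operatorname*{argmin}_{z}\; g(z) + f(x) + \langle \nabla f(x),\, z - x \rangle + \tfrac{1}{2t}\|z - x\|^{2}
\;\;\Longrightarrow\;\;
-\nabla f(x) - \tfrac{1}{t}\,(x^{+} - x) \,\in\, \partial g(x^{+}),
% so adding and subtracting \nabla f(x^{+}) bounds the slope at the nearby point x^{+}:
\operatorname{dist}\bigl(0,\, \partial F(x^{+})\bigr) \;\le\; \Bigl(L + \tfrac{1}{t}\Bigr)\|x^{+} - x\|.
```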

Dmitriy Drusvyatskiy
University of Washington
[email protected]

MS9

Radial Subgradient Descent

We discuss a subgradient method for minimizing non-smooth, non-Lipschitz convex functions. We extend the work of Renegar by taking a different perspective, leading to an algorithm which is conceptually more natural, has a notably improved convergence rate, and for which the analysis is surprisingly simple. Convergence occurs at a rate analogous to traditional methods that require Lipschitz continuity. At each iteration, the algorithm takes a subgradient step and then performs a line search to move radially towards (or away from) a known feasible point. Costly orthogonal projections typical of subgradient methods are entirely avoided.
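A minimal sketch of the mechanics, assuming a user-supplied oracle and a known strictly feasible point e; the step-size rule and the bisection search below are illustrative placeholders rather than the scheme analyzed in the talk:

```python
import numpy as np

def radial_subgradient(f_and_subgrad, feasible, e, x0, n_iter=1000):
    """Subgradient step followed by a radial line search (sketch).

    `f_and_subgrad(x)` returns (f(x), g) with g a subgradient, `feasible(x)`
    tests membership in the constraint set, and `e` lies in its interior.
    The line search moves the trial point along the segment toward e until
    it is feasible, so no orthogonal projection is ever computed.
    """
    x = np.asarray(x0, dtype=float)
    best = f_and_subgrad(x)[0]
    for k in range(1, n_iter + 1):
        fx, g = f_and_subgrad(x)
        best = min(best, fx)
        y = x - g / (np.sqrt(k) * (np.linalg.norm(g) + 1e-12))
        lo, hi = 0.0, 1.0                  # y + 1.0 * (e - y) = e is feasible
        for _ in range(30):                # bisection on the segment [y, e]
            mid = 0.5 * (lo + hi)
            if feasible(y + mid * (e - y)):
                hi = mid
            else:
                lo = mid
        x = y + hi * (e - y)
    return best
```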

Benjamin Grimmer
Cornell University
Department of Operations Research and Information Engineering
[email protected]

MS9

First-Order Methods for Singular Kalman Smoothing with PLQ Penalties

Kalman smoothing is widely used in science and engineering to reconstruct hidden states from a series of measurements. Classic models assume process and measurement errors have non-degenerate (full-rank) variances. However, singular covariance models can capture integrated signals and colored noise in many applications. We use a constrained reformulation to allow singular covariances in both process and measurement errors. The resulting problem can be elegantly solved in the dual, where it reduces to inverting a single block tridiagonal positive definite system, just as in the classic case. We use this insight to develop efficient first-order methods for general singular smoothers, where process and measurement losses can be piecewise linear-quadratic functions (such as Huber and the 1-norm), and simple regularizers on the state can be added as well.

Jonathan Jonker, Aleksandr Aravkin
University of Washington
[email protected], [email protected]

MS10

A Joint Routing and Speed Optimization Problem

We study a variation of the classical vehicle routing problem with time windows, where the speed used to traverse an arc is also a decision variable. Speed decisions can be used to ensure that time windows are respected by effectively speeding up the vehicle along an arc. The tradeoff is that the objective function also has terms which take into account a (nonlinear) convex function of the speed, which can be used to model energy savings and emission reduction considerations. It is worth noting that the speed optimization problem by itself is easy to solve; in other words, given a fixed route, there is an efficient algorithm to find the best set of speeds to use to traverse the route. On the other hand, if the routes are not fixed, the problem becomes the joint routing and speed optimization problem, which is extremely challenging to solve. Previous approaches involved a heuristic discretization of speeds and formulating the problem as a mixed-integer linear program, as well as formulating the exact problem as a mixed-integer nonlinear (convex) program. We propose a new set partitioning formulation for the problem, with a new pricing scheme which involves keeping track of the last time window that was hit. We test our formulation and algorithm on a variety of instances in the literature. Our algorithm is capable of solving instances of size much larger than previously reported.

Ricardo Fukasawa
University of Waterloo
[email protected]

Qie He
University of Minnesota
[email protected]

Fernando Santos
Universidade Federal de ...
[email protected]

Yongjia Song
Virginia Commonwealth University
[email protected]

MS10

Some Cut-Generating Functions for Second-Order Conic Sets

We study cut generating functions for conic sets. Our first main result shows that if the conic set is bounded, then cut generating functions for integer linear programs can easily be adapted to give the integer hull of the conic integer program. Then we introduce a new class of cut generating functions which are non-decreasing with respect to the second-order cone. We show that, under some minor technical conditions, these functions together with integer linear programming-based functions are sufficient to yield the integer hull of intersections of conic sections in R².

Asteroide Santana
Georgia Institute of Technology
[email protected]

Santanu S. Dey
Georgia Institute of Technology
School of Industrial and Systems Engineering
[email protected]

MS10

Vehicle Routing Problems with Time Windows and Convex Node Costs

We consider a variant of the vehicle routing problem with time windows, where the objective includes an inconvenience cost modeled by a convex function on each node. We propose a novel set partitioning formulation for this mixed integer convex program using the underlying block structure of the solution, by considering all possible combinations of routes and block structures. In particular, given a route r = (r_1, r_2, ..., r_n), we define a sequence of customers B = {r_b, r_{b+1}, ..., r_{b+n_b}} (1 ≤ b ≤ b + n_b ≤ n) as a block if there is no waiting between any of these customers. Based on the convexity of the node cost function, this block structure enables a recursive approach to calculate the optimal cost of a route, which enables separability in the labeling algorithm. This allows efficient dominance checking between paths by solving only one-dimensional convex optimization problems instead of higher-dimensional ones. In our computational study, we show the effectiveness of our branch-and-price algorithm compared to state-of-the-art mixed integer convex programming solvers on a set of vehicle routing problems with time windows and quadratic inconvenience costs.

Yongjia Song
Virginia Commonwealth University
[email protected]

Qie He
University of Minnesota
[email protected]

MS10

A Polyhedral Approach to Optimal Control of Switched Linear Systems

Consider a discrete-time dynamic system x_{k+1} = M_k x_k for k = 0, 1, ..., T. The matrix M_k can be chosen as one of two given matrices A or B. The matrix selection problem is defined as follows: given an initial vector x_0, select matrices M_1, ..., M_T such that a linear function over the states x_1, ..., x_T is minimized. This problem is motivated by designing combination therapies for cancer treatment, and is closely related to the matrix mortality problem and switched linear systems studied by the control community. We present a poly-time dynamic programming algorithm for the case when A and B commute. We then formulate the general problem as a mixed-integer linear program, and derive several families of facet-defining inequalities for the resulting formulation.
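A minimal sketch of the commuting case, using the observation that if A and B commute then each state depends only on the step index and on how many times A has been applied so far; the counts-based recursion below is our illustration and may differ in detail from the talk's algorithm:

```python
import numpy as np

def matrix_selection_commuting(A, B, x0, c, T):
    """Minimize sum_k c @ x_k over choices M_k in {A, B}, A and B commuting.

    Commutativity gives x_k = A^j B^(k-j) x0, depending only on (k, j) with
    j the number of A's chosen so far, so a lattice DP over (k, j) suffices.
    """
    x0 = np.asarray(x0, dtype=float)
    states = [[x0]]                        # states[k][j] = A^j B^(k-j) x0
    for k in range(1, T + 1):
        prev = states[-1]
        states.append([B @ prev[0]] + [A @ prev[j - 1] for j in range(1, k + 1)])

    value = {0: 0.0}                       # value[j] = best cost with j A's used
    for k in range(1, T + 1):
        new_value = {}
        for j, cost in value.items():
            for jj in (j, j + 1):          # apply B (count unchanged) or A
                c_k = cost + float(c @ states[k][jj])
                if c_k < new_value.get(jj, np.inf):
                    new_value[jj] = c_k
        value = new_value
    return min(value.values())
```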

Zeyang Wu
University of Minnesota
[email protected]

MS11

Exponentially Accurate Temporal Decomposition for Long Horizon Dynamic Optimization Problems

We investigate a temporal decomposition approach to long-horizon dynamic optimization problems. The problems are discrete-time, linear, time-dependent, and have bounds on the control variables. We prove that an overlapping-domains temporal decomposition, while inexact, approaches the solution of the long-horizon dynamic optimization problem exponentially fast in the size of the overlap. The resulting subproblems share no solution information and thus can be computed with trivial parallelism. Our findings are demonstrated with a small, synthetic production cost model with real demand data.

Mihai Anitescu, Wanting Xu
University of Chicago
[email protected], [email protected]

MS11

Accelerating Newton's Method for Nonlinear Parabolic Control via Reduced-Order Modeling

We use projection based reduced order models (ROMs) to substantially decrease the computational cost of Newton's method for large-scale time-dependent optimal control problems. Previous approaches of using projection based ROMs in optimization solve sequences of computationally less expensive optimization problems, which are obtained by replacing the high fidelity discretization of the governing partial differential equation (PDE) by a ROM. This may not be feasible or efficient for several nonlinear PDEs because 1) ROMs may not be well-posed because the ROM solution does not satisfy properties obeyed by the solution of the high-fidelity PDE discretization, such as conservation properties or non-negativity, 2) ROMs can become untrustworthy when the controls differ slightly from those at which the ROM was generated, or 3) validation that a ROM is still sufficiently accurate at a new control can be very expensive. We use high-fidelity PDE discretizations for objective function and gradient evaluation, and use state PDE and adjoint PDE solves to generate a ROM to compute approximate Hessian information. The ROM generation is computationally inexpensive, since it reuses information that is already computed for objective function and gradient evaluation, and the ROM-Hessian approximations are inexpensive to evaluate. The resulting method often enjoys convergence rates similar to those of Newton's method, but at the computational cost per iteration of the gradient method.

Matthias Heinkenschloss
Department of Computational and Applied Mathematics
Rice University
[email protected]

Caleb C. Magruder
Rice University, Computational and Applied Mathematics
[email protected]

MS11

New Multilevel Strategies in Optimal Feedback Control

The application of model-based optimization methods has become indispensable in science and engineering. Since all models contain systematic errors, the result of optimization can be very sensitive to such errors and even useless. Therefore, application of optimization to real-life processes demands taking uncertainties into account. One of the remedies is Nonlinear Model Predictive Control (NMPC), which is based on two steps: a simultaneous on-line estimation of the system state and parameters (Moving Horizon Estimation, MHE) and re-optimization of the optimal control for the current parameter and state values. The challenge is to solve these optimal control problems with high frequency in real time. During the last decade, significant progress has been made in the development of so-called multilevel real-time iterations to reduce the response time to a few milliseconds compared with standard methods. However, today's state of the art is to perform MHE and NMPC separately. A next logical step is the development of a simultaneous MHE and NMPC in one step. This requires both the theoretical investigation of joint sensitivity, conditioning and stability analysis of coupled MHE and NMPC, and an efficient generalization of the existing algorithmic approaches. In this talk we present combined multilevel real-time iterations based on coupling of the MHE and NMPC with inexact Newton methods, which allows for further reduction of response times.

Ekaterina Kostina, Gregor Kriwet
Heidelberg University
[email protected], [email protected]

Hans Georg Bock
IWR, University of Heidelberg
[email protected]

MS11

An Online Experimental Design Approach for Fast Detection of Extrasystoles

Networks described as graphs model many phenomena in the real world. Confronted with a spreading signal on the graph, we are often interested in the (unknown) location of its origin. The available information is usually the arrival times of the signal at certain locations/nodes in the graph (measurement points). Shortest paths and linear regression are combined to estimate the source. This is a first step, but where should the measurements be placed to gain maximal information? What are the requirements for a unique estimate of the source? These questions will be addressed in our framework of linear regression on a graph. Based on these considerations, an algorithm is proposed that places measurements iteratively on the nodes of the graph and gives a best guess for the source node in every iteration. Our test setting is derived from a heart disease (ventricular tachycardia), where a single source in one of the heart chambers initiates ill-timed heart beats. Treating the disease requires the exact source location before ablating it. After the heart chamber mesh is abstracted to a graph, our new method is applied. The talk concludes with well-performing simulation results.

Tobias Weber
Otto-von-Guericke Universitat Magdeburg
[email protected]

Sebastian Sager
Otto-von-Guericke Universitat Magdeburg, Germany
[email protected]

MS12

Sampling Stochastic Dynamic Programming for Mid-Term Hydropower Optimization

Rio Tinto Alcan (RTA) is a multinational aluminium producer with smelters in Quebec, Canada. RTA also owns and operates power houses on the Peribonka and Saguenay Rivers which supply energy to the smelters. The system, which is run by RTA's Quebec Power Operations Division, consists of 6 generating stations and 4 reservoirs, for an installed capacity of 2900 MW. One of the key decisions that have to be made for effective operation of this system is to determine the volume of water released per week at all generating stations. Stochastic Dynamic Programming (SDP) has been widely used for dealing with this kind of optimization problem. This optimization method can be applied to non-convex stochastic problems. However, the quality of the solution depends on several factors that must be carefully taken into account. For example, the temporal decomposition in SDP does not allow for modelling long time correlation in the inflows. When the system is composed of more than two reservoirs, the computation time becomes an issue, and the way the cost-to-go function in SDP is calculated and interpolated has a huge impact. This presentation will show how we customized the SDP algorithm to get robust and operational software. In particular, we will show how to extend SDP to Sampling SDP for dealing with long-time correlation and how to solve efficiently the cost-to-go function in Sampling SDP.

Pascal Cote
Rio Tinto
[email protected]

James
[email protected]

Kenjy Demeester
Ecole Polytechnique Montreal
[email protected]

MS12

Optimal Control in Hydropower Management with a Simulation and Regression Approach to Dynamic Programming

Hydropower management presents complex control problems. Considering such problems as a series of real options, we develop a least-squares Monte Carlo, dynamic programming approach. The approach is applied to find optimal storage and production decisions associated with two of Rio Tinto's installations in Canada.

Michel Denault
HEC Montreal
[email protected]

MS12

Direct Policy Search Gradient Method Using Inflow Scenarios

In the field of water reservoir operation design, stochastic dynamic programming (SDP) has been used extensively. This method imposes a time decomposition of the stochastic process to build the value functions; thus, achieving a good model of the complex spatio-temporal correlation of the streamflow is tricky. Direct Policy Search (DPS) methods optimize directly over the policy parameters within a given family of functions. The current parametrization is evaluated by simulating over an ensemble of scenarios; thus, no model of the random process is needed and the stochastic process is implicitly represented through simulation. So far, DPS has been slow to converge because it is used with inefficient optimization methods such as evolutionary algorithms. We propose a method using a gradient estimate to perform the optimization step. The gradient is estimated using the REINFORCE algorithm. This method does not require the derivative of the objective function but only the derivative of the policy function. The policy function used in our case is a radial basis function network. Sensitivity of the DPS and improvements of the method are presented based on the real case of Kemano (British Columbia, Canada). A comparison to SDP is performed and results suggest superiority of DPS thanks to better uncertainty modeling, with computation time of the same order of magnitude.

Quentin Desreumaux
University of Sherbrooke
[email protected]

Pascal Cote
Rio Tinto
[email protected]

Robert Leconte
University of Sherbrooke
[email protected]

MS12

Numerical Assessment of an Adaptive Simplicial Approximation Method for Hydroelectric Planning

We present a novel approach for approximate stochastic dynamic programming (ASDP) over a continuous state space when the optimization phase has a near-convex structure. The approach entails a simplicial partitioning of the state space. Bounds on the true value function are used to refine the partition. The purpose of our method is to permit a user-chosen balance between computational burden and accuracy. Features of the proposed ASDP scheme are: (i) the sample state grid is constructed on-line on the basis of optimality gap estimation; (ii) for revenue maximization, a best piecewise linear concave fit to grid point evaluations; (iii) for each grid point, optimization using an easily computable generalized linear program; and (iv) the value function converges to its concave envelope. The proposed method is suited for multi-echelon problems with stocks and flows embedded in a network structure, with possibly stochastic inflows and outflows, such as mid-term operations planning in an arborescent hydraulic energy production system. We empirically examine trade-offs between complexity and accuracy, where complexity has two dimensions: the number of reservoirs n and the number of grid points κ. We present numerical tests to investigate (i) accuracy vs. state grid density, and (ii) accuracy vs. state space dimension. The approach is experimented on several configurations of hydroenergy systems.

Luckny Zephyr
Cornell University
[email protected]

Pascal Lang, Bernard F. Lamond
Universite Laval
[email protected], [email protected]

Pascal Cote
Rio Tinto
[email protected]

MS13

Nonconvex Sparse Poisson Intensity Reconstruction for Time-Dependent Tomography

While convex optimization for low-light imaging has received some attention from the imaging community, non-convex optimization techniques for photon-limited imaging are still in their nascent stages. In this talk, we propose a novel stage-based non-convex approach to recover high-resolution sparse signals from low-dimensional measurements corrupted by Poisson noise. We incorporate gradient-based information to construct a sequence of quadratic subproblems with a nonconvex ℓp-norm (0 ≤ p < 1) penalty term to promote sparsity. We demonstrate the effectiveness of the proposed approach by solving two time-dependent medical imaging applications: bioluminescence tomography and fluorescence lifetime imaging.

Lasith Adhikari, Arnold D. Kim
Applied Mathematics Unit
University of California, Merced
[email protected], [email protected]

Roummel F. Marcia
University of California, Merced
[email protected]

MS13

Kernel-Based Reconstruction Algorithm for Anatomically-Aided Optical Tomography

Abstract not available at time of publication.

Reheman Baikejiang
University of California, Merced
[email protected]

MS13

Extended Gauss-Newton-Type Methods for a Class of Matrix Optimization Problems

We develop a Gauss-Newton (GN) framework for solving a class of nonconvex optimization problems involving low-rank matrix variables. As opposed to the standard GN method, our framework allows one to handle a general smooth convex cost function via its surrogate. The main complexity per iteration consists of the inverse of two rank-size matrices and at most six small matrix multiplications to compute a closed-form Gauss-Newton direction, and a backtracking line search. We show, under mild conditions, that the proposed algorithm globally and locally converges to a stationary point of the original nonconvex problem. We also show empirically that the Gauss-Newton algorithm achieves much more accurate solutions than the well-studied alternating direction method (ADM). Then, we specify our Gauss-Newton framework to handle the symmetric case and prove its convergence, where ADM is not applicable without lifting variables. Next, we incorporate our Gauss-Newton scheme into the alternating direction method of multipliers (ADMM) to design a GN-ADMM algorithm for solving the low-rank optimization problem. We prove, under a proper choice of the penalty parameter, that our GN-ADMM globally converges to a stationary point. Finally, we apply our algorithms to several practical problems such as matrix completion, robust low-rank matrix recovery, and matrix recovery in quantum tomography. The experiments provide encouraging results motivating the use of nonconvex optimization.

Quoc Tran-Dinh
Department of Statistics and Operations Research
UNC-Chapel Hill, NC
[email protected]

Zheqi Zhang
Department of Statistics and Operations Research
UNC-Chapel Hill
[email protected]

MS14

Adaptive Sampling Strategies for Stochastic Optimization

Variance reduction techniques for stochastic gradient methods have recently received much attention. In this talk, we present an optimization method that achieves variance reduction by controlling sample sizes. It includes a test for detecting the transient behavior of the stochastic gradient (SG) method, and begins increasing sample sizes only after the initial phase of SG is complete. From this point forward, sample sizes are determined at every iteration by a new condition proposed in this talk. We present numerical experiments on logistic regression problems.
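For concreteness, here is a sketch of one variance-based sample-size test of this general flavor (the classical norm test with parameter theta; the talk's own condition and schedule are different and not reproduced here):

```python
import numpy as np

def sgd_step_with_sample_test(per_example_grads, x, batch, lr=0.1, theta=0.9):
    """One SGD step plus a norm-test sample-size update (sketch).

    `per_example_grads(x, batch)` returns an array of shape (batch_size, d).
    If the sample variance of the gradients is large relative to the squared
    norm of their mean, a larger sample size is suggested for later steps.
    """
    G = per_example_grads(x, batch)
    g = G.mean(axis=0)
    var = G.var(axis=0, ddof=1).sum()          # trace of the sample covariance
    S = len(batch)
    next_S = S
    if var / S > theta**2 * (g @ g):           # test failed: grow the sample
        next_S = int(np.ceil(var / (theta**2 * (g @ g) + 1e-12)))
    return x - lr * g, next_S
```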

Raghu Bollapragada
Department of Industrial Engineering and Management Sciences
Northwestern University
[email protected]

Richard H. Byrd
University of Colorado
[email protected]

Jorge Nocedal
Department of Electrical and Computer Engineering
Northwestern University
[email protected]

MS14

Exact and Inexact Subsampled Newton Methods for Optimization

We study the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We show how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. We also consider inexact Newton methods and investigate which linear solver is most effective in terms of computational complexity.
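A minimal sketch of the inexact variant, where the subsampled Newton system is solved by conjugate gradients via Hessian-vector products (the talk's sample-size coordination and solver comparisons are not reproduced; the CG budget here is a placeholder):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def subsampled_newton_step(x, grad, hess_vec_subsampled, lr=1.0, cg_maxiter=20):
    """One inexact subsampled Newton step (sketch).

    `grad` is the gradient estimate at x and `hess_vec_subsampled(v)` returns
    H_S v, a Hessian-vector product taken on a random subsample S.  The
    Newton system H_S d = -grad is solved approximately with CG.
    """
    d = x.size
    H = LinearOperator((d, d), matvec=hess_vec_subsampled)
    direction, _ = cg(H, -grad, maxiter=cg_maxiter)
    return x + lr * direction
```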

Jorge Nocedal
Department of Electrical and Computer Engineering
Northwestern University
[email protected]

Raghu Bollapragada
Department of Industrial Engineering and Management Sciences
Northwestern University
[email protected]

Richard H. Byrd
University of Colorado
[email protected]

MS14

Random Projections for Faster Second-Order Constrained Optimization Methods

In this talk we consider random projection methods for solving convex optimization problems with a convex constraint. First, we describe a geometric relation between the complexity and optimality of random projection methods and information-theoretic lower bounds. Then, we present a novel method which iteratively refines the solutions subject to constraints and achieves statistical optimality. We then generalize our method to optimizing arbitrary convex functions of a large data matrix subject to constraints. The proposed method is a faster randomized version of the well-known Newton's method. We show that the proposed method enables solving large-scale optimization and statistical inference problems orders of magnitude faster than existing methods.

Mert Pilanci
Department of Mathematics
Stanford University
[email protected]

MS14

The Role of Over-Parametrization in Optimization with a Focus on Deep Learning

Optimization of a real-valued non-convex function over a high-dimensional space can be a challenge; however, recent results in spin glass theory provide us with insight: Auffinger et al., "Complexity of random smooth functions on the high-dimensional sphere", 2013, show that the local minima of the Hamiltonian of a spherical spin system concentrate at a level that is slightly above its ground state. This concentration can empirically be observed even when there are finitely many spins. We demonstrate that gradient descent finds those points whose energy is at the concentration level. We then move on to experiments that observe a similar concentration when optimizing the loss functions of deep learning. In particular, we observe that the solutions found by both noisy and noise-less algorithms lead the system to similar levels for both the test and training loss values. We furthermore demonstrate that this kind of concentration is observed only when the system is large enough, in other words, when the system is over-parametrized.

Levent Sagun
Courant Institute of Mathematical Sciences, NYU
[email protected]

MS15

SVD-Free Algorithms for Low-Rank Matrix Recovery

We propose a non-convex algorithmic approach for recovering low-rank matrices from a small number of linear observations (samples). We demonstrate that our approach: (i) achieves optimal sample complexity; (ii) exhibits nearly-linear time complexity for constant-rank matrices with no extra spectral assumptions; and (iii) is stable to noise and model mismatch. To the best of our knowledge, this is the first approach that achieves all three of these properties. A distinction from existing methods is that our algorithm does not involve a full singular value decomposition (SVD), and instead uses a careful synthesis of techniques from randomized low-rank approximation schemes. Joint work with Piotr Indyk and Ludwig Schmidt.

Chinmay Hegde
Iowa State University
[email protected]

MS15

Nonconvex Proximal Primal-Dual Methods

We propose a new decomposition approach named the proximal primal-dual algorithm (Prox-PDA) for smooth nonconvex linearly constrained optimization problems. The proposed approach is a primal-dual scheme, where the primal step minimizes a certain approximation of the augmented Lagrangian of the problem, and the dual step performs an approximate dual ascent. The approximation used in the primal step is able to decompose the variable blocks, making it possible to obtain simple subproblems by leveraging the problem structures. Theoretically, we show that whenever the penalty parameter in the augmented Lagrangian is larger than a given threshold, the Prox-PDA converges to the set of stationary solutions, globally and in a sublinear manner (i.e., a certain measure of stationarity decreases at the rate O(1/r), where r is the iteration counter). Interestingly, when applying a variant of the Prox-PDA to the problem of distributed nonconvex optimization (over a connected undirected graph), the resulting algorithm coincides with the popular EXTRA algorithm [Shi et al 2014], which is only known to work in convex cases. Our analysis implies that EXTRA and a few of its variants converge globally sublinearly to stationary solutions of certain nonconvex distributed optimization problems. There are many possible ways to generalize the Prox-PDA, and we present one such generalization and apply it to a certain nonconvex distributed matrix factorization problem.
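Schematically, for min_x f(x) subject to Ax = 0, one iteration of a prox-type primal-dual scheme of this kind takes the following form; the matrix B and penalty ρ below are placeholders whose admissible choices are the subject of the talk's analysis:

```latex
% Primal step: minimize an approximation of the augmented Lagrangian,
% with a proximal term that decouples the variable blocks:
x^{k+1} = \arg\min_{x}\; f(x) + \langle \lambda^{k},\, Ax \rangle
          + \tfrac{\rho}{2}\|Ax\|^{2} + \tfrac{\rho}{2}\|x - x^{k}\|_{B^{\top}B}^{2},
% Dual step: approximate dual ascent on the multiplier:
\lambda^{k+1} = \lambda^{k} + \rho\, A x^{k+1}.
```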

Mingyi Hong
Iowa State University
[email protected]

MS15

Stochastic Gradient Descent for Nonconvex Problems

Abstract not available at time of publication.

George Lan
Dept of ISE, University of Florida
[email protected]

MS15

Distributed Stochastic Second-Order Method for Training Large Scale Deep Neural Networks

Training deep neural networks is a high-dimensional and highly non-convex optimization problem. Stochastic gradient descent (SGD) and its variants are the current state-of-the-art solvers for this task. However, due to the non-convex nature of the problem, it has been observed that SGD slows down near saddle points. Recent empirical work claims that by detecting and escaping saddle points efficiently, training performance is likely to improve. With this objective, we revisit Hessian-free and quasi-Newton methods for deep networks. We show that these methods obtain almost linear scaling and do not suffer from extensive communication compared to distributed SGD methods.

Xi He
Lehigh University
[email protected]

Dheevatsa Mudigere
Parallel Computing Laboratory
Intel Corporation, Bangalore, India
[email protected]

Mikhail Smelyanskiy
Intel Corporation
[email protected]

Martin Takac
Lehigh University
[email protected]

MS16

Characterization of the Robust Isolated Calmness for a Class of Conic Programming

This talk is devoted to studying the robust isolated calmness of the Karush-Kuhn-Tucker (KKT) solution mapping for a large class of interesting conic programming problems (including most commonly known ones arising from applications) at a locally optimal solution. Under the Robinson constraint qualification, we show that the KKT solution mapping is robustly isolated calm if and only if both the strict Robinson constraint qualification and the second order sufficient condition hold. This implies, among others, that at a locally optimal solution the second order sufficient condition is needed for the KKT solution mapping to have the Aubin property.

Chao Ding
National University of Singapore
[email protected]

Defeng Sun
Dept of Mathematics
National University of Singapore
[email protected]

Liwei Zhang
Dalian University of Technology
[email protected]

MS16

Primal-Dual Algorithm for Symmetric Programming Problems with Nonlinear Objective Functions

We describe a primal-dual algorithm for symmetric programming problems with nonlinear objective functions. A class of nonlinear objective functions is described for which the complexity estimates are (qualitatively) the same as in the case of linear objective functions. The class includes the quantum entropy function arising in applications ranging from quantum physics to quantum information theory.

Leonid Faybusovich
Department of Mathematics
University of Notre Dame
[email protected]

MS16

Inner and Outer Approximations of the Semidefinite Cone Using SD Bases and their Applications to Some NP-Hard Problems

In a previous paper, some of the authors introduced the semidefinite basis, a basis of the space of symmetric matrices consisting of rank-1 semidefinite matrices. In this talk, we present polyhedral inner and outer approximations of the positive semidefinite cone and the completely positive cone using semidefinite bases. The conical hull of a semidefinite basis gives an inner approximation of the positive semidefinite cone. If the elements of a semidefinite basis are nonnegative matrices, it gives an inner approximation of the completely positive cone. Supporting hyperplanes at every point of a semidefinite basis configure an outer approximation of these cones. Preliminary numerical experiments suggest that linear optimization problems over our polyhedral cones give good upper and lower bounds for standard quadratic programming and randomly generated doubly nonnegative programming problems.

Daigo Narushima
University of Tsukuba
[email protected]

Akihiro Tanaka
University of Tsukuba
Faculty of Engineering, Information and Systems
[email protected]

Akiko Yoshise
University of Tsukuba
Graduate School of Systems and Information Engineering
[email protected]

MS16

New Developments in Primal-Dual Interior-Point Algorithms for Convex Optimization

We will discuss some of the recent developments in primal-dual interior-point algorithms for convex optimization problems in conic form as well as in some other forms. For these approaches we will analyze the advantages and disadvantages of (i) utilizing only first-order information versus occasionally some second-order information, and also (ii) enforcing, to various degrees, primal-dual symmetry of the underlying algorithms.

Levent Tuncel
University of Waterloo
Dept. of Combinatorics and Optimization
[email protected]

MS17

Tensor Eigenvalue Complementarity Problems

This talk discusses tensor eigenvalue complementarity problems. Basic properties of standard and complementarity tensor eigenvalues are given. We formulate tensor eigenvalue complementarity problems as constrained polynomial optimization. When one tensor is strictly copositive, the complementarity eigenvalues can be computed by solving polynomial optimization with normalization by strict copositivity. When no tensor is strictly copositive, we formulate the tensor eigenvalue complementarity problem equivalently as polynomial optimization by a randomization process. The complementarity eigenvalues can be computed sequentially. The formulated polynomial optimization can be solved by Lasserre's hierarchy of semidefinite relaxations. We show that it has finite convergence for general tensors.
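For orientation, one common formulation of the complementarity eigenvalue problem for a pair of order-m tensors A and B is the following (our summary of the standard definition, not quoted from the talk):

```latex
% Find (\lambda, x) with x \neq 0 such that
0 \;\le\; x \;\perp\; \bigl(\lambda\, \mathcal{B}x^{m-1} - \mathcal{A}x^{m-1}\bigr) \;\ge\; 0,
% where (\mathcal{A}x^{m-1})_{i} = \sum_{i_2, \dots, i_m} \mathcal{A}_{i\, i_2 \cdots i_m} x_{i_2} \cdots x_{i_m}
% and u \perp v means u^{\top} v = 0.
```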

Jinyan Fan
Department of Mathematics, Shanghai Jiao Tong University
Shanghai 200240, P.R. China
[email protected]

Jiawang Nie
University of California, San Diego
[email protected]

Anwa Zhou
Shanghai University
[email protected]

MS17

More NP-Hard Tensor Problems

We discuss exciting recent developments in the computational complexity of various tensor problems: (i) Shitov has shown that tensor rank over the integers is undecidable and that symmetric tensor rank is NP-hard, as conjectured in earlier work of Hillar and Lim; (ii) Schaefer and Stefankovic have shown that determining tensor rank over any field F is ∃F-complete, providing an alternative proof of the NP-completeness of tensor rank over finite fields and a refinement of its NP-hardness over other fields; (iii) Friedland and Lim have shown that the tensor nuclear norm is NP-hard for tensors of order 3 or higher over the reals, and for tensors of order 4 or higher over the complex numbers (which remains true even if restricted to bi-Hermitian, bisymmetric, positive semidefinite, nonnegative-valued 4-tensors). Time permitting, we will discuss the complexity of more specific tensor problems in optimization, such as deciding self-concordance of a convex function or membership problems in SOS and SOS-convex cones of quartic polynomials. We will suggest open problems on the complexity of two newly discovered notions: tensor nuclear rank and tensor network ranks.

Lek-Heng Lim
University of Chicago
[email protected]

MS17

Geometric Measure of Entanglement of Symmetric D-Qubits is Polynomial-Time Computable

We give a simple formula for finding the spectral norm of a d-mode symmetric tensor in two variables over the complex or real numbers in terms of the complex or real roots of a corresponding polynomial in one complex variable. This result implies that the geometric measure of entanglement of symmetric d-qubits is polynomial-time computable. We discuss a generalization to d-mode symmetric tensors in more than two variables.

Shmuel Friedland
University of Illinois at Chicago
[email protected]

Li Wang
University of Illinois at Chicago
[email protected]

MS17

Real Eigenvalues of Nonsymmetric Tensors

This talk discusses the computation of real Z-eigenvalues of nonsymmetric tensors. A general nonsymmetric tensor has finitely many Z-eigenvalues, while there may be infinitely many for special tensors. We propose Lasserre-type semidefinite relaxation methods for computing such eigenvalues. For every nonsymmetric tensor that has finitely many real Z-eigenvalues, we can compute all of them; each of them can be computed by solving a finite sequence of semidefinite relaxations.

Xinzhen Zhang
Department of Mathematics
Tianjin University
[email protected]

Jiawang Nie
University of California, San Diego
[email protected]

MS18

High-Order Taylor Expansions for Compressible Flows

Sensitivity analysis for systems governed by partial differential equations is now commonly used by engineers to assess performance modification due to parameter changes. A typical illustration concerns shape optimization procedures based on the adjoint method, used in aeronautics to improve the aerodynamic or structural performance of aircraft. However, these approaches are usually limited to first-order derivatives and steady PDE systems, due to the complexity of extending the adjoint method to higher-order derivatives and the associated reverse time integration. Alternatively, this work investigates the use of the direct differentiation approach (continuous sensitivity equation method) to estimate high-order derivatives for unsteady flows. We show how this method can be efficiently implemented in existing solvers, with the goal of providing a Taylor expansion of the PDE solution with respect to control parameters. Applications to optimization and uncertainty estimation are finally considered.

Regis Duvigneau
INRIA Sophia Antipolis
[email protected]

MS18

A Comparison of Regularization Methods for Gaussian Processes

Gaussian Processes (GPs) are classical probabilistic models to represent the results of experiments on grids of points. They have numerous applications, in particular in nonlinear global optimization when the experiments (typically PDE simulations) are costly. GPs require the inversion of a covariance matrix. There are many situations, in particular optimization, when the density of experiments becomes higher in some regions of the search space, which makes the covariance matrix ill-conditioned, an issue which is handled in general through regularization techniques. Today, the need to better understand and improve regularization remains. The two most classical regularization methods are i) the pseudoinverse (PI) and ii) adding a small positive constant to the main diagonal (which is called nugget regularization). This work provides new algebraic insights into PI and nugget regularizations. It is proven that pseudoinverse regularization averages the output values and makes the variance null at redundant points. On the opposite, nugget regularization lacks interpolation properties but preserves a non-zero variance at every point. However, these two regularization techniques become similar as the nugget value decreases. A new distribution-wise GP is then introduced which interpolates Gaussian distributions instead of data points and mitigates the drawbacks of pseudoinverse and nugget regularized GPs.
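The two regularizations are easy to contrast numerically. A minimal sketch with a squared-exponential kernel and a deliberately duplicated input; the data and kernel are placeholders, and only the averaging-versus-nugget contrast described above is being illustrated:

```python
import numpy as np

def gp_mean_weights(K, y, reg="nugget", eps=1e-8):
    """Weights alpha of a GP posterior mean m(x) = k(x) @ alpha (sketch)."""
    if reg == "nugget":
        return np.linalg.solve(K + eps * np.eye(len(K)), y)   # nugget on diagonal
    return np.linalg.pinv(K) @ y                              # pseudoinverse

# a duplicated point with conflicting outputs makes K singular
X = np.array([0.0, 1.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 3.0, 2.0])
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)  # squared-exponential kernel
print(K @ gp_mean_weights(K, y, "pinv"))    # averages the conflicting 1 and 3
print(K @ gp_mean_weights(K, y, "nugget"))  # smooths, no longer interpolates
```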

Rodolphe Le Riche
Ecole des Mines de St Etienne
[email protected]

Hossein Mohammadi, Nicolas Durrande, Eric Touboul, Xavier Bay
Ecole des Mines de Saint-Etienne
[email protected], [email protected], [email protected], [email protected]
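
A minimal numerical sketch of the two regularizations compared above, using a squared-exponential kernel and NumPy only (our illustration, not the authors' code; the kernel and data are made up):

    import numpy as np

    def sq_exp_kernel(X, Y, lengthscale=1.0):
        # Squared-exponential covariance k(x, y) = exp(-||x - y||^2 / (2 l^2)).
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    X = np.array([[0.0], [1.0], [1.0]])   # note the redundant point at x = 1
    y = np.array([0.0, 1.0, 1.2])         # conflicting outputs at that point
    K = sq_exp_kernel(X, X)               # singular: two identical rows

    # (i) pseudoinverse (PI) regularization
    alpha_pi = np.linalg.pinv(K) @ y

    # (ii) nugget regularization: small constant added to the diagonal
    tau = 1e-6
    alpha_nug = np.linalg.solve(K + tau * np.eye(len(y)), y)

    x_star = np.array([[1.0]])
    k_star = sq_exp_kernel(x_star, X)
    print("PI mean at x=1:    ", float(k_star @ alpha_pi))   # ~ (1 + 1.2) / 2
    print("nugget mean at x=1:", float(k_star @ alpha_nug))

With a redundant point at x = 1 carrying conflicting outputs, the PI prediction returns their average, matching the algebraic result stated in the abstract; as tau shrinks, the nugget prediction approaches the same value.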

MS18

Bayesian Optimization Approaches to Compute Nash Equilibria

Game theory nowadays finds a broad range of applications in engineering and machine learning. However, in a derivative-free, expensive black-box context, very few algorithmic solutions are available to find game equilibria. Here, we propose a novel Gaussian-process based approach for solving games in this context. We follow a classical Bayesian optimization framework, with sequential sampling decisions based on acquisition functions. Two strategies are proposed, based either on the probability of achieving equilibrium or on the Stepwise Uncertainty Reduction paradigm. Practical and numerical aspects are discussed in order to enhance the scalability and reduce computation time. Our approach is evaluated on several synthetic game problems with varying numbers of players and decision space dimensions. We show that equilibria can be found reliably for a fraction of the cost (in terms of black-box evaluations) compared to classical, derivative-based algorithms.

Victor Picheny
MIAT, Universite de Toulouse and INRA
[email protected]

Mickael Binois
University of Chicago
[email protected]

Abderrahmane Habbal
Universite Cote d'Azur UCA, UNS, CNRS and Inria, France
[email protected]

MS18

Surrogates and Classification Approaches for Efficient Global Optimization (EGO) with Inequality Constraints

In this work, we compare the use of Gaussian Process (GP) models for the constraints [Schonlau 1997] with a classification approach relying on a Least-Squares Support Vector Machine (LS-SVM) [Suykens and Vandewalle 1999]. We propose several adaptations of the classification approach in order to improve the efficiency of the EGO procedure, in particular an extension of the binary LS-SVM classifier to come up with a probabilistic estimation of the feasible domain. The efficiencies of the GP models and classification methods are compared in terms of computational complexity, distinguishing the construction of the GP models or LS-SVM classifier from the resolution of the optimization problem. The effect of the number of design parameters on the numerical costs is also investigated. The approaches are tested on the optimization of a complex nonlinear fluid-structure interaction system modeling a two-dimensional flexible hydrofoil. Multiple design variables, defining the unloaded geometry of the foil and the characteristics of its elastic trailing edge, are used in the minimization of the foil's drag, under constraints set to ensure minimal lift force and prevent cavitation at selected boat speeds.

Matthieu Sacher
IRENAV, Ecole Navale
[email protected]

Regis Duvigneau
INRIA Sophia Antipolis
[email protected]

Olivier Le Maitre
Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur
[email protected]

Mathieu [email protected]

Elisa Berrini
Universite Cote d'Azur, INRIA
[email protected]

Frederic Hauville, Jacques-Andre Astolfi
IRENAV, Ecole Navale
[email protected], [email protected]

MS19

Ellipsoid Method vs Accelerated Gradient Descent

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method. We provide some numerical evidence that the new method can be superior to Nesterov's accelerated gradient descent.

Sebastien Bubeck, Yin Tat Lee, Mohit Singh
Microsoft Research
[email protected], [email protected], [email protected]

MS19

A New Perspective on Boosting in Linear Regression via First Order Methods

Boosting is one of the most powerful and popular tools in machine learning/statistics and is widely used in practice. Boosting algorithms work extremely well in a variety of applications. However, little is known about many of the statistical and computational properties of these algorithms, and in particular their interplay. We analyze boosting algorithms in linear regression from the perspective of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FSε) and least squares boosting (LS-Boost(ε)), can be viewed as subgradient descent to minimize the maximum absolute correlation between features and residuals. We also propose a modification of FSε that yields an algorithm for the LASSO, and that computes the LASSO path. We derive novel comprehensive computational guarantees for all of these boosting algorithms, which provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a pre-specified learning rate for a fixed but arbitrary number of iterations, for any dataset.

Rahul Mazumder
MIT
[email protected]

Robert M. Freund
MIT, Sloan School of Management
[email protected]

Paul Grigas
UC Berkeley
[email protected]
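
A minimal sketch of least squares boosting with learning rate ε, one plausible reading of LS-Boost(ε) (our illustration; names and data are made up):

    import numpy as np

    def ls_boost(X, y, eps=0.1, iters=500):
        # LS-Boost(eps), sketched: repeatedly pick the feature most
        # correlated with the current residual and move its coefficient
        # by eps times the univariate least-squares fit to the residual.
        n, p = X.shape
        beta, r = np.zeros(p), y.astype(float).copy()
        for _ in range(iters):
            corr = X.T @ r
            j = int(np.argmax(np.abs(corr)))       # most correlated feature
            step = corr[j] / (X[:, j] @ X[:, j])   # univariate LS coefficient
            beta[j] += eps * step
            r -= eps * step * X[:, j]
        return beta

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)
    print(ls_boost(X, y)[[0, 3]])                  # roughly (1, -2)

Stopping earlier, or shrinking eps, imparts more regularization; this is the data-fidelity/regularization trade-off quantified by the guarantees in the abstract.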

MS19

An Optimal First-order Method Based on Optimal Quadratic Averaging

In a recent paper, Bubeck, Lee, and Singh introduced a new first order method for minimizing smooth strongly convex functions. Their geometric descent algorithm, largely inspired by the ellipsoid method, enjoys the optimal linear rate of convergence. We show that the same iterate sequence is generated by a scheme that in each iteration computes an optimal average of quadratic lower-models of the function. Indeed, the minimum of the averaged quadratic approaches the true minimum at an optimal rate. This intuitive viewpoint reveals clear connections to the original fast-gradient methods and cutting plane ideas, and leads to limited-memory extensions with improved performance.

Scott Roy
University of Washington
[email protected]

Dmitriy Drusvyatskiy
University of Washington
[email protected]

Maryam Fazel
University of Washington, Electrical Engineering
[email protected]

MS19

A Unified Convergence Bound for Conjugate Gradient and Accelerated Gradient

We propose a single potential quantity that decreases at a constant rate on every iteration for both Nesterov's accelerated gradient method for smooth, strictly convex functions and linear conjugate gradient for convex quadratic functions. Analysis of this potential leads to the optimal convergence rate for both methods, namely, accuracy ε is attained in a number of iterations proportional to log(1/ε) times the square root of L/ℓ, where L and ℓ are the parameters of strict convexity. The purpose of this analysis is to suggest new nonlinear extensions of conjugate gradient.

Stephen A. Vavasis
University of Waterloo, Dept of Combinatorics & Optimization
[email protected]

Sahar Karimi
University of Waterloo
[email protected]

MS20

Smoothing Technique for Nonsmooth Composite Minimization with Linear Operator

We introduce and analyze an algorithm for the minimization of convex functions that are the sum of differentiable terms and proximable terms composed with linear operators. The method builds upon the recently developed smoothed gap technique. In addition to a precise convergence speed result, valid even in the presence of linear equality constraints, it allows an explicit treatment of the gradient of differentiable functions and is equipped with a line search. We also study the consequences of restarting the algorithm at a given frequency. These features are not classical for primal-dual methods and allow us to solve difficult large-scale convex optimization problems. We illustrate the performance of the algorithm on basis pursuit, TV-regularized least squares regression and L1 regression problems.

Volkan Cevher
EPFL
[email protected]

Olivier Fercoq
Telecom-ParisTech, CNRS LTCI, Universite Paris-Saclay
75013 Paris, France
[email protected]

Quang [email protected]

MS20

Let’s Make Block Coordinate Descent Go Fast!

Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization. They are often ideal for large-scale optimization problems because of their cheap iteration costs, low memory requirements, and amenability to parallelization. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this work we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies and provide a general convergence result that holds for general block partitioning and selection strategies; (ii) explore block update strategies that exploit problem structure or higher-order information to yield faster local convergence rates; (iii) consider optimal active manifold identification and superlinear or finite termination for problems with sparse solutions; and (iv) explore the use of message-passing to efficiently compute optimal block updates for problems with a certain sparsity structure. We support all of our findings with numerical results for the classic machine learning problems of logistic regression, multi-class logistic regression, support vector machines, and label propagation.

Julie Nutini, Issam Laradji, Mark Schmidt
University of British Columbia
[email protected], issamou(at)cs.ubc.ca, [email protected]
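
As a toy illustration of a greedy block-selection rule (a Gauss-Southwell-style score scaled by per-block Lipschitz constants; this sketch is ours, not the authors' implementation):

    import numpy as np

    def bcd_gauss_southwell(grad_f, x, blocks, lipschitz, iters=100):
        # Greedy BCD: pick the block with the largest scaled gradient norm
        # and take a gradient step on that block only.
        for _ in range(iters):
            g = grad_f(x)
            scores = [np.linalg.norm(g[b]) / np.sqrt(lipschitz[i])
                      for i, b in enumerate(blocks)]
            i = int(np.argmax(scores))            # greedy block selection
            b = blocks[i]
            x[b] -= g[b] / lipschitz[i]           # block gradient update
        return x

    # Example: quadratic f(x) = 0.5 x'Ax - b'x with two coordinate blocks.
    A = np.array([[3.0, 0.5, 0.0], [0.5, 2.0, 0.2], [0.0, 0.2, 1.0]])
    b = np.ones(3)
    x = bcd_gauss_southwell(lambda z: A @ z - b, np.zeros(3),
                            blocks=[np.array([0, 1]), np.array([2])],
                            lipschitz=[3.5, 1.2])
    print(x, np.linalg.solve(A, b))   # the iterate approaches the solution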

MS20

Efficiency of Coordinate Descent on Structured Problems

We present complexity bounds for a variant of the accelerated coordinate descent method and show how these estimates can be obtained using the technique of randomized estimate sequences. Our results show that coordinate descent methods often outperform the standard fast gradient methods on optimization problems with dense data. In many important situations, the computational expenses of the oracle and the method itself at each iteration of our scheme are perfectly balanced (both depend linearly on the dimensions of the problem). As an application, we consider unconstrained convex quadratic optimization and problems arising from the application of the smoothing technique to nonsmooth problems. On some special problem instances, the provable acceleration factor with respect to the fast gradient method can reach the square root of the number of variables.

Sebastian Stich
Universite catholique de Louvain
[email protected]

Yurii Nesterov
CORE, Universite catholique de Louvain
[email protected]

MS20

Flexible Coordinate Descent

I will present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block, and the model is minimized approximately/inexactly to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and acceptance of large step sizes. I present preliminary numerical results to demonstrate the practical performance of the method.

Rachael Tappenden
[email protected]

MS21

Ultrasound Tomography for Breast Cancer Detection

We present imaging techniques for breast cancer detection with ultrasound, which can be stated as a PDE-constrained optimization problem governed by the (visco-)acoustic wave equation. We discuss models and algorithmic strategies for the following three steps towards creating a tomographic image of the human breast tissues. First, the acquisition geometry of sending and receiving ultrasound transducers is determined using A-optimal experimental design with a randomized trace estimator. To reduce the computational costs, we create initial models from ray-based time-of-flight tomography and utilize spatial symmetries to extrapolate a 3D experimental design from a set of 2D configurations. Second, the emitting transducer elements acting as external pressure sources need to be calibrated, which amounts to solving a linear-quadratic control problem. In a third step, we discuss trust-region methods with inexactly computed derivatives to image the sound speed of the breast tissue. Here, the amount of measurement data and the frequency bandwidths are gradually increased during the inversion subject to homogenization constraints. This enables us to increase the resolution without losing the long-wavelength components of the image, and to improve the computational efficiency. We show numerical examples using Salvus, an open-source package for full-waveform modeling and inversion based on PETSc and Eigen.

Christian Boehm, Naiara Korta Martiartu, Andreas Fichtner
ETH Zurich
[email protected], [email protected], [email protected]

MS21

On Optimal Experimental Design Problems for PDEs

Optimal experimental design (OED) tries to improve the accuracy of parameter estimation procedures under the influence of measurement errors. In this presentation we address OED problems in which the underlying state dynamics are given by an elliptic or parabolic partial differential equation. Improvements in the estimation accuracy are achieved by optimizing the experimental conditions, which may comprise initial conditions, volume or boundary control inputs, and sensor locations. We address formulations and algorithms for the solution of OED problems for PDEs, focusing on aspects related to the high dimension of the discretized problem.

Roland Herzog
Technische Universitat Chemnitz
[email protected]

MS21

Nonlinear Robust PDE Constrained Optimization Using a Second Order Approximation Technique and Model Order Reduction

We investigate a nonlinear constrained optimization problem with uncertain parameters. Using a robust worst-case formulation we obtain an optimization problem of bi-level structure. This type of problem is difficult to treat computationally, and hence suitable solution methods have to be developed. We propose and investigate a strategy utilizing a quadratic approximation of the robust formulation as a surrogate model, combined with an adaptive strategy to control the introduced error. The developed method is then applied to a shape optimization problem governed by a nonlinear partial differential equation with application in electric motor design. The goal is to optimize the volume and position of the permanent magnet in the rotor of the machine while maintaining a given performance level. These quantities are computed from the magnetic vector potentials given by the magnetostatic approximation of Maxwell's equation with transient movement of the rotor. Utilizing the robust optimization framework we account for uncertainties in material and production precision. The robustification of the optimization problem leads to high computational cost. Model order reduction is a promising choice when working in the context of partial differential equations. By generating reliable reduced order models with a posteriori error control, the goal is to accelerate the computation. Numerical results are presented to validate the presented approach.

Oliver Lass, Stefan Ulbrich
Technische Universitaet Darmstadt, Fachbereich Mathematik
[email protected], [email protected]

MS21

Approximation and Reduction of Optimal Control Problems in Infinite Dimension

This talk is concerned with the optimal control of nonlinear evolution equations in Hilbert spaces. The approach fits into the long tradition of seeking slaving relationships between the small scales and the large ones, but differs by the introduction of a new type of manifold to do so, namely the finite-horizon parameterizing manifolds introduced by Chekroun and Liu (2015). A control in feedback form will be derived from the reduced HJB equation associated with the corresponding optimal control problem for the surrogate system. Rigorous error estimates will also be derived and an application to a delay equation will be discussed. This is joint work with M. D. Chekroun (University of California, Los Angeles) and Axel Kroner (INRIA Saclay and CMAP, Ecole polytechnique, France).

Mickael Chekroun
University of California, Los Angeles
Department of Atmospheric and Oceanic Sciences
[email protected]

Axel Kroener
INRIA Saclay and CMAP, Ecole Polytechnique
[email protected]

Honghu Liu
Virginia Tech, Department of Mathematics
[email protected]

MS22

Compact Representation of Quasi-Newton Matrices

Quasi-Newton methods for minimizing a continuously differentiable function generate a sequence of matrix approximations to the second derivative. Perhaps the most well-known class of quasi-Newton matrices is the Broyden family of updates. Two of the most widely used updates from this family are the Broyden-Fletcher-Goldfarb-Shanno update and the symmetric rank-one update. Byrd et al. (1994) formulated a compact matrix representation of these updates. In this talk, we extend and generalize this result to (i) the Davidon-Fletcher-Powell update, (ii) the restricted Broyden class of updates, and (iii) the full Broyden class of updates.

Omar DeGuchy
Applied Mathematics Unit
University of California, Merced
[email protected]

Jennifer Erway
Wake Forest University
[email protected]

Roummel F. Marcia
University of California, Merced
[email protected]
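
For context, the Byrd-Nocedal-Schnabel compact form of the BFGS matrix can be written and checked in a few lines (our sketch, assuming B0 = gamma*I; all names are ours):

    import numpy as np

    def compact_bfgs(S, Y, gamma=1.0):
        # Compact representation (Byrd, Nocedal, Schnabel 1994):
        # B = B0 - [B0 S, Y] M^{-1} [B0 S, Y]^T with B0 = gamma * I,
        # M = [[S^T B0 S, L], [L^T, -D]], L = strictly lower part of S^T Y,
        # D = diag(S^T Y).
        n, k = S.shape
        StY = S.T @ Y
        L = np.tril(StY, -1)
        D = np.diag(np.diag(StY))
        W = np.hstack([gamma * S, Y])                    # [B0 S, Y]
        M = np.block([[gamma * (S.T @ S), L], [L.T, -D]])
        return gamma * np.eye(n) - W @ np.linalg.solve(M, W.T)

    # Sanity check against the recursive BFGS update on random curvature pairs.
    rng = np.random.default_rng(0)
    n, k = 6, 3
    S = rng.standard_normal((n, k))
    Y = S + 0.1 * rng.standard_normal((n, k))   # keeps s_i^T y_i > 0 (w.h.p.)
    B = np.eye(n)
    for i in range(k):
        s, y = S[:, i], Y[:, i]
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
    print(np.allclose(B, compact_bfgs(S, Y)))   # should print True

The talk's contribution is extending this kind of closed form to the DFP update and to the restricted and full Broyden classes.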

MS22

An Enhanced Truncated-Newton Algorithm for Large-Scale Optimization

I present a truncated-Newton algorithm for solving unconstrained optimization problems. In many ways the method is similar to other truncated-Newton methods: apply linear CG to the Newton system, and terminate when it is deemed appropriate. The first contribution of this work is the inclusion of a termination condition that allows for a stronger local convergence theory that does not require any prior knowledge of the smallest eigenvalue of the Hessian matrix at a local minimizer. The second contribution is the use of a certain rank-1 modification to the Hessian that allows for a stronger global convergence theory that does not make any assumption about the boundedness of the Hessian matrices over the sequence of computed iterates. Numerical experience will be reported.

Daniel Robinson
Johns Hopkins University
[email protected]

MS22

Large-Scale Linear Algebra and its Role in Optimization

Numerical optimization has always needed reliable numerical linear algebra. Some worries go away when we use quad-precision floating-point (as is now possible with the GCC and Intel compilers). Quad-MINOS has proved useful in systems biology for optimizing multiscale metabolic networks, and in computer experimental design for optimizing the IMSPE function with finite-difference gradients (confirming the existence of twin points). Recently quad-MINOS gave a surprising solution to the optimal control problem spring200. Quad versions of SNOPT and PDCO have been developed in f90 and C++ respectively. Changing a single line of source code leads to double-SNOPT or quad-SNOPT. The C++ PDCO can switch from double to quad at runtime.

Michael A. Saunders
Systems Optimization Laboratory, Stanford University
[email protected]

MS22

Sequential Quadratic Programming Methods for Nonlinear Optimization

In this talk, we discuss the development of the optimization software package DNOPT for solving dense, small- to medium-scale nonlinear problems. The method is a sequential quadratic programming (SQP) method that utilizes either a quasi-Newton approximation of the Hessian of the Lagrangian or exact second derivative information in a rank-enforcing, inertia-controlling active-set method to solve the QP subproblems. Practical issues involving the formulation and implementation of the method will also be considered. Numerical results will be presented.

Elizabeth Wong, Philip E. Gill
University of California, San Diego
Department of Mathematics
[email protected], [email protected]

MS23

Lp-Norm Regularization Algorithms for Optimization Over Permutation Matrices

Optimization problems over permutation matrices appear widely in facility layout, chip design, scheduling, pattern recognition, computer vision, graph matching, etc. Since this problem is NP-hard due to the combinatorial nature of permutation matrices, we relax the variable to be the more tractable doubly stochastic matrices and add an Lp-norm (0 < p < 1) regularization term to the objective function. The optimal solutions of the Lp-regularized problem are the same as those of the original problem if the regularization parameter is sufficiently large. A lower bound estimation of the nonzero entries of the stationary points and some connections between the local minimizers and the permutation matrices are further established. Then we propose an Lp regularization algorithm with local refinements. The algorithm approximately solves a sequence of Lp regularization subproblems by the projected gradient method using a nonmonotone line search with the Barzilai-Borwein step sizes. Its performance can be further improved if it is combined with certain local search methods, cutting plane techniques, as well as a new negative proximal point scheme. Extensive numerical results on QAPLIB show that our proposed algorithms can often find reasonably high quality solutions within a competitive amount of time.

Bo Jiang
School of Mathematical Sciences
Nanjing Normal University
[email protected]

Ya-Feng Liu
Academy of Mathematics and Systems Science
Chinese Academy of Sciences
[email protected]

Zaiwen Wen
Peking University
Beijing International Center for Mathematical Research
[email protected]

MS23

Joint Power and Admission Control: A Sparse Optimization Perspective

In the first part of this talk, we shall consider the joint power and admission control problem for a wireless network consisting of multiple interfering links, where the channel state information is assumed to be perfectly known. The goal is to support a maximum number of links at their specified signal to interference plus noise ratio (SINR) targets while using a minimum total transmission power. We first reformulate this NP-hard problem as a sparse ℓ0-minimization problem and then approximate it by an ℓ1 convex approximation. We show that the ℓ1 convex approximation can be equivalently reformulated as a linear program (without introducing any auxiliary variables) by exploiting the special structure of the problem. We use the solution of the above LP to iteratively remove strong interfering links (deflation). Numerical simulations show that the proposed approach compares favorably with existing approaches in terms of the number of supported links, the total transmission power, and the execution time. In the second part of this talk, we shall discuss some interesting extensions of the first part. For instance, instead of doing convex (linear programming) approximations, we can do non-convex approximations to improve the approximation performance. If time allows, we shall also discuss how to extend the algorithm developed in the first part to the imperfect channel state information scenario.

Ya-Feng Liu
Academy of Mathematics and Systems Science
Chinese Academy of Sciences
[email protected]

MS23

Randomized Block Proximal Damped Newton Method for Composite Self-Concordant Minimization

We consider the composite self-concordant (CSC) minimization problem, which minimizes the sum of a self-concordant function f and a (possibly nonsmooth) proper closed convex function g. CSC minimization is the cornerstone of the path-following interior point methods for solving a broad class of convex optimization problems. It has also found numerous applications in machine learning. The proximal damped Newton (PDN) methods have been well studied in the literature for solving this problem, and they enjoy a nice iteration complexity. Given that at each iteration these methods typically require evaluating or accessing the Hessian of f and also need to solve a proximal Newton subproblem, the cost per iteration can be prohibitively high for solving large-scale problems. To alleviate this difficulty, we propose a randomized block proximal damped Newton (RBPDN) method for solving CSC minimization. Compared to the PDN methods, the computational cost per iteration of RBPDN is usually significantly lower. Computational experiments on a class of regularized logistic regression problems demonstrate that RBPDN is indeed promising for solving large-scale CSC minimization problems. The convergence of RBPDN is also analyzed.

Zhaosong Lu
Department of Mathematics
Simon Fraser University
[email protected]

MS23

A Distributed Semi-Asynchronous ADMM Type Algorithm for Network Traffic Engineering

In this work, we consider the traffic engineering problem in a large-scale hierarchical network arising in the next generation cloud-based wireless networks. We propose a distributed semi-asynchronous algorithm for this problem based on the so-called block successive upper bound minimization method of multipliers (BSUM-M). Theoretically, we show that the proposed algorithm converges to the global optimal solution under some assumptions on the degree of network asynchrony. We illustrate the effectiveness and efficiency of the proposed algorithm by comparing it with state of the art commercial solvers in a networked environment.

Zhi-Quan Luo
University of Minnesota
Minneapolis, Minnesota
[email protected]

MS24

An Accelerated Proximal Method for Minimizing Compositions of Convex Functions with Smooth Maps

I will describe an accelerated algorithm for minimizing compositions of finite-valued convex functions with smooth mappings. When applied to optimization problems having an additive composite form, the algorithm reduces to the method of Ghadimi and Lan. The method both realizes the best known complexity bound in optimality conditions for nonconvex problems with bounded domain, and achieves an accelerated rate under standard convexity assumptions. A natural convexity parameter of the composition quantifies the transition between the two modes of convergence. I end by analyzing the effect of inexact proximal subproblem solves on global performance – an essential feature of the composite setting – and compare the resulting scheme with algorithms based on smoothing.

Courtney Kempton
University of Washington
[email protected]

Dmitriy Drusvyatskiy
University of Washington
[email protected]

MS24

Nonasymptotic and Asymptotic Linear Convergence of an Almost Cyclic SHQP Dykstra's Algorithm for Polyhedral Problems

We show that an almost cyclic (or generalized Gauss-Seidel) Dykstra's algorithm which incorporates the SHQP (supporting halfspace-quadratic programming) strategy can achieve nonasymptotic and asymptotic linear convergence for polyhedral problems.

C.H. Jeffrey Pang
Mathematics, National University of Singapore
[email protected]
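
For background, plain cyclic Dykstra (without the SHQP modification studied in the talk) looks as follows; the projection operators are user-supplied (our sketch):

    import numpy as np

    def dykstra(x0, projections, iters=100):
        # Dykstra's algorithm for projecting x0 onto an intersection of
        # convex sets. Unlike plain alternating projections, the correction
        # terms p[i] make the limit the *nearest* point of the intersection.
        x = x0.copy()
        p = [np.zeros_like(x0) for _ in projections]
        for _ in range(iters):
            for i, proj in enumerate(projections):
                y = proj(x + p[i])
                p[i] = x + p[i] - y
                x = y
        return x

    # Example: project onto {x >= 0} intersect {sum(x) = 1} (a simplex).
    proj_nonneg = lambda z: np.maximum(z, 0.0)
    proj_affine = lambda z: z + (1.0 - z.sum()) / z.size
    print(dykstra(np.array([0.9, 0.8, -0.4]), [proj_nonneg, proj_affine]))

Both sets here are polyhedral, which is exactly the regime in which the abstract establishes linear convergence.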

MS24

What's the Shape of Your Penalty?

The performance of machine learning approaches is strongly influenced by the choice of misfit penalty and by correct settings of penalty parameters, such as the threshold of the Huber function. These parameters are typically chosen using expert knowledge, cross-validation, or black-box optimization, which are time consuming for large-scale applications. We present a data-driven approach that simultaneously solves inference problems and learns the error structure and penalty parameters. We discuss theoretical properties of these joint problems, and present algorithms for their solution. We show numerical examples from the piecewise linear-quadratic (PLQ) family of penalties.

Zheng Peng, Aleksandr Aravkin
University of Washington
[email protected], [email protected]

MS24

Optimization in Structure from Motion

Estimating the accurate poses of cameras and locations of 3D scene points from a collection of images obtained by the cameras is a classic problem in computer vision, referred to as "structure from motion" (SfM). There are two components in this pipeline: (a) estimating corresponding points between images, a.k.a. correspondence matching; (b) recovering 3D points and camera parameters from the corresponding 2D image points, a.k.a. bundle adjustment. Both of these components have their unique optimization formulations, which become particularly challenging in the presence of noisy correspondences and as the scale of the dataset grows. We will touch upon the general optimization principles in these problems and discuss existing solutions along with their limitations. We will also point to exciting areas of opportunity for developing robust, large-scale solutions to these problems.

Karthik [email protected]

MS25

Pajarito: An Open-Source Mixed-Integer Conic Solver

Pajarito is an open-source solver written in Julia, integrated with the JuliaOpt ecosystem, and accessible through powerful algebraic modeling systems such as JuMP, Convex.jl, and CVXPY. Pajarito solves mixed-integer convex optimization problems (MICPs), which have some discrete variables but are convex when discreteness is relaxed. Unlike existing MICP solvers, which typically use nonlinear branch-and-bound and solve continuous relaxations at nodes, Pajarito performs iterative outer approximation on an extended formulation. Specifically, it constructs and refines a polyhedral relaxation in a higher-dimensional space, allowing it to converge more rapidly to the optimal MICP solution. A key contribution is the use of disciplined convex programming (DCP), an algebraic modeling concept designed to access powerful conic solvers via epigraph reformulations, to automate the construction of a conic extended formulation. Pajarito can solve MICPs with any mixture of second-order, exponential, and positive-semidefinite cones, and is easily extensible to other cones. We demonstrate that Pajarito is the fastest open-source mixed-integer nonlinear solver and illustrate some applications of mixed-integer semidefinite programming (MISDP).

Chris Coey
MIT
[email protected]

Miles Lubin
Operations Research Center
Massachusetts Institute of Technology
[email protected]

Juan Pablo Vielma
Sloan School of Management
Massachusetts Institute of Technology
[email protected]

MS25

SCIP-SDP: A Framework for Solving Mixed-Integer Semidefinite Programs

Mixed-integer semidefinite programs arise in many applications, and several problem-specific solution approaches have been studied recently. In this talk, we present a generic branch-and-bound framework for solving such problems. We will discuss results regarding strong duality and the existence of relative interior points for SDPs appearing in a branch-and-bound tree and their implications for a practical implementation of an MISDP solver. Furthermore, we will present techniques used to accelerate the solving process of SCIP-SDP. Finally, we will show numerical results examining the occurrence of the Slater condition in practical applications of mixed-integer semidefinite programming and demonstrating the applicability of the framework as well as the impact of the discussed techniques.

Tristan Gally
Technische Universitat Darmstadt
[email protected]

Marc E. Pfetsch, Stefan Ulbrich
Technische Universitaet Darmstadt, Fachbereich Mathematik
[email protected], [email protected]

MS25

Valid Quadratic Inequalities for Some Non-Convex Quadratic Sets

In recent years, the generalization of Balas' disjunctive cuts for mixed integer linear optimization problems to mixed integer non-linear optimization problems has received significant attention. Among these studies, mixed integer second order cone optimization (MISOCO) is a special case. For MISOCO one has the disjunctive conic cuts approach. That generalization introduced the concept of disjunctive conic cuts (DCCs) and disjunctive cylindrical cuts (DCyCs). Specifically, it showed that under some mild assumptions the intersection of those DCCs and DCyCs with a closed convex set, given as the intersection of a second order cone and an affine set, is the convex hull of the intersection of the same set with a parallel linear disjunction. The key element in that analysis was the use of pencils of quadrics to find closed forms for deriving the DCCs and DCyCs. In this work we use the same approach to show the existence of valid conic quadratic inequalities for hyperboloids and non-convex quadratic cones when the disjunction is defined by parallel hyperplanes. Also, for some of the cases, we show that for each of the branches of those sets, which are convex, the intersection with the DCCs or DCyCs still provides the convex hull of the intersection of the branches with a parallel linear disjunction.

Julio Goez
Norwegian School of Economics
[email protected]

Miguel F. Anjos
Mathematics and Industrial Engineering & GERAD
Polytechnique Montreal
[email protected]

MS25

Mixed-Integer Sum of Squares: Computation and Applications

Sum of squares optimization has seen a flurry of research and adoption in control systems analysis over the past 15 years. Many of these problems involve discrete decisions, which can be naturally modeled using mixed-integer optimization. In this talk, we address computational aspects of mixed-integer sum of squares optimization, including both algorithmic and end-user modeling aspects. We apply these ideas to applications in optimal control and experimental design, with an emphasis on producing optimal (or provably sub-optimal) solutions for practical instances.

Joey Huchette
Operations Research Center
Massachusetts Institute of Technology
[email protected]

MS26

A Bundle Method for Transient Constrained Optimization Problems

The inclusion of dynamic constraints is the nominal objective of many optimization-based power system analyses. In current practice this is typically done off-line. We present an approach for the on-line inclusion of dynamic constraints in power grid optimization problems, and we apply it to transient security-constrained economic dispatch (TSCED). The approach is based on an encapsulation that allows for the separate simulation of transient system behavior, provided the simulation is equipped to return the sensitivity of constraint functions to the optimization problems. We define a bundle approach that incorporates the successively obtained sensitivity information to determine the solution of optimization problems with dynamic constraints. Numerical experiments are conducted on the IEEE 39-Bus System and show the benefits of the TSCED formulation.

Francois Gilbert, Shrirang Abhyankar, Hong Zhang
Argonne National Laboratory
[email protected], [email protected], [email protected]

Mihai Anitescu
Argonne National Laboratory
Mathematics and Computer Science Division
[email protected]

MS26

Towards Dual Control for Acute Myeloid Leukemia

Dual control is an extended feedback strategy, compared to NMPC, in which a sequence of optimal control problems with integrated OED is solved. For the parameter estimation, OED is considered in the NMPC framework to obtain a control which provides well identifiable model parameters. The control strategy has two competing tasks: bringing the dynamic system into a desired state while simultaneously exciting the system such that the performed measurements provide minimal parameter uncertainty during estimation. Published dual control approaches focus on problem formulations with a continuous measurement flow. We are interested in medical applications in which fast sampling rates are not necessary or available. In this setting the time points for taking measurements are of great importance and therefore should be considered during optimization. We present a new dual control algorithm in which the sampling decision is a further optimization variable due to a limited number of total measurements. The new algorithm is applied to a medical example. The mathematical model describes the dynamics of leukocytes of patients having acute myeloid leukemia. The model is used to investigate leukopenia, which is a clinically important side effect arising from treatment with chemotherapy. The competing tasks in this example are the reduction of intervals in which the risk of infection due to leukopenia is high, and small excitations of the leukocytes' dynamics to identify the unknown parameters.

Felix Jost, Kristine Rinke
Otto-von-Guericke Universitat Magdeburg
[email protected], [email protected]

Sebastian Sager
Otto-von-Guericke Universitat Magdeburg, Germany
[email protected]

Thuy T. T. Le
Otto-von-Guericke Universitat Magdeburg
[email protected]

MS26

Novel Computational Framework for Solving Complex Constrained Optimal Control Problems

A novel computational framework is described for solving complex constrained nonlinear optimal control problems. The framework has a wide variety of applications in mechanical and aerospace engineering. The basis of the framework is the new class of hp-adaptive Gaussian quadrature methods that transcribe the continuous optimal control problem to a finite-dimensional nonlinear optimization problem. The hp-adaptive methods have the feature that high accuracy can be obtained with a significantly smaller mesh when compared with traditional fixed-order methods, while accurately capturing nonsmoothness or rapidly changing behavior. The hp-adaptive methods are employed using advanced sparse nonlinear programming (NLP) solvers. The derivatives required by the NLP solvers are obtained using a new approach to algorithmic differentiation where efficient derivative source code is produced through a method that combines operator overloading with source transformation. The mathematical foundation of the framework is provided and examples are given that demonstrate the improvement over previously developed approaches.

Anil Rao
University of Florida
Mechanical and Aerospace Engineering
[email protected]

MS26

Architectures for Multi-Scale Predictive Control

We analyze the structure of the Euler-Lagrange conditions of a lifted long-horizon optimal control problem. The analysis reveals that the conditions can be solved by using block Gauss-Seidel schemes, and we prove that such schemes can be implemented by solving sequences of short-horizon problems. The analysis also reveals that a receding-horizon control scheme is equivalent to performing a single Gauss-Seidel sweep. We also derive a strategy that uses adjoint information from a coarse long-horizon problem to correct the receding-horizon scheme, and we observe that this strategy can be interpreted as a hierarchical control architecture in which a high-level controller transfers long-horizon information to a low-level, short-horizon controller. Our results bridge the gap between multigrid, hierarchical, and receding-horizon control.

Victor Zavala
University of Wisconsin-Madison
[email protected]

Yankai Cao
University of Wisconsin-Madison
[email protected]

MS27

Model Formulations and Solution Approaches for the Hydropower Maintenance Scheduling Problem

Maintenance of generators is essential for the economic and reliable operation of electrical power systems. As generators under maintenance are typically unavailable for electricity production, the generator maintenance scheduling problem (GMSP) should anticipate its impact on the system operation. Particularly in hydroelectric systems, the operation is affected by uncertain water inflows, nonlinearities in power production functions, and interdependencies of the system variables in space and time. In this talk we present two MILP model formulations for the GMSP considering the specificities of hydropower operation, and we discuss solution approaches for this problem.

Jesus A. Rodriguez Saras
Polytechnique Montreal
[email protected]

Miguel F. Anjos
Mathematics and Industrial Engineering & GERAD
Polytechnique Montreal
[email protected]

Guy Desaulniers
Mathematics and Industrial Engineering
GERAD & Polytechnique Montreal
[email protected]

Pascal Cote
Energie electrique
Rio Tinto Alcan & GERAD
[email protected]

MS27

Optimal Planning and Control of Large Scale Hydroelectric Systems: Information Flows and Modeling Tools

Operation of large-scale hydroelectric systems involves multiple decisions by different groups at different planning levels concerning when and how much water to release and power to generate at different generating units in a multi-reservoir system. These decisions are difficult because they must simultaneously consider technical, economic and reliability objectives as well as organizational and management boundaries. Several uncertainties must also be considered, requiring the solution of stochastic problems in both supply and demand. The operation of such a complex system cannot be practically optimized by a single optimization model, no matter how sophisticated the model nor the computing power used, as there are too many variables, unknowns and organizational levels to deal with. We present a Modeling and Information Flow Hierarchy we have developed by closely working with operations planning engineers at BC Hydro. The framework is divided into several computationally and operationally manageable levels; each provides information flows and feedbacks to the appropriate organizational and planning level, consisting of common measures such as resource prices, operational target levels, and constraints. This information is derived using stochastic and deterministic optimization and simulation algorithms, and is passed from one level to the other to address a different aspect or planning horizon of the system, ranging from long term operations planning to real-time operations.

Ziad [email protected]

MS27

Stochastic Short-Term Hydropower Operations Planning

Short-term hydropower optimization is used on a daily basis to plan water releases, reservoir volumes and turbines in operation. This talk presents an optimization approach to solve the short-term hydropower problem with uncertain inflows. Scenario trees are built based on inflow forecasts and a multistage stochastic program is used to solve the problem. The scenario tree algorithm is based on the minimization of the nested distance, and the multistage stochastic program is an innovative two-phase optimization process. Numerical results are presented for three hydropower plants located in the province of Quebec, Canada.

Sara Seguin


Universite Laval
[email protected]

Stein-Erik Fleten
Norwegian University of Science and Technology
[email protected]

Alois Pichler
NTNU and University of Vienna
[email protected]

Pascal Cote
Rio Tinto
[email protected]

Charles Audet
Ecole Polytechnique de Montreal - GERAD
[email protected]

MS27

Stochastic Co-Optimization of Transmission and Distribution-as-Microgrid Networks with Renewables and Energy Storage

The increase in renewable resource penetration in bulk power systems introduces challenges in operations, due to the inability to confidently predict resource availability. The result of inaccurate forecasting is a reduction in reliability, with volatility in prices, line flows and voltages on the power transmission network. However, the ability of microgrids to be operated in both interconnected and islanded modes shows potential to leverage these smaller systems to manage disturbances in the bulk power network, contributing to the integration of renewables in the power system. Therefore, more sophisticated tools are needed to operate the network efficiently. In this talk, we will present our in-progress work to co-optimize the interaction between the main network and microgrids, in the presence of high renewable penetration and distributed storage.

Luckny Zephyr, Jialin Lui
Cornell University
[email protected], [email protected]

C. Lindsay Anderson
Cornell University
[email protected]

MS28

Incremental Algorithms for Additive Convex Cost Optimization and Learning

Motivated by machine learning problems over large data sets and distributed optimization over networks, we consider the problem of minimizing the sum of a large number of convex component functions. We start with an overview of this session by giving an introduction to incremental gradient methods and decentralized estimation techniques for solving such problems, which process component functions sequentially in time. We first consider deterministic cyclic incremental gradient methods (which process the component functions in a cycle, one at a time) and provide new convergence rate results under some assumptions. We then consider a randomized incremental gradient method, called the random reshuffling (RR) algorithm, which picks a uniformly random order/permutation and processes the component functions one at a time according to this order (i.e., samples functions without replacement in each cycle). We provide the first convergence rate guarantees for this method that outperform its popular with-replacement counterpart, stochastic gradient descent (SGD). We finally consider incremental aggregated gradient methods, which compute a single component function gradient at each iteration while using outdated gradients of all component functions to approximate the global cost function gradient, and provide new linear rate results. This is joint work with Asu Ozdaglar and Pablo Parrilo.

Mert Gurbuzbalaban
Massachusetts Institute of Technology
[email protected]
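
The difference between RR and SGD is only in how indices are drawn each epoch; a minimal sketch on least-squares components (our illustration, made-up data):

    import numpy as np

    def incremental_gd(grads, x0, lr, epochs, reshuffle):
        # grads[i](x) is the gradient of the i-th component function.
        x, n = x0.copy(), len(grads)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            if reshuffle:                      # RR: sample without replacement
                order = rng.permutation(n)
            else:                              # SGD: sample with replacement
                order = rng.integers(0, n, size=n)
            for i in order:
                x -= lr * grads[i](x)
        return x

    # Components f_i(x) = 0.5 (a_i x - b_i)^2, minimized at the LS solution.
    a = np.array([1.0, 2.0, 3.0]); b = np.array([1.0, 0.0, 2.0])
    grads = [lambda x, i=i: a[i] * (a[i] * x - b[i]) for i in range(3)]
    x_star = (a @ b) / (a @ a)
    for rr in (True, False):
        x = incremental_gd(grads, np.array(0.7), 0.02, 200, rr)
        print("RR " if rr else "SGD", abs(float(x) - x_star))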

MS28

Distributed Learning: Parameter Estimation for the Exponential Family

In recent years, several distributed non-Bayesian algorithms have been proposed for the problem of distributed learning. Stability and explicit non-asymptotic convergence rates have been shown for general estimation problems. Such convergence rates have recently been extended to continuous sets of parameters or hypotheses. This allows the definition of an optimization-based family of distributed algorithms for typical parameter estimation problems. Specifically, we will describe how such non-Bayesian methods have "nice" closed-form explicit algorithmic representations in terms of the natural parameters for distributions in the exponential family.

Cesar A. Uribe
University of Illinois at Urbana-Champaign
[email protected]

Angelia Nedich
Industrial and Enterprise Systems Engineering
University of Illinois at Urbana-Champaign
[email protected]

MS28

Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods

We focus on the problem of minimizing the sum of smooth component functions (where the sum is strongly convex) and a non-smooth convex function, which arises in regularized empirical risk minimization in machine learning and distributed constrained optimization in wireless sensor networks and smart grids. We consider solving this problem using the proximal incremental aggregated gradient (PIAG) method, which at each iteration moves along an aggregated gradient (formed by incrementally updating gradients of component functions according to a deterministic order) and takes a proximal step with respect to the non-smooth function. While the convergence properties of this method with randomized orders (in updating gradients of component functions) have been investigated, this paper, to the best of our knowledge, is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order. In particular, we show that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small. We explicitly identify the rate of convergence and the corresponding step size to achieve this convergence rate. Our results improve upon the best known condition number dependence of the convergence rate of the incremental aggregated gradient methods used for minimizing a sum of smooth functions.

Nuri D. Vanli
MIT
[email protected]

Mert Gurbuzbalaban
Massachusetts Institute of Technology
[email protected]

Asu Ozdaglar
MIT
[email protected]
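
A schematic PIAG iteration under our notation (prox_g and the component gradients are supplied by the caller; this is a sketch, not the paper's code):

    import numpy as np

    def piag(grad_list, prox_g, x0, lr, iters):
        # Proximal incremental aggregated gradient: keep a table of the most
        # recent gradient of each component, refresh one entry per iteration
        # (cyclic order), then take a proximal step on the aggregate.
        n = len(grad_list)
        table = [g(x0) for g in grad_list]          # gradient memory
        agg = np.sum(table, axis=0)
        x = x0.copy()
        for k in range(iters):
            i = k % n                               # deterministic order
            g_new = grad_list[i](x)
            agg += g_new - table[i]                 # update the aggregate
            table[i] = g_new
            x = prox_g(x - lr * agg, lr)
        return x

    # Example: f_i(x) = 0.5 ||x - c_i||^2 and g = ||.||_1 (soft-thresholding).
    cs = [np.array([1.0, -2.0]), np.array([3.0, 0.0])]
    grads = [lambda x, c=c: x - c for c in cs]
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    print(piag(grads, soft, np.zeros(2), lr=0.2, iters=200))  # ~ (1.5, -0.5)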

MS28

An Asynchronous Distributed Newton Method

We propose an asynchronous distributed Newton-based method for minimizing a sum of convex objective functions. We show that the proposed method achieves at least a linear rate of convergence, and numerical studies show strong performance over existing methods.

Ermin Wei
Northwestern University
[email protected]

Fatemeh Mansoori
Northwestern University
[email protected]

MS29

A Numerical Investigation of Sketching and Subsampling

The concepts of subsampling and sketching have recently received much attention from the optimization and statistics communities. In this talk, we focus on their numerical performance, with the goal of providing new insights into their theoretical and computational properties. We pay particular attention to subsampled Newton and Newton-Sketch methods, as well as techniques for solving the Newton equations approximately. Our tests are performed on a collection of optimization problems arising in machine learning.

Albert Berahas
Department of Engineering Sciences and Applied Mathematics
Northwestern University
[email protected]

Raghu Bollapragada
Department of Industrial Engineering and Management Sciences
Northwestern University
[email protected]

Jorge Nocedal
Department of Electrical and Computer Engineering
Northwestern University
[email protected]
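
A toy subsampled-Newton step for l2-regularized logistic regression, in the spirit of the methods being compared (our sketch; here the full gradient is used and only the Hessian is subsampled):

    import numpy as np

    def subsampled_newton(X, y, iters=20, sample=0.1, reg=1e-3, seed=0):
        # Newton iterations where the Hessian of the logistic loss is
        # estimated from a random subsample of the data.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        m = max(d, int(sample * n))
        w = np.zeros(d)
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            g = X.T @ (p - y) / n + reg * w            # full gradient
            idx = rng.choice(n, size=m, replace=False) # Hessian subsample
            Xs, ps = X[idx], p[idx]
            H = Xs.T @ (Xs * (ps * (1 - ps))[:, None]) / m + reg * np.eye(d)
            w -= np.linalg.solve(H, g)                 # inexact Newton step
        return w

    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 5))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(float)
    print(subsampled_newton(X, y))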

MS29

Gradient Descent Efficiently Solves Cubic-Regularized Non-Convex Quadratic Problems

We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within ε accuracy in O(ε^{-1} log(1/ε)) steps for large ε and O(log(1/ε)) steps for small ε (compared to a condition number we define), with at most logarithmic dependence on the problem dimension. We show how to use our results to develop a Hessian-free algorithm that finds second-order stationary points of general smooth non-convex functions. We complement our theoretical findings with experimental evidence corroborating our bounds, suggesting they are tight up to sub-polynomial factors.

Yair Carmon
Statistics and Electrical Engineering
Stanford University
[email protected]

John C. Duchi
Stanford University
Departments of Statistics and Electrical Engineering
[email protected]
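
The objective in question and a plain gradient-descent loop on it (our sketch; A, b and rho are made-up data):

    import numpy as np

    # f(x) = 0.5 x^T A x + b^T x + (rho / 3) ||x||^3, with A indefinite.
    A = np.diag([-1.0, 0.5, 2.0])      # one negative eigenvalue: non-convex
    b = np.array([0.3, -0.2, 0.1])
    rho = 1.0

    def grad(x):
        # Gradient of the cubic term (rho/3)||x||^3 is rho * ||x|| * x.
        return A @ x + b + rho * np.linalg.norm(x) * x

    x = np.zeros(3)
    eta = 0.1                          # small constant step size
    for _ in range(2000):
        x -= eta * grad(x)

    # At the global minimizer, (A + rho * ||x|| I) x = -b with
    # A + rho * ||x|| I positive semidefinite.
    print(x, np.linalg.norm(grad(x)))  # gradient norm should be ~ 0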

MS29

Block BFGS Methods

We introduce a quasi-Newton method with block updates called Block BFGS. We show that this method, performed with inexact Armijo-Wolfe line searches, converges globally and superlinearly under the same convexity assumptions as BFGS. We also show that Block BFGS is globally convergent to a stationary point when applied to non-convex functions with bounded Hessian, and discuss other modifications for non-convex minimization. Numerical experiments comparing Block BFGS, BFGS and gradient descent are presented, as well as applications to machine learning.

Wenbo Gao, Donald Goldfarb
Columbia University
[email protected], [email protected]

MS29

Sub-Sampled Newton Methods and Large-Scale Machine Learning

A major challenge for large-scale machine learning, and one that will only increase in importance as we develop models that are more and more domain-informed, involves going beyond high-variance first-order optimization methods. Here, we consider the problem of minimizing the sum of a large number of functions over a convex constraint set, a problem that arises in many data analysis and machine learning applications, as well as many more traditional scientific applications. For this class of problems, we establish improved bounds for algorithms that incorporate sub-sampling as a way to improve computational efficiency, while maintaining their original convergence properties. These methods exploit recent results from Randomized Linear Algebra on approximate matrix multiplication. Within the context of second order methods, they provide quantitative convergence results for variants of Newton's methods, where the Hessian and/or the gradient is uniformly or non-uniformly sub-sampled, under much weaker assumptions than prior work.

Fred Roosta, Michael Mahoney
UC Berkeley
[email protected], [email protected]

MS30

Matrix Completion, Saddle Points, and Gradient Descent

Matrix completion is a fundamental machine learning problem with wide applications in collaborative filtering and recommender systems. Typically, matrix completion is solved by non-convex optimization procedures, which are empirically extremely successful. We prove that the symmetric matrix completion problem has no spurious local minima, meaning all local minima are also global. Thus the matrix completion objective has only saddle points and global minima. Next, we show that saddle points are easy to avoid even for gradient descent – arguably the simplest optimization procedure. We prove that with probability 1, randomly initialized gradient descent converges to a local minimizer. The same result holds for a large class of optimization algorithms including proximal point, mirror descent, and coordinate descent.

Jason Lee
University of Southern California
[email protected]
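
A bare-bones instance of the setting: randomly initialized gradient descent on a factorized symmetric matrix completion objective (our sketch; data, rank and step size are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 20, 1
    u_true = rng.standard_normal((n, r))
    M = u_true @ u_true.T                     # ground-truth low-rank matrix
    mask = rng.random((n, n)) < 0.5
    mask = np.triu(mask) | np.triu(mask).T    # symmetric observation pattern

    def grad(U):
        # Gradient of 0.5 * || P_Omega(U U^T - M) ||_F^2 in U
        # (the masked residual R is symmetric, so the gradient is 2 R U).
        R = mask * (U @ U.T - M)
        return 2.0 * R @ U

    U = 0.1 * rng.standard_normal((n, r))     # random initialization
    for _ in range(2000):
        U -= 0.01 * grad(U)

    print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))

With no spurious local minima and saddle points avoided almost surely, the relative error should shrink toward zero.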

MS30

Regularized HPE-Type Methods for Solving Monotone Inclusions with Improved Pointwise Iteration-Complexity Bounds

We discuss the iteration-complexity of new regularized hybrid proximal extragradient (HPE)-type methods for solving monotone inclusion problems (MIPs). The new (regularized HPE-type) methods essentially consist of instances of the standard HPE method applied to regularizations of the original MIP. It is shown that their pointwise iteration-complexity considerably improves that of the HPE method while approaching (up to a logarithmic factor) the ergodic iteration-complexity of the latter method.

Renato C. Monteiro
Georgia Institute of Technology
School of ISyE
[email protected]

MS30

Stochastic Reformulations of Linear Systems and Fast Stochastic Iterative Method

We propose a new paradigm for solving linear systems. In our paradigm, the system is reformulated into a stochastic problem, and then solved with a randomized algorithm. Our reformulation can be equivalently seen as a stochastic optimization problem, a stochastically preconditioned linear system, a stochastic fixed point problem, and a probabilistic intersection problem. We propose and analyze basic and accelerated stochastic algorithms for solving the reformulated problem, with linear convergence rates.

Peter Richtarik
University of Edinburgh
[email protected]
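
One concrete instance of this paradigm is randomized Kaczmarz, which can be read as stochastic gradient descent on a stochastic reformulation of Ax = b (our sketch, not the paper's algorithm):

    import numpy as np

    def randomized_kaczmarz(A, b, iters=5000, seed=0):
        # Sample row i with probability ||a_i||^2 / ||A||_F^2 and project
        # the current iterate onto the hyperplane a_i^T x = b_i.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        probs = (A ** 2).sum(axis=1) / (A ** 2).sum()
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            a = A[i]
            x += (b[i] - a @ x) / (a @ a) * a
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 10))
    x_true = rng.standard_normal(10)
    print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))

For consistent systems this iteration converges linearly in expectation, one of the basic rates recovered by the stochastic reformulation framework.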

MS30

Serial and Parallel Coordinate Updates for Splitting Algorithms

Splitting algorithms are widely used in computational mathematics because they individually handle smaller and simpler parts of a problem that as a whole is more complicated. Many splitting algorithms can be abstracted by the fixed-point iteration x^{k+1} = T(x^k), where T consists of a sequence of operations. This talk discusses the prevailing coordinate-friendly structure in T, with which one can update one, or a block of, components of x (while fixing the others) at a much lower cost than computing T(x^k). Based on this structure, we develop coordinate-update splitting algorithms for problems to which current coordinate descent algorithms do not apply, such as certain conic programs, portfolio optimization problems, as well as certain problems with nonsmooth nonseparable functions. Theoretically, we show that, when T has a fixed point and is nonexpansive, the coordinate-update algorithm converges to a fixed point (which is a solution to the original problem) under appropriate step sizes. This result applies to cyclic, random, and (async-)parallel selections of coordinates.

Wotao Yin
University of California, Los Angeles
[email protected]

MS31

Stochastic Optimization for Optimal Transport

Abstract not available at time of publication.

Marco Cuturi
ENSAE / CREST
[email protected]

MS31

Adversarial Embedding for Compositional Stochastic Optimization and Reinforcement Learning

While reinforcement learning has received much attention in recent years as a way to address Markov decision problems, most existing reinforcement learning algorithms are either purely heuristic or lack efficiency. A key challenge boils down to the difficulty of learning from conditional distributions with limited samples and of minimizing objectives involving nested expectations. We propose a novel stochastic-approximation-based method that benefits from a fresh employment of Fenchel duality and kernel embedding techniques. To the best of our knowledge, this is the first algorithm that can take only one sample at a time from the conditional distribution and comes with provable theoretical guarantees. Finally, the proposed algorithm achieves state-of-the-art empirical performance on several benchmark datasets compared to existing algorithms such as gradient-TD2 and residual gradient.

Niao He
University of Illinois at Urbana-Champaign
[email protected]

MS31

Accelerating Stochastic Gradient Descent

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, BFGS) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in [d'Aspremont '08; Devolder, Glineur, & Nesterov '14]. This work strongly refutes this folklore by showing that acceleration, in a precise sense, is robust to statistical errors. In particular, this work provides an accelerated stochastic gradient descent algorithm that enjoys minimax optimal statistical estimation at a rate that is provably faster than stochastic gradient descent. The analysis involves introducing a novel technical lens of viewing accelerated stochastic gradient descent updates as a stochastic process, which may be of independent interest.

Sham Kakade
University of Washington
[email protected]

MS31

Restarting Accelerated Gradient Methods with a Rough Strong Convexity Estimate

We propose new restarting strategies for accelerated gradient and accelerated coordinate descent methods. Our main contribution is to show that the restarted method has a geometric rate of convergence for any restarting frequency, and so it allows us to profit from restarting even when we do not know the strong convexity coefficient. The scheme can be combined with adaptive restarting, leading to the first provable convergence for adaptive restarting schemes with accelerated gradient methods. Finally, we illustrate the properties of the algorithm on a regularized logistic regression problem and on a Lasso problem.
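
A minimal sketch of fixed-frequency restarting of Nesterov's method on a strongly convex quadratic (my illustration, not the speakers' code; the restart period K stands in for a rough strong convexity estimate):

```python
import numpy as np

# Nesterov's accelerated gradient with periodic restarts on the quadratic
# 0.5 x^T A x - b^T x. Problem sizes and restart period K are illustrative.
rng = np.random.default_rng(3)
n = 200
Q = rng.normal(size=(n, n))
A = Q.T @ Q / n + 0.1 * np.eye(n)
b = rng.normal(size=n)
L = np.linalg.norm(A, 2)             # smoothness constant

def grad(x):
    return A @ x - b

x = np.zeros(n)
K = 100                              # restart frequency (rough guess)
for restart in range(20):
    y, x_prev = x.copy(), x.copy()
    for k in range(1, K + 1):
        x_next = y - grad(y) / L                            # gradient step
        y = x_next + (k - 1) / (k + 2) * (x_next - x_prev)  # momentum
        x_prev = x_next
    x = x_next                       # restart: reset momentum at x

print("residual:", np.linalg.norm(grad(x)))
```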

Zheng Qu
The University of Hong Kong
[email protected]

Olivier Fercoq
Telecom-ParisTech, CNRS LTCI
Universite Paris-Saclay, 75013, Paris, France
[email protected]

MS32

Hybrid Optimal Control Problems for Partial Differential Equations

We provide optimality conditions and numerical results for hybrid optimal control problems governed by partial differential equations. These problems involve switching times to be optimized, at which the dynamics, the integral cost, and the bound constraints on the control can vary. The method developed to solve these problems is based on first and second-order optimality conditions. It is applied to different types of partial differential equations, such as semilinear parabolic equations, coupled systems, or hyperbolic equations. Numerical illustrations are presented, whose qualitative aspects underlie the original motivations.

Sebastien Court
University of Graz
Institute for Mathematics and Scientific Computing
[email protected]

Karl Kunisch
Karl-Franzens University Graz
Institute of Mathematics and Scientific Computing
[email protected]

Laurent Pfeiffer
University of Graz

[email protected]

MS32

Optimal Control of PDEs in a Complex Space Setting; Application to the Schrodinger Equation

In this talk we discuss optimality conditions for abstract optimization problems over complex spaces. We then apply these results to optimal control problems with a semigroup structure. As an application we detail the case where the state equation is the Schrodinger equation, with pointwise constraints on the bilinear control. We derive first and second order optimality conditions and address in particular the case where the control enters the state equation and cost function linearly.

M. Soledad Aronna
EMAp/FGV, Rio de Janeiro
[email protected]

Frederic Bonnans
Inria-Saclay and CMAP, Ecole Polytechnique
[email protected]

Axel Kroener
INRIA Saclay and CMAP, Ecole Polytechnique
[email protected]

MS32

Binary Optimal Control of Semilinear PDEs

We consider a specific combinatorial optimal control problem governed by a semilinear elliptic PDE. The control variable is given by a vector of integers. In addition, the optimization is subject to pointwise state constraints and possibly additional control constraints of arbitrary combinatorial structure. The objective is linear in the control variable. The particular structure of the optimal control problem under consideration allows us to design an efficient outer approximation algorithm based on the decomposition of the optimal control problem into an integer linear programming (ILP) master problem and a subproblem for calculating linear cutting planes. These cutting planes rely on the pointwise concavity of the PDE solution operator in terms of the control variables, which we prove in the case of PDEs with a non-decreasing convex nonlinear part. The decomposition allows us to use standard solution techniques for ILPs as well as for PDEs. We further benefit from reoptimization strategies due to the iterative structure of the algorithm. Experimental results show the efficiency of our approach.

Christoph Buchheim, Christian Meyer
TU Dortmund
Faculty of Mathematics
[email protected], [email protected]

MS32

Optimal Control of a Regularized Phase Field Fracture Propagation Model

We consider an optimal control problem governed by a phase-field fracture model. One challenge of this model problem is a non-differentiable irreversibility condition on the fracture growth, which we relax using a penalization approach. We then discuss existence of a solution to the penalized fracture model, existence of at least one solution for the regularized optimal control problem, as well as first order optimality conditions.

Ira Neitzel
Universitat Bonn
Institut fur Numerische Simulation
[email protected]

Thomas Wick
Ecole Polytechnique
[email protected]

Winnifried Wollner
TU Darmstadt
[email protected]

MS33

Parallel Algorithms for Robust PCA Using Marginalization

Robust PCA and its variants are popular models for separating data into a low-rank model while accounting for outliers. While a number of algorithms have been proposed to solve the model, these algorithms do not perform well in a massively parallel environment such as the GPU, due to the high communication costs of critical steps, e.g., SVD or QR computations. We show how marginalizing over one of the variables allows us to use Burer-Monteiro splitting techniques and apply many of the techniques from the low-rank recovery literature. This approach works well on the GPU, and the resulting algorithm is significantly faster than the prior state-of-the-art.

Stephen Becker
University of Colorado at Boulder
[email protected]

MS33

Gauge and Perspective Duality

Fenchel-Young duality is widely used in convex optimization and relies on the conjugacy operation for convex functions; however, alternative notions of duality relying on parallel operations exist as well. In particular, gauge and perspective duality are defined via the polarity operation on nonnegative functions. We present a perturbation argument for deriving gauge duality, thus placing it on equal footing with Fenchel-Young duality. This approach also yields explicit optimality conditions (analogous to KKT conditions), and a simple primal-from-dual recovery method based on existing algorithms. Numerical results confirm the usefulness of this approach in certain contexts (e.g. optimization over PLQ functions).

Kellie MacPhee, Aleksandr Aravkin
University of Washington
[email protected], [email protected]

James V. Burke
University of Washington
Department of Mathematics
[email protected]

Dmitriy Drusvyatskiy
University of Washington
[email protected]

Michael P. Friedlander
Dept of Computer Science
University of British Columbia
[email protected]

MS33

Novel Methods for Saddle Point Problems: Primal-Dual and Variational Inequality Approaches

In this paper we propose a new method for monotone variational inequalities. We prove its convergence, establish its optimal ergodic rate of convergence, and, under some additional assumptions, obtain an R-linear rate of convergence. We also introduce a line search which allows the use of larger steps. Surprisingly, the same idea can be used to derive a new primal-dual method. Some preliminary numerical experiments are reported.

Yura Malitsky
University of
[email protected]

MS33

Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage

In this talk, we consider a fundamental class of convex matrix optimization problems with low-rank solutions. We show it is possible to solve these problems using far less memory than the natural size of the decision variable when the problem data has a concise representation. Our proposed method, SketchyCGM, is the first algorithm to offer provable convergence to an optimal point with an optimal memory footprint. SketchyCGM modifies a standard convex optimization method, the conditional gradient method, to work on a sketched version of the decision variable, and can recover the solution from this sketch. In contrast to recent work on non-convex methods for this problem class, SketchyCGM is a convex method; and our convergence guarantees do not rely on statistical assumptions.

Madeleine R. Udell
Stanford University
[email protected]

MS34

Multigrid Preconditioning for Space-Time Distributed Optimal Control of Parabolic Equations

This work is concerned with designing optimal order multigrid preconditioners for space-time distributed control of parabolic equations. The focus is on the reduced problem resulting from eliminating state and adjoint variables from the KKT system. Earlier numerical experiments have shown that our ability to design optimal order preconditioners depends strongly on the discretization of the parabolic equation, with several natural discretizations leading to suboptimal preconditioners. Using a continuous-in-space-discontinuous-in-time Galerkin discretization we obtain the desired optimality.

Andrei Draganescu
Department of Mathematics and Statistics
University of Maryland, Baltimore County
[email protected]

MS34

Flexible Decision Variables in Reservoir Operation Using a Dimension Reduction Approach

The optimization of reservoir operation involves various competing objectives, and it is essential to consider multiple objectives simultaneously. There are also various sources of uncertainty that should be considered as well. We consider the question of unquantifiable uncertainty in the context of providing a decision maker with the largest possible set of options which are feasible and potentially optimal. We model a range of options with random variables, and then attempt to maximize the variance in order to provide the largest flexibility possible. This becomes a multi-objective, dynamic, PDE-constrained optimization problem for an optimal probability distribution of controls. We solve for this flexible control distribution by first identifying a suitable reduced basis via a KL expansion, and then finding the set of Pareto optimal joint probability distributions for the KL random variables.

Nathan Gibson
Oregon State University
[email protected]

Parnian Hosseini
Oregon State University
Civil Engineering
[email protected]

Arturo Leon
University of Houston
Department of Civil and Environmental Engineering
[email protected]

MS34

Recent Advances in Newton-Krylov Methods for NMPC

We present advances in Newton-Krylov (a.k.a. continuation) methods for NMPC, pioneered by Prof. Ohtsuka. Our main results concern efficient preconditioning, leading to dramatically improved real-time Newton-Krylov MPC optimization. One idea is solving a forward recursion for the state and a backward recursion for the costate approximately, or reusing previously computed solutions, for the purpose of preconditioning. Another ingredient is the discovery of sparse factorizations of high-quality preconditioners. Altogether, our fast and efficient preconditioning leads to optimal linear complexity in the number of gridpoints on the prediction horizon for iterative solution of forward-difference Newton-Krylov NMPC. We suggest scenario/particle Newton-Krylov MPC for the case where system dynamics or constraints discretely change on-line, using ensembles of predictions corresponding to various scenarios of anticipated changes, and test it on minimum-time problems. Simultaneous on-line computation of ensembles of controls, for several scenarios of system dynamics/constraints changing in real time, allows choosing and adapting the optimal destination. The Newton-Krylov MPC approach is extended to the case where the state is implicitly constrained to a smooth manifold, and we demonstrate its effectiveness on a problem on a hemisphere.

Andrew Knyazev

Mitsubishi Electric Research Laboratories (MERL)
[email protected]

Alexander Malyshev
Univ. of Bergen, Norway
[email protected]

MS34

Parameter Estimation for Nonlinear Hyperbolic PDE Using Transient Flow Observations

We introduce a method for parameter estimation for nonlinear hyperbolic partial differential equations (PDEs) on a metric graph that represent compressible fluid flow in pipelines subject to time-dependent boundary conditions and nodal controls. Flow dynamics are described on each edge by space- and time-dependent density and mass flux that evolve according to a system of coupled PDEs; momentum dissipation is modeled using the Darcy-Weisbach friction approximation. Mass flow balance is maintained by applying the Kirchhoff-Neumann boundary conditions at network nodes, at which fluid may be injected into or withdrawn from the network. Flow through the system may be actuated by augmenting density at the interface between nodes and adjoining pipes. We focus here on estimation of the Darcy-Weisbach friction factor using partial observations of time series for pressures and flows throughout the network. A rapid, scalable computational method for performing a nonlinear least squares fit is developed.

Kaarthik Sundar
Los Alamos National Laboratory
[email protected]

MS35

Structural Properties of Affine Sparsity Constraints

We introduce a new constraint system for sparse variable selection in statistical learning. Such a system arises when there are logical conditions on the sparsity of certain unknown model parameters that need to be incorporated into their selection process. Formally, extending a cardinality constraint, an affine sparsity constraint (ASC) is defined by a linear inequality with two sets of variables: one set of continuous variables and the other set represented by their nonzero patterns. This paper aims to study an ASC system consisting of finitely many affine sparsity constraints. We investigate a number of fundamental structural properties of the solution set of such a non-standard system of inequalities, including its closedness and the description of its closure, continuous approximations and their set convergence, and characterizations of its tangent cones for use in optimization. Our study lays a solid mathematical foundation for solving optimization problems involving these affine sparsity constraints through their continuous approximations.

Jong-Shi Pang
University of Southern California
[email protected]

Hongbo Dong
Assistant Professor
Washington State University
[email protected]

Miju Ahn
University of Southern California
[email protected]

MS35

Distributed Big-Data Optimization via Batch Gradient Tracking

We study distributed nonconvex big-data optimization in multiagent networks. Our interest is in big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration the agents update in an uncoordinated fashion only a block of the entire decision variables. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tracking. Asymptotic convergence to stationary solutions of the nonconvex problem is established. This is a joint work with (listed in alphabetic order): I. Notarnicola, G. Notarstefano, and Y. Sun.

Gesualdo Scutari
Purdue University
[email protected]

MS35

Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

In this talk, I will present a new theory for first-order stochastic convex optimization, showing that the global iteration complexity is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions. In particular, if the objective function F(w) in the ε-sublevel set grows as fast as ‖w − w*‖_2^{1/θ}, where w* represents the closest optimal solution to w and θ ∈ (0, 1] quantifies the local growth rate, the iteration complexity of first-order stochastic optimization for achieving an ε-optimal solution can be O(1/ε^{2(1−θ)}), which is optimal at most up to a logarithmic factor. This result is fundamentally better than previous works that either assume a global growth condition in the entire domain or achieve a faster local convergence under the local faster growth condition. I will also present several stochastic algorithms for achieving the aforementioned iteration complexity and consider applications in machine learning.
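
To fix ideas, two worked instances of this bound (my gloss, not taken from the talk):

```latex
% Instances of the complexity O(1/\epsilon^{2(1-\theta)}) under the local
% growth condition F(w) - F(w^*) \ge c\,\|w - w^*\|_2^{1/\theta}:
%   \theta = 1/2 (quadratic growth, e.g. strong convexity):
%       O(1/\epsilon^{2(1 - 1/2)}) = O(1/\epsilon);
%   \theta = 1 (sharp growth):
%       O(1/\epsilon^{0}) = O(1), up to logarithmic factors;
%   \theta \to 0 (no growth assumed):
%       O(1/\epsilon^{2}), the generic stochastic subgradient rate.
```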

Tianbao YangUniversity of [email protected]

MS35

Pathwise Coordinate Optimization for Nonconvex Sparse Learning: Algorithm and Theory

Nonconvex optimization naturally arises in many machine learning problems. Machine learning researchers exploit various nonconvex formulations to gain modeling flexibility, estimation robustness, adaptivity, and computational scalability. Although classical computational complexity theory has shown that solving nonconvex optimization is generally NP-hard in the worst case, practitioners have proposed numerous heuristic optimization algorithms, which achieve outstanding empirical performance in real-world applications. To bridge this gap between practice and theory, we propose a new generation of model-based optimization algorithms and theory, which incorporate statistical thinking into modern optimization. Particularly, when designing practical computational algorithms, we take the underlying statistical models into consideration. Our novel algorithms exploit hidden geometric structures behind many nonconvex optimization problems, and can obtain global optima with the desired statistical properties in polynomial time with high probability.

Tuo Zhao
Johns Hopkins University
[email protected]

MS36

Optimal Control of Problems with Hysteresis

We discuss optimality conditions for control problems for ordinary and parabolic equations which include nonsmooth nonlinearities of hysteresis type. These are obtained either by smooth approximation or by proving and utilizing results on generalized (weak) differentiability of hysteresis nonlinearities.

Martin Brokate
Technische Universitat Munchen
[email protected]

MS36

An Approach to the Differential Stability Analysis of Mosolov's Problem

The aim of this talk is to study the differentiability properties of the solution operator to the so-called Mosolov problem. This is a simplified version of the classical Bingham flow problem that takes the form of an elliptic variational inequality of the second kind and models the steady-state motion of a viscous-plastic medium in a cylindrical pipe. We derive differentiability results under suitable structural assumptions and show that the differential sensitivity analysis of Mosolov's problem is closely linked to the study of certain weighted Sobolev spaces. Some of the results on the fine properties of Sobolev functions needed for our investigation are also of independent interest.

Constantin Christof
Technische Universitat Dortmund
[email protected]

Christian Meyer
TU Dortmund
Faculty of Mathematics
[email protected]

MS36

Optimal Control of Time Discretized Contact Problems

Our presentation deals with the optimal control of problems governed by time discretized, dynamic contact problems in linear viscoelasticity with respect to distributed controls. We discuss some of the subtleties introduced into the contact model by the linear non-penetration condition, as well as a suitable formulation of the set of admissible states in the optimal control of the resulting variational inequality. A number of integrators for the time continuous form of dynamic contact problems allow for the analysis of corresponding time discretized problems. We employ a temporal finite element discretization, which corresponds to a contact implicit Newmark scheme. In the discretized setting, we obtain existence of a solution operator and, consequently, existence of minimizers under standard assumptions on the cost functional. We address different notions of capacity on the boundary of a domain and analyze differentiability properties of the solution operator. When the solution operator to the time discretized contact problem is directionally differentiable and the controls are dense, we derive a set of first order optimality conditions. Solutions to a modified adjoint equation can then be computed by an inverse time stepping scheme and allow for efficient computation of search directions for the fully discretized system in a simple, steepest-descent type optimization scheme.

Georg Muller, Anton Schiela
University of Bayreuth
[email protected], [email protected]

MS36

Optimal Control of Thermoviscoplastic Systems

Elastoplastic deformations play a tremendous role in industrial forming. Many of these processes happen under non-isothermal conditions. Therefore, the optimization of such problems is of interest not only mathematically but also for applications. In this talk we will present the analysis of an optimal control problem governed by a thermo-visco(elasto)plastic model. We will point out the difficulties arising from the nonlinear coupling of the heat equation with the mechanical part of the model. Finally, we will discuss first numerical results.

Ailyn Stotzner, Roland Herzog
Technische Universitat Chemnitz
[email protected], [email protected]

Christian Meyer
TU Dortmund
Faculty of Mathematics
[email protected]

MS37

Sums of Squares and Matrix Completion

I will review recent results on semidefinite matrix completion and its relation with sums of squares. I will discuss low-rank completion, positive definite completion, and applications to Gaussian graphical models.

Greg Blekherman
Georgia Institute of Technology
[email protected]

Rainer Sinn
Georgia Tech
[email protected]

Mauricio Velasco
Universidad de los Andes
[email protected]

MS37

Polynomial Optimization with Sum-of-Squares Interpolants

One of the most common approaches to polynomial optimization utilizes semidefinite programming (SDP) hierarchies, which arise from combining the algebraic theory of sum-of-squares polynomials with the observation that SOS polynomials are semidefinite representable. While theoretically satisfactory, the translation of problems involving sum-of-squares polynomials to SDPs can be impractical for a number of reasons. First, the SDP representation of sum-of-squares polynomials roughly squares the number of optimization variables, increasing the time and memory complexity of the solution algorithms by several orders of magnitude. The second problem is numerical: in the most common SDP formulation of the hierarchies, the dual variables are semidefinite matrices whose condition numbers grow exponentially with the degree of the polynomials involved; this is detrimental for a floating-point implementation. In the talk, we show that using a combination of polynomial interpolation and non-symmetric cone optimization we can optimize directly over sum-of-squares cones in a very efficient way, avoiding the aforementioned pitfalls of the semidefinite programming approach to sum-of-squares optimization.

David Papp
North Carolina State University
[email protected]

Sercan Yildiz
University of North Carolina at Chapel Hill
[email protected]

MS37

Polynomials as Sums of Few Squares

The classes of polynomials in which every nonnegative polynomial is a sum of squares are an important subject dating back to Hilbert. I will discuss the number of squares needed in these representations and give a tight bound in terms of the number of variables. This generalizes the celebrated result by Hilbert that every real nonnegative ternary quartic is a sum of three squares. Applying this theory to polynomials called biforms gives low-rank factorizations of positive semidefinite bivariate matrix polynomials. For polynomials in two variables, we can also count the number of representations as a sum of few squares.

Greg Blekherman
Georgia Institute of Technology
[email protected]

Daniel Plaumann
TU Dortmund
[email protected]

Rainer Sinn
Georgia Tech
[email protected]

Cynthia Vinzant
University of Michigan
Dept. of Mathematics
[email protected]

MS38

Deep Learning Meets Partial Differential Equations and Optimization

Abstract not available at time of publication.

Eldad Haber
University of British Columbia, Vancouver
[email protected]

MS38

Data-Driven Model Reduction via CUR-Factored Hankel Approximation

In this presentation, we consider obtaining reduced-order models for optimization and control from impulse-response data of dynamical systems. Traditional system identification methods for this task require the singular value decomposition of the large, dense, block-structured Hankel matrix. We show that we can use certain other, faster matrix decompositions of the Hankel matrix. In particular, we use a fast algorithm to compute the CUR decomposition of the Hankel matrix, and present an algorithm that integrates the decomposition into a subspace-based system identification method. We show a speedup factor of 25 compared to SVD-based approaches, while maintaining the same magnitude of error. Error bounds and stability proofs for the obtained reduced-order models are also provided. We demonstrate the approach on a numerical test problem.

Boris Kramer, Karen E. Willcox
Massachusetts Institute of Technology
[email protected], [email protected]

Alex A. Gorodetsky
Sandia National Laboratories
[email protected]

MS38

Adaptive Multiscale Finite Element Methods for PDE Parameter Estimation

PDE parameter-estimation problems arise frequently in many applications such as geophysical and medical imaging. The problem can be formulated as an optimization problem with constraints given by the underlying PDEs. We consider the so-called reduced formulation in which the PDEs are eliminated. The resulting unconstrained optimization problem is computationally expensive to solve because the PDEs need to be solved numerous times for different parameters until an adequate reconstruction is found. We consider the case in which many measurements are available, leading to a large number of PDE constraints. We present an optimization scheme that uses operator-induced interpolation to solve PDEs on a coarser mesh and thereby reduces the computational burden. The projection is computed using multiscale finite element methods (MSFEM) and, in our formulation, is adapted in the course of optimization. Our optimization approach includes the derivatives of the adaptive multiscale basis. We demonstrate the potential of the method using examples from geophysical imaging.

Lars Ruthotto
Department of Mathematics and Computer Science

Emory University
[email protected]

Samy Wu Fung
Emory University
400 Dowman Dr, 30322 Atlanta, GA, USA
[email protected]

MS39

Latest Results on the Binary Approximation Method for Polynomial Optimization Problems

We present experimental results with a cutting-plane algorithm for polynomial optimization problems which uses an approximate representation of continuous variables as weighted sums of binary variables. Rather than using the representation directly, we instead use it to separate over the convex hull of feasible solutions. This allows us to control the degree to which variables are approximately represented (as weighted sums of binary variables) as well as the precision to which the approximations are carried out, while also reaping the benefits of relying on a mature (linear) mixed-integer solver. Our experiments show some promising results. Joint work with Chen Chen and Gonzalo Munoz.
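
The underlying binary representation is easy to state concretely (a toy illustration of the approximation only; the cutting-plane machinery of the talk is not shown): a variable x ∈ [0, 1] is written as x ≈ Σ_{k=1}^{K} 2^{−k} z_k with z_k ∈ {0, 1}, which incurs error at most 2^{−K}.

```python
# Greedy binary expansion of x in [0, 1] to K bits; the granularity K
# plays the role of "the degree to which variables are represented".
K = 8
x = 0.7310
z, residual = [], x
for k in range(1, K + 1):
    bit = int(residual >= 2.0 ** (-k))
    z.append(bit)
    residual -= bit * 2.0 ** (-k)

x_approx = sum(zk * 2.0 ** (-(k + 1)) for k, zk in enumerate(z))
print(z, x_approx, abs(x - x_approx) <= 2.0 ** (-K))   # error bound holds
```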

Daniel Bienstock
Columbia University, IEOR and APAM Departments
[email protected]

MS39

Characterizations of Mixed Integer Convex Quadratic Representable Sets

Representability results play a fundamental role in optimization since they provide geometric characterizations of the feasible sets that arise from optimization problems. In this talk we study the sets that appear in the feasibility version of mixed integer convex quadratic optimization problems. We provide a number of characterizations of the sets that can be obtained as the projection of such feasible regions in a higher dimensional space. Most notably, we provide a complete characterization in the case that the feasible region is bounded. This is joint work with Jeffrey Poskin.

Alberto Del PiaUniversity of [email protected]

MS39

Relaxations and Convex Hull Results for the Intersection of Elementary Nonconvex Quadratics and General Regular Cones

We examine the structure of nonconvex sets obtained from the intersection of a regular cone K and an elementary nonconvex quadratic where the underlying matrix has rank two. Such nonconvex quadratics are fundamentally related to two-term disjunctions on the regular cone K, and thus the resulting sets can be used as relaxations for a number of problems. We develop a family of structured convex inequalities for these nonconvex sets. Under mild assumptions, these inequalities together characterize the closed convex hull of such a set in the original space, and if additional conditions are satisfied, a single inequality from this family is sufficient. For the case where K is the direct product of a positive semidefinite cone, second-order cones, and a nonnegative orthant, we show that these inequalities can be represented in appropriate tractable forms in certain cases. For more general cases, and when K is the positive semidefinite cone, we present conic relaxations admitting some desirable properties.

Sercan Yildiz
University of North Carolina at Chapel Hill
[email protected]

Fatma Kilinc-Karzan
Tepper School of Business
Carnegie Mellon University
[email protected]

MS39

Embedding Formulations for Unions of Convex Sets

There is often a strong trade-off between formulation strength and size in mixed integer programming (MIP). When modelling unions of convex sets this trade-off can be resolved by adding auxiliary continuous variables. However, adding these variables can result in a deterioration of the computational effectiveness of a formulation. For this reason there has been considerable interest in constructing strong formulations that do not use auxiliary variables. In this work we introduce a technique to construct such formulations that is based on a geometric object known as the Cayley embedding. We show how this technique can be easily used to construct new formulations and to detect redundant non-linear inequalities in existing formulations.

Juan Pablo Vielma
Sloan School of Management
Massachusetts Institute of Technology
[email protected]

MS40

The Adaptive Gradient Method for Optimization of Functions Observed with Deterministic Error

We consider unconstrained optimization problems where the objective function is observed with deterministic error and assumed to decay with increasing oracle effort. An important special case is the integral optimization problem where the objective function is an integral that can be evaluated with a computationally expensive numerical quadrature oracle or through a Quasi Monte Carlo oracle. We propose the adaptive gradient method (AGM), which evolves by repeatedly taking a step along the gradient direction, estimated after expending a carefully specified (strategic) amount of oracle effort. We present AGM's work complexity bounds when implemented on three basic classes of functions characterized by increasing levels of function structure. Our results quantify the effect of the "quality" of the oracle in use on the resulting convergence rate of AGM.

Fatemeh Hashemi
Virginia Tech
[email protected]

Raghu Pasupathy
Purdue University
[email protected]

Mike Taaffe

Virginia Tech
[email protected]

MS40

Synchronous and Asynchronous Inexact Proximal Best-Response Schemes for Stochastic Nash Games: Rate Analysis and Complexity Statements

This work considers a stochastic Nash game, where each player solves a parametrized stochastic optimization problem. In deterministic regimes, best-response schemes are convergent under a suitable spectral property of the proximal-response map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization problems at every step. Instead, we propose an inexact generalization of this scheme in which an inexact solution is computed via a fixed (but increasing) number of projected stochastic gradient steps. On the basis of this framework, we make several contributions: First, we propose a synchronous inexact best-response scheme where all players simultaneously update their decisions. Then we propose an asynchronous scheme for the setting in which players are neither required to make simultaneous decisions nor need access to their rivals' latest information. Surprisingly, we show that the iterates of both regimes converge to the unique equilibrium in mean at a linear rate rather than a sub-linear rate. Furthermore, we establish the overall iteration complexity of the schemes in terms of projected stochastic gradient steps for computing an ε-Nash equilibrium. The schemes are further extended to settings where players solve two-stage linear and quadratic stochastic Nash games. Finally, we carry out some numerical simulations to demonstrate the rate and complexity statements.

Jinlong Lei
Penn. State University
[email protected]

Uday Shanbhag
Department of Industrial and Manufacturing Engineering
Pennsylvania State University
[email protected]

Jong-Shi Pang, Suvrajeet Sen
University of Southern California
[email protected], [email protected]

MS40

Distributed Optimization Over Networks: New Rate Results

We consider the problem of distributed optimization over time-varying undirected graphs. We discuss a distributed algorithm (DIGing) for solving this problem based on a combination of an inexact gradient method and a gradient tracking technique. This algorithm deploys a fixed step size but converges exactly to the global and consensual minimizer. Under a strong convexity assumption, we prove that the algorithm converges at an R-linear (geometric) rate as long as the step size is less than a specific bound; we give an explicit estimate of this rate over uniformly connected graph sequences and show it scales polynomially with the number of nodes. Numerical experiments demonstrate the efficacy of the introduced algorithm and validate our theoretical findings.
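
For reference, a minimal sketch of the DIGing recursion on a toy distributed least-squares problem over a fixed ring graph (the talk treats time-varying graphs; the graph, mixing weights, and step size below are illustrative assumptions):

```python
import numpy as np

# DIGing recursion:
#   x_{k+1} = W x_k - alpha * y_k
#   y_{k+1} = W y_k + grad(x_{k+1}) - grad(x_k)   (gradient tracking)
rng = np.random.default_rng(4)
n_agents, d = 5, 3
A = rng.normal(size=(n_agents, 10, d))
b = rng.normal(size=(n_agents, 10))

def grads(X):
    # stacked local gradients of f_i(x) = 0.5 ||A_i x - b_i||^2
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n_agents)])

# Symmetric doubly stochastic mixing matrix for a ring graph (assumed).
W = np.eye(n_agents) / 2
for i in range(n_agents):
    W[i, (i - 1) % n_agents] += 0.25
    W[i, (i + 1) % n_agents] += 0.25

alpha = 0.01
X = np.zeros((n_agents, d))
Y = grads(X)
g_old = Y.copy()
for k in range(3000):
    X = W @ X - alpha * Y
    g_new = grads(X)
    Y = W @ Y + g_new - g_old
    g_old = g_new

print("consensus spread:", np.linalg.norm(X - X.mean(axis=0)))
```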

Angelia Nedich
Arizona State University
[email protected]

Alexander Olshevsky
University of Illinois at Urbana-Champaign
[email protected]

Wei Shi
Boston University
[email protected]

MS40

A Variable Sample-Size Stochastic Approximation Framework

Traditional stochastic approximation (SA) schemes employ a single gradient or a fixed batch of noisy gradients in computing a new iterate, and require a projection onto a possibly complex convex set for every update. As a consequence, practical implementations may require the solution of a large number of projection problems, which may render the scheme impractical. We present a variable sample-size stochastic approximation (VSSA) scheme where the batch of noisy gradients may change in size across iterations and the scheme terminates when the budget is consumed. In this setting, we derive error bounds in strongly convex, convex differentiable, and convex nonsmooth regimes. In particular, we focus on quantifying the rate of convergence in terms of projection steps and in terms of simulation budget, and comment on the optimality of the obtained rates. In addition, we present an accelerated VSSA scheme which displays the optimal accelerated rate of convergence in terms of projection steps if the sample size grows sufficiently fast. Preliminary numerics suggest that VSSA schemes outperform their traditional counterparts in terms of computational time, often by several orders of magnitude, with little drop in empirical performance.
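
A minimal sketch of the variable sample-size idea on a toy strongly convex problem (the batch growth rate, step size, and budget are assumptions of this illustration, not the paper's parameters):

```python
import numpy as np

# Variable sample-size SA for min_x E[0.5 ||x - xi||^2]: the batch used
# to estimate the gradient grows geometrically until the sample budget
# is consumed, so each iterate needs only one (cheap) update.
rng = np.random.default_rng(5)
d, budget = 10, 100_000
mu = rng.normal(size=d)               # optimum is E[xi] = mu

x = np.zeros(d)
batch, used, step = 4, 0, 0.5
while used + batch <= budget:
    xi = mu + rng.normal(size=(batch, d))    # draw a batch of samples
    grad_est = (x - xi).mean(axis=0)         # batch gradient estimate
    x -= step * grad_est
    used += batch
    batch = int(np.ceil(batch * 1.5))        # geometric batch growth

print("error:", np.linalg.norm(x - mu))
```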

Uday Shanbhag
Department of Industrial and Manufacturing Engineering
Pennsylvania State University
[email protected]

Afrooz Jalilzadeh
Penn. State University
[email protected]

MS41

An Augmented Lagrangian Filter Method for Real-Time Embedded Optimization

We present a filter line-search algorithm for non-convex continuous optimization that combines an augmented Lagrangian function and a constraint violation metric to accept and reject steps. The approach is motivated by real-time optimization applications that need to be executed on embedded computing platforms. The proposed method enables primal-dual regularization of the linear algebra system that in turn permits the use of solution strategies with lower computing overheads. We prove that the proposed algorithm is globally convergent and we demonstrate the developments using a nonlinear real-time optimization application. Our numerical tests are performed on a standard desktop and on an embedded platform, and demonstrate that the proposed approach enables reductions in solution times of up to three orders of magnitude compared to a standard filter implementation.

Nai-Yuan Chiang
United Technologies Research Center

[email protected]

Rui Huang
United Technologies
[email protected]

Victor Zavala
University of Wisconsin-Madison
[email protected]

MS41

A Parallelizable Augmented Lagrangian Method Applied to Large-Scale Non-Convex-Constrained Optimization Problems

We contribute improvements to a Lagrangian dual solution approach applied to large-scale optimization problems whose objective functions are convex, continuously differentiable and possibly nonlinear, while the non-relaxed constraint set is compact but not necessarily convex. Such problems arise, for example, in the split-variable deterministic reformulation of stochastic mixed-integer optimization problems. The dual solution approach needs to address the nonconvexity of the non-relaxed constraint set while being efficiently implementable in parallel. We adapt the augmented Lagrangian method framework to address the presence of nonconvexity in the non-relaxed constraint set and the need for efficient parallelization. The development of our approach is most naturally compared with the development of proximal bundle methods and especially with their use of serious step conditions. However, deviations from these developments allow for an improvement in the efficiency with which parallelization can be utilized. Pivotal in our modification to the augmented Lagrangian method is the integration of approaches based on the simplicial decomposition method (SDM) and the nonlinear block Gauss-Seidel (GS) method. Under mild conditions optimal dual convergence is proven, and we report computational results on test instances from the stochastic optimization literature. We demonstrate improvement in parallel speedup over a baseline parallel approach.

Brian C. Dandurand
Argonne National Laboratory
[email protected]

Natashia Boland
Georgia Institute of Technology
[email protected]

Jeffrey Christiansen
Mathematical Sciences, School of Science, RMIT University
[email protected]

Andrew Eberhard
RMIT University
[email protected]

Fabricio Oliveira
Mathematical Sciences, School of Science, RMIT University
[email protected]

MS41

An Efficient Dual Newton Strategy for Tree-Sparse Quadratic Programs

Model Predictive Control (MPC) is an advanced control strategy that uses the model of the controlled plant to optimize its future behaviour, subject to constraints. However, when uncertainties are present in the employed model, the performance of the controller may deteriorate significantly. One common approach to address this problem is scenario-tree MPC, where uncertainties are handled via a finite number of realizations at each decision point. A drawback of this formulation is that the size of the underlying optimization problem grows exponentially in the horizon length, and thus structure-exploiting solvers are essential to achieve acceptable computation times. In this talk, we propose to solve the tree-structured Quadratic Programs (QPs) arising in scenario-tree MPC with a dual Newton strategy, where the dynamic constraints are relaxed via Lagrange duality and the dual problem is solved with a semi-smooth Newton method. We exploit the underlying structure to factorize the dual Hessian in an efficient and parallelizable manner. We conclude the talk with a comparison of our method to other dual Newton strategy variants that differ in the choice of constraints that are dualized.

Dimitris Kouzoupis
University of Freiburg
[email protected]

Emil Klintberg
Chalmers University of Technology
[email protected]

Gianluca Frison
Technical University of Denmark
[email protected]

Sebastien Gros
Chalmers University of Technology
[email protected]

Moritz Diehl
University of Freiburg
[email protected]

MS41

Scenario Decomposition for Stochastic 0-1 Optimization

We present a scenario decomposition algorithm for stochastic 0-1 programs. The algorithm recovers an optimal solution by iteratively exploring and cutting off candidate solutions obtained from solving scenario subproblems. The scheme is applicable to quite general problem structures and can be implemented in a distributed framework. We provide a theoretical justification of the effectiveness of the proposed scheme.

Shabbir Ahmed
Georgia Institute of Technology, Industrial and Systems Engineering
[email protected]

Deepak RajanLawrence Livermore National Laboratory

[email protected]

MS42

Sketch and Project: A Tool for Designing Stochastic Quasi-Newton Methods and Stochastic Variance Reduced Gradient Methods

I will present a stochastic sketching and projecting technique, akin to the least-change formulation of quasi-Newton methods, for designing new stochastic quasi-Newton methods. This includes a new and recently introduced stochastic block BFGS method. With the same sketching and projecting technique, I will present a unifying viewpoint of variance reduced stochastic gradient methods. Not only is this viewpoint expressive, but it also leads to simpler and stronger proofs of convergence.
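
For the linear-system special case, the sketch-and-project update can be written in a few lines (a hedged sketch with a Gaussian sketching matrix; the quasi-Newton variants apply the same idea to the Hessian rather than to A):

```python
import numpy as np

# Sketch-and-project for A x = b:
#   x <- x - A^T S (S^T A A^T S)^+ S^T (A x - b),
# i.e., project the iterate onto the sketched system S^T A x = S^T b.
# Sketch size tau and Gaussian sketches are assumptions of this sketch.
rng = np.random.default_rng(6)
m, n, tau = 300, 60, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true

x = np.zeros(n)
for k in range(400):
    S = rng.normal(size=(m, tau))
    SA = S.T @ A                               # sketched matrix, tau x n
    r = SA @ x - S.T @ b                       # sketched residual
    x -= SA.T @ np.linalg.lstsq(SA @ SA.T, r, rcond=None)[0]

print("error:", np.linalg.norm(x - x_true))
```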

Robert M. Gower, Peter Richtarik
University of Edinburgh
[email protected], [email protected]

Francis Bach
INRIA - Ecole Normale Superieure
[email protected]

MS42

Introduction to Riemannian BFGS Methods

Riemannian optimization is the task of finding an optimum of a real-valued function defined on a Riemannian manifold. Riemannian optimization has been a topic of much interest over the past few years due to many applications including computer vision, signal processing, and numerical linear algebra. So far, many effective and efficient optimization methods on Riemannian manifolds have been proposed and analyzed. In this presentation, we choose Riemannian BFGS methods as examples to briefly review the framework of Riemannian optimization and emphasize its differences from Euclidean optimization. A numerical experiment is used to show the performance of Riemannian BFGS methods.

Wen Huang
Rice University
[email protected]

MS42

GRANSO: An Open-Source Software Package Employing a BFGS-SQP Method for Constrained, Nonsmooth, Nonconvex Optimization

We present a new open-source optimization package implemented in Matlab intended to be an efficient solver for nonsmooth constrained optimization problems, without any special structure or assumptions imposed on the objective or constraint functions. Though nonsmooth optimization is a well-studied area, there is still little available software specifically designed to handle general nonsmooth constraint functions. In 2012, Curtis and Overton proposed a relevant gradient-sampling-based SQP method, proving convergence results under the assumption that the objective and constraint functions are locally Lipschitz, but if these functions are expensive to compute, its code can be quite slow due to the gradient sampling. In contrast, our new software, called GRANSO: GRadient-based Algorithm for Non-Smooth Optimization, is designed to use few function/gradient evaluations per iteration and can handle problems involving functions that are any or all of the following: smooth or nonsmooth, convex or nonconvex, and locally Lipschitz or non-locally Lipschitz. GRANSO only requires that gradients be supplied, even for nonsmooth problems. Although GRANSO's underlying BFGS-SQP method provides no convergence guarantees for nonsmooth or non-locally-Lipschitz problems, using a challenging test set of 200 nonsmooth, nonconvex, constrained optimization problems arising in controller design, we have observed GRANSO to be both reliable and significantly more efficient than available alternatives.

Frank E. Curtis
Industrial and Systems Engineering
Lehigh University
[email protected]

Tim Mitchell
Max Planck Institute for Dynamics of Complex Technical Systems
[email protected]

Michael L. Overton
New York University
Courant Instit. of Math. Sci.
[email protected]

MS42

Parallel quasi-Newton methods for PDE-constrained optimization

In this talk we will present a parallel quasi-Newton interior method for constrained optimization problems. The work is motivated by large-scale PDE-constrained topology optimization problems that occur in optimal material design. Our parallel method exploits the fact that the interior-point linear system consists of thin dense blocks that border a low-rank update of a diagonal matrix. This structure stems from the use of reduced-space PDE-constrained optimization formulations and the use of secant Hessian approximations. The remainder of the computations (mat-vec, vec-vec operations) are parallelized by leveraging the domain decomposition present in the PDE solver used for finite element analysis and for the computation of the sensitivities. We will also report on the numerical performance when solving structural topology optimization problems in parallel and discuss the parallel efficiency of the proposed approach.

Cosmin G. Petra
Lawrence Livermore National Laboratory
Center for Advanced Scientific Computing
[email protected]

MS43

Semidefinite Approximations of Matrix Logarithm

We propose a new way to treat the exponential/relative entropy cone using symmetric cone solvers. Our approach is based on highly accurate rational (Pade) approximations of the logarithm function. The key to this approach is that our rational approximations, by construction, inherit the (operator) concavity of the logarithm. Importantly, our method extends to the matrix logarithm and other derived functions such as the matrix relative entropy, giving new semidefinite optimization-based tools for convex optimization involving these functions. We compare our method to the existing successive approximation scheme in CVX, and show that it can be much faster, especially for large problems.
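
A hedged illustration of the scalar ingredient only: a diagonal Pade approximant of log(1 + x) computed from its Taylor coefficients. The operator-concavity-preserving construction and the matrix extension from the talk are not reproduced here.

```python
import numpy as np
from scipy.interpolate import pade

# Diagonal [m/m] Pade approximant of log(1 + x) from its Taylor series
# log(1 + x) = x - x^2/2 + x^3/3 - ...; order is an illustrative choice.
order = 4
taylor = [0.0] + [(-1.0) ** (k + 1) / k for k in range(1, 2 * order + 1)]
p, q = pade(taylor, order)            # numerator and denominator polynomials

for x in [0.1, 0.5, 1.0]:
    approx = p(x) / q(x)
    print(x, approx, abs(approx - np.log1p(x)))   # tiny errors near x = 0
```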

Hamza Fawzi
University of Cambridge
[email protected]

James Saunderson, Pablo A. Parrilo
Massachusetts Institute of Technology
[email protected], [email protected]

MS43

Optimal Resource Allocation to Control Epidemic Outbreaks in Networked Populations

We study the problem of controlling epidemic outbreaks in networked populations by distributing protection resources throughout the nodes of the network. We assume that two types of protection resources are available: (i) preventive resources able to defend individuals in the population against the spreading of the disease (such as vaccines or disease-awareness campaigns), and (ii) corrective resources able to neutralize the spreading (such as antidotes). We assume that both preventive and corrective resources have an associated cost and study the problem of finding the cost-optimal distribution of resources throughout the networked population. We analyze these questions in the context of a viral outbreak and study the following two problems: (i) given a fixed budget, find the optimal allocation of preventive and corrective resources in the network to achieve the highest level of disease containment, and (ii) when a budget is not specified, find the minimum budget required to eradicate the disease. We show that both resource allocation problems can be efficiently solved for a wide class of cost functions. We illustrate our approach by designing optimal protection strategies to contain an epidemic outbreak that propagates through the air transportation network.

Victor Preciado
University of Pennsylvania
[email protected]

MS43

Relative Entropy Relaxations for Signomial Optimization

Signomial programs (SPs) are optimization problems specified in terms of signomials, which are weighted sums of exponentials composed with linear functionals of a decision variable. SPs are non-convex optimization problems in general, and families of NP-hard problems can be reduced to SPs. We describe a hierarchy of convex relaxations to obtain successively tighter lower bounds on the optimal value of SPs. This sequence of lower bounds is computed by solving increasingly larger-sized relative entropy optimization problems, which are convex programs specified in terms of linear and relative entropy functions. We show how these ideas can be applied to a number of problems in various application domains (e.g. machine learning) where SPs appear in a very natural manner. We also discuss numerical aspects of our work, including a relative entropy optimization solver, and various experiments conducted using the same.

Parikshit Shah
University of Wisconsin / Philips Research
[email protected]

Venkat Chandrasekaran
California Institute of Technology
[email protected]

MS43

A Positivstellensatz for Sums of Nonnegative Circuit Polynomials

Deciding nonnegativity of real polynomials is a fundamental problem in real algebraic geometry and polynomial optimization. Since this problem is NP-hard, one is interested in finding sufficient conditions (certificates) for nonnegativity which are easier to check. Since the 19th century, the standard certificates for nonnegativity have been sums of squares (SOS). In practice, SOS-based semidefinite programming (SDP) is the standard method to solve polynomial optimization problems. In 2014, Iliman and I introduced an entirely new nonnegativity certificate based on sums of nonnegative circuit polynomials (SONC), which are independent of sums of squares. We successfully applied SONCs to global nonnegativity problems. Recently, Dressler, Iliman, and I proved a Positivstellensatz for SONCs using results by Chandrasekaran and Shah. This Positivstellensatz provides a converging hierarchy of lower bounds for constrained polynomial optimization problems. These bounds can be computed efficiently via relative entropy programming. This result establishes SONCs as a promising alternative to SOS-based SDP methods for hard polynomial optimization problems.
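
A textbook example may help fix ideas (my addition, not part of the abstract): the Motzkin polynomial is nonnegative but not a sum of squares, while its nonnegativity is certified by a single circuit.

```latex
% The Motzkin polynomial
%   m(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1
% is nonnegative but not a sum of squares. Its support forms a circuit,
% and the AM-GM inequality applied to the three outer monomials gives
%   (x^4 y^2 + x^2 y^4 + 1)/3 \ge (x^4 y^2 \cdot x^2 y^4 \cdot 1)^{1/3}
%                              = x^2 y^2,
% which is exactly a circuit-polynomial (SONC-type) certificate of
% m(x, y) \ge 0.
```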

Timo de Wolff
Texas A&M University
[email protected]

Mareike Dressler
Goethe University, Frankfurt am Main
[email protected]

Sadik Iliman
KPMG AG, Frankfurt am Main
[email protected]

MS44

Sparse Linear Programming for Optimal Adaptive Trial Design

Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. We focus on designs where the overall population is partitioned into two predefined subpopulations, e.g., based on a biomarker or risk score measured at baseline. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment, and the multiple testing procedure. We provide a general method for simultaneously optimizing both of these components for two-stage adaptive enrichment designs. We minimize the expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem due to its nonconvexity. The key to our approach is a novel discrete representation of this optimization problem as a sparse linear program. We apply advanced optimization methods to solve this problem to high accuracy, revealing new, approximately optimal designs. This is joint work with Michael Rosenblum and Han Liu.

Ethan X. Fang
Pennsylvania State University

[email protected]

MS44

Large Scale Nonconvex ADMM

Abstract not available at time of publication.

Tom Goldstein
[email protected]

MS44

Exploring the Second Order Sparsity in Large Scale Optimization

In this talk we shall demonstrate how the second order sparsity in important optimization problems such as lasso models, semidefinite programming, and many others can be exploited to derive efficient algorithms. The key idea is to incorporate semismooth Newton methods into the augmented Lagrangian method framework in a way that the subproblems involved only need low to medium costs; e.g., for lasso problems with sparse solutions, the cost of solving the subproblems at each iteration is no more than that of most first order methods, and can be much cheaper than some first order methods. Consequently, with the fast convergence rate in hand, usually asymptotically superlinear, we have now reached the stage of being able to solve many challenging large scale convex optimization problems efficiently and robustly.

Defeng Sun
Dept of Mathematics
National University of Singapore
[email protected]

MS44

Nonconvex Statistical Optimization: From Local Exploitation to Global Exploration

Taming nonconvexity requires a seamless integration of both global exploration and local exploitation of latent convexity. In this talk, we study a generic latent variable model that includes PCA, ICA, and dictionary learning as special cases. Using this statistical model as illustration, we propose two global exploration schemes for identifying the basin of attraction, namely tightening after relaxation and noise regularization, which allow further local exploitation via gradient-based methods. Interestingly, given a sufficiently good initialization, the local exploitation methods are often information-theoretically optimal. In sharp contrast, based on an oracle computational model, we prove that one often has to pay a computational or statistical price in the global exploration stage to attain a good initialization. This is joint work with Zhuoran Yang, Junchi Li, and Han Liu.

Zhaoran Wang
Princeton University
[email protected]

MS45

Nested Min Cuts and Bipartite Queueing System

Abstract not available at time of publication.

Yichuan Ding
Sauder School of Business

University of British Columbia
[email protected]

MS45

Combinatorial Algorithms for Clustering, Image Segmentation and Data Mining

Abstract not available at time of publication.

Dorit Hochbaum
Department of IE&OR
UC Berkeley
[email protected]

MS45

Rerouting Flow in a Network

Abstract not available at time of publication.

Jannik Matuschke
TU [email protected]

MS45

A Unified Framework for Determining the Structural Properties of Nested Min Cuts

Abstract not available at time of publication.

S. Thomas McCormick
Sauder School of Business, Vancouver, Canada
[email protected]

MS46

Estimation and Computation of Log-Concave Density Estimators

We consider nonparametric maximum likelihood estimation of a log-concave density with further constraints imposed. These are useful for using likelihood ratio tests to perform inference, and in their own right. In particular we consider the mode-constrained log-concave nonparametric maximum likelihood estimator (MLE). One reason this MLE is useful is that it is simultaneously nonparametric, yet automatically selects optimal bandwidths without any user dependent tuning required, unlike e.g. kernel density estimators. We will discuss using the constrained MLE in modal regression, as well as its use in a likelihood ratio test for the mode. We will discuss the algorithms used for computation of the estimators.

Charles Doss
School of Statistics, University of Minnesota
313 Ford Hall, 224 Church St. SE, Minneapolis, MN
[email protected]

MS46

On the Accuracy of the Kiefer-Wolfowitz MLE for Gaussian Mixtures

Given real-valued observations X_1, ..., X_n, the Kiefer-Wolfowitz MLE for Gaussian mixtures is defined as the (nonparametric) maximum likelihood estimator over the class of all Gaussian mixtures, i.e., densities of the form
\[
\int \varphi(x - \theta)\, d\mu(\theta), \qquad \varphi(x) := (2\pi)^{-1/2} \exp(-x^2/2),
\]
where μ is a probability measure on the real line. This estimator (contrary to most mixture estimates) can be computed via convex optimization. I shall present some results on the accuracy of this estimator when the data is generated i.i.d. from a Gaussian mixture. I will also briefly mention extensions to the multidimensional case.

Aditya Guntuboyina
Department of Statistics, University of California, Berkeley
423 Evans Hall, Berkeley, CA
[email protected]

MS46

Consistency in Nonparametric Estimation Using Variational Analysis

This presentation discusses density estimation in situations when the sample (hard information) is supplemented by soft information about the random phenomenon. These situations arise broadly in operations research and management science where practical and computational reasons severely limit the sample size, but problem structure and past experiences could be brought in. We adopt a constrained maximum likelihood estimator that incorporates any, possibly random, soft information through an arbitrary collection of constraints. We illustrate the breadth of possibilities by discussing soft information about shape, support, continuity, smoothness, slope, location of modes, symmetry, density values, neighborhood of known density, moments, and distribution functions. The maximization takes place over spaces of extended real-valued semicontinuous functions and therefore allows us to consider essentially any conceivable density as well as convenient exponential transformations. The infinite dimensionality of the optimization problem is overcome by approximating splines tailored to these spaces.

Johannes O. Royset
Operations Research Department
Naval Postgraduate School
[email protected]

MS46

On Multivariate Convex Regression

In the talk I will describe properties of the nonparametric least squares estimator (LSE) of a convex regression function when the predictor is multidimensional. We characterize the LSE and discuss the computation of the estimator. We study theoretical properties of the estimator: its consistency, global rate of convergence, and adaptation properties. I will also point to a few applications of convex regression.

Bodhisattva Sen
Department of Statistics, Columbia University
1255 Amsterdam Ave, Room # 1032 SSW, New York, NY
[email protected]

MS47

Feasible Parameter Sets in Stationary Models of Gas Transportation

The concept of feasible parameter set established in parametric optimization long ago has regained attention in mathematical decision support in gas network operation. Here, parameters correspond to vectors of input and, mostly, output gas quantities to be sent through the network. Feasible parameters correspond to valid nominations, and that is what gas companies are after, since they must guarantee technical feasibility to the customers. Of similar, maybe even bigger, importance to the company is to be able to “prove” technical infeasibility if they turn down a nomination. Due to the uncertainty which is inherent to gas networks it is relevant to assign probabilities to subsets of nominations for being feasible. In the talk we discuss feasible parameter sets for pipeline systems without meshes or with at most one. We conclude with an outlook to systems with a low number of mutually interwoven cycles.

Ralf Gollmer, Matthias Claus
University of Duisburg-Essen
[email protected], [email protected]

MS47

Steady-State Models for Optimizing Gas Transportation: Analytical and Algebraic Perspectives

Steady-state models for gas transportation involving Kirchhoff’s Laws obey interesting mathematical properties. From a more analytical point of view, coercivity and monotonicity come to the fore. Algebraically considered, different types of polynomial descriptions are possible. This mobilizes application of ideas from, for instance, Gröbner bases theory and methods, or from polynomial optimization in the presence of zero-dimensional ideals. In the talk, we elaborate on these topics and provide applications to feasibility analysis (nomination validation) of transportation orders in mildly meshed gas transportation networks with different densities of meshes. A specific feature of our models is the presence of absolute values, which has to do with flow directions. Therefore we present some rules to check for inconsistencies, leading to elimination of irrelevant resolutions of absolute values.

Ruediger Schultz
University of Duisburg-Essen
[email protected]

MS47

On the Robustification of Uncertain Physical Parameters in Gas Networks

A class of stationary gas network problems is investigated. The challenge is to decide whether a set of demands can be satisfied by the network if not all characteristic data, e.g., the roughness of the pipes, are precisely known. To handle these uncertain physical parameters, a robust approach is suggested. It is shown that if, for a passive network, the pressure loss is modeled by the Weymouth equation, certificates for feasibility and infeasibility can be derived. The robustified problems can be rephrased as polynomial optimization problems. If active elements are involved, such reformulations are not possible. In this case, the nonlinear pressure loss is approximated by a piecewise linearization scheme introducing auxiliary binary and continuous variables. As these variables approximate the pressure and flow along the pipes, they have to be adjustable with respect to the uncertain parameters. Thus, one is faced with the solution of a two-stage robust optimization problem with binary variables at the second stage. In order to cope with this, a piecewise approximation of the second stage variables is suggested. It is demonstrated that in certain cases the underlying decomposition of the uncertainty set can be performed such that the second stage binary variables are eliminated. The practical relevance of the theoretical results will be demonstrated by a set of small to medium size numerical examples.

Michael Stingl
Friedrich-Alexander University of Erlangen-Nürnberg
[email protected]

MS47

Control and Estimation Problems in Transient Gas Pipeline Flow

Renewed interest in the dynamic behavior of gas pipeline networks motivates a fundamental examination of modeling, simulation, optimal control, and estimation techniques for these systems. Mathematically, compressible fluid flow in a pipeline system can be represented by nonlinear hyperbolic partial differential equations (PDE) on a metric graph subject to time-dependent boundary conditions and nodal compatibility laws. In addition to mass flow balance at network nodes, where fluid may be injected or withdrawn from the network, flow can be actuated by compressors or regulators. The resulting highly nonlinear and ill-conditioned distributed system model defies traditional control-theoretic and computational treatment. After formulating a control system model and applying approximations, we can formulate constrained optimal control problems for flow scheduling, and state and/or parameter estimation problems, as large-scale nonlinear programs. Solutions using finer discretization converge to feasible optima of the continuous problem, which can be validated by simulation.

Anatoly Zlotnik
Los Alamos National Laboratory
[email protected]

MS48

Robust Empirical Optimization is Almost the Same as Mean-Variance Optimization

We formulate a distributionally robust optimization problem where the empirical distribution plays the role of the nominal model, the decision maker optimizes against a worst-case alternative, and the deviation of the alternative distribution from the nominal is controlled by a φ-divergence penalty in the objective. Our main finding is that a large class of robust empirical optimization problems of this form are essentially equivalent to an in-sample mean-variance problem. Intuitively, controlling the variance reduces the sensitivity of a decision’s expected reward to perturbations in the tail of the in-sample reward distribution. This in turn reduces the sensitivity of its out-of-sample performance to perturbations in the nominal model, which is precisely the notion of robustness. As a practical application of our main result, we introduce the notion of a robust mean-variance frontier, which can be used as a calibration tool for the robust model. To illustrate the usefulness of such frontiers we consider a real world portfolio optimization application. Our numerical experiments show that the first-order effect of robustness is variability reduction, followed by reduction in the mean reward.

Jun-ya Gotoh
Chuo University
[email protected]

Michael Kim
University of Toronto
[email protected]

Andrew Lim
National University of Singapore
[email protected]

MS48

Solving Stochastic Variational Inequalities by Progressive Hedging

Most of the research on stochastic variational inequalities has concentrated on models in which information about the uncertain future is revealed only once. Such models are inadequate to cover multistage stochastic programming, where information comes in stages that offer repeated opportunities for recourse decisions. That feature can be brought into stochastic variational inequalities by adapting them to a constraint of nonanticipativity. In that way not only stochastic programming but multistage multiagent games can be covered. A particular advantage of this approach is that it generates information price vectors which can be used to decompose the overall problem into a separate problem for each scenario. This fits with solution approaches like the progressive hedging algorithm. However, the ideas behind this algorithm offer the prospect of being applicable outside of a stochastic framework and under localized versions of convexity/monotonicity which often may be elicited from the problem structure in the manner of augmented Lagrangian methodology.

Terry Rockafellar
University of Washington
Department of Mathematics
[email protected]

MS48

Quasi-Monte Carlo and Sparse Grid Quadratures can be Efficient for Solving Two-Stage Stochastic Optimization Models

Sparse grid and randomized Quasi-Monte Carlo quadratures can be efficient for multivariate numerical integration if the integrands have first order mixed partial Sobolev derivatives with respect to all variables. This remains true if a smoothing technique based on dimension reduction applies and the integrand in stochastic optimization models allows an approximate representation by lower order smooth ANOVA terms. We discuss the theoretical background of all methods and techniques, and illustrate the theory by a practical example originating from the electric power industry.

Werner Roemisch
Humboldt University Berlin
Department of Mathematics
[email protected]

Hernan Leovey
Humboldt-University Berlin, Institute of Mathematics
[email protected]

MS48

Surrogate Models for Characterizing Tail Statistics of Systems Governed by PDEs

Performance of physical and engineering systems is often described by statistics of distribution tails such as quantile and probability of exceedance. However, these functions have poor mathematical properties for distributions described by sample statistics. Therefore, it is very difficult to optimize such quantities and build systems with the optimal characteristics. The recent trend is to use alternative characteristics, serving similar purposes: Conditional Value-at-Risk (CVaR) and Buffered Probability of Exceedance (bPOE). CVaR and bPOE can be optimized with convex and linear programming algorithms. We build surrogates for CVaR and bPOE of systems described by Partial Differential Equations (PDEs) with stochastic parameters. The surrogates are estimated with linear regression algorithms.

Stan Uryasev
University of Florida
[email protected]

MS49

Matrix Problems with No Spurious Local Minima

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent from random initialization.
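To make the setting concrete, the following is a minimal sketch (my own illustration, with assumed problem sizes and step size, not the authors' exact algorithm) of plain gradient descent on the factorized objective U ↦ (1/2m) Σ_k (⟨A_k, UUᵀ⟩ − y_k)² for Gaussian linear measurements of a planted low-rank matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    n, r, m = 20, 2, 300                      # dimension, rank, #measurements (assumed)
    Ustar = rng.normal(size=(n, r))
    M = Ustar @ Ustar.T                       # planted low-rank matrix
    A = rng.normal(size=(m, n, n))            # Gaussian sensing matrices
    y = np.einsum('kij,ij->k', A, M)          # measurements y_k = <A_k, M>

    U = 0.1 * rng.normal(size=(n, r))         # random initialization
    eta = 0.1 / np.linalg.norm(M, 2)          # small step size (an assumed choice)
    for _ in range(3000):
        R = np.einsum('kij,ij->k', A, U @ U.T) - y   # residuals <A_k, UU^T> - y_k
        G = np.einsum('k,kij->ij', R, A) / m         # residual-weighted average of A_k
        U -= eta * (G + G.T) @ U                     # gradient step in the factor U
    print(np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))  # relative recovery error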

Nathan Srebro
Toyota Technical Institute - Chicago
[email protected]

Srinadh Bhojanapalli
Toyota Technological Institute at Chicago
[email protected]

MS49

Fast Algorithms for Robust PCA via Gradient Descent

Abstract not available at time of publication.

Constantine Caramanis
The University of Texas at Austin
[email protected]

MS49

Provably Robust Low-Rank Matrix Recovery via Median-Truncated Gradient Descent

Low-rank matrix recovery is of central interest to many applications such as collaborative filtering, sensor localization, and phase retrieval. Convex relaxation via nuclear norm minimization is a powerful method, but suffers from computational complexity when the problem dimension is high. Recently, it has been demonstrated that gradient descent works provably for low-rank matrix recovery if initialized using the spectral method, therefore achieving both computational and statistical efficacy. However, this approach fails in the presence of arbitrary, possibly adversarial, outliers in the measurements. In this talk, we will describe how to modify the gradient descent approach via a median-guided truncation strategy, and show this yields provable recovery guarantees for low-rank matrix recovery. While the median has been well-known in the robust statistics literature, its utility in high-dimensional signal estimation is novel in this work. In particular, robust phase retrieval will be highlighted as a special case. This is based on joint work with Huishuai Zhang, Yuanxin Li, and Yingbin Liang.
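The core mechanism can be sketched in a simplified least-squares setting (my own illustration with assumed names, not the talk's low-rank algorithm): per-sample residuals are compared to their median, and samples whose residuals lie far beyond the median, the likely outliers, are dropped from the gradient.

    import numpy as np

    def median_truncated_step(theta, X, y, eta=0.1, alpha=3.0):
        """One median-truncated gradient step for least squares (sketch)."""
        resid = X @ theta - y
        # trust only samples whose residual is within a factor alpha of the median
        keep = np.abs(resid) <= alpha * np.median(np.abs(resid))
        grad = X[keep].T @ resid[keep] / max(keep.sum(), 1)
        return theta - eta * grad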

Yuejie Chi
Ohio State University
[email protected]

MS49

Low-Rank Matrix Completion (LRMC) Using Nuclear Norm (NN) with Facial Reduction (FR)

Minimization of the NN is often used as a surrogate, convex relaxation, for solving LRMC problems. The minimum NN problem can be solved as a trace minimization semidefinite program (SDP). The SDP and its dual are regular in the sense that they both satisfy strict feasibility. FR has been successful in regularizing many problems where strict feasibility fails, e.g., SDP relaxations of discrete optimization problems such as QAP, GP, as well as sensor network localization. Here we take advantage of the structure at optimality for the NN minimization and show that even though strict feasibility holds, the FR framework can be successfully applied to obtain a proper face that contains the optimal set. This can dramatically reduce the size of the final NN problem while guaranteeing a low-rank solution. We include numerical tests for both exact and noisy cases. (work with Shimeng Huang)

Henry Wolkowicz
University of Waterloo
Department of Combinatorics and Optimization
[email protected]

MS50

Thompson Sampling for Reinforcement Learning

We consider a variation of “Thompson Sampling” (TS), i.e., Bayesian posterior sampling heuristics for managing the exploration-exploitation trade-off, to design an on-policy learner for a finite state, finite action, average reward reinforcement learning problem. We compare the average reward of TS to the average reward of the optimal stationary policy, and demonstrate that TS achieves a near-optimal worst case regret bound for weakly communicating MDPs.
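The posterior sampling principle is easiest to see in the bandit special case; the following minimal sketch (my illustration; the talk's MDP setting additionally samples transition models) acts greedily with respect to a single posterior draw each round:

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.40, 0.55, 0.60])    # unknown Bernoulli arm means (simulated)
    S = np.ones(3); F = np.ones(3)           # Beta(1,1) priors: success/failure counts
    for t in range(5000):
        theta = rng.beta(S, F)               # sample one model from the posterior
        a = int(np.argmax(theta))            # act greedily w.r.t. the sampled model
        r = float(rng.random() < true_p[a])  # observe a Bernoulli reward
        S[a] += r; F[a] += 1.0 - r           # conjugate posterior update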

Shipra Agrawal
Department of Industrial Engineering and Operations Research
Columbia University
[email protected]

MS50

Overview of Reinforcement Learning Algorithms

Reinforcement learning (RL) is an area of machine learning for creating agents that can learn long-term successful behavior through autonomous interaction with the environment. Modern RL is a topic studied in different communities, including computer science, optimal control, psychology, and neuroscience, among others. This talk gives an overview of RL algorithms, introduces key challenges from the optimization perspective, and lays the common ground for other talks in this minisymposium.

Lihong Li
Microsoft Research

[email protected]

MS50

Gradient, Semi-Gradient and Pseudo-Gradient Reinforcement Learning

In this talk I will present a unified general framework for stochastic-gradient-based temporal-difference learning algorithms that use proximal gradient methods. The primal-dual saddle-point formulation is introduced, and state-of-the-art stochastic gradient solvers, such as mirror descent and extragradient, are used to design several novel RL algorithms. The finite-sample analysis is given along with detailed empirical experiments to demonstrate the effectiveness of the proposed algorithms. Several important extensions, such as control learning, variance reduction, acceleration, and regularization, will also be discussed in detail.

Bo Liu
Department of Computer Science and Software Engineering
Auburn University
[email protected]

MS50

Algorithms for Safe Reinforcement Learning

As our ability to construct intelligent machines improves, so too must our ability to ensure that they are safe. In this talk I will review past efforts to ensure that machine learning algorithms, and particularly reinforcement learning algorithms, ensure different forms of safety. I will then discuss some ongoing research on this topic as well as future research directions.

Philip Thomas
School of Computer Science
Carnegie Mellon University
[email protected]

MS51

Symmetric Sums of Squares over k-Subset Hypercubes

We consider the problem of finding sum of squares (sos) expressions to establish the non-negativity of a symmetric polynomial over a discrete hypercube whose coordinates are indexed by k-element subsets of [n]. We develop a variant of the Gatermann-Parrilo symmetry-reduction method tailored to our setting that allows for several simplifications and a connection to Razborov’s flag algebras. We show that every symmetric polynomial that has a sos expression of a fixed degree also has a succinct sos expression whose size depends only on the degree and not on the number of variables. This is joint work with James Saunderson, Mohit Singh and Rekha Thomas.

Annie Raymond
University of Washington
[email protected]

James Saunderson
Massachusetts Institute of Technology
[email protected]

Mohit Singh
Microsoft Research
[email protected]

Rekha Thomas
University of Washington
[email protected]

MS51

Lieb’s Concavity Theorem, Matrix Geometric Means, and Semidefinite Optimization

A famous result of Lieb establishes that the map (A,B) ↦ tr(K*A^{1−t}KB^t) is jointly concave in the pair (A,B) of positive definite matrices, where K is a fixed matrix and t ∈ [0, 1]. This fact leads to concavity and convexity properties of many related functions that arise naturally, for instance, in quantum information. In this talk we discuss explicit descriptions of Lieb’s function, for any rational t ∈ [0, 1], in terms of the feasible regions of semidefinite optimization problems. Such descriptions allow us to use standard software for semidefinite programming to solve convex optimization problems involving Lieb’s function, and a number of related functions.

James Saunderson
Massachusetts Institute of Technology
[email protected]

Hamza Fawzi
University of Cambridge
[email protected]

MS51

Solving Generic Nonarchimedean Semidefinite Programs Using Stochastic Game Algorithms

One of the basic algorithmic questions associated with semidefinite programming is to decide whether a spectrahedron is empty or not. It is unknown whether this problem belongs to NP in the Turing machine model, and the state-of-the-art algorithms that solve this task exactly are based on cylindrical decomposition or the critical points method. We study the nonarchimedean analogue of this problem, replacing the field of real numbers by the field of Puiseux series. We introduce the notion of tropical spectrahedra and show that, under genericity conditions, these objects can be described explicitly by systems of polynomial inequalities in the tropical semiring. We demonstrate that these inequalities arise from Shapley operators associated with stochastic mean payoff games. As a consequence, we show that tropically generic semidefinite feasibility problems over Puiseux series can be solved efficiently using combinatorial algorithms designed for stochastic games. Finally, we discuss some consequences of this result for semidefinite programming over the reals.

Xavier Allamigeon
INRIA and CMAP, Ecole Polytechnique, CNRS
[email protected]

Stephane Gaubert
INRIA and CMAP, Ecole Polytechnique
[email protected]

Mateusz Skomra
INRIA Saclay / Ecole Polytechnique

[email protected]

MS51

Do Sums of Squares Dream of Free Resolutions?

Let X be a variety defined by a homogeneous ideal I_X. We show that the number of steps for which the minimal free resolution of I_X is linear is a computable lower bound for the next-to-minimal rank of the extreme rays of the cone dual to the sums of squares in X. As a consequence, we obtain: (1) a complete classification of totally real reduced schemes for which nonnegative quadratic forms are sums of squares, generalizing Fröberg’s Theorem on PSD matrix completions; (2) new certificates of exactness for truncated moment problems on projective varieties.

Mauricio Velasco
Universidad de los Andes
[email protected]

Greg Blekherman, Rainer Sinn
Georgia Institute of Technology
[email protected], [email protected]

MS52

New Results on the Simplex Method for Minimum Cost Flow Problems in Infinite Networks

We provide a simplex algorithm for a class of uncapacitated countably infinite network flow problems. Previous efforts to develop simplex algorithms for countably infinite network flow problems required explicit capacities on arcs with uniformity properties to facilitate duality arguments. In contrast, our method takes a “primal approach” by devising a simplex method that produces iterates that converge in optimal value without relying on duality arguments and facilitates the removal of explicit capacity bounds. Instead, our approach leverages a compactness assumption on the set of simplex iterates. This assumption holds in a variety of applied settings, including infinite-horizon production planning and nonstationary infinite-horizon dynamic programming.

Marina A. Epelman
University of Michigan
Industrial and Operations Engineering
[email protected]

Christopher T. Ryan
University of Chicago
[email protected]

Robert L. Smith
Industrial and Operations Engineering
University of Michigan
[email protected]

MS52

Stability and Continuity in Robust Optimization by Mappings to Linear Semi-Infinite Optimization Problems

We consider the stability of robust optimization (RO) problems with respect to perturbations in their uncertainty sets. In particular, we focus on robust linear optimization problems, including those with an infinite number of constraints, and consider uncertainty in both the cost function and constraints. We prove Lipschitz continuity of the optimal value and ε-approximate optimal solution set with respect to the Hausdorff distance between uncertainty sets. The Lipschitz constant can be calculated from the problem data. In addition, we prove closedness and upper semicontinuity for the optimal solution set with respect to the uncertainty set. In order to prove these results, we develop a novel transformation that maps RO problems to linear semi-infinite optimization (LSIO) problems in such a way that the distance between the uncertainty sets of two RO problems corresponds to a measure of distance between their equivalent LSIO problems. Using this isometry we leverage LSIO and variational analysis stability results to obtain stability results for RO problems.

Timothy C.Y. Chan, Philip Allen Mar
Department of Mechanical and Industrial Engineering
University of Toronto
[email protected], [email protected]

MS52

Strong Duality and Sensitivity Analysis in Semi-Infinite Linear Programming

Finite-dimensional linear programs satisfy strong duality (SD) and have the “dual pricing” (DP) property. The (DP) property ensures that, given a sufficiently small perturbation of the right-hand-side vector, there exists a dual solution that correctly “prices” the perturbation by computing the exact change in the optimal objective function value. These properties may fail in semi-infinite linear programming where the constraint vector space is infinite dimensional. Unlike the finite-dimensional case, in semi-infinite linear programs the constraint vector space is a modeling choice. We show that, for a sufficiently restricted vector space, both (SD) and (DP) always hold, at the cost of restricting the perturbations to that space. The main goal of the paper is to extend this restricted space to the largest possible constraint space where (SD) and (DP) hold. Once (SD) or (DP) fail for a given constraint space, then these conditions fail for all larger constraint spaces. We give sufficient conditions for when (SD) and (DP) hold in an extended constraint space. Our results require the use of linear functionals that are singular or purely finitely additive and thus not representable as finite support vectors. We use the extension of the Fourier-Motzkin elimination procedure to semi-infinite linear systems to understand these linear functionals.

Amitabh Basu
Johns Hopkins University
[email protected]

Kipp Martin, Christopher T. Ryan
University of Chicago
[email protected], [email protected]

MS52

Policy Iteration for Robust Countable-State Markov Decision Processes

Existing solution methods for robust Markov decision processes (MDPs) cannot be implemented “as-is” when the state-space is countable, since they entail infinite amounts of computation in each step. We present a policy iteration algorithm for robust countable-state MDPs with bounded costs, which performs finitely-implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the value functions for the sequence of policies produced by this algorithm converge monotonically to the optimal value function. The policies themselves converge subsequentially to an optimal policy.

Saumya Sinha, Archis Ghate
University of Washington
[email protected], [email protected]

MS53

PETSc and PDE-Constrained Optimization

We describe TAO, Argonne’s Toolkit for Advanced Optimization. TAO provides numerical optimization methods for high-performance computing. These methods are built upon PETSc, the Portable, Extensible Toolkit for Scientific Computation. I will discuss the overall design of the software, describe some of the available methods, and show how to solve PDE-constrained optimization problems using TAO. Time permitting, I will demonstrate the parallel availability of TAO.

Todd Munson
Argonne National Laboratory
Mathematics and Computer Science Division
[email protected]

MS53

PDE-Constrained Shape Optimization with FEM-Tailored Discretization of Diffeomorphisms

PDE-constrained shape optimization problems are characterized by target functionals that depend both on the shape of a domain (the control) and on the solution of a boundary value problem formulated on that domain (the state). There are conflicting demands on the discretization of the domain. On the one hand, kth-order finite element approximation of the state suggests W^{k,∞}-smooth domain deformations. On the other hand, typical isoparametric finite element discretization only allows W^{1,∞} piecewise polynomial representations of the domain geometry. How can the control and state spaces be discretized to satisfy these conflicting demands? In this talk we develop a technique for cleanly decoupling a B-spline based discretization of the control and the geometry used to solve the discretized state equation. In particular, the discretized control satisfies the regularity requirement, which in turn guarantees that the standard finite element estimates hold. As a consequence, the state can be approximated efficiently with typical finite element software.

Alberto Paganini
Seminar for Applied Mathematics
ETH Zurich
[email protected]

Patrick E. Farrell
Mathematical Institute
University of Oxford
[email protected]

Florian Wechsung
University of Oxford
[email protected]

MS53

Parallel-in-Time Methods for Large-Scale PDE-constrained Optimization

We study time-domain decomposition methods for optimality systems arising in optimal control problems with nonlinear transient partial differential equation (PDE) constraints. The methods are based on a concurrent solution of smaller optimality systems in each time subdomain. The coupling in time is recovered gradually at each optimization iteration through an inexact Krylov scheme combined with a matrix-free trust-region sequential quadratic programming (SQP) algorithm. The SQP algorithm controls the degree of inexactness based on feasibility and optimality criteria, and guarantees full time continuity at the optimal solution. We present numerical results involving optimal control of incompressible flows.

Denis Ridzal
Sandia National Laboratories
[email protected]

MS54

On Intersection of Two Mixing Sets with Applications to Joint Chance-Constrained Programs

We study the polyhedral structure of a generalization of a mixing set described by the intersection of two mixing sets with two shared continuous variables, where one continuous variable has a positive coefficient in one mixing set and a negative coefficient in the other. Our developments are motivated by a key substructure of linear joint chance-constrained programs (CCPs) with random right-hand sides from a finite probability space. The CCPs of interest immediately admit a mixed-integer programming reformulation. Nevertheless, such standard reformulations are difficult to solve at large scale due to the weakness of their linear programming relaxations. In this talk, we provide a systematic polyhedral study of such joint CCPs by explicitly analyzing the system obtained from simultaneously considering two linear constraints inside the chance constraint. We propose a new class of valid inequalities in addition to the mixing inequalities and establish conditions under which these inequalities are facet defining. Moreover, under certain additional assumptions, we prove that these new valid inequalities along with the classical mixing inequalities are sufficient in terms of providing the closed convex hull description of our set. We complement our theoretical results with a computational study on the strength of the proposed inequalities.
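For reference, the single mixing set that such studies build on has the standard textbook form (notation of my choosing):
\[
P_{\mathrm{mix}} = \bigl\{ (y, z) \in \mathbb{R}_+ \times \{0,1\}^n : \; y + h_j z_j \ge h_j, \; j = 1, \dots, n \bigr\},
\]
and the set studied in this talk intersects two such structures sharing two continuous variables, one of which appears with opposite signs in the two sets.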

Xiao Liu
The Ohio State University
[email protected]

Fatma Kilinc-Karzan
Tepper School of Business
Carnegie Mellon University
[email protected]

Simge Kucukyavuz
Industrial and Systems Engineering
University of Washington

[email protected]

MS54

Solving Standard Quadratic Programming by Cutting Planes

Standard quadratic programs are non-convex quadratic programs with the only constraint that variables must belong to a simplex. By a famous result of Motzkin and Straus, those problems are connected to the clique number of a graph. We propose cutting planes to obtain strong bounds: our cuts are derived in the context of Spatial Branch & Bound, where linearization variables represent products. Their validity is based on the Motzkin-Straus result. We study the relation between these cuts and the ones obtained by the first RLT level. We present extensive computational results using the cuts in the context of the Spatial Branch & Bound implemented by the commercial solver CPLEX.
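The Motzkin-Straus result referred to here states that, for a graph G with adjacency matrix A_G and clique number ω(G),
\[
\max_{x \in \Delta_n} \; x^{T} A_G\, x \;=\; 1 - \frac{1}{\omega(G)},
\]
where Δ_n is the standard simplex; this is the link between standard quadratic programs and the clique number that the cuts exploit.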

Andrea Lodi
Ecole Polytechnique de Montreal
[email protected]

Pierre Bonami
IBM, Spain
[email protected]

Jonas Schweiger, Andrea Tramontani
IBM Italy
[email protected], [email protected]

MS54

Revisiting the Factorable Relaxation Scheme

In factorable programming, products of functions are relaxed by constructing outer-approximations of functions and McCormick envelopes for products. We construct tighter relaxations by exploiting the outer-approximation while relaxing function products. We show that the convex hull of the product can be constructed in closed form over the Cartesian product of some specific polytopes, where each polytope encapsulates information regarding outer-approximators and their bounds. The result generalizes to concave envelopes of supermodular functions linear in each argument. We provide preliminary computational experience with the new relaxations.

Mohit Tawarmalani
Krannert School of Management
[email protected]

Taotao He
Purdue University
[email protected]

MS54

Optimization Methodologies for Truss Topology Design Problems

We consider various mathematical models for truss topology design problems with the objective to minimize the weight, where the non-convex Euler buckling constraints are also considered. We propose solution approaches to either alleviate the non-convexity of the problem, or to obtain a tight lower bound of the optimal objective value. Using a discretization approach, the non-convex optimization model is transformed to a mixed integer linear optimization model. We provide numerical experiments to demonstrate the performance of the discretization approach and the McCormick relaxation.

Tamas Terlaky
Lehigh University
Department of Industrial and Systems Engineering
[email protected]

Mohammad Shahabsafa, Ali Mohammad Nezhad
Dept. Industrial and Systems Engineering
Lehigh University
[email protected], [email protected]

Luis Zuluaga
Lehigh University
[email protected]

Sicheng He, John T. Hwang, Joaquim R. R. A. Martins
University of Michigan
[email protected], [email protected], [email protected]

MS55

Restarted Sub-Gradient Method for Non-Smooth Optimization under Error Bound Conditions

In this paper, we show that, when applied to a class of convex optimization problems that satisfy some error bound conditions, a restarted sub-gradient (RSG) method can find an optimal solution with a much lower iteration complexity than the standard sub-gradient method. This class of problems includes polyhedral optimization and optimization whose objective functions have local quadratic growth or admit a local Kurdyka-Lojasiewicz inequality. Moreover, we show that a homotopy smoothing method can further reduce the complexity of RSG when the problem satisfies an additional structured non-smooth property.
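A generic restarting template of this kind (a minimal sketch under my own parameter names; the paper ties the inner iteration count and step sizes to the error bound constants) runs the standard sub-gradient method in stages, restarting from the averaged iterate with a geometrically decreasing step size:

    import numpy as np

    def rsg(subgrad, x0, eta0=1.0, stages=10, inner=200):
        """Restarted sub-gradient method (sketch)."""
        x, eta = np.asarray(x0, dtype=float), eta0
        for _ in range(stages):            # outer restarts
            avg = np.zeros_like(x)
            for _ in range(inner):         # constant-step sub-gradient phase
                x = x - eta * subgrad(x)
                avg += x / inner
            x = avg                        # restart from the averaged iterate
            eta /= 2.0                     # halve the step size each stage
        return x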

Qihang Lin, Tianbao Yang, Yi Xu
University of Iowa
[email protected], [email protected], [email protected]

Yan Yan
University of Technology Sydney
[email protected]

MS55

Further Properties of the Forward-Backward Envelope with Applications to Difference-of-Convex Programming

We further study the forward-backward envelope first introduced by Patrinos and Bemporad for problems whose objective is the sum of a convex function and a twice continuously differentiable, possibly nonconvex, function with Lipschitz continuous gradient. We derive sufficient conditions for the corresponding forward-backward envelope to be a level-bounded and Kurdyka-Lojasiewicz function with an exponent of 1/2. We also demonstrate how to minimize some difference-of-convex regularized least squares problems by minimizing a suitably constructed forward-backward envelope. Finally, numerical results are shown.

Tianxiang Liu
The Hong Kong Polytechnic University
[email protected]

Ting Kei Pong
Department of Applied Mathematics
The Hong Kong Polytechnic University, Hong Kong
[email protected]

MS55

An Inertial Forward-Douglas-Rachford Splitting Method and Applications

We propose an inertial Forward-Douglas-Rachford splitting method for solving monotone inclusions involving three operators. The weak and strong convergence of the iteration are proved under suitable conditions. Some applications to minimization problems and systems of monotone inclusions are demonstrated. Numerical results are provided.

Bang Cong Vu, Volkan Cevher, Alp [email protected], [email protected],[email protected]

MS55

The Common-Directions Method for Regularized Empirical Risk Minimization

State-of-the-art first-order optimization methods are able to achieve optimal linear convergence rate, but cannot obtain quadratic convergence. In this work, we propose a first-order method for regularized empirical risk minimization that exploits the problem structure to efficiently combine multiple update directions to obtain both an optimal global linear convergence rate and local quadratic convergence. Experimental results show that our method outperforms state-of-the-art first- and second-order optimization methods in terms of the number of data accesses, while being competitive in training time.

Po-Wei Wang
Carnegie Mellon University
[email protected]

Ching-Pei LeeUniversity of [email protected]

Chih-Jen Lin
National Taiwan University
[email protected]

MS56

Nonlinear Model Predictive Control on a Heterogeneous Computing Platform

Model Predictive Control (MPC) is a computationally demanding control technique that allows explicit performance optimization and systematic constraint handling. Although traditionally MPC algorithms have been implemented on general-purpose CPU-based machines, recent developments in reconfigurable computing resulted in high-speed and resource-efficient hardware implementations of MPC. We present an implementation of a nonlinear MPC on a heterogeneous embedded processor that combines advantages of conventional CPUs and reconfigurable platforms, namely Field-Programmable Gate Arrays (FPGAs). We also present a method for scheduling sparse matrix-vector multiplications in a Lanczos kernel to enable significant improvements in terms of the computation time vs. resource usage when computing a search direction. A case study is used to demonstrate that a heterogeneous implementation can accelerate a pure software realization by a factor of 10, while achieving a 10x memory saving compared to existing approaches.

Bulat Khusainov
Imperial College London
[email protected]

Eric C. Kerrigan
Imperial College London
[email protected]

Andrea Suardi
Imperial College London
Electrical and Electronic Engineering
[email protected]

George Constantinides
Imperial College London
[email protected]

MS56

Parallel Temporal Decomposition for Long-Term Production Cost Model in Power Grid

Abstract not available at time of publication.

Kibaek Kim
Mathematics and Computer Science Division
Argonne National Laboratory
[email protected]

MS56

Algorithms for Large-Scale Mixed Integer Programming

Abstract not available at time of publication.

Ted Ralphs
Department of Industrial and Systems Engineering
Lehigh University
[email protected]

MS56

Comparison between ug[PIPS-SBB, MPI] and Parallel PIPS-SBB on Large Scale Distributed Memory Computing Environment

In this talk, we present and compare two frameworks for distributed-memory branch-and-bound (B&B) tree search. Both of these are implemented as extensions of PIPS-SBB, which is already a parallel distributed-memory stochastic MIP solver based on the distributed-memory stochastic LP solver PIPS-S. As a result, both frameworks include two levels of distributed-memory parallelism, for solving LP relaxations and for B&B tree search. The first framework relies on UG, part of the SCIP project, and implements an external coarse parallelization of PIPS-SBB. The second framework, Parallel PIPS-SBB, implements a novel internal fine-grained parallelization of PIPS-SBB. We present computational results that evaluate the effectiveness of both frameworks.

Yuji Shinano
Zuse Institute Berlin
Takustr. 7, 14195 Berlin
[email protected]

Lluis Miquel Munguia
Georgia Institute of Technology
[email protected]

Geoffrey M. Oxberry, Deepak Rajan
Lawrence Livermore National Laboratory
[email protected], [email protected]

MS57

Cubic Regularization for Symmetric Rank-1 and Nonlinear Conjugate Gradient Methods

Regularization techniques have been used to help existing algorithms solve “difficult” nonlinear optimization problems. Over the last decade, regularization has been proposed to remedy issues with equality constraints and equilibrium constraints, bound Lagrange multipliers, and identify infeasible problems. In this talk, we will focus on the application of cubic regularization in the context of the symmetric rank-one and conjugate gradient methods for nonlinear programming.

Hande Y. Benson
Drexel University
Department of Decision Sciences
[email protected]

David Shanno
Rutgers University (Retired)
[email protected]

MS57

Quadratic Regularization with Cubic Descent for Unconstrained Optimization

Cubic-regularization and trust-region methods with worst-case first-order complexity O(ε^(−3/2)) and worst-case second-order complexity O(ε^(−3)) have been developed in the last few years. In this paper it is proved that the same complexities are achieved by means of a quadratic-regularization method with a cubic sufficient-descent condition instead of the more usual predicted-reduction based descent. Asymptotic convergence and order of convergence results are also presented. Finally, some numerical experiments comparing the new algorithm with a well-established quadratic regularization method are shown.
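Schematically (my paraphrase of the mechanism, with α > 0 a fixed constant): a trial step s_k is computed from a quadratically regularized model, but it is accepted only under the cubic sufficient-descent test
\[
f(x_k + s_k) \;\le\; f(x_k) - \alpha \|s_k\|^3,
\]
with the regularization parameter increased and the step recomputed whenever the test fails.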

Ernesto G. Birgin
Department of Computer Science - IME
University of Sao Paulo
[email protected]

Jose Mario [email protected]

MS57

Primal-Dual Interior Methods for Nonlinear Optimization

Regularization and stabilization are vital tools for resolving the numerical and theoretical difficulties associated with ill-posed or degenerate optimization problems. Broadly speaking, regularization involves perturbing the underlying linear equations so that they are always nonsingular. Stabilized methods are designed to provide a sequence of iterates with fast local convergence even when the gradients of the constraints satisfied at a solution are linearly dependent. We discuss the crucial role of regularization and stabilization in the formulation and analysis of modern interior methods for nonlinear optimization. In particular, we establish the close relationship between regularization and stabilization, and propose a new interior method based on formulating an associated “simpler” optimization subproblem defined in terms of both primal and dual variables.

Philip E. Gill
University of California, San Diego
Department of Mathematics
[email protected]

Vyacheslav Kungurtsev
Electrical Engineering Department, KU Leuven
[email protected]

Daniel Robinson
Johns Hopkins University
[email protected]

MS57

Implementation of an Active Set Algorithm for Nonlinear Optimization with Polyhedral Constraints

An active set algorithm is developed for solving an optimization problem whose feasible set is a polyhedron. The gradient projection algorithm is used to identify active constraints, while the conjugate gradient algorithm is used to accelerate convergence. The algorithm exploits a recently developed scheme for projection onto a polyhedron, and makes special adjustments when a loss of local convexity is detected.
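For intuition, here is the active-set identification phase in the simplest polyhedral case, bound constraints, where the projection reduces to a componentwise clip (a minimal sketch with assumed names; the talk's setting projects onto a general polyhedron with a dedicated scheme):

    import numpy as np

    def gradient_projection_step(x, grad, lo, hi, eta):
        """One gradient projection step on a box; returns the new point
        and the bounds identified as active."""
        x_new = np.clip(x - eta * grad, lo, hi)   # project the gradient step
        active = (x_new <= lo) | (x_new >= hi)    # bounds identified as active
        return x_new, active

A conjugate gradient phase would then minimize over the remaining free variables to accelerate convergence on the identified face.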

James D. Diffenderfer
University of Florida
[email protected]

William Hager
University of Florida
Department of Mathematics
[email protected]

Hongchao Zhang
Department of Mathematics and CCT
Louisiana State University
[email protected]

MS58

Topology and Geometry of Deep Rectified Network Optimization Landscapes

The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a non-convex problem. Some insights were recently gained using spin glass models, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assumption and study conditions that prevent the existence of bad local minima. We first take a topological approach by studying the connectedness of the level sets. We formalize two important facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the landscape for the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. Our main result is that half-rectified single layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay. The conditioning of GD is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets. Our empirical results show that these level sets remain connected throughout all the learning phase, suggesting a nearly convex behavior, but they become exponentially more curvy as the energy level decays, in accordance with what is observed in practice with very low curvature attractors.

Joan Bruna
New York University
Courant Institute
[email protected]

Daniel Freeman
UC Berkeley
[email protected]

MS58

Advances in Nonconvex Optimization for Deep Learning

In this talk I will discuss new developments in non-convex optimization for deep neural networks. I will review our work along with new developments showing that neural nets do not significantly suffer from local minima, unlike previously thought. In fact, the loss surface of neural nets exhibits a particular structure, with saddle points leading the way to near-optimal local minima.

Yann Dauphin
Facebook AI Research
[email protected]

MS58

On the Rate of Convergence for Online Principal Component Estimation and Tensor Decomposition

Principal component analysis (PCA) and tensor component analysis have been prominent tools for high-dimensional data analysis. Online algorithms that estimate the component by processing streaming data are of tremendous practical and theoretical interest. In this talk, we cast these methods into stochastic nonconvex optimization problems, and we analyze the online algorithms as a stochastic approximation iteration. This talk is divided into two parts. In the first half, we attempt to understand the dynamics of the stochastic gradient method via ordinary and stochastic differential equation approximations. In the second half, we prove for the first time a nearly optimal convergence rate result for both the online PCA algorithm and online tensor decomposition. We show that the finite-sample error closely matches the corresponding results of the minimax information lower bound.
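A standard online PCA iteration of the kind analyzed in this line of work is Oja's rule; the following is a minimal sketch on a simulated spiked data stream (sizes and step-size schedule are my assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 20, 20000
    v = rng.normal(size=d); v /= np.linalg.norm(v)       # planted top component
    X = rng.normal(size=(n, d)) + 3.0 * rng.normal(size=(n, 1)) * v

    w = rng.normal(size=d); w /= np.linalg.norm(w)
    for t, x in enumerate(X, start=1):
        eta = 1.0 / t                  # diminishing step size
        w += eta * x * (x @ w)         # stochastic update toward the top eigenvector
        w /= np.linalg.norm(w)         # retract to the unit sphere
    print(abs(w @ v))                  # alignment with the planted component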

Chris Junchi Li
Princeton University
[email protected]

MS58

When Are Nonconvex Optimization Problems Not Scary?

General nonconvex optimization problems are NP-hard. In applied disciplines, however, nonconvex problems abound, and heuristic algorithms are often surprisingly effective. The ability of nonconvex heuristics to find high-quality solutions for practical problems remains largely mysterious. In this talk, I will describe a family of nonconvex problems which can be solved efficiently. This family has the characteristic structure that (1) every local minimizer is also global, and (2) the objective function has a negative directional curvature around all saddle points (ridable saddle). Natural nonconvex formulations for a number of important problems in signal processing and machine learning lie in this family, including the eigenvector problem, complete dictionary learning (CDL), generalized phase retrieval (GPR), orthogonal tensor decomposition, and various synchronization problems. This benign geometric structure allows a number of optimization methods to efficiently find a global minimizer, without special initializations. This geometric analysis has helped to produce new computational guarantees for several practical problems, such as CDL and GPR. To complete and enrich the framework is an ongoing research effort. I will highlight challenges from both theoretical and algorithmic sides. Joint work with Qing Qu and John Wright. See my PhD thesis http://sunju.org/pub/docs/thesis.pdf for details.

Ju Sun
Stanford University
[email protected]

Qing Qu
Columbia University
Electrical Engineering
[email protected]

John Wright
Department of Electrical Engineering
Columbia University
[email protected]

MS59

Newton-Type Methods Under Inexact Information

Second-order methods have been studied much less than the first-order ones due to their computational burden at each iteration. In this talk, we present Newton-type methods with cubic regularization for solving (strongly) convex optimization problems using noisy information of the objective function. We provide rates of convergence of these methods and show that they perform better in practice than their exact counterparts and optimal first-order methods.
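For reference, the classical cubic-regularized Newton step (in its standard Nesterov-Polyak form, here with the exact derivatives that the talk replaces by noisy estimates) is
\[
x_{k+1} = x_k + \arg\min_{s} \Bigl\{ \nabla f(x_k)^{T} s + \tfrac{1}{2} s^{T} \nabla^2 f(x_k)\, s + \tfrac{M}{6} \|s\|^3 \Bigr\},
\]
where M bounds the Lipschitz constant of the Hessian.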

Saeed Ghadimi
Princeton University
[email protected]

Tong [email protected]

Han Liu
Princeton University
[email protected]

MS59

On the Efficient Computation of a Generalized Jacobian of the Projector Over the Birkhoff Polytope

In this talk, we derive an explicit formula, as well as an efficient procedure, for constructing a generalized Jacobian for the projector of a given square matrix onto the Birkhoff polytope, i.e., the set of doubly stochastic matrices. To guarantee the high efficiency of our procedure, a semismooth Newton method for solving the dual of the projection problem is proposed and efficiently implemented. Extensive numerical experiments are presented to demonstrate the merits and effectiveness of our method by comparing its performance against other powerful solvers such as the commercial software Gurobi and the academic code PPROJ [Hager and Zhang, 2016]. In particular, our algorithm is able to solve the projection problem with over one billion variables and nonnegative constraints to a very high accuracy in less than 15 minutes on a modest desktop computer. In order to further demonstrate the importance of our procedure, we also propose a highly efficient augmented Lagrangian method (ALM) for solving a class of convex quadratic programming (QP) problems constrained by the Birkhoff polytope. The resulting ALM is demonstrated to be much more efficient than Gurobi in solving a collection of QP problems arising from the relaxation of quadratic assignment problems.
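The underlying projection problem, written in the Frobenius norm with e the all-ones vector, is the convex program
\[
\min_{X \in \mathbb{R}^{n \times n}} \; \tfrac{1}{2}\|X - G\|_F^2 \quad \text{s.t.} \quad Xe = e, \;\; X^{T}e = e, \;\; X \ge 0,
\]
whose dual is the problem to which the semismooth Newton method is applied.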

Xudong [email protected]

Defeng Sun
Dept of Mathematics
National University of Singapore
[email protected]

Kim-Chuan Toh
National University of Singapore
Department of Mathematics
[email protected]

MS59

Hyperparameter Tuning of Machine Learning Algorithms Using Stochastic Derivative Free Optimization

Abstract not available at time of publication.

Katya Scheinberg
T.J. Watson Research Center, IBM, P.O. Box 218
Yorktown Heights, NY
[email protected]

MS59

Inexact Proximal Stochastic Gradient Method for Convex Composite Optimization

We study an inexact proximal stochastic gradient (IPSG) method for convex composite optimization, whose objective function is a summation of an average of a large number of smooth convex functions and a convex, but possibly nonsmooth, function. The variance reduction technique is incorporated in the method to reduce the stochastic gradient variance. The main feature of this IPSG algorithm is to allow solving the proximal subproblems inexactly while still keeping the global convergence with desirable complexity bounds. Different accuracy criteria are proposed for solving the subproblem, under which the global convergence and the component gradient complexity bounds are derived for both cases when the objective function is strongly convex or generally convex. Preliminary numerical experiments show the overall efficiency of the IPSG algorithm.

Xiao Wang
University of Chinese Academy of Sciences
[email protected]

Shuxiong Wang
Chinese Academy of Sciences, China
[email protected]

Hongchao Zhang
Department of Mathematics and CCT
Louisiana State University
[email protected]

MS60

MILP Relaxations for Instationary Gas Network Optimization Problems

We present a new MINLP model for instationary gas network optimization problems using time-expanded graphs and discretization of time. Furthermore, we describe a method for solving the obtained MINLP based on constructing MILP relaxations of the underlying MINLP by piecewise linear functions. Both the theoretical background and numerical results are treated.
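One textbook way to build such piecewise linear MILP models, shown here for a univariate nonlinear term y = f(x) over breakpoints x_1 < ... < x_p (a generic convex-combination formulation, not necessarily the authors' exact one), is
\[
x = \sum_{i=1}^{p} \lambda_i x_i, \qquad y = \sum_{i=1}^{p} \lambda_i f(x_i), \qquad \sum_{i=1}^{p} \lambda_i = 1, \qquad \lambda \ge 0,
\]
with an SOS2 condition on λ (at most two consecutive nonzeros) enforced via binary variables; adding bounds on the linearization error turns the approximation into a relaxation.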

Robert Burlacu
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

MS60

Global Optimization of Ideal Multi-component Distillation Columns

Many separation tasks in chemical engineering are based on distillation. The determination of an optimal design for such separation processes often requires solving a mixed-integer non-convex optimization problem to global optimality, and is, hence, very challenging, in general. From a practical point of view, solving those problems within a reasonable time (if possible) is of particular interest. This makes it necessary to develop problem-specific optimization techniques that exploit the underlying physical and mathematical properties of the considered processes and model formulations. In this talk, a reformulation for ideal multi-component distillation column processes is considered. The reformulation is based on a suitable linear transformation of the concentration variables that yields monotonic concentration profiles. The monotonic behavior especially allows to design a problem-specific bound-tightening strategy that can be used to solve the underlying mixed-integer non-convex optimization problems. Numerical examples are presented, demonstrating the usefulness of the reformulation and the bound-tightening strategy.

Dennis Michaels
Technische Universitat Dortmund
Fakultat fur Mathematik
[email protected]

Nick MertensTechnische Universitat Dortmund Fakultat [email protected]

Achim KienleOtto-von-Guericke-Universitat MagdeburgInstitut fur [email protected]

Christian Kunde
[email protected]

MS60

Complexity of MINLP from Gas Network Optimization

We discuss the complexity of various gas network optimization problems. The main result is that the validation of nominations in the presence of compressors or valves is weakly NP-complete for series-parallel graphs and strongly NP-complete on general graphs. This is exactly what is predicted by the classical results on dynamic programming algorithms for these problems, which are widely applied in practice. We also show how the results can be used to preprocess these problems to make them easier to solve using standard solvers. These results also generalize to potential-driven flow problems, e.g., in water networks.

Lars Schewe
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

Martin Gross
TU Berlin
[email protected]

Marc Pfetsch
Research Group Optimization, Department of Mathematics
TU Darmstadt
[email protected]

Martin Skutella
TU Berlin
[email protected]

Martin Schmidt
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

MS60

Approximate Convex Decomposition in Gas Network Optimization

In some optimization problems, the feasible region of the constraint set can be described as the union of convex polytopes and thus itself forms a nonconvex polytope. Known approaches to treat such feasible regions involve Disjunctive Programming or the use of binary indicator variables for each of the original convex polytopes. We apply a decomposition method known as Approximate Convex Decomposition to such sets in 2 or 3 dimensions in order to obtain relaxations that can be hierarchically refined when required. Thereby, each hierarchy level yields a decomposition that reduces some measure of concavity until a specified tolerance threshold is attained. We will illustrate our approach with the example of optimizing the operational details of gas compressor stations.

Tom Walther
Zuse Institute Berlin
[email protected]

Benjamin Hiller
Zuse Institute Berlin
[email protected]

MS61

The Case for Competition: Efficient Computation of Equilibria

Equilibrium problems are widespread in the economics and physics literatures, and often involve interactions between competing players or physical systems. These include examples of generalized Nash equilibria and Multiple Optimization Problems with Equilibrium Constraints. While Extended Mathematical Programming provides a convenient mechanism to describe these interactions, and to process them into models for solution by existing complementarity and optimization methods, such formulations are often limited by an assumption of exclusive control or knowledge. In many cases, shared constraints, linked multipliers, and shared implicit variables are necessary to model behavior within the equilibrium. We describe a framework that allows each of these features within competitive equilibrium problems and show how to exploit the structure in efficient computational schemes. Several large-scale applications to electricity planning will be demonstrated that show the power of the framework and the speed of solution.

Michael C. Ferris
University of Wisconsin
Department of Computer Sciences
[email protected]

Youngdae Kim
Computer Science Department
University of Wisconsin-Madison
[email protected]

MS61

A Full-Newton Step Interior Point Method for Sufficient LCP

An improved infeasible full Newton-step interior-point method for the P∗(κ) Linear Complementarity Problem (LCP) is considered. For most methods of this type, each iteration consists of one feasibility step and a few centering steps, while in this version of the algorithm each iteration consists of only one feasibility step. This improvement has been achieved by a much tighter estimate of the proximity measure after the feasibility step. Furthermore, although the analysis of the algorithm depends on the constant κ, the algorithm itself does not; thus, the method is suitable for the class of sufficient LCPs. Nevertheless, the best iteration bound known for these types of methods is still achieved.

Goran Lesaja
Georgia Southern University
Department of Mathematical Sciences
[email protected]

Florian A. Potra
University of Maryland Baltimore County
Department of Mathematics and Statistics
[email protected]

MS61

Inexact Interior Point Methods for Complementarity Problems

We consider primal-dual inexact interior point methods for linear complementarity problems. At each iteration, computing the step requires the solution of a linear system, and the computational cost can be reduced by using iterative methods in which the accuracy of the solution is related to the quality of the interior point iterate. We focus on symmetric indefinite KKT formulations of this system and on the accuracy of the inexact steps for different stopping criteria proposed in the literature. Our aim is to analyze the effect of inexactness in the late stage of the interior point method, when the linear systems become severely ill-conditioned.

Benedetta Morini
Dipartimento di Ingegneria Industriale
Universita' di Firenze
[email protected]

Valeria Simoncini
Universita' di Bologna
[email protected]

MS61

Weighted Complementarity Problems and Applications

The weighted complementarity problem (wCP) is a new paradigm in applied mathematics that provides a unifying framework for analyzing and solving a variety of equilibrium problems in economics, multibody dynamics, atmospheric chemistry, and other areas in science and technology. It represents a far-reaching generalization of the notion of a complementarity problem (CP). Since many of the very powerful CP solvers developed over the past two decades can be extended to wCP, formulating an equilibrium problem as a wCP opens the possibility of devising highly efficient algorithms for its numerical solution. For example, Fisher's competitive market equilibrium model can be formulated as a wCP, while the Arrow-Debreu competitive market equilibrium problem (due to Nobel prize laureates Kenneth Joseph Arrow and Gerard Debreu) can be formulated as a self-dual wCP. The talk will show how equilibrium problems can be formulated as wCP and will present an interior point method that can efficiently solve such problems. Several properties of wCP will be analyzed.
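
For orientation, the linear case of a wCP can be stated as follows (our summary of the standard form from the wCP literature, not a formula quoted from the talk): given M ∈ R^{n×n}, q ∈ R^n, and a weight vector w ≥ 0, find a pair (x, s) such that

    s = Mx + q,   x ≥ 0,   s ≥ 0,   x ∘ s = w,

where ∘ denotes the componentwise product. Setting w = 0 recovers the standard LCP, which is why interior point machinery for CPs extends naturally to wCPs.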

Florian A. Potra
University of Maryland Baltimore County
Department of Mathematics and Statistics
[email protected]

MS62

Euclidean Distance Matrix Optimization Model for Ordinal Embedding

When the coordinates of a set of points are known, the pairwise Euclidean distances among the points can be easily computed. Conversely, if the Euclidean distance matrix is given, a set of coordinates for those points can be computed through the well-known classical Multi-Dimensional Scaling (MDS). In this paper, we consider the case where some of the distances are far from accurate (containing large noise or even missing). In such a situation, the order of the known distances (i.e., some distances are larger than others) is valuable information that often yields a far more accurate construction of the points than just using the magnitudes of the known distances. The methods making use of this order information are collectively known as nonmetric MDS. A challenging computational issue among all existing nonmetric MDS methods is that there is often a large number of ordinal constraints. In this paper, we cast this problem as a matrix optimization problem with ordinal constraints. We then adapt an existing smoothing Newton method to our matrix problem. Extensive numerical results demonstrate the efficiency of the algorithm, which can potentially handle a very large number of ordinal constraints.

Qingna Li
School of Mathematics and Statistics
Beijing Institute of Technology
[email protected]

Houduo Qi
University of Southampton
[email protected]

MS62

Sparse Recovery via Partial Regularization

In the context of sparse recovery, it is known that most existing regularizers, such as L1, suffer from a bias incurred by the leading entries (in magnitude) of the associated vector. We propose a class of models with partial regularizers to neutralize this bias for recovering sparse vectors. We show that every local minimizer of these models is sufficiently sparse, or the magnitude of all its nonzero entries is above a uniform constant depending only on the data of the linear system. Moreover, for a class of partial regularizers, any global minimizer of these models is a sparsest solution to the linear system. We also establish some sufficient conditions for local or global recovery of the sparsest solution to the linear system, among which one condition is weaker than the best known restricted isometry property (RIP) condition for sparse recovery by L1. In addition, a first-order feasible augmented Lagrangian (FAL) method is proposed for solving these models, in which each subproblem is solved by a nonmonotone proximal gradient (NPG) method. The global convergence of this method is also established. Numerical results on compressed sensing and sparse logistic regression demonstrate that the proposed models substantially outperform the widely used ones in the literature in terms of solution quality.
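
As a concrete illustration of the model class (our reading, with a penalized least-squares variant swapped in for exposition), a partial regularizer leaves the r largest entries unpenalized:

    min_x (1/2)‖Ax − b‖² + λ Σ_{i=r+1}^{n} |x|_[i],

where |x|_[i] denotes the i-th largest entry of x in magnitude. Because the leading r entries carry no penalty, the bias that L1 places on large coefficients disappears, at the price of a nonconvex regularizer, which is what the FAL/NPG machinery above is designed to handle.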

Zhaosong Lu
Department of Mathematics
Simon Fraser University
[email protected]

Xiaorui Li
Simon Fraser University
[email protected]

MS62

The Sparsest Solutions to Z-Tensor Complementarity Problems

Finding the sparsest solutions to a tensor complementarity problem is generally NP-hard due to the nonconvexity and discontinuity of the involved ℓ0 norm. In this talk, a special type of tensor complementarity problem with Z-tensors is considered. Under some mild conditions, we show that pursuing the sparsest solutions is equivalent to solving a polynomial program with a linear objective function. The involved conditions guarantee the desired exact relaxation and also make it possible to achieve a global optimal solution to the relaxed nonconvex polynomial programming problem. In particular, in comparison to existing exact relaxation conditions, such as RIP-type ones, our proposed conditions are easy to verify.

Ziyan Luo
State Key Laboratory of Rail Traffic Control and Safety
[email protected]

MS62

A Generalized Alternating Direction Method of Multipliers with Semi-Proximal Terms for Convex Composite Conic Programming

In this talk, we propose a generalized alternating direction method of multipliers (ADMM) with semi-proximal terms for solving a class of convex composite conic optimization problems, some of which are high-dimensional, to moderate accuracy. Our primary motivation is that this method, together with properly chosen semi-proximal terms, such as those generated by the recent advance of the symmetric Gauss-Seidel technique, is applicable to tackling these problems. Moreover, the proposed method, which relaxes both the primal and the dual variables in a natural way with one relaxation factor in the interval (0, 2), has the potential of enhancing the performance of the classic ADMM. Extensive numerical experiments on various doubly non-negative semidefinite programming problems, with or without inequality constraints, are conducted. The corresponding results show that all these multi-block problems can be successfully solved, and that the advantage of using the relaxation step is clearly observed.
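
The role of the relaxation factor is easiest to see on the simplest two-block splitting. The Python sketch below shows classic over-relaxed ADMM for min f(x) + g(z) subject to x = z; it omits the semi-proximal terms and the symmetric Gauss-Seidel machinery of the talk, and the prox operators are assumed to be supplied by the user.

    import numpy as np

    def relaxed_admm(prox_f, prox_g, n, rho=1.5, iters=200):
        """Over-relaxed ADMM for min f(x) + g(z) s.t. x = z.

        rho in (0, 2) is the relaxation factor; rho = 1 recovers
        the classic (unrelaxed) ADMM updates.
        """
        x = np.zeros(n)
        z = np.zeros(n)
        u = np.zeros(n)   # scaled dual variable
        for _ in range(iters):
            x = prox_f(z - u)
            x_hat = rho * x + (1.0 - rho) * z   # relaxation step
            z = prox_g(x_hat + u)
            u = u + x_hat - z                   # dual update
        return x, z

Values of rho slightly above 1 (e.g., 1.5 to 1.8) often speed up convergence in practice, which is the effect the abstract refers to.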

Yunhai Xiao
College of Mathematics and Statistics, Henan University
[email protected]

Liang Chen
Hunan University
[email protected]

Donghui Li
South China Normal University
[email protected]

MS63

Risky Robust Optimization in Machine Learning and Support Vector Machines

Robust Optimization has proven to be an intuitive and fruitful framework for optimization under uncertainty. Typically, Robust Optimization approaches uncertainty from a pessimistic, or worst-case, viewpoint and can be overly conservative. We diverge from this tradition and show that an optimistic, or best-case, viewpoint of uncertainty should not be ignored. We show that this viewpoint can yield Robust Optimization problems with advantageous properties, particularly w.r.t. sparsity of the resulting decision vector. We focus our analysis on Robust Support Vector Machines, and show that this viewpoint helps to unify convex and non-convex formulations, helping also to explain the success of non-convex regularization for inducing sparsity in hyperplane selection.

Matthew Norton
University of Florida
[email protected]

MS63

The Application of Stochastic Optimization in Chemical Engineering Processes

In this talk, we discuss how to model a few important chemical engineering processes using multi-stage stochastic programming. We then present stochastic first-order algorithms and establish their rates of convergence for solving these types of problems. Some preliminary numerical results will also be presented to demonstrate the effectiveness of these algorithms.

Zhaohui Tong, Hanxi Bao
University of Florida
[email protected], [email protected]

Guanghui Lan
Georgia Institute of Technology
[email protected]

MS63

Statistical Inversion and Risk Averse Control for Additive Manufacturing Processes

We demonstrate an optimization process in which statistical inversion methods determine dynamical-model coefficients with corresponding statistical characterizations, which are in turn mapped to a risk-averse optimization formulation to solve for a control strategy. Appropriate risk measures are selected to achieve an a priori determined performance strategy. This approach is demonstrated on a direct write additive manufacturing process in which the goal is to infer model parameters and control several stages of the process to achieve material property targets. Direct write consists of slurry preparation, printing, and sintering to develop ceramic components, and therefore the challenge is not only to solve the coupled inversion-control problem but to solve this coupled problem for multiple stages. Furthermore, uncertainties need to be propagated throughout these stages. PDE-constrained optimization methods serve as the foundation for this work, with finite element discretizations, adjoint-based sensitivities, trust-region methods, and Newton-Krylov solvers. Our final AM-produced parts must achieve tight tolerances for a range of different material properties. Accordingly, a range of risk measures is considered, with a specific emphasis on reliability. A numerical example demonstrates that the use of risk measures as part of the objective function results in optimal solutions.

Bart G. Van Bloemen Waanders
Sandia National Laboratories
[email protected]

MS64

Nonconvex Low-Rank Estimation: Robustness and Linear Convergence

Many problems in statistics and machine learning involve fitting a low-rank matrix to noisy data. The resulting rank-constrained optimization problem is often solved using semidefinite relaxation, which enjoys strong statistical guarantees but does not scale well to large datasets. Recently, nonconvex approaches, such as gradient descent over the low-rank factor space, have emerged as a computationally efficient alternative for low-rank estimation problems. We develop a unified framework characterizing the convergence behavior of this nonconvex approach and the statistical errors of the resulting fixed points. Our results provide insight into why it is expected to work in general, and provide linear convergence rates. When specialized to specific problems, our general theory yields concrete guarantees for matrix completion, robust PCA, community detection, sparse PCA, matrix sensing, 1-bit matrix completion, and others. For these problems, nonconvex methods enjoy near-linear running time, have statistical performance matching (sometimes beating) the best known results achieved by SDP relaxation, and are robust to gross corruption in the data. Moreover, our framework applies to projected gradient descent for constrained formulations, and does not require unrealistic assumptions such as sample splitting.
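
A minimal sketch of the nonconvex approach for matrix completion, assuming observed entries indicated by a 0/1 mask (illustrative only; step size and initialization matter in the actual theory):

    import numpy as np

    def factored_gd(M_obs, mask, r, step=0.01, iters=500):
        """Gradient descent over the low-rank factor space.

        Minimizes || mask * (U V^T - M_obs) ||_F^2 over U (m x r)
        and V (n x r); the rank constraint is enforced by the
        factorization itself, so no SDP or SVD projection is needed.
        """
        m, n = M_obs.shape
        rng = np.random.default_rng(0)
        U = rng.normal(scale=1.0 / np.sqrt(r), size=(m, r))
        V = rng.normal(scale=1.0 / np.sqrt(r), size=(n, r))
        for _ in range(iters):
            R = mask * (U @ V.T - M_obs)   # residual on observed entries
            # Simultaneous gradient step in both factors.
            U, V = U - step * (R @ V), V - step * (R.T @ U)
        return U, V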

Yudong Chen
University of California, Berkeley
Department of Electrical Engineering and Computer Sciences
[email protected]

MS64

Faster Projection-Free Convex Optimization with Structured Matrices

Many important problems in machine learning and signal processing can be directly formulated as the minimization of a non-convex function over a certain set of matrices. Fortunately, many of these problems admit natural and well-studied convex relaxations with strong theoretical guarantees. Prime examples include the problems of low-rank matrix completion and multi-class classification. On the downside, algorithms that are applicable for solving these convex relaxations either converge fast but require prohibitive SVD computations on each iteration (i.e., orthogonal projection), or rely only on cheap single eigenvector computations but converge at an inferior rate. In this talk I will present a new variant of the classical Frank-Wolfe method that relies only on cheap eigenvector computations. I will show that the new method enjoys (roughly) a quadratic improvement in convergence rate over the standard method, under an additional strong-convexity assumption.
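
The following Python sketch shows the standard Frank-Wolfe template over the set {X PSD, trace(X) ≤ τ}, whose linear minimization oracle needs only one extreme eigenvector per iteration; it illustrates the baseline method, not the faster variant of the talk.

    import numpy as np

    def frank_wolfe_psd(grad, X0, tau, iters=100):
        """Frank-Wolfe over {X PSD, trace(X) <= tau}."""
        X = X0
        for k in range(iters):
            G = grad(X)
            w, V = np.linalg.eigh(G)   # eigenvalues in ascending order
            if w[0] >= 0:
                S = np.zeros_like(X)   # LMO returns 0 when G is PSD
            else:
                v = V[:, 0]            # most negative eigenvalue direction
                S = tau * np.outer(v, v)
            gamma = 2.0 / (k + 2)      # standard step-size schedule
            X = (1 - gamma) * X + gamma * S
        return X

In large-scale practice the full eigendecomposition would be replaced by a single extreme-eigenvector computation (e.g., Lanczos), which is exactly the cheap oracle the abstract refers to.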

Dan [email protected]

MS64

Dropping Convexity for Faster Semidefinite Optimization

Abstract not available at time of publication.

Sujay Sanghavi
Electrical and Computer Engineering
University of Texas at Austin
[email protected]

MS64

Nonconvex Optimization and Nonlinear Models for Matrix Completion

Most recent results in matrix completion assume that the matrix under consideration is low-rank. If the matrix is p × n, this model is equivalent to assuming we observe n points in a p-dimensional ambient space that lie along a low-dimensional hyperplane. In real-world settings, however, observations are more accurately modeled as lying along a low-dimensional nonlinear submanifold. In this talk, I will describe two approaches to matrix completion in the presence of such nonlinearities. The first approach models the submanifold as a monotonic transformation of a hyperplane; that is, each element of the matrix is a monotonic function of the corresponding entry of an underlying low-rank matrix. I will describe a novel matrix completion method that alternates between low-rank matrix estimation and monotonic function estimation to estimate the missing matrix elements. The second approach uses a piecewise linear approximation to the submanifold by modeling the n points as lying in a union of low-rank subspaces. I will describe a novel method for subspace clustering with missing data and matrix completion which is based on group sparsity and alternating minimization.

Rebecca Willett
University of Wisconsin-Madison
[email protected]

MS65

Stochastic Variance Reduction Methods for Policy Evaluation

Policy evaluation is a crucial step in many reinforcement-learning procedures; it estimates a value function that predicts states' long-term value under a given policy. In this work, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods, for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
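
For readers unfamiliar with the reduction, one standard way to obtain such a saddle point (our summary of the usual construction; scaling conventions may differ from the talk's) is to write the empirical objective in dual form:

    min_θ (1/2)‖Aθ − b‖²_{C^{-1}}  =  min_θ max_w [ wᵀ(b − Aθ) − (1/2) wᵀCw ],

where A, b, and C are empirical moment matrices assembled from the transition data. The maximization over w removes the inverse of C, so both players see only sums over samples, which is what makes variance-reduced stochastic updates applicable.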

Simon Du
Machine Learning Department, Carnegie Mellon University
[email protected]

Jianshu Chen, Lihong Li, Lin Xiao
Microsoft Research
[email protected], [email protected], [email protected]

Dengyong Zhou
Microsoft Research, Redmond
[email protected]

MS65

Primal-Dual Reinforcement Learning

We consider the online estimation of the optimal policy of a Markov decision process. We propose a Stochastic Primal-Dual (SPD) method which exploits the inherent minimax duality of Bellman equations. SPD updates a few coordinates of the value and policy estimates as state transitions are sampled. We show that SPD has superior space and computational complexity, and that it finds, with high probability, an ε-optimal policy using near-optimal numbers of samples and iterations.

Yichen Chen
Department of Computer Science
Princeton University
[email protected]

Mengdi Wang
Department of Operations Research and Financial Engineering
Princeton University
[email protected]

MS65

A Generalized Reduced Linear Program for Markov Decision Processes

The efficient computation of near-optimal policies in Markov Decision Processes (MDPs) is of major interest in various scientific and engineering applications. One approach of major interest is to combine linear function approximation with linear programming, leading to what is known as "Approximate Linear Programming" (ALP). While ALP allows for a compact representation of value functions, the number of constraints in the standard ALP formulation is still intractable. One way to overcome this is to reduce the number of constraints. We provide a new analysis of a generalized version of the resulting reduced ALP that complements previous results. In particular, as opposed to previous results, the new analysis allows us to derive a policy error bound that is applicable regardless of how the constraints are selected, and shows a graceful degradation that connects the features, the optimal value function, and the selected constraints in an easy-to-interpret fashion, suggesting specific ways of reducing the constraints while avoiding knowledge of the stationary distribution of the optimal policy.

Chandrashekar Lakshmi Narayanan
Department of Computing Science
University of Alberta
[email protected]

Shalabh Bhatnagar
Department of Computer Science and Automation
Indian Institute of Science
[email protected]

Csaba Szepesvari
Department of Computing Science
University of Alberta
[email protected]

MS65

Robust Optimization for Data-Limited Reinforcement Learning

One of the goals of robust reinforcement learning is to compute good policies from limited data with high confidence. We develop and analyze new practical model-based approaches for limited-data reinforcement learning based on robust Markov decision processes (RMDPs). Standard RMDP methods are based on formulations with rectangular uncertainty sets. We show that rectangular models are not appropriate in common data-limited settings and propose new non-rectangular formulations. Most of these formulations are, unfortunately, NP-hard. We develop approximate algorithms and empirically demonstrate their advantages over regular methods in several domains, including energy arbitrage in smart grids.

Marek Petrik
Department of Computer Science
University of New Hampshire
[email protected]

MS66

Smoothing for Improved Worst-Case Competitive Ratio in Online Optimization

In online optimization, the data of an optimization problem is revealed over time, and at each step a decision variable needs to be set without knowing the future data. This setup covers online resource allocation, from classical inventory problems to the 'Adwords' problem popular in online advertising. We discuss several algorithms for a class of online convex optimization problems. Our focus is on the algorithms' competitive ratio, i.e., the ratio of the objective achieved by the algorithm to that of the optimal offline sequence of decisions, for worst-case input data. We discuss bounds on this ratio for a primal-dual greedy algorithm, show how smoothing the objective can improve this bound, and, for separable functions, how to seek the optimal smoothing by solving a convex optimization problem. This approach allows us to design effective smoothing customized for a given cost function.

Maryam Fazel
University of Washington
Electrical Engineering
[email protected]

MS66

Automating the Analysis and Design of Large-Scale Optimization Algorithms

First-order iterative algorithms such as gradient descent, fast/accelerated methods, and operator-splitting methods such as ADMM can be viewed as discrete-time dynamical systems. We will show that if the function being optimized is strongly convex, for example, computing the worst-case performance of a particular algorithm is equivalent to solving a robust control problem. This amounts to establishing the feasibility of a small semidefinite program whose size is independent of the dimension of the function's domain. Our unified approach allows for the efficient and automatic evaluation of worst-case performance bounds for a wide variety of popular algorithms. The bounds derived in this manner either match or improve upon the best known bounds from the literature. Finally, our framework can be used to search for algorithms that meet desired performance specifications, thus establishing a new and principled methodology for designing new algorithms.
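
As a concrete instance of the kind of guarantee such an analysis certifies (a standard textbook fact, included here for illustration rather than taken from the talk): for an L-smooth, m-strongly convex f, gradient descent

    x_{k+1} = x_k − α∇f(x_k)

satisfies ‖x_{k+1} − x*‖ ≤ ρ‖x_k − x*‖ with ρ = max{|1 − αm|, |1 − αL|}, so that α = 2/(L + m) gives the best factor ρ = (L − m)/(L + m). The robust-control viewpoint recovers such rates automatically by checking feasibility of a small semidefinite program whose size does not depend on the dimension of x.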

Laurent Lessard
University of Wisconsin-Madison
[email protected]

MS66

Randomization Vs. Acceleration

Abstract not available at time of publication.

Pablo A. Parrilo
Massachusetts Institute of Technology
[email protected]

MS66

Low-Recurrence Width Polynomial Families in Linear Algebra and Optimization

Abstract not available at time of publication.

Christopher Re
Department of Computer Science
Stanford University
[email protected]

MS67

Application of SOCP in Power Efficient Networks

In this paper, a mixed-integer second-order cone programming (MISOCP) model based on robust optimization is proposed to achieve power savings in networks. Most research in the literature on minimizing network power is based on exact traffic demands, but in practice, estimating traffic demands exactly is a difficult task, because traffic demands fluctuate in general. In our work, taking account of fluctuations in the estimated traffic demands, we apply the robust optimization technique to obtain an MISOCP. The objective is to minimize the power consumption in the network by deactivating unnecessary links. Numerical experiments are presented to compare our model with other models in the literature.

Bimal C. Das, Eiji Oki, Masakazu Muramatsu
The University of Electro-Communications
[email protected], [email protected], [email protected]

MS67

A Lifted-Polyhedral-Programming Approach for Optimal Contribution Problems

An important issue in tree breeding is to improve tree performance genetically in a seed orchard without losing genetic diversity. Optimal contribution selection problems are classified into two types: unequal and equal. While the proportions of genes in the unequal problem are not necessarily the same, each proportion in the equal problem should be equal to 0 or 1/N for a certain integer N. [Pong-Wong et al., 2007] introduced semidefinite programming (SDP) to obtain an accurate value, but it demands long computation time. [Yamashita et al., 2015] then proposed a second-order cone programming (SOCP) approach to reduce computation time. Since these methods were designed for unequal contribution selection, new methods are needed for the equal contribution selection problem; the equal problem is much harder than the unequal one. Extending the notion of the SOCP approach, the equal contribution selection can be described as a mixed-integer SOCP (MISOCP) problem, which involves both second-order cone constraints and integer constraints. In order to solve the equal selection problem, we develop a relaxation approach using the lifted polyhedral programming proposed by Ben-Tal and Nemirovski. Numerical results show that this approach is effective compared with existing methods.

Sena Safarina
Tokyo Institute of Technology
[email protected]


Makoto Yamashita
Tokyo Institute of Technology
Dept. of Mathematical and Computing Science
[email protected]

MS67

Constrained Penalized Spline Estimation by Second-Order Cone Programming

We consider regression by B-splines with a penalty on high-order finite differences of the coefficients of adjacent B-splines. The penalty prevents overfitting. The underlying function is assumed to be nonnegative. The model is cast as a second-order cone programming problem, which can be solved efficiently by modern optimization techniques. The method is implemented in MATLAB.

Yu Xia
Lakehead University
[email protected]

Farid Alizadeh
RUTCOR and School of Business, Rutgers University
[email protected]

MS67

A Steep-Ascent Method for MI-SOCP Arising from Tree Breeding

An optimal contribution problem arising from tree breeding is an optimization problem to determine the contributions of candidate genotypes for seed orchards. A diversity constraint to improve the long-term performance of the seed orchards is a second-order cone constraint, and each contribution should be an integer for an equal deployment. Therefore, the optimal contribution problem can be described as a mixed-integer SOCP problem. In this talk, we discuss LP, SOCP, and SDP relaxations of the optimal contribution problem. In particular, we build an efficient SOCP relaxation by exploiting the sparsity and the structure of the diversity constraint. We also consider the tightness of the SDP relaxation. Furthermore, we propose a steep-ascent method to generate a feasible solution from the three relaxations. We move the second-order cone constraint to the objective function as a penalty term and implement a heuristic method to search for a feasible solution with a high long-term performance. Numerical results, including data for Scots pine, show that the steep-ascent method combined with the SOCP relaxation obtains a favorable solution in a short time.

Makoto Yamashita
Tokyo Institute of Technology
Dept. of Mathematical and Computing Science
[email protected]

Tim Mullin
The Swedish Forestry Research Institute (Skogforsk)
[email protected]

Sena Safarina
Tokyo Institute of Technology
[email protected]

MS68

Conic Infimum and Supremum and Some Applications

Finding the minimum or the maximum of a set of numbers is probably the most widely used basic operation in optimization algorithms. We study the extension of these operations to partial orders induced by proper cones. Specifically, we define the notion of the infimum and supremum of a set of points in Euclidean space with respect to the partial order induced by a proper cone. We examine conditions under which these concepts are well-defined and present the infimum and supremum induced by concrete examples such as the nonnegative polynomial, second-order, semidefinite, and polyhedral cones. Some applications of conic infimum and supremum, particularly in generalizations of network flow problems, will be discussed.

Farid Alizadeh
RUTCOR and School of Business, Rutgers University
[email protected]

MS68

On Positive Duality Gaps in Semidefinite Programming

In semidefinite programming (SDP) one often encounters positive duality gaps, i.e., the situation when the primal and dual optimal values differ. We present a systematic study of positive duality gaps, and generate a library of SDPs with positive duality gaps. Our instances turn out to be extremely challenging for both commercial and research solvers.

Gabor Pataki
University of North Carolina at Chapel Hill
[email protected]

MS68

Dimension Reduction for SDP

We propose a new method for simplifying semidefinite programs (SDP) inspired by symmetry reduction. Specifically, we show that if an orthogonal projection satisfies certain invariance conditions, restricting to its range yields an equivalent primal-dual pair over a lower-dimensional symmetric cone, namely, the cone-of-squares of a Jordan subalgebra of symmetric matrices. We then give a simple algorithm for minimizing the rank of this projection and hence the dimension of this cone. Through the theory of Jordan algebras, the proposed method easily extends to linear programming, second-order cone programming, and, more generally, symmetric cone optimization.

Frank Permenter, Pablo A. Parrilo
Massachusetts Institute of Technology
[email protected], [email protected]

MS68

On Facial Structure of Convex Sets

Whilst the faces of a polytope form a well-structured lattice, in which faces of each possible dimension are present, the latter property fails for general compact convex sets. We address the question of which dimensional patterns are possible for the faces of general closed convex sets. We show that for any finite sequence of positive integers there exist compact convex sets which only have extreme points and faces with dimensions from this prescribed sequence. We also discuss another approach to dimensionality, considering the dimension of the union of all faces of the same dimension. We show that the questions arising from this approach are highly nontrivial and give examples of convex sets for which the sets of extreme points have fractal dimension.

Vera Roshchina
Federation University Australia
[email protected]

MS69

Exploiting Active Subspaces in Nonconvex Optimization of Physics-Based Models

Given a function of several variables, the active subspace is the span of a set of directions in the function's domain along which variable perturbations change the function's value the most, on average. I will discuss how to exploit a nonconvex objective function's active subspace to accelerate minimization.
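
A minimal sketch of the standard Monte Carlo estimator of an active subspace (following the usual gradient-outer-product construction; grad and sample are user-supplied assumptions):

    import numpy as np

    def active_subspace(grad, sample, num_samples, k):
        """Estimate a k-dimensional active subspace.

        grad(x): gradient of f at x; sample(): draws a point from
        the input distribution. Estimates C = E[grad f grad f^T]
        and returns the k leading eigenvectors of C.
        """
        G = np.stack([grad(sample()) for _ in range(num_samples)])
        C = G.T @ G / num_samples
        w, V = np.linalg.eigh(C)     # eigenvalues in ascending order
        return V[:, -k:], w[::-1]    # leading eigenvectors, sorted spectrum

Minimization can then be accelerated by searching mostly within the span of the returned directions.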

Paul Constantine
Colorado School of Mines
[email protected]

MS69

Deep Learning Meets Differential Equations

Abstract not available at time of publication.

Eldad Haber
Department of Mathematics
The University of British Columbia
[email protected]

MS69

Fast Solvers for Optimization Problems Constrained by PDEs with Uncertain Inputs

Optimization problems constrained by deterministic stationary partial differential equations (PDEs) are computationally challenging. This is even more so if the constraints are deterministic unsteady PDEs, since one would then need to solve a system of PDEs coupled globally in time and space, and time-stepping methods quickly reach their limitations due to the enormous demand for storage. Yet more challenging than the aforementioned are problems constrained by unsteady PDEs involving (countably many) parametric or uncertain inputs. This class of problems often leads to prohibitively high-dimensional saddle-point systems with tensor product structure, especially when discretized with the stochastic Galerkin finite element method (SGFEM). Moreover, a typical model for an optimal control problem with stochastic inputs (SOCP) will usually be used for the quantification of the statistics of the system response, a task that could in turn result in additional enormous computational expense. In this talk, we consider two prototypical model SOCPs and discretize them with SGFEM. We derive and analyze tensor-based, low-rank iterative solvers for the resulting stochastic Galerkin systems. The developed solvers are quite efficient in reducing the temporal and storage requirements of the high-dimensional linear systems. Finally, we illustrate the effectiveness of our solvers with numerical experiments.

Akwum Onwunta
Max Planck Institute, Magdeburg, Germany
[email protected]

Peter Benner
Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
[email protected]

Sergey Dolgov
University of Bath
Department of Mathematical Sciences
[email protected]

Martin Stoll
Max Planck Institute, Magdeburg
[email protected]

MS69

Semidefinite Programming Relaxations and Multigrid Methods in Optimal Control

We discuss recent developments in solving semidefinite programming relaxations of optimal control problems using multigrid techniques. Multigrid methods are known to lead to efficient algorithms for optimal control problems. However, the SDP relaxation of the model does not maintain the geometric information present in the original model. Despite this issue, we discuss theoretical and algorithmic developments that allow us to take advantage of the inherent structure of optimal control problems.

Panos Parpas
Imperial College London
[email protected]

MS70

Stochastic Variance Reduction Methods for Saddle-Point Problems

We consider convex-concave saddle-point problems where the objective functions may be split into many components, and extend recent stochastic variance reduction methods to provide the first large-scale linearly convergent algorithms for this class of problems, which is common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply, and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities; (b) the split does need to be done with convex-concave terms.

Francis Bach
INRIA - Ecole Normale Superieure
[email protected]

Balamurugan Palaniappan
SIERRA Project Team, INRIA, Paris, France
[email protected]

MS70

On a Scaled ε-Subgradient Method with Adaptive Stepsize Rule

A wide range of image reconstruction problems [Bertero et al, 2009] requires that the functional to be minimized have a nondifferentiable term. A general formulation for such problems is

    min_{x ∈ R^n} f(x) + Φ(x)

where f, Φ : R^n → R ∪ {∞} are convex, proper, lsc functions and dom Φ ⊂ dom f. In this work we generalize the Forward-Backward (FB) scheme, considering in the forward step the ε-subgradient of f and in the backward step a variable metric. The proposed scheme can be written as

    x^{k+1} = prox_{α_k Φ}^{D_k^{-1}} ( x^k − α_k D_k u^k )

where D_k is a symmetric positive definite matrix with bounded eigenvalues, u^k ∈ ∂_{ε_k} f(x^k) for ε_k ≥ 0, and α_k is chosen with an adaptive rule (another option is to set it via an a priori selected divergent, square-summable sequence). Besides the theoretical generalization, the proposed method can be numerically efficient with suitable choices of D_k and α_k. We describe a concrete example [Bonettini et al, 2016] arising in image deblurring from Poisson data, where f(x) = KL(Hx + b; g) + β TV(x), with KL the generalized Kullback-Leibler functional, g ∈ R^n the detected image, corrupted by Poisson noise, H ∈ R^{n×n} the blurring operator, b ∈ R_+ a background term, TV the Total Variation functional, β > 0 the regularization parameter, and Φ the indicator function of the nonnegative orthant.

Silvia Bonettini
Dipartimento di Matematica e Informatica
Universita di Ferrara
[email protected]

Alessandro Benfenati
Universite Paris-Est Marne-la-Vallee
Laboratoire d'Informatique Gaspard Monge (LIGM)
[email protected]

Valeria Ruggiero
Dipartimento di Matematica e Informatica
[email protected]

MS70

Duality Based Iterative Regularization Techniques for Inverse Problems

Many applied problems in science and engineering can be modelled as noisy inverse problems. The high dimensionality of such problems requires the development of ever more efficient numerical procedures to solve them. In this talk, we will focus on iterative regularization techniques, where the number of iterations controls both the computational complexity of the method and its regularization properties. Key to our approach will be the use of duality techniques. Numerical results showing state-of-the-art performance will also be discussed.

Guillaume Garrigos
Istituto Italiano di Tecnologia
[email protected]

Lorenzo Rosasco
Istituto Italiano di Tecnologia and Massachusetts Institute of Technology
[email protected]

Silvia Villa
Dipartimento di Matematica
Politecnico di Milano
[email protected]

MS70

Stochastic Forward-Douglas-Rachford Splitting for Monotone Inclusions

We propose a stochastic Forward-Douglas-Rachford splitting framework for finding a zero of the sum of three maximally monotone operators in a real separable Hilbert space, where one of the operators is cocoercive. We characterize the rate of convergence in expectation in the case of strongly monotone operators. We provide guidance on step-size sequences that achieve this rate, even if the strong convexity parameter is unknown. Finally, we present numerical examples from convex optimization with multiple constraints and/or regularizers that support the effectiveness of our method.

Volkan Cevher, Bang Cong Vu, Alp Yurtsever
EPFL
[email protected], [email protected], [email protected]

MS71

Estimating Clarke Subgradients of Non-Regular Integrands by Smoothing

Abstract not available at time of publication.

James V. Burke
University of Washington
Department of Mathematics
[email protected]

MS71

A New Class of Matrix Support Functionals with Applications

Abstract not available at time of publication.

Tim Hoheisel
McGill University
Montreal, QC, Canada
[email protected]

MS71

Convergence of a Scholtes-Type Relaxation Method for Optimization Problems with Cardinality Constraints

Optimization problems with cardinality constraints have applications, for example, in portfolio optimization. However, due to the discrete-valued cardinality constraint, they are not easy to solve. We provide a continuous reformulation of cardinality constraints and discuss the convergence of a Scholtes-type relaxation method for the resulting nonlinear programs with orthogonality constraints. Furthermore, we show preliminary numerical results for portfolio optimization problems with different risk measures.
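
For context, the continuous reformulation referred to here is, to the best of our understanding, of the Burdakov-Kanzow-Schwartz type: the cardinality constraint ‖x‖₀ ≤ κ is replaced by auxiliary variables y ∈ [0,1]^n via

    eᵀy ≥ n − κ,   x_i y_i = 0 (i = 1, ..., n),   0 ≤ y ≤ e,

and a Scholtes-type relaxation then replaces each orthogonality condition x_i y_i = 0 by −t ≤ x_i y_i ≤ t for a relaxation parameter t that is driven to zero.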

Alexandra Schwartz
TU Darmstadt
Graduate School CE
[email protected]

Michal Cervinka, Martin Branda
Czech Academy of Sciences, Charles University in Prague
[email protected], [email protected]

Max Bucher
TU Darmstadt
[email protected]

MS72

A Gaussian Process Trust-Region Method for Derivative-Free Nonlinear Constrained Stochastic Optimization

We present the algorithm SNOWPAC for derivative-free constrained stochastic optimization. The algorithm builds on a model-based approach for deterministic nonlinear constrained DFO that introduces an "inner boundary path" to locally convexify the feasible domain and ensure feasible trial steps. We extend this deterministic method via a generalized trust region approach that accounts for noisy evaluations of the objective and constraints. To reduce the impact of noise, we fit consistent Gaussian process models to past evaluations. Our approach incorporates a wide variety of probabilistic risk or deviation measures in both the objective and the constraints. We demonstrate the efficiency of the algorithm using several numerical benchmarks and comparisons.

Youssef M. Marzouk, Florian Augustin
Massachusetts Institute of Technology
[email protected], [email protected]

MS72

DFO/STORM Approaches to Machine Learning Settings

In previous work, we proposed an algorithmic framework STORM (STochastic Optimization using Random Models), an adaptation of trust-region methods to solve unconstrained stochastic optimization problems, and almost sure convergence was proven under fairly general conditions on the stochasticity. In particular, we assumed that on a given iteration, fully-linear models of an unknown ground truth function can only be constructed with some fixed probability (bounded away from 1), and likewise estimates of a current iterate and trial iterate can only be computed with error in O(δ_k^2) with some fixed probability (bounded away from 1), where δ_k denotes a trust region radius parameter. In this talk, we will discuss how this can be applied to various settings in machine learning. In particular, we will consider a trust-region algorithm that uses batch stochastic gradients (and Hessians) for model building and inexact zeroth-order function evaluations of common loss functions for classification tasks. We will also consider an algorithm within the STORM framework that uses only inexact zeroth-order information for the minimization of a 0-1 loss function.

Matt Menickelly
Lehigh University
[email protected]

MS72

Some Convergence Rate Results for Derivative-Based and Derivative-Free Stochastic Optimization

We consider unconstrained optimization problems where only "stochastic" estimates of the objective function are observable as replicates from a Monte Carlo oracle. We present ASTRO and ASTRO-DF, derivative-based and derivative-free implementations of a class of trust-region algorithms in which a stochastic local interpolation model is constructed, optimized, and updated iteratively. Function estimation and model construction within ASTRO and ASTRO-DF are adaptive in the sense that the extent of Monte Carlo sampling is determined by continuously monitoring and balancing metrics of sampling error (or variance) and structural error (or model bias). ASTRO and ASTRO-DF converge with probability one; more interestingly, the asymptotic relationship between sampling and the trust region radii is characterizable in each case, shedding light on algorithm efficiency.

Raghu Pasupathy
ISE Department
Virginia Tech
[email protected]

Sara Shashaani
University of Michigan
[email protected]

MS72

Direct Search based on Probabilistic Feasible Descent for Bound and Linearly Constrained Problems

Direct-search methods form a popular class of derivative-free optimization algorithms that explore the objective function along suitably chosen sets of directions. In the presence of constraints, such sets must conform to the geometry of the feasible set, and a number of deterministic ways of generating them have been proposed in the literature. However, recent developments in the unconstrained setting have shown that generating the polling directions randomly, ensuring a form of probabilistic descent, can lead to significant gains in efficiency and still deliver global convergence at appropriate rates with reassuring probabilities. In this talk, we describe a direct-search scheme for solving linearly constrained optimization problems based on randomly generating directions that guarantee probabilistic feasible descent, in a generalization from the unconstrained case. By applying martingale-type arguments to assess the quality of the directions used throughout the algorithm, we are able to prove global convergence with probability one as well as convergence rates with overwhelming probability. Such rates, or worst-case complexity bounds, indicate, as in the unconstrained case, possible interesting gains over the deterministic counterparts. Numerical experiments comparing our approaches to existing deterministic direct-search solvers seem to also support such prospects.

Serge Gratton
ENSEEIHT, Toulouse, France
[email protected]

Clement W. Royer
Wisconsin Institute for Discovery
University of Wisconsin-Madison
[email protected]

Luis N. Vicente, Zaikun Zhang
University of Coimbra
[email protected], [email protected]

MS73

Stochastic Nonlinear Programming for Wind Turbine Control

We present a stochastic programming formulation to optimize wind turbine pitch and torque controllers to maximize the power extracted from uncertain wind profiles while satisfying a probabilistic constraint on the maximum long-term mechanical stress experienced by the turbine. The probabilistic constraint matches stress profiles to a Gumbel distribution to predict long-term failure probabilities.

Yankai Cao
University of Wisconsin-Madison
[email protected]

Fernando D'Amato
General Electric
[email protected]

Victor Zavala
University of Wisconsin-Madison
[email protected]

MS73

Logical Benders Decomposition for Quadratic Programs with Complementarity Constraints and Binary Variables

We study a logical Benders decomposition approach to solving binary-constrained quadratic programs with linear complementarity constraints. It is based on a satisfiability master problem to which feasibility cuts are added; the cuts are computed from primal and dual subproblems for chosen complementarity pieces and binary variables. Interpreting the logical Benders decomposition approach from a branch-and-bound point of view, we propose several new methods for strengthening the feasibility cuts and guiding the master problem solution. Their efficiency is assessed by numerical experiments.

Francisco Jara-Moroni
Northwestern University
[email protected]

Andreas Waechter
Northwestern University
IEMS Department
[email protected]

Jong-Shi Pang
University of Southern California
[email protected]

John E. Mitchell
Rensselaer Polytechnic Institute
110, 8th Street, Troy, NY
[email protected]

MS73

New Solution Approaches for the Maximum Reliability Stochastic Network Interdiction Problem

We investigate new methods to solve the maximum-reliability stochastic network interdiction problem (SNIP). To begin, we introduce a new extensive formulation that is significantly more compact than the previously introduced extensive form. We then propose a new path-based formulation, which can be solved by delayed constraint generation of paths and cuts derived from submodularity. We apply results from a recent paper by Ahmed and Atamturk for a structured submodular set to obtain stronger valid inequalities for this formulation. These cuts are then embedded within a branch-and-cut (BC) algorithm. Computational results demonstrate that the proposed compact formulation and new BC algorithm are significantly more efficient than existing methods on a set of test instances from the literature.

Eli Towle
University of Wisconsin-Madison
[email protected]

James Luedtke
University of Wisconsin-Madison
Department of Industrial and Systems Engineering
[email protected]

MS73

Solving Chance-Constrained Problems Using a Kernel VaR Estimator

We present a reformulation of nonlinear optimization problems with chance constraints that replaces a probabilistic constraint by a nonparametric value-at-risk (VaR) estimator. The estimator smooths the historical VaR via a kernel, resulting in a better approximation of the quantile and reducing the nonconvexity of the feasible region. An optimization algorithm is proposed that permits the efficient treatment of joint chance constraints. Theoretical and empirical convergence properties are presented.
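
A minimal sketch of a kernel-smoothed VaR estimator of the kind described (Gaussian kernel, bisection on the smoothed CDF; the bandwidth rule is a common default, not the one from the talk):

    import numpy as np
    from scipy.stats import norm

    def kernel_var(samples, alpha, h=None):
        """Smoothed alpha-quantile: solves mean(Phi((q - X_i)/h)) = alpha."""
        x = np.asarray(samples, dtype=float)
        if h is None:
            h = 1.06 * x.std() * x.size ** (-0.2)  # Silverman-style bandwidth
        lo, hi = x.min() - 5 * h, x.max() + 5 * h
        for _ in range(80):                        # bisection on smooth CDF
            mid = 0.5 * (lo + hi)
            if norm.cdf((mid - x) / h).mean() < alpha:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

Because the smoothed CDF is differentiable in q, the resulting VaR constraint can be handled with standard nonlinear programming machinery, which is what enables the efficient treatment of joint chance constraints.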

Andreas Waechter, Alejandra Pena-Ordieres
Northwestern University
IEMS Department
[email protected], [email protected]

MS74

Stochastic Derivative Free Optimization for Hyperparameter Tuning in Machine Learning Problems

The performance of many machine learning algorithms depends on model hyperparameters, and thus on the method which sets these hyperparameters. In particular, hyperparameter optimization can be applied to tune model-related hyperparameters of convex machine learning problems (kernel SVM, logistic regression) and of deep learning (DNN, DBN), and to optimize nonsmooth nonconvex loss functions such as AUC. This class of optimization problems can be viewed as blackbox optimization problems, which can be considered as (possibly smooth) stochastic optimization problems. In this work, we utilize trust-region based derivative-free optimization (DFO-TR) as a hyperparameter optimization method. We present a computational comparison of the DFO-TR method with state-of-the-art methods, such as some model-based Bayesian optimization algorithms, as well as the traditional random search approach.

Hiva Ghanbari, Katya Scheinberg
Lehigh University
[email protected], [email protected]

MS74

A Levenberg-Marquardt Method for Large-Scale Noisy Nonlinear Least Squares Problems

Nonlinear least squares problems arise in many applications. We consider the case in which the exact values of the function, the gradient, and the Jacobian matrix are not available, and only approximations are known. We propose a Levenberg-Marquardt method that is suitable to deal with the noise in the function, gradient, and Jacobian and that guarantees global convergence to a solution of the unperturbed problem as the noise level tends to zero. Our procedure starts with a given noise level, and we propose a mechanism to measure whether the noise in the function values is too high to achieve a good reduction in the unperturbed objective function. In that case, we assume that the noise can be reduced. At each iteration an inexact Levenberg-Marquardt step is taken, which makes the method suitable also for large-scale problems. Numerical results will be provided.
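
For reference, the deterministic skeleton that such a method builds on is the classical damped Gauss-Newton loop below (a sketch; in the noisy setting of the talk, res and jac would return approximations and the noise level would enter the acceptance logic):

    import numpy as np

    def levenberg_marquardt(res, jac, x, lam=1e-2, iters=50):
        """Basic Levenberg-Marquardt loop for min 0.5*||res(x)||^2."""
        f = 0.5 * np.dot(res(x), res(x))
        for _ in range(iters):
            r, J = res(x), jac(x)
            # Damped normal equations: (J^T J + lam I) d = -J^T r.
            d = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
            x_trial = x + d
            f_trial = 0.5 * np.dot(res(x_trial), res(x_trial))
            if f_trial < f:                # accept: decrease damping
                x, f, lam = x_trial, f_trial, lam / 3.0
            else:                          # reject: increase damping
                lam *= 3.0
        return x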

Stefania Bellavia
Universita di Firenze
[email protected]

Serge Gratton
ENSEEIHT, Toulouse, France
[email protected]

Elisa Riccietti
University of Florence
[email protected]

MS74

Beyond SGD: Faster Stochastic Methods for Nonconvex Optimization

We study nonconvex finite-sum problems (empirical risk minimization) and analyze stochastic variance reduced (VR) methods for solving them efficiently. Variance reduction methods have recently surged into prominence for convex optimization, especially due to their edge over stochastic gradient descent (SGD); but their theoretical analysis has almost exclusively assumed convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG (and SAGA) for nonconvex optimization, and show that these methods converge provably faster than SGD and even batch gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. Our analysis extends, albeit in non-obvious ways, to handle nonsmooth, constrained, and parallel (mini-batch) variants; time permitting, we will comment on these and point out very curious open problems relating to these methods.

Suvrit Sra
Massachusetts Institute of Technology
[email protected]

Sashank Reddi
Carnegie Mellon University
[email protected]

MS74

Exact Worst-Case Performance of First-order Methods: Recent Developments

We introduce the performance estimation approach. This methodology aims at automatically analyzing the convergence properties of first-order algorithms for solving (composite) convex optimization problems. In particular, it allows obtaining tight guarantees for fixed-step first-order methods involving a variety of different oracles (namely explicit, projected, proximal, conditional, and inexact (sub)gradient steps) and a variety of convergence measures. During the presentation, we will present the methodology and illustrate how it can be used for developing new algorithms. Along the way, we will present a toolbox that makes it easy to use the approach for studying simple first-order methods.

Adrien Taylor
UCL, Belgium
[email protected]

Julien Hendrickx
Universite catholique de Louvain
[email protected]

Francois Glineur
Universite Catholique de Louvain (UCL)
Center for Operations Research and Econometrics (CORE)
[email protected]

MS75

A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization. For datasets of size n, our approach requires O(n^{2/3}/ε) gradient evaluations to reach ε-accuracy and, when a certain error bound holds, the complexity improves to O(κ n^{2/3} log(1/ε)). These rates are n^{1/3} times better than those achieved by typical full-gradient methods.

Damek Davis
Cornell University
[email protected]

MS75

Improving the Optimized Gradient Method for Large-Scale Convex Optimization

We recently proposed the optimized gradient method (OGM) [Math. Prog., 151:83-107, Sep. 2016], which is as efficient as Nesterov's fast gradient method (FGM), satisfies a worst-case cost-function convergence bound that is twice smaller than that of FGM, and is optimal for large-dimensional smooth convex problems. Here, we present two approaches to improving and extending OGM. First, we adapt an adaptive restarting scheme originally developed for FGM to heuristically improve the convergence rate of OGM, particularly when the iterates enter a locally well-conditioned region. Second, we propose a new algorithm called OGM-OG (optimized over gradient) by optimizing the step coefficients with respect to the rate of decrease of the gradient norm, whereas the original OGM was optimized with respect to the decrease of the cost function.
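
To illustrate the first ingredient, the sketch below applies the O'Donoghue-Candes gradient restart test to Nesterov's FGM (shown for FGM rather than OGM, since the OGM coefficients are more involved; grad and the Lipschitz constant L are assumed given):

    import numpy as np

    def fgm_with_restart(grad, x0, L, iters=500):
        """Accelerated gradient method with adaptive gradient restart."""
        x, y, t = x0.copy(), x0.copy(), 1.0
        for _ in range(iters):
            g = grad(y)
            x_new = y - g / L                 # gradient step from y
            if np.dot(g, x_new - x) > 0:      # momentum points uphill:
                y, t = x.copy(), 1.0          # reset momentum and retry
                continue
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            x, t = x_new, t_new
        return x

Restarting whenever the momentum direction forms an acute angle with the current gradient heuristically recovers fast convergence on locally well-conditioned problems, which is the regime the abstract targets.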

Donghwan Kim, Jeffrey Fessler
University of Michigan
[email protected], [email protected]

MS75

Accelerated Primal-Dual Method for Affinely Constrained Problems

Motivated by big data applications, first-order methods have been extremely popular in recent years. However, naive gradient methods generally converge slowly, so much effort has been made to accelerate various first-order methods. This talk presents two accelerated methods for solving structured linearly constrained composite convex programming. The first is the accelerated linearized augmented Lagrangian method (LALM). Assuming merely weak convexity, we show that LALM achieves O(1/k^2) convergence if its parameters are adapted. The second is the accelerated linearized alternating direction method of multipliers (LADMM). In addition to the composite convexity, it further assumes a two-block structure. Unlike classic ADMM, our method allows linearization of the objective and also of the augmented term to make the update simple. Assuming strong convexity on one block variable, we show that LADMM also enjoys O(1/k^2) convergence with adaptive parameters. This result is a significant improvement over that in [Goldstein et al., SIIMS'14], which requires strong convexity on both block variables and no linearization of the objective or augmented term. Numerical experiments will be shown to demonstrate the validity of the acceleration and the superior performance of the proposed methods over existing ones.

Yangyang Xu
University of Alabama
[email protected]

MS75

Primal-Dual Algorithms for the Sum of Three Operators

In this talk, I will introduce existing primal-dual algorithms for minimizing f(x) + g(x) + h(Lx), where f is convex and Lipschitz differentiable, g and h are convex (possibly nondifferentiable), and L is a linear operator. Then, I will propose a new primal-dual algorithm, which is a generalization of two primal-dual algorithms for solving the sum of two operators. In addition, the new algorithm recovers many operator splitting methods involving two and three operators.
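
As background, one existing scheme of this type is the Condat-Vu primal-dual iteration; a minimal sketch, assuming the proximal operators prox_g and prox_h_conj (the prox of the conjugate h*) and step sizes tau, sigma are given. This is not the new algorithm proposed in the talk.

```python
def condat_vu(grad_f, prox_g, prox_h_conj, L, x0, y0, tau, sigma, iters=1000):
    """Condat-Vu sketch for min_x f(x) + g(x) + h(Lx), with f smooth.

    By Moreau's identity, prox_{s h*}(y) = y - s * prox_{h/s}(y / s),
    so prox_h_conj can be built from the prox of h.  Convergence needs
    suitably small steps (roughly tau * (Lf/2 + sigma * ||L||^2) <= 1).
    """
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x_new = prox_g(x - tau * (grad_f(x) + L.T @ y), tau)   # primal update
        y = prox_h_conj(y + sigma * (L @ (2 * x_new - x)), sigma)  # dual update
        x = x_new
    return x, y
```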

Ming Yan
Michigan State University
Department of CMSE; Department of Mathematics
[email protected]

MS76

A Copositive Approach for Two-Stage Adjustable Robust Optimization with Uncertain Right-Hand Sides

We study two-stage adjustable robust linear programming in which the right-hand sides are uncertain and belong to a convex, compact uncertainty set. This problem is NP-hard, and the affine policy is a popular, tractable approximation. We prove that under standard and simple conditions, the two-stage problem can be reformulated as a copositive optimization problem, which in turn leads to a class of tractable, semidefinite-based approximations that are at least as strong as the affine policy. We investigate several examples from the literature demonstrating that our tractable approximations significantly improve on the affine policy. In particular, our approach solves exactly in polynomial time a class of instances of increasing size for which the affine policy admits an arbitrarily large gap.

Guanglin Xu
Department of Management Sciences
The University of Iowa
[email protected]

Samuel Burer
University of Iowa
Dept of Management Sciences
[email protected]

MS76

Approximation Guarantees for Monomial Convexification in Polynomial Optimization

Consider minimizing a polynomial function over a compact convex set. Convexifying each monomial separately yields a lower bound on the global optimum. For any monomial whose domain is a subset of the [0, 1]^n box, we give upper bounds on the approximation error produced by the closure convex hull of this monomial. The error measure is the maximum absolute deviation between the actual and approximated product value. Our bounds are functions of the monomial degree. Special structures of the domain for which our bounds are tight are also analyzed. For a multilinear monomial over the [1, r]^n box, where r is a positive real, we give refined error bounds. As a step towards addressing mixed-sign variable domains, we provide error bounds for a multilinear monomial over the [-1, 1]^n box. All of the above analyses together imply an upper bound on the additive error in the global optimum due to monomial convexifications of a polynomial.

Akshay Gupte
Department of Mathematical Sciences
Clemson University
[email protected]

Warren Adams, Yibo Xu
Clemson University
[email protected], [email protected]

MS76

The Multilinear Polytope for Gamma-acyclic Hypergraphs

We consider the Multilinear polytope defined as the convex hull of the set of binary points satisfying a collection of multilinear equations. Such sets are of fundamental importance in many types of mixed-integer nonlinear optimization problems, such as binary polynomial optimization. Utilizing an equivalent hypergraph representation, we study the facial structure of the Multilinear polytope in conjunction with the acyclicity degree of the underlying hypergraph. We provide explicit characterizations of the Multilinear polytopes corresponding to Berge-acyclic and gamma-acyclic hypergraphs. As the Multilinear polytope for gamma-acyclic hypergraphs may contain exponentially many facets in general, we present a highly efficient polynomial algorithm to solve the separation problem. Our results imply polynomial solvability of the corresponding classes of binary polynomial optimization problems and provide new types of cutting planes for a variety of mixed-integer nonlinear optimization problems.

Aida Khajavirad
IBM TJ Watson Research Center
[email protected]

Alberto Del Pia
University of Wisconsin-Madison
[email protected]

MS76

Low-Complexity Relaxations and Convex Hulls of Disjunctions on the Positive Semidefinite Cone and Other Regular Cones

In this talk we consider linear two-term disjunctions on a regular cone K. The resulting disjunctive sets provide fundamental non-convex relaxations for mixed-integer conic programs. We develop a family of structured convex inequalities that are valid for a disjunctive set of this form. Under mild assumptions on the choice of disjunction, these inequalities collectively describe the closed convex hull of the disjunctive set in the space of the original variables, and if additional conditions are satisfied, a single inequality from this family is adequate for a complete closed convex hull characterization. In the cases where the cone K is the positive semidefinite cone or a direct product of second-order cones and nonnegative rays, we show that these inequalities admit equivalent conic representations for certain choices of disjunction. For more general disjunctions, we present low-complexity convex relaxations with favorable tightness properties in the space of the original variables. Along the way, we also establish connections between two-term disjunctions and non-convex sets defined by rank-two quadratics.

Sercan Yildiz
University of North Carolina at Chapel Hill
[email protected]

Fatma Kilinc-Karzan
Tepper School of Business
Carnegie Mellon University
[email protected]

MS77

Global Multi-Objective Optimization and an Application to Robust Optimization

For globally solving multi-objective optimization problems, parameter-dependent scalarizations have so far had to be solved iteratively by a global procedure. We present here a direct approach based on a branch-and-bound scheme and on convex underestimators. A new discarding test is presented which uses outer approximations of convex multi-objective optimization problems. We also apply these techniques to computing a covering of the solutions of a robust multiobjective optimization problem. There, decision uncertainty is taken into account by considering, for each variable, all possible realizations and the corresponding objective function values. Choosing a robust approach leads to a special set optimization problem.

Gabriele Eichfelder
Institute of Mathematics
Technische Universitat Ilmenau
[email protected]

Julia Niebling
Technische Universitat Ilmenau
[email protected]

MS77

Multicriteria Tradeoff Analysis for Optimization under Uncertainty

We present a theoretical analysis and propose new methodological approaches to compute improved tradeoff solutions for optimization problems under uncertainty. Our results use several concepts from multiobjective programming including scalarizations and extended notions of robustness and proper efficiency. Their potential impact will be demonstrated on applications of financial portfolio selection and price-aware electric vehicle charging.

Alexander Engau
University of Colorado, Denver, USA
[email protected]

MS77

Vector Optimization Methods for Solving Matrix Completion Problems

Motivated by applications in machine learning, e.g. the well-known Netflix Challenge, we are interested in the following problem: Given an incomplete m×n data matrix M, whose entries are known only for indices (i, j) ∈ Ω, we are looking for a completion X ∈ R^{m×n} of M. As is often the case in real-world problems, we also demand that X has a special structure. For example, let us assume X has to be a sparse and low-rank matrix. We can therefore formulate the following matrix optimization problem:

min_X ∑_{(i,j)∈Ω} [X_{ij} − M_{ij}]^2 + λ‖X‖_* + μ‖X‖_1.

Here, ‖·‖_* denotes the nuclear norm, which promotes low-rank solutions, and ‖·‖_1 denotes the ℓ1-norm, which promotes the sparsity of X. The quality of a solution for this problem, with respect to the underlying application, depends highly on a good choice of the regularization parameters λ and μ. In this talk, we will discuss the parameter choice from the viewpoint of vector optimization.

Christopher Schneider
Friedrich-Schiller-University of Jena
[email protected]

MS77

Indifference Pricing via Convex Vector Optimization

When the preferences are incomplete, they can be represented by vector-valued utility functionals. In this case, the utility maximization problem is a convex vector optimization problem. We allow utility functions to be multivariate and we define the utility buy and sell prices as set-valued functions of the claim. The set-valued buy and sell prices recover the complete-preference case and satisfy monotonicity and convexity properties, as expected. It is possible to approximate these set-valued prices by solving convex vector optimization problems.

Firdevs Ulus
Bilkent University
[email protected]

Birgit Rudloff
Vienna University of Economics and Business
[email protected]

MS78

MM Algorithms for Mixture Modeling and Robust, Structured Regression

We present a computational algorithm for performing mixture modeling and robust, structured regression using the L2E method, a minimum distance estimator. Previous implementations of the method were limited to fitting only a handful of parameters. We introduce an iterative majorization framework for extending the L2E method proposed in Scott (2001, 2009) to handle high-dimensional models. We demonstrate our algorithm on simulated and real data examples.

Eric Chi
Department of Statistics
North Carolina State University
[email protected]

MS78

An Overview of MM Algorithms

This talk will survey the history, theory, and applications of the MM principle, a framework for constructing monotone optimization algorithms in high-dimensional models. The MM principle transfers optimization from the objective function to a surrogate function and simplifies matters by: (a) separating the variables of a problem, (b) avoiding large matrix inversions, (c) linearizing a problem, (d) restoring symmetry, (e) dealing with equality and inequality constraints gracefully, and (f) turning a nondifferentiable problem into a smooth problem. The art in devising an MM algorithm lies in choosing a tractable surrogate function that hugs the objective function as tightly as possible. The EM principle from statistics is a special case of the MM principle. Modern mathematical themes such as sparsity and parallelization mesh well with the MM principle.
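
A textbook instance of the principle is least absolute deviations regression: the quadratic majorizer |r| <= r^2/(2|r_k|) + |r_k|/2 (with equality at r = r_k) turns each MM step into a weighted least-squares solve, which is IRLS viewed through the MM lens. The sketch below is illustrative; the eps floor guarding against zero residuals is an assumption.

```python
import numpy as np

def lad_mm(X, y, iters=100, eps=1e-8):
    """Least absolute deviations, min_b sum_i |y_i - x_i' b|, via MM.

    Majorizing each |r_i| at the current residual gives a weighted
    least-squares surrogate; the objective decreases monotonically.
    """
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # ordinary LS start
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ b), eps)   # majorizer weights
        b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return b
```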

Kenneth [email protected]

MS78

The MM Principle for Split Feasibility Problems

We present a majorization-minimization algorithm for the non-linear split feasibility problem. The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. Our work extends the proximity function approach for the linear case to allow for non-linear mappings. Our algorithm is amenable to quasi-Newton acceleration, comes complete with convergence guarantees under mild assumptions, and applies to proximity functions in terms of arbitrary Bregman divergences. We explore several examples including regression for constrained generalized linear models and rank-restricted matrix regression, and consider a case study in optimization for intensity-modulated radiation therapy.

Jason [email protected]

MS78

MM Algorithms For Variance Components Models

Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticians and numerical analysts, maximum likelihood estimation and restricted maximum likelihood estimation of variance component models remain numerically challenging. In this talk, we present a novel iterative algorithm for variance components estimation based on the minorization-maximization (MM) principle. The MM algorithm is trivial to implement and competitive on large data problems. The algorithm readily extends to more complicated problems such as linear mixed models, multivariate response models possibly with missing data, maximum a posteriori estimation, and penalized estimation. We demonstrate, both numerically and theoretically, that it converges faster than the classical EM algorithm when the number of variance components is greater than two.

Hua [email protected]

MS79

Conic Programming Reformulations of Two-Stage Distributionally Robust Linear Programs over Wasserstein Balls

Adaptive robust optimization problems are usually solved approximately by restricting the adaptive decisions to simple parametric decision rules. However, the corresponding approximation error can be substantial. In this paper we show that two-stage robust and distributionally robust linear programs can often be reformulated exactly as conic programs that scale polynomially with the problem dimensions. Specifically, when the ambiguity set constitutes a 2-Wasserstein ball centered at a discrete distribution, then the distributionally robust linear program is equivalent to a copositive program (if the problem has complete recourse) or can be approximated arbitrarily closely by a sequence of copositive programs (if the problem has sufficiently expensive recourse). These results directly extend to the classical robust setting and motivate strong tractable approximations of two-stage problems based on semidefinite approximations of the copositive cone. We also demonstrate that the two-stage distributionally robust optimization problem is equivalent to a tractable linear program when the ambiguity set constitutes a 1-Wasserstein ball centered at a discrete distribution and there are no support constraints.

Grani A. Hanasusanto
Operations Research and Industrial Engineering
The University of Texas at Austin
[email protected]

Daniel Kuhn
Ecole Polytechnique Federale de Lausanne
[email protected]

MS79

Appointment Scheduling under Schedule-Dependent Patient No-Show Behavior

This paper studies an appointment scheduling problem under schedule-dependent patient no-show behavior. The problem is motivated by our studies of independent datasets from countries on two continents which identify a significant time-of-day effect on patient show-up probabilities. We deploy a distributionally robust model, which minimizes the worst-case total expected cost of patient waiting and the service provider's idle time and overtime, by optimizing the scheduled arrival times of patients. We show that this model under schedule-dependent patient show-up behavior can be reformulated as a copositive program and then approximated by semidefinite programs. These formulations are obtained by a new technique that uses a completely positive program to equivalently represent a linear program with uncertainties present in both the objective function and the right-hand side of the constraint sets. To tackle the case when patient no-shows are endogenous to the schedule, we construct a set of dual prices to guide the search for a good schedule and use the technique iteratively to obtain a near-optimal solution. Our computational studies reveal a significant reduction in total expected cost when the time-of-day variation in patient show-up probabilities is taken into account rather than ignored.

Chung-Piaw Teo
Department of Decision Sciences
NUS Business School
[email protected]

Qingxia Kong
Erasmus University Rotterdam
[email protected]

Shan Li
City University of New York
[email protected]

Nan Liu
Columbia University
[email protected]

Zhenzhen Yan
National University of Singapore
[email protected]

MS79

A Data-Driven Distributionally Robust Bound on the Expected Optimal Value of Uncertain Mixed 0-1 Linear Programming

We study the expected optimal value of a mixed 0-1 programming problem with uncertain objective coefficients following a joint distribution. We assume that the true distribution is not known exactly, but a set of independent samples can be observed. Using the Wasserstein metric, we construct an ambiguity set centered at the empirical distribution of the observed samples and containing all distributions that could have generated the observed samples with high statistical confidence. The problem of interest is to investigate the bound on the expected optimal value over the Wasserstein ambiguity set. Under standard assumptions, we reformulate the problem into a copositive programming problem, which naturally leads to a tractable semidefinite-based approximation. We compare our approach with a moment-based approach from the literature for two applications. The numerical results illustrate the effectiveness of our approach.

Guanglin Xu
Department of Management Sciences
The University of Iowa
[email protected]

Samuel Burer
University of Iowa
Dept of Management Sciences
[email protected]

MS79

Distribution-Free Robust Optimization with Shape Constraints

One of the major critiques of distribution-free robust bounds is the discrete nature of the optimal distribution. Increasing failure rate (IFR) and log-concavity are two widely accepted and used distribution classes in various research domains. Even though the corresponding mathematical programming problems involving such distributions are highly non-convex, we establish an optimal solution structure via analysis of a relaxed problem. Furthermore, we develop a numerical toolbox for optimally solving distribution-free moment problems under IFR and log-concave distribution assumptions.

Jiang Bo
Shanghai University of Finance and Economics
[email protected]

Xi Chen
New York University
[email protected]

Simai He
Shanghai University of Finance and Economics
[email protected]

Christopher T. Ryan
University of Chicago
[email protected]

Teng Zhang
Stanford University
[email protected]

MS80

Guarantees for Subspace Learning by Incremental Gradient Descent on the Grassmannian

It has been observed in a variety of contexts that gradient descent methods have great success in solving low-rank matrix factorization problems, despite the relevant problem formulation being non-convex. The Singular Value Decomposition (SVD) is the solution to a non-convex optimization problem, and there are several highly successful linear-algebraic algorithms for solving it. Unfortunately, these algorithms cannot easily be extended to problems with regularizers or missing data. In this talk I will focus on the problem where we seek the d-dimensional subspace spanned by a streaming data matrix. We apply the natural first-order incremental gradient descent method, constraining the gradient method to the Grassmannian. I will discuss these algorithms and theoretical results in the standard l2-norm subspace learning problem with noisy or undersampled data. Additionally I will present algorithms and empirical results for the robust PCA and sparse PCA problems. (Joint work with Dejiao Zhang.)
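
For readers new to this setting, one incremental step in the style of GROUSE can be sketched as below for the fully observed case: project the incoming vector, form the residual, and rotate the basis along a rank-one geodesic, which preserves orthonormality. Step-size scheduling and the missing-data weights of the actual algorithm are omitted; treat this as a schematic under those assumptions.

```python
import numpy as np

def grouse_step(U, v, eta):
    """One GROUSE-style step.  U: n x d orthonormal basis, v: new vector."""
    w = U.T @ v                          # weights of v in span(U)
    p = U @ w                            # projection onto the subspace
    r = v - p                            # residual, orthogonal to span(U)
    sigma = np.linalg.norm(r) * np.linalg.norm(p)
    if sigma < 1e-12:
        return U                         # v already (numerically) in the subspace
    t = eta * sigma
    direction = ((np.cos(t) - 1.0) * p / np.linalg.norm(p)
                 + np.sin(t) * r / np.linalg.norm(r))
    return U + direction[:, None] @ (w / np.linalg.norm(w))[None, :]
```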

Laura Balzano
Electrical Engineering Dept
University of Michigan
[email protected]

MS80

Nonconvex Statistical Optimization: Global Exploration Versus Local Exploitation

Taming nonconvexity requires a seamless integration of both global exploration and local exploitation of latent convexity. In this talk, we study a generic latent variable model that includes PCA, ICA, and dictionary learning as special cases. Using this statistical model as illustration, we propose two global exploration schemes for identifying the basin of attraction, namely tightening after relaxation and noise regularization, which allow further local exploitation via gradient-based methods. Interestingly, given a sufficiently good initialization, the local exploitation methods are often information-theoretically optimal. In sharp contrast, based on an oracle computational model, we prove that one must pay a computational or statistical price in the global exploration stage to attain a good initialization. (Based on joint work with Zhaoran Wang, Zhuoran Yang, and Junchi Li.)

Han Liu
Princeton University
[email protected]

MS80

Theory for Local Optima in Nonconvex Regression Problems

We present results for high-dimensional linear regression using robust M-estimators with a regularization term. We show that when the derivative of the loss function is bounded, our estimators are robust with respect to heavy-tailed noise distributions and outliers in the response variables, with the usual order of k log p/n rates for high-dimensional statistical estimation. Our results continue a line of recent work concerning local optima of nonconvex M-estimators with possibly nonconvex penalties, where we adapt the theory to settings where the loss function only satisfies a form of restricted strong convexity within a local neighborhood. We also discuss second-order results concerning the asymptotic normality of our estimators, and provide a two-step M-estimation algorithm for obtaining statistically efficient solutions within the local region.

Po-Ling Loh
University of Wisconsin
[email protected]

MS81

The Langevin MCMC: Theory and Methods

The complexity and sheer size of modern data sets, of which ever more demanding questions are asked, give rise to major challenges. Traditional simulation methods often scale poorly with data size and model complexity, and thus fail for the most complex of modern problems. We consider the problem of sampling from a log-concave distribution. Many problems in machine learning fall into this framework, such as linear ill-posed inverse problems with sparsity-inducing priors, or large-scale Bayesian binary regression. The purpose of this lecture is to explain how ideas that have proven very useful in the machine learning community for solving large-scale optimization problems can be used to design efficient sampling algorithms. Most of the efficient algorithms known so far may be seen as variants of the gradient descent algorithm, most often coupled with partial updates (coordinate descent algorithms). This naturally suggests studying methods derived from the Euler discretization of the Langevin diffusion; partial updates may be seen in this context as Gibbs steps. The algorithm may be generalized to the non-smooth case by regularizing the objective function, for which the Moreau-Yosida inf-convolution is an appropriate candidate. We will prove convergence results for these algorithms with explicit convergence bounds, both in Wasserstein distance and in total variation. Numerical illustrations will be presented to illustrate our results.
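
As a concrete anchor, the Euler discretization in question is the unadjusted Langevin algorithm (ULA); a minimal sketch, assuming a callable grad_U for the potential of the target density proportional to exp(-U):

```python
import numpy as np

def ula(grad_U, x0, gamma, n_samples, rng=None):
    """Unadjusted Langevin Algorithm: Euler scheme for the diffusion
    dX_t = -grad U(X_t) dt + sqrt(2) dB_t.  The discretization bias
    grows with the step size gamma."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    samples = []
    for _ in range(n_samples):
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)
```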

Eric Moulines
Telecom ParisTech
[email protected]

Alain Durmus
Telecom ParisTech
[email protected]

Nicolas Brosse
Ecole Polytechnique
[email protected]

MS81

Converging on the Ultimate Optimization Algorithm for Machine Learning

In this talk I'll survey recent advances in algorithms for minimizing the cost functions arising from convex machine learning models. I'll talk about recent work on stochastic methods incorporating acceleration, Newton updates, mini-batches, and non-uniform sampling for minimizing finite sums of convex functions.

Mark Schmidt
University of British Columbia
[email protected]

MS81

Fast Convergence of Newton-Type Methods on High-Dimensional Problems

We study the convergence rate of Newton-type methods on high-dimensional problems. The high-dimensional nature of the problem precludes the usual global strong convexity and smoothness that underlie the classical analysis of such methods. We find that restricted versions of these conditions, which typically arise in the study of the statistical properties of the solutions, are also enough to ensure good computational properties of Newton-type methods. We explore the algorithmic consequences in distributed and online settings.

Yuekai Sun
University of Michigan
[email protected]

MS82

Relative Entropy Optimization and Dynamical Systems

We discuss relative entropy convex programs, which represent a generalization of geometric programs and second-order cone programs. We describe an application of these ideas to hitting-time estimation in certain dynamical systems, which in turn leads to an approach for checking set membership.

Venkat Chandrasekaran
California Institute of Technology
[email protected]

Parikshit Shah
University of Wisconsin/Philips Research
[email protected]

MS82

Gradient Descent Learns Linear Systems

We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.

Moritz Hardt
Google, Inc.
[email protected]

Tengyu Ma
Princeton University
[email protected]

Benjamin Recht
University of California, Berkeley
[email protected]

MS82

Efficient Methods for Online Node Classification in a Network

We consider the problem of sequentially classifying nodes in a network. A natural formulation leads to an NP-hard combinatorial objective. We discuss a relaxation technique for solving a certain dynamic program that leads to a linear-time algorithm with a nearly optimal mistake bound.

Alexander Rakhlin
University of Pennsylvania
[email protected]

MS82

Randomized Approximation of Feed-Forward Networks Using Compositional Kernels

We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels. The compositional kernels we use are inspired by the structure of neural networks and convolutional kernels. The resulting scheme is simple and yields sparse and efficiently computable features. Each random feature can be represented as an algebraic expression over a small number of (random) paths in a composition tree. Thus, compositional random features can be stored very compactly. The discrete nature of the generation process enables de-duplication of repeated features, further compacting the representation and increasing the diversity of the embeddings. Our approach complements and can be combined with previous random feature schemes.

Amit Daniely, Roy Frostig, Vineet Gupta
Google Research
[email protected], [email protected], [email protected]

Yoram Singer
Princeton University
[email protected]

MS83

On the Construction of Exact Augmented Lagrangian Functions for Nonlinear Semidefinite Optimization

Nonlinear semidefinite programming (NSDP) problems extend the well-known nonlinear programming and linear semidefinite programming problems, and have many applications, for example in feedback control and structural optimization. The research associated with NSDP is, however, relatively scarce, and few methods have been developed. In this work, we propose a general exact augmented Lagrangian function for NSDP problems. By exact, we mean that the unconstrained minimization of the constructed function gives a solution of the original problem when the penalty parameter is large enough. Differently from the traditional exact penalty function, the exact augmented Lagrangian is defined on the product space of the problem's variables and of the multipliers. Here, we will give a general formula for such a function and, based on that, we will construct another one, which is continuously differentiable and has a guaranteed exactness property.

Ellen H. Fukuda
Kyoto University
[email protected]

Bruno Lourenco
Seikei University
[email protected]

MS83

Sequential Injective Algorithm for Weakly Univalent Vector Equations and Its Application to Mixed Second-Order Cone Complementarity Problems

It is known that conic complementarity problems and variational inequality problems can be reformulated equivalently as vector equations by using the natural residual or Fischer-Burmeister function. Moreover, under some mild assumptions, those vector equations possess the weak univalence property. In this study, we first provide a sequential injective algorithm for a weakly univalent vector equation. We note that the algorithm can be cast as a prototype for many kinds of algorithms, such as the smoothing Newton method, the regularized smoothing Newton method, the semismooth Newton method, etc. Then, we apply the prototype algorithm and the convergence analysis to the regularized smoothing Newton algorithm for mixed nonlinear second-order cone complementarity problems. We prove the global convergence property under the Cartesian P0 assumption, which is strictly weaker than the monotonicity assumption.

Shunsuke Hayashi
Graduate School of Information Sciences, Tohoku University
[email protected]

MS83

An Extension of Chubanov's Algorithm to Symmetric Cones

We present a new algorithm for homogeneous feasibility problems over symmetric cones and discuss its complexity. We will also present the special case where the cone is the cone of positive semidefinite matrices. Our work is an extension of Chubanov's algorithm for linear programs and, analogously, it consists of a basic procedure and a step where the possible solutions are confined to an intersection of the cone and a half-space. It also has some flavor of the ellipsoid method and of interior point methods. Some of the distinguishing features are that progress is measured through volumes and that we make use of the so-called "spectral norms". We will also briefly compare our approach to a recent work by Pena and Soheili.

Bruno Lourenco
Seikei University
[email protected]

Tomonari Kitahara
Tokyo Institute of Technology
[email protected]

Masakazu Muramatsu
The University of Electro-Communications
[email protected]

Takashi Tsuchiya
National Graduate Institute for Policy Studies
[email protected]

MS83

Sub-Homogeneous Optimization Problems and Their Applications

We consider an optimization problem with sub-homogeneous functions in its objective and constraint functions. Since the absolute value function is sub-homogeneous, our problem is a generalization of the absolute value optimization (AVO) problem proposed by Mangasarian in 2007. The problem includes the 0-1 integer programming problem, linear programming with linear complementarity constraints, and Lasso-regularized regression. We give a new dual formulation of our problem that has a closed form. It is an extension of the AVO dual problem given by Mangasarian. We show that the new dual formulation is essentially the same as the Lagrangian dual, which does not have a closed form in general. To the best of our knowledge, this property is new even for AVO.

Shota Yamanaka, Nobuo Yamashita
Kyoto University
[email protected], [email protected]

MS84

Power Cones in Second-Order Cone Form and Dual Recovery

Although rational power cones (or equivalently, geometric mean cones) can be represented using second-order cones, the transformation is cumbersome. We propose presolve and postsolve steps to automate this transformation, based on a surprisingly size-economic reformulation procedure. Primal and dual values can thus be computed for the original power cone formulation using second-order cone solvers. Computational results with MOSEK are given.

Henrik Friberg
MOSEK ApS
[email protected]

MS84

Partial Polyhedrality and Facial Reduction

In general conic programming, there may be a positive duality gap, or the optimal value may not be attained, when there is no interior feasible solution. Facial reduction algorithms (FRAs) give a remedy for this situation by producing the minimal cone of the feasible region or by detecting that the problem is infeasible. However, we usually do not use FRAs for linear programming (LP), because the well-known strong duality theorem holds for LP: there are no such nasty cases even if no interior feasible point exists. In this sense, polyhedral cones are 'harmless', so our claim is that polyhedral cones should be treated separately and properly even when they appear inside general cones. We say that a cone is partially polyhedral when it is written as a direct product of a polyhedral cone and another general cone. We propose a new concept of distance to polyhedrality, and construct a new FRA, FRA-poly, to exploit partial polyhedrality. As a result, the number of iterations is drastically reduced when the distance to polyhedrality is small; in particular, in the case of the doubly nonnegative cone, FRA-poly gives a worst-case bound of O(n) whereas the classical FRA needs O(n^2). Of possible independent interest, we prove a variant of the Gordan-Stiemke theorem and a proper separation theorem that takes partial polyhedrality into account.

Masakazu Muramatsu
University of Electro-Communications
[email protected]

Bruno Lourenco
Seikei University
[email protected]

Takashi Tsuchiya
National Graduate Institute for Policy Studies
[email protected]

MS84

Low-Order Complexity Results for SOS/SDP Methods in Real Algebra

In this talk, I will present some new algorithmic approaches to polynomial optimization and related problems that are based on semidefinite optimization and sums of squares and which have low orders of complexity. These results give resolution to some long-standing unsolved problems in the area.

Motakuri Ramana
United Airlines
[email protected]

MS84

On an Algorithm for Conic Linear Programs

Recently, Chubanov proposed an interesting new polynomial-time algorithm for linear programs. In this talk, we extend his algorithm to second-order cone programming. The algorithm deals with a homogeneous feasibility problem and, given a tolerance ε, finds a primal interior feasible solution or a dual nonzero feasible solution, or concludes that there is no ε-interior feasible solution to the primal system, in O(n log(1/ε)) iterations of the basic procedure, by generating a series of direct products of shrinking obliquely truncated second-order cones and intervals. Roughly, the basic procedure corresponds to one iteration of an interior-point algorithm. The algorithm has similarity to the ellipsoid method in the sense that it generates a series of shrinking convex bodies of the same type, but it also has some flavor of the interior-point method in that it utilizes the automorphism group of the cone.

Takashi Tsuchiya
National Graduate Institute for Policy Studies
[email protected]

Tomonari Kitahara
Tokyo Institute of Technology
[email protected]

MS85

A Multigrid Algorithm for SDP Relaxations of Sparse Polynomial Problems Arising in PDE Optimization

We propose a multigrid approach for the global optimization of a class of polynomial optimization problems with sparse support. The problems we consider arise from the discretization of infinite-dimensional optimization problems. In many of these applications, the level of discretization can be used to obtain a hierarchy of optimization models that captures the underlying infinite-dimensional problem at different degrees of fidelity. The main difficulty in applying multigrid methods to SDPs is that the geometric information between grids is lost when the original problem is approximated via an SDP relaxation. We show how a multigrid approach can be applied by developing projection operators to relate the primal and dual variables of the SDP relaxation between lower and higher levels in the hierarchy of discretizations. We develop sufficient conditions for the operators to be useful in applications. Our conditions are easy to verify in practice, and we discuss how they can be used with infeasible interior point methods. Our preliminary results highlight two promising advantages of following a multigrid approach in contrast with a pure interior point method: the percentage of problems that can be solved to a high accuracy is higher, and the time necessary to find a solution can be reduced, especially for large-scale problems.

Juan Campos Salazar
Imperial College London
[email protected]

MS85

Numerical and Theoretical Aspects of Shape Optimization for Fluid-Structure Interaction

In this talk we consider shape optimization for unsteady fluid-structure interaction problems that couple the Navier-Stokes equations with non-linear elasticity equations. We focus on the monolithic approach in the Lagrangian framework, obtained by applying an ALE transformation to the fluid equations that are usually formulated on the time-dependent physical domain. Shape optimization by the method-of-mappings approach requires another transformation, which maps the ALE reference domain to a reference domain for shape optimization. This yields an optimal control setting and therefore can be used to drive an optimization algorithm with adjoint-based gradient computation. Numerical results for our implementation, which builds on FEniCS, dolfin-adjoint, and IPOPT, are presented. In addition, some theoretical aspects of the differentiability of the state variables of fluid-structure interaction models in optimal control settings are discussed.

Johannes Haubner
Technical University of Munich
[email protected]

Michael Ulbrich
Technische Universitaet Muenchen
Chair of Mathematical Optimization
[email protected]

MS85

Primal-Dual Interior-Point Multigrid Method for Topology Optimization

An interior point method for structural topology optimization is proposed. The linear systems arising in the method are solved by the conjugate gradient method preconditioned by geometric multigrid. The resulting method is then compared with the so-called optimality condition method, an established technique in topology optimization. This method is also equipped with the multigrid preconditioned conjugate gradient algorithm. We conclude that, for large-scale problems, the interior point method with an inexact iterative linear solver is superior to any other variant studied in the paper.

Michal Kocvara
School of Mathematics
University of Birmingham
[email protected]

MS86

New Active Set Frank-Wolfe Variants for Minimization over the Simplex and the ℓ1 Ball

In this talk, we focus on active-set variants of the Frank-Wolfe algorithm for minimizing a function over the simplex and the ℓ1 ball. We first describe an active-set estimate (i.e., an estimate of the indices of the zero variables in the optimal solution) for the considered problem that tries to quickly identify as many active variables as possible at a given point, while guaranteeing that approximate optimality conditions are satisfied. We then embed our estimate into different Frank-Wolfe variants and analyze their convergence properties. Finally, we report some numerical results showing the effectiveness of the new approaches.
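
For context, the plain Frank-Wolfe iteration over the unit simplex that these variants build on takes only a few lines: the linear minimization oracle reduces to picking the coordinate with the smallest partial derivative. The sketch below omits the active-set estimation layer of the talk; grad is an assumed callable.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=500):
    """Plain Frank-Wolfe over {x : x >= 0, sum(x) = 1}."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(iters):
        g = grad(x)
        i = int(np.argmin(g))       # LMO: the vertex e_i minimizing <g, .>
        d = -x
        d[i] += 1.0                 # direction e_i - x keeps iterates feasible
        x += (2.0 / (k + 2.0)) * d  # classical diminishing step size
    return x
```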

Andrea Cristofari
DIAG, Sapienza University of Rome
[email protected]

Marianna De Santis
Universita di Roma La Sapienza
Department of Computer, Control, and Management Engineering
[email protected]

Stefano Lucidi
University of Rome
[email protected]

Francesco Rinaldi
University of Padova
[email protected]

MS86

Convex Relaxation Without Lifting: Solving Large Semidefinite Programs without all the Pain

Many non-convex problems can be converted into convex problems by "lifting", or mapping the problem into a higher-dimensional space. Very often, this results in convex but intractable semidefinite programs that are too big for convex solvers. In this talk, we discuss new methods for semidefinite and rank-constrained problems that don't require lifting to higher dimensions. First, we discuss biconvex relaxations of non-convex problems that have most of the benefits of convex liftings, but without the added dimensionality. Then, we discuss PhaseMax, a new convex relaxation for rank-constrained problems that provably recovers global solutions without added dimensionality.

Tom Goldstein
Department of Computer Science
University of Maryland
[email protected]

MS86

Application of Recent First-Order Methods to DIC Microscopy

One of the most successful techniques for observing biological processes in living cells without staining is differential interference contrast (DIC) microscopy. DIC images have a typical three-dimensional appearance and show high contrast; however, they are not suited for topographical and morphological interpretation, for which it is essential to recover the changes in phase of the light, i.e. the phase function. The inverse problem of estimating the specimen's phase function from a set of DIC color images acquired at different angles is nonlinear and ill-posed, and can be reformulated as the nonconvex minimization problem

min_{φ∈R^N} J(φ) ≡ J_0(φ) + μ J_TV(φ),

where J_0 is a least squares term measuring the distance between the observed images and the predicted images, and J_TV is the total variation regularizer. In this talk we propose an efficient proximal-gradient method to address the problem of phase estimation in DIC microscopy. The method exploits a stepsize rule based on a Lanczos-like process applied to the Hessian matrix of J_0, and is equipped with an Armijo-like linesearch in order to guarantee convergence. When J_TV is chosen as a smoothed version of the total variation, the method reduces to a gradient descent method. We show that the proposed tool is able to obtain, in both the smooth and nonsmooth cases, accurate reconstructions with a reduced computational demand.

Simone Rebegoldi
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Universita di Modena e Reggio Emilia
[email protected]

Lola Bautista
Escuela de Ingenieria de Sistemas y Escuela de Fisica
Universidad Industrial de Santander
[email protected]

Laure Blanc-Feraud
INRIA
Sophia Antipolis, France
[email protected]

Marco Prato
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Universita di Modena e Reggio Emilia
[email protected]

Luca Zanni
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Universita di Modena e Reggio Emilia
[email protected]

Arturo Plata
Escuela de Ingenieria de Sistemas y Escuela de Fisica
Universidad Industrial de Santander
[email protected]

MS86

Fast Algorithms for Non-Convex and Non-Smooth Euler's Elastica Regularization Problems

In this talk, I introduce two efficient optimization algorithms for solving Euler's elastica regularized optimization problems. The minimizing functional is non-smooth, non-convex, and involves high-order derivatives, so traditional gradient descent based methods converge very slowly. Recent alternating minimization methods show fast convergence when a good choice of parameters is used. The objective of this talk is to introduce efficient algorithms which have simple structures with fewer parameters. These methods are based on operator splitting and the alternating direction method of multipliers, and the subproblems can be solved efficiently by Fourier transforms and shrinkage operators. I briefly present the analytical properties of each algorithm, as well as numerical experiments, including comparisons with some existing state-of-the-art algorithms, to show the efficiency and effectiveness of the proposed methods.

Maryam Yashtini
School of Mathematics
Georgia Institute of Technology
[email protected]

MS87

Fast Proximal L1-Banach Descent Methods

Abstract not available at time of publication.

Marwa El [email protected]

MS87

Local Behaviour of Proximal Splitting Methods: Identification, Linear and Finite Convergence

Abstract not available at time of publication.

Jalal Fadili
Normandie Univ, ENSICAEN
[email protected]

MS87

Almost Nonexpansive Operators

Abstract not available at time of publication.

Matthew K. Tam
Institute for Numerical and Applied Mathematics
University of Goettingen
[email protected]

MS88

A Trust Region Method for Solving Derivative-Free Problems with Binary and Continuous Variables

Trust region methods are used to solve various black-box optimization problems, especially when no derivative information is available. In this talk, we will consider an extension of trust region methods for mixed-integer nonlinear programming (MINLP). There are both theoretical and computational innovations to handle the binary variables, including restricting the quadratic model, solving mixed-integer quadratic problems, and handling well-poisedness. Whereas, of necessity, we address globality with respect to the binary variables, we are content to obtain good local minima for the continuous variables, at least in part because our typical context involves expensive simulations. We report computational results on analytic and real-life problems and compare the results with existing methods.

Andrew R. Conn
IBM T J Watson Research Center
[email protected]

Claudia D'Ambrosio
CNRS, LIX, Ecole Polytechnique
[email protected]

Leo Liberti
Ecole Polytechnique
[email protected]

Delphine Sinoquet
IFP Energies nouvelles
[email protected]

MS88

Order-Based Error for Managing Ensembles of Surrogates in Derivative-Free Optimization

We investigate surrogate-assisted strategies for derivative-free optimization using the mesh adaptive direct search (MADS) blackbox optimization algorithm. In particular, we build an ensemble of surrogate models to be used within the search step of MADS, and examine different methods for selecting the best model for a given problem at hand. To do so, we introduce an order-based error tailored to surrogate-based search. We report computational experiments for analytical benchmark problems and engineering design applications. Results demonstrate that different metrics may result in different model choices and that the use of order-based metrics improves performance.

Sebastien Le Digabel
Ecole Polytechnique de Montreal
[email protected]

Bastien [email protected]

Charles Audet
Ecole Polytechnique de Montreal - GERAD
[email protected]

Michael Kokkolaras
McGill University
[email protected]

MS88

A New Derivative-Free Model-Based Method for Unconstrained Nonsmooth Optimization

We consider the unconstrained optimization of a nonsmooth function, where first-order information is unavailable or impractical to obtain. We propose a model-based method that explicitly takes into account the nonsmoothness of the objective function. A theoretical analysis concerning the global convergence of the approach is carried out. Furthermore, some results from a preliminary numerical experience are reported.

Giampaolo Liuzzi
CNR - IASI
[email protected]

Stefano Lucidi
University of Rome
[email protected]

Francesco Rinaldi
University of Padova
[email protected]

Luis Nunes Vicente
University of Coimbra
[email protected]

MS88

A New Derivative-Free Linesearch Method for Solving Integer Programming Problems

In this talk, we define a new derivative-free method for integer programming problems with both bound constraints on the variables and general nonlinear constraints. The approach combines a nonmonotone linesearch with a specific penalty approach for handling the nonlinear constraints. The use of both suitable randomly generated search directions and specific stepsizes in the linesearch guarantees that all the points are generated in the integer lattice. We analyze the theoretical properties of the method and show extensive numerical experiments on both bound-constrained and nonlinearly constrained problems.

Giampaolo Liuzzi
CNR - IASI
[email protected]

Stefano Lucidi
University of Rome
[email protected]

Francesco Rinaldi
University of Padova
[email protected]

MS89

R-Linear Convergence of Limited Memory Steepest Descent

The limited memory steepest descent method (LMSD) proposed by Fletcher is an extension of the Barzilai-Borwein "two-point step size" strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai-Borwein strategy yields a method with an R-linear rate of convergence when it is employed to minimize a strongly convex quadratic. Our work extends this analysis to LMSD, also for strongly convex quadratics. In particular, it is shown that the method is R-linearly convergent for any choice of the history length parameter. The results of numerical experiments are provided to illustrate behaviors of the method that are revealed through the theoretical analysis.

Frank E. Curtis, Wei Guo
Industrial and Systems Engineering
Lehigh University
[email protected], [email protected]

MS89

Linear Algebra Issues in Nonlinear Optimization

Abstract not available at time of publication.

Dominique Orban
GERAD and Dept. Mathematics and Industrial Engineering
Ecole Polytechnique de Montreal
[email protected]

MS89

Worst-Case Iteration Complexity of a Trust Region Algorithm for Equality Constrained Optimization

We present an optimization method for minimizing a twice-continuously differentiable nonconvex function subject to twice-continuously differentiable equality constraints. The method consists of a new trust funnel scheme combined with the recently proposed TRACE algorithm that ensures convergence to an ε-neighborhood of the feasible region in O(ε^{-3/2}) iterations. The algorithm then continues computing iterates using the TRACE algorithm, aiming to reduce the objective function while maintaining near-feasibility. The resulting scheme has the same complexity as the short-step ARC algorithm. The subproblems that we employ during both phases take the objective and constraint functions into account. This is in contrast to the feasibility phase used in short-step ARC, which temporarily ignores the objective function. The empirical performance of such an approach is illustrated through numerical results.

Frank E. Curtis
Industrial and Systems Engineering
Lehigh University
[email protected]

Daniel Robinson
Johns Hopkins University
[email protected]

Mohammadreza Samadi
Lehigh University
[email protected]

MS89

A Space Transformation Framework for Nonlinear Optimization

We present a space transformation framework for nonlinear optimization. Instead of tackling the problem in the original space, each iteration of this framework seeks a trial step by modeling and approximately solving the optimization problem in another space. We establish the global convergence and worst-case iteration complexity of the framework. We then show that the framework can be specialized to a parallel space decomposition framework for nonlinear optimization, which can be regarded as an extension of the domain decomposition method for PDEs. A feature of the decomposition framework is that it incorporates the restricted additive Schwarz methodology into the synchronization phase of the method. We will illustrate how this decomposition framework can be applied to design parallel algorithms for optimization problems with or without derivatives.

Zaikun Zhang, Luis Nunes Vicente
University of Coimbra
[email protected], [email protected]

Serge Gratton
ENSEEIHT, Toulouse, France
[email protected]

MS90

Universal Regularization Methods: Varying the Power, the Smoothness and the Accuracy

Adaptive cubic regularization methods have recently emerged as a credible alternative to linesearch and trust-region methods for smooth nonconvex optimization, with optimal complexity amongst second-order methods. Here we consider a general new class of adaptive regularization methods, which use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size, and which apply to convexly-constrained optimization problems. We investigate the worst-case complexity/global rate of convergence of these algorithms in the presence of varying (unknown) smoothness of the objective. We find that the methods automatically adapt their complexity to the degree of smoothness of the objective, while also taking advantage of the power of the regularization step to satisfy increasingly better bounds with the order of the models. The bounds vary continuously and robustly with respect to the regularization power, the model accuracy and the degree of smoothness.

Coralia Cartis
University of Oxford
Mathematical Institute
[email protected]

Nick Gould
Numerical Analysis Group
Rutherford Appleton Laboratory
[email protected]

Philippe L. Toint
University of Namur, Belgium
[email protected]

MS90

A Line-Search Approach Derived from the Adaptive Regularization Framework Using Cubics, with a Worst-Case Iteration Complexity of O(ε^{-3/2})

We consider solving unconstrained optimization problems by means of two popular globalization techniques: trust-region (TR) algorithms and the adaptive regularized framework using cubics (ARC). Both techniques require the solution of a so-called "subproblem" in which a trial step is computed by solving an optimization problem involving an approximation of the objective function, called "the model". The latter is supposed to be adequate in a neighborhood of the current iterate. In this work, we address an important practical question related to the choice of the norm for defining the neighborhood. Given a symmetric positive definite matrix M that satisfies a specific secant equation, we propose the use of the M-scaled norm, defined by ‖x‖_M = √(xᵀMx) for all x ∈ R^n, in both the TR and ARC techniques. We show that the use of this norm induces remarkable relations between the trial steps of both methods that can be used to obtain efficient practical algorithms. Using the M-scaled norm, we propose a possible way to derive line-search algorithms enjoying the same worst-case complexity as ARC algorithms. The good potential of the proposed algorithms is illustrated on a set of numerical experiments.

Youssef Diouane
Institut Superieur de l'Aeronautique et de l'Espace (ISAE)
Toulouse, France
[email protected]

El houcine Bergou
INRA, Jouy-en-Josas, France
[email protected]

Serge Gratton
ENSEEIHT, Toulouse, France
[email protected]

MS90

Minimal Constraint Qualifications that Ensure Convergence to KKT Points

The Karush-Kuhn-Tucker (KKT) condition is arguably the most important property related to optimality in nonlinear optimization, and hence many optimization algorithms have been developed that search for KKT points. Naturally, such algorithms can only generate sequences that conform to approximate versions of KKT. Therefore some type of continuity argument is needed to extend this approximate version of KKT to the exact version at the limit points. This talk will recall a hierarchy of approximate KKT conditions associated with different optimization methods and introduce the respective minimal conditions that must hold at the limit points in order to ensure that they are actually KKT. Such conditions are special constraint qualifications that we call strict. In this sense we will be able to fully characterize the minimal constraint qualifications needed to ensure the global convergence of different algorithms to KKT points. We will also draw a complete picture of the relationships between these approximate, or sequential, optimality conditions, their respective minimal constraint qualifications, and other classical constraint qualifications that appear in the literature.

Roberto [email protected]

Jose Mario Martinez
University of Campinas
[email protected]

Alberto Ramos
Federal University of Parana
[email protected]

Paulo J. S. Silva
University of Campinas
[email protected]

MS90

Monotone Properties of the Barzilai-Borwein Method

The Barzilai-Borwein method is one of the simplest methods for unconstrained optimization; it uses the steepest descent direction with special stepsizes based on the curvature information in the direction of the previous step. It is well-known that the BB method is much more efficient than the steepest descent method, particularly for large-scale problems. However, normally the BB method is not a monotone method. In this talk, we explore certain monotone properties of the BB method when the objective function is a convex quadratic function.
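To make the stepsize rule concrete, here is a minimal sketch in Python (my own illustration, not code from the talk; the helper name bb_gradient and the random quadratic are assumptions) of BB1 steps on a convex quadratic, where the stepsize is computed from the curvature along the previous step:

import numpy as np

def bb_gradient(A, b, x0, iters=100):
    # minimize f(x) = 0.5 x^T A x - b^T x with Barzilai-Borwein stepsizes
    x = x0.copy()
    g = A @ x - b
    alpha = 1.0 / np.linalg.norm(A, 2)        # conservative first step
    for _ in range(iters):
        x_new = x - alpha * g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)             # BB1: curvature along the previous step
        x, g = x_new, g_new
    return x

rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 100))
A = Q.T @ Q + np.eye(100)                     # convex quadratic test problem
b = rng.normal(size=100)
print(np.linalg.norm(A @ bb_gradient(A, b, np.zeros(100)) - b))

Tracking the objective along the iterates of this sketch makes the nonmonotone behavior mentioned in the abstract easy to observe.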

Yaxiang Yuan
Laboratory of Scientific and Engineering Computing
Chinese Academy of Sciences
[email protected]

MS91

A Distributed Optimization Method for Large-Scale Sparse Support Vector Machines with an Application to Healthcare Problems

As we are moving into the era of big data, large-scale solutions that are computationally efficient are necessary. In this context, the distributed setting becomes very relevant. We are specifically motivated by healthcare problems, where (medical research) participants could use their mobile devices, distributing storage and, most importantly, computational load, to develop predictive models. We present an iterative splitting distributed algorithm to find an ℓ1-regularized Support Vector Machines (SVM) separating hyperplane with improved guarantees on the convergence rate. We compare our method to other approaches that solve the sparse SVM problem, show results on a synthetic dataset and discuss the pros and cons of each one. Lastly, we apply our framework to a real healthcare problem, where, based on the de-identified Electronic Health Records of patients, we aim to predict heart-related hospitalizations in a target year. The induced sparsity in the algorithm helps us identify the important medical factors that lead to hospitalization, accounting for model interpretability, which is crucial in the medical domain.
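The distributed splitting scheme itself is not spelled out in the abstract; as a hedged, centralized stand-in, the following sketch (all names and data are illustrative assumptions) minimizes the ℓ1-regularized hinge loss by a plain subgradient method, showing how the ℓ1 term induces feature sparsity:

import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 200, 50, 0.1
X = rng.normal(size=(m, n))
w_true = np.zeros(n); w_true[:5] = 1.0          # only 5 informative features
y = np.sign(X @ w_true + 0.1 * rng.normal(size=m))

w = np.zeros(n)
for k in range(2000):
    active = y * (X @ w) < 1                    # points violating the margin
    g = -(X[active].T @ y[active]) / m + lam * np.sign(w)   # hinge + l1 subgradient
    w -= g / np.sqrt(k + 1)                     # diminishing stepsize
print("nonzero weights:", int((np.abs(w) > 1e-3).sum()))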

Theodora Brisimi, Alex Olshevsky, Ioannis Paschalidis, Wei Shi
Boston University
[email protected], [email protected], [email protected], [email protected]

MS91

Decentralized Stochastic and Online Optimization

We present a decentralized primal-dual gradient method for optimizing a class of finite-sum convex optimization problems whose objective function is given by the summation of m local objective functions distributed over a network of m agents. In our method, each agent alternately updates its primal and dual estimates by computing the primal and dual proximal operators, and by communicating these estimates with other agents in the network. We show that this method converges to an ε-optimal solution with no more than 1/ε communication rounds when the problem is nonsmooth and stochastic.

Soomin Lee
Georgia Tech
[email protected]

MS91

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order

Asynchronous parallel optimization has received substantial success and extensive attention recently. One of the core theoretical questions is how much speedup (or benefit) asynchronous parallelization can bring us. This paper provides a comprehensive and generic analysis to study the speedup property for a broad range of asynchronous parallel stochastic algorithms, from zeroth-order to first-order methods. Our result recovers or improves existing analyses on special cases, provides more insight for understanding asynchronous parallel behaviors, and suggests a novel asynchronous parallel zeroth-order method for the first time. Our experiments provide novel applications of the proposed asynchronous parallel zeroth-order method on hyperparameter tuning and model blending problems.

Ji Liu
University of Rochester
[email protected]

MS91

Decentralized Consensus Optimization on Networks with Delayed and Stochastic Gradients

Decentralized consensus optimization has extensive applications in many emerging big data, machine learning, and sensor network problems. In decentralized computing, nodes in a network privately hold parts of the objective function and need to collaboratively solve for the consensual optimal solution of the total objective, while they can only communicate with their immediate neighbors during updates. In real-world networks, it is often difficult and sometimes impossible to synchronize these nodes, and as a result they have to use stale (and stochastic) gradient information which may steer their iterates away from the optimal solution. In this talk, we focus on a decentralized consensus algorithm by taking the delays of gradients into consideration. We show that, as long as the random delays are bounded in expectation and a proper diminishing step size policy is employed, the iterates generated by this algorithm still converge to a consensual optimal solution. Convergence rates of both objective and consensus are derived. Numerical results on some synthetic optimization problems and on real seismic tomography will also be presented.
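A minimal sketch of the setting (my own illustrative construction with scalar quadratic local objectives and a ring mixing matrix, not the authors' algorithm verbatim): each node mixes with its neighbors, then descends along a gradient evaluated at a randomly delayed copy of its own iterate, with a diminishing stepsize:

import numpy as np

rng = np.random.default_rng(0)
n = 5
b = rng.normal(size=n)                   # node i holds f_i(x) = 0.5 (x - b_i)^2
W = 0.5 * np.eye(n)                      # doubly stochastic ring mixing matrix
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros(n)
hist = [x.copy()]
for k in range(3000):
    delays = rng.integers(0, min(k, 5) + 1, size=n)   # bounded random delays
    stale = np.array([hist[k - delays[i]][i] for i in range(n)])
    grad = stale - b                                  # gradient at a stale iterate
    x = W @ x - grad / np.sqrt(k + 1)                 # mix, then delayed descent
    hist.append(x.copy())
print(x, "consensus target:", b.mean())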

Xiaojing Ye, Benjamin Sirb
Department of Mathematics & Statistics
Georgia State University
[email protected], [email protected]

MS92

Adaptive Convex Relaxations for Mixed-Integer Optimization of Gas Pipeline Dynamics

We present an adaptive partitioning method in combination with optimality-based bound tightening approaches to strengthen standard convex relaxations for dynamic pipeline flow optimization problems, which are mathematically formulated as nonconvex mixed-integer nonlinear programs (MINLP). Transient equations for pipeline flow are approximated on a coarse space-time grid and then outer-approximated with adaptively tightened convex domains. Computation time and solution accuracy for objective functions that maximize compressor efficiency or economic welfare, respectively, on meshed gas networks are evaluated and compared to outcomes obtained using primal-dual interior point methods. The approach is extended to include discrete variables that determine whether gas compressors are active in each optimization time period. Such mixed-integer problems are solved to near global-optimality using the proposed method. Efficiency and scalability of the technique make it promising for extension to the general class of large-scale problems that involve dynamic (differential equation) constraints as well as discrete decision variables.

Harsha Nagarajan, Anatoly Zlotnik
Los Alamos National Laboratory
[email protected], [email protected]

Fei Wu, Ramteen Sioshansi
Ohio State University
[email protected], [email protected]

MS92

Penalty Alternating Direction Methods for Mixed-Integer Nonlinear Optimization

We present a penalty algorithm that is based on an alternating direction method for the penalty subproblems. This work is motivated by a preceding paper ("Solving power-constrained gas transportation problems using an MIP-based alternating direction method", Comput. Chem. Eng. 2015 (82)), in which a variant of the method is used to solve large-scale non-convex mixed-integer nonlinear feasibility problems from steady-state gas transport. In this talk, the extensions of the method are discussed and we give a sketch of the convergence theory for the new algorithm. The practical strength of the proposed method is demonstrated by a computational study, in which we apply the method to large-scale real-world problems from steady-state gas transport including, e.g., pooling effects. Moreover, we discuss the capabilities of the method as a primal heuristic for general mixed-integer nonlinear optimization problems and show its strength on the MINLPLib2 test set.

Martin Schmidt
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

Bjoern Geissler, Antonio Morsi
Friedrich-Alexander-Universitat Erlangen-Nurnberg, Department Mathematik
[email protected], [email protected]

Lars Schewe
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

MS92

MIP-Based Instantaneous Control of Mixed-Integer PDE-Constrained Gas Transport Problems

We study the transient optimization of gas transport networks including both discrete controls due to switching of controllable elements and nonlinear fluid dynamics described by the system of partial differential Euler equations. This combination leads to mixed-integer optimization problems subject to nonlinear hyperbolic partial differential equations on a graph. We propose an instantaneous control approach in which suitable Euler discretizations yield systems of ordinary differential equations on a graph. We show that this networked system of ordinary differential equations is well-posed and that affine-linear solutions of these systems can be derived analytically. As a consequence, we obtain finite-dimensional mixed-integer linear optimization problems for every time step that can be solved to global optimality using general-purpose solvers. We illustrate our approach in practice by presenting numerical results of realistic gas transport networks.

Mathias Sirvent
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

Martin Gugat
Friedrich-Alexander-University Erlangen-Nuremberg (FAU)
[email protected]

Gunter Leugering
University Erlangen-Nuremberg
Institute of Applied Mathematics
[email protected]

Alexander Martin
University of Erlangen-Nuremberg
[email protected]

Martin Schmidt
Friedrich-Alexander-Universitat Erlangen-Nurnberg
Department Mathematik
[email protected]

David Wintergerst
Friedrich-Alexander-Universitat Erlangen-Nurnberg
[email protected]

MS92

Reformulating Discrete Aspects in Operative Planning of Gas and Water Networks

Operative planning in gas and water networks usually involves discrete decisions and other discrete model aspects with constraints that couple the resulting integer variables over time or space. While certain aspects are genuinely discrete, other aspects admit reformulations using continuous variables and constraints that can be helpful in reducing the combinatorial complexity of solving the overall optimization problem. We discuss several examples of such reformulations that have proved effective in applications.

Marc C. Steinbach
Leibniz Universitat Hannover
Institute for Applied Mathematics
[email protected]

MS93

Relationships Between Constrained and Unconstrained Multi-Objective Optimization and Application in Location Theory

In this talk we investigate relationships between constrained and unconstrained multi-objective optimization problems. We mainly focus on generalized convex multi-objective optimization problems, i.e., the objective function is a componentwise generalized convex (e.g., quasi-convex or semi-strictly quasi-convex) function and the feasible domain is a convex set. Beside the field of location theory, the assumptions of generalized convexity are found in several branches of economics. We derive a characterization of the set of efficient solutions of a constrained multi-objective optimization problem using characterizations of the sets of efficient solutions of unconstrained multi-objective optimization problems. We demonstrate the usefulness of the results by applying them to constrained multi-objective location problems. Using our new results we show that special classes of constrained multi-objective location problems (e.g., point-objective location problems, Weber location problems and center location problems) can be completely solved with the help of algorithms for the unconstrained case. At the end of the talk, we present some information about the current development of the MATLAB-based software tool "Facility Location Optimizer".

Christian Guenther
Martin-Luther-Universitat Halle-Wittenberg
Theodor-Lieser-Straße 5, 06120 Halle, Germany
[email protected]

Christiane Tammer
Faculty of Sciences III, Institute of Mathematics
University of Halle-Wittenberg
[email protected]

MS93

An Implicit Filtering Algorithm for Derivative-Free Multiobjective Optimization

We consider multiobjective optimization problems where the variables are subject to bound constraints. We assume that the objective functions are continuously differentiable but the derivatives are difficult to compute or not available. The method we present is a multiobjective variant of the implicit filtering algorithm, which can be viewed as a line search method based on a difference-approximated steepest descent direction, coupled with coordinate search. At each iteration, coordinate search with a fixed step size is performed as long as new nondominated points are encountered. Then, the computed objective function values are employed to approximate the gradients of the objective functions, a linear programming problem is solved to approximate the multiobjective descent direction, and a line search is performed along the computed direction. In case of failure of the line search, the step size is reduced and a new iteration is started. Global convergence to Pareto points is stated. Computational experiments are presented and discussed.

Guido Cocchi
Dip. Ingegneria dell'Informazione
Universita degli Studi di Firenze
[email protected]

Giampaolo Liuzzi
Istituto di Analisi dei Sistemi ed Informatica
CNR, Rome, Italy
[email protected]

Alessandra Papini
Dipartimento di Energetica 'S. Stecco'
University of Florence
[email protected]

Marco Sciandrone
Universita degli Studi di Firenze
[email protected]

MS93

An Iterative Approach for Multiobjective Optimization Problems with Heterogeneous Functions

In many applications of multiobjective optimization the objective functions are heterogeneous, for example regarding the evaluation time. But methods for computing efficient points of such problems do not distinguish between the objectives so far, but treat them all equally. In this presentation an idea to take into account the heterogeneity of the objective functions is introduced. For this purpose an iterative approach is considered, which is based on the trust region concept for scalar optimization problems.

Jana Thomann
TU Ilmenau
[email protected]

MS94

New Analytical and Stochastic Results on Understanding Deep Convolutional Neural Network

Abstract not available at time of publication.

Fang Han
University of Washington
[email protected]

MS94

Probabilistic Image Models and Extensions of the Perona-Malik Filter

The Perona-Malik model has been very successful at restoring images from noisy input. In this talk, we show how the Perona-Malik model can be reinterpreted and extended using the language of Gaussian scale mixtures. Specifically, we show how the expectation-maximization (EM) algorithm applied to Gaussian scale mixtures leads to the lagged-diffusivity algorithm for computing stationary points of the Perona-Malik diffusion equations. Moreover, we show how mean field approximations to these Gaussian scale mixtures lead to a modification of the lagged-diffusivity algorithm that better captures the uncertainties in the restoration. Since this modification can be hard to compute in practice, we propose relaxations of the mean field objective to make the algorithm computationally feasible. Our numerical experiments show that this modified lagged-diffusivity algorithm often performs better at restoring textured areas and fuzzy edges than the unmodified algorithm. As a second application of the Gaussian scale mixture framework, we show how an efficient sampling procedure can be obtained for the probabilistic model, making the computation of the conditional mean and other expectations algorithmically feasible. Again, the resulting algorithm has a strong resemblance to the lagged-diffusivity algorithm.
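For orientation, here is a minimal 1-D sketch of the classical lagged-diffusivity fixed point referred to above (my own illustration; the diffusivity g and the parameter values are standard Perona-Malik choices, not taken from the talk): each sweep freezes the edge-stopping weights at the previous iterate and solves a linear system:

import numpy as np

def lagged_diffusivity(f, tau=10.0, lam=0.1, sweeps=20):
    n = len(f)
    u = f.copy()
    for _ in range(sweeps):
        g = 1.0 / (1.0 + (np.diff(u) / lam) ** 2)    # Perona-Malik weights, lagged
        A = np.zeros((n, n))                         # weighted 1-D Laplacian (Neumann)
        for i in range(n - 1):
            A[i, i] += g[i]; A[i + 1, i + 1] += g[i]
            A[i, i + 1] -= g[i]; A[i + 1, i] -= g[i]
        u = np.linalg.solve(np.eye(n) + tau * A, f)  # one linear solve per sweep
    return u

rng = np.random.default_rng(0)
step = np.where(np.arange(200) < 100, 0.0, 1.0)      # piecewise-constant signal
u = lagged_diffusivity(step + 0.05 * rng.normal(size=200))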

Dirk Lorenz
TU Braunschweig
[email protected]

Lars Mescheder
MPI Tubingen
[email protected]

MS94

A Family of Stochastic Surrogate Optimization Algorithms

We present a family of optimization techniques for solving large-scale structured optimization problems that typically occur in machine learning, where the objective function is either a large finite sum or an expectation. Our approaches are based on incrementally updating a model (surrogate) of the objective. When using upper bounds, we obtain incremental and stochastic majorization-minimization algorithms, which are shown to asymptotically provide stationary points for non-convex problems, and we recover classical convergence rates of first-order methods in the convex case. When the objective is strongly convex, we also show that using upper bounds is not necessarily appropriate. By using instead strong convexity lower bounds, we obtain much faster convergence, both in theory and in practice. Besides, our algorithms for finite sums can be accelerated in the sense of Nesterov, providing significant improvements when the problem is ill-conditioned.

Julien Mairal
Inria Grenoble
[email protected]

MS94

Expectation-Maximization Algorithms for Partially Observed Continuous-Time Markov Processes

Continuous-time Markov processes enjoy numerous applications in sciences and engineering. In many settings, multiple independent realizations of these processes are observed at unevenly spaced time points. In such scenarios, estimation of state transition rates is a nontrivial optimization problem, because these rates can be functions of many parameters related to the covariates attached to each realization of the process (e.g., each realization may correspond to a patient disease trajectory with patient-specific covariates). I will discuss expectation-maximization (EM) algorithms to solve such optimization problems in the context of finite state space models, birth-death processes, and multitype branching processes, with applications to analyses of electronic medical records and intrahost evolution of bacteria.

Vladimir Minin
University of Washington
[email protected]

MS95

Global Optimization with Orthogonality Constraints via Stochastic Diffusion on Manifolds

Orthogonality constrained problems have wide applications in many problems including p-harmonic flow, 1-bit compressive sensing, eigenvalue problems in electronic structure calculation and many others. One of the main challenges of orthogonality constraints is the non-convexity, which makes it hard to reach the global optimizers of these problems. In this talk, I will discuss our recent work on a global optimization method for orthogonality constrained problems. Our method is based on an idea of iteratively perturbing local minimizers via a stochastic differential equation on the Stiefel manifold. We theoretically analyze the global convergence guarantee of the proposed method, and our numerical tests also validate its effectiveness.

Rongjie Lai
Rensselaer Polytechnic Institute
[email protected]

Zaiwen Wen, Honglin Yuan, Xiaoyi Gu
Peking University
wenzw@pku.edu.cn, [email protected], [email protected]

MS95

Reducing the Cost of the Fock Exchange Operator

The Fock exchange operator plays a central role in modern quantum chemistry and materials science. I will describe the recently developed adaptive compression strategy for reducing the cost of the Fock exchange operator from an optimization perspective, and discuss the progress for real materials calculations using planewaves as well as quantum chemistry basis sets.

Lin Lin
University of California, Berkeley
Lawrence Berkeley National Laboratory
[email protected]

MS95

Electronic Structure Calculation using Semidefinite Programs

Electronic structure calculation can be modelled by a variational approach where the two-body reduced density matrix (RDM) is the unknown variable. It can lead to an extremely large-scale semidefinite programming (SDP) problem. This talk will present a practically efficient second-order type method.

Zaiwen Wen
Peking University
Beijing International Center for Mathematical Research
[email protected]

MS95

A Newton-Krylov Method for Solving Coupled Cluster Equations

The coupled cluster approach to solving the quantum many-body problem is considered one of the most accurate approaches in quantum chemistry. However, the coupled cluster equations are a system of nonlinear equations with many variables (tens of thousands to millions). These equations are currently solved by either an inexact Newton method or a quasi-Newton method called Direct Inversion in the Iterative Subspace (DIIS). These methods experience difficulty in convergence in some cases. In this talk, we will describe using a Newton-Krylov method to overcome the convergence difficulty. The effectiveness of the Newton-Krylov method depends largely on the construction of an effective preconditioner for solving the Newton correction equation iteratively.

Chao Yang
Lawrence Berkeley National Laboratory
[email protected]

Jiri Brabec
Czech Academy of Sciences
[email protected]

Karol Kowalski
Pacific Northwest National Laboratory
[email protected]

MS96

New Results for Sparse Methods for Logistic Regression and Related Classification Problems

We propose and analyze methods in logistic regression that induce sparse solutions. We introduce a novel condition number associated with the data for logistic regression that measures the degree of non-separability of the data (how close the data is to being separable), and that naturally enters the analysis and the computational properties of methods. We take advantage of very recent new results in first-order methods for convex optimization due to Bach and others to present new computational guarantees for the performance of methods for logistic regression. In the high-dimensional regime in particular, these guarantees precisely characterize how the methods impart both data-fidelity and implicit regularization, for any dataset.

Robert M. Freund
MIT
Sloan School of Management
[email protected]

Paul Grigas
UC Berkeley
[email protected]

Rahul [email protected]

MS96

Fast Algorithms for Large Scale Generalized Distance Weighted Discrimination

High dimension low sample size statistical analysis is important in a wide range of applications. In such situations, the highly appealing discrimination method, support vector machine, can be improved to alleviate data piling at the margin. This leads naturally to the development of distance weighted discrimination (DWD), which can be modeled as a second-order cone programming problem and solved by interior-point methods when the scale (in sample size and feature dimension) of the data is moderate. Here, we design a scalable and robust algorithm for solving large scale generalized DWD problems. Numerical experiments on real data sets from the UCI repository demonstrate that our algorithm is highly efficient in solving large scale problems, and sometimes can even be more efficient than the popular LIBSVM for solving the corresponding SVM problems.

Kim-Chuan Toh
National University of Singapore
Department of Mathematics
[email protected]

Defeng Sun
Dept of Mathematics
National University of Singapore
[email protected]

Xin-Yee Lam
National University of Singapore
[email protected]

J.S. Marron
University of North Carolina
[email protected]

MS96

Stochastic First-Order Methods in Data Analysis and Reinforcement Learning

Stochastic first-order methods provide a basic algorithmic tool for online learning and data analysis. In this talk, we survey several innovative applications including risk-averse optimization, online principal component analysis, and reinforcement learning. We will show that the rate-of-convergence analysis of the stochastic optimization algorithms provides sample complexity analysis for these online learning applications.

Mengdi Wang
Princeton University
[email protected]

MS96

Random Permutations Fix a Worst Case for Cyclic Coordinate Descent

Variants of the coordinate descent approach for minimizing a nonlinear function are distinguished in part by the order in which coordinates are considered for relaxation. Three common orderings are cyclic (CCD), in which we cycle through the components of x in order; randomized (RCD), in which the component to update is selected randomly and independently at each iteration; and random-permutations cyclic (RPCD), which differs from CCD only in that a random permutation is applied to the variables at the start of each cycle. Known convergence guarantees are weaker for CCD and RPCD than for RCD, though in most practical cases, computational performance is similar among all these variants. There is a certain family of quadratic functions for which CCD is significantly slower than RCD; a recent paper of Sun and Ye has explored the poor behavior of CCD on this family. The RPCD approach performs well on this family, and this paper explains this good behavior with a tight analysis. We discuss generalizations of the analysis to quadratics in which the Hessian has a single dominant eigenvalue whose corresponding eigenvector has components that are not too diverse.
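A small sketch of the comparison (the rank-one quadratic below is in the spirit of the worst-case family discussed, but the exact instance and the function names are my own illustrative choices): exact coordinate minimization on a quadratic, contrasting CCD's fixed order with a fresh permutation per epoch:

import numpy as np

def coord_descent(A, b, order_fn, epochs=50):
    n = len(b); x = np.zeros(n)
    for _ in range(epochs):
        for i in order_fn(n):
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]  # exact 1-D minimization
    return 0.5 * x @ A @ x - b @ x       # final objective value

rng = np.random.default_rng(0)
n = 50
u = np.ones((n, 1)) / np.sqrt(n)
A = np.eye(n) + 0.99 * n * (u @ u.T)     # identity plus a dominant rank-one term
b = rng.normal(size=n)
print("CCD :", coord_descent(A, b, lambda n: range(n)))
print("RPCD:", coord_descent(A, b, lambda n: rng.permutation(n)))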

Stephen Wright
University of Wisconsin
Dept. of Computer Sciences
[email protected]

Ching-Pei Lee
University of Wisconsin-Madison
[email protected]

MS97

Communication Avoiding Primal and Dual Block Coordinate Descent Methods

Primal and dual block coordinate descent are iterative methods for solving regularized and unregularized optimization problems. Distributed-memory parallel implementations of these methods have become popular in analyzing large machine learning datasets. However, existing implementations communicate at every iteration, which, on modern data center and supercomputing architectures, often dominates the cost of floating-point computation. We evaluate the performance of new communication-avoiding variants of primal and dual block coordinate descent for ridge-regularized least squares. We test our communication-avoiding algorithms on dense datasets (both synthetic and real) on a Cray XC30 supercomputer, evaluating strong scaling performance on up to 12288 cores to illustrate the attainable speedups. Our results show that the communication-avoiding algorithms achieve speedups of 1.2x – 4.8x on real problems obtained from LIBSVM.

Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael Mahoney
UC Berkeley
[email protected], [email protected], [email protected], [email protected]

MS97

AIDE: Fast and Communication Efficient Distributed Optimization

In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy, since the method is substantially better behaved than DANE on data partitions arising in practice. It is well known that the DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely first-order oracle. Our empirical results show that AIDE is superior to other communication-efficient algorithms in settings that naturally arise in machine learning applications.

Sashank Reddi
Carnegie Mellon University
[email protected]

Jakub Konecny
The University of Edinburgh
[email protected]

Peter Richtarik
University of Edinburgh
[email protected]

Barnabas Poczos
Carnegie Mellon University
[email protected]

Alex Smola
Yahoo! Research, Santa Clara, CA
[email protected]

MS97

Speeding-Up Large-Scale Machine Learning Using Graphs and Codes

In this talk I will present three simple combinatorial ideas to speed up parallel and distributed learning algorithms. We will start off with serial equivalence in asynchronous parallel optimization, its significance, and how we can guarantee it using a recent phase transition result on graphs. We continue on the issue of stragglers (i.e., slow nodes) in distributed systems, where we will use erasure codes to robustify gradient based algorithms against delays. In our third example, we will reduce the high communication cost of data-shuffling in distributed learning, using the seemingly unrelated notion of coded caching. For all the above, we will see real world experiments that indicate how these simple ideas can significantly improve the speed and accuracy of large-scale learning.

Dimitris Papailiopoulos
UC Berkeley
[email protected]

MS97

A General Framework for Communication-Efficient Distributed Optimization

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for the distributed environment, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.

Virginia Smith
UC Berkeley
[email protected]

MS98

Perturbed Iterates Analysis for Stochastic Optimization

We introduce and analyze stochastic optimization methods where the input to each update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyzing asynchronous implementations of stochastic optimization algorithms. Using our perturbed iterate framework, we provide new analyses of the HogWild algorithm and asynchronous stochastic coordinate descent that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates. We proceed to apply our framework to develop and analyze KroMagnon: a novel, parallel, sparse stochastic variance-reduced gradient (SVRG) algorithm. We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.

Horia Mania, Xinghao Pan
University of California, Berkeley
[email protected], [email protected]

Dimitris Papailiopoulos
University of California, Berkeley
[email protected]

Benjamin Recht, Kannan Ramchandran
University of California, Berkeley
[email protected], [email protected]

Michael Jordan
University of California at Berkeley
[email protected]

MS98

Robustness Mechanisms for Statistical Generalization in Large-Capacity Models

Recent successes in neural networks have demonstrated that models with an excessive number of parameters can achieve tremendous improvements in pattern recognition. Moreover, empirical evidence demonstrates that such performance is achievable without any obvious forms of regularization or capacity control. These findings reveal that traditional learning theory fails to explain why large neural networks generalize. In this talk, I will discuss possible mechanisms to explain generalization in such large models, appealing to insights from linear predictors. I will close by proposing some possible paths towards a framework of generalization that explains contemporary experimental findings in large-scale machine learning.

Benjamin Recht
University of California, Berkeley
[email protected]

MS98

Iterative Optimization Methods: From Robustness to Regularization

Abstract not available at time of publication.

Lorenzo Rosasco
Istituto Italiano di Tecnologia and
Massachusetts Institute of Technology
[email protected]

MS98

Implicit Regularization Through Optimization

Abstract not available at time of publication.

Nati Srebro
Toyota Technological Institute at Chicago
[email protected]

MS99

Acceleration of Randomized Block Coordinate Descent Methods via Identification Function

We present a block coordinate descent method, based on the paper [Peter Richtarik, Martin Takac, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Math. Program. (2014)], applied to the minimization problem

min_x f(x) + ψ(x)   s.t. l ≤ x ≤ u   (1)

where f is a smooth function and ψ is a convex, possibly nonsmooth, function with a block coordinate separable structure. Using the concept of identification function presented in the paper [Francisco Facchinei, Andreas Fischer, Christian Kanzow, On the accurate identification of active constraints, SIAM Journal on Optimization (1998)], we propose an identification function for (1), capable of finding the active constraints of the problem. In particular, for ψ(x) = λ‖x‖_1, λ > 0, we additionally obtain information about the null coordinates of a stationary point of (1). We construct a nonuniform randomized method that exploits the information of the identification function, increasing the probability of selecting the block coordinates that are active according to this strategy. Our numerical experimentation produced promising results, including tests of a parallel implementation of the algorithm.
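As a hedged illustration of the sampling idea (the identification test below is a simple proximal-gradient-based heuristic stand-in, not the identification function of the talk, and the problem data are illustrative), consider randomized proximal coordinate descent for ℓ1-regularized least squares, where coordinates flagged as likely active are sampled more often:

import numpy as np

soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
m, n, lam = 100, 20, 0.5
A = rng.normal(size=(m, n)); b = rng.normal(size=m)
L = (A ** 2).sum(axis=0)                   # coordinate-wise Lipschitz constants
x = np.zeros(n)
p = np.full(n, 1.0 / n)                    # start with uniform sampling
for k in range(20000):
    i = rng.choice(n, p=p)
    g = A[:, i] @ (A @ x - b)
    x[i] = soft(x[i] - g / L[i], lam / L[i])           # proximal coordinate step
    if k % 1000 == 0:                                  # periodically re-identify
        trial = soft(x - A.T @ (A @ x - b) / L, lam / L)
        p = np.where(trial != 0, 1.0, 0.1)             # favor likely-active coordinates
        p /= p.sum()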

Ronaldo Lopes, Paulo J. S. Silva, Sandra A. Santos
University of Campinas
[email protected], [email protected], [email protected]

MS99

A First-Order Primal-Dual Algorithm with Bregman Distance Functions

In this talk we analyze the ergodic convergence rate of a first-order primal-dual algorithm for non-smooth convex-concave saddle-point problems of the form

min_x max_y ⟨Kx, y⟩ + f(x) + g(x) − h*(y),

where K is a linear operator, f and h* are convex functions with simple-to-compute proximal mappings, and g is a convex function with Lipschitz continuous gradient. The presented algorithm alternates proximal gradient steps in the primal and dual variables and hence is well suited for large-scale problems as they arise in numerous imaging and machine learning applications. We present the algorithm in its most general form using proximal-like Bregman distance functions, which can have advantages, for example, in the case of simplex constraints. It turns out that the algorithm has an ergodic O(1/k) convergence rate in the most general case, but it can also be accelerated (via dynamic choices of the step sizes) to achieve the optimal O(1/k²) convergence rate for problems strongly convex in the primal or dual variable. Furthermore, if the problem is strongly convex in both the primal and dual variables, it also yields an optimal linear convergence rate. We show applications of the algorithm to important problems in imaging.

Thomas Pock
Institute for Computer Graphics and Vision
Graz University of Technology
[email protected]

Antonin Chambolle
Centre de Mathematiques Appliquees
Ecole Polytechnique, France
[email protected]

MS99

A Two-Phase Gradient Method for Singly Linearly Constrained Quadratic Programming Problems with Lower and Upper Bounds

We propose an algorithmic framework for the solution of Quadratic Programming (QP) problems with a single linear equality constraint and bounds on the variables. Inspired by the GPCG algorithm [J.J. More and G. Toraldo, SIAM J. Optim., 1, 1991] for bound-constrained convex QP problems, our algorithm alternates between two phases until convergence: an identification phase, which performs spectral Gradient Projection steps until either a candidate active set is identified or no reasonable progress is made, and a minimization phase, which reduces the objective function in the working set defined by the identification phase by applying either the Conjugate Gradient method or spectral Gradient Projection methods recently proposed for unconstrained QP (see, e.g., [R. De Asmundis, D. di Serafino, W.W. Hager, G. Toraldo and H. Zhang, Comput. Optim. Appl. 59, 2014]). Convergence results as well as numerical experiments are presented, showing the effectiveness of the proposed approach.

Daniela di Serafino
Second University of Naples
Department of Mathematics and Physics
[email protected]

Gerardo Toraldo
University of Naples "Federico II"
Department of Agricultural Sciences
[email protected]

Marco Viola
Sapienza Universita di Roma
Dip. di Ingegneria Informatica, Automatica e Gestionale
[email protected]

MS100

Pareto Fronts with Gaussian Process Conditional Simulations

In the case of expensive or time consuming black-box simulators, efficient surrogate-based methods to perform multi-objective optimization have been proposed. Many of them rely on Gaussian Process (GP) models, which are exploited to drive the optimization to the set of optimal compromise solutions in the design and objective spaces, i.e., the Pareto set and Pareto front, respectively. Yet, when it comes to predicting the actual location of these sets, say to select promising designs interactively, the common practice of taking the predictive mean of the models may prove to be insufficient, as it does not take into account their uncertainties. To this end, we propose the use of GP conditional simulations to get realizations of Pareto sets and Pareto fronts. From the latter, based on concepts from random closed set theory, i.e., the Vorobev expectation and deviation, we obtain a continuous approximation of the Pareto front and a measure of the remaining variability of the set of non-dominated points. Then, to benefit from the structure of the Pareto set, generally a low-dimensional manifold in the design space, we train a Gaussian Process Latent Variable Model (GP-LVM) from the conditional Pareto set simulations. The combined GP-LVM and GP models enable selecting points on the estimated Pareto front interactively, and more accurately. Finally, we show how this methodology naturally adapts to accommodate noise in observations.

Mickael Binois
University of Chicago
[email protected]

Victor Picheny
MIAT, Universite de Toulouse and INRA
[email protected]

MS100

Derivative-Free Methods for Hyperparameter Optimization of Neural Networks

We discuss the application of derivative-free optimization methodologies in the context of machine learning. More specifically, our main application will be determining values of the hyperparameters of deep neural networks. We will discuss different formulations for the problem, including different ways to handle categorical variables. We will present a computational evaluation to determine which formulations work better for the problem at hand, showing along the way that derivative-free methodologies can perform better than the popular random search approach, and are competitive with state-of-the-art hyperparameter optimization algorithms.

Giacomo Nannicini
Carnegie Mellon University
Tepper School of Business
[email protected]

MS100

A Hybrid Framework for Constrained Derivative-Free Optimization Combining Accelerated Random Search with a Trust Region Approach

The Accelerated Random Search (ARS) algorithm by Appel, Labarre, and Radulovic (2004) has been proven to converge to the global minimum faster than the classic Pure Random Search algorithm. This talk presents a hybrid method that uses a derivative-free trust-region algorithm in conjunction with an extension of the Accelerated Random Search (ARS) algorithm to constrained optimization. Subsets of sample points generated by ARS are passed to the trust-region algorithm, which is then used to search for local minima. In the numerical experiments, Constrained ARS is combined with CONORBIT (Regis and Wild 2016), which is a trust-region method that uses interpolating Radial Basis Function (RBF) models, and the resulting hybrid algorithm is compared with alternative methods on a series of test problems for constrained optimization.
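For reference, the core of unconstrained ARS is only a few lines; the following hedged sketch (parameter values are illustrative, and the clip to the box is a crude stand-in for the constraint handling developed in the talk) shrinks the search radius after failures and resets it after improvements or when it collapses:

import numpy as np

def ars(f, lo, hi, iters=5000, c=0.5, rmin=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi); fx = f(x); r = 1.0
    for _ in range(iters):
        z = np.clip(x + r * (hi - lo) * rng.uniform(-1, 1, len(lo)), lo, hi)
        fz = f(z)
        if fz < fx:
            x, fx, r = z, fz, 1.0      # success: recenter and reset the radius
        else:
            r *= c                     # failure: shrink the search radius
            if r < rmin: r = 1.0       # restart once the radius collapses
    return x, fx

sphere = lambda x: float((x ** 2).sum())
print(ars(sphere, np.full(5, -2.0), np.full(5, 2.0)))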

Rommel G. Regis
Saint Joseph's University
[email protected]

MS100

Bayesian Optimization under Mixed Constraints with a Slack-Variable Augmented Lagrangian

Bayesian optimization methods sequentially update a posterior distribution over the objective function. To address blackbox-constrained optimization problems, we employ augmented Lagrangian-based merit functions in order to define a sampling criterion. We show how the form of the merit function is critical for sampling efficiency. Numerical results show that the resulting method may be especially powerful when there are several disjoint local minimizers.
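One common augmented-Lagrangian-style merit for inequality constraints c(x) ≤ 0 looks as follows; the sketch below is a heavily simplified stand-in (random candidates replace the GP posterior and acquisition, and the exact slack-variable form of the talk is not reproduced):

import numpy as np

def merit(x, f, c, lam, rho):
    v = np.maximum(c(x), 0.0)                    # constraint violations
    return f(x) + lam @ v + 0.5 * rho * v @ v    # AL-type penalized objective

rng = np.random.default_rng(0)
f = lambda x: float((x ** 2).sum())
c = lambda x: np.array([1.0 - x.sum()])          # feasible iff x1 + x2 >= 1
lam, rho = np.zeros(1), 1.0
for _ in range(30):
    cand = rng.uniform(-2, 2, (200, 2))          # stand-in for GP-based sampling
    x = min(cand, key=lambda z: merit(z, f, c, lam, rho))
    lam = np.maximum(lam + rho * c(x), 0.0)      # multiplier update
    rho *= 2.0                                   # tighten the penalty
print(x, c(x))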

Stefan Wild
Argonne National Laboratory
[email protected]

MS101

Scaling Up Gauss-Newton Methods for Expensive Least-Squares Problems

Nonlinear least squares problems are used for a range of important scientific applications such as data fitting, data assimilation for weather forecasting and climate modelling. A typical feature of such problems is that they involve a huge amount of data, making them very high dimensional and extremely expensive to solve. Similar big data problems also arise in machine learning, where the optimization is often performed using randomized block-coordinate gradient descent methods. Inspired by such approaches, we present a randomized block-coordinate adaptation for nonlinear least squares problems. We provide global convergence guarantees for our approach and encouraging numerical results.

Jaroslav Fowkes
Emirates Data Science Laboratory
University of Oxford
[email protected]

Coralia Cartis
University of Oxford
Mathematical Institute
[email protected]

MS101

New Approaches for Global Optimization Methods

We present some dimensionality reduction techniques for global optimization algorithms, in order to increase their scalability. Inspired by ideas in machine learning, and extending the approach of random projections in Zhang et al (2016), we present some new algorithmic approaches for global optimisation with theoretical guarantees of good behaviour and encouraging numerical results.

Adilet Otemissov
The Alan Turing Institute for Data Science
London, UK
[email protected]

Coralia Cartis
University of Oxford
Mathematical Institute
[email protected]

MS101

Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization

The problem of phase synchronization consists of estimating a collection of phases based on noisy measurements of relative phases, and has numerous applications. Motivated by its order-optimal estimation error in the sense of the Cramer-Rao bound, two approaches have been recently developed to compute the maximum likelihood estimator (MLE). One is the semidefinite relaxation (SDR) and the other is to directly solve the non-convex quadratic optimization formulation of the MLE by the generalized power method (GPM). In this paper, we focus on the latter approach. Our contribution is twofold. First, we bound the rate at which the estimation error decreases and use this bound to show that all iterates, not just the MLE, achieve an estimation error of the same order as the Cramer-Rao bound. This implies that one can terminate the GPM at any iteration and still obtain an estimator with theoretically guaranteed estimation error. Second, we show that under the same assumption on the noise power as that for the SDR method, the GPM will converge to the MLE at a linear rate with high probability. This answers a question raised by Boumal and shows that the GPM is competitive in terms of both theoretical guarantees and numerical efficiency with the SDR method. As a by-product, we give an alternative proof of a result of Boumal, which asserts that every second-order critical point of the aforementioned non-convex quadratic optimization problem is globally optimal in a certain noise regime.
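The GPM iteration itself is short; a minimal sketch under an additive-Hermitian-noise model (the instance, noise level, and random initialization below are my own illustrative choices; a spectral initialization is common in practice):

import numpy as np

rng = np.random.default_rng(0)
n = 50
z_true = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
N = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
C = np.outer(z_true, z_true.conj()) + 0.3 * (N + N.conj().T) / 2   # noisy relative phases

z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))     # random start
for _ in range(100):
    w = C @ z
    z = w / np.abs(w)                             # project entries onto the unit circle

phase = z.conj() @ z_true; phase /= abs(phase)    # align up to a global phase
print(np.linalg.norm(z * phase - z_true))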

Man Chung Yue, Huikang Liu
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
[email protected], [email protected]

Anthony Man-Cho So
The Chinese University of Hong Kong, Hong Kong
[email protected]

MS102

Convergence of Asynchronous Algorithms with Unbounded Delay

Asynchronous algorithms have the potential to vastly speed up parallel computations by eliminating the idle time in which computing nodes wait for the updates of other nodes. Their advantages may be even greater at scale, where synchronization may be the dominant computational cost. However, proving convergence of asynchronous algorithms can be complicated. In this talk we prove the convergence of a very general asynchronous-parallel algorithm under unbounded delays, which can be either stochastic or deterministic. This is done using novel Lyapunov function techniques, which can be generalized to other convergence analyses.

Robert R. Hannah
University of California, Los Angeles
[email protected]

Wotao Yin
University of California, Los Angeles
Department of Mathematics
[email protected]

MS102

Accelerating Optimization under Uncertainty via Online Convex Optimization

We consider two paradigms that account for uncertainty in optimization models: robust optimization (RO) and joint estimation-optimization (JEO). We examine recent developments on efficient and scalable iterative first-order methods for these problems, and show that these iterative methods can be viewed through the lens of online convex optimization (OCO). Our applications of interest present further flexibility in OCO via three simple modifications to standard OCO assumptions: we introduce two new concepts of weighted regret and online saddle point problems and study the possibility of making lookahead (anticipatory) decisions. We demonstrate how these new flexibilities are instrumental in exploiting structural properties of functions, and provide improved convergence rates for iterative RO and JEO methods.

Nam Ho-Nguyen
Carnegie Mellon University
[email protected]

Fatma Kilinc-Karzan
Tepper School of Business
Carnegie Mellon University
[email protected]

MS102

Faster Convergence Rates for Subgradient Methods under an Error Bound Condition

The purpose of this work is to provide new analyses of several subgradient methods for convex functions satisfying an error bound condition. Under mild assumptions this error bound condition is equivalent to the well-known Kurdyka-Łojasiewicz (KL) inequality and is weaker than uniform convexity. To this end, there are two main contributions. First, new nonergodic, nonasymptotic convergence rates are provided for the classical subgradient method under the common scenario of decaying but nonsummable stepsizes. Second, for a constant but sufficiently small stepsize, it is shown that the subgradient method achieves linear convergence up to a certain tolerance level that depends on the stepsize and other problem parameters. While this was previously known for smooth convex functions satisfying the Polyak-Łojasiewicz inequality, we extend it to the more general KL inequality and do not require smoothness. When the KL exponent is less than or equal to 1/2, we show that for a particular choice of decaying stepsize the subgradient method achieves the same convergence rate as a recently proposed restarted method. We also derive new convergence rates for stochastic problems.
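The two stepsize regimes are easy to contrast on a toy sharp function such as f(x) = ‖x‖₁ (my own illustration; this function satisfies an error bound, though the talk's assumptions are more general):

import numpy as np

def subgradient(x0, step, iters=300):
    x = x0.copy()
    for k in range(iters):
        x = x - step(k) * np.sign(x)      # sign(x) is a subgradient of ||x||_1
    return np.abs(x).sum()                # objective value at the last iterate

rng = np.random.default_rng(0)
x0 = rng.normal(size=20)
print("decaying:", subgradient(x0, lambda k: 0.5 / np.sqrt(k + 1)))
print("constant:", subgradient(x0, lambda k: 0.01))   # converges up to a tolerance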

Patrick Johnstone, Pierre Moulin
University of Illinois at Urbana-Champaign
[email protected], [email protected]

MS102

QuickeNing: A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization

We propose an approach to accelerate gradient-based optimization algorithms by giving them the ability to exploit curvature information using quasi-Newton update rules. The proposed scheme, called QuickeNing, is generic and can be applied to a large class of first-order methods such as incremental and block-coordinate algorithms; it is also compatible with composite objectives, meaning that it has the ability to provide exactly sparse solutions when the objective involves a sparsity-inducing regularization. QuickeNing relies on limited-memory BFGS rules, making it appropriate for solving high-dimensional optimization problems; with no line search, it is also simple to use and to implement. Besides, it enjoys a worst-case linear convergence rate for strongly convex problems. We present experimental results where QuickeNing gives significant improvements over competing methods for solving large-scale high-dimensional machine learning problems.

Hongzhou Lin
Inria
[email protected]

Julien Mairal
Inria Grenoble
[email protected]

Zaid Harchaoui
University of Washington
[email protected]

MS103

Necessary Optimality Conditions for Some Nonconvex Facility Location Problems

The problem of locating a new facility with simultaneous consideration of existing attraction and repulsion points is a problem with great practical relevance, e.g., in the fields of economy, city planning or industrial design. Unfortunately, the consideration of negative weights makes this problem in general a nonconvex one, so none of the established algorithms for location problems are able to solve it. We will therefore present a new approach to derive necessary optimality conditions for such problems using the nonconvex subdifferentials by Ioffe and Kruger/Mordukhovich. While there are many strong theoretical results on these subdifferentials, it is rarely possible to explicitly calculate them or use them for applications. After giving a brief review of the definition, properties and calculus of the mentioned subdifferentials, we will precisely calculate the corresponding subdifferentials for certain distance functions. By taking advantage of the special structure of the problems we can then derive necessary optimality conditions for some instances of scalar semi-obnoxious facility location problems. Furthermore, we will present some new scalarization results and use them to establish necessary optimality conditions for multicriteria semi-obnoxious facility location problems. At the end of the talk, we will give an outlook on open questions and possible future developments.

Marcus Hillmann
Martin-Luther-Universitat Halle-Wittenberg
Theodor-Lieser-Straße 5, 06120 Halle, Germany
[email protected]

MS103

Nonsmooth Optimization Framework for the Elastography Inverse Problem of Tumor Identification

This talk will focus on a nonsmooth formulation of the inverse problem of identifying a variable parameter in mixed variational problems. We will talk about various optimization formulations for this inverse problem, including a recently introduced convex modified output-least-squares (MOLS) and a nonconvex output-least-squares (OLS), among others. We will present some new stability estimates for the inverse problem under data perturbation using the OLS and the MOLS formulations. Using techniques from multi-objective optimization, we will shed some light on a proper selection of the optimal regularization parameter. We will present applications of our theoretical results to the elastography inverse problem of tumor identification. Numerical examples will be given.

Baasansuren Jadamba
Rochester Institute of Technology
[email protected]

MS103

Stable Identification in Ill-Posed Variational Prob-lems

In this talk, we will discuss a non-regular PDE-constrained optimization problem with scalar as well as vector objectives. By employing a constructible representation of the constraint cone, we will devise a new family of dilating cones which is closely related to the Henig dilating cone, and use it to introduce a family of conically regularized problems. We will give new stability estimates for the regularized problems in terms of the regularization parameter. We will apply our results to L1-constrained least-squares problems. This is joint work with Prof. Miguel Sama.

Akhtar Khan
Rochester Institute of Technology
[email protected]

MS103

Pontryagin Maximum Principle for Optimal Control on Banach Manifolds

We prove the Pontryagin Maximum Principle for optimal control problems in which the state evolves on a Banach manifold, introducing and illustrating the method of Lagrangian charts for the analysis of geometric control problems. In addition we demonstrate connections between a type of controllability, exact penalization of constraints, and normality of extremals.

Robert Kipka
Kent University, North Canton, OH 44720, USA
[email protected]

Yuri Ledyaev
Western Michigan University
[email protected]

MS104

Numerical Experiments on Dynamically Choosing Cutting Models and Scalings in Bundle Methods

Bundle methods are frequently employed for minimizing the sum of convex functions. Large scale problems of this type arise, e.g., in Lagrangian relaxation of resource constraints that couple a multitude of objects involved in some scheduling task, or in scenario based stochastic programming applications. If a few objects or scenarios turn out to be critical in this process, it may be worth providing more detailed cutting models for those while the others are subsumed in a common model. Which ones are critical is typically not known in advance and may well change throughout the optimization process. We report on experiments within the callable library ConicBundle for dynamically identifying critical parts and adapting the bundle choices and scaling information accordingly.

Christoph Helmberg
TU Chemnitz
[email protected]

MS104

On Solving the Quadratic Shortest Path Problem

The goal of the quadratic shortest path problem (QSPP) is to find a path in a digraph such that the sum of weights of arcs plus the sum of interaction costs over all pairs of arcs on the path is minimized. An instance of the QSPP is said to be linearizable if there exists an instance of the shortest path problem (SPP) on the same graph such that the cost of each s-t path in both problems is the same. Although the linearization problem for the quadratic assignment problem is well studied, checking the linearizability of the QSPP is still an open question. In this talk, we present an algorithm that examines whether a QSPP instance on a directed grid graph is linearizable. This algorithm has complexity O(p³q² + p²q³), where p, q ≥ 2 are the dimensions of the directed grid graph. We will also consider other special cases of the QSPP.

Hao Hu
Tilburg University
[email protected]

MS104

Computational Study of Some Valid Inequalities for k-Way Graph Partitioning

We consider the maximum k-cut problem that consists in partitioning the vertex set of a graph into k subsets such that the sum of the weights of edges joining vertices in different subsets is maximized. We focus on identifying effective classes of inequalities to tighten the semidefinite programming relaxation. We carry out an experimental study of four classes of inequalities from the literature: clique, general clique, wheel and bicycle wheel. We considered multiple combinations of these classes and tested them on both dense and sparse instances for different values of k. Our computational results suggest that the bicycle wheel and wheel are the strongest inequalities for k = 3, and that for greater k, the wheel inequalities are the strongest by far. Furthermore, we observe an improvement in the performance for all choices of k when both bicycle wheel and wheel are used, at the cost of substantially increased CPU time on average when compared with using only one of them.

Vilmar Rodrigues de Sousa, Miguel F. Anjos
Mathematics and Industrial Engineering & GERAD
Polytechnique Montreal
[email protected], [email protected]

Sebastien Le Digabel
Ecole Polytechnique de Montreal
[email protected]

MS104

Graph Bisection Revisited

The graph bisection problem (GBP) is the problem of partitioning the vertex set of a graph into two sets of given sizes such that the total weight of edges joining these sets is minimized. In this talk we present a semidefinite programming relaxation for the GBP with a matrix variable of order n, where n is the order of the graph. Our relaxation is equivalent to the currently strongest semidefinite programming relaxation by Wolkowicz and Zhao from 1999, with matrices of order 2n. To further improve our relaxation, we impose valid inequalities. Our strengthened SDP relaxation outperforms all previously considered SDP bounds for the GBP, including those tailored for highly symmetric graphs.

Renata Sotirov
Tilburg University
[email protected]

MS105

From Clifford Algebras and Rigidity Theory to Large Completely Positive Semidefinite Rank

Is the set of quantum correlations closed? This is an important question in quantum information theory and we use it as a motivation for this talk. We approach this question by looking at a related object: the cone of completely positive semidefinite matrices. An n × n matrix A is called completely positive semidefinite (cpsd) if there exist positive semidefinite d × d matrices X_1, ..., X_n such that A = (Tr(X_i X_j)). The smallest such d is called the cpsd-rank of A. If there exists an upper bound on this rank in terms of n, then the cone of completely positive semidefinite matrices is closed. Since the set of quantum correlations forms an affine slice of the cpsd cone, this would settle our motivating question. We show that if such an upper bound exists, then it has to be at least exponential in n. For this we use the seminal work of Tsirelson, connecting bipartite correlations to quantum correlations, and the above-mentioned connection between quantum correlations and cpsd matrices. Bipartite correlations arise as a projection of the more commonly known correlation matrices. We proceed to show the optimality of our construction by connecting recent results from rigidity theory to the work of Tsirelson.

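The definition is easy to explore numerically; the following small sketch builds a cpsd matrix from random PSD factors (an illustration of the definition only, unrelated to the lower-bound construction):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 4, 3
    Xs = []
    for _ in range(n):
        B = rng.standard_normal((d, d))
        Xs.append(B @ B.T)                     # a random d x d PSD factor
    A = np.array([[np.trace(Xi @ Xj) for Xj in Xs] for Xi in Xs])
    # A = (Tr(X_i X_j)) is cpsd by construction; in particular it is PSD:
    print(np.linalg.eigvalsh(A))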
Sander Gribling, David de Laat
Centrum Wiskunde & Informatica, The Netherlands
[email protected], [email protected]

Monique Laurent
CWI, Amsterdam
and Tilburg University
[email protected]

MS105

Quantum and Non-Signalling Graph Isomorphisms

We introduce a two-player nonlocal game, called the (G,H)-isomorphism game, where classical players can win with certainty if and only if the graphs G and H are isomorphic. We then define the notions of quantum and non-signalling isomorphism by considering perfect quantum and non-signalling strategies for the (G,H)-isomorphism game, respectively. First, we prove that non-signalling isomorphism coincides with the well-studied notion of fractional isomorphism. Second, we show that quantum isomorphism is equivalent to the feasibility of two polynomial systems in non-commuting variables, obtained by relaxing the standard integer programming formulations for graph isomorphism to Hermitian variables. On the basis of this correspondence, we show there exist graphs that are quantum isomorphic but not isomorphic.

David Roberson
Nanyang Technological University
[email protected]

MS105

Some Problems Concerned with Graph Decompositions

We consider various computational questions arising from the decomposition of (combinatorial) Laplacian matrices of graphs as convex combinations of Kronecker products. These questions were originally proposed in the context of quantum information theory, in connection with a special case of the separability problem for density matrices. While the general problem is known to be NP-hard, the restricted one is feasible in a number of cases, but its complexity has not yet been determined.

Simone Severini
Department of Computer Science
University College London
[email protected]

MS105

Quantum Correlations: Conic Formulations, Dimension Bounds and Matrix Factorizations

In this talk we focus on the sets of two-party correlations generated from a Bell scenario involving two spatially separated systems, with respect to various physical models. We show that the sets of quantum and classical correlations can be expressed as projections of affine sections of the completely positive semidefinite (cpsd) cone and the completely positive cone, respectively. Furthermore, the smallest size of a quantum representation of a quantum correlation is equal to the smallest cpsd-rank over all its cpsd completions. This correspondence allows one to set up a dictionary between properties of quantum correlations and features of the cpsd cone, and additionally, between dimensionality questions concerning quantum correlations and algebraic properties of the cpsd-rank. My goal in this talk is to give a brief summary of the most important results obtained in both directions.

Antonios Varvitsiotis
Nanyang Technological University, Singapore
Centre for Quantum Technologies, Singapore
[email protected]

MS106

Parallel Subspace Correction Method for Strongly Convex Problems

We investigate a parallel subspace correction (PSC) framework for strongly convex optimization based on domain decomposition. At each iteration, the algorithm solves subproblems simultaneously to derive directions on the subspaces and takes steps along these directions to find a new point. We establish the linear convergence rate of our algorithm under a mild condition on the inexact solution of the subproblems, and design a new step size strategy that guarantees both sufficient reduction of the objective function and a uniform upper bound on the step sizes. A numerical comparison with other existing PSC methods shows the potential of our algorithm. Moreover, we compare our method with the BB method on large-scale strongly convex problems arising from PDEs, which shows that our method is very efficient and powerful.

Qian Dong
Academy of Mathematics and Systems Science, China
[email protected]

Xin Liu
Academy of Mathematics and Systems Science
Chinese Academy of Sciences
[email protected]

Zaiwen Wen
Peking University
Beijing International Center for Mathematical Research
[email protected]

Ya-Xiang Yuan
Chinese Academy of Sciences
Inst of Comp Math & Sci/Eng, Beijing
[email protected]

MS106

Distributed Algorithms for Orthogonal Constrained Optimization Problems

Constructing distributed approaches for solving orthogonal constrained optimization problems is usually regarded as an impossible mission, due to the low scalability of the orthogonalization procedure. In this talk, we propose a Jacobi-type column-wise block coordinate descent method for solving a class of orthogonal constrained optimization problems, and establish global convergence of the iterates to a stationary point for our proposed approach. A distributed algorithm is consequently implemented. Numerical experiments illustrate that the new algorithm has excellent performance and high scalability in solving discretized Kohn-Sham total energy minimization problems.

Xin Liu, Bin Gao
Academy of Mathematics and Systems Science
Chinese Academy of Sciences
[email protected], [email protected]

Ya-Xiang Yuan
Chinese Academy of Sciences
Inst of Comp Math & Sci/Eng, Beijing
[email protected]

MS106

New Gradient Methods Improving Yuan’s Step-Size

We propose a new framework for generating stepsizes for gradient methods by improving Yuan's stepsize. By adopting different criteria, we obtain several new gradient methods. For 2-dimensional convex quadratic problems, we prove that the new methods either terminate finitely or converge R-superlinearly. For general n-dimensional convex quadratic problems, we prove that the new methods converge R-linearly. Numerical experiments show that the new methods enjoy lower complexity and outperform the gradient methods compared against.

Cong Sun
Beijing University of Posts and Telecommunications
[email protected]

Jinpeng Liu
Beihang University
[email protected]

MS106

Total Least Squares with Tikhonov Regularization: Hidden Convexity and Efficient Global Algorithms

The Tikhonov regularized total least squares (TRTLS) problem is a non-convex optimization problem arising from an overdetermined system of linear equations. A standard approach to (TRTLS) is to reformulate it as the problem of finding a zero of a decreasing, concave, nonsmooth univariate function, to which the classical bisection search and Dinkelbach's method can be applied. In this work, by exploiting the hidden convexity of (TRTLS), we reformulate it as a new problem of finding a zero of a strictly decreasing, smooth and concave univariate function. This allows us to apply the classical Newton's method to the reformulated problem, which converges globally to the unique root with an asymptotic quadratic convergence rate. Moreover, at every iteration of Newton's method, no optimization subproblem such as the extended trust-region subproblem needs to be solved to evaluate the new univariate function value, as it has an explicit expression. Promising numerical results based on the new algorithm are reported.

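The computational core is then plain univariate Newton root-finding. A generic sketch follows, where `phi` and `dphi` stand in for the reformulated function and its derivative (their explicit expressions are problem data not reproduced here):

    def newton_root(phi, dphi, t0, tol=1e-12, max_iter=100):
        t = t0
        for _ in range(max_iter):
            val = phi(t)
            if abs(val) < tol:
                break
            t -= val / dphi(t)   # classical Newton update
        return t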
Meijia Yang
Beihang University
[email protected]

Yong Xia
School of Mathematics and System Sciences
Beihang University
[email protected]

Jiulin Wang
Beihang University
[email protected]

Jiming Peng
University of Houston
[email protected]

MS107

Taylor Approximation for PDE-Constrained Optimal Control Problems under High-Dimensional Uncertainty

In this talk, we present an efficient method based on Taylor approximation for PDE-constrained optimal control problems under high-dimensional uncertainty. The computational complexity of the method does not depend on the nominal dimension but only on the intrinsic dimension of the uncertain parameter; thus the curse of dimensionality is broken for intrinsically low-dimensional problems. A further correction for the Taylor approximation is proposed, which leads to an unbiased evaluation of the statistical moments in the objective function. We apply our method to a turbulence model with infinite-dimensional random viscosity.

Peng Chen
UT Austin
[email protected]

Omar Ghattas
The University of Texas at Austin
[email protected]

Umberto Villa
University of Texas at Austin
[email protected]

MS107

Robust Gradient-Based Shape Optimization for CFD Applications

With advances in numerical techniques and the increase of available computational power, Computational Fluid Dynamics has evolved from an analysis tool into a design tool. Despite the widespread use of optimization algorithms to design engineering parts, these are mostly employed on deterministic problems where the operating and manufacturing conditions are well known. In reality, however, operational and manufacturing conditions are seldom deterministic, e.g. wind direction and manufacturing tolerances. For such problems, a deterministic optimization approach may result in an optimum that is very sensitive to the uncertainties, i.e., lacks robustness, thus leading to a lower overall performance. In a robust optimization approach, uncertainties are accounted for by introducing the variance of the objective as a second objective. This approach, however, suffers from the curse of dimensionality. In this context we developed a gradient-based optimization technique using adjoint methods combined with Polynomial Chaos Expansions (PCE) to efficiently calculate the gradients (of the mean objective and of the variance). These gradients are then combined, using weighted averaging, to find the best solution for each problem. The proposed approach is applied to shape optimization of internal and external flows. Different weighting factors allow us to control the level of robustness of each design. Overall, a more robust design led to a worse mean performance. For these cases a Pareto front is obtained.

Joao Duarte Carrilho Miranda
Vrije Universiteit Brussel
[email protected]

Francesco Contino
Vrije Universiteit Brussel
Brussels University (VUB)
[email protected]

Chris Lacor
Vrije Universiteit Brussel
[email protected]

MS107

Robustness Measures for Robust Design with Multiple Objectives

Engineering design often involves the optimization of several competing objectives. Here, the aim is to find a set of solutions that fulfill the concept of Pareto optimality. A further significant step towards realistic multi-objective designs is to take into account uncertainties affecting the system, in order to find robust optimal solutions. In previous work, multi-objective optimization and robust design were combined using the epsilon-constraint approach together with the non-intrusive polynomial chaos approach in the context of aerodynamic shape optimization. The computational effort for solving the sequence of resulting single-objective optimization problems was reduced by using a one-shot approach suited for constrained optimization problems as well as dimension-adaptive sparse grids. The aim is to extend the existing approach and investigate robustness measures in the context of multi-objective optimization. Apart from the use of common stochastic quantities like mean and variance, we propose the use of a signed distance function as an additional constraint for optimization, to account for average losses in the multi-objective space. The use of more sophisticated robustness measures can make the search for a global optimum more difficult and more computationally expensive. As a result, a global search is conducted on a surrogate model that is built using a Gaussian process model. The model is refined for optimization using an expected improvement method.

Lisa Kusch
TU Kaiserslautern
[email protected]

Lionel Mathelin
LIMSI - CNRS, France
[email protected]

Nicolas R. Gauger
TU Kaiserslautern
[email protected]

MS107

Optimal Sensor Placement Subject to Efficiency and Robustness Constraints

We consider the problem of optimally placing sensor networks for parabolic stochastically perturbed processes under constraints involving the robustness of measurements and efficiency criteria. The mathematical formulation arises from applications in engineering and science that include the location of thermostats in large buildings and industrial processes. We formulate an optimization problem where the Riccati equation is a constraint, the design variables are sensor locations, and the objective functional involves the trace of the solution to the Riccati equation. In addition, we consider constraints associated with robustness and efficiency: the robustness criterion is formulated in terms of perturbations to the differential operator of the underlying process, and the efficiency one involves the performance of prescribed sensor locations. We address well-posedness of the problem and provide numerical tests.

Carlos N. Rautenberg
Humboldt-University of Berlin
[email protected]

MS108

Dual Extrapolation for Solving Variational Inequalities with a Linear Optimization Oracle

We present a novel method for solving variational inequalities that relies only on the ability to solve simple (e.g., linear optimization) subproblems over the region of interest. The main idea of the method is to apply a generalization of the Frank-Wolfe method to solve the subproblems within the “dual extrapolation” method for variational inequalities due to Nesterov. By combining analyses of the dual extrapolation method and the Frank-Wolfe method, we show that our method is able to achieve a solution with accuracy ε using at most: (i) O(1/ε^4) simple (e.g., linear optimization) subproblem oracle calls in the case of monotone operators with bounded variation, and (ii) O(1/ε^2) simple (e.g., linear optimization) subproblem oracle calls in the case of Lipschitz continuous operators. We also discuss extensions and related results that apply to non-smooth convex optimization.

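The inner workhorse is the classical Frank-Wolfe method, which touches the feasible region only through the linear optimization oracle. A minimal sketch with the standard open-loop step size, where `lmo(g)` returns argmin_{s in S} <g, s>:

    import numpy as np

    def frank_wolfe(grad, lmo, x0, iters=100):
        x = x0
        for t in range(iters):
            s = lmo(grad(x))                 # simple linear optimization subproblem
            gamma = 2.0 / (t + 2.0)          # standard open-loop step size
            x = (1 - gamma) * x + gamma * s
        return x

    # Example: minimize ||x - b||^2 over the probability simplex, whose LMO
    # simply returns the best vertex (a coordinate unit vector).
    b = np.array([0.2, 0.5, 0.9])
    lmo = lambda g: np.eye(g.size)[np.argmin(g)]
    x = frank_wolfe(lambda x: 2 * (x - b), lmo, np.ones(3) / 3)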
Paul Grigas
UC Berkeley
[email protected]

Yurii Nesterov
CORE
Universite catholique de Louvain
[email protected]

MS108

Relatively-Smooth and Relatively-Continuous Convex Optimization by First-Order Methods, and Applications

The usual approach to developing and analyzing first-order methods for smooth or non-smooth convex optimization assumes that the gradient or the function is uniformly smooth with some Lipschitz constant L or M, respectively. However, in many settings the convex function f(·) satisfies neither of these two assumptions; for example, f(x) := −ln det(H X H^T) in D-optimal design, or f(w) := (1/n) ∑_{i=1}^n max{0, 1 − y_i x_i^T w} + (λ/2)‖w‖_2^2 in support vector machines. Herein we develop and present a different way to define smoothness, strong convexity and “uniform” continuity that is based on a user-specified “reference function” h(·) that should be computationally tractable; and we show that most convex functions are smooth (or continuous) with respect to a corresponding relatively-simple reference function h(·). We extend two standard algorithms, the prox gradient scheme and the dual averages scheme, to our new smooth setting, and we extend the subgradient scheme to our new non-smooth setting, with associated computational guarantees. Last of all, we apply our new approach to develop a new first-order method for the D-optimal design problem, with associated computational complexity analysis. This is joint work with Robert Freund (MIT) and Yurii Nesterov (UCL).

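The central definition can be stated via the Bregman distance of the reference function: f is L-smooth relative to h when

    f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + L \, D_h(y, x)
    \quad \text{for all } x, y, \qquad
    D_h(y, x) := h(y) - h(x) - \langle \nabla h(x), y - x \rangle,

which reduces to the usual Lipschitz-gradient condition for the choice h(·) = ½‖·‖².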
Haihao Lu
MIT
Operations Research Center
[email protected]

Robert M. Freund
MIT
Sloan School of Management
[email protected]

Yurii Nesterov
CORE
Universite catholique de Louvain
[email protected]

MS108

Detecting Communities by Voting Model

In this talk, we analyze a voting model based on random preferences of participants. Each voter can choose a party with a certain probability, which depends on the divergence between his personal preferences and the position of the party. Our model represents a rare example of a community detection (or clustering) model with a unique equilibrium solution. This is achieved by allowing flexible positions for the parties (the centers of clusters). We propose an efficient algorithm for finding the equilibrium state with a linear rate of convergence. It can be interpreted as an alternating minimization scheme; again, this is a rare example where such a scheme has a provable global rate of convergence.

Yurii Nesterov
CORE
Universite catholique de Louvain
[email protected]

MS108

Sample Complexity Analysis for Low-Order Optimization Algorithms

In this talk we discuss sample complexities for various zeroth-order algorithms for (stochastic) convex optimization. Essentially, the analysis is meant to understand how many times one will need to call the (stochastic) zeroth-order oracle in order to get an ε-optimal solution in expectation. Such algorithms and analyses are important for various Bayesian optimization models and applications arising from revenue management. We shall also extend our discussion to include certain non-convex models.

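A typical building block counted by such analyses is the two-point gradient estimator; the smoothing radius mu and the sphere sampling below are standard choices rather than specifics of this talk:

    import numpy as np

    def zo_gradient(f, x, mu=1e-4, rng=np.random.default_rng(0)):
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)               # random direction on the unit sphere
        # Each estimate consumes two zeroth-order oracle calls.
        return x.size * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u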
Shuzhong Zhang
Department of Industrial & Systems Engineering
University of Minnesota
[email protected]

MS109

Prioritizing Hepatitis C Treatment Decisions in U.S. Prisons

Hepatitis C virus (HCV) prevalence in prison systems is ten times higher than in the general population, and hence prison systems offer a unique opportunity to control the HCV epidemic. New treatment drugs are effective, but providing treatment to all inmates is prohibitively expensive, which precludes universal HCV treatment in prison systems. As such, current practice recommends prioritizing treatment in prisons; however, there is controversy about how prioritization decisions should be made. We propose a restless bandit modeling framework to support hepatitis C treatment prioritization decisions in U.S. prisons. We first prove indexability and derive several structural properties of the well-known Whittle's index, based on which we derive a closed-form expression for Whittle's index for patients with advanced liver disease. From the interpretation of this closed-form expression, we anticipate that the performance of the Whittle's index would degrade as the treatment capacity increases, and we further propose a capacity-adjusted closed-form index policy. We parameterize and validate our model using real-world data from the Georgia state prison system and published studies. We test the performance of our proposed policy using a detailed, clinically realistic simulation model and show that our proposed policy can significantly improve the overall effectiveness of hepatitis C treatment programs in prisons compared with several benchmark policies, including the Whittle's index policy.

Turgay Ayer
Georgia Institute of Technology
[email protected]

Can Zhang, Anthony Bonifonte
Georgia Tech
[email protected], [email protected]

Anne Spaulding
Emory University
[email protected]

Jagpreet Chhatwal
Harvard Medical School
[email protected]

MS109

Optimized Treatment Schedules for Chronic Myeloid Leukemia

Over the past decade, several targeted therapies have been developed to treat Chronic Myeloid Leukemia (CML). Despite an initial response to therapy, drug resistance remains a problem for some CML patients. Recent studies have shown that resistance mutations that preexist treatment can be detected in a substantial number of patients, and that this may be associated with eventual treatment failure. One proposed method to extend treatment efficacy is to use a combination of multiple targeted therapies. However, the design of such combination therapies (timing, sequence, etc.) remains an open challenge. In this work we mathematically model the dynamics of CML response to combination therapy and analyze the impact of combination treatment schedules on treatment efficacy in patients with preexisting resistance. We then propose an optimization problem to find the best schedule of multiple therapies based on the CML evolution according to our ordinary differential equation model. The resulting optimization problem contains ordinary differential equation constraints and integer variables. By exploiting the problem structure, we are able to find near-optimal solutions within an hour. We determine optimal combination strategies that maximize time until treatment failure, using parameters estimated from clinical data in the literature. This is joint work with Jasmine Foo, Kevin Leder, Junfeng Zhu from the University of Minnesota and David Dingli from the Mayo Clinic.

Qie He
University of Minnesota
[email protected]

MS109

Response-Guided Dosing

Within the broad field of personalized medicine, there has been a recent surge of clinical interest in the idea of response-guided dosing. Roughly speaking, the goal is to devise strategies that administer the right dose to the right patient at the right time. We will present stochastic models that attempt to formalize such optimal dosing problems. Theoretical results about the structure of optimal dosing strategies and associated solution methods rooted in convex optimization, stochastic dynamic programming, robust optimization, and Bayesian learning will be described. Computational results on rheumatoid arthritis will be discussed.

Jakob Kotas
University of Washington
[email protected]

Archis Ghate
University of Washington
[email protected]

MS110

Mesh Adaptive Direct Search Algorithms for Multifidelity Optimization Problems

Multifidelity optimization occurs in engineering design when multiple engineering simulation codes are available at different levels of fidelity. An example of this might be in aerodynamic optimization of the shape of a wing, in which the computation of aerodynamic quantities, such as lift and drag, can be performed using a full Navier-Stokes solver, or an Euler solver, or a linearized potential code. High-fidelity simulations are more accurate, but also more computationally expensive. The goal of this work is the design of algorithms that optimize with respect to the high-fidelity simulation, but exploit the use of lower-fidelity codes as much as possible. Two new surrogate-based mesh adaptive direct search (MADS) algorithms will be presented, in which interpolating surrogates are constructed and updated from previously evaluated iterates of the algorithm to speed convergence. The first algorithm employs a recursive Search step that optimizes a surrogate function constructed from the next lower fidelity level simulation, augmented with an interpolating surrogate that accounts for the difference between adjacent levels of fidelity. The second approach is an augmentation of the optimization problem, in which the fidelity level is incorporated as a variable in the problem, and a relaxable constraint is added to force the solution to be at the highest level of fidelity. Some preliminary numerical results are presented.

Mark Abramson
Brigham Young University
[email protected]

MS110

Hybrid Derivative-Free Methods for Composite Nonsmooth Optimization

In this talk, the combination of global solvers and derivative-free trust-region algorithms is considered for minimizing a composite function Φ(x) = f(x) + h(c(x)), where f and c are smooth and h is convex but may be non-smooth. Global convergence results and worst-case complexity bounds are discussed. The performance of the algorithms is illustrated on the parameter estimation of biochemical models.

Geovani N. Grapiglia
Federal University of Parana
[email protected]

MS110

Manifold Sampling for Piecewise Linear Nonconvex Optimization

We develop a manifold sampling algorithm for the unconstrained minimization of a nonsmooth composite function h∘F, where h is a nonsmooth, piecewise linear function and F is smooth but expensive to evaluate. The trust-region algorithm classifies points in the domain of h as belonging to different manifolds, and uses this knowledge when computing search directions. Since h is known, classifying objective manifolds using only the function values F_i is simple. We prove that all cluster points of the sequence of algorithm iterates are Clarke stationary; this holds even though points evaluated by the algorithm are not assumed to be differentiable and only approximate derivatives of F are available. Numerical results show that manifold sampling using zeroth-order information is competitive with gradient sampling algorithms that are given exact gradient values.

Jeffrey Larson
Argonne National Laboratory
[email protected]

Kamil Khan
Massachusetts Institute of Technology
Department of Chemical Engineering
[email protected]

Kamil Khan
Argonne National Laboratory
Mathematics and Computer Science Division
[email protected]

Stefan Wild
Argonne National Laboratory
[email protected]

MS110

A Derivative-Free VU-Algorithm for Convex Finite-Max Functions

The VU-algorithm is a superlinearly convergent method for minimizing nonsmooth, convex functions. At each iteration, the algorithm separates the space into two orthogonal subspaces, called the V-space and the U-space, such that the nonsmoothness of the objective function is due to its projection onto the V-space only, while on the U-space the projection is smooth. This structure allows for a quasi-Newton step parallel to the U-space, after which a proximal-point step is taken parallel to the V-space. The point of this work is to establish a derivative-free variant of the VU-algorithm for convex, finite-max objective functions. It is shown that approximate subgradients are sufficient to obtain convergence to within acceptable tolerance in this setting.

Chayne Planiden
University of British Columbia Okanagan
[email protected]

MS111

Control and Estimation of Traffic Flow on Networks

This talk presents a new mixed integer programming formulation of traffic density estimation and control problems in highway networks modeled by the Lighthill-Whitham-Richards partial differential equation. We first present an equivalent formulation of the problem using a Hamilton-Jacobi equation. Then, using a semi-analytic formula, we show that the model constraints resulting from the Hamilton-Jacobi equation are linear, albeit with unknown integers. We then pose the problem of estimating the density of traffic, given incomplete and inaccurate traffic data, as a Mixed Integer Program. We also show that the same framework can be used for boundary flow control on single highway links and networks. We then present a numerical implementation of both estimation and control methods on realistic examples involving experimental measurement data.

Simoni Michele
UT Austin
[email protected]

Edward Canepa
King Abdullah University of Science and Technology
Department of Electrical Engineering
[email protected]

Christian Claudel
University of Texas at Austin
[email protected]

MS111

Total Variation Diminishing Runge-Kutta Methods for the Optimal Control of Conservation Laws: Stability and Order-Conditions

Optimal control problems subject to conservation laws are among the most challenging optimization problems, due to difficulties that arise when shocks are present in the solution. Such optimization problems have applications, for instance, in gas networks. Besides theoretical difficulties at the continuous level, the discretization of such optimal control problems must be done with care in order to guarantee convergence. In this talk, we present stability results for total variation diminishing (TVD) Runge-Kutta (RK) methods applied to such optimal control problems. In particular, we show that enforcing strong stability preservation (SSP) for both the forward and the adjoint problem results in a first-order time-discretization. However, requiring SSP only for the forward problem is sufficient to obtain a stable discrete adjoint. We also present order-conditions for the TVD-RK methods in the optimal control context.

Soheil Hajian
Humboldt University of Berlin
[email protected]

Michael Hintermueller
Weierstrass Institute for Applied Analysis and Stochastics
[email protected]

MS111

Optimal Control of Hyperbolic Balance Laws with State Constraints

This talk deals with the treatment of pointwise state constraints in the context of optimal control of hyperbolic nonlinear scalar balance laws. We study an optimal control problem governed by balance laws with an off/on switching device that allows the flux to be interrupted, where the switching times between on- and off-modes are the control variables. Such a problem arises, for example, when considering the optimal control of the traffic density on a one-way street with a traffic light in the middle. The presence of state constraints poses a special challenge, since solutions of nonlinear hyperbolic balance laws develop discontinuities after finite time, which prohibits the use of standard methods. In this talk, we will build upon the recently developed sensitivity and adjoint calculus of Pfaff and Ulbrich to derive necessary optimality conditions. In addition, we will use Moreau-Yosida regularization for the algorithmic treatment of the pointwise state constraints. Hereby, we will prove convergence of the optimal controls and weak convergence of the corresponding Lagrange multiplier estimates of the regularized problems.

Johann Schmitt
Technische Universitaet Darmstadt
[email protected]

Stefan Ulbrich
Technische Universitaet Darmstadt
Fachbereich Mathematik
[email protected]

MS111

Chance Constrained Optimization on Gas Networks Governed by the Isothermal Euler Equations

We study optimization problems on gas networks governed by the isothermal Euler equations with nonconstant compressibility factor under uncertainty. The deterministic problem consists of choosing a pressure control such that prescribed pressure bounds are fulfilled on the whole network. However, the demands of customers, i.e., the boundary mass flows, are typically unknown. Statistical information for the demands is available and allows the construction of a distribution function (e.g. normal) around a mainly temperature-dependent mean value. We consider a chance constrained (or probabilistic constrained) optimization problem, i.e., we replace our original constraints by one constraint stating that the probability of being feasible should exceed a prescribed threshold.

David Wintergerst
Friedrich-Alexander-Universitat Erlangen-Nurnberg
[email protected]

MS112

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

We study the question of reconstructing two signals f and g from their convolution y = f ∗ g. This problem, known as blind deconvolution, pervades many areas of science and technology, including astronomy, medical imaging, optics, and wireless communications. A key challenge of this intricate non-convex optimization problem is that it might exhibit many local minima. We present an efficient numerical algorithm that is guaranteed to recover the exact solution when the number of measurements is (up to log-factors) slightly larger than the information-theoretic minimum, and under reasonable conditions on f and g. The proposed regularized gradient descent algorithm converges at a geometric rate and is provably robust in the presence of noise. To the best of our knowledge, our algorithm is the first blind deconvolution algorithm that is numerically efficient, robust against noise, and comes with rigorous recovery guarantees under certain subspace conditions. Moreover, numerical experiments not only provide empirical evidence for our theory, but also demonstrate that our method yields excellent performance even in situations beyond our theoretical framework.

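The forward model is bilinear in the unknowns, which is the source of the nonconvexity; in the discrete circular setting it is one line of numpy (an illustration of the measurement model only, not of the recovery algorithm):

    import numpy as np

    n = 64
    rng = np.random.default_rng(1)
    f, g = rng.standard_normal(n), rng.standard_normal(n)
    y = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real   # y = f * g (circular)
    # Blind deconvolution: recover f and g (up to scaling) from y alone.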
Xiaodong Li
Department of Statistics
University of California Davis
[email protected]

Shuyang Ling
Department of Mathematics
University of California at Davis
[email protected]

Thomas Strohmer
University of California, Davis
Department of Mathematics
[email protected]

Ke Wei
Department of Mathematics
University of California at Davis
[email protected]

MS112

Breaking Sample Complexity Barriers via Nonconvex Optimization?

In the past decade there has been significant progress in understanding when convex relaxations are effective for finding low-complexity models from a near-minimal number of data samples (e.g. sparse/low-rank recovery from a few linear measurements). Despite such advances, convex optimization techniques are often prohibitive in practice due to computational/memory constraints. Furthermore, in some cases convex programs are suboptimal in terms of sample complexity and provably require significantly more data samples than what is required to uniquely specify the low-complexity model of interest. In fact, for many such problems certain sample complexity barriers have emerged, so that there is no known computationally tractable algorithm that can beat the sample complexity achieved by such convex relaxations. Motivated by a problem in imaging, in this talk I will discuss my recent results towards breaking such barriers via natural nonconvex optimization techniques.

Mahdi Soltanolkotabi
University of Southern California
[email protected]

MS112

Guarantees of Riemannian Optimization for Low Rank Matrix Reconstruction

We establish theoretical recovery guarantees for a family of Riemannian optimization algorithms for low rank matrix reconstruction, which concerns recovering an m × n rank r matrix from p < mn linear measurements. The algorithms are first interpreted as iterative hard thresholding algorithms with subspace projections. Then, based on this connection, we prove that with a proper initial guess the Riemannian gradient descent method and a restarted variant of the Riemannian conjugate gradient method are guaranteed to converge to the measured rank r matrix provided the number of measurements is proportional to nr^2, up to a log factor. Empirical evaluation shows that the algorithms are able to recover a low rank matrix from nearly the minimum number of measurements necessary.

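The iterative-hard-thresholding view mentioned above amounts to a gradient step followed by a truncated SVD. A minimal sketch, where `A` and `At` stand for a hypothetical measurement operator and its adjoint (the Riemannian methods refine this plain update with subspace projections):

    import numpy as np

    def iht_lowrank(A, At, y, shape, r, iters=200, step=1.0):
        X = np.zeros(shape)
        for _ in range(iters):
            G = At(y - A(X))                    # residual pulled back to matrix space
            U, s, Vt = np.linalg.svd(X + step * G, full_matrices=False)
            X = (U[:, :r] * s[:r]) @ Vt[:r]     # keep the top-r singular triplets
        return X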
Ke Wei
Department of Mathematics
University of California at Davis
[email protected]

Jian-Feng Cai
Department of Mathematics
Hong Kong University of Science and Technology
[email protected]

Tony Chan
[email protected]

Shingyu Leung
Hong Kong University of Science and Technology
[email protected]

MS112

Nonconvex Recovery of Low-Complexity Models

General nonconvex optimization problems are NP-hard. In applied disciplines, however, nonconvex problems abound, and heuristic algorithms are often surprisingly effective. The ability of nonconvex heuristics to find high-quality solutions for practical problems remains largely mysterious. In this talk, I will describe a family of nonconvex problems which can be solved efficiently. This family has the characteristic structure that (1) every local minimizer is also global, and (2) the objective function has a negative directional curvature around all saddle points (ridable saddles). Natural (nonconvex) formulations for a number of important problems in signal processing and machine learning lie in this family, including the eigenvector problem, complete dictionary learning (CDL), generalized phase retrieval (GPR), orthogonal tensor decomposition, and various synchronization and deconvolution problems. This benign geometric structure allows a number of optimization methods to efficiently find a global minimizer, without special initializations. This geometric approach to solving nonconvex problems has led to new types of computational guarantees for a number of practical problems, including dictionary learning, generalized phase retrieval, and sparse blind deconvolution. I will describe several open challenges in terms of both theory and algorithms, and give applications of these ideas to computer vision and the analysis of microscopy data.

John Wright
Department of Electrical Engineering
Columbia University
[email protected]

Ju Sun
Stanford University
[email protected]

Qing Qu, Yuqian Zhang
Columbia University
Electrical Engineering
[email protected], [email protected]

MS113

Two-Stage Linear Decision Rules for Multi-Stage Stochastic Programming

Multi-stage stochastic linear programs (MSLPs) are notoriously hard to solve in general. An approximation of an MSLP can be obtained by applying linear decision rules (LDRs) to the recourse decisions, i.e., by restricting the set of feasible policies to affine functions of the uncertain parameters. This reduces the MSLP to a static problem that provides an upper bound on the optimal objective value. Similarly, it is possible to obtain a lower bound by applying LDRs to the dual decisions. Under certain (somewhat restrictive) assumptions, the restricted primal and dual problems are both tractable linear programs. In this work, we introduce a two-stage LDR whose application reduces the MSLP (or its dual) to a two-stage stochastic linear program, which can be solved efficiently by means of decomposition. In addition to potentially yielding better policies and bounds, this approach requires many fewer assumptions than are required to obtain an explicit reformulation when using the standard static LDR approach. As an illustrative example we apply our approach to a capacity expansion model, and find that the two-stage LDR policy has expected cost between 20% and 34% lower than the static LDR policy, and in the dual yields lower bounds that are between 0.1% and 3.3% better.

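In generic notation, the static LDR restriction replaces an arbitrary nonanticipative policy by an affine one,

    x_t(\xi) = x_t^0 + \sum_{s \le t} X_{t,s} \, \xi_s,

whereas the two-stage variant discussed here retains a genuine two-stage recourse structure rather than collapsing the problem to a fully static one.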
Merve Bodur
University of Toronto
Department of Mechanical and Industrial Engineering
[email protected]

James Luedtke
University of Wisconsin-Madison
Department of Industrial and Systems Engineering
[email protected]

MS113

Progressive Hedging-Like Methods for Computing Lagrangian Dual Bounds in Stochastic Mixed-Integer Programming

A new primal-dual decomposition method is presented, based on an integration of the Progressive Hedging (PH) and Frank-Wolfe (FW) methods and referred to as FW-PH, for computing high-quality Lagrangian bounds for stochastic mixed-integer programming problems (SMIPs). The deterministic equivalent (DE) form of the SMIP typically lacks convexity due to the assumed integrality restrictions, and since PH is a specialization of the alternating direction method of multipliers (ADMM), the application of PH to the SMIP is not theoretically supported due to this lack of convexity. Thus, PH applied to the SMIP is understood as a heuristic approach without convergence guarantees, where either cycling or suboptimal convergence is possible. Although Lagrangian bounds may be computed after each PH iteration, these bounds often show limited improvement with an increasing number of iterations, and the amount of improvement is highly sensitive to the value of the PH penalty parameter. Motivated by these observations, we modify the PH method so that the generated sequence of Lagrangian bounds is guaranteed to converge to the optimal Lagrangian bound for any positive-valued PH penalty parameter. The new integrated method is both theoretically supported, thanks to an integration of established theory for ADMM and the FW method, and practically implementable. Numerical experiments demonstrate the improvement of FW-PH over the PH method for computing high-quality Lagrangian bounds.

Andrew C. Eberhard
RMIT University, Discipline of Mathematical Sciences
School of Science
[email protected]

Natashia Boland
University of Newcastle
[email protected]

Brian C. Dandurand
Argonne National Laboratory
[email protected]

Jeff Linderoth
University of Wisconsin-Madison
Dept. of Industrial and Sys. Engineering
[email protected]

James Luedtke
University of Wisconsin-Madison
Department of Industrial and Systems Engineering
[email protected]

Fabricio Oliveira, Jeffrey Christiansen
Mathematical Sciences, School of Science, RMIT University
[email protected], [email protected]

MS113

A Sequential Sampling Algorithm for Stochastic Multistage Programs

Stochastic Dual Dynamic Programming (SDDP) has become a prominent approach to tackle multistage stochastic programs. The convergence argument for this algorithm relies on the existence of finitely many Benders cuts that can be generated. This is possible only when the probability distributions associated with the underlying stochastic processes defining the optimization problem are known, and often represented as a scenario tree. However, in many applications one may not have an a priori description of the scenarios and their probabilities. For such cases, the traditional Benders cuts in SDDP are no longer available, and one has to resort to a sequential sampling approach where empirical estimates of the minorants can be calculated. We will discuss convergence of one such sequential sampling algorithm, which we refer to as Stochastic Dynamic Linear Programming. This algorithm is a dynamic extension of the regularized two-stage stochastic decomposition for stagewise independent multistage stochastic linear programs. It turns out that the use of regularization is the key to convergence of such algorithms. We will also present results from our computational experiments conducted on a short-term distributed storage control problem. These results show that our distribution-free approach provides prescriptive solutions and values which are statistically indistinguishable from those obtained from SDDP, while improving computational times significantly.

Harsha Gangammanavar
Southern Methodist University, Department of Engineering
Management, Information, and Systems
[email protected]

MS113

Nested Decomposition of Multistage Stochastic Integer Programs with Binary State Variables

We propose a valid nested decomposition algorithm for multistage stochastic integer programming problems when the state variables are binary. We prove finite convergence of the algorithm as long as the cuts satisfy some sufficient conditions. We discuss the use of well-known Benders and integer optimality cuts within this algorithm, and introduce new cuts derived from a reformulation of the problem where local copies of the state variables are introduced. We propose a stochastic variant of this algorithm and prove its finite convergence with probability one. Numerical experiments on a large-scale generation expansion planning problem will be presented.

Shabbir Ahmed
Georgia Institute of Technology, Industrial and Systems Engineering
[email protected]

Andy Sun
Georgia Tech
College of Engineering
[email protected]

MS114

Mixed-Integer Linear Programming for a PDE-Constrained Dynamic Network Flow

To compute a numerical solution of a PDE, it is common practice to find a suitable finite-dimensional approximation of the differential operators and solve the resulting finite-dimensional system, which is linear if the PDE is linear. When the PDE is subject to an outer influence (control), a natural task is to find an optimal control with respect to a certain objective. In this talk, general methods and concepts are given for linear PDEs with a finite set of control variables, which can be either binary or continuous. The presence of the binary variables restricts the feasible controls to a set of hyperplanes of the control space and makes conventional methods inapplicable. We show that under the assumption that all additional constraints of the system are linear, these problems can be modeled as a mixed-integer linear program (MILP) and numerically solved by a linear programming (LP) based branch-and-cut method. If the discretized PDE is directly used as constraints in the MILP, the problem size scales with the resolution of the discretization in both space and time. A reformulation of the model allows the PDE to be solved in a preprocessing step, which not only helps to significantly reduce the number of constraints and the computation time, but also makes it possible to use adaptive finite element methods. It is further shown that the growth of the number of state constraints under finer discretization can be reduced significantly by enforcing them in a lazy-constraint callback.

Fabian Gnegel
Helmut-Schmidt Universitaet
Hamburg, Germany
[email protected]

Armin Fugenschuh
Helmut Schmidt University Hamburg
[email protected]

Marcus Stiemer
Helmut-Schmidt-University of Federal Armed Forces Hamburg
[email protected]

Michael Dudzinski
Helmut Schmidt University of the Federal Armed Forces
[email protected]

MS114

Mixed-Integer PDE-Constrained Optimization for Gas Networks

We discuss a PDE-constrained optimal control problem that arises in the optimal control of natural gas transport networks subject to nonlinear friction. Following a traditional first-discretize-then-optimize approach, we start from a MINLP formulation of the problem and focus on issues that arise when applying decoupling and rounding methods to it. While rounding methods such as Sum-Up Rounding, which solve the MINLP approximately by solving an NLP and an MILP, are a powerful tool used to great effect in the field of optimal control, they generally require solutions of the relaxed NLP to resemble solutions of the original MINLP. We discuss several aspects of our problem formulation that render it inaccessible to classical decoupling methods and present reformulations, approximations and solution techniques that work around these issues and open up alternate sources of information that may be exploited by approximate MINLP solvers.

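For a single relaxed binary control on a uniform time grid, Sum-Up Rounding itself is only a few lines (a generic sketch of the standard scheme, not of the reformulations discussed in the talk):

    def sum_up_rounding(a, dt):
        # Round a relaxed control a_k in [0,1] so that the accumulated
        # integral of the rounded control tracks that of the relaxed one.
        b, acc = [], 0.0
        for ak in a:
            acc += ak * dt
            b.append(1 if acc - sum(b) * dt >= 0.5 * dt else 0)
        return b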
Mirko Hahn
Argonne National Laboratory
[email protected]

MS114

Gradient Descent Methods for Mixed-Integer PDE-Constrained Optimization

Motivated by challenging tasks such as minimal-cost operation of gas, water or traffic networks, we consider mixed-integer optimization problems subject to evolution-type PDE constraints. The integer restrictions model discrete control components, for example valves, gates or signal signs in the mentioned applications. We consider relaxation and re-parameterization techniques in order to solve such problems in an appropriate sense locally using gradient descent methods. The descent strategies are chosen such that gradients of certain sub-problems can be obtained numerically very efficiently by solutions of suitable adjoint PDEs. The performance of these methods will be demonstrated on numerical examples.

Falk M. Hante
Friedrich-Alexander-Universitat Erlangen
[email protected]

MS114

Introduction to Mixed-Integer PDE Constrained Optimization

We introduce mixed-integer PDE-constrained optimization problems, discuss some theoretical and computational challenges of this new class of problems, and propose a classification of this problem class. We present a test-problem library for mixed-integer PDE-constrained optimization written in AMPL, and present preliminary numerical results using existing mixed-integer solvers. We emphasize problem formulations that ensure convex relaxations, and tight relaxations that result in smaller search trees. We also discuss transformations that simplify the problem formulations. Our goal is to highlight the impact of modeling choices at the interface between integer programming and PDE-constrained optimization.

Sven Leyffer
Argonne National Laboratory
[email protected]

MS115

Compositions of Convex Functions and Fully Linear Models

Model-based optimization methods use an approximation of the objective function to guide the optimization algorithm. Proving convergence of such methods often relies on the assumption that the approximations form fully linear models, an assumption that requires the true objective function to be smooth. However, some recent methods have loosened this assumption and instead work with functions that are compositions of smooth functions with simple convex functions (the max-function or the ℓ1 norm). In this talk, we examine the error bounds resulting from the composition of a convex lower semi-continuous function with a smooth vector-valued function when it is possible to provide fully linear models for each component of the vector-valued function.

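For reference, a model m_k is called fully linear for f on the trust region B(x_k; Δ_k) when, with constants κ_f, κ_g independent of k,

    |f(y) - m_k(y)| \le \kappa_f \, \Delta_k^2
    \quad \text{and} \quad
    \|\nabla f(y) - \nabla m_k(y)\| \le \kappa_g \, \Delta_k
    \quad \text{for all } y \in B(x_k; \Delta_k).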
Warren Hare
University of British Columbia, Okanagan Campus
[email protected]

MS115

From Least Squares Solutions of Linear Inequality Systems to Convex Least Squares Problems

In our contributions, we consider least squares solutions (if any) in problems of three different types: for a finite number of linear inequality systems, for infinitely many linear systems, and for general convex systems. Our objective focuses on existence results (for properly defined least squares solutions), characterizations of such solutions, and in some cases algorithms for computing them.

Jean-Baptiste Hiriart-Urruty
University of Toulouse
France
[email protected]

MS115

VU Decomposition and Partial Smoothness for Sublinear Functions

VU decomposition and partial smoothness are related notions for generic nonsmooth functions. Sublinear functions are widely used in optimization algorithms, as they themselves contain important structures. We discuss how to characterize several concepts related to partial smoothness for this type of function. In particular, for a closed sublinear function h we introduce a relaxed decomposition depending on certain Vε and Uε subspaces that converge continuously to the V- and U-counterparts as ε tends to zero. To the new VεUε decomposition corresponds a relaxed smooth manifold that contains the manifold where h is partly smooth.

Shuai Liu
IMPA, Brazil
[email protected]

Claudia A. Sagastizabal
IMPA
Visiting Researcher
[email protected]

Mikhail V. Solodov
Inst de Math Pura e Aplicada
[email protected]

MS115

A Unified Approach to Error Bounds for Structured Convex Optimization Problems

Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper convex function. Such a class encapsulates not only fairly general constrained minimization problems but also various regularized loss minimization formulations in machine learning, signal processing, and statistics. Using our framework, we show that a number of existing error bound results can be recovered in a unified and transparent manner. To further demonstrate the power of our framework, we apply it to a class of nuclear-norm regularized loss minimization problems and establish a new error bound for this class under a strict complementarity-type regularity condition. We then complement this result by constructing an example to show that the said error bound could fail to hold without the regularity condition. We believe that our approach will find further applications in the study of error bounds for structured convex optimization problems.

Zirui Zhou
Simon Fraser University
[email protected]

MS116

Relaxations of the Complementarity-Based Formulations of the ℓ0 Problem

In many high-dimensional statistical learning problems there is reason to believe that the solution of interest is sparse. To induce sparsity, the objective function of such learning problems includes the ℓ0-norm of the decision variable, which penalizes the number of nonzero elements. For computational tractability, the nonconvex ℓ0-norm is usually replaced with its convex ℓ1-norm surrogate. From the learning perspective, however, recent studies have shown that the local solutions of nonconvex problems can be superior to the global solutions of convex problems. One way to model the ℓ0-norm is by introducing nonconvex complementarity constraints. In this talk, we present our findings on convex relaxations of such nonconvex problems.

Sam Davanloo, Ben Chaiken
Ohio State University
[email protected], [email protected]

MS116

Rank Minimization using a Complementarity Approach

The problem of minimizing the rank of a matrix subject to constraints can be cast equivalently as a semidefinite program with complementarity constraints (SDCMPCC). The formulation requires two positive semidefinite matrices to be complementary. We investigate calmness of locally optimal solutions to the SDCMPCC formulation and hence show that any locally optimal solution is a KKT point, under an assumption on the original set of constraints. We develop a penalty formulation of the problem. We present calmness results for locally optimal solutions to the penalty formulation. We also develop a proximal alternating linearized minimization (PALM) scheme for the penalty formulation, and investigate the incorporation of a momentum term into the algorithm. Computational results are presented.

John E. Mitchell
Rensselaer Polytechnic Institute
110, 8th Street, Troy, NY, USA
[email protected]

Xin Shen
Monsanto
St Louis MO
[email protected]

MS116

Algorithms for Dictionary Learning and Sparse Coding Problems

This talk discusses numerical schemes for sparse coding and its extension, convolutional sparse coding, arising from machine learning, high-dimensional statistics, and big data analysis. The sparse coding problems give rise to nonconvex, nonsmooth, and large-scale optimization problems. Using recent techniques for sparse optimization, we develop first-order necessary and second-order sufficient optimality conditions for a local minimizer. Based on these conditions, a Newton scheme is proposed to achieve local second-order convergence, where a convex program is solved at each step. Aubin's property is established and used to prove the second-order convergence. Finally, statistical properties of the proposed algorithms will be discussed.

Jinglai Shen
University of Maryland Baltimore County
[email protected]

Xiao Wang
Dept. of Statistics, Purdue University
[email protected]

MS116

On the Global Resolution of ℓ0-Norm Minimization Problems via ADMM Schemes

We consider the minimization of an objective function containing an ℓ0-norm term, which implicitly emphasizes the need for a sparse minimizer. Such problems are widely considered in image processing and statistical learning. However, because of the nonconvex and discontinuous nature of this norm, many use the ℓ1-norm or other variants as substitutes, and far less is known about directly solving the ℓ0 problem. In this work, we consider an approach for the direct resolution of this problem through a reformulation as an equivalent continuous mathematical program with complementarity constraints (MPCC). We propose solving this problem with an ADMM scheme, which requires the solution of two subproblems in each iteration. Of these, the first is convex while the second is nonconvex. However, we observe that this nonconvex subproblem possesses a hidden convexity property and its solution can be provided in closed form. Based on this observation, both subproblems may be solved efficiently. Numerical experiments indicate that the scheme produces near globally optimal solutions and the effort scales modestly with problem size.

Yue Xie
Pennsylvania State University
[email protected]

Uday Shanbhag
Department of Industrial and Manufacturing Engineering
Pennsylvania State University
[email protected]

MS117

Solving a Quadratic Problem on a Bounded Integer Domain Using a Quantum Annealer

Quantum annealers solve Quadratic Unconstrained Binary Optimization (QUBO) problems via transformation to Ising models. Current implementations of quantum devices, however, have limited numbers of qubits and are furthermore prone to various sources of noise. In practice, this restricts the usage of the quantum devices to a limited number of qubits and a limited range of applicable ferromagnetic biases and couplings. One way to solve unconstrained quadratic integer problems using quantum annealers is encoding integer variables as binary ones. In this talk, we discuss several integer encodings and propose one that is more robust to the limitations of the currently available quantum devices.
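
For illustration, here is the standard truncated base-2 encoding of a bounded integer variable and the resulting QUBO matrix for an integer quadratic objective (a minimal sketch; the robust encoding proposed in the talk is not reproduced here, and all names are ours):

import numpy as np

def encode_weights(ub):
    # Weights c with x = c @ b, b binary, so x ranges exactly over {0, ..., ub}.
    c, p = [], 1
    while sum(c) + p < ub:
        c.append(p)
        p *= 2
    c.append(ub - sum(c))    # truncate the last weight to respect the bound
    return np.array(c, dtype=float)

def integer_qp_to_qubo(P, q, ubs):
    # Map min x^T P x + q^T x (P symmetric) over integers 0 <= x_i <= ubs[i]
    # to a QUBO matrix Q, i.e., min b^T Q b over binary b.
    blocks = [encode_weights(u) for u in ubs]
    C = np.zeros((len(ubs), sum(len(w) for w in blocks)))   # x = C b
    j = 0
    for i, w in enumerate(blocks):
        C[i, j:j + len(w)] = w
        j += len(w)
    Q = C.T @ P @ C
    Q += np.diag(q @ C)      # linear terms: b_i^2 = b_i puts them on the diagonal
    return Q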

Sahar Karimi
University of Waterloo
[email protected]

MS117

Optimization: Quantum vs Classical

Can quantum computers meet the tantalizing promise of solving complex calculations – such as optimization problems or database queries – faster than classical computers based on transistor technologies? Although IBM recently opened up their five-qubit programmable quantum computer for the public to tinker with, the holy grail of a useful large-scale programmable universal quantum computer is decades away. While working mid-scale programmable special-purpose quantum optimization machines exist, a conclusive detection of quantum speedup remains controversial despite promising results by Google Inc. In this talk I outline how we can predict the typical difficulty of optimization problems without solving them, with the goal of "tickling" any quantumness out of these machines while, at the same time, aiding in the search for the "killer" application domain where quantum optimization might excel. Finally, an overview of different sequential, non-tailored, as well as specialized tailored classical state-of-the-art algorithms is given. Current quantum annealing technologies must outperform these to claim the crown in the race for quantum speedup.

Helmut G. Katzgraber
Texas A&M University
[email protected]

MS117

Building a Performance Model for Quantum Annealing

A primary objective in optimization research is to build performance models for heuristic algorithms, which describe the relationships between input properties, algorithm parameters, and performance. D-Wave's annealing-based quantum computers fall within the adiabatic quantum model of computation. Since theoretical results in this model are sparse, we use empirical methods to understand and characterize performance. I will describe some challenges that arise when developing performance models for these novel computing systems, and survey what is known so far.

Catherine McGeoch
D-Wave Systems
[email protected]

MS117

Leveraging Quantum Computing for Optimizing Data Analysis

Abstract not available at time of publication.

Immanuel Trummer
Cornell University
[email protected]

MS118

Estimating Large Covariance Matrices Using Proximity Operators

The covariance matrix is one of the most fundamental objects in statistics, and yet its reliable estimation is notoriously difficult in high-dimensional settings. The sample covariance matrix is known to perform poorly as an estimator when the number of variables is large relative to the sample size. Indeed, the key to tractable estimation of a high-dimensional covariance matrix lies in exploiting any knowledge of the covariance structure. Convex regularization penalties are a natural way to incorporate such problem-specific structural knowledge. In this talk, we discuss several recently introduced estimators of the covariance matrix which are formulated as proximity operators of certain convex penalties.
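
As one of the simplest examples of this viewpoint (a sketch assuming an off-diagonal ℓ1 penalty, not necessarily among the estimators discussed in the talk), the proximity operator of lam * sum_{i != j} |S_ij| applied to the sample covariance is entrywise soft thresholding:

import numpy as np

def soft_threshold_covariance(X, lam):
    # Prox of the off-diagonal l1 penalty evaluated at the sample covariance.
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))   # leave the variances unpenalized
    return T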

Jacob Bien
Department of Biological Statistics
Cornell University
[email protected]

MS118

Proximal Methods for Penalized Concomitant M-Estimators

In statistics, the task of simultaneously estimating a regression vector and an additional model parameter is often referred to as concomitant estimation. Huber introduced a generic method for formulating "maximum likelihood-type" estimators (or M-estimators) with a concomitant parameter from a convex criterion. Recent work in penalized regression has extended this framework to the high-dimensional setting. In this contribution, we provide a generic proximal algorithmic approach to solving convex optimization problems associated with this class of penalized concomitant M-estimators. The class of estimators includes the scaled Lasso and the TREX as special cases. We present novel proximity operators arising from different concomitant estimators and show their applicability in standard proximal algorithm schemes. We illustrate both the algorithmic performance of the optimization routines and the statistical performance of the different estimators on selected synthetic and real-world examples.

Christian L. Mueller
Simons Center for Data Analysis
Simons Foundation
[email protected]

Patrick L. Combettes
Department of Mathematics
North Carolina State University
[email protected]

MS118

A New Perspective on Stochastic 0th and 1st Order Proximal Methods

Abstract not available at time of publication.

Vianney Perchet
CMLA, Research Center for Applied Maths
Ecole normale superieure de Cachan
[email protected]

MS118

Hierarchical Convex Optimization and Proximal Splitting for High-Dimensional Statistics

The proximal splitting algorithms are powerful convex optimization tools to exploit multiple prior knowledge for improvements of a variety of high-dimensional estimation tasks in data science and engineering. These algorithms can iteratively approximate an unspecial vector among possibly infinitely many minimizers of a superposition of multiple nonsmooth convex functions. With elegant translations of the solution set, i.e., the set of all minimizers, into the fixed point sets of nonexpansive mappings, the hybrid steepest descent method allows a hierarchical optimization, i.e., a further strategic selection of a most desirable vector from the solution set, by minimizing an additional convex function over the solution set. In this talk, we introduce fixed point theoretic interpretations of the proximal splitting algorithms and their enhancements by the hybrid steepest descent method, with applications to recent advanced statistical estimation problems. The applications include certain enhancements of the state-of-the-art concomitant estimators, e.g., the Sqrt-Lasso estimator, the TREX estimator, and the generalized TREX estimator. This talk is based on joint work with Dr. Yamagishi of the Tokyo Institute of Technology.

Isao Yamada
Dept. of Communications and Computer Engineering
Tokyo Institute of Technology
[email protected]

MS119

Sketchy Decisions: Convex Optimization with Optimal Storage

This work proposes a novel paradigm for designing optimization algorithms for large-scale convex problems with structured solutions. The two key ideas are (i) to maintain only a sketch of the decision variable and (ii) to use algorithmic templates that are compatible with these sketches. A natural target for this approach is the class of convex matrix optimization problems with low-rank solutions. This talk develops and analyzes a new sketch-based algorithm for a model problem from this class. Numerical evidence establishes that the algorithm scales to huge instances while retaining the advantages of convex optimization. Joint work with Alp Yurtsever, Quoc Tran Dinh, Madeleine Udell, and Joel Tropp.

Volkan Cevher, Alp Yurtsever, Quoc Tran Dinh
EPFL
[email protected], [email protected], [email protected]

Madeleine R. Udell
Stanford University
Stanford, CA
[email protected]

Joel Tropp
Applied and Computational Mathematics
Caltech
[email protected]

MS119

Recent Advances in Frank-Wolfe Optimization

We will review the venerable Frank-Wolfe algorithm and its variants in the context of modern applications in machine learning and signal processing. We will summarize some of its latest algorithmic and theoretical improvements, including a global linear convergence analysis of variants of Frank-Wolfe that seeded further developments presented in this symposium. We will also mention current open questions, including the application of Frank-Wolfe to saddle point problems.
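
For reference, the basic Frank-Wolfe iteration the talk builds on (a minimal sketch with the standard 2/(k+2) step size; all names are ours):

import numpy as np

def frank_wolfe(grad, lmo, x0, iters=100):
    # Projection-free: each step solves a linear problem over the feasible set.
    x = x0.astype(float)
    for k in range(iters):
        s = lmo(grad(x))                 # argmin_{s in P} <grad f(x), s>
        x += 2.0 / (k + 2.0) * (s - x)   # convex combination keeps x feasible
    return x

# Example: minimize ||x - y||^2 over the probability simplex.
y = np.array([0.3, -0.2, 0.9])
lmo = lambda g: np.eye(len(g))[np.argmin(g)]   # best vertex of the simplex
x_star = frank_wolfe(lambda x: 2.0 * (x - y), lmo, np.ones(3) / 3)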

Simon Lacoste-Julien
University of Montreal
[email protected]

MS119

Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes

Recently, several works have shown that natural modifications of the classical conditional gradient method (aka the Frank-Wolfe algorithm) for constrained convex optimization provably converge with a linear rate when the feasible set is a polytope and the objective is smooth and strongly convex. However, all of these results suffer from significant shortcomings: 1) large memory requirements (and running time) due to storing an explicit convex decomposition of the current iterate, and 2) worst-case convergence rates that depend unfavorably on the dimension. In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings. In particular, both memory and computation overheads are only linear in the dimension, and in addition, in case the optimal solution is sparse, the new convergence rate replaces a factor which is at least linear in the dimension in previous work with a linear dependence on the number of non-zeros in the optimal solution. At the heart of our method, and the corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to arbitrary polytopes, they apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more.

Dan Garber
TTI Chicago
[email protected]

Ofer Meshi
[email protected]

MS119

Polytope Conditioning and Linear Convergence of the Frank-Wolfe Algorithm

It is known that the gradient descent algorithm converges linearly when applied to a strongly convex function with Lipschitz gradient. In this case the rate of convergence is determined by the condition number of the function. In a similar vein, it has been shown that a variant of the Frank-Wolfe algorithm with away steps converges linearly when applied to a strongly convex function over a polytope. In a nice extension of the unconstrained case, the rate of convergence is determined by the product of the condition number of the function and a certain condition number of the polytope. We shed new light on the latter type of polytope conditioning. In particular, we show that previous and seemingly different approaches to defining a suitable condition measure for the polytope are essentially equivalent to each other. Perhaps more interestingly, they can all be unified via a parameter of the polytope that formalizes a key premise linked to the linear convergence of the algorithm. We also give new insight into the linear convergence property. For a convex quadratic objective, we show that the rate of convergence is determined by the condition number of a suitably scaled polytope.

Javier Pena, Daniel Rodriguez
Carnegie Mellon University
[email protected], [email protected]

MS120

Quantum Bilinear Optimization

We study optimization programs given by a bilinear form over non-commutative variables subject to linear inequalities. Problems of this form include the entangled value of two-prover games, entanglement-assisted coding for classical channels, and quantum-proof randomness extractors. We introduce an asymptotically converging hierarchy of efficiently computable semidefinite programming (SDP) relaxations for this quantum optimization. This allows us to give upper bounds on the quantum advantage for all of these problems. Compared to previous work of Pironio, Navascues and Acin, our hierarchy has additional constraints. By means of examples, we illustrate the importance of these new constraints both in practice and for analytical properties. Moreover, this allows us to give a hierarchy of SDP outer approximations for the completely positive semidefinite cone introduced by Laurent and Piovesan.

Mario Berta
Institute for Quantum Information and Matter
California Institute of Technology
[email protected]

MS120

Quantum Channels, Spectrahedra and a Tracial Hahn-Banach Theorem

This talk will discuss matrix convex sets and their tracial analogs, which we call contractively tracial convex sets. In both contexts completely positive (cp) maps play a central role: unital cp maps in the case of matrix convex sets and trace-preserving cp (CPTP) maps in the case of contractively tracial convex sets. CPTP maps, also known as quantum channels, are fundamental objects in quantum information theory. Free convexity is intimately connected with linear matrix inequalities (LMIs) L(x) = A_0 + A_1 x_1 + ... + A_g x_g ⪰ 0 and their matrix convex solution sets {X : L(X) is positive semidefinite}, called free spectrahedra. The Effros-Winkler Hahn-Banach Separation Theorem for matrix convex sets states that matrix convex sets are solution sets of LMIs with operator coefficients. Motivated in part by cp interpolation problems, we will develop the foundations of convex analysis and duality in the tracial setting, including tracial analogs of the Effros-Winkler Theorem. This is based on joint work with J. William Helton and Scott McCullough.

Igor Klep
Department of Mathematics
The University of Auckland
[email protected]

MS120

Convex Separation from Convex Optimization for Large-Scale Problems

We present a scheme, based on Gilbert's algorithm for quadratic minimization [SIAM J. Control, vol. 4, pp. 61-80, 1966], to prove separation between a point and an arbitrary convex set S ⊆ R^n via calls to an oracle able to perform linear optimizations over S. Compared to other methods, our scheme has almost negligible memory requirements, and the number of calls to the optimization oracle does not depend on the dimensionality n of the underlying space. We study the speed of convergence of the scheme under different promises on the shape of the set S and/or the location of the point, validating the accuracy of our theoretical bounds with numerical examples. Finally, we present some applications of the scheme in quantum information theory. There we find that our algorithm outperforms existing linear programming methods for certain large-scale problems, allowing us to certify nonlocality in bipartite scenarios with up to 42 measurement settings. We apply the algorithm to upper bound the visibility of two-qubit Werner states, hence improving known lower bounds on Grothendieck's constant K_G(3). Similarly, we compute new upper bounds on the visibility of GHZ states and on the steerability limit of Werner states for a fixed number of measurement settings.
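
In essence, Gilbert's algorithm is a Frank-Wolfe-type iteration on the squared distance; a minimal sketch assuming a linear optimization oracle over S (all names are ours):

import numpy as np

def gilbert(p, oracle, x0, iters=1000, tol=1e-9):
    # Approach the projection of p onto a convex set S using only
    # linear optimizations s = argmax_{s in S} <d, s>.
    x = x0.astype(float)
    for _ in range(iters):
        d = p - x
        s = oracle(d)
        gap = d @ (s - x)
        if gap <= tol:        # no progress possible: d certifies separation
            break
        t = min(1.0, gap / ((s - x) @ (s - x)))   # exact line search, clipped
        x += t * (s - x)
    return x, np.linalg.norm(p - x)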

Miguel Navascues
Institute of Quantum Optics and Quantum Information
Austrian Academy of Sciences
[email protected]

MS120

Using Noncommutative Polynomial Optimization for Matrix Factorization Ranks

In this talk I will explain how tracial noncommutative polynomial optimization can be used to compute lower bounds on the cpsd-rank (completely positive semidefinite rank) of a matrix. Here the cpsd-rank is a quantum-inspired generalization of the cp-rank, where we consider Gram factorizations by positive semidefinite matrices instead of nonnegative vectors. Inspired by this approach we have also derived new techniques to lower bound the cp-rank, and I will compare this to recent results by Fawzi and Parrilo. Finally, I will discuss how this relates to the dimension required to realize quantum correlations.

Sander Gribling, David de Laat
Centrum Wiskunde & Informatica, The Netherlands
[email protected], [email protected]

Monique Laurent
CWI, Amsterdam and Tilburg University
[email protected]

MS121

A Dual Active-Set Algorithm for Regularized Monotonic Regression

Monotonic (isotonic) regression (MR) is a powerful tool used for solving a wide range of important applied problems. One of its features, which poses a limitation on its use in some areas, is that it produces a piecewise constant fitted response. For smoothing the fitted response, we introduce a regularization term in the MR formulated as a least-distance problem with monotonicity constraints. The resulting Smoothed Monotonic Regression (SMR) is a convex quadratic optimization problem. We focus on the SMR where the set of observations is completely (linearly) ordered. Our Smoothed Pool-Adjacent-Violators (SPAV) algorithm is designed for solving the SMR. It belongs to the class of dual active-set algorithms. We proved its finite convergence to the optimal solution in, at most, n iterations, where n is the problem size. One of its advantages is that the active set is progressively enlarged by including one or, typically, more constraints per iteration. This resulted in solving large-scale SMR test problems in a few iterations, whereas the size of these problems was prohibitively large for conventional quadratic optimization solvers. Although the complexity of the SPAV algorithm is O(n^2), its running time grew almost linearly with n in our computational experiments.
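
For context, the classical (unregularized) pool-adjacent-violators algorithm that SPAV smooths and generalizes; a minimal sketch producing the piecewise constant fit mentioned above:

import numpy as np

def pav(y):
    # Isotonic regression: pool adjacent blocks whose means violate monotonicity.
    means = [float(v) for v in y]
    counts = [1] * len(means)
    i = 0
    while i < len(means) - 1:
        if means[i] > means[i + 1]:
            total = means[i] * counts[i] + means[i + 1] * counts[i + 1]
            counts[i] += counts[i + 1]
            means[i] = total / counts[i]
            del means[i + 1], counts[i + 1]
            i = max(i - 1, 0)   # the merged block may now violate its predecessor
        else:
            i += 1
    return np.repeat(means, counts)   # piecewise constant fitted response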

Oleg Burdakov, Oleg Sysoev
Linkoping University
[email protected], [email protected]

MS121

Parameter Estimation and Variable Selection for Big Systems of High-Dimensional Linear Ordinary Differential Equations

In this talk, we will introduce several new methods, based on a new matrix framework, to solve an inverse problem in a big system of high-dimensional linear ordinary differential equations. In the procedure to reconstruct the system, both parameter estimation and variable selection problems are discussed. Finally, several applications are introduced to show the efficiency of our algorithms on real-world challenges.

Leqin Wu
Jinan University
[email protected]

Ya-Xiang Yuan
Chinese Academy of Sciences
Inst of Comp Math & Sci/Eng Computing
[email protected]

Xing Qiu
University of Rochester
[email protected]

Hulin Wu
Department of Biostatistics
University of Texas Health Science Center at Houston
[email protected]

MS121

A TV-SCAD Approach for Image Deblurring with Impulsive Noise

We consider the image deblurring problem in the presence of impulsive noise. It is well known that total variation (TV) regularization with L1-norm data fitting (TVL1) works well only when the level of impulsive noise corruption is low. For high-level impulsive noise, the TVL1 model works poorly, mainly because the corrupted data is overfitted. In this paper, we propose to use TV regularization with nonconvex SCAD data fitting for image deblurring with high-level impulsive noise. By decomposing the SCAD function as the difference of two convex functions, the TV-SCAD model is reformulated as a DC program. We then adopt a classical DC algorithm to solve the TV-SCAD model via solving a sequence of TVL1-equivalent problems. Theoretically, it is guaranteed that any limit point of the sequence of points generated by the proposed approach is a critical point of the nonconvex optimization problem. Numerically, we present extensive numerical results to demonstrate the superiority of the proposed approach.

Junfeng Yang
Nanjing University, China
[email protected]

MS121

Completely Positive Binary Tensors

A symmetric tensor is completely positive (CP) if it is a sum of tensor powers of nonnegative vectors. This paper characterizes completely positive binary tensors. We show that a binary tensor is completely positive if and only if it satisfies two linear matrix inequalities. This result can be used to determine whether a binary tensor is completely positive or not. When it is, we give an algorithm for computing its cp-rank and the decomposition. When the order is odd, we show that the cp-rank decomposition is unique. When the order is even, we completely characterize when the cp-rank decomposition is unique. We also discuss how to compute the nearest cp-approximation when a binary tensor is not completely positive.

Jinyan Fan
Department of Mathematics, Shanghai Jiao Tong University
Shanghai 200240, P.R. China
[email protected]

Jiawang Nie
University of California, San Diego
[email protected]

Anwa Zhou
Shanghai University, China
[email protected]

MS122

Adaptive Finite Element Solvers for MPECs in Function Space

Abstract not available at time of publication.

Michael Hintermueller
Humboldt-University of Berlin
[email protected]

MS122

PDE-Constrained Optimization Using Epi-Regularized Risk Measures

Many science and engineering applications necessitate the optimization of PDEs with uncertain inputs such as coefficients, boundary conditions and initial conditions. In this talk, I formulate these problems as risk-averse optimization problems in Banach space. Such problems are often nonsmooth and require an enormous number of samples to accurately quantify the uncertainty in the governing PDE. To circumvent these issues, I present a general smoothing technique for risk measures based on epigraphical calculus. I show that the resulting smoothed risk measures are differentiable and epi-converge to the original nonsmooth risk measure. Moreover, I prove consistency of this approximation for both minimizers and stationary points. I conclude with numerical examples demonstrating these results.

Drew P. Kouri
Optimization and Uncertainty Quantification
Sandia National Laboratories
[email protected]

MS122

Optimal Control of Quasilinear Parabolic Equations with State Constraints

We present a modern abstract framework for optimal control problems governed by quasilinear parabolic evolution equations in divergence form. The optimal control problem is presumed to also include state constraints. To obtain continuity of the state, we build upon maximal parabolic regularity techniques combined with recent elliptic regularity results for negative Sobolev spaces with integrability order greater than the space dimension. This way, we are able to handle rather involved forcing terms in the quasilinear evolution equation, including ones which arise as solution operators to a number of subordinated equations; in this sense, we also treat systems of PDEs at once. For the optimal control problem, the quasilinear structure gives rise to further complications, in particular the lack of suitable a priori bounds for sequences of states. Moreover, we cannot exclude the phenomenon of blow-up of solutions. We thus restrict the optimal control problem to the set of controls admitting global-in-time solutions and show that doing so, under standard assumptions, gives the same optimality conditions as one would have obtained if the problem of blow-up did not arise. The downside of this ansatz is that the proof of existence of globally optimal controls becomes harder, for which we propose to use the objective functional to add favorable properties to an infimal sequence.

Hannes Meinlschmidt
TU Darmstadt
[email protected]

MS122

Risk-Averse PDE-Constrained Optimization and Nash Equilibrium Problems

Uncertainty is ubiquitous in engineering and the natural sciences. Thus, whenever we model real-world problems it is important that this uncertainty is incorporated. For many applications, this leads to partial differential equations (PDEs) with uncertain inputs. Upon transitioning from modeling and simulation to optimization, we typically arrive at a stochastic optimization problem with distributed parameters or decision variables. In order to handle the randomness of the objective functional and control-to-state mapping, we make use of risk measures. This leads to nonsmooth, stochastic PDE-constrained optimization problems. After establishing reasonable conditions that allow us to prove the existence of a solution and derive first-order optimality conditions, we discuss smoothing techniques (including their asymptotic behavior). In addition, we demonstrate how these ideas can be extended to (multiobjective) PDE-constrained generalized Nash equilibrium problems and so-called MOPECs (multiple optimization problems with equilibrium constraints). We then demonstrate the effects of various risk measures on the overall cost and design of the decisions via numerical examples for each of the problem classes mentioned above.

Thomas M. Surowiec
Department of Mathematics and Computer Science
Philipps University of Marburg
[email protected]

MS123

A Universal Catalyst for First-Order Optimization

Abstract not available at time of publication.

Zaid Harchaoui
University of Washington
[email protected]

MS123

A Novel, Simple Interpretation of Nesterov's Accelerated Method as a Combination of Gradient and Mirror Descent

Abstract not available at time of publication.

Lorenzo Orecchia
Boston University
[email protected]

MS123

A Variational Perspective on Accelerated Methods in Optimization

Abstract not available at time of publication.

Ashia Wilson
Statistics Dept.
U.C. Berkeley
[email protected]

MS123

Regularized Nonlinear Acceleration

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
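
A minimal sketch of this extrapolation step as described (iterate differences, a regularized linear system for the weights; all names are ours):

import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    # xs: list of iterates x_0, ..., x_k produced by any base method.
    X = np.array(xs, dtype=float)
    R = np.diff(X, axis=0)                     # matrix of iterate differences
    k = R.shape[0]
    z = np.linalg.solve(R @ R.T + lam * np.eye(k), np.ones(k))
    c = z / z.sum()                            # normalized weights
    return c @ X[:k]                           # nonlinear average of iterates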

Alexandre d'Aspremont
CNRS - ENS, Paris
[email protected]

MS124

Optimization of Spatiotemporal Fractionation in Photon Radiotherapy with Optimality Bounds

Radiotherapy treatments are typically fractionated, which means that radiation dose is delivered over several days or weeks rather than in a single treatment session. In current clinical practice, the total radiation dose is split evenly and the patient is treated in the same manner on every day of the treatment. It has recently been demonstrated that treatments can be improved by altering the radiation dose distribution from day to day in a practice called nonuniform fractionation. In contrast to the uniformly fractionated treatments that can be computed with convex optimization algorithms, nonuniformly fractionated treatments are based on a biological dose-response model called biologically effective dose (BED), which leads to a nonconvex quadratic formulation. In the current work, we solve nonuniformly fractionated treatment plan models to local optimality and use a semidefinite programming relaxation to prove the near-optimality of the computed plans. Using clinical liver cases, we demonstrate that nonuniform fractionation can substantially reduce the biological effect of radiation in the healthy liver tissue while maintaining treatment effectiveness. In addition, we find that the computed locally optimal treatment plans are close to realizing the maximum potential benefit of nonuniform fractionation. This is joint work with David Papp and Jan Unkelbach.

Melissa Gaddy
Department of Mathematics
North Carolina State University
[email protected]

David Papp
North Carolina State University
[email protected]

Jan Unkelbach
Department of Radiation Oncology
Massachusetts General Hospital and Harvard Medical School
[email protected]

MS124

Multi-Modality Optimal Radiation Therapy

Each radiation type currently used for treating cancers, e.g., X-rays, protons, neutrons, has its unique radiobiological power (cell-kills for a given physical dose) and physical dose distribution (relative dose between tumor and normal tissue). We present a multi-modality optimal radiation therapy (MMORT) approach, which balances a trade-off between different radiation types by optimizing the number of fractions and a fractional dose for each modality. We maximize the biological effect, as defined by a linear-quadratic cell survival model, on the tumor while constraining the normal tissue damage to an acceptable level. We first present a spatiotemporally separated MMORT model to gain qualitative insights into the problem. This formulation leads to a QCQP, which is solved using KKT conditions. Second, we present a spatiotemporally integrated formulation. Optimization variables in this case are the radiation intensities of the beamlets, which then determine a spatial dose distribution, and the number of fractions for each modality. Various OAR constraint types prevalent in practice, i.e., mean dose, maximum dose, and dose-volume constraints, are incorporated. Numerical simulations using dual modalities with multiple OAR constraints will be used to provide structural insights into the optimal solutions and quantify the potential benefits of MMORT.

Minsun Kim
Department of Radiation Oncology
University of Washington School of Medicine
[email protected]

Archis Ghate, Sevnaz Nourollahi
University of Washington
[email protected], [email protected]

MS124

Biologically-Informed Radiotherapy Planning using a Global Optimization Approach

Radiotherapy is one of the main modalities for cancer treatment. The goal of radiotherapy is to deliver sufficient radiation dose to the tumor region to eradicate the disease while sparing the surrounding healthy tissue to the largest extent possible. To achieve this goal, radiotherapy plans for individual cancer patients are designed to deliver the desired spatial dose distribution to the patient. The radiotherapy plan will then be used on a daily basis to deliver a daily fraction of the prescribed radiation dose over the course of the treatment. However, there is biological evidence suggesting that additional therapeutic gain may be achieved if we allow for temporal variation in the radiotherapy plan. This research aims at developing a spatiotemporal radiotherapy planning approach to evaluate the potential benefit of varying radiotherapy plans, and thus the spatial dose distribution, over the treatment course. The spatiotemporal planning problem is modeled as a nonconvex quadratically-constrained quadratic programming problem, which is solved using global optimization techniques. The proposed approach is applied to a clinical cancer case in order to test the computational performance of the solution method and to quantify the potential therapeutic benefit of varying the radiotherapy plan over the course of the treatment.

Ehsan Salari
Wichita State University
[email protected]

Ali Adibi
Department of Industrial, Systems, and Manufacturing Eng.
Wichita State University
[email protected]

MS124

Adaptive SBRT Planning for Interfraction Motion

Conventional treatment is delivered in 30 fractions (treatments) over approximately one month. However, SBRT delivers up to 5 fractions in up to two weeks; interfraction motion (changes in patient geometry between fractions) can result in overdosing organs at risk (OAR) near the tumor. Current adaptive strategies may reduce dose to OARs, but at the cost of under-dosing the tumor without compensation in future fractions. We investigate adaptive planning strategies that compensate for fractions with these so-called "unfavorable" geometries that under-dose the tumor by boosting (increasing) dose in fractions with "favorable" geometry (while still satisfying OAR dose limits) via a stochastic programming approach.

Victor Wu
University of Michigan
[email protected]

Marina A. Epelman
University of Michigan
Industrial and Operations Engineering
[email protected]

Randall Ten Haken
University of Michigan
[email protected]

Kristy Brock
MD Anderson Cancer Center
[email protected]

Martha Matuszak
University of Michigan
[email protected]

MS125

Asynchronous Parallel Operator Splitting Methods for Convex Stochastic Programming

We describe a decomposition method for convex stochastic programs that resembles progressive hedging but only needs to process a subset of the scenarios at each iteration, can vary the penalty parameter by both iteration and scenario, and can operate asynchronously. We discuss how to implement the algorithm and how it is derived from more general block-iterative and projective splitting frameworks. We describe some applications and computational results.
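
For orientation, a synchronous progressive hedging sketch of the kind this method generalizes (the variant in the talk visits only a subset of scenarios per iteration, asynchronously, with scenario-dependent penalties; all names are ours):

import numpy as np

def progressive_hedging(solve_scenario, probs, n, rho=1.0, iters=100):
    # solve_scenario(s, w, xbar) must return
    #   argmin_x f_s(x) + w @ x + (rho / 2) * ||x - xbar||^2.
    probs = np.asarray(probs, dtype=float)
    S = len(probs)
    xs, ws = np.zeros((S, n)), np.zeros((S, n))
    xbar = np.zeros(n)
    for _ in range(iters):
        for s in range(S):
            xs[s] = solve_scenario(s, ws[s], xbar)
        xbar = probs @ xs                 # probability-weighted consensus
        ws += rho * (xs - xbar)           # price (dual) update per scenario
    return xbar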

Jonathan Eckstein
Rutgers University
[email protected]

Jean-Paul Watson
Sandia National Laboratories
Discrete Math and Complex Systems
[email protected]

MS125

Object-Parallel Augmented Lagrangian Solution of Continuous Stochastic Programs

We describe an "object-parallel" C++ approach to implementing iterative optimization methods, with the goal of coding the algorithms in a readable, MATLAB-like style, but with efficient parallel implementation of the underlying low-level linear algebra operations. Recent advances in augmented Lagrangian algorithms and mathematical modeling environments, combined with our "object-parallel" software development techniques, suggest efficiently scalable solver implementations for stochastic programming problems, without employing decomposition methods.

Gyorgy Matyasfalvi, Jonathan Eckstein
Rutgers University
[email protected], [email protected]

MS125

Scenario-Based Decomposition for Parallel Solution of the Contingency-Constrained ACOPF

We present a nonlinear stochastic programming formulation for a large-scale contingency-constrained optimal power flow problem. Using a rectangular IV formulation to model AC power flow in the transmission network, we construct a nonlinear, multi-scenario optimization formulation where each scenario considers nominal operation followed by a failure of an individual transmission element. Given the number of potential failures in the network, these problems are very large, yet need to be solved rapidly. In this paper, we demonstrate that this multi-scenario problem can be solved quickly using a parallel decomposition approach based on progressive hedging and nonlinear interior-point methods. Parallel and serial timing results are shown using test cases from Matpower, a MATLAB-based framework for power flow analysis.

Jean-Paul Watson
Sandia National Laboratories
Discrete Math and Complex Systems
[email protected]

Carl Laird, Anya Castillo
Sandia National Laboratories
[email protected], [email protected]

MS125

Highly Parallelized, Exact Progressive Hedging for Stochastic MIPs

Progressive hedging, though an effective heuristic for stochastic mixed-integer programs (MIPs), is not guaranteed to converge in this case. Here, we describe BBPH, a branch-and-bound algorithm that uses PH within each node and that, given enough time, will always converge to the optimal solution. In addition to providing a theoretically convergent wrapper for PH applied to MIPs, computational results demonstrate that for some difficult problem instances branch and bound can find improved solutions.

David Woodruff
University of California, Davis
[email protected]

MS126

Finding Planted Graphs with Few Eigenvalues using the Schur-Horn Relaxation

Extracting structured planted subgraphs inside large graphs is a fundamental question that arises in a range of domains. We describe a tractable approach based on convex optimization to recover certain families of graphs embedded in larger graphs containing spurious edges. Our method relies on tractable semidefinite descriptions of the spectrum of a matrix, and we give conditions on the eigenstructure of a planted graph in relation to the noise level under which our algorithm succeeds.

Utkan Onur Candogan
California Institute of Technology
[email protected]

Venkat Chandrasekaran
California Institute of Technology
[email protected]

MS126

Graph Structure in Polynomial Systems: Chordal Networks

We introduce a novel representation of structured polynomial ideals, which we refer to as chordal networks. The sparsity structure of a polynomial system is often described by a graph that captures the interactions among the variables. Chordal networks provide a computationally convenient decomposition of a polynomial ideal into simpler (triangular) polynomial sets, while preserving its underlying graphical structure. We show that many interesting families of polynomial ideals admit compact chordal network representations (of size linear in the number of variables), even though the number of components could be exponentially large. Chordal networks can be computed for arbitrary polynomial systems using a refinement of the chordal elimination algorithm from (Cifuentes 2016). Furthermore, they can be effectively used to obtain several properties of the variety, such as its dimension, cardinality and equidimensional components, as well as an efficient probabilistic test for radical ideal membership. We apply our methods to examples from algebraic statistics and vector addition systems; for these instances, algorithms based on chordal networks outperform existing techniques by orders of magnitude.

Diego Cifuentes, Pablo A. Parrilo
Massachusetts Institute of Technology
[email protected], [email protected]

MS126

Local Graph Profiles: Algorithms and Large-Scale Applications

We study the problem of counting the number of subgraphs incident on each vertex and each edge in a large graph. These counts are called the vertex-local and edge-local graph profiles, respectively, and describe the graph's connectivity structure. We present a state-of-the-art algorithm for local profile counting of 5-subgraphs which can be parallelized in a shared-memory architecture. Additionally, graph profile calculation is applied to graph-theoretic optimization problems such as clustering and classification.
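
The simplest vertex-local profile entry is the per-vertex triangle count, computable directly from the adjacency matrix (a small sketch, not the 5-subgraph algorithm of the talk):

import numpy as np

def triangles_per_vertex(A):
    # diag(A^3) counts closed 3-walks; each triangle at a vertex is counted
    # twice (once per orientation), hence the division by 2.
    A = np.asarray(A, dtype=np.int64)
    return np.diagonal(A @ A @ A) // 2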

Ethan R. Elenberg
The University of Texas at Austin
[email protected]

Alex Dimakis
University of Texas at Austin
[email protected]

MS126

What You Ask Is What You Get: Query Design and Robust Algorithms for Crowdsourced Clustering

We consider the problem of clustering unlabeled data using a crowd of non-expert workers. For example, clustering a database of images of birds into different species with the collective help of workers who are not experts in bird classification. While they may not be able to label an image directly, they can compare the images and judge whether they are similar. As the workers are not experts, the answers obtained are noisy. Therefore it is crucial to (1) design queries that can reduce the noise levels in the responses, and (2) design algorithms to reliably cluster the noisy data. We consider two query models: the edge query model (comparing two images per query) and the triangle query model (comparing three images per query). Under natural modeling assumptions, we show that triangle queries provide more reliable answers than edge queries. For both query models, we analyze a robust convex clustering algorithm that attempts to decompose a partially observed adjacency matrix into low-rank and sparse components. In this setting, we obtain sufficient conditions for the exact recovery of the clusters. We also provide experiments on real datasets as well as numerical simulations that validate our theoretical results. In particular, (1) triangle queries significantly outperform edge queries and (2) using the convex algorithms to denoise the data substantially reduces the number of errors in clustering.

Ramya Korlakai Vinayak
California Institute of Technology
[email protected]

MS127

Semidefinite Programs with a Dash of Smoothness: Why and When the Low-Rank Approach Works

Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some early theory supports this empirical success, a complete explanation of it remains an open question. In this presentation, we show that the Burer-Monteiro formulation of SDPs in a certain class almost never has any spurious local optima, that is: the non-convexity of the low-rank formulation is benign. This class of SDPs covers applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. The crucial assumption we make is that the low-rank problem lives on a manifold, so that theory and algorithms from optimization on manifolds can be used. Optimization on manifolds is about minimizing a cost function over a smooth manifold, such as spheres, low-rank matrices, orthonormal frames, rotations, etc. We will present the basic framework as well as parts of the more general convergence theory, including recent complexity results. (Toolbox: http://www.manopt.org) Select parts are joint work with P.-A. Absil, A. Bandeira, C. Cartis and V. Voroninski.
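
A minimal numpy sketch of the Burer-Monteiro idea for the max-cut SDP, with the rows of the factor constrained to the unit sphere (the manifold structure referred to above); this is an illustration only, not the Manopt toolbox linked in the abstract:

import numpy as np

def burer_monteiro_maxcut(C, p=10, step=1e-3, iters=2000):
    # Rank-restricted surrogate: min <C, Y Y^T>  s.t.  diag(Y Y^T) = 1.
    n = C.shape[0]
    Y = np.random.randn(n, p)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    for _ in range(iters):
        Y -= step * 2.0 * (C @ Y)                      # Euclidean gradient step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # retract rows to the sphere
    return Y @ Y.T                                     # feasible PSD, rank <= p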

Nicolas Boumal
Mathematics Department
Princeton University
[email protected]

MS127

Fast and Provable Algorithms for Low-Rank Hankel Matrix Completion

In this talk, we consider the problem of low-rank Hankel matrix completion, with applications to the reconstruction of spectrally sparse signals from a random subset of n regular time-domain samples. We introduce an iterative hard thresholding (IHT) algorithm and a fast iterative hard thresholding (FIHT) algorithm for efficient low-rank Hankel matrix completion to reconstruct spectrally sparse signals. Theoretical recovery guarantees have been established for FIHT, showing that O(r^2 log^2(n)) samples are sufficient for exact recovery with high probability. Empirical performance comparisons establish significant computational advantages for IHT and FIHT. In particular, numerical simulations on 3D arrays demonstrate the capability of FIHT on handling large and high-dimensional real data.
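
A compact sketch of the plain IHT iteration for this problem (FIHT's acceleration and precise projections are omitted; all names are ours): alternate a rank-r truncation of the Hankel lift with re-imposing the observed samples.

import numpy as np
from scipy.linalg import hankel, svd

def iht_hankel(obs, mask, r, iters=100):
    # obs: length-n samples (zeros where unobserved); mask: boolean, observed.
    n = len(obs)
    k = n // 2 + 1
    x = np.where(mask, obs, 0.0).astype(complex)
    for _ in range(iters):
        H = hankel(x[:k], x[k - 1:])                  # Hankel lift of the signal
        U, s, Vh = svd(H, full_matrices=False)
        Hr = (U[:, :r] * s[:r]) @ Vh[:r]              # rank-r hard thresholding
        Hf = Hr[:, ::-1]                              # anti-diagonals -> diagonals
        x = np.array([np.diag(Hf, d).mean()           # average back to a signal
                      for d in range(Hr.shape[1] - 1, -Hr.shape[0], -1)])
        x[mask] = obs[mask]                           # enforce data consistency
    return x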

Jian-Feng Cai
Department of Mathematics
Hong Kong University of Science and Technology
[email protected]

Tianming Wang
Department of Mathematics
University of Iowa
[email protected]

Ke Wei
Department of Mathematics
University of California at Davis
[email protected]

MS127

The Projected Power Method: An Efficient Nonconvex Approach for Joint Alignment from Pairwise Differences

Various applications involve assigning discrete label values to a collection of objects based on some noisy data. Due to the discrete—and hence nonconvex—structure of the problem, computing the maximum likelihood estimate (MLE) becomes intractable at first sight. This paper makes progress towards efficient computation of the MLE by focusing on a concrete joint alignment problem—that is, the problem of recovering n discrete variables {x_i | i = 1, ..., n} given noisy observations of their modulo differences {x_i − x_j mod m}. We propose a novel low-complexity procedure, which operates in a lifted space by representing distinct label values in orthogonal directions, and which attempts to optimize quadratic functions over hypercubes. Starting with a first guess computed via a special method, the algorithm successively refines the iterates via projected power iterations. We prove that the proposed projected power method makes no error—and hence converges to the MLE—in a suitable regime. Numerical experiments have been carried out on both synthetic and real data to demonstrate the practicality of our algorithm. We expect this algorithmic framework to be effective for a broad range of discrete assignment problems.
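
In the simplest instance m = 2, the procedure reduces to power iterations with an entrywise projection; a minimal sketch (spectral initialization, then projected refinement; all names are ours):

import numpy as np

def projected_power_two_labels(Y, iters=50):
    # Y: noisy pairwise agreement matrix; labels recovered up to a global flip.
    _, V = np.linalg.eigh(Y)
    x = np.sign(V[:, -1])            # initial guess from the leading eigenvector
    x[x == 0] = 1.0
    for _ in range(iters):
        x = np.sign(Y @ x)           # power step followed by projection
        x[x == 0] = 1.0
    return x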

Yuxin Chen
Department of Statistics
Stanford University
[email protected]

Emmanuel Candes
Stanford University
Departments of Mathematics and of Statistics
[email protected]

MS127

Gradient Descent for Rectangular Matrix Completion

From recommender systems to healthcare analytics, low-rank recovery from partial observations is prevalent in modern data analysis. There has been significant progress over the last decade in providing rigorous guarantees for low-rank recovery problems based on convex relaxation techniques. However, the computational complexity of these algorithms renders them impractical for large-scale applications. Recent advances in nonconvex optimization explain the surprising effectiveness of simple first-order algorithms for many low-rank matrix recovery problems, especially for positive semidefinite matrices. The common theme of these algorithms is to work directly with the low-rank factors of the semidefinite variable. In this talk, I will discuss how similar ideas can be applied to rectangular matrix completion. We provide rigorous convergence guarantees to show that such simple algorithms are effective and can overcome the scalability limits faced by the popular convex relaxation approach.

Qinqing Zheng
Department of Computer Science
University of Chicago
[email protected]

John Lafferty
Department of Statistics
University of Chicago
[email protected]

MS128

Local Complexities and Adaptive Algorithms in Stochastic Convex Optimization

We extend the traditional worst-case, minimax analysis of stochastic convex optimization by introducing a localized form of minimax complexity for individual functions. Our main result gives function-specific lower and upper bounds on the number of stochastic subgradient evaluations needed to optimize either the function or its "hardest local alternative" to a given numerical precision. The bounds are expressed in terms of a localized and computational analogue of the modulus of continuity that is central to statistical minimax analysis. We show how the computational modulus of continuity can be explicitly calculated in concrete cases, and how it relates to the curvature of the function at the optimum. We also prove a superefficiency result that demonstrates it is a meaningful benchmark, acting as a computational analogue of the Fisher information in statistical estimation. We also discuss adaptive algorithms that optimally achieve these rates, showing that fast convergence is often possible.

John C. Duchi
Stanford University
Departments of Statistics and Electrical Engineering
[email protected]

Sabyasachi Chatterjee
University of Chicago
[email protected]

John Lafferty
Department of Statistics
University of Chicago
[email protected]

Yuancheng Zhu
University of Chicago
[email protected]

MS128

Efficient Second Order Online Learning by Sketching

We propose Sketched Online Newton (SON), an online second-order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step, which, via sketching techniques, enjoys a running time linear in the dimension and sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the computation linear in the sparsity of the features. Together, the algorithm eliminates all computational obstacles in previous second-order online learning approaches.

Alekh Agarwal
Microsoft Research New York
[email protected]

Haipeng Luo
Microsoft Research
[email protected]

MS128

Oracle Complexity of Second-Order Methods

Second-order iterative methods, which utilize Hessians as well as gradients, are an integral part of the mathematical optimization toolbox, and have seen a resurgence of interest recently in the context of various large-scale optimization problems. In this talk, I'll describe some recent results on the oracle complexity of such methods, and their (possibly unexpected) limitations in improving on gradient-based methods in terms of worst-case guarantees.

Ohad Shamir, Yossi Arjevani, Ron Shiff
Weizmann Institute of Science
[email protected], [email protected], [email protected]

MS128

Tight Complexity Bounds for Optimizing Composite Objectives

We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs. randomized optimization. For smooth functions, we show that accelerated gradient descent and an accelerated variant of SVRG are optimal in the deterministic and randomized settings, respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity, and we present optimal methods based on smoothing that improve over methods using just gradient accesses.

Blake Woodworth
Toyota Technological Institute at Chicago
[email protected]

Nati Srebro
Toyota Technological Institute at Chicago
[email protected]

MS129

Switching Controls for Reaction-Diffusion Equation

We present first steps towards an efficient numerical method for optimization of advection-diffusion processes described by a partial differential equation, and subject to non-smoothness arising from phase transitions and from switching controls. Our approach is based on a decomposition approach to mixed-integer optimal control with combinatorial constraints, and on a simultaneous optimization framework, in which a discretized PDE enters the optimization problem as a nonlinear constraint. We discuss the application to a Rankine cycle process, widely established in process engineering for recovery of energy from an exhaust heat source, that has recently found novel use as a promising hybridization concept for heavy-duty trucks.

Christian Kirches
Institute for Mathematical Optimization
Braunschweig University of Technology
[email protected]

Paul Manns
Technische Universitat Braunschweig
[email protected]

MS129

On the (Non)Convexity of Different Mixed-Integer Optimal Control Formulations

There is a huge variety of different approaches to solve mixed-integer optimal control problems. The differences exist on several levels and can usually be combined to obtain the final numerical approach: 1) which method is used to solve the optimal control problem in the first place, e.g., a direct or indirect method, dynamic programming, or moment-based approaches; 2) which modeling techniques are used to represent logical implications, e.g., big-M, disjunctive programming, or vanishing constraints; 3) which method is used to discretize the differential equations, e.g., single shooting, multiple shooting, or collocation; and 4) whether variable substitutions or reformulations can be applied, e.g., a switching time optimization. We discuss the aspect of (non)convexity of some of these approaches and give insight on several of the aforementioned levels, e.g., multiple shooting vs. single shooting in the context of global optimal control, and an "increase" in nonconvexity due to the enhanced time transformation technique.

Sebastian Sager
Otto-von-Guericke Universitat Magdeburg, Germany
[email protected]

MS129

New Decomposition Strategies for Mixed-Integer Optimal Control

Mixed-Integer Optimal Control Problems (MIOCPs) are optimization problems that combine the difficulties of combinatorial decisions with underlying dynamical systems. Usually, these combinatorial decisions are characterized as switching decisions between the system's different operation modes. A common example is the optimal gear switching of a truck. We consider a first-discretize, then-optimize approach resulting in a mixed-integer nonlinear program (MINLP), combined with the outer convexification relaxation method, where the binary variables enter the MIOCP only affinely. Sager [S. Sager. Numerical methods for mixed-integer optimal control problems. Toenning, Luebeck, Marburg: Der andere Verlag; 2005] proposed to decompose the MINLP into an NLP and a MILP. In this study we focus on new theoretical and numerical insights regarding this decomposition process. We mainly investigate the associated MILP, also described as the combinatorial integral approximation (CIA) or control approximation problem in the integral sense. Commonly used approaches in this context are the sum-up rounding strategy and branch and bound [S. Sager, M. Jung, C. Kirches. Combinatorial integral approximation. Mathematical Methods of Operations Research 73.3 (2011): 363-380], which we both apply to an altered version of the CIA.
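
For reference, the sum-up rounding step mentioned above, for a single relaxed control (a sketch; the CIA problem replaces this heuristic by a MILP, and multimode SOS1 versions round the largest accumulated deficit instead):

import numpy as np

def sum_up_rounding(alpha, dt):
    # Choose w_t in {0,1} so the running integral of w tracks that of the
    # relaxed control alpha in [0,1] on a grid with step sizes dt.
    w = np.zeros_like(alpha, dtype=float)
    deficit = 0.0
    for t in range(len(alpha)):
        deficit += alpha[t] * dt[t]
        if deficit >= 0.5 * dt[t]:
            w[t] = 1.0
            deficit -= dt[t]
    return w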

Clemens Zelle
Otto-von-Guericke Universitaet Magdeburg, Germany
[email protected]

Sebastian Sager
Otto-von-Guericke Universitat Magdeburg, Germany
[email protected]

MS130

Decomposable Nonsmooth Optimization: Methods and Applications

In this talk, we consider nonsmooth optimization problems with decomposable structures. Such decomposable structures include special structures such as difference-of-convex decompositions and smooth compositions of nonsmooth convex functions. We will discuss versions of the bundle method for solving such problems. Preliminary numerical results will be presented, and a comparison of these methods with other methods of nonsmooth optimization will be reported. We will also consider some applications of decomposable nonsmooth optimization in machine learning and regression analysis.

Adil Baghirov
CIAO, School of ITMS
The University of Ballarat
[email protected]

MS130

A Chain Rule for Convexly Generated Spectral Max Functions

Eigenvalue optimization problems arise in the control of continuous- and discrete-time dynamical systems. The spectral abscissa (the largest real part of an eigenvalue of a matrix) and the spectral radius (the largest eigenvalue in modulus) are examples of functions of eigenvalues, or spectral functions, connected to such problems. More specifically, the abscissa and radius are spectral max functions—the maximum of a real-valued function over the spectrum of a matrix. In 2001, Burke and Overton characterized the regular subdifferential of the spectral abscissa and showed that the spectral abscissa is subdifferentially regular in the sense of Clarke when all active eigenvalues are nonderogatory. In this talk, we describe the new techniques used to obtain these results for the more general class of convexly generated spectral max functions, and demonstrate their application to the spectral radius.

Julia Eaton
University of Washington
[email protected]

James V. Burke
University of Washington
Department of Mathematics
[email protected]

MS130

Epsilon Subdifferential Computation in Computational Convex Analysis

Computational convex analysis focuses on the computation of convex operators that routinely appear in convex analysis, e.g., the Legendre-Fenchel conjugate, the Moreau envelope, etc. After recalling what computations the (open source) CCA numerical library allows, we will switch the focus from convex transforms like the Legendre-Fenchel transform to convex operators like the subdifferential. We will present results on efficiently computing the epsilon subdifferential of a convex univariate function.
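
For reference, the operator in question is the standard epsilon subdifferential: for a convex function f and ε ≥ 0,

\partial_\varepsilon f(x) = \{\, s : f(y) \ge f(x) + \langle s,\, y - x \rangle - \varepsilon \ \text{ for all } y \,\},

which reduces to the ordinary subdifferential at ε = 0.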

Yves Lucet
University of British Columbia
[email protected]

Warren Hare
University of British Columbia, Okanagan
[email protected]

Anuj Bajaj
Wayne State University
[email protected]

MS130

The Nesterov Smoothing Technique and Minimizing Differences of Convex Functions with Applications to Facility Location and Clustering

In this talk, we propose new optimization algorithms to solve a number of problems in multifacility location and hierarchical clustering. Our methods are based on the Nesterov smoothing technique and algorithms for minimizing differences of convex functions. The algorithms obtained improve existing methods for solving these problems.
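
The basic DC iteration underlying such methods, for an objective f = g − h with g, h convex (stated here for reference; the talk combines it with Nesterov smoothing), is

y_k \in \partial h(x_k), \qquad x_{k+1} \in \arg\min_x \big\{\, g(x) - \langle y_k,\, x \rangle \,\big\},

so that each step minimizes a convex majorant of f obtained by linearizing the concave part.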

Mau Nam Nguyen
Fariborz Maseeh Department of Mathematics and Statistics
Portland State University
[email protected]

Wondi Geremew
Stockton University
[email protected]

Sam Reynolds, Tuyen Tran
Portland State University
[email protected], [email protected]

MS131

Network-Flow Polytopes Satisfy the Hirsch Conjecture

We solve a problem in the combinatorics of polyhedra motivated by the network simplex method. We show that the Hirsch conjecture holds for the diameter of the graphs of all network-flow polytopes; in particular, the diameter of a network-flow polytope for a network with n nodes and m arcs is never more than m + n − 1. A key step in the proof is to show the same result for classical transportation polytopes.

Steffen Borgwardt
University of Colorado Denver
Department of Mathematical and Statistical Sciences
[email protected]

MS131

Superlinear Diameters for Combinatorial Abstractions of Polyhedra

The asymptotic growth of the diameters of polyhedra and their combinatorial abstractions is relevant to the efficiency of the simplex method for linear optimization. In this talk, we outline the construction of a combinatorial abstraction of polyhedra whose diameter is superlinear, which gives evidence against the Linear Hirsch Conjecture. This is joint work with Tristram C. Bogart.

Edward Kim
University of Wisconsin-La Crosse
[email protected]

MS131

On the Circuit Diameter Conjecture

A key concept in optimization is the combinatorial diameter of a polyhedron. From the point of view of optimization, we would like to relate it to the number of facets f and dimension d of the polyhedron. In the seminal paper of Klee and Walkup, the Hirsch conjecture, that the bound is f − d, was shown to be equivalent to several seemingly simpler statements, and was disproved for unbounded polyhedra through the construction of a particular 4-dimensional polyhedron with 8 facets. The Hirsch bound for polytopes was only recently narrowly exceeded by Santos. We consider analogous questions for a variant of the combinatorial diameter called the circuit diameter. In this variant, paths are built from the circuit directions of the polyhedron and can travel through the interior. We are able to recover the equivalence results that hold in the combinatorial case. Further, we show that validity of the circuit analogue of the non-revisiting conjecture for polytopes would imply a linear bound on the circuit diameter of all unbounded polyhedra. Finally, we prove a circuit version of the 4-step conjecture. Our methods require adapting the notion of simplicity to work with circuits. We show that it suffices to consider such circuit-simple polyhedra for studying circuit analogues of the Hirsch conjecture, and use this to prove the equivalences of the different variants. This is joint work with S. Borgwardt and T. Yusun.

Steffen Borgwardt
University of Colorado Denver
Department of Mathematical and Statistical Sciences
[email protected]

Tamon Stephen, Timothy Yusun
Simon Fraser University
[email protected], [email protected]

MS131

A New Model for Realization Spaces of Polytopes

The theory of psd lifts of polytopes offers a new model for realization spaces of polytopes via the slack ideal associated to the polytope. In this talk I will discuss this model and various features of it.

Rekha Thomas
University of Washington
[email protected]

Joao Gouveia
Universidade de Coimbra
[email protected]

Antonio Macchio
University of [email protected]

Amy Wiebe
University of Washington

[email protected]

MS132

Classical Approximation Algorithms for Quantum Constraint Satisfaction Problems

The study of approximation algorithms for Boolean satisfiability problems such as MAX-k-SAT is a well-established line of research. In the quantum setting, there is a physically motivated generalization of MAX-k-SAT known as the k-Local Hamiltonian problem (k-LH), which is of interest for two reasons: from a complexity-theoretic perspective, k-LH is complete for the quantum analogue of NP, and from a physics perspective, k-LH asks one to estimate the energy of a quantum system when cooled to very low temperatures. For the latter reason in particular, the condensed matter physics community has devoted decades to developing heuristic algorithms for k-LH and related problems. However, recent years have witnessed the development of the first classical approximation algorithms for k-LH. This talk will overview and discuss some of these existing results, as well as (time permitting) preview recent work in progress on generalizing the Goemans-Williamson algorithm for MAX-CUT to approximate physically motivated special cases of k-LH (joint work with Yi-Kai Liu (NIST, USA)).
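
For reference, here is a compact sketch of the classical Goemans-Williamson algorithm for MAX-CUT that the talk proposes to generalize: solve the SDP relaxation, factor the solution, and round with random hyperplanes. It assumes cvxpy with its bundled SDP solver; the graph and parameters are illustrative.

```python
import numpy as np
import cvxpy as cp

# Goemans-Williamson for classical MAX-CUT: SDP relaxation plus
# random-hyperplane rounding.
def goemans_williamson_maxcut(W, n_rounds=100, seed=0):
    n = W.shape[0]
    X = cp.Variable((n, n), PSD=True)
    objective = cp.Maximize(cp.sum(cp.multiply(W, 1 - X)) / 4)
    cp.Problem(objective, [cp.diag(X) == 1]).solve()
    # Factor X = V^T V; eigenvalue clipping guards tiny negative values.
    w, U = np.linalg.eigh(X.value)
    V = (U * np.sqrt(np.clip(w, 0, None))).T
    rng = np.random.default_rng(seed)
    best_cut, best_x = -np.inf, None
    for _ in range(n_rounds):
        g = rng.standard_normal(n)
        x = np.where(V.T @ g >= 0, 1.0, -1.0)       # hyperplane rounding
        cut = np.sum(W * (1 - np.outer(x, x))) / 4  # cut value of x
        if cut > best_cut:
            best_cut, best_x = cut, x
    return best_cut, best_x

W = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
              [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)  # 4-cycle, max cut 4
print(goemans_williamson_maxcut(W)[0])
```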

Sevag Gharibian
Virginia Commonwealth University
[email protected]

MS132

Parameter Tuning for Optimization in Quantum Annealing Processors

D-Wave quantum computing systems employ quantum annealing, a computing paradigm related to simulated annealing but different in several important ways. In this talk I will provide an overview of the D-Wave computing framework from the perspective of a classical optimization researcher: How can I make this thing solve a constraint satisfaction problem? I will then give a gentle introduction to practicalities: How can we minimize the effect of noise and error when solving a problem? Finally, I will introduce a more advanced perspective: What is a quantum Boltzmann distribution, and what can it do for you? This talk is intended for a general optimization audience.

Andrew King
D-Wave Systems
[email protected]

MS132

Analyzing Quantum Cryptographic Protocols using Semidefinite Programming

Die-rolling is the cryptographic task where two mistrustful, remote parties wish to generate a random D-sided die-roll over a communication channel. Optimal quantum protocols for this task have been given by Aharon and Silman (New Journal of Physics, 2010) but are based on optimal weak coin-flipping protocols, which are currently very complicated and not very well understood. In this talk, I will first present very simple classical protocols for die-rolling which have decent (and sometimes optimal) security, which is in stark contrast to many other two-party cryptographic tasks. I will also present quantum protocols and discuss how to analyze their security using semidefinite programming. By modifying optimal solutions to these semidefinite programs, I will show how to design quantum protocols which have security provably better than the given classical protocols.

Jamie Sikora
Centre for Quantum Technologies
National University of Singapore
[email protected]

MS132

Quantum Speed-Ups for Semidefinite Programming

We give a quantum algorithm for solving semidefinite programs (SDPs). It has worst-case running time n^{1/2} m^{1/2} s poly(log n, log m, R, 1/δ), with n and s the dimension and sparsity of the input matrices, respectively, m the number of constraints, δ the accuracy of the solution, and R an upper bound on the trace of the optimal solution. This gives a square-root unconditional speed-up over any classical method for solving SDPs both in n and m. We prove that the algorithm cannot be substantially improved, giving an Ω(n^{1/2} + m^{1/2}) quantum lower bound for solving semidefinite programs with constant s, R and δ. In some instances, the algorithm offers even exponential speed-ups.

Krysta Svore
Microsoft Research
[email protected]

MS133

Perspective-Based Proximal Data Processing

We show that many problems in data processing involve convex perspective functions. We analyze pertinent properties of such functions and show how to handle them in proximal algorithms. Applications to high-dimensional statistics are presented.

Patrick L. Combettes
Department of Mathematics
North Carolina State University
[email protected]

MS133

Efficient Bayesian Computation by Proximal Markov Chain Monte Carlo: When Langevin meets Moreau

Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging problems. Currently, the predominant Bayesian computation approach is convex optimisation, which scales efficiently to high dimensions and delivers accurate point estimation results. However, to perform more advanced analyses it is necessary to use more computationally intensive techniques such as Markov chain Monte Carlo (MCMC) methods. This paper presents a new and highly efficient MCMC methodology to perform Bayesian computation for high-dimensional models that are log-concave and non-smooth, a class of models that is key in imaging sciences. The method is based on a regularised unadjusted Langevin algorithm that uses Moreau-Yosida envelopes and proximal operators to construct Markov chains with favourable convergence properties. In addition to scaling efficiently to high dimensions, the method is straightforward to apply to models that are currently solved by proximal optimisation. We provide a detailed theoretical analysis of the method, including asymptotic and non-asymptotic convergence results with easily verifiable conditions, and explicit bounds on the convergence rates. The methodology is demonstrated with experiments related to image deconvolution and tomographic reconstruction, where we conduct challenging Bayesian analyses related to uncertainty quantification and model selection without ground truth available. Joint work with Dr. Alain Durmus and Prof. Eric Moulines.
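
A minimal sketch of the proximal MCMC idea (my illustrative 1-D setup, not the paper's method in full): for pi(x) proportional to exp(-f(x) - g(x)) with smooth f and non-smooth g, replace the gradient of g by the gradient of its Moreau-Yosida envelope, (x - prox_{theta*g}(x))/theta, inside an unadjusted Langevin step.

```python
import numpy as np

# MYULA-style chain for pi(x) ~ exp(-x^2/2 - |x|): f(x) = x^2/2 smooth,
# g(x) = |x| handled through its Moreau-Yosida envelope.
def prox_abs(x, t):                     # prox of t*|.|: soft threshold
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula(n=100_000, gamma=0.01, theta=0.1, seed=1):
    rng = np.random.default_rng(seed)
    x, out = 0.0, np.empty(n)
    for k in range(n):
        grad = x + (x - prox_abs(x, theta)) / theta  # grad f + grad g^theta
        x = x - gamma * grad + np.sqrt(2 * gamma) * rng.standard_normal()
        out[k] = x
    return out

samples = myula()
print(samples.mean(), samples.std())  # mean near 0 for this symmetric target
```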

Marcelo Pereyra
School of Mathematics
University of [email protected]

MS133

Estimating High Dimensional Generalized Additive Models via Proximal Methods

In this talk, we discuss a general framework for sparse additive regression. We allow the class of each additive component to be quite general (characterized by semi-norm smoothness; this includes monotonicity of derivatives and Sobolev/Hölder smoothness). We show that by minimizing a simple convex problem, we can estimate these functions at the minimax rate (over functions in that smooth additive class). In addition, we show that the penalized regression problem can be efficiently solved using a proximal gradient descent algorithm: each prox-step decouples into p univariate penalized regression problems; each of these univariate penalized problems can in turn be written as a simple update of the solution to a univariate non-parametric regression problem (for which we often have efficient algorithms). For some important classes (e.g. total-variation smoothness), this means the additive model can be fit in roughly the same order of computation as the LASSO (n*p operations per iteration), and in practice can scale easily to any dataset which fits in memory. In addition, because the prox-step splits over the features, our algorithm is simple to parallelize.
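
The key computational point above is that the prox splits across features. A stand-in sketch with a group-separable penalty (a group-lasso surrogate of my choosing, not the paper's estimator) shows the pattern: one gradient step on the joint fit, then an independent prox per feature block.

```python
import numpy as np

# min over blocks b_j:  0.5*||y - sum_j X_j b_j||^2 + lam * sum_j ||b_j||_2
def group_prox(b, t):                    # prox of t*||.||_2 on one block
    nrm = np.linalg.norm(b)
    return b * max(0.0, 1 - t / nrm) if nrm > 0 else b

def prox_gradient(X_blocks, y, lam, step, n_iter=2000):
    B = [np.zeros(X.shape[1]) for X in X_blocks]
    for _ in range(n_iter):
        resid = y - sum(X @ b for X, b in zip(X_blocks, B))
        # one joint gradient step; the prox then decouples block by block
        B = [group_prox(b + step * (X.T @ resid), step * lam)
             for X, b in zip(X_blocks, B)]
    return B

rng = np.random.default_rng(0)
Xs = [rng.standard_normal((50, 3)) for _ in range(4)]
y = Xs[0] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
B = prox_gradient(Xs, y, lam=5.0, step=0.001)
print([round(float(np.linalg.norm(b)), 2) for b in B])  # typically only block 0 nonzero
```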

Noah Simon
Department of Biostatistics
University of Washington
[email protected]

MS133

Alternating Projections, ADMM, and Parallel Coordinate Descent

Abstract not available at time of publication.

Ryan Tibshirani
Departments of Statistics and Machine Learning
Carnegie Mellon University
[email protected]

MS134

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe

Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms. In this paper we take a unified view on both classes of methods, leading to the first explicit convergence rates of matching pursuit methods in an optimization sense, for general sets of atoms. We derive sublinear (1/t) convergence for both classes on general smooth objectives, and linear convergence on strongly convex objectives, as well as a clear correspondence of algorithm variants. Our presented algorithms and rates are affine invariant, and do not need any incoherence or sparsity assumptions.
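
For orientation, the plain Frank-Wolfe loop on the probability simplex (coordinate vectors as atoms) looks as follows; this is the textbook variant, not the generalized or affine-invariant algorithms of the talk.

```python
import numpy as np

# Frank-Wolfe for min_x 0.5*||Ax - b||^2 over the probability simplex.
# The linear minimization oracle picks the coordinate with least gradient.
def frank_wolfe_simplex(A, b, n_iter=200):
    n = A.shape[1]
    x = np.ones(n) / n
    for t in range(n_iter):
        grad = A.T @ (A @ x - b)
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0          # LMO over the simplex vertices
        x += 2.0 / (t + 2) * (s - x)      # standard step size 2/(t+2)
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 8)), rng.standard_normal(30)
print(frank_wolfe_simplex(A, b))          # a point on the simplex
```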

Francesco Locatello
ETH Zurich
[email protected]

Rajiv Khanna
UT Austin
[email protected]

Michael Tschannen
ETH Zurich
[email protected]

Martin Jaggi
[email protected]

MS134

Three Variations on a Frank-Wolfe Theme

In this presentation, we discuss two frameworks where an adaptation of the Frank-Wolfe linearization strategy for solving monotone variational inequalities achieves fast (at least linear) convergence. This analysis yields insight into the determination of efficient FW directions.

Patrice Marcotte
DIRO, CRT
Universite de Montreal
[email protected]

MS134

Forward-Backward Methods for Atomic Norm Constrained Optimization

In many signal processing and machine learning applications, the aim is to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as atoms allow us to define atomic norms that can be used to formulate convex regularizations for the reconstruction problem. Efficient algorithms are available to solve these formulations in certain special cases, but an approach that works well for general atomic norms, both in terms of speed and reconstruction accuracy, remains to be found. I will describe an algorithm that produces solutions with succinct atomic representations for reconstruction problems. The algorithm (called CoGEnT) combines the Conditional Gradient method with additional enhancement steps to reduce the basis size. Experimental evidence suggests that such a method is superior to standard greedy approaches on several problems of interest, while also retaining the convergence guarantees of the standard Conditional Gradient methods.

Nikhil Rao
Dept. of ECE
University of [email protected]

MS134

Conditional Gradient Sliding for Convex Optimization

In this talk, we present a new conditional gradient type method for convex optimization by calling a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle, but also the number of gradient evaluations. More specifically, we show that the CGS method requires O(1/√ε) and O(log(1/ε)) gradient evaluations, respectively, for solving smooth and strongly convex problems, while still maintaining the optimal O(1/ε) bound on the number of calls to the LO oracle. We also develop variants of the CGS method which can achieve the optimal complexity bounds for solving stochastic optimization problems and an important class of saddle point optimization problems. To the best of our knowledge, this is the first time that these types of projection-free optimal first-order methods have been developed in the literature. Some preliminary numerical results have also been provided to demonstrate the advantages of the CGS method.

Guanghui Lan
Georgia Tech
tba

Yi Zhou
University of Florida
sonrisa [email protected]

PP1

An Optimal Investment Strategy with Maximal Risk Aversion

In this work we consider the problem of an insurance company where the wealth of the insurer is described by a Cramer-Lundberg process. The insurer has the possibility to invest in an incomplete market that includes a risky asset with stochastic volatility and a bank account. The risky asset is correlated to an economic factor modeled as a diffusion process. Under the exponential utility function, a Hamilton-Jacobi-Bellman (HJB) equation is solved to obtain the optimal investment strategy maximizing the expected utility for insurers. We suggest a mixed finite difference Monte-Carlo method to solve the HJB equation numerically. In order to show the connection between the insurer's decision and the correlation factor, several simulations are presented in the case of the Scott model.

Mohamed Badaoui
National Polytechnic Institute, IPN
Escuela Superior de Ingeniería Mecánica y Eléctrica
[email protected]

PP1

Compact Representation of the Full Broyden Class of Quasi-Newton Updates

Quasi-Newton methods are an effective tool used to find the local maxima or minima of a function when second order information is unavailable or too expensive. Compact representations of these approximations allow us to efficiently compute their eigenvalues, thereby giving us the ability to perform sensitivity analysis on the resultant matrices. In this talk we present the compact representation for matrices belonging to the full Broyden class of quasi-Newton updates, allowing for different members of the Broyden class to be used at each iteration. Furthermore, we demonstrate how to compute the inverse compact representation of these matrices and demonstrate through numerical experiments how they can be used to efficiently solve linear systems involving Broyden-class matrices to high accuracy.
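
To make the idea concrete, the sketch below builds the classical compact representation for plain BFGS (the Byrd-Nocedal-Schnabel form, one member of the Broyden class treated in the talk) and checks it against the recursive updates; the test data are synthetic.

```python
import numpy as np

# Compact (outer-product) representation of BFGS:
#   B_k = B0 - [B0 S, Y] M^{-1} [B0 S, Y]^T,
#   M = [[S^T B0 S, L], [L^T, -D]],
# with L the strictly lower triangle of S^T Y and D = diag(S^T Y).
def bfgs_compact(B0, S, Y):
    SY = S.T @ Y
    L = np.tril(SY, -1)
    D = np.diag(np.diag(SY))
    Phi = np.hstack([B0 @ S, Y])
    M = np.block([[S.T @ B0 @ S, L], [L.T, -D]])
    return B0 - Phi @ np.linalg.solve(M, Phi.T)

def bfgs_recursive(B0, S, Y):
    B = B0.copy()
    for s, y in zip(S.T, Y.T):          # apply the rank-two updates in turn
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    return B

rng = np.random.default_rng(0)
n, k = 6, 3
B0 = np.eye(n)
S = rng.standard_normal((n, k))
Y = S + 0.1 * rng.standard_normal((n, k))   # keeps s^T y > 0 (curvature)
print(np.allclose(bfgs_compact(B0, S, Y), bfgs_recursive(B0, S, Y)))  # True
```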

Omar DeGuchy
Applied Mathematics Unit
University of California, Merced
[email protected]

Jennifer Erway
Department of Mathematics
Wake Forest University
[email protected]

Roummel F. Marcia
University of California, Merced
[email protected]

PP1

Working Breakdown in a Repairable Queue with Fixed Size Bulk Service and Variable Breakdown and Variable Repairable Rates

This paper considers a repairable queueing system with service in a fixed batch size and one operating unit. It is assumed that the operating unit may break down with different rates while in operation, either in the idle state or in the busy state, according to a Poisson process. When the system is down, it undergoes two processes: first, a substitute server serves at a slower rate under the same fixed batch size rule (we call this the 'working breakdown state'); secondly, a repairman repairs the defective server, where the repair time follows an exponential distribution with a variable repair rate depending on whether the substitute server is idle or busy. Customers arrive at the system according to a Poisson process whose rate depends on the state of the server (idle, busy, idle working breakdown, or busy working breakdown). A customer on arrival either decides to join the system with some probability or balks immediately. We use the probability generating function technique to derive explicit expressions for the steady-state probabilities of the system length, the steady-state availability of the server, and various performance measures of the model.

Gopal K. Gupta, A. Banerjee
Dept. of Mathematical Sciences
IIT (BHU), Varanasi, India
[email protected], [email protected]

PP1

Second Order Riemannian Methods for Low-Rank Tensor Completion

The goal of tensor completion is to fill in missing entries of a partially known tensor (possibly including some noise) under a low-rank constraint. This may be formulated as a least-squares problem. The set of tensors of a given multilinear rank is known to admit a Riemannian manifold structure, thus methods of Riemannian optimization are applicable. In our work, we derive the Riemannian Hessian of an objective function on the low-rank tensor manifolds and discuss the convergence properties of second order methods for the tensor completion problem, both theoretically and numerically. We compare our approach to Riemannian tensor completion methods from recent literature. Our examples include the recovery of multidimensional images, approximation of multivariate functions and recovery of partially missing data from survey statistics.

Gennadij Heidel
Trier University
[email protected]

Volker H. Schulz
University of Trier
Department of Mathematics
[email protected]

PP1

Modeling and Optimization of New Feeding Systems in the Pig Industry: Linear and Bilinear Problems

Feed represents 70% of the production cost in the pig industry, so, in the current economic context, it is important to reduce it. This study mainly focuses on modeling feeding systems that satisfy all animal requirements. Currently, farmers use a three-phase feeding system. For each phase a single feed is used, and the energy density of each one is fixed. In that case, the associated mathematical model is linear. In a previous study, we found a method that reduces feeding cost by 4.0%. The idea behind this feeding system is to take two feeds and blend them daily in order to satisfy the requirements. The mathematical model associated to it is bilinear. Recently we proposed a new feeding system that mixes both methods (feeding by phases and feeding with two blended feeds). The concept is the following: each phase uses a feeding system with two feeds (optimized at the same time as the total cost minimization and not necessarily complete) which are blended together to satisfy the daily animal requirements. For example, for three phases, one will use four feeds (A, B, C, and D): A and B will be used in the first phase, B and C in the second phase, and finally C and D in the last phase. Compared to the traditional feeding system, this leads to a feed cost reduction of 5.2%. All these models are described in this poster, as well as the numerical results. We also include a tricriterion model and results that minimize feeding cost and excretion.

Emilie Joannopoulos
Insa de Rennes / Universite de Sherbrooke
[email protected]

Francois Dubeau, Jean-Pierre Dussault
Universite de Sherbrooke
[email protected], [email protected]

Mounir Haddou
Insa de Rennes
[email protected]

Candido Pomar
Agriculture and Agri-Food Canada
[email protected]

PP1

A Weighted Random Forest Algorithm to Improve Classification Performance

Random forest (RF) is a machine learning method for classification problems. It applies bootstrap sampling and random attribute selection to construct a number of trees for prediction. In the original RF, each tree has equal weight during aggregation. In reality, some trees always perform better than others, so a weighted random forest (wRF) with heavier weights for better-performing trees may potentially improve the overall predictive performance. The cardiotocography dataset from https://archive.ics.uci.edu/ml/datasets/Cardiotocography was downloaded to compare the predictive performance of the original RF and wRF. Half of the data was used as training data to construct the RF model. The other half was the testing dataset, used to test the RF model and compute the predictive accuracy of each tree. The predictive accuracy was then used to calculate the aggregation weight for each tree; the weight was calculated according to a logistic regression curve with predictive accuracy as the independent variable. With the weights calculated, the testing dataset was again classified with the RF model, but this time the aggregation was done with a weighted vote from each tree. The predictive accuracy rates for the original RF and wRF are 94.08% and 96.15%, respectively. The results show improved predictive performance.
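
A short sketch of the weighted-vote idea using scikit-learn. Synthetic data stands in for the UCI cardiotocography dataset, and the logistic weight function below is my guess at "a logistic curve of accuracy", not the poster's exact formula.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Weight each tree's vote by a logistic transform of its own accuracy.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)

acc = np.array([t.score(X_te, y_te) for t in rf.estimators_])
w = 1.0 / (1.0 + np.exp(-20 * (acc - acc.mean())))   # logistic weights

votes = np.zeros((len(y_te), 2))
for tree, wi in zip(rf.estimators_, w):
    votes[np.arange(len(y_te)), tree.predict(X_te).astype(int)] += wi

print("plain RF accuracy   :", rf.score(X_te, y_te))
print("weighted RF accuracy:", (votes.argmax(axis=1) == y_te).mean())
```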

Weihui Li
Wentworth Institute of Technology
[email protected]

Xiaxin Mei
Wentworth Institute of Technology
[email protected]

PP1

Conic Program Certificates from ADMM/Douglas-Rachford Splitting

Many practical optimization problems are formulated as conic programs, a currently active area of research. One challenge with conic programs is that one often does not know, a priori, whether there is a solution to the conic program; it may be infeasible, unbounded, or have a finite but unattainable optimal value. Solvers like SeDuMi and SDPT3 provide certificates for strong infeasibility, but they struggle with weak infeasibility. To resolve this, we propose a method that finds (1) a solution when one exists, (2) a certificate of strong/weak infeasibility when the problem is strongly/weakly infeasible, (3) an unbounded direction when there is one, and (4) a sufficient condition for identifying programs with finite but unattainable optimal values. The method is based on Douglas-Rachford splitting (ADMM), and we establish its efficacy through theoretical analysis and numerical experiments.

Yanli Liu, Ernest Ryu
University of California, Los Angeles
Department of Mathematics
[email protected], [email protected]

Wotao Yin
University of California, Los Angeles
[email protected]

PP1

Hybrid Method in Solving Large-Scale Alternating-Current Optimal Power Flows

Many steady-state problems in power systems, including rectangular power-voltage formulations of optimal power flows in the alternating-current model (ACOPF), can be cast as non-convex polynomial optimization problems (POPs). For a POP, one can derive strong convex relaxations, or rather hierarchies of ever stronger, but ever larger, relaxations. Readily available second-order methods for solving convex relaxations of real-world large-scale power systems have been popular due to their fast quadratic local convergence. However, they usually suffer from three main issues, namely high computational costs, prohibitive memory requirements, and convergence to local optima. Randomized first-order methods (RMs) have been successful in machine learning, due to their much lower per-iteration computational and memory requirements, convergence to the optima of convex problems with high probability, and acceleration with parallel/distributed variants. We hence study means of switching from RMs for solving a convex relaxation to Newton's method working on the original non-convex problem, which allows for convergence under the same conditions as in solvers for the convex relaxation, but with an improved rate of convergence. We also propose a backtracking framework which allows the practical search for globally optimal solutions. We illustrate our approach on the ACOPF using Polish instances and demonstrate the benefits of employing the hybrid scheme.

Jie Liu
Lehigh University
[email protected]

Alan Liddell
University of Notre Dame
Dept. Appl. Comp. Math. Stat.
[email protected]

Jakub Marecek
IBM Research
[email protected]

Martin Takac
University of [email protected]

PP1

Stochastic Optimization Framework for a Fresh Fruit Supply Chain

We present a stochastic optimization framework for the dragon fruit supply chain in Vietnam. The dragon fruit supply chain faces several uncertainties in yields, price, and demand, being highly sensitive to weather conditions and global uncertainty factors. The risk factors in the dragon fruit supply chain also depend on species; for example, the red varieties, while more profitable than the white varieties, also have higher export risk because they are subject to global prices and adverse geopolitical conditions. The stochastic planning model framework is developed based on the two-stage stochastic approach suggested by Darby-Dowman et al. (2000). In the first stage, we determine optimal planting plans for dragon fruit under second-stage uncertainties. The second-stage optimization problem looks at optimizing harvest and distribution through standard trader channels. The timing of sowing and harvesting decisions is taken into account, because the dragon fruit plant has a six-year life-cycle with maximum average yield in the fourth year. Another modelling consideration in this industry is the prevalence of forward-buying contracts with locked-in prices. Some preliminary computational results will be discussed.

Tri-Dung Nguyen
Dalhousie University
Industrial Engineering Department, Faculty of Engineering
[email protected]

Uday Venkatadri
Dalhousie University
Industrial Engineering Department, Faculty of Engineering
[email protected]

Tri Nguyen-Quang
Dalhousie University
Biofluids and Biosystems Modeling Lab. (BBML)
[email protected]

Claver Diallo
Dalhousie University
Industrial Engineering Department, Faculty of Engineering
[email protected]

PP1

Hybrid Hu-Storey Method for Large-Scale Nonlinear Monotone Systems

Nonlinear monotone systems arise in various practical situations and applications, so many iterative methods for solving these systems have been developed. We propose a hybrid method for solving large-scale monotone systems, which is based on a derivative-free conjugate gradient approach and a hyperplane projection technique. The conjugate gradient approach is efficient for large-scale systems due to low memory requirements, while the projection method is suitable for monotone equations because it enables simple globalization. The derivative-free, function-value-based line search is combined with the Hu-Storey search direction and the projection procedure in order to construct a globally convergent method. Numerical experiments indicate the robustness of the proposed method.
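
A minimal sketch of the hyperplane projection framework for monotone F(x) = 0 (Solodov-Svaiter style); the simple direction d = -F(x) stands in for the Hu-Storey conjugate gradient direction of the poster, and the test system is my own choice.

```python
import numpy as np

# Derivative-free line search along d, then project the iterate onto the
# hyperplane {u : <F(z), u - z> = 0} that separates it from the solutions.
def projection_method(F, x0, sigma=1e-4, beta=0.5, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = -Fx
        alpha = 1.0                       # function-value-based backtracking
        while -F(x + alpha * d) @ d < sigma * alpha * (d @ d):
            alpha *= beta
        z = x + alpha * d
        Fz = F(z)
        x = x - ((Fz @ (x - z)) / (Fz @ Fz)) * Fz   # hyperplane projection
    return x

F = lambda x: 2 * x + np.sin(x)           # strongly monotone, root at 0
x = projection_method(F, np.array([3.0, -1.0, 0.5]))
print(x, np.linalg.norm(F(x)))            # x near the origin
```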

Sanja Rapajic, Papp Zoltan
University of Novi Sad, Faculty of Sciences,
Department of Mathematics and Informatics
[email protected], [email protected]

PP1

Bacterial Community Analysis via Convex Optimization

Metagenomics is the study of communities of bacteria through all their sampled DNA. One of the main problems of interest is identifying the presence and relative abundance of organisms in a given environmental sample. Previous approaches, namely the algorithm Quikr and subsequent iterations, have cast this problem as an ℓ2-regularized ℓ1 regression (min_x ||x||_1 + λ||Ax − y||_2^2) which chooses the most parsimonious mixture of columns of the training database (encoded in the matrix A) that fits the data (encoded by k-mer counts in the vector y). However, the matrix A can have similar, if not indistinguishable, columns. One way to capture the uncertainty about which columns of A to choose is to use the ordered weighted ℓ1-norm (OWL-norm) instead of the ℓ1-norm and recast the problem as ℓ2-regularized OWL-norm regression. We reduce this problem to a non-negative least squares optimization with linear inequality constraints and investigate an active set algorithm to solve this problem efficiently.
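
For context, the baseline ℓ1 formulation quoted above can be solved with plain ISTA (proximal gradient); the sketch below does exactly that on synthetic data (the poster's OWL-norm/active-set method is not reproduced here).

```python
import numpy as np

# ISTA for min_x ||x||_1 + lam*||Ax - y||_2^2.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=3000):
    step = 1.0 / (2 * lam * np.linalg.norm(A, 2) ** 2)  # 1/L, smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * lam * (A.T @ (A @ x - y))
        x = soft_threshold(x - step * grad, step)       # prox of ||.||_1
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [0.5, 0.3, 0.2]
y = A @ x_true
print(np.flatnonzero(np.abs(ista(A, y, lam=10.0)) > 1e-2))  # typically [3 17 42]
```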

David Koslicki, Hooman Zabeti
Oregon State University
[email protected], [email protected]

PP1

PDE Constrained Optimal Control of Hydropower Systems

Hydropower is the most widely used renewable source of energy; however, its operations present many challenges. We formulate the reservoir operation problem as a continuous-in-time optimal control problem to determine the reservoir outflow which maximizes the revenue produced, yet also meets long-term planning targets. The problem is constrained by the PDE dynamics within the channel. Additionally, we have system state bound and ramping constraints. We solve for the optimal control using the adjoint approach and show numerical results for a test problem.

John Zaheer, Nathan Gibson
Oregon State University
[email protected], [email protected]