### Workshop C3 - Continuous Optimization

**Organizers:** Javier Peña (Carnegie Mellon University, USA) - Coralia Cartis (University of Oxford, UK) - Etienne de Klerk (Tilburg University, Netherlands)

## Talks

July 17, 14:30 ~ 15:00 - Room B1

## Translative packings in three dimensions

### Frank Vallentin

### University of Cologne, Germany - frank.vallentin@uni-koeln.de

Finding optimal packings of spheres and more generally of nonspherical objects is a central problem in discrete geometry which gives rise to challenging optimization problems. In this talk I will show new lower and new upper bounds for the optimal density of translative packings of superballs (unit balls of $l_p$-norms) in three dimensions. For finding new lower bounds we use Minkowski’s classical approach based on nonlinear optimization. For finding new upper bounds we use the Cohn-Elkies bound based on linear optimization.

Joint work with M. Dostert, C. Guzm\'an and F.M. de Oliveira Filho.

July 17, 15:00 ~ 15:30 - Room B1

## New upper bounds for the kissing number via copositive optimization

### Juan Vera

### Tilburg University, Netherlands - j.c.veralizcano@uvt.nl

We build a hierarchy of upper bounds on the kissing number using copositive optimization. Recently, it has been shown that the kissing number can be reformulated as an infinite dimensional optimization problem over the set of copositive kernels on a sphere.

To obtain a hierarchy for the kissing number, we extend existing sdp-based hierarchies for the finite dimensional copositive cone to the infinite dimensional one; obtaining a hierarchy based on (infinite dimensional) sdp-kernels. Then we exploit the symmetry of the sphere to obtain a finite dimensional SDP formulation.

Joint work with Olga Kuryatnikova (Tilburg University, The Netherlands).

July 17, 15:30 ~ 16:00 - Room B1

## Semidefinite Programs with a Dash of Smoothness: Why and When the Low-Rank Approach Works

### Nicolas Boumal

### Princeton University, États-Unis - nicolasboumal@gmail.com

Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via low-rank, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably.

In this presentation, we show that the Burer-Monteiro formulation of SDPs in a certain class almost never has any spurious local optima, that is: the non-convexity of the low-rank formulation is benign (even saddles are strict). This class of SDPs covers applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations.

The crucial assumption we make is that the low-rank problem lives on a manifold. Then, theory and algorithms from optimization on manifolds can be used.

Optimization on manifolds is about minimizing a cost function over a smooth manifold, such as spheres, low-rank matrices, orthonormal frames, rotations, etc. We will present the basic framework as well as parts of the more general convergence theory, including recent complexity results. (Toolbox: http://www.manopt.org)

Joint work with P.-A. Absil (Université catholique de Louvain), A. Bandeira (Courant Institute, NYU), C. Cartis (Oxford University) and V. Voroninski (formerly MIT).

July 17, 16:00 ~ 16:30 - Room B1

## A Linear Programming-Based Algorithm for Solving Semidefinite Programs

### Amir Ali Ahmadi

### Princeton University, USA - a_a_a@princeton.edu

We show that under mild assumptions, any semidefinite program can be solved by a sequence of linear programs of fixed (and small) size.

Joint work with Georgina Hall (Princeton University, USA).

July 17, 17:00 - Room B1

## Piecewise Parametric Structure in the Pooling Problem – from Sparse, Strongly-Polynomial Solutions to NP-Hardness

### Ruth Misener

### Imperial College London, United Kingdom - r.misener@imperial.ac.uk

Standard pooling is an NP-hard, non-convex blending problem that arises in process networks applications such as crude oil scheduling and more widely as a general problem pattern [Ceccon et al., AIChE J, 2016]. Unfortunately, state-of-the-art general solvers do not scale well to large-scale instances of the problem which are commercially-relevant.

In this talk we introduce an algorithm that solves specialized pooling problem instances to global optimality and integrate it within a Branch and Bound framework for more generic instances. The approach parameterizes the optimization problem with respect to the pool concentration variables and uncovers embedded sparsity and polyhedral/topological properties for a variety of instances. We generalize and extend recent work analyzing the computational complexity of the pooling problem [Haugland, J Glob Optim, 2016; Boland et al., J Glob Optim, 2016; Haugland and Hendrix, J Optim Theory Appl, 2016]. Our analysis also integrates source-to-output streams and both upper and lower bounds on the network parameters.

Joint work with Radu Baltean-Lugojan.

July 17, 17:30 ~ 18:00 - Room B1

## Manifold Sampling for Piecewise Linear Composite Nonconvex Optimization

### Stefan Wild

### Argonne National Laboratory, USA - wild@anl.gov

We address minimization of a function $\psi(x)+h(F(x))$, where $\psi$ and $F$ are smooth functions, $h$ is piecewise linear, and the Jacobian of $F$ is unavailable. We are motivated by cases where $F$ represents the outputs of an expensive blackbox simulation and $h$ represents a nonsmooth loss function.

We employ a derivative-free trust-region framework, whereby smooth models are built to approximate $F$ using zero-order information. Under minimal assumptions, we show that our resulting ``master model" generates descent directions and that the algorithm's cluster points are Clarke stationary. Numerical experiments demonstrate the benefit of the proposed algorithm as well as current limitations to achieving better performance in practice.

Joint work with Jeff Larson (Argonne National Laboratory) and Kamil Khan (McMaster University).

July 17, 18:00 ~ 18:30 - Room B1

## Direct search based on probabilistic feasible descent for bound and linearly constrained problems

### Luis Nunes Vicente

### University of Coimbra, Portugal - lnv@mat.uc.pt

Direct search is a methodology for derivative-free optimization whose iterations are characterized by evaluating the objective function using a set of polling directions. In deterministic direct search applied to smooth objectives, these directions must somehow conform to the geometry of the feasible region and typically consist of positive generators of approximate tangent cones (which then renders the corresponding methods globally convergent in the linearly constrained case). One knows however from the unconstrained case that randomly generating the polling directions leads to better complexity bounds as well as to gains in numerical efficiency, and it becomes then natural to consider random generation also in the presence of constraints.

In this paper, we study a class of direct search based on sufficient decrease for solving smooth linearly constrained problems where the polling directions are randomly generated (in approximate tangent cones). The random polling directions must satisfy probabilistic feasible descent, a concept which reduces to probabilistic descent in the absence of constraints. Such a property is instrumental in establishing almost-sure global convergence and worst-case complexity bounds with overwhelming probability. Preliminary numerical results show that the randomization of polling directions compares favorably to the classical deterministic approach.

Joint work with Serge Gratton (Toulouse), Clément W. Royer (Madison), Zaikun Zhang (Hong Kong).

July 17, 18:30 ~ 19:00 - Room B1

## Faster Algorithms for Computing the Stationary Distribution

### Aaron Sidford

### Stanford University, United States - sidford@stanford.edu

Computing the stationary distribution of a Markov Chain is one of the most fundamental problems in optimization. It lies at the heart of numerous computational tasks including computing personalized PageRank vectors, evaluating the utility of policies in Markov decision process, and solving asymmetric diagonally dominant linear systems.

Despite the ubiquity of these problems, until recently the fastest known running times for computing the stationary distribution either depended polynomially on the mixing time or appealed to generic linear system solving machinery, and therefore ran in super-quadratic time. In this talk I will present recent results showing that the stationary distribution and related quantities can all be computed in almost linear time. Moreover, I will discuss connections between this work and recent developments in solving Laplacian systems and optimizing over directed graphs.

Joint work with Michael B. Cohen (MIT, United States), Jonathan Kelner (MIT, United States), John Peebles (MIT, United States), Richard Peng (Georgia Tech, United States) and Adrian Vladu (MIT, United States).

July 17, 19:00 ~ 19:30 - Room B1

## Solving chance-constrained problems using a kernel VaR estimator

### Andreas Waechter

### Northwestern University, United States - andreasw.waechter@northwestern.edu

We present a reformulation of nonlinear optimization problems with chance constraints that replaces a probabilistic constraint by a nonparametric value-at-risk estimator. The estimator smooths the historical VaR via a kernel, resulting in a better approximation of the quantile and reducing the nonconvexity of the feasible region. An optimization algorithm is proposed that permits the efficient treatment of joint chance constraints. Theoretical convergence properties and numerical experiments are presented.

Joint work with Alejandra Pena-Ordieres (Northwestern University, USA).

July 18, 14:30 ~ 15:00 - Room B1

## Efficiency of minimizing compositions of convex functions and smooth maps

### Dmitriy Drusvyatskiy

### University of Washington, United States - ddrusv@uw.edu

Numerous optimization tasks can be posed as minimization of a finite convex function composed with a smooth map. I will discuss various aspects of this problem class, including efficiency of first-order methods, stochastic algorithms for large finite sums, fast local rates of convergence, and termination criteria.

Joint work with Alexander D. Ioffe (Technion), Adrian S. Lewis (Cornell University) and Courtney Paquette (U. Washington).

July 18, 15:00 ~ 15:30 - Room B1

## Sharpness, restart and acceleration.

### Alexandre d'Aspremont

### CNRS - Ecole Normale Supérieure, France - aspremon@ens.fr

The Lojasievicz inequality shows that sharpness bounds on the minimum of convex optimization problems hold almost generically. Here, we show that sharpness directly controls the performance of restart schemes. The constants quantifying sharpness are of course unobservable, but optimal restart strategies are fairly robust, hence searching for the best scheme only increases the complexity by a logarithmic factor versus the optimal bound. Overall, this shows that restart schemes generically accelerate accelerated methods.

Joint work with Vincent Roulet.

July 18, 15:30 ~ 16:20 - Room B1

## Online First-Order Framework for Robust Convex Optimization

### Fatma Kilinc-Karzan

### Carnegie Mellon University, USA - fkilinc@andrew.cmu.edu

Robust optimization (RO) has emerged as one of the leading paradigms to efficiently model parameter uncertainty. The recent connections between RO and problems in statistics and machine learning domains demand for solving RO problems in ever more larger scale. However, the traditional approaches for solving RO formulations based on building and solving robust counterparts or the iterative approaches utilizing nominal feasibility oracles can be prohibitively expensive and thus significantly hinder the scalability of RO paradigm.

We present a general and flexible iterative framework to solve robust convex optimization problems that is built on a fully online first-order paradigm. In contrast to the existing literature, a key distinguishing feature of our approach is that it requires access to only cheap first-order oracles for the original problem that are remarkably cheaper than pessimization or nominal feasibility oracles, while maintaining the same convergence rates. This, in particular, makes our approach much more scalable and hence preferable in large-scale applications. In the case of strongly convex functions, we also establish a new improved iteration complexity addressing an open question in the literature. We demonstrate our approach on robust quadratic programs with ellipsoidal uncertainty.

Joint work with Nam Ho-Nguyen (Carnegie Mellon University, USA).

July 18, 17:00 ~ 17:30 - Room B1

## Performance estimation of first-order methods for composite convex optimization

### François Glineur

### Université catholique de Louvain, Belgium - francois.glineur@uclouvain.be

Composite convex optimization consists in the minimization of a convex objective function equal to the sum of several convex functions with different properties, for example a smooth term and a nonsmooth term. We consider a large class of first-order methods designed to solve such composite problems, which rely on specific oracles for each term. We show that the worst-case performance of each of those methods can be computed exactly by solving a semidefinite optimization problem, which also produces an explicit problem instance for which this worst-case is attained.

The performance estimation methodology was born in the original work of Drori and Teboulle in 2014, which introduced a semidefinite relaxation to study the behaviour of first-order optimization algorithms for smooth unconstrained convex optimization. In this talk, we present a framework that produces exact convergence bounds for fixed-step linear first-order methods applied to general composite convex optimization. These methods include classical and accelerated gradient methods (including constrained and proximal variants), conditional gradient and subgradient methods, and also allow inexact gradient computations.

In particular, our approach allows us to derive exact convergence rates for the proximal gradient method in function value, gradient residual norm and distance to the solution, backed by independently-checkable proofs. We also use numerical computations to obtain worst-case rates for several well-known methods including accelerated gradient and the conditional gradient method, improving on the best published rates. We conclude with a quick overview of a MATLAB toolbox implementing our framework, called PESTO (Performance Estimation Toolbox), available from the URL below:

https://github.com/AdrienTaylor/Performance-Estimation-Toolbox

Joint work with Adrien B. Taylor (Université catholique de Louvain, Belgium) and Julien M. Hendrickx (Université catholique de Louvain, Belgium).

July 18, 17:30 ~ 18:00 - Room B1

## Condition-based complexity of a subgradient method

### James Renegar

### Cornell University, U.S. - renegar@cornell.edu

Assuming a strictly-feasible point is known for a general convex optimization problem, we present a subgradient algorithm which computes a feasible point whose objective value is within relative error $ \epsilon $ of the optimal value, where the number of subgradients computed is bounded above by an expression involving a natural condition number depending on $ \epsilon $.

July 18, 18:00 ~ 18:30 - Room B1

## Sparse Interpolation via Super-Resolution and Compressed Sensing

### Jean B Lasserre

### LAAS-CNRS, France - lasserre@laas.fr

We consider the « sparse » interpolation polynomial problem, that is, given an unknown black-box polynomial with very few coefficients, retrieve its coefficients and exponents from a few evaluations only. We show that this problem can be formulated as a « Super-Resolution problem » on the Torus for which exact recovery is guaranteed under some « spacing assumption » on the support. It also implies that exact recovery is guaranteed via « L_1-norm minimization », i.e. solving a compressed sensing LP, even though the RIP property is not satisfied. We also compare this approach (with & without noise) with the well-known Prony method on some numerical examples.

Joint work with Cédric Josz (LAAS-CNRS, Toulouse, France) and Bernard Mourrain (INRIA, Sophia-Antipolis, France).

July 18, 18:30 ~ 19:00 - Room B1

## A new variance reduction method for finite sums with recursive stochastic gradient.

### Katya Scheinberg

### Lehigh University, USA - katyas@lehigh.edu

We propose a StochAstic Recursive grAdient algoritHm (SARAH) for finite-sum minimization problems. Similarly to SVRG, SARAH admits a simple recursive framework for updating stochastic gradient estimates. Linear convergence rate of SARAH is shown under strong convexity assumption, in addition we show a sublinear convergence rate (in the strongly convex case) for an inner loop of SARAH, the property that SVRG does not have. We also discuss the convergence rates for the nonconvex case and demonstrate the efficiency of the algorithm via numerical experiments.

Joint work with Lam Ngyuen (Lehigh University, USA), Jie Liu (Lehigh University, USA) and Martin Tak\'{a}\v{c} (Lehigh University, USA).

July 19, 14:30 ~ 15:20 - Room B1

## Accelerating the Universal Newton Methods.

### Yurii Nesterov

### INMA/CORE (UCL), Belgium - yurii.nesterov@uclouvain.be

In this talk we present new results related to the universal second-order methods, which can automatically adjust the level of smoothness of the objective function. Our methods converge in accordance to the best rates allowed by a Holder condition introduced for the Hessians. As compared with the usual Newton method, the reason for acceleration of our schemes consists in accumulation of global information on the behavior of the objective, represented by a linear model. Our methods solve the convex optimization problems in composite form.

Joint work with Geovani Grapiglia, Federal University of Parana, Brazil.

July 19, 15:30 ~ 16:00 - Room B1

## Towards Understanding the Convergence Behavior of Newton-Type Methods for Structured Optimization Problems

### Anthony Man-Cho So

### The Chinese University of Hong Kong, Hong Kong - manchoso@se.cuhk.edu.hk

Recently, there has been a growing interest in applying Newton-type methods to solve structured optimization problems that arise in machine learning and statistics. A major obstacle to the design and analysis of such methods is that many problems of interest are neither strongly convex nor smooth. In this talk, we will present some design techniques for overcoming such obstacle and report some recent progress on analyzing the convergence rates of the resulting Newton-type methods using error bounds. We will also discuss some directions for further study.

Joint work with Man-Chung Yue (The Chinese University of Hong Kong) and Zirui Zhou (Simon Fraser University).

July 19, 16:00 ~ 16:30 - Room B1

## Complexity of nonconvex optimization: the story so far

### Coralia Cartis

### University of Oxford, United Kingdom - cartis@maths.ox.ac.uk

We give an overview of results regarding complexity of nonconvex optimization (without constraints), focusing on two antipodal algorithmic cases, namely, high-order regularisation methods with improved complexity and universal properties, and zero-order regularization methods with probabilistic models. A story of robust methods, with continuously-varying and often-sharp complexity bounds, emerges.

Joint work with Nick Gould (Rutherford Appleton Laboratory, UK), Philippe Toint (University of Namur, Belgium) and Katya Scheinberg (Lehigh University, USA).

July 19, 17:00 ~ 17:30 - Room B1

## On solving the quadratic shortest path problem

### Renata Sotirov

### Tilburg University, The Netherlands - r.sotirov@uvt.nl

The quadratic shortest path problem (QSPP) is the problem of finding a path in a digraph such that the sum of weights of arcs and the sum of interaction costs over all pairs of arcs on the path is minimized. The QSPP is NP-hard. In this talk we consider first linearizable QSPP instances whose optimal solution can be obtained by solving the corresponding instance of the shortest path problem. We also present a polynomial-time algorithm that verifies whether a QSPP instance on the directed grid graph is linearizable. Further, we present a semidefinite programming relaxation for the QSPP that provides strong bounds for non-linearizable problem instances of moderate size.

Joint work with Hao Hu (Tilburg University, The Netherlands).

July 19, 17:30 ~ 18:00 - Room B1

## Singular Value Decompositions and Optimization of Symmetric Functions

### Raphael Hauser

### University of Oxford, United Kingdom - hauser@maths.ox.ac.uk

In this talk we will derive a novel characterization of the leading part SVD of real matrices in terms of optimization models over symmetric functions. The new characterization frees the singular value decomposition from its inherent dependency on the Frobenius distance and the orthogonality of factors, leading to new dimensionality-reduction models and algorithms in data science.

Joint work with Armin Eftekhari (Alan Turing Institute, London, United Kingdom).

July 19, 18:00 ~ 18:30 - Room B1

## A polynomial algorithm for linear feasibility problems and its applications in infinite-dimensional optimization

### Sergei Chubanov

### University of Siegen, Germany - sergei.chubanov@uni-siegen.de

In this talk, we will consider an algorithm for solving systems of linear inequalities given by separation oracles. Without loss of generality, we assume that the system in question is homogeneous, i.e., that it defines a cone. We are looking for a nonzero solution of this system. The algorithm is based on a procedure for finding sufficiently short convex combinations of the rows of the coefficient matrix. Whenever such a combination is found, the algorithm performs a linear transformation of the space which depends on this convex combination. The running time of the algorithm is bounded by a polynomial in the number of variables and in the maximum binary size of an entry of the coefficient matrix, provided that the separation oracle is a polynomial algorithm.

In many situations, like for instance in the case of the minimum-cost flow over time problem (which is NP-hard) we are looking for a function optimizing a linear functional subject to an infinite number of linear constraints, feasible solutions being square-integrable functions. Identifying functions which are equal up to a set of measure zero, we come to linear optimization problems in Hilbert spaces. Similarly to the finite-dimensional case, we can reduce such problems to a sequence of conic feasibility problems. The algorithm mentioned above can be modified to solve such systems with a given accuracy. The algorithm does not use discretization and works in the space of the original problem.

July 19, 18:30 ~ 19:00 - Room B1

## Comparison of Lasserre's measure-based bounds for polynomial optimization to bounds obtained by simulated annealing

### Etienne de Klerk

### Tilburg University, The Netherlands - e.deklerk@uvt.nl

We consider the problem of minimizing a continuous function f over a compact set K. We compare the hierarchy of upper bounds proposed by Lasserre in [SIAM J. Optim. 21(3) (2011), pp. 864?885] to bounds that may be obtained from simulated annealing. We show that, when f is a polynomial and K a convex body, this comparison yields a faster rate of convergence of the Lasserre hierarchy than what was previously known in the literature.

Joint work with Monique Laurent (CWI, Amsterdam, and Tilburg University)..

## Posters

## Gradient Method With Inexact Oracle for Composite Non-Convex Optimization

### Pavel Dvurechensky

### Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany - pavel.dvurechensky@gmail.com

We develop new first-order method for composite non-convex minimization problems with simple constraints and inexact oracle information. The objective function is given as a sum of "`hard"', possibly non-convex part, and "`simple"' convex part. Informally speaking, oracle inexactness means that, for the "`hard"' part, at any point we can approximately calculate the value of the function and construct a quadratic function, which approximately bounds the whole function from above. We give several examples of such inexactness: smooth non-convex functions with inexact H\"older-continuous gradient, functions given by auxiliary uniformly concave maximization problem, which can be solved only approximately. For the introduced class of problems, we propose a gradient-type method, which allows to use different proximal setup to adapt to geometry of the feasible set, adaptively chooses controlled oracle error, allows for inexact proximal mapping. We provide convergence rate for our method in terms of the norm of generalized gradient mapping and show that, in the case of inexact H\"older-continuous gradient, our method is universal with respect to H\"older parameters of the problem. Details can be found in https://arxiv.org/abs/1703.09180.

## Nonlinear Spectral Methods for Nonconvex Optimization with Global Optimality Guarantees

### Antoine Gautier

### Saarland University, Germany - Antoine.Gautier@uni-saarland.de

We present the Nonlinear Spectral Method, introduced in [1,2], which can be used to solve a class of nonconvex optimization problems over the product of nonnegative $\ell^p$ spheres with global optimality guarantees and linear convergence rate. Advantages of this method are that the global optimality assumptions as well as the convergence rate can be easily checked before running the method by computing the spectral radius of a small nonnegative matrix which depends on the objective function and the norm contraints. The method is inspired by the theory of (sub)-homogeneous nonlinear eigenproblems on convex cones which has its origin in the Perron-Frobenius theory for nonnegative matrices. In fact this work is motivated by the closely related Perron-Frobenius theory for multi-homogeneous problems developed in [3].

Following [1], we show that this method can be applied for the training of a class of generalized polynomial neural networks on real-world datasets. This is the first practically feasible method for training a class of nonlinear neural networks with global convergence guarantees. We also discuss how the method can be used to compute projective norms of nonnegative tensors.

[1] A. Gautier, Q. Nguyen and M. Hein, Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods, NIPS, 2016.

[2] Q. Nguyen, A. Gautier and M. Hein, Nonlinear Spectral Methods for Nonconvex Optimization with Global Optimality, Workshop on Optimization for Machine Learning, 2016.

[3] A. Gautier, F. Tudisco and M. Hein, The Perron-Frobenius Theorem for Multi-homogeneous Maps, arXiv:1702.03230, 2017.

Joint work with Q. Nguyen (Saarland University, Germany) and M. Hein (Saarland University, Germany).

## DC Decomposition of Nonconvex Polynomials with Algebraic Techniques

### Georgina Hall

### Princeton University, USA - gh4@princeton.edu

We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard.

Joint work with Amir Ali Ahmadi (Princeton University, USA).

## On the Conjecture by Demyanov-Ryabova in Converting Finite Exhausters

### Tian Sang

### RMIT University, Australia - s3556268@student.rmit.edu.au

The Demyanov-Ryabova conjecture is a geometric problem originating from duality relations between nonconvex objects. Given a finite collection of polytopes, one obtains its dual collection as convex hulls of the maximal faces of sets in the original collection, for each direction in the space (thus constructing upper convex representations of positively homogeneous functions from lower ones and vice versa, via Minkowski duality). It is conjectured that an iterative application of this conversion procedure to finite families of polytopes results in a cycle of length at most two.

We prove a special case of the conjecture assuming an affine independence condition on the vertices of polytopes in the collection. We also obtain a purely combinatorial reformulation of the conjecture.