Lecture-by-lecture list of topics
EECS 598-006 W20 "Optimization methods for ..."

1  1/09
  Ch. 0: Course policies
  Ch. 1: Applications (1)
  1.0 introduction (read)
  1.1 linear programming
      compressed sensing with l1 norm
      minmax sparse filter design (read)
      MRI RF pulse design (read)
      convex relaxation
      sublevel sets, quasiconvexity
      convex functions
      convex envelope
  1.2 quadratic problems
      LS, regularized LS, finite differences
      constrained LS (read)
      analytical solution for diagonal case
  1.3 strictly convex smooth problems
      edge-preserving regularization
      M-estimation for robust regression

2  1/14
  robust regression example
  1.4 convex composite problems
      l1 regularization
      LASSO - feature selection
      sparse approximation
      signal models: synthesis and analysis
      wave+spike example
      denoising using sparse synthesis: unitary case
      compressed sensing - synthesis regularizer
      LASSO via GP using x = u - v
      elastic net regularizer (read)
      analysis sparsity / total variation (TV)
      1D TV as a LASSO problem (read)
      corner rounding (read)
      proximal operators - complex case (read)
  1.5 non-smooth problems
      robust regression using l1
      binary classification (read)
      union of subspaces - unsupervised

3  1/16
  Ch. 2: Applications (2)
  2.1 signal processing applications
      patch-based regularization: synthesis form, analysis form
      aggregate (global) vs. local sparsity
      dictionary learning
      transform learning; regularized version (read)
      filter learning
      blind deconvolution
      phase retrieval (read)

4  1/21
  2.2 machine-learning applications
      low-rank approximation / matrix factorization
      low-rank matrix completion
      matrix sensing / recovery
  Ch. 3: Gradient-based optimization
  3.1 Lipschitz continuity
      properties

5  1/23
  3.1 (continued)
      bounds / majorization
      relation to the Hessian
      edge-preserving regularizer: Lipschitz constant
  3.2 gradient descent (GD)  (Julia sketch below, after lecture 10)
      step size
      convergence rates

6  1/28  [Claire Lin, due to Sedona workshop]
  3.3 preconditioned steepest descent (PSD)
      preconditioning
      descent direction
      complex case for LS (read)
  3.4 descent direction for edge-preserving regularizer: complex case
      descent direction proof (read)
      Lipschitz constant conjecture (read)
      practical Lipschitz constant for edge-preserving regularizer (cover!)
      PGD step size
      SD vs. CG and inner products
  3.5 general inverse problems
      cost function
      efficient line search (read?)

7  1/30  [Steven Whitaker, due to IPAM workshop]
  3.6 convergence rates of PGD, PSD, PCG
      heavy ball method
      heavy ball convergence analysis (read)
      S-Lipschitz continuity
      PGD convergence theorem for S-Lipschitz convex functions
      Lipschitz constant units
      Nesterov's FGM with (preconditioned) gradient step
  3.7 first-order methods: general and fixed step
      FGM is a first-order method; FGM rates

8  2/04
  3.7 (continued)
      OGM, OGM', bounds and optimality
      OGM worst-case functions, bound tightness
  3.8 logistic classifier design
      cost function is like an inverse problem
      Lipschitz constant via properties
      example of GD / FGM / OGM
      adaptive restart of FGM / OGM
      strongly convex functions
  reading: 11.8 (CG), 11.9 (QN), 11.10 (BCD) by 2/13 (should have started earlier)

9  2/06
  nonlinearity in machine learning
  3.10 summary
      history of first-order methods
      preview of GFOM / OGM line search
  3.9 1D finite differences: demo/diff1, @view, etc.

10 2/11
  2D finite differences: in-class task
  adjoint tests for LinearMaps
  Julia call-by-reference?
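The fixed-step gradient descent update from 3.2, using step size 1/L where L is a Lipschitz constant of the gradient (3.1), can be sketched in Julia (the language used for the in-class tasks above). This is a minimal illustrative sketch only; the matrix A, data y, and iteration count below are placeholder choices, not course material.

```julia
using LinearAlgebra

# Gradient descent with fixed step size 1/L for the LS cost f(x) = (1/2) ‖A x - y‖²,
# whose gradient ∇f(x) = A'(A x - y) is Lipschitz continuous with constant L = ‖A‖₂².
# Minimal sketch; A, y, and niter are illustrative placeholders.
function gd_ls(A, y; niter = 200)
    L = opnorm(A)^2                       # Lipschitz constant of ∇f
    x = zeros(size(A, 2))
    for _ in 1:niter
        x -= (1/L) * (A' * (A * x - y))   # GD update with step 1/L
    end
    return x
end

A = randn(30, 10); y = randn(30)
xhat = gd_ls(A, y)                        # should approach the LS solution A \ y
```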
11 2/13
  Ch. 4: Majorize-minimize (MM) methods
  4.0 intro / application examples
  4.1 majorization principle / sandwich inequality
      algebraic properties
      quadratic majorizer when the gradient is S-Lipschitz smooth
      connection to PGD
  4.2 applications
      low-rank matrix completion via MM
      LASSO / sparse regression / compressed sensing (l2+l1)

12 2/18
  ISTA / PGM
  diagonal majorizer
  convexity majorizers (separability)
      general case
      LASSO example
      exponential loss example (read)
  reading (optional - not in W20 since Ch. 4 is now more complete):
      14.1 intro, skipping 14.1.4
      14.5 surrogate design
      14.6.6 monotone line searches

13 2/20
  Poisson data (MLEM)
  line search with Huber's majorizer: 1D MM approach

14 2/25
  (Huber's majorizer, continued)
      rationale in terms of 1D mean/median/robust fair
      Huber's conditions and uniqueness (read)
  [Exam review based on p. 11, 14, 24, 33, 34 of the W20 exam 1 student problems]

** 2/26  Exam 1, Wed. 6-8 PM

15 2/27
  Huber hinge loss
      Lipschitz constant
      optimal quadratic majorizer for s >= -1
  4.3 acceleration methods (over-relaxation)
      LASSO example

*  3/03, 3/05  break

16 3/10
  4.4 summary and LASSO recap
  Ch. 5: Proximal methods
  5.1 proximal operator
      definition
      example: SVHD
      example: hard thresholding, POCS

17 3/12  canceled due to coronavirus

18 3/17
  proximal point algorithm, cf. MM
  5.2 proximal gradient method (PGM), cf. MM  (Julia sketch below, after lecture 25)
      PGM convergence rate O(1/k)
      linear rate for strongly convex f (read)
      PGM with line search (read)

19 3/19
  PGM revisited
  5.3 accelerated proximal methods
      FPGM (fast proximal gradient method) = FISTA
      POGM (proximal optimal gradient method)
      inexact computation of proximal operators
      related methods
  5.4 applications
      binary classifier with l1 regularizer

20 3/24
  5.4 (continued)
      MRI compressed sensing with ODWT
  Ch. 6: Alternating minimization methods (BCD, BCM)
  6.1 SP applications
      CS using synthesis sparsity: two-block BCM, two-block BCD
      CS using analysis sparsity: two-block BCD/BCM

21 3/26
  sparse coding using multi-block BCM (aka CD)
      relation to MP/OMP (brief in W19, not in W20)
      CD code
      CD update of x for CS with synthesis sparsity (group task in W19, not in W20)
  sparse coding for a tight frame via 2-block BCM (group notebook in W19, not in W20)
  wavelet transform overview (in W19, not in W20)
  patch-based regularization using a sparsifying transform

22 3/31
  sparsifying transform learning via two-block BCM
      square case via orthogonal Procrustes
      non-square case via MM (read in W20)
      non-square case via weighted 0-norm (read in W20)
      example of 1D filter learning
      memory-efficient implementation (read in W20)
  dictionary learning
      two-block update of D, Z (brief)
      multi-block update of d_k, c_k a la SOUP
      in-class task using SOUP for wave+spike (not in W20)
      joint update of d_k and c_k (read)

23 4/02
  6.2 ML applications
      low-rank matrix approximation for large problems via BCM
      non-negative matrix factorization & sparsity
      fused LASSO (brief/read)
      alternating minimization for the 0-norm in biconvex form (brief)
  6.3 convergence properties
  6.4 more about BCM (not covered in W20)
      relation to GD
      VARPRO
      1D TV failure to converge

24 4/07
  Ch. 7: Duality / Lagrangian / ADMM
  7.0 variable splitting overview
  7.1 convex conjugate: properties & examples
  7.2 method of Lagrange multipliers
      Lagrange dual
      properties and relation to the convex conjugate
      weak and strong duality
      using the dual problem to solve the primal problem
  7.3 augmented Lagrangian method
      ADMM

25 4/09
  binary classifier with hinge loss via ADMM
  ADMM in general / convergence
  linearized AL method (LALM)
  augmented ADMM
  primal-dual hybrid gradient (PDHG) / Chambolle-Pock
  "near-circulant splitting" method
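The proximal gradient method (PGM/ISTA) of 4.2 and 5.2, applied to the LASSO cost with the soft-thresholding proximal operator, can be sketched in Julia as follows. This is a minimal illustration only; A, y, the regularization parameter β, and the iteration count are placeholder choices, not course defaults.

```julia
using LinearAlgebra

# Proximal gradient method (ISTA) for the LASSO cost (1/2) ‖A x - y‖² + β ‖x‖₁:
# a gradient step on the smooth term, then the soft-thresholding prox of the ℓ₁ term.
# Minimal sketch; A, y, β, and niter are illustrative placeholders.
soft(z, t) = sign(z) * max(abs(z) - t, 0)   # prox of t |·| (soft threshold)

function ista(A, y, β; niter = 300)
    L = opnorm(A)^2                         # Lipschitz constant of the smooth gradient
    x = zeros(size(A, 2))
    for _ in 1:niter
        g = A' * (A * x - y)                # gradient of the smooth LS term
        x = soft.(x - g / L, β / L)         # prox (shrinkage) step
    end
    return x
end

A = randn(40, 20); y = randn(40)
xhat = ista(A, y, 0.1)
```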
26 4/14
  Ch. 8: Stochastic gradient / subgradient methods
  8.1 subgradient method
      subdifferential / subgradient and properties
      convergence for diminishing step sizes
      normalized subgradient method
      convergence for constant step-size factors
      projected subgradient method
      Naveen Murthy: SGM
  8.2 example: hinge loss with 1-norm regularizer

27 4/16
  8.3 incremental (sub)gradient method
  8.4 stochastic gradient method  (Julia sketch at the end of this list)
      minibatches
      convergence analysis
      variance reduction: SVRG, SAGA, ...
      momentum
      adaptive step sizes
      restart
      example: ordinary linear LS
  8.5 example: X-ray CT reconstruction

28 4/21
  Ch. 9: Misc. topics
  Review based on binary classifier design, partly inspired by student exam 2 questions

---- below here from W19 -------
  PGM as alternating between GD and denoising
  Patch transform sparsity as related to variational CNN methods
  Overview of deep learning for medical imaging
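To illustrate the minibatch stochastic gradient method of 8.4 for the "ordinary linear LS" example, here is a minimal Julia sketch; the batch size, step-size schedule, and the data A, y are placeholder choices, not course defaults.

```julia
using LinearAlgebra, Random

# Minibatch stochastic gradient method for the LS cost (1/(2M)) ‖A x - y‖².
# Minimal sketch; batch size, number of epochs, and step-size schedule are placeholders.
function sgd_ls(A, y; nepoch = 50, batch = 10, α0 = 0.01)
    M, N = size(A)
    x = zeros(N)
    k = 0
    for _ in 1:nepoch
        for idx in Iterators.partition(randperm(M), batch)   # random minibatches
            k += 1
            Ab, yb = A[idx, :], y[idx]
            g = Ab' * (Ab * x - yb) / length(idx)   # minibatch gradient estimate
            x -= (α0 / sqrt(k)) * g                 # diminishing step size
        end
    end
    return x
end

A = randn(200, 10); y = randn(200)
xhat = sgd_ls(A, y)
```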