1 Convexity I (30 Points)
Prove or disprove each of the following properties of convex functions:
1. (5 points). The sum of two convex functions is also convex.
2. (15 points). Let f(x) = h(g1(x), g2(x), …, gn(x)), where h : Rⁿ → R is a convex function. Then for each of the following cases, prove that f is convex:
• (5 points). ∀i ∈ {1, …, n}, gi is a convex function and h is increasing in its i-th component.
• (5 points). ∀i ∈ {1, …, n}, gi is an affine function.
• (5 points). ∀i ∈ {1, …, n}, gi is a concave function and h is decreasing in its i-th component.
3. (10 points). The maximum of a convex function f over a bounded polyhedron P is achieved at one of the vertices of P.
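Recall the inequality characterization of convexity, which each part can be argued from directly: f is convex if and only if, for all x, y in its (convex) domain and all θ ∈ [0, 1],

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).

For part 1, for instance, adding this inequality for two convex functions f1 and f2 verifies it for f1 + f2.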
2 Convexity II (40 Points)
Let the primal optimization problem be: minx∈R² f(x) s.t. g(x) ≤ 0. Define the Lagrangian L(x, λ) = f(x) + λg(x) and the dual function q(λ) = minx L(x, λ). The Lagrangian dual problem is then: maxλ≥0 q(λ).
Consider f(x) = x1² + x2² and g(x) = 1 − x1 − x2.
1. (10 points). Plot the feasible region for the primal problem in (x1, x2) space. Let (x1∗, x2∗) be the point in (x1, x2) space that minimizes f(x) s.t. g(x) ≤ 0, and mark (x1∗, x2∗) on the same plot as the feasible region. (A plotting sketch for parts 1 and 2 appears after this list.)
2. (10 points). In (y, z) space, where y = g(ω) and z = f(ω), plot the set R = {(y, z) | y = g(ω), z = f(ω) for some ω ∈ R²} and the set F, the image of the feasible solutions of the primal problem.
3. (10 points). Let y∗, z∗, and λ∗ be the optimal values of y = g(ω), z = f(ω), and λ, respectively, at the optimum of the Lagrangian dual problem. Plot the optimal solution (y∗, z∗) in (y, z) space, and on the same plot draw the line z + λ∗y = q(λ∗).
4. (10 points). Show that q(λ) is a concave function on Sq where Sq = {λ|q(λ) ∈ R}.
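As a starting point for the plots in parts 1 and 2, the minimal matplotlib sketch below shades the feasible region and samples the image set R on a grid; it assumes numpy and matplotlib are available, and the grid ranges and resolutions are arbitrary illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Part 1: feasible region {x : g(x) = 1 - x1 - x2 <= 0} in (x1, x2) space.
x1, x2 = np.meshgrid(np.linspace(-2, 3, 400), np.linspace(-2, 3, 400))
feasible = (1 - x1 - x2 <= 0).astype(float)
plt.figure()
plt.contourf(x1, x2, feasible, levels=[0.5, 1.5], alpha=0.3)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Feasible region of the primal problem")

# Part 2: image set R = {(g(w), f(w)) : w in R^2}, sampled on a grid of w.
w1, w2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
y = 1 - w1 - w2      # y = g(w)
z = w1**2 + w2**2    # z = f(w)
plt.figure()
plt.scatter(y.ravel(), z.ravel(), s=1, alpha=0.2)
plt.xlabel("y = g(w)")
plt.ylabel("z = f(w)")
plt.title("Image set R in (y, z) space")
plt.show()
```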
3 Support Vector Machine (40 Points)
For linearly separable data, a support vector machine finds two parallel hyperplanes parameterized by (w, b) such that the margin between the two classes is maximized. If the class indicator yi = 1, then wT xi + b ≥ 1, and if the class indicator yi = −1, then wT xi + b ≤ −1. Thus, a perfectly learned separator satisfies yi(wT xi + b) ≥ 1. However, if the data is not fully separable, then we allow for slack variables si ≥ 0. We update the primal problem for the SVM as follows:

min over w, b, s:  (1/2)‖w‖² + C Σi si
s.t.  yi(wT xi + b) ≥ 1 − si,  si ≥ 0,  ∀i ∈ {1, …, n}.
1. (10 points). Show that the Lagrangian dual problem of the SVM primal problem can be given by a quadratic program in the Lagrange multipliers αi, ∀i ∈ {1, …, n} (see the set-up sketched after part 2).
2. (10 points). Show that, for the primal and dual optimal solutions, the Lagrange multiplier αi is nonzero only for samples where yi(wT xi + b) ≤ 1.
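Both parts start from the Lagrangian of the primal problem above, with multipliers αi ≥ 0 for the margin constraints and μi ≥ 0 for the constraints si ≥ 0:

L(w, b, s, α, μ) = (1/2)‖w‖² + C Σi si + Σi αi(1 − si − yi(wT xi + b)) − Σi μi si.

Setting the gradients with respect to w, b, and s to zero and substituting back into L yields the dual quadratic program for part 1; complementary slackness on the αi constraints gives part 2.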
Now consider the diabetic retinopathy dataset (provided in the assignment zip file), which contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy. All features represent either a detected lesion, a descriptive feature of an anatomical part, or an image-level descriptor. We will use Python 3.x for this part of the problem.
3. Read the data from the CSV into a numpy array (use an appropriate datatype). Split the dataset into a training set and a testing set with ratio |training| : |testing| = 3 : 2. (A starter sketch for parts 3–7 appears after this list.)
4. (5 points). Use scikit-learn's SVM model with the “linear” kernel, C = 0, and default values for the other parameters to fit the training set, predicting the “class label” given in the dataset. Report the training accuracy, the testing accuracy, and the learned weights w∗.
5. (5 points). Plot C vs. the sum of the slack values Σi si on the training set, for each C ∈ [0, 1000]. What do you observe? Comment on your observations.
6. (5 points). Now, plot C vs the testing accuracy. Which value of C would have been optimal for this dataset? Comment on your findings.
7. (5 points). Set C to the optimal value you found in the previous part, change the kernel to “poly”, and vary the degree of the polynomial. What degree of the polynomial kernel achieves the best testing accuracy? Comment on your findings. (Plot degree vs. the testing accuracy, and include degree vs. the training accuracy on the same plot.)
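A minimal starter sketch for parts 3–7, assuming a file named messidor_features.csv with the class label in the last column (both are assumptions; adjust to the provided file). Note that scikit-learn's SVC requires C > 0, so C = 1.0 appears below as a placeholder value:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the dataset; the filename and label-column position are assumptions.
data = np.loadtxt("messidor_features.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Part 3: 3:2 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Part 4: linear-kernel SVM (scikit-learn requires C > 0; C = 1.0 is a placeholder).
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("train acc:", clf.score(X_train, y_train))
print("test acc: ", clf.score(X_test, y_test))
print("w*:", clf.coef_)

# Parts 5-6: sweep C, recording the slack sum and the testing accuracy.
# With labels mapped to +/-1, the slacks are s_i = max(0, 1 - y_i f(x_i)).
y_pm = np.where(y_train > 0, 1.0, -1.0)
for C in [0.01, 0.1, 1, 10, 100, 1000]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    slack_sum = np.maximum(0, 1 - y_pm * clf.decision_function(X_train)).sum()
    print(f"C={C}: slack sum={slack_sum:.2f}, test acc={clf.score(X_test, y_test):.3f}")

# Part 7: polynomial kernel, varying the degree at the chosen C.
for degree in range(1, 7):
    clf = SVC(kernel="poly", degree=degree, C=1.0).fit(X_train, y_train)
    print(degree, clf.score(X_train, y_train), clf.score(X_test, y_test))
```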
4 Kernels (30 Points)
