**1 Convexity I (30 Points)**

Prove/disprove the following properties for convex function(s):

**1. (5 points).** The sum of two convex functions is also convex.

**2. (15 points).** Let *f*(*x*) = *h*(*g*_{1}(*x*), *g*_{2}(*x*), .., *g*_{n}(*x*)), where *h* is a convex function. Then for each of the following cases, prove that *f* is convex:

• **(5 points).** ∀*i* ∈ {1, .., *n*}, *g*_{i} is a convex function and *h* is increasing in its *i*-th component.

• **(5 points).** ∀*i* ∈ {1, .., *n*}, *g*_{i} is an affine function.

• **(5 points).** ∀*i* ∈ {1, .., *n*}, *g*_{i} is a concave function and *h* is decreasing in its *i*-th component.

**3. (10 points).** The maximum of a convex function *f* over the polyhedron *P* is achieved at one of its vertices.
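Before attempting the proof of 1.1, it can help to see the inequality at work numerically. The sketch below (the example functions are my own choices, not part of the assignment) spot-checks the defining inequality of convexity for a sum *f* + *g* at random point pairs; it is a sanity check, not a proof.

```python
import numpy as np

# Numerical sanity check (not a proof): if f and g are convex, then for any
# points a, b and t in [0, 1] the sum h = f + g should satisfy
#   h(t*a + (1-t)*b) <= t*h(a) + (1-t)*h(b).
f = lambda x: x**2          # convex
g = lambda x: np.abs(x)     # convex
h = lambda x: f(x) + g(x)   # claimed convex

rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    a, b = rng.uniform(-10, 10, size=2)
    t = rng.uniform(0, 1)
    lhs = h(t * a + (1 - t) * b)
    rhs = t * h(a) + (1 - t) * h(b)
    if lhs > rhs + 1e-9:    # small tolerance for floating-point error
        violations += 1
print("violations:", violations)  # expect 0
```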

**2 Convexity II (40 Points)**

Let the primal optimization problem be: min_{*x* ∈ ℝ²} *f*(*x*) s.t. *g*(*x*) ≤ 0, and let *L*(*x*, λ) = *f*(*x*) + λ*g*(*x*) be its Lagrangian. Then let max_{λ ∈ ℝ} *q*(λ) be the Lagrangian dual problem, where *q*(λ) = min_{x} (*f*(*x*) + λ*g*(*x*)).

Consider *f*(*x*) = *x*_{1}^{2} + *x*_{2}^{2} and *g*(*x*) = 1 − *x*_{1} − *x*_{2}.
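For intuition on this particular example (the numbers below are computed by the sketch, not given in the problem), one can grid-search the primal and evaluate the dual curve numerically. For this *f* and *g* the inner minimization has a closed form — it is attained at *x*_{1} = *x*_{2} = λ/2, giving *q*(λ) = λ − λ²/2 — so the primal and dual optimal values should agree:

```python
import numpy as np

# Sanity check for this example: f(x) = x1^2 + x2^2, g(x) = 1 - x1 - x2.
# Primal: grid-search the min of f over the feasible set g(x) <= 0.
xs = np.linspace(-2, 2, 401)            # step 0.01, contains 0.5 exactly
X1, X2 = np.meshgrid(xs, xs)
F = X1**2 + X2**2
feasible = (1 - X1 - X2) <= 0
primal_opt = F[feasible].min()

# Dual: q(lambda) = min_x f(x) + lambda*g(x); the inner min is attained at
# x1 = x2 = lambda/2, so q(lambda) = lambda - lambda**2 / 2.
lambdas = np.linspace(0, 3, 301)
q = lambdas - lambdas**2 / 2
dual_opt = q.max()

print(round(float(primal_opt), 3), round(float(dual_opt), 3))  # both 0.5
```

Strong duality holds here (convex objective, affine constraint), which is why the two values coincide.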

**1. (10 points).** Plot the feasible region for the primal problem in (*x*_{1}, *x*_{2}) space. Let (*x*_{1}^{∗}, *x*_{2}^{∗}) be the point in (*x*_{1}, *x*_{2}) space which minimizes *f*(*x*) s.t. *g*(*x*) ≤ 0. Plot (*x*_{1}^{∗}, *x*_{2}^{∗}) on the plot which also contains the feasible region.

**2. (10 points).** Plot the sets *R* = {(*y*, *z*) | *y* = *g*(ω), *z* = *f*(ω) for ω ∈ ℝ²} and *F* (the set of feasible solutions of the primal problem) in (*y*, *z*) space, where *y* = *g*(ω) and *z* = *f*(ω).

**3. (10 points).** Let *y*^{∗}, *z*^{∗} and λ^{∗} be the optimal values of *y* = *g*(ω), *z* = *f*(ω) and λ, respectively, which optimize *L*(ω, λ). Plot the optimal solution (*y*^{∗}, *z*^{∗}) on a plot. Also draw *z* + λ^{∗}*y* = *q*(λ^{∗}) in *y* = *g*(ω) and *z* = *f*(ω) space.

**4. (10 points).** Show that *q*(λ) is a concave function on *S*_{q}, where *S*_{q} = {λ | *q*(λ) ∈ ℝ}.
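A numerical illustration of why part 4 should hold (not a substitute for the proof): *q* is a pointwise minimum of functions that are affine in λ, and a minimum of affine functions is concave. The sketch below approximates the inner minimization over a finite sample of *x*'s — my own approximation, not the assignment's — and checks midpoint concavity on this example:

```python
import numpy as np

# q(lambda) = min_x f(x) + lambda*g(x) is a pointwise min of functions
# affine in lambda, hence concave. Approximate the min over sampled x's
# and check midpoint concavity: q((l1+l2)/2) >= (q(l1) + q(l2)) / 2.
rng = np.random.default_rng(0)
xs = rng.uniform(-5, 5, size=(2000, 2))  # sampled candidate x's
fx = (xs**2).sum(axis=1)                 # f(x) = x1^2 + x2^2
gx = 1 - xs.sum(axis=1)                  # g(x) = 1 - x1 - x2

def q(lam):
    return (fx + lam * gx).min()         # min over the sampled x's

violations = 0
for _ in range(500):
    l1, l2 = rng.uniform(-2, 4, size=2)
    mid = q((l1 + l2) / 2)
    if mid + 1e-9 < 0.5 * q(l1) + 0.5 * q(l2):
        violations += 1
print("concavity violations:", violations)  # expect 0
```

The check passes exactly because a finite minimum of affine functions is itself concave, regardless of which *x*'s were sampled.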

**3 Support Vector Machine (40 Points)**

For linearly separable data, the support vector machine tries to find two parallel hyperplanes parameterized by (*w*, *b*) such that the margin between the two classes is maximized. If the class indicator *y*_{i} = 1, then *w*^{T}*x*_{i} + *b* ≥ 1, and if the class indicator *y*_{i} = −1, then we have *w*^{T}*x*_{i} + *b* ≤ −1. Thus, a perfectly learned separator satisfies *y*_{i}(*w*^{T}*x*_{i} + *b*) ≥ 1. However, if the data is not fully separable, then we allow for error margins *s*_{i} > 0. We update the primal problem for SVM as follows:

min_{*w*, *b*, *s*} (1/2)‖*w*‖^{2} + *C* Σ_{i} *s*_{i}  s.t.  *y*_{i}(*w*^{T}*x*_{i} + *b*) ≥ 1 − *s*_{i},  *s*_{i} ≥ 0 ∀*i*

**1. (10 points).** Show that the Lagrangian dual problem of the SVM primal problem can be given by a quadratic program with Lagrange multipliers α_{i}, ∀*i* ∈ {1, …, *n*}.

**2. (10 points).** Show that for the primal and dual optimal solutions, the Lagrange multiplier α_{i} is non-zero for samples where *y*_{i}(*w*^{T}*x*_{i} + *b*) ≤ 1.

Now consider the **diabetic retinopathy dataset** (provided in the assignment zip file), which contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not. All features represent either a detected lesion, a descriptive feature of an anatomical part, or an image-level descriptor. We will use **Python 3.x** for this part of the problem.

**3.** Read the data from the CSV into a numpy array (use an appropriate datatype). Split the dataset into a training set and a testing set with ratio |training| : |testing| = 3 : 2.

**4. (5 points).** Use **scikit-learn's SVM** model with the "linear" kernel, *C* = 0, and default values for other parameters to fit the training set for predicting the "class label" given in the dataset. Report the training accuracy, the testing accuracy, and the learned weights *w*^{∗}.
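A minimal sketch of steps 3–4. Synthetic stand-in data is used here because the CSV filename and column layout of the provided dataset are not reproduced in this handout (swap in the real file, e.g. via `np.genfromtxt`, when solving the assignment). Note also that scikit-learn's `SVC` rejects *C* = 0 — *C* must be strictly positive — so the default *C* = 1.0 is used in the sketch:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, roughly linearly separable stand-in for the retinopathy CSV.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in "class label"

# 3:2 train/test split, i.e. test_size = 2 / (3 + 2) = 0.4.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# SVC requires C > 0, so C = 1.0 (the default) stands in for the C = 0
# written in the assignment text.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_tr, y_tr)

print("train acc:", clf.score(X_tr, y_tr))
print("test acc:", clf.score(X_te, y_te))
print("weights:", clf.coef_)  # coef_ is only available for the linear kernel
```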

**5. (5 points).** Plot *C* vs the sum of training errors Σ_{i} *s*_{i} for each *C* ∈ [0, 1000]. What do you observe? Comment on your observations.
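One way to produce the quantity plotted in step 5: `SVC` does not expose the slacks *s*_{i} directly, but they can be recovered as *s*_{i} = max(0, 1 − *y*_{i}(*w*^{T}*x*_{i} + *b*)) via `decision_function`. The sketch below again uses synthetic stand-in data, and the grid of *C* values is my own choice (*C* = 0 itself is invalid for `SVC`, so the grid starts just above 0):

```python
import numpy as np
from sklearn.svm import SVC

# Noisy stand-in data so that some slacks are necessarily non-zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)

Cs = [0.01, 0.1, 1, 10, 100, 1000]   # grid over (0, 1000]
slack_sums = []
for C in Cs:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins = y * clf.decision_function(X)       # y_i * (w^T x_i + b)
    slack_sums.append(np.maximum(0.0, 1.0 - margins).sum())

print(slack_sums)
# plt.plot(Cs, slack_sums) would then produce the requested plot;
# larger C penalizes slack more heavily, so the sum typically shrinks.
```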

*,***6. ****(5 points). **Now, plot

*vs the testing accuracy. Which value of*

*C**would have been optimal for this dataset? Comment on your findings.*

*C***7. ****(5 points). **Let’s choose the value of

*to be the optimal one you found in the previous part. Let’s change the kernel to “poly” and play with the*

*C**of the polynomial. For what value of the*

*degree**of the polynomial kernel do you find achieves the best testing accuracy? Comment on your findings. (Plot*

*degree**vs testing accuracy, and include*

*degree**vs training accuracy on the same plot.)*

*degree***4 Kernels (30 Points)**