Quick Overview

You will implement a simple generic Differential Evolution search

You will apply this generic algorithm to two problems

You are provided with scaffolding code that you will need to complete

You will also perform experiments and report on your results

Introduction

In machine learning, a hyperparameter is a parameter whose value is set before the learning process

begins. By contrast, the values of other parameters like the weights of a neural network are derived

via training. Examples of hyperparameters for neural networks include the number of layers, the

number of neurons in each layer and the learning rate.

Different training algorithms require different hyperparameters. Some simple algorithms (such as

ordinary least squares regression) require none. The optimal values for the hyperparameters depend

also on the dataset. In this assignment, you will implement a search algorithm to find good values

for a small set of hyperparameters of a neural network.

Differential Evolution (DE) is an efficient and high performing optimizer for real-valued functions.

DE is the search technique that you are required to use in this assignment to solve the optimization

problems. As DE is based on evolutionary computing, it performs well on multi-modal,

discontinuous optimization landscapes. Another appealing feature of DE is that it is still applicable

even if the function we try to optimize is not differentiable. This is often the case when some

simulation is involved in the computation of the real-valued function being optimized. In these

scenarios, a gradient descent approach is not applicable. DE performs a parallel search over a

population of fixed size, where each population member is a higher dimensional vector.

In this assignment, the core problem is to adapt DE to perform a search for four hyperparameters of

a neural network. In the next section, we describe in more details DE.

Differential Evolution

Let f(w) be the cost function which must be minimized. Note that maximization can be performed

by considering the function -f . The function f(w) takes a candidate solution as argument in the form

of a vector w of real numbers and produces a real number as output which indicates the cost of the?
There exist many variations of the DE algorithm. The variation that you have to implement has

been chosen for its simplicity. In the rest of this section, variable names written in?blue italic

correspond to Python program variables used in the provided scaffolding code.

We constrain the search by specifying bounds for each component of?w?with a list of pairs called

bounds?in the scaffolding code. That is, we impose

bounds[k][0] <=?w[k] <=?bounds[k][1]

for k in range(len(w))

For example, if?bounds?is [(1,100),(1,100),(-6,2),(-6,1)] , then w[1] is constrained to the interval

[1, 100] and w[2] is constrained to the interval [-6, 2].

To make the code generic, we normalize each component of?w?to the interval [0, 1] by mapping

with an affine function the interval [?bounds[k][0],?bounds[k][1] ] to the interval [0, 1].

We first initialize all individuals?w?of the initial population with random points from the search-

space.

Until a termination criterion is met (maximum number of iterations/generations done, or acceptable

cost is reached), we repeat the following loop:

For each individual?w?in the population do:

Pick three distinct individuals?a,?b?and?c?from the current population at random. The

individuals?a,?b?and?c?must be distinct from?w?as well.

Create a?mutant?vector?a?+?mut?* (b?–?c) where?mut?is a constant (passed as an argument to

the?differential_evolution?function)

Clip the?mutant?entries to the interval [0, 1]

Create a?trial?vector by assigning to?trial[k] with probability?crossp?the value?mutant[k]

and probability 1-crossp?the value of?w[k]

Evaluate the cost of the?trial?vector. If the?trial?vector is better than?w, then replace?w?by the

trial?vector

In other words, for each individual?w?in the current population, we create a challenger?trial?vector.

If this challenger is better than?w, then?w?is replaced by the?trial?vector .

The mutation factor?mut?with and the crossover constant?crossp?are numerically defined in the

scaffolding code.

Scaffolding Code

The file “my_submission.py” contains scaffolding code. The key function is?differential_evolution.

This function return a generator. You should have seen generators in the Python

tutorial

.

To validate your implementation of the function?differential_evolution, you should first test and

debug it with the function?task_1?as it tries to solve a simpler problem than?task_2.

In Task 1, you fit a polynomial to a noisy dataset. This is the same problem as you have seen in the

early pracs where you solve this problem with the pseudo-inverse and a gradient descent approach.