Cloud Computing (COMM034)
Coursework description, 2018-19
To demonstrate an understanding of how to construct a Cloud application using multiple services across two Cloud providers, involving specifiable scaling.
You will propose, implement, test, evaluate, and demonstrate an application that estimates the value of Pi (π) to:
i) a given number of tests;
ii) a given decimal place of accuracy.
using a so-called Monte Carlo method. See Approach, and – in particular – part iv regarding the estimates.
Your application will need to adopt the Approach within the set of provided Requirements, and you will make Submissions as outlined.
i. The approach involves generating random numbers and using an inequality to generate a ratio. The ratio allows for the estimation of Pi (π). It proceeds as follows:
a. Assume a circular dartboard of radius r, and a square backboard on which it sits. If the dartboard fits perfectly inside the backboard, we have a square of width and height 2r. The area of the circle is πr², and the area of the square is (2r)² = 4r².
b. Now assume a dart player with little control over where the dart lands – all darts land within the square, but not all land within the circle. After an incredibly large number of shots, the ratio of those landing inside the circle to all of those thrown should approach:
Area of circle / Area of square = πr² / (4r²) = π / 4
and so the ratio multiplied by 4 should approximate Pi.
ii. Consider the circle as a unit circle of radius 1, with centre at the origin (0, 0) in the Cartesian coordinate system. If we pick random numbers for x and y, with both between -1 and +1, we will land in the square and perhaps inside the circle. A point is inside the circle if √(x² + y²) < 1, which, because the radius is 1, can be tested more simply as x² + y² < 1.
iii. The code, then, is relatively straightforward – generate random values, test if they are inside the circle, and multiply the resulting ratio (shots inside the circle divided by all shots) by 4.
iv. A single run of this code will produce one estimate. Multiple runs, serially, will produce multiple estimates. There are three things that we could do from here:
a. Determine an estimate of Pi by averaging from a number of (parallel) runs
b. Determine an estimate of Pi across multiple parallel runs of the code (i.e. not calculating Pi for each run, but determining its value from the total number of shots and the total number landing inside the circle, "incircle", across all runs)
c. Determine an estimate of Pi accurate to a given decimal place through multiple iterations of b above, by comparison to a given value of Pi (in Python, math.pi), repeating parallel runs until a ‘good’ value is obtained.
This coursework requires b and c to be undertaken, but NOT a.
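As an illustration only, options b and c above could be sketched in Python as follows. The function names (run_shots, pooled_estimate, matches_pi, estimate_to_places) are hypothetical, the runs here execute serially rather than on a scalable service, and the test for "accurate to D decimal places" (agreement with math.pi after truncating both values to D places) is an assumption, since the brief does not fix a precise criterion.

```python
import math
import random


def run_shots(shots, seed=None):
    """One run: throw `shots` darts and count those inside the unit circle."""
    rng = random.Random(seed)
    incircle = 0
    for _ in range(shots):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:  # same test as sqrt(x^2 + y^2) < 1
            incircle += 1
    return incircle


def pooled_estimate(results):
    """Option b: one estimate from the (shots, incircle) pairs of all runs."""
    total_shots = sum(shots for shots, _ in results)
    total_incircle = sum(incircle for _, incircle in results)
    return 4.0 * total_incircle / total_shots


def matches_pi(estimate, places):
    """Assumed criterion: equal to math.pi once both are truncated to `places`."""
    scale = 10 ** places
    return math.floor(estimate * scale) == math.floor(math.pi * scale)


def estimate_to_places(places, shots_per_run, runs_per_round,
                       max_rounds=1000, seed=0):
    """Option c: add rounds of runs, pooling everything, until the estimate matches.

    Returns (estimate, total_shots), or None if max_rounds is exhausted.
    """
    results = []
    for round_no in range(max_rounds):
        for run_no in range(runs_per_round):
            run_seed = seed + round_no * runs_per_round + run_no
            results.append((shots_per_run, run_shots(shots_per_run, run_seed)))
        estimate = pooled_estimate(results)
        if matches_pi(estimate, places):
            return estimate, sum(shots for shots, _ in results)
    return None
```

Note that pooled_estimate never averages per-run values of Pi (option a); it only sums the shot and incircle counts. In a deployed system each call to run_shots would be replaced by an invocation of the chosen scalable service.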
Figure 1: An example chart – estimated value vs shots (values are taken at increments of 10,000 shots here), in contrast to Pi. Note that being within a given number of decimal places may happen reasonably early but might not occur again for a while.
i. You must use: (i) Google App Engine, (ii) AWS Lambda, and (iii) one of the other scalable services in AWS: Elastic Compute Cloud (EC2), Elastic MapReduce (EMR) or – should you wish to explore – EC2 Container Service (ECS).
Subsequent mentions of scalable services in this document mean Lambda plus your choice of (EC2 or EMR or ECS).
ii. Your system must offer a persistent front-end through which you can present information about (accuracy of) estimates of the value of Pi to end users.
iii. The system must present a Chart, either using image-charts.com or the newer Google Charts (https://developers.google.com/chart/) [since the nice, easy-to-use Google ‘Image’ Charts will be killed off mid-March], that shows the value settling to Pi over the number of data points (shots), with value on the y-axis and shots on the x-axis.
Estimates for values of Pi must also be presented in a table on the same page as the chart.
iv. The scalable services, and not Google App Engine, must run the Pi estimation code, with results made available for presentation by the persistent front end.
v. The scalable services must be used dynamically – i.e. any resources used in calculating the Monte Carlo values should be switched on and off (automatically, via code) for the purpose and shall not be left on continuously.
vi. It must be possible to specify values for the following parameters through the persistent front-end:
a. to allow for a choice between estimating to a given number of data points (shots) or a given decimal place of accuracy
b. either a value of S, as the number of data points (shots) to use for estimating Pi, or a value of D, as the decimal place of accuracy, depending on the selection between shots and accuracy (above)
c. a value of R, as the number of resources (in the scalable services), to be used for estimating Pi, so that each run uses approximately S/R shots (if D is not met, additional runs should also happen at this rate).
d. a value of Q as the reporting rate: if 1,000,000 shots (S) were requested across 10 resources (R), a reporting rate of 10 would mean each run returns 10 values, one after each 10,000 shots. The graph, as shown, would then display 100 values, each representing a further 10,000 shots.
e. the scalable service to use for Monte Carlo – e.g. a selection between Lambda and EC2; there is no requirement to use both scalable services at the same time.
vii. Except for specifying values, the user must be able to run an analysis with a single click.
viii. Data about each run must be retained. You will need to consider where such data should be stored.
Your system may incorporate additional Cloud components, for example for storage. However, the mantra of Keep It Stupid-Simple should be followed and additional components should not be added unnecessarily.
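To make the relationship between S, R and Q in requirement vi concrete, the following Python sketch splits the shots across resources and turns the reported counts into chart points. All names (plan_run, run_with_reports, chart_series) are hypothetical. It assumes that any remainder from the integer division is simply dropped ("approximately S/R shots"), and that reported values are accumulated run by run in the order the runs are listed. Neither is prescribed by the requirements.

```python
import random


def plan_run(total_shots, resources, reports_per_run):
    """Return (shots_per_run, shots_per_report) for given S, R and Q."""
    shots_per_run = total_shots // resources           # approximately S/R
    shots_per_report = shots_per_run // reports_per_run
    return shots_per_run, shots_per_report


def run_with_reports(shots_per_report, reports, seed=None):
    """One run on one resource: an incircle count for each reporting block."""
    rng = random.Random(seed)
    counts = []
    for _report in range(reports):
        incircle = 0
        for _shot in range(shots_per_report):
            x = rng.uniform(-1.0, 1.0)
            y = rng.uniform(-1.0, 1.0)
            if x * x + y * y < 1.0:
                incircle += 1
        counts.append(incircle)
    return counts


def chart_series(all_counts, shots_per_report):
    """Accumulate reported counts into (cumulative_shots, estimate) points."""
    series = []
    shots = 0
    incircle = 0
    for counts in all_counts:          # one list of Q counts per run
        for count in counts:
            shots += shots_per_report
            incircle += count
            series.append((shots, 4.0 * incircle / shots))
    return series
```

With the figures from requirement vi.d, plan_run(1000000, 10, 10) gives 100,000 shots per run and 10,000 shots per report, and ten runs of ten reports each yield the 100 chart points described there. The same points can populate the table required alongside the chart.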