Cloud Computing (COMM034)
Coursework description, 2018-19
The aim is to demonstrate an understanding of how to construct a Cloud application using multiple services across two Cloud providers, involving specifiable scaling.
You will propose, implement, test, evaluate, and demonstrate an application that uses a so-called Monte Carlo method to estimate the value of Pi (π) to:
i) a given number of tests;
ii) a given decimal place of accuracy.
See Approach and, in particular, part iv regarding the estimates.
Your application will need to adopt the Approach within the set of provided Requirements, and you will make Submissions as outlined.
Approach
i. The approach involves generating random numbers and using an inequality to generate a ratio. The ratio allows for the estimation of Pi (π). It proceeds as follows:
iii. The code, then, is relatively straightforward: generate random values, test whether each is inside the circle, and multiply the resulting ratio (incircle over shots) by 4.
iv. A single run of this code will produce one estimate. Multiple runs, serially, will produce multiple estimates. There are three things that we could do from here:
a. Determine an estimate of Pi by averaging from a number of (parallel) runs
b. Determine an estimate of Pi across multiple parallel runs of the code (i.e. not calculating Pi for each, but determining its value using information about shots and incircle across all runs)
c. Determine an estimate of Pi accurate to a given decimal place through multiple iterations of b above, by comparison to a given value of Pi (in Python, math.pi), repeating parallel runs until a ‘good’ value is obtained.
This coursework requires b and c to be undertaken, but NOT a.
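Options b and c can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the provided coursework code: the inequality x² + y² ≤ 1 on points in the unit square is the usual Monte Carlo test and is assumed here, the function names are hypothetical, "accurate to a given decimal place" is read as agreement of the truncated digits with math.pi, and the parallel runs are simulated serially.

```python
import math
import random


def count_incircle(shots, rng):
    """Return how many of `shots` random points land inside the quarter circle."""
    incircle = 0
    for _ in range(shots):
        x, y = rng.random(), rng.random()
        # Assumed inequality: a point is inside when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            incircle += 1
    return incircle


def pooled_estimate(runs):
    """Option b: combine (shots, incircle) pairs from every run into one estimate."""
    total_shots = sum(shots for shots, _ in runs)
    total_incircle = sum(incircle for _, incircle in runs)
    return 4.0 * total_incircle / total_shots


def matches_to_places(estimate, places):
    """Assumed test for option c: the truncated digits agree with math.pi."""
    factor = 10 ** places
    return math.floor(estimate * factor) == math.floor(math.pi * factor)


def estimate_to_accuracy(places, shots_per_run, resources, seed=0, max_rounds=1000):
    """Option c: add rounds of runs until the pooled estimate is 'good'.

    Runs execute serially here; each would be one parallel resource.
    Returns (estimate, total_shots), or None if max_rounds is exhausted.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(max_rounds):
        for _ in range(resources):
            runs.append((shots_per_run, count_incircle(shots_per_run, rng)))
        estimate = pooled_estimate(runs)
        if matches_to_places(estimate, places):
            return estimate, len(runs) * shots_per_run
    return None
```

Note that pooled_estimate never averages per-run values of Pi (option a); it sums shots and incircle counts across runs before forming the ratio. A call such as estimate_to_accuracy(places=2, shots_per_run=10000, resources=4) returns the first pooled estimate that passes the test, together with the shots used.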
Figure 1: An example chart – estimated value vs shots (values are taken at increments of 10,000 shots here), in contrast to Pi. Note that being within a given number of decimal places may happen reasonably early but might not occur again for a while.
Requirements
i. You must use: (i) Google App Engine, (ii) AWS Lambda, and (iii) one of the other scalable services in AWS: Elastic Compute Cloud (EC2), Elastic MapReduce (EMR) or – should you wish to explore – EC2 Container Service (ECS).
Subsequent mentions of scalable services in this document mean Lambda plus your choice of (EC2 or EMR or ECS).
ii. Your system must offer a persistent front-end through which you can present information about (accuracy of) estimates of the value of Pi to end users.
iii. The system must present a chart, using either image-charts.com or the newer Google Charts (https://developers.google.com/chart/), since the nice, easy-to-use Google ‘Image’ Charts will be killed off mid-March. The chart must show the value settling to Pi over the number of data points (shots), with value on the y-axis and shots on the x-axis.
Estimates for values of Pi must also be presented in a table on the same page as the chart.
iv. The scalable services, and not Google App Engine, must run the Pi estimation code, with results made available for presentation by the persistent front end.
v. The scalable services must be used dynamically – i.e. any resources used in calculating the Monte Carlo values should be switched on and off (automatically, via code) for the purpose and shall not be left on continuously.
vi. It must be possible to specify values for the following parameters through the persistent front-end:
a. a choice between estimating to a given number of data points (shots) or to a given decimal place of accuracy
b. depending on the choice in a, either a value of S, the number of data points (shots) to use for estimating Pi, or a value of D, the decimal place of accuracy
c. a value of R, as the number of resources (in the scalable services), to be used for estimating Pi, so that each run uses approximately S/R shots (if D is not met, additional runs should also happen at this rate).
d. a value of Q as the reporting rate: if 1,000,000 shots (S) were requested across 10 resources (R), a reporting rate of 10 would mean each run returns 10 values, one after each 10,000 shots. The graph, as shown, would then display 100 values, each representing a further 10,000 shots.
e. the scalable service to use for Monte Carlo – e.g. a selection between Lambda and EC2; there is no requirement to use both scalable services at the same time.
vii. Except for specifying values, the user must be able to run an analysis with a single click.
viii. Data about each run must be retained. You will need to consider where such data should be stored.
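The arithmetic behind S, R and Q in requirement vi can be checked with a short Python sketch. The function names are hypothetical, integer division stands in for "approximately S/R", and each reported value is assumed to arrive as an increment of (shots, incircle) that is accumulated before plotting.

```python
def reporting_plan(shots, resources, rate):
    """Return (shots per run, shots between reports, total values charted)."""
    shots_per_run = shots // resources      # approximately S/R
    interval = shots_per_run // rate        # shots between reported values
    return shots_per_run, interval, resources * rate


def cumulative_series(reports):
    """Turn (shots, incircle) increments into (total shots, estimate) points."""
    points = []
    total_shots = total_incircle = 0
    for shots, incircle in reports:
        total_shots += shots
        total_incircle += incircle
        points.append((total_shots, 4.0 * total_incircle / total_shots))
    return points
```

For the example in vi.d, reporting_plan(1000000, 10, 10) gives (100000, 10000, 100): each run uses 100,000 shots, reports after every 10,000, and the chart receives 100 values. The points from cumulative_series supply the x (shots) and y (value) pairs for the chart and the table.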
Your system may incorporate additional Cloud components, for example for storage. However, the mantra of Keep It Stupid-Simple should be followed and additional components should not be added unnecessarily.
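For requirement v, one way to make sure EC2 resources are switched off after use is to pair start and terminate calls in a single function. The sketch below is an assumption about how this might be done, not a prescribed design: it expects a boto3 EC2 client to be passed in (boto3 is not named in this brief), the function name and the `work` callback are hypothetical, and waiting for instances to become ready is left out.

```python
def run_with_temporary_instances(ec2, image_id, instance_type, count, work):
    """Start `count` instances, call work(instance_ids), then always terminate."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    instance_ids = [inst["InstanceId"] for inst in response["Instances"]]
    try:
        # `work` is a hypothetical callback that runs the estimation on the instances.
        return work(instance_ids)
    finally:
        # Requirement v: nothing is left running once the estimate is done,
        # even if `work` raises an exception.
        ec2.terminate_instances(InstanceIds=instance_ids)
```

With boto3 installed and credentials configured, the client would be created with boto3.client("ec2") and passed as the first argument. Lambda needs no equivalent step, since functions only consume resources while they are invoked.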
Submissions
Submissions for part 1 and part 2 must use the A4 version of the IEEE Manuscript Templates for Conference Proceedings available at: http://www.ieee.org/conferences_events/conferences/publishing/templates.html. A link to the MS Word (2003) version for this is: http://www.ieee.org/publications_standards/publications/conferences/2014_04_msw_a4_format.doc
Two columns should be used throughout, with only large figures and tables allowed to span both columns.
Do not modify the template, e.g. by changing margins. This will impact on structure/quality, which is assessed for both written submissions.
1. Submit a 2 page paper, using the IEEE template, that discusses how the system will work. This needs to:
a. Provide an abstract and introduction which relates your system to NIST SP 800-145 from the perspective of i) a developer and ii) a user. [20 marks]
b. (i) Present a high-level view of the major components of the system, perhaps using the AWS icons with suitable additions, and (ii) discuss what data/information needs to be communicated between the major components in order for the system to work and meet the requirements. [30 marks]
c. Demonstrate an understanding of how the system will work by presenting relevant results from running the code and from making changes that will be needed for it to be usable as anticipated (a few lines of modified code may be included for this purpose). [30 marks]
d. Submission structure and quality of writing. [20 marks]
2. Submit a 4 page paper, using the IEEE template, that presents the system. This needs to:
a. Improve your abstract and introduction from Part 1 – evaluated with respect to (g), below.
b. (i) Present the final high-level view of the major components of the system, perhaps using the AWS icons with suitable additions, and (ii) discuss what data/information needs to be communicated between the major components in order for the system to work and meet the requirements. Explain, also, your choice of additional scalable service and of any other services used, in contrast to the others available. [20 marks]
c. Discuss (i) your implementation (i.e. what you needed to write code for, what you used from elsewhere) and how you have approached scaling to be able to parallelise the approach, and (ii) what tests you ran against your code. [20 marks]
d. Present and discuss a snapshot of results that the system produces, and explain what happens with higher numbers of shots. [20 marks]
e. Identify requirements not met, and what you could have done in order to meet them. [20 marks]
f. Compare the costs of running each of the scalable services for large numbers of shots, and explain which is the more cost-effective. [10 marks]
g. Submission structure, including the abstract and introduction, and quality of writing. [10 marks]
3. Submit a short presentation and source code of your system for a viva in which you will:
a. Give a short presentation (maximum of 5 minutes) using slides highlighting what you managed to complete with respect to the requirements (as achievements) and where improvements should have been made. [40 marks]
b. Give a short demonstration (maximum of 5 minutes) of your system – this should cover how you have used each Cloud service, and how and where you analyse the data. [30 marks]
c. Demonstrate the user’s view of the overall system, and respond to any questions about the system and your code. Ease of running and using the system – for a theoretical end user – will be key factors. [30 marks]
Weighting and Composition
Coursework represents 100% of the assessment for this module, and so should be considered as requiring significant effort in order to complete.
The coursework has three parts, weighted at 30%:40%:30%. The three parts are:
1. A written submission that proposes a structure, a means of operation, and exemplifies the work to be done in the form of a 2 page (A4) IEEE conference-style paper. This accounts for 30%.
2. A written submission in the form of a 4 page (A4) IEEE conference-style paper that documents a final system, offering a design, implementation, testing, evaluation and reflection. This accounts for 40%.
3. A viva, requiring submitted code and a brief presentation and demonstration of your system. This accounts for 30%.
Part 1 (2 pages): 4pm, Tuesday 26 March (week 8) – submission on SurreyLearn.
Part 2 (4 pages): 4pm, Monday 20 May (week 12), with a submission of your code also – submission on SurreyLearn.
Part 3 (Viva): To take place in week 14/15 – date/time will be scheduled once the examinations timetable is known. Presentation slides to be submitted on SurreyLearn by a date to be confirmed – 6 or 7 June are candidate dates for the Viva.
Marks and feedback
Part 1: Friday 26 April.
Part 2, Part 3: Friday 28 June.
You will be informed of any variations to the above dates should variation become necessary or unavoidable. Note that the standard duration for return of marks and feedback is 3 semester weeks.
Marking criteria – part 1
Marking criteria for subsequent parts will be provided after part 1 is complete.