Distributed Computing in a Failure Prone Environment
Team: 1 School: Monte Del Sol Area of Science: Computer Science
Interim:
Team Number: 1
School Name(s):
The Academy for Technology and the Classics Charter School and Monte del Sol Charter School
Area of Science: Computer Science
Project Title: Distributed Computing in a Failure Prone Environment
Problem Definition:
Distributed computing is the practice of performing a computational task across multiple computers assembled into a network.[1]
Although normally reliable and stable, distributed computing can fall prey to many dangers.[2] The heterogeneous nature of such projects can lead to many different client versions and configurations, and distributed computing protocols must be flexible enough to account for this variation.[2]
Our project is to create new protocols that regulate communication between the computing resources, and to implement algorithms for network management. For example, the network could drop an unresponsive resource based on the results of tests on that resource: its history of reliability in reporting data for the distributed computing problem, its capabilities (processor speed, RAM, etc.), and other factors.
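One way such a drop decision might look is sketched below in Python. This is only an illustration under stated assumptions: the `Resource` fields, the `should_drop` function, and every threshold value are hypothetical and are not part of the protocol itself.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical record of what the network knows about one computer."""
    name: str
    cpu_ghz: float            # processor speed
    ram_gb: float             # installed memory
    jobs_assigned: int        # jobs handed to this resource so far
    jobs_returned: int        # jobs it actually reported back
    responded_to_test: bool   # outcome of the latest responsiveness test


def reliability(r: Resource) -> float:
    """Fraction of assigned jobs that were reported back.

    Assumption: a resource with no history gets the benefit of the doubt.
    """
    if r.jobs_assigned == 0:
        return 1.0
    return r.jobs_returned / r.jobs_assigned


def should_drop(r: Resource,
                min_reliability: float = 0.5,   # assumed threshold
                min_cpu_ghz: float = 0.5,       # assumed threshold
                min_ram_gb: float = 0.25) -> bool:  # assumed threshold
    """Return True if the resource should be removed from the network."""
    if not r.responded_to_test:
        return True
    if reliability(r) < min_reliability:
        return True
    return r.cpu_ghz < min_cpu_ghz or r.ram_gb < min_ram_gb
```

In this sketch an unresponsive resource is dropped immediately, while a responsive one is dropped only if its reporting history or its capabilities fall below the assumed minimums.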
Problem Solution:
We will assemble two equally reliable distributed computing networks with identical computing resources. One will have minimal failure recovery and prevention optimizations; the other will use our protocols for operating in failure-prone environments.
To accomplish this, we will use probabilistic methods to determine the authenticity of any given result and to choose our courses of action.
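As one illustration of a probabilistic approach, the sketch below estimates how likely a strict majority of replicated results is to be correct. It assumes each resource returns a correct result independently with the same probability `p`, which is a simplification; the function names and the `max_n` cap are hypothetical.

```python
from math import comb
from typing import Optional


def majority_correct_probability(n: int, p: float) -> float:
    """Probability that more than half of n independent results are correct,
    when each result is correct with probability p (binomial model)."""
    needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(needed, n + 1))


def replicas_needed(p: float, target: float, max_n: int = 99) -> Optional[int]:
    """Smallest odd number of replicas whose majority reaches the target
    confidence, or None if max_n replicas are not enough."""
    for n in range(1, max_n + 1, 2):
        if majority_correct_probability(n, p) >= target:
            return n
    return None
```

Under this model, sending the same job to three resources that are each correct 90% of the time gives a correct majority with probability 0.972. Reaching 99% confidence takes five replicas, which suggests one possible course of action: replicate a job more widely when the available resources are less trustworthy.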
Our protocol addresses two concerns: network management and peer verification of data. We will address the former by selecting as many resources as possible while favoring the most reliable ones, in order to solve a distributed computing problem as efficiently as possible, while taking into account such factors as network timeouts and job progress.
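The network management side could be sketched as follows: jobs that have shown no progress within a timeout are flagged, and each is reassigned to the most reliable idle resource. This is a hedged illustration only. The data layout (dictionaries keyed by job and resource name), the function names, and the use of plain numbers for timestamps are all assumptions.

```python
from typing import Dict, List, Optional, Set


def stalled_jobs(last_progress: Dict[str, float],
                 now: float,
                 timeout: float) -> List[str]:
    """Names of jobs whose last progress report is older than the timeout."""
    return sorted(job for job, t in last_progress.items()
                  if now - t > timeout)


def pick_resource(reliability: Dict[str, float],
                  busy: Set[str]) -> Optional[str]:
    """Most reliable resource that is not busy, or None if all are busy."""
    idle = [name for name in reliability if name not in busy]
    return max(idle, key=lambda name: reliability[name], default=None)


def reassign(last_progress: Dict[str, float],
             reliability: Dict[str, float],
             busy: Set[str],
             now: float,
             timeout: float) -> Dict[str, str]:
    """Map each stalled job to a new resource while idle resources remain."""
    busy = set(busy)  # work on a copy so the caller's set is unchanged
    plan = {}
    for job in stalled_jobs(last_progress, now, timeout):
        target = pick_resource(reliability, busy)
        if target is None:
            break
        plan[job] = target
        busy.add(target)
    return plan
```

A design like this uses every idle resource when there is enough stalled work, but hands out work to the most reliable resources first.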
The latter involves trust and authentication. Trust is a measure of how often a computing resource returns correct results, and authentication is the process of verifying received results by peer majority.
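A minimal sketch of peer-majority authentication with a running trust record might look like the following. It assumes results are hashable values that can be compared for equality, and that trust is stored as a (correct, total) pair per resource; both choices, and the function names, are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Tuple


def verify_by_majority(results: Dict[str, Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of peers, or None
    if no value has more than half of the votes."""
    if not results:
        return None
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes > len(results) / 2 else None


def update_trust(trust: Dict[str, Tuple[int, int]],
                 results: Dict[str, Hashable],
                 accepted: Hashable) -> None:
    """Record, per resource, how many results matched the accepted value."""
    for resource, value in results.items():
        correct, total = trust.get(resource, (0, 0))
        trust[resource] = (correct + int(value == accepted), total + 1)


def trust_score(trust: Dict[str, Tuple[int, int]], resource: str) -> float:
    """Fraction of a resource's results that agreed with the majority.

    Assumption: a resource with no history scores 0.0.
    """
    correct, total = trust.get(resource, (0, 0))
    return correct / total if total else 0.0
```

For example, if resources `a` and `b` report 42 and resource `c` reports 7, the accepted result is 42, and `c` is recorded as having returned one incorrect result out of one.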
Progress to Date:
Since the bulk of the work in developing the aforementioned protocols is building the framework for implementing them, we are currently researching in greater depth the most advantageous ways of constructing these frameworks.
Expected Results:
We expect that our enhanced protocols will allow a computing task of a given difficulty to be completed on failure-prone networks with increased reliability and efficiency, and therefore faster and/or with fewer computational resources.
Bibliography:
[1] Attiya, Hagit, and Jennifer Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics (Wiley Series on Parallel and Distributed Computing). New York: Wiley-Interscience, 2004. Print.
[2] "Distributed computing." Wikipedia. Web. 13 Dec. 2009.
[3] Babaoglu, Ozalp. "Fault-tolerant Distributed Computing." Editorial. Lecture Notes in Computer Science: 262. Print.
[4] "Cloud computing." Wikipedia. Web. 13 Dec. 2009.
[5] "Failure." Wikipedia. Web. 13 Dec. 2009.
Team Members: Max Bond, Erik Nelson, Arlo Barnes, William Fong
Sponsoring Teacher: Rhonda Ward Martinez