Aspects of Grid Management: an abstract ORC-based approach A Stewart, P Kilpatrick, M Clint*, R Perrott, T Harmer
(QUB)
J Gabarró
(UPC)
[Figure: a manager (functionality + adaptability) attached to a grid application (e.g. a component).]
The dynamic behaviour of a manager can be described in Misra's orchestration language ORC.
Overview
ORC is a small language (3 combinators + recursion) which can be used to describe succinctly some essential features of dynamic component management.
Current Work
Adding facilities for reasoning about the reliability of ORC expression evaluations.
The goal is to provide a framework for determining the likelihood of success of different management strategies.
ORC Example 1
Consider a computation which involves extracting weather data from a web site and using this data in a computation. Here the data may be supplied by either a BBC site or a CNN site:
Q(x) where x :∈ { BBC.w | CNN.w }
The asymmetric operator (above) involves non-deterministic thread selection and thread termination: the first site to respond supplies x, and the other call is terminated.
[Figure: BBC and CNN each return w; the selected response is bound to x and passed to Q(x).]
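The selection and termination behaviour can be imitated outside ORC. The following is a minimal Python sketch, not ORC itself: `site`, `first_response` and `Q` are hypothetical stand-ins, the sites are simulated with delays, and the "thread termination" is modelled by cancelling the losing task.

```python
import asyncio


async def site(name: str, delay: float) -> str:
    """Hypothetical stand-in for a site call such as BBC.w or CNN.w."""
    await asyncio.sleep(delay)  # simulated network latency (assumption)
    return f"weather from {name}"


async def first_response(*calls):
    """Run all calls concurrently, keep one finished result, cancel the rest."""
    tasks = [asyncio.create_task(c) for c in calls]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # models termination of the calls that lost
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


def Q(x: str) -> str:
    """Hypothetical computation that consumes the weather data."""
    return x.upper()


async def main() -> str:
    # Q(x) where x is drawn from { BBC.w | CNN.w }
    x = await first_response(site("BBC", 0.01), site("CNN", 0.05))
    return Q(x)
```

Calling `asyncio.run(main())` returns the processed data from whichever simulated site answered first.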
ORC Example 2
Grid resources may be known to be busy at certain times – time dependent orchestrations can be specified.
Atimer > t > ( if (12.00 < t < 18.00) >> s1.f(x) |
               if ¬(12.00 < t < 18.00) >> s2.f(x) )
Here Atimer returns the current time.
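The branch selection in this orchestration amounts to a test on the current time. A small Python sketch of that test follows; `choose_site` is a hypothetical helper and the site names are returned as plain strings rather than being called.

```python
from datetime import time


def choose_site(t: time) -> str:
    """Return 's1' when 12.00 < t < 18.00 (strict bounds), otherwise 's2'."""
    in_window = time(12, 0) < t < time(18, 0)
    return "s1" if in_window else "s2"
```

Because the two guards are a condition and its negation, exactly one branch is taken for any value of t.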
ORC Example 3
A grid site may fail to respond to a call. In such circumstances the site or an alternative site may be (re)called.
FindW ≡
( if (x=signal) >> FindW |
  if ¬(x=signal) >> let(x) )
where
x :∈ { BBC.w | CNN.w | Rtimer(t) }
Here Rtimer(t) returns a signal after t time units.
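The retry pattern can be sketched in Python as follows. This is an illustration under stated assumptions rather than a translation of ORC semantics: `FlakySite`, `rtimer` and `find_w` are hypothetical names, unresponsiveness is simulated by a long sleep, and a `max_attempts` bound is added (the ORC definition recurses without a bound).

```python
import asyncio

SIGNAL = object()  # stands in for the signal returned by Rtimer(t)


async def rtimer(t: float):
    """Return SIGNAL after t seconds."""
    await asyncio.sleep(t)
    return SIGNAL


class FlakySite:
    """Hypothetical site that ignores its first `silent_calls` calls."""

    def __init__(self, name: str, silent_calls: int):
        self.name, self.silent_calls, self.calls = name, silent_calls, 0

    async def w(self) -> str:
        self.calls += 1
        if self.calls <= self.silent_calls:
            await asyncio.sleep(3600)  # simulated unresponsive site
        return f"weather from {self.name}"


async def find_w(make_calls, t: float, max_attempts: int = 5):
    """Race the site calls against a timer; call again if the timer wins."""
    for _ in range(max_attempts):
        tasks = [asyncio.create_task(c) for c in make_calls()]
        tasks.append(asyncio.create_task(rtimer(t)))
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        for task in done:
            x = task.result()
            if x is not SIGNAL:
                return x  # corresponds to let(x)
        # only the timer answered, so the sites are called again
    raise TimeoutError("no site responded within the allowed attempts")
```

For example, with `site = FlakySite("BBC", silent_calls=1)`, running `asyncio.run(find_w(lambda: [site.w()], t=0.01))` times out once and obtains the data on the second call.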
Reliability
An essential feature of ORC is that a site call may or may not respond. In a similar way a grid site may be operational or unresponsive (due to excessive load or network failure).
Orc Expression    Meaning
E(S)              returns a result if s is operational; otherwise no response
Performance
Pr( Ss ): probability that a call to s succeeds
Pr( Fs ): probability that a call to s fails.
In a grid, success might be interpreted as:
Site s and its network are operational.
Site s is operational and the network has acceptable bandwidth.
Site s is operational, can meet job requirements and has acceptable bandwidth.
Conditional Probability
Sites which are known to be currently operational are more likely to be operational in the immediate future.
Let Pr( Ss | Ss) denote the probability that a call to s succeeds given that a recent call to s also succeeded.
Example:
s >> s
Its reliability is given by
Pr( Ss) * Pr( Ss | Ss)
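As a numerical illustration, the product can be evaluated directly. The sketch below uses a hypothetical helper `sequential_reliability`; the probabilities in the usage note are invented for illustration, and the extension to more than two calls assumes that each call depends only on the outcome of the call immediately before it.

```python
def sequential_reliability(p_first: float, p_given_success: float, calls: int = 2) -> float:
    """Reliability of `calls` successive calls to the same site.

    p_first         -- Pr(Ss), probability that the first call succeeds
    p_given_success -- Pr(Ss | Ss), probability of success after a recent success
    For calls > 2 a first-order dependence is assumed (an assumption of this sketch).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    return p_first * p_given_success ** (calls - 1)
```

With Pr(Ss) = 0.9 and Pr(Ss | Ss) = 0.98, `sequential_reliability(0.9, 0.98)` is about 0.882, which is higher than the 0.81 obtained by treating the two calls as independent.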
Examples
The reliability of the expression
s
is
Pr( Ss)
The reliability of the expression
let(r) where r :∈ { s | t }
is
Pr( Ss) + Pr( Fs) * Pr( St)
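The second formula can be checked numerically. The sketch below assumes that the two sites fail independently and that Pr(Fs) = 1 - Pr(Ss); `parallel_reliability` and `simulate` are hypothetical helpers, and the probabilities used are invented for illustration.

```python
import math
import random


def parallel_reliability(success_probs) -> float:
    """Probability that at least one of several independent site calls succeeds."""
    return 1.0 - math.prod(1.0 - p for p in success_probs)


def simulate(success_probs, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, using a fixed seed."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < p for p in success_probs) for _ in range(trials))
    return hits / trials
```

For two sites, 1 - Pr(Fs) * Pr(Ft) equals Pr(Ss) + Pr(Fs) * Pr(St), so `parallel_reliability([0.9, 0.8])` agrees with 0.9 + 0.1 * 0.8, and `simulate([0.9, 0.8])` should give an estimate close to that value.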
Markov Chain for r :∈ { s | t }
[Figure: Markov chain diagram with states labelled 0, 1, 2 and transitions labelled Ss, Fs, St, Ft.]
Current and Future Work
Continue development of a framework for estimating the reliability of general ORC expressions.
Experiment with, for example, the GRID'5000 testbed to compare empirical results with reliability theory.
Integrate the work on ORC with the ASSIST/muskel models of the group at Pisa by recasting manager features of the latter in ORC.