The Group Membership Problem in Asynchronous Systems Aleta Marie Ricciardi Ph.D Thesis 92-1313 November1992 Department of Computer Science Cornell University Ithaca, NY 14853-7501 THE GW)VP MEMflERSHIP PW)flLENI IN AsYNcHR?NOUS SYSTEMS N' Dissertatioii Prcscnt?:?d to the Faculty of the Graduate School of Cornell University in Partial Fulfilinient of the Requiren?ents for the Degre?? of Doctor of Philosophy by Aleta Niarie Ricciardi January i9?j:?? Qc Ali4a Niarie Ricciarcli i??9;? Ai?L R1CH?1?? RESERVED THE GROUP MEMBERSHIP PRORLEM IN ASYNCFlllONOIJS SYSTEN`IS Aleta Marie R?cciardi, Ph.D. (??ornell I niversity 1 99.? The thesis fornially defines the class of I roc s Group Afeti?bc??hip Frobienis (GMP) for asynchronons systen)s These problerris involve rnaintalning a list of processes belonging to the systeni, and npdating it as processes join (are started) and leave (terriiinate or fail). \Ninvestigate closely the strongest nieinber of the ()1MP class. Strong (?MP presents this list in a consistent n?anner to all processes nsing it: the sequence of joins and leaves are identical. \&`e show that despite preva- lent beliefs, strong consistency and efficiency are not conflicting goals. This shonld have significant in?plications f??r distribrited systen?s since the need for process fl)en-)- bership agreement arises in n?any canonical problen?s in distribnted con-ipnting. ?Ve present an inexpensive n?eans (the ?-GNiP algorithn?) of assnring con?plete systen?-'vide agreen?ent on process rrien?bership. NVe discnss the role of process nienibership in distribnted systenis and how to ilse S-GNIP to bnild a Menibership Resoiirce Manager (NIRM). Flie thesis also exaniines whether any w?,aker n?ember of the (.?N4P class suf- fices to specify an NIRNi. In doing so we jnstify using Strong C?NIP over two u?iich weaker C??NtP instana?s in three in?portant ways. P?irst., by con?paring Strong ?;xip and its iuinin?al sohition with tlie weaker instances and their niinirnal sollitions. we arrive at the surJ)rising restilt that Strong ??1Ni1? is often less expensive than the others, notably iii executions in which rriei?bership changes are freqnent. Sec- ond, we show that a n?ei?bership service defined by Strong (4NIP is n?ore robust, n?ore responsive. and n?ore adaptable than a niernbership service defined by weaker GN4P instances. Third, we conipare n?ei?bership services defined by the various GNI Ps according to the utility each service provides higher-level, distributed appli- cations. `I?hat is., ignoring irnplen?entation costs, how useful are the different C-?Ni l? consistency guarante(?s as a platforn? on which to build distributed soliitioiis to dis- tributed problenis? N\e show that the (onsistency guarantees of Strong C?1NIP u?ake it (i.e. anci the n?enibership service Strong C1? NIP defines) niore useful to higher-level distributed applications. ?in ally the thesis presents experimental results fron? in?pleriientings -c AIP. The data den?onstrate that a centralized Xleinbership Resource Nlanager is a non- iutru sive service around which to design distributed systen?5 and provide systeln- wide consistency. The data quantify aiid help clarify the tradeoffs between replica- tion degree overall s',?sten? size and process failure frequency. `!?hese initial results should guide fi?iture NiRN4 desi?n and developriient. Biographical Sketch Aleta Ricciardi ?vas born in the wilds of the Rronx., NY ---H without doubt responsible for her love of outdoor sports. Her future as a cofl)puter scientist was flited when tier father brought lionie a vintage lf3Ni teletype circa ?967 and n?ade a valiant stab at teaching her i?ortran., Cobol., and all about r. She raised ulore than her fair share of aitch-ec-double-ell in her ft?rrnative years (which, according to her parents, lasted far longer tliait tlie national average) but nonetheless i?an aged to graduate froni Cornell Jlniversity in ??.)?I., taking refuge in 13iology arid Niatheniatics until key punches had been rel(?gated to the dustbin of history. Niceting l?rakash Panangaden on a porch sorriewliere, sonie night, iii Collegetown awoke li(?i' early nieniories of Computer Science and she left lier wild dQys for the monastic existence of graduate school. As is only fitting, she met Xlichael while cycling around Caynga Lake on the only nice d??y of the Summer of `92. ill ??o rTiy parents Niarie and F?niii, and to Bngs Bnnny Iv Acknowledgements ?his ?vork ?vould not have been possible ?itliout the gilidance support entli'isiasn?. and generosity of I?en t3irnian. I&en?s coi?fidence has been a constant so'irce of inspiration and his patience in giiiding n?y evolution has been rernarkabli?. l&eith N4arzullo, too., has been both friend and advisor - I have benefited greatly fioni (and enjoyed infinitely) our many di?cussions., technical and political, from the lunatic fringe. NN'?ithout I?en's and T?eith's insights ?armth., and genuine (x)ncern my experiences here ?vould have been severely diminished. l?rakash Panangaden and Bob Constable gave me early encouragement and guidance., as well as the initial desire to pursue these studies. Robbert van Renesse has been inval'iabh iii all technical matters. Niore than any other, Robbert `s assiStance in the implementation stage was unflinching and enthusiastic; I have l(?arned a great deal from lii in. I liave also benefited from niany discussions (and bagels) with Pat ?tephenson, Brad Glade Nlike Reiter arid the rest of the Isis crowd who have been both friends and helpful colleagues. I??m Bressoud has been the perfect office-ii?ate; I cannot begin to thank hini for his patience, knowledge, and for laughing at all my dumb jokes. f)avid Cooper proof- v read the entire thesis and gave n?an,\ valuabli coi?rnent. Niany thanks also to our technical support staff, especially John Vinley, Doug Flanagan, Anne Louise C?ocke1, and ()J. I have also been fi?rtunate to have received funding fron? JR XI, the Sieniens Corporation, N?NS?A, and C?1eneral Electric, Rruce Donald has been a continual joy a dear friend ?vhose kindness and sense of humor have added brilliantly to my lifi?. From Dan Huttenlocher I have gotten ?varmth, laughter, generosity and a great deal of confidence. Daniela Ri's, in addition to her more obvious ?vorld-class talents, is als?? a ?vonderful listener and ?best woman'. Niichelle Duncan arid I have been through many interesting arid important events quite fittingly, all electronically. i?hanks too, to ?he NVomen's Cabal'. Joseph Nlooi.e `vas a tremendous source of corrifort; he made sure niy life was as filled with love, fun, aiid music as it was ?vith work rio small task. Niari ()rser and `Avigail F?isenberg, too, have been dear arid cheri shed friends. Susie Fortebraccie" Armstrong and I have lc?gged many kilometers in tiie saddle arid enjoyed every moment, which kind of goes without saying. Jn the ?`XIoldy Oldies" category are I)eirdre ?aub and Nathan Yoffa both have helped and watched me grow and made nie laugh for rilaily, many years. Chuck lYiller and Mark Theodore also occupy very special places. Above all my family, especially my parents and brothers, have given iue iriore support, encouragement, and inspiration than 1 believe they'll ever understand. I have felt equally and constantly their faith iri rrie, their love, and their pride iri ruy vi acconi??1ishi?ent?. t??ina'1y, Niichael Ogg ilas n?ade what ?iio'i1d have b(?efl a very trying period SC) very beaiitiful and exciting. He lias shown inc such wondrous joys I often find niyself beyond words. vii Table of Contents 1 Introduction 1 1 Exan?ples 1.2 N'Ien?bership ?;?????t???? and Distributed Systein J)esign 1.2.1 In?plicatioris for Distributed Systei?s 1.2.2 N4R-Ni Design Considerations . . 1.3 Related \Vork . . . . . . 1.1 ?liesis Contiibutions . . . . . . . 1.? Uliesis Orgai?ization 2 The System Model and Formal Logic 2.1 ?ysten? R?qitiren?euts and Niodel Assun?ptions 2.2 i?lie ?ystern Model 2.3 The Niodal l?ogic (lsyflc 2.3.-I Syut. J?acking better notation we also write c E c to abbreviate ?6 E 1)1) for sonic p nientioned in c'? and elaborate when tile context does not clearly distiT gnish the intention. Eron? I?ai?port [Lani7S] we give the following definitions of event caiisality and gh?bal, cansal coiisistencv. Defii?ition Events 6 and c' are irnn?cdiatc1i,? causally related in, eat c (written 6 ?+? e') if and only if 2,This is pi?reiy fr)r e?'se of expositioli. 24 I. 6 and (-I 1?tl ?? son?e con?I)one1)t l)p of c; or 2 c --H--H send?(q n?)? ? --H--H recvq(p n?) 6 C 6 and c;? c c. NVe ?vrite 6 Hc 6? to denote the binary relation 1?apprn.s befor? which is the reflexive transitive ch?snre of --H->? over the events of c. I Observe that, iii the abstract, send and receive events are always ( i. 6?, in all cnts) -related while oth??r events derive the relation fron? their hi stori cal context. \N?e will on?it the snbscript when the particular cut is clear froi? context and simply write 6 H 6 I)eflnition ,N c'it c is cousaily consistcnt if and only if whenever 61 c c and 6 in c, then 6 C c. I i:ignre 2.2 illustrates both a causally consistent cut, and one that does not respect cansality. Henceforth we restri(?t the discussion to consistent cuts, as they are the ones that are physically realizable. Consistent cuts represent the possible global states of an execution; while a giver) consistent cut need not have existed at any point ill real tirne in an execution, it is in?possible for a cut that is not causally consistent to ever exist at any point in i?eal tiiiie. A characterization of global, systeni-wide causality should incorporate the no- tion of progress between global states. .%pecifically, we desire that every process either riiakes local progress or ren?ains stationary, none should regress. A process rnakes local progress between the curnii ? ive states represented by hp and ii? ex- actly when liJ) is a prefix of h'p a N. (?onsistent k)ut: Vc H : (c! c ? ( C c) e c ?. Incon?istei?t Cut: c ? e'; ? C c; ? Fi?iire 2. ?i: (i?ausalty (?on?'istent aii&I Inco'i?istent Cuts. 26 1?1 Defii?ition Given c (h1 hn) and c' (?!i h'?), c caus(i-liy precrd?s c? (?vritten C ? rI) if and only if for each process, p, 1. bp h'?; or 2. /?? is a strict prelix of h'p ()?serve that there are (infiiiitely) iii any corripletions (or successors) for any given cut. In this sense, the future of any cut is uncertain; it niay branch out in many possible directions. ()n the other hand c ? c' implies that any execution in which c1is a prefix n?ust also contain c as a prefix. I)efinition lAt c (h1 , . . . , h1l??) and e' (k'1 . . . , h'p . , J?'n) be consistent cuts. ??hen 1. r strictly cansally pr?crdcs c? (written c ? c') if and only if c ? r1 and c ? CI (as in Figure 2.2N); 2. c icry strictly p ?;c?des c' (written c ? cl) if and only if l?? is a strict j?refix of for eaci mentioned (Figure 2.2.13). 2.3 The Modal Logic asynr N?Te now introdnce a formal laugnage, async , with which we will define the class of Asynchronons Pro(:ess Group N4einbership problems. a?ync is a combination of temporal and epistemic logics. Uni?:jue to this combination is its attention to asynchrony the basic semantic entities of the logic are consistent cuts. lii her 27 c A. c ? c' and c ? c c 13. c ? c': J?p i.? a st?4ct puJix of /?I `v. Vig'ire 2.2: Causal 13inary R?1ations on Consistent Cuts. 2S doctoral thesis [?i?y9O]? Taylor defined C?onc?irrent Fp?steir?ic Logi(:, Not only are forTnnl&) of her logic eval1?ated along consistent cilts., hnt she defines a new, weaker n?odal operator that embodies systen? asynchrony and captnres the resniting process `in certainty. The ten?poral logi? con?ponent of ?,?y??,. arises fro?n algebraically analyzing the set of consistent cnts in an asynclironons execntion, and then iriatching each of the individnal algebraic properties with specific characteristic axionis of tense logic. Consistent ctits and the binary relation ? defined in (?hapter 2 fi?rin a downward- closed lattice. See [Ric9Ol for a proof that the tense logic arising froi? the corre- sponding characteristic axtonis is complete for tile class of l?ripke niodels that are downward-closed latti ces 2.3.1 Syntax of t'async NVell-fornied forn?nlae 0f asyflc are constrncted froni a conntable set of prinlitive propositions Ia I a'2, .. J fonr niodal operators, and the set of process identifiers Proc as follows: 4 a I ?i A Y2 ?Y' I I ??)Q' A%y NVe `ise the standard cil?breviations v, ?. ? and ? fi?r readability. The ??nglish langnage interpretations of tile tense niodalities ?, E, ? and ? are, respectively, ??at all fiitiire points' (henceforth), `at all points in the past", "at sonic fntnre point", ? and ?at some point in the past??. ??he English interpretations of the the duality iiiterpretation for ?,` rather than the branching time iiiterpretatioa. lE branching tii?e,' ? is interpreted as iiievitabi!ity' at sonie future point in every 29 episten?ic modalities i?)) and ???? are `?p kno?s"., and "in a state indistinguishable to 2.3.2 Semantics of flasyr?e I?ripke n?odel sen?antics [I&ri63] of niodal logics requires an occ ib?i?tq relat?on on the basic sen?antic entities of the niodel. For ?sync the teniporal operators' accessibility relation is ? - causality bet??een consistent cuts. For the epistenlic operators ?e take the follo?ving definitions from [?)iy9()] Defii?itioi? [faylor] (luts c ( 11p 1?PLhp7 ) and c' ., . . . ` h'p? ) are ifldt%?tifl!]uishabi? to process j)j (?vritten c ? c1) exactVy ?vhen hpj Fori?ul?e are evahiated along consistent cuts of an asynclironous execution. An ?ni?rpret?tion for an infinite asynchroiious run 4? 1A assigns to each prin?itive proposition a subset of consistent cuts of A N'Ve ?vrite : ?aI.,a2,.. ,? 2A (A, c) ? (tj exactly when c C 1?(ai), but will oniit the run, and simply write c ? Q,', when it is clear fi'on? context. A consistent cut c satisfies a well-for?ned lorn?ila, k' (written c ? ?") according to the following structural rules : ?Q iff c ? 6; 0 $` :91 A9'9 iff ? ? 6i and c ? :30 o+ : E)y ilf fc?rall (! ? c. w ?5); o+ : ?Q iff forall c' ? c.(c ? o+ ?): ?? fflff forall c! ? c.(c W o+ .4,: Dy" iff in all runs for ?vhicli c is a 1?refix,exists c' > c. ? o+ : ?y' iff in all rnns for ?vhich c is a prefix,exists (? > c ? o+ 4"': Pp? iff exists c' C A (! p ?` ((A cl) w o+ 4): A))Y' iff forall c' ? c?(c' W 2.3.3 Inference Rules and Axioms Let Q, and `? be well-fornied fornuil?T of ?a?ync ??lie Inference R?nles of async are: Modus Ponens ? ? H ??. Temporal Generalization ? F ?y',, and y k ?y'. Epistemic Generalization ? k A?i?y' ??he axioi?s of ?asyflc begin with all substitutional ii) stances of trtith-fnnctional tantologies. J?et Q' and `?j) be well-fori??ed forninla?. ?be teniporal axion?s are: 3j TOa (Z(Q ? ? ? E4) Tob ? TOc ?(? ? > ? E]'?) Tod ? Tia E]q? ? Tib ?q$ E?y' T2a E)? ? T2b E]? ? T3 ?ffi V?HL T4 ?? _ T5a (?? A D4?) ? ?(?Q' A ??) T5b (?9' A ?V) ? ??o' A ?`?`) _ T6b?Q??o' I?lie cpi sten?i c ?xion? s (?VC frorri [?Fay9O] KO A)) (5 ? (j? Ki A?(9 ? ? ( A)) 0,' ? K 2 A)) 0,' ? Kp A)) ? K 3 A?p 0,' ? kp A)) 5 K4 o,' ? Ppo,' K5 PpPp0,' ? Pp? K6 Fpmi4?m5 ? Pp0' K7 PpQ,' ? ?`Th)?5 Chapter 3 Strong Group Membership Jn this chapter, ?? define formally tile Strong Ci1ronp N1en?bership ProJ?len? for asynclironons systenis. ?he definition sfionld specifjy how to coordinate local events among a gronp of processes so that the externally observable behavior of the gronp (that is, the behavior observed by a process not in the group) is that of a single, fantt-tolerant process. The ultimate goal is for any syst(?ni process to be able to query core NI RN1 rnen ibers' l(?cal views and observe `?one-copy ` [BUGS7b] behavior on the sequence of view reports so-obtained. Because responses to queries wilt be taken as refiectiiig the exact systen? coniposition, we will want to ensure that NIRNI core n?en?bers have ?-dr nt,cqi sequences of view transitions. ?ailed core niei??bers will observe only a prefix of all view transitions, but their local views when they are operational i?ust not be perniitted to diverge. Niore in?poi'tant, core n?embers that have been ousted but are not crashed, must not be pern?itted to misrepresent the system state. ??or example, we wish to preclnde executions in which a functioning :3:3 process that is no longer a core n?en-iber is able to change its local vie?v iiidepen- dently; if that process were qneried? its response conid diffc?r fron? the response a trne core nieinber (i.e. one that has been forced to coordinate changes to its local view) wonid give. 3.1 Formal Specification I?) specify the problen? forn?ally. we define the following events, logical fc?rii'nlt'e, and sets of processes. For clarity, we will always nse en?phasizcd text for events SMALL CAPS for h?gical forn?nl?, and sans serif for process sets. ?he forn?nla UP1) holds along a ci't c if and only if p has not executed qut'!13 in its local history, h1). (1onversely, DO?'N1) holds along c exactly when p has crashed in c. 4?he iiidexical set Up(c) in an asynchrono'is run 4 is the set of all processes that have not crashed along c Up(c) ?p c k UPpt Proc p executes the internal eveilt j)iai!?]p(q) as soon as it suspects q faulty; whether p conies to suspect q through son?e local observation or through gos- sip is in?i?aterial. ?on?e tii?e after recording f(inItyp(q), p will execute the event r?n1ovep(q). Notice the distinction bet???en the events ffl?uit?j1j(q) and rc'n?ovep(q); the for?ner represents ps belief in q's faultiness, which rnay be incorrect, while the latter is actual ren?oval of q fioiu the set of core rncn?bers p believes opera- tional. `l?he forn?ula FA?LTY?(q) holds along all cuts th?t include fciuify?(q), and `By deflnitioii quit? ?viiI be the l?st event ()f p's history in tile ciit. 3,4 RE?iOVE1?(q) along all cuts that include rcn?ovcp(q). Analogous staten?ents hold for events opcrat?nyp(q) and addp(q) arid forniula oPERATINGp(q) ADD?(q). 4?he local n?ei??bcr?h?p Vic?V for process p cut c (A1,An), (denoted LocaIView?(c)), is the set of processes p obtains by sequentially ruoditying its ini- tial n?enibership list according to the rc??ovep() and add?( ) events in 1) NVe use LocaIView? when the cut is clear fiorn context. ?Yivially, we require p C LocaIView?(c). ??he fc?rrnula IN-LocAL?(q) holds along all cuts, c, such that q E LocalView73(c). NVhen DO?VN)?) holds ah?ng c, LocaIVi'ew?(c) is undefined. Because Ap is linear, it niakes sense to talk about the ?th version of j)'5 local view, which we denote Localviewf;. Thinally 1N?LocAL?x(q) holds when q C LocaIVeWf? ?Ve extend local views to group views as follows. Given S c Proc arid a con- sistent cut c, if the l(?cal views of all the functional processes in S are identical, the group view is the agreed-upon local view; if S has no functioning members, the group view is enipty; and if the functionin? members of S have different local views, the group view is undefined. \N'e say that S dete?tiines a group view. Formally Definition Given a ?onsistent cut c and a set of processes, S c Proc the group V??u" determined by 5 along c is SflUp(c)--H? LocalView13(c) A(?. q C S A uP(c)) (LocalViewp(c) LocalViewi(c)) un defined otherwise. Gpviews(c) :35 Lastly the fornnjla IN?CPp holds along all ents, c, snch that p C Gpviews(c); OUT?CPp holds ?vhen p ? Gpviews(c) Observe that precise defi?ition of this for- rniila ar?ticipates liniqileness of the group vie?v along all consister?t cuts. 3.1.1 More About GpViews(c) 1'he definition of Gpviews(c) is crncial to the class of C?roup Nienibership Problen?s so it is ?vorth?vhile discussing a subtle point: how the sets S and Gpviews(c) relate. Recall that Gpviews(c) is the abstraction we are using to define the ultin?ate goat ---H the single, fault-tol?want process illusion we will use to build our NIRNI. In this light, NIR?I core rnern!?ers are precisely the n?ernbers of Gpviews(c), so any n?ernber of Gpviews(c) should be able to reply to ?1R1Ni client requests. Now if q E (GpViews(c) fl then q is a core n?en?ber whose local view is not used in deterniining the NIRNI composition and systen? state: specifically, q's local view is not constrained by the definition of GpViews(c), so LocalViewq(c) need not be ideiitical to GpviewS(c).2 Because q replies to NIRNi client requests based on its local view, its replies will contradict other core rner?bers' replies when LocalViewq(c) # Gpviews(c). ?llius the single-process ilbision falls apart, and the NIRNI fliQy easil?, fi?ll short of its stated aiiii in assuring global colisistency, when every core n?eriiber's l(?cal view is not a?coiiuted for in detern?ining the NIRNI group view. To avoid this, we iieed a clause in the specification requiring q to be iii S when- 21n practice, a core n?eiiiber's iocal vie?v includes in additioii to the NI RNI composition the cot ire systeni process coil iI)ositIoi). :36 ever it is in GpViewS(c) (Gpviews(c) n UP(c)) c n UP(c)) Since p C LocaIView?(c), A Up(c)) c (GPviews(h A uP(c)) and so the desired clause ?ill require S Gpviews(c). Note, too, that the NIRKI must be unique to maintain the single rocess illn- sion. ?`he desired clause ?vill incorporate the above-mentioned equality as ?ell as uniqueness ah?ng all consistent cuts of the set, S, satisfying equality. 3.1.2 Strong GMP Specification ?Ve now have the language a id formal tools necessary to formalize Strong C??NiP. G?P--HO (Base (lase) An initial group view exists along the initial cut: V (S() GPviewc0(5?)). SocProc GNlP--H1 (?%?lidity) Processes do not make changes to their local views capriciously: a. (?iN?LocA?p(q) A iN?LocALp(q)) ? FAUUrY?(q) (??iN?LocALv(q) A iN?LocALp(q)) OPERATI?C1(q). In contrast to fAULTY?(q), oPERATI?c?(q) is riot stable. GNlP-2 (j;niqueness) Non-null group views are unique along all consistent cuts. :37 For all cut? c: v (cPviews(c) s) A UNDEF'D(Gpviewsl(c)) 3(S GNIP-3 (Sequei?ce) ?li processes exhibit the sa?ie sequence of local vie?vs cori- teu?poraueousty provided tile views are defined: UNDEF'D(LOCaiVIeWp) I??LocAL1X;(q) ? DOWNq V (LocaIViewq LocaIViewpr?) GNlP--H4 (Liveness) For each eveut ffiuityp(q) (respe , operai?ngp(q)) and each process 1) cGpviewr?, eventually either p is ren?oved froi? the group view, or q is re?ioved fion? it (respective, added to it): a. FAULTY?(q) A IN?GPp ? (?oUT?cPq v ?nLT?GPp) b. oPLR?NfINGp(q) A IN?GPp ? v ?oUT?GPp). (;?NIJ?-3 is equivalent to requiring that each local view eventually becot?e a group view. `flie presence of ? forces a group view to exist along sonie consistent cut in an execution. `Fhis too, is why we cannot bind LocalViewq to a version nuiiiber. Local views when indexed by version nnn?bers are static --H- the couiposition of a process 5 j'th local view will never change. So suppose c is the witness for ?. Then oi?itting the version superscript forces LocalViewq (c) (for q E LOCaIVieWf?) to be identical to LocaIView1X; at least along c Had we iiicliided the version nnn?ber in the equality clause, we would not have been able to conclude that group views necessarily exist, since the local views Tieed not have been identical siniultaneously. Finally ?i'ncc each process execntes at least one event bet?veen local vie?vs w and y $ I the corresponding gronp views ??ill exist along cnts that are ?-related. Hence, it n?akes sense to talk abont the ylh gronp view, ?vhich we denote Gpview'r Chapter 4 A Protocol Solving Strong GMP This chapter presents a distrib'ited protocol that solves ?trong (11NIP and therefore achieves the single. fanlt-tolerant process ilhision ?ve discnssed in Section 1.2.2.1. The trong G1ronp Nlen?bership Protocol (hereafter s-c?IP) is both asylfln?etrlc and centralized: a distingnished core rneniber? denoted ?gr Is responsible fc?r coordinating updates to the outer (i.e uoil-n7gr ) core inenibers' local viev?s. In a syn?i?etric, distribn ted sol?ition all core 1uei?bers ?:onld behave identi c to denote an update to Gpviewr'--Hl in which ?` (the value proposed) is a set of process identifiers. NVhen the version nun?ber is clear fron? context we sin?ply write ??. \Ve will also sometimes want to identify the specific processes in t'. 4.1 Simple S-GMP ?Vhile we assunie in this section that nigj does not fail, the protocol we present is in fact niore coi?pli cated in coni niunication structure and degree of coordination than this assuniption warr'?nts. Indeed, if we kn?u' that nigr could not fail we would already have a single, fault-tolerant process and ?? could solve ?trong G?1P with a reliable broadcast. Instead, we anticipate rn?r' `s faihin to sin?plify presenting the reconfiguration algorithn'i in Section 4.2. \?`?hen ni?r becon?es aware of an outer nieiuber's, say `r's, failnie, it initiates a two-phase update algorithn?. In Phase 1 (Vigure 4.1) fli9T proposes r's renioval by 42 Ph(1sc I f)h?se if Tflflr M-sub(--Hr) M-com(--Hr) Figure 4.]: `1??vo-Pliase (?oi?mnnication Structnre of ?in?ple s-c;?iP. ninhicastirig2 a s'ubn??t niessage, M--Hsub(--H?), to the nienibers of its local view and awaits each outer core niertiber's response or its own i?elief in an c?iiter riiert'1?er's fariltiness, In this way all core nienibers that are not believed faulty by nig? believe r faulty at the (?ud of Phase I. If nigr receives responses Thoni a niajority of its current local view-, it niulticasts a conin?tI niessage, M--Hcom( --HT'), in 1?hase 1 j,:i If n?gr does not receive a niajority response. it `nust block. If local views are identical at the outset of Sinipte s-c;?ip then, because nigr is a single process, local views are identical at the end of Siniple S--HCNiP. fhe subniit rnessage coordinates belief ainong the group iri r's faultiness; the coniniit niessage tells outer n?ernbers that the group has reached (iqree'ncTit 0ri r's ?\4 ulticasts are iiot failure atoniic 3?I)vpicaIIy a phase of coijiuluilication consists of a n?'ilticast fron? a sii?gIe process t() a group of processes anci their responses back to the niessage initiator. in fact, Siiiiple s-c?N.iP is one?aod one?haIf phases but this is awkward. 43 Phase I (??n?presscd I?hase 11 and Phase I fl?J(? q N-sub( --Hr) M-com(?r):M-sub(--Hq) M-com(--Hq) A A 4 L crasli Fignre .2. ( onipressing Siic???ssive !i?stances of ?inipli s-e??iP. faihire and that they shonid now remove r' froni their own local views. `I?hi? agree- rnent., however, need not he con?plete: becanse ?7?gT does riot receive responses frorri onter members it l)eli??ves faulty, it does not know whether these n?enibers received its snhmit n?essage. Even though such a n?en?ber iiiay receive u?gr 5 messages, rngr certainly has no way of knowing this. F oni n?gr `s perspective, these riiernbers n?Qy not be aware of the current update to tlie group vie?v, rendering con?plete, core-wide agreement on the new view contingent upon the subsequent reu?ovat of these faulty core nienibers. Finally, Gossip ensures that operational outer processes become aware of sud? contingencies. observe that tlie I?hase I subriiit niessage is unnecessary if ??gr knows a niajor- 44 ity of the non-fi?nlty onter processes already believe a process, 5Qy q, fa'ilty ??hat is, rr?gT need not disseniinate its own belief in q's fanltii'ess if it knows that the onter n?ernbers already believe the san?e thing. in this light, the contingent npdates when piggy-backed ni?on a con?rnit fliessage, serve as the snbrnit fl?essage f??r the next view change. NVe can thns coflipress snccessive instances of Sin?ple S-GNIP if n-i?r niakes known ?vhen it multicasts the Phase it con?niit message. exacttv how it plans to change the group view next. Vor exaniple? in Figure 4.2, process q crashes before reponding to M--Hsub(--HT) causing nif]r to suspect q faulty. Hy appending M--Hsub(--Hq) to its con?n?it niessage M--Hcom(?'r), migr indicates that it will iiext at- tenipt to reniove q from the group view. Outer processes respond to the cou?bined con?niit-submit niessage as they do to plain subn?it niessages. 4.2 F'ull S-GMP \`N?hen n?gr fails (or is believed to have failed) the outer nieinbers execute a rccon- fiqur??fion algorifAni to select a neiv coordiiiator and, if necessary, reestabhsh the group vie?v. Fo see h()w this illay arise, coiisider Figure 1.3 in which I?gr fails in the riiiddle of a conini?t niulticast; because local views differ along the second cut, the group view is undefined. R?configuri ug safely and successfully involves solving two problenis: s access?'on ?vhich process(es) should initiate reconfiguration and which should assilnie the n?!]r role at the end; and progression which update should a reconfiguration niti ator propose to n?solve niembers' inconsistencies and niai ntain safety? ln practical tern?s. a correct reconfiguration depends on an initiator's ability to 45 mgr I? q M-com(?r) LocaIView?'?tI LocaIView'? Gpviewr Figure 4.3: nig'r `s Failure Results iii I?ndefiued Croup \7ie?v. deteru?iue the last defined group view and propagate the correct s'ibniission for the succeeding group vie?v. As P??igiire 4.3 slloivs., proposals n?ay also be partially known arnong the current group view, arid this caii pose problents in reconfiguration. `fhe n?ost intriguing and di fliciiit aspect of reconfiguration is accounting for in- tisibic coniniits. An invisible coi?niit is an update that is coi?uiitted by a set of processes, but can never be known to have been con?n?itted by then?. An invis- ible coi?ryiit occurs when the only processes receiving a corrin?it iliessage fail, or are believed faulty by the rest of the group. i?his is significant fiw reconfiguration: while no subsequent reconfiguration initi ator will ever An o ?t whether these processes received the coniniit n?essages and updated their local views, Strong CXlP (specif- icalky, c'1Nl F?-:3) requires that if an invisible coinriut did occur, the reniaining core ruenibers n?ust behave in a manner consistent with that event. It is iruperative, then, that every invisibly coniniitted update be detectable by every recorifigurer. Fhis is the core issue in nialutaining C;NlP-3. NVe can ensure this only if all initia- 46 tors (?lietlier n?gr or a reconfigurer) atterripting to install the yth group view vie f()r the reqnisite n?ajority responses froni aniong the n?e set of processes 4.2.1 Structure ofthe Reconfiguration Algorithm tnlike the n?gr -initiated algorithni, reconfiguration soinetin?e requires three phases In actuality three phases are necessary only in the ?vorst and highly unlikely case., but fc?r simplicit?' ?ve p - ent the algoritlim here as three phases and discuss the cases when t?vo suflic? in lapter 6. That reconfiguring is three-phase is interesting and important, though not surprising in light of Skeen's work on non-blocking commit protocols [SkeS2l. In the first phase, the initiator r, broadcasts a reconfiguration int(c'r?vgotc message, R--Hint(ver(r)), to all processes in its local view and awaits their responses or its own belief in their faultiness. Core members respond to the inter- rogate message with their current, local state, If a majority have responded, the initiator uses the inforniation it received to determine an update event, say `c, and version number, say o+r. whose execution would result in a well-defined GpView'? The initiator multicasts this event as the Phase II reconfiguration s?ibni?t mes- sage, R--Hsub(? tx >). After obtaining a second majority response, acknowledging receipt of the submission, the initiator broadcasts the Phase III reconfiguration conintiY message. R--Hcom(? v, x ?). Again majority approval of R--Hint(ver(r)) and R--Hsub(? v, x >) are ??ssential in maintainilig Sequence and I?niqueness; without either, the initiator must block. Defii?itioi? An update initiator (either rngr or a reconfigurer) is succ?:ssfni for its submission (M--Hsub(? ?`, x >) or R--Hsub(? `??. x >)) if it can reach the appropriate 47 R--Hint(x --H I) R--Hsub(? --Hm?-qr.?T >) Th A-- r2 Tn-gr ?r1 said.,?R-sub(? `r- ? ___ -_____ ----H--H? R--Hint(x ??ng?. sa1'd?'M?sub(? ??. Y >)- M--Hsub(? ??X >) Figtire 4.4: Recoiifigurer r2 I?earns of -? nilicting Proposals for GpView'?' S-CNlP con-ii?it phase; that is, if a niajority si?bset of its local vie?v respond to the proposal. A subi?itted value, ? v, x ?, is stable if and only if the initiator sub?nitting it is successful. I Iii this light. GNiI?-:3 requires that a-il s??ccessful initi-itors trying to install (or cornplete the installation of) the xth system vie? make identical proposals. ?he local state infor?uation collected dnring reconfiguration Phase I intist allo? the reconfiguration iiiitiator to detertiiitie the correct update proposal unai?bi guonsly. In s-c?iP all successful reconfigurers atten?pting to install (or coniplete the installation of) the o+i-.th group view propagate n-19r 5 proposal if they become aware of it., and propose fl'?fjT 5 renioval if they do i-lot. I?'.nfortunateLv, as Figure 4.?! n-?akes clear asynchrony and inopportune failures can result in there being two different proposals for the sanie instance of the group view. Correctuess requires that only one of theni beconie stable, and that any non-bl(?cking reconfigurer be able to deterri?ine which one it is by the end of reconfiguration Phase I. Hy propagating the stable subn?ission, a reconfigurer forces the entire group to act consistently with any invisible conin?its. 4.2.2 Rules of Succession ?N?e solve the sn(cession problen? by imposing a deterininistic, i?nc--?jr ranking on core merubers based on seniority in the group view -- ?older core n?en?bers are ranked higher.4 \N'?e ilse rank(j?)>rank(q) to denote that j) has higher raiik than q. \Vhenever a process is ren?oved froni the group view, the ranks of all higher- ranked processes are decreased by one. Observe that for as l()ng as p and q are both (conten?poraneously) core n?en?bers their 1.anking rc-1ativ? to each othC r `v 11 not change. A process initiates reconfiguration when it believes all those ranked higher than itself are faulty. [hat is, given cut c and LocaIView?(c), define the local proposition INITIATE( I)-) A $ank(q) rank(P)) FAULTY1) 4?LocaIView?(c) 4?I?1iis does liot i iiean ???e cannot join niiiIti??ie processes to tlie group vieiv in a single update 49 ?%4?le 4.1: Ii?itiating R?configuration: FAULTYp(n2gr ) A FAULTYq(fl?gr ) Process raiik up? FAULTYq(p) INIiIAiE(q) INIT!ATE(p) rai?k(n?gr )= p True False False True rank(p)= fi 1 False False Eventually False rank(q)= p --H 2 True True lYne True False False True False NVhile initiating reconfigtiration on INrrlATE(p) can lead to n?ultipli?, concurrent reconfiguration initiations, it guarantees at h?ast one process ??ill begin the reconfig- uration algorithu?. Consider ??bh 4.2.2 in ?vhich rank(n?qr ) = p, rank(p) = p --H 1, and rauk(q) = --H 2, and both p and q believe u?gr faulty. In the third scenario both p and q initiate reconfigurations. S-cNlP must ensure view uniqueness in the face of i?ultiple, concurrent reconfiguration attempts. In the second scenario q Cx- pects j), ?vhich has crashed, to ini?iate a reconfiguration. Any sohition must also ensure that q eventually comes to suspect p faulty. In S-G?IP, q times-out ?&?ting f:)r ps R--Hint( ) niessage, surmises FAULTYq(p), and then initiates reconfiguration. 4.2.3 Rules of Progression: Maintaining Safety fl? understand the difficulties encountered in reconfiguration we exa ne c;NiP-2 and GNlP-:3 more closely. I5niqueness requires that at most one group view exists along any consistent cut in an asynchronous execution. In the situation depicted in Figure `4.5, Q and R are subsets of Gpviewx?, aiid q and r are both initiating reconfiguration. If all niembers 50 q Q j? -? j?z R-int() = FAULi?Y(??) FAULTY(q) Yigure 4.5: Niajority of Responses Needed of Q believe ? fa?ilty., tl?ey ?vill receive neither r?s R--Hint( ) n?essage., nor its i?iiase II proposal., nor its Phase III corni?it. N'nalogous staten-ients hold for the n?ernbers of R regarding q. If T?'5 j?roposal differs from q's then the rnen?bers of R ?vill commit a different vahie than the men?bers of Q. ?ince Rczipfr?f ?vill e??ntnaliy rerriove all of Q u fqf., and Q U ? qJ ?vill eventnally reii?ove all of R LJ frt t?vo distinct gronp vie?vs ?vill exist. Naively it ?ould appear that the n?ajority reqilirelnent suilices to preclude this occurring. Hoivever as Figure `4:3 niakes clear initiators that fliQy end up installing (snbn?itting and coinn?itting) the san?e group version iieed not have identical local vie??s ??hen they "in dertake reconfiguration. Any solution niust en sure C??4P-2 ?vhen concurrent initiators a?e seeking niajority approval ftou? different sets of processes. 51 M-sub(?vx>) M-coin(?? I? q r R--Hint( --H 1) Vigiire 4.6: Ren?oval of q (lonirnjtted Invisibly to p and r Henceforth., let jtr(c) be the size of the sniallest niajority siibset of LocaIView? (c) and ?` be the size of the si?allest iriQority snbset of LocaIViewX? _______________ ? def LocalViewf' itr(c) (LA' LocalV2iewr(c) t I I?r 2 4.2.4 Reconfiguration Phase I Responses Onter processes responses to R--Hint(ver(r)) niiist allo?v r to detern?ine th( natiire and coniposition of all local vie?v inconsistencies, inclnding inconsistencies iiivolving core niembers that did not respond to ??. ?N?hile local views alone sufhce to detect inconsistencies aniong the processes responding to an initiator, this inforniation falls short of satisfyiiig GNiP-:3 (Seqnence) entirely as invisible coininits are riot detectable. 52 In Fignre 1.6 ? > is conirnitted invisibty to I), q., and r. Since all three have identical lo?al vie?vs, r ?vill riot detect the actnal discrepancy. Ho?vever, p is a?vare of Tng? 5 intention to cornn?it ? ?., X >. and p can envision a sitnation in ?vhich n?gr sncceeded in doing so and then failed (in this case, the sitnation that actually occurred). If p ??ere to forward n?gr `s inteiition to con?init ? v, x ?, v would then envision the san?e situation and propagate ? v, ? > as its Phase II subn?ission. Thus, in addition to its local view, an outer n?eniber iTilist also report how it expects to cliaiige its local view next. 4.2.4.1 Ren?arks Since fault deterniinations are pernianent (c.f. Section 2. I ) all processes, but par- ticularly initiators, should n?ake theu? cautiously. Any core n?eiriher believing a majority of its local view faulty can never succeed in con?rnitting an update. N4ore- over Gossip ensures that future cornrinini catiori with another core rriernber results in the latter also believing this n?ajority faulty. Recovering fron? such a (otnpict(? f(Ii1u?? is diffIcult; one could employ, for exaniple, Skeen?s algorithni [SkeSS], but a con?plete discussion of recovering froin total faihire is beyond the scope of this work. Ilefore presenting S-C?IP we should address issues arising at start up. In prac- tice, a process wanting to join the NIRNi will query the systen? rian?e service for the NIR?XI. If the nan?e service reports that the NIRNI is non-existent, this process executes an initiation sequence and becon?es the NIRNI. If the name service reports that the X?RN1 aheady exists, the process requests of the NIRNI that it be allo??d 53 to join. ???eIRNI core i?nst bnild to critical size, after ?vhich it begins operating as a system service, and nntil that poirft, the nan?e 5C1'ViCC 15 not fanlt-tolerant: it is either a single process that n?ay fail, or an nncoordinated gronp of processes that iTlay report inconsistent inforn?ation regarding the `\4RM. In the latter case. a race condition way occnr if more than orie process desiring to join the NIRNI is told it does riot exist. 4.3 The S-GMP Algorithm The ITiost irriportant, arid rather astonishing, aspect of onr algorithrn is the complete lack of restrictions on changes to a gronp view. Specifically, there is no upper lii?it on the number of processes (i.e. core iiienibers) one can add to Gpviewr to form Gpviewr',+l; if removilig processes from Gpviewr?, tile npper limit is the size of the largest minority stibset of GpView". I'llis flexibility has a n'imber of very irriportant rarriiflcations. (lompared to an NIRNI defined by a less-stritigeut ri-ienil?er of the class of Asynchrouons C?1ronp Nierribership Problems (we make tliis explicit in Chapter 7), an NIRNI defined by Strong C?1?Il? i-las a number of striking advantages: Fault-tolerance Th( Strorig CNil? ser?i - can tolerate nearly half its niernbers faili rig simtiltaneotisly. Nioreover. the fanit-tolerarice level is dynari?ic and depends oil1y oi? the size of the current group view. \Veaker versions can tolerate the simultaneous fitilure of only a fixed number of members, and given initial conditions, this will usually be otie or t?vo. 54 Responsiveness The Strong GNIP service can adapt n?nch n?ore quickly to severe changes in its ?vork load. In a single change to its core rrienibership? the Strong GNIP sei?vice can add an arbitrary uni-riber of ne??? nienibers to share a snddenly?increased load or remove nearly half its ?nenibers ?rhen its ?vork load drops quickly (thereby ridding itself of dead ?veight'). \?\`eaker protocols ??ill need to run as niany as n rounds to change n?en?bership by n processes. Additions require very little change to S-C4?IP as presented in the first t?vo sec- tions of this chapter. Let vi be the set of new n?enibers. An initiator say f sends a Join niessage, giving the nienibers of i;i permission to join. in-imediately before it n?uliicasts the commit message, com(? ??I, x t 1 >):M--Hsub(? i?, + 2 >), to GpView"'. The initiator also sends a State--HXfer message to ?i, informing the ne?7 members of all relevant core and system state. `Ib simplify bookkeeping, new members begin with local version equal to the group version tl-ieir addition resulted in. If j)I E vi had not already received the state transfer iriforniatiori from a previous initiator it initializes its local state and sends ack(Join) back to r. If p' has already joined at the behest of a previous intiator, say i.(), p' responds to r with NextUpdate?i as set by r(). This response is necessary to niaintain GNIP-2 and C?N?P-3; it deals with the case in u-'hi?4? r?i.'s commit niessage differed from T'5 in the coiitingent submission component, say M--Hsiib( ? `1)2 ? t 2 >). and r() niQy have been able to commit ? v2, x t 2 > invisibly to r and all menibers of Acks(i, R--Hint(ver(r))) (See Figure 4.3). 55 M-com(? ?;1,x +1 >): `L I Tflgr' )oi?n a??d ?tate-x??er - NextUpdate7'1 R--Hint(x) Nf ? DcterinincPropo?al --- - ? I ? v1 x + 1 > 12 = ? --Hu?gT?x + 2> - R-com(? ?i?? +1 >): M-sub(? 1?. X +2 >) Figure 4.7: A Situaticr? Requiring New (?ore XIen?ber? to Rq-?ort NextUpdate to tuitiator. 56 in ??hat follows ?` [ and r2 are sets of process identifers. For fnll generality we use ? to denote either of the operations that after h?cal views - ren?ovc'() or add( ). C;???? a set of processes, G, let `t?initicast?(G ri-i) denote the compound action Vq c (r" : (se?nd?(q, rn)). niiiiti?c?t tp(G rn) is an indivisii?le action only in the sense that p does not execute any other events until all n?essages are sent; it is not failure-aton?ic. ??he n?essage ack(n?) acknowledges receipt of message n? ?he sets FauIty? and Recovered? are, respectively the rneuibers of its k?cal view p believes faulty, arid the uieu?bers riot in its local view j) currently believes functional For siniplicity we do not explicitly show gossi?)ing, channel-disconnect or any error-checkirig, bnt asurrie these are done transparently. For example, we do riot show an outer process checking whether the process fron? which it has received an M--Hsub() niessage is, iii its belief, the current n?gr , whether the update indicated iri a com() niessage actually n?atches the update indicated in the n?ost receiit sub() n?essage, whether the processes to be reriioved (or added) are currently (not) in the group view, and so forth. 57 Task:mgr ?vhile (tr'ie) repeat GetUpdate(?'1); until (??1 nil-id); n?ulficastn?gr ( LocaIViewn?gr M--Hsub( ?vt)); while (??1 nil-id) 1* Con?pressed algorithi? loop ?/ forall p E LocaIView??91 TCCVn?gr (P. ack(M--Hsub(+?'I))) or f(?uUyj ?gr if (majority of LocaIViewn?9r didn?t respond) qnitn-?gr /? Update LocaIView?ngr according to I DoC?oinmit( ??i); (41et LT,pdate(t2); if (Joining new members) muifiaistmgr (vi Join : 3tate--HX?er) forall j)' C ??? r(cvn?gr (vi ack(Join) : NextUpdate?') or f(?uitymgv (p'); v2 = NextUpdate?;i in?i4icast'?ngr (LocaiViewmgr , M--Hcom( Iv t) M--Hsub(I?'2)); `2; 5S /* Called only l)y ?ngr. I?rocednre:C?1et J?ipdate(OIJ$Ivat); if (?ecoveredrng? ? ?) wil --H--H Recoveredmg? else vat FauIty?ngr; return(); Task:Otitert?rocessesp rccvp(fl?gr M--Hsub( ?v1)); /? Ni?iark the processes of ? 1 faiilty or operational as appropriate. ?/ Dot)reCoTnniit(??i ?); repeat scndp(n?gr ¶ ack(M--Hsub(???1))); t'(cvp(T17gr M--Hcom(Iv 1): N--Hsub(?i?2)) or faultqp(?ngl ); if (!FA?LTYp(Tngr)) DoPreCon?rnit( t'.2); DoConin?it(vt); ? 1 --H--H else NVai t- R econfiguration(); until (`L'i nil-id); 59 /? Execiited l%Y a proc `:5 destined to be an N4 RN4 n?en?ber. ?/ New-Process:PI nan)e-server- I ooknp( Xi H,Xi, contact-id); if (contact-id nil-id) /? Race (londitions Possible. Localview11 narne-server-Install(N'lRXl., PI); Regin mgr Task; repeat scndy (contact-id request-j oln-MRM); rccv?i(contact-id? Join(<: t., :r >) : State--HXfer) OR tinie-ont; if (time-out) ri an?e-se'.ver- Lookiip( Ni RN4 contact-id); else /" LocaJView?? is part of State-XIet 7 1)oCon?n?it(? `l.Y nntil (NIR Xi -n?eniber OR Ni RNi-un available); else Reconfiguration For R?conflg'irin ? ( introduce the follo?'ir'g variables: o+ NextUpdate? is a tuple of the forni Fids, ver(p) + 1, rank(init)lp , where Ids i? the set of processes by which p is expectin? to aft?w its current local view, ver(p) and rank(??iit) is the rank (in .ver(p) of the proces? LocaIView? LocaIView? from which p re??eived the proposal. \`Vhen p receives a subn?ission it changes NextUpdate? to reflect the valne proposed and the initiator proposing it. o+ LastCommi=t? is a tuple of the forn? [Ids, ver(p)]p, where Ids is the set of pro- cesses by which p altered its previons local view to obtain its cnrrent local view LocaIV ver(p) Iew? o+ Ahead? is the set vahies reported coi?i?ittcd f:?r versions nun?bered greater than ver(r). initiator r receives these values in response to its R--HInt(ver(r)) niessage. Behind? are reported local versions nuiiibered less than ver(r). Anticipating th(' correctness proofs of Chapter 5, the reported version in Ahead? is exactly ver(r) + 1, and that in Behind? is exactly ver(r) --H o+ SubCurrent? is the set of proposed next updates with proposed versions equal to ver(r)+ I; SubAheadr is tlie set with proposed versions greater than ver(r)t 1. 61 ??sk:ReconfigurationIniti?tor,r?ithver(r) m?it?casf? ( LocaIView? R--Hint()); forall p C LocaIView? recv? (fi, NextU pdate1? : LastCommit?) or faultyr(p); if (niajority of LocaIView? didn't re?pond) quit?; /? 1?etcrrnine tlie value and version to siibi?it fron? the responses received, ?/ Detern?ineProposal(t;i , ver,, v2); ?oPreComrnit(?' I); niulticasir ( LocaIView? R--Hsub( ? ?v I ?`cr forall p C LocalViewr recvr(p, ack(R--Hsub(? ?`?`1 , ?`r or fauUy?(p); if (inajority of LocaIView? didn't respoi?d) qnitr; f)oC'o11i1?it(v1); if (Joining ne?v nienit?ers) muiticastr(??1, Join : State--HXi'er); forall j)1 E ?1 i?cvr('vI,ack(Join) : NextUpdate?i) or faultijr(p'); v2 NextUpdate',1; multica 1r( Loca?View? R--Hcom( ? ?? 1' t'cr : R--Hsub(I?2)); n?gT' , ii r, ?2; Begin n?gr ??sk; 62 Task:OuterReconfig??ration.,p Tecvp(r., R--Hint(x)); /* If rank(p) > rank(?), I'i? thought faulty. ?/ scndp(?., NextUpdate? : LastCommit?); recvp(T%R?sub(? +vi., VEr or fauityp(r); if (!FA?LTYp(r)) Doi?reC?oinniit( i); ?r:nd?(m ack(R--Hsub(? I'?'1 ??r ir(cvp(r., R--Hcom(< t?t vcr : M--Hsub(+V2)) or fa?1iyp(r); if (!Fi?UL?Np(r)) DoCou?rn1't( vi); Tfl?r V 1 ??. else ?Vait- Recoufiguration (); else N'Vait-Recoufignratiou(); 63 7 Sets proposal ?nd invisible fioni aniorig the 7 7 NextUpdate and LastCommit information. k?7 7? ver(r) = ;r. 7 Procedure:Deterrninel)roposal(0?;`iproposal,()?4?UversionO????Finvisibk?); Ahead? --H f [Ids, ver(p)]p I ver(p) (? t I )J Behind? = [[Ids,ver(p)]p I ver(p) SubAhead? = [[Ids, ver(p) t 1 rank(in?[)]? I ver(i)) = t 1 )J SubCurrent? = [[Ids, vcr(p) + t,rank(?n?t)lp ver(p) = if (Ahead?# \b) 7? Partially committed version v + I. -?7 proposal = Ahead?; C41etStableProposal( invisible, Su bAh ea d ?); version = x + 1 retnrn(); if (Behind?# ?) 7 Partially con?mitted version x. proposal = [Ids, ver(r)lr; 7? LastCommit? ?7 GetStableProposal(invisible, SubCurrent?); version = retnrn(); 6'! Petern? ineProposal,continued. /? All respondents report the same local version. ?/ version = x t 1; if (SubCurrentr == proposal = ? --Hlflgr r t ? GetI??pdate(invisible); return (); if (SubCurrent? is a siiigleton) proposal SubCurrentr; C?1et?j?pdate(invisiJ?le); return ( ); /? SubCurrent? has two elements. 7 C?etStableProposal(proposal, SubCurrent?); C?1etUpdate(invisible); return( ); /? update-set has no niore than two elements. 7 Procedure:???etStable Proposal( 0 I?'f?`(1ic?>,iNupdate-set) ? ia!., i(:'i > = the elen?ent of update-set with the loivest ?Yinkcd ii?itiator. return( ); Chapter 5 Correctness In this chapter we prove S-GMP correctly solves Strong GNIP ?Ve prove correctness by induction. In the first section of this chapter we construct the inductive step; li?ll correctriess fi)llows easily fi'oi? there. 5.1 The Inductive Step fhroughont this chapter we use sub( and com() to denote subn?it and con?rnit rYlessages irrespective of the sender's role (n?gr or a reconfigurer). As in Sec- tion 4.3., NextUpdate? is the triple tids,ver(p) + 1 rank(?n?t)]p . E?or each p, C??ossip., Disconnect and INITIATL() u?ean that NextUpdate? is always the proposal of the 1ou'cst-?anked initiatoi' from which p has received proposed updates for group view ver(p) + 1. For process 1 n?nlticasting n?essage n-?,, Acks(T,?n) is the set of processes fioni 65 66 ?vh?ch r receives a n?essage ackno?vledging, or in response to fl2. As before, Ahead? dL? fp p E Acks(r,R--Hint(ver(r))) A ver(p) > ver(r)? Behindr def fp p C Acks(r, R--Hint(ver('r))) A ver(p) ? ver(r)? \Ve also assi-in?e Gpviewx?l is well-defined. NVe approach the inductive step by proving all successful initiators propose the same value for Gpviewr??, whidi goes a long WQy toward maintaining both C?1N1P-2 and GN4P-3 as this is the only value that can be conirnitted. ??ie n?ost difficult aspect is ensuring that reconfigurers are able to detect updates that are con?n?itted invisibly to them. Proposition 5.1.1 If ? is a reconfiguration initiator with ver(r) r, then q?Acks(r, R-int(?)) Pro of `Though GpView' i?ay not be defined globally, let p be the process re- sponsibh for r installing LocalViewf'. ?`ipp05C son?e q EAcks(r, R--Hint(x)) reports ver(q)? j: --H 1. `Fhen q has neither received nor responded to p's sub(? v. x so p believes FAULTY?(q). ??ossip ensures r also believes FAULTY?(q) `ipon re- ceipt of com(? t', x >) froni p. Since r received com(? ?), X >) before it n?ulticasted R--Hint(Q?), the f)isconiiect property n?al?es it in?possible for r to receive a response fron? q. Thus, q ? Acks(r,R--Hint(x)). So suppose son?e q' E Acks(r, R--Hint(r)) reports ver(q1) > + 1, and let J)' be responsible for qi installing ver(q1). Because it ? ver(q') --H 2, i' has neither received nor responded to i1,s sub(? v, ver(q!) t>), resulting in FAULTY?(r), and upon q1's receipt of coii( V FAULTYp(q). (5.1) ??liat is, froni ps j)oint of vie?, GpVlew1 is (or has been) defined. Of course Gpviewr rnay not be defined gli?bally as sonie process q that p believes faulty, rriay have ver(q)? x. NVith respect to a recoiifiguration initiator, r, GpvjewL is r?defined at the end of 1?hase 1, if every process in Acks(r, R--Hint(x)) --H EauIty? reported a local version at least as large as . Processes in Acks(r, R--Hint(r)) fl Faultyr may have reported local v(?rsions less than r, but r believes theni faulty by the end of Phase 1. Proposition 5.1.2 Let r be a reconfiguration initiator. Ihen r proposes version x if and only if GpView'?1 is the most recent (i.e. highest-numbered) r-defined systeni vie?' at the end of Reconfiguration Phase I. Proof Refer to procedure !)eterrninePi'oposa] in Section 4.3, o+ \Vhen Ahead? ? ?, r proposes version number ver(r) + ?I. \Vhile it n?a,y also be the case that Behind? ? ?, we can use Proposition 5.1.1 to show that ?ULTY?(q) holds for p C Aheadr and q C Behind? resulting in FAULTY?(q) at the end of Phas?' 1 and Gpviewver(r') being i-defined there. 68 o+ ?Vhen Ahead? ? arid Behind? ? ?, ?pvewver(q) for q E Behindr is the most recent r-defined system vie'vi; 7 proposes version number ver(r)--Hver(q)t4 o+ NVhen Ahead? Behind? ?, cpviewver(r) is the most recent r-defined system view and r proposes version ver(r) + t. Proposition 5.1.3 For any initiator, ?, if P submits sub(? v, x >), then r --H J ? ver(p) ? Proof By way of contradiction, suppose x ? ver(P). From Proposition 5.1.2, we know GpView??' is the rnost recent P-defined group view, so let q be aiiy outer member reporting ver(q) x --H 1. I?et p' be the process from which p received the comn?it message com(? il, ver(p) >). Since q has riot committed view r, it can neither have received sub(? v1, ver(?) ?) froni p , nor have responded back to p'. By the end of P1,5 submit phase FAULTY?i(q) holds, and by C?ossip so does FAULTYp(q) as soon as p receives coii(? 7)1, ver(?) >); P would not have received q's response to R--Hint(ver(p)) On the other hand, ver(P) ? r --H 1 is impossible if GpView"?' is the most recent p-defined view. 69 Proposition 5.1.4 if q with ver(q) x 2 receiv ? sub(? ?)?X >) fron? initiator `r then FAULTNJ?(q) holds ??hen i n?nlticasted the snhrnission: q? Loc)viewr (ver(q) r --H 2 A sEND?(q sub(? t, x ? IAULTY?(q). Proof Let ver(q) y--H2 arid suppose r proposes ? `??? .r >. it is not possible for r to be the same co-ordinator as the one fron? ?vhich q received com(? ???, x --H 2 >) be- canse Vl?O channels force q to receive sub(? ?)!, --H I >) before com(? v', :r --H 1 >) and the latter before sub(? ?`, Y >), resnltlng in ver(q) + So suppose r is a reconfiguration initiator. Proposition 5.1.2 shows that r pro- poses ? v, i > if and only if ? detects Gpviewx?las the n?ost current r-defined sys- ten? view. By definition then, for every p in Acks(r, R--Hint(r)), either ver(p) > x --H 1 or FAULTN'r(p) holds. Thus, FAULiY?(q) holds at the end of reconfiguration Phase I. iii S-G?iP q fails syi?pathetically in?n?ediately after receiving sub( ? v, x --H 1 >) froni r, arid then attenipts to rejoin the group view as a new process. Proposition 5.1.5 If ver('r) then the n?ost recent r-defined systen? vi at the end of Reconfiguration Phase I is no less than ;? --H 1 and no greater than r. Proof Gpviewl't] (or any view greater than x + I ) cannot be the niost recent v-defined systeni view as this would require ver(r) ? r + t contradicting the hy- pothesis. So suppose GpView"?2 is i?()st recent. Let q C Acks(r, R--Hint(x)) such that ver(q) ? --H 2, and h-t p be the process fron? which r received com(? v, x Froi? Propositioii 5.1.4, it is impossibli- for q to have received and responded to sub(? ??., X 5() FA?Ui?Yp(q) holds by the end of p's submit phase. t?roin C)1ossip? FAUL?N'r(q) holds upon r's receipt of com(? i?., x >). meaning that ? cannot receive q's response to R--Hint(x). 1?he teniporal cohcsion of outer inen?bers' local vie?vs guaranteed by l?roposi- tions 5.1.1 and 5.1.5 is fundan?ental to correctuess. if non-faulty core nierubers never receive snbniissions that ornit an update, they cannot cornrnit a vie\v ornit- ting an update. c; Ni P-:3 follo?vs by sho?ving successful subrnissions for the sarne group version are identical. If r, ?vith ver(r) x, is successful in R?econfiguration I?hase 1, i?roposition 5.i.?i tells us the largest version nuniber observed arnong i??'s respondents is .r + 1. So suppose Aheadr is non-null and let p be a process frorn ?hich soirie nieruber of Ahead? received com(? Vrtl, J + 1 >). If the conipressed update algorithni were used to install version r + -I then every rnen?ber of Aheadr also responded to sub(? V+2, j + 2 >). Nioreover, r can irnagine that tile processes in the corriple- nient, Acks(r, R--Hint(i?)), rnay also have done so niaking it possibli? that Ahead? and Acks(r,R--HInt(x)) (and ?"x+l if Gpview?tl GpView U ?)x+1) together forni the requisite niajority subset of GpView?+'to coinrnit view r +2 (See ?igure 4.(;). ?Yoii- ble can arise if Acks(i. R--HInt(:r.)) (and i'+i and i'+2 if applicable) also constitutes a niajority subset of Gpviewi:+2 since neither r nor any process in Acks(i, R--Hint(x)) can know what value p would propose for view i. + 3. If rernoving core nienibers is tlie only perniitted operation, and i is successful it is inipossible for another process to conirnit a version nunibered higher than x + 1. \Vhen additions are per- 7-! n?itted? p i?ay continue con?n?itting new group views and r n1Qy lag behind p, but if both are successful for a given groul) versioii? both con?ruit the sanie vabie. Proposition 5.1.6 Let r be a reconfignratiou initiator with ver(r) = r Ahead? C Acks(r.,R--Hint(x)) report li?cal version x + 1 Acks(??R--Hint(r)) be a rna- jority subset of LocalViewf' and let [) be a process fron? which sonic n?ei?ber of Aheadr received com(? Vx+1, x + 1 >). Then., when the gronp view is only altered by ren?ovals., /) cannot be successful for any view nun?bered higher than x + 1 Proof Let q C Aheadr and let p be as described. ?ince q received R-int(r) from r after coin(? Vr,tl X + i >) from p, INITIATE(r) means rank(r) ? rank(p)? so FAULTY*(p) holds for every member of Acks('r., R--Hint(r)) (by (?ossip). As a result initiator p is successful for +2 if and only if Acks(r,R--Hint(i)) is a niQiority subset of L0CaIV?eWj"'+1 = Localviewf; --H Observe that Gpviewr' = LocalViewx = Acks(r,R-int(?)) u Acks(r,R-int(x))., and that `r will not receive ack(R--Hint(?)) responses from the members of t?+I; that is `tx+I C Acks(r?R--Hint(;r)). t?or p to comniit Localview;;t2 (Acks(r,R--Hint(r)) --H must be a majority subset of Localview;;+1 . Then initiator p is successful for ? it + 2 > if and on?y if IAcks(r,R--Hint())--HIv?+iI _ IAcks(mR--Hint(x))I--H I Loca?View11;+1 I Gpviewr' --H I Acks(r,R--Hint()) I --H ?t+1 --H --H ____ I Acks(r,R--Hint(x)) I + I Acks(r,R--Hint(x)) I --H I Vr?+1 2 I Acks(i.,R--Hint(i')) I --H `?+i ? Acks(r?R--Hint(x)) I 72 but this contradicts the assuniption that Acks(r, R--Hint(x)) is a iu?iority subset of GpView' Proposition 5.1.7 Let r be a reconfiguration initiator ?Tith ver(r) Y. Let Ahead? be non-null a?id let p be a process that sent com( ? `L?r+1 r + 1 >) to sorne nieniber of Ahead?. Then if r is successful for ? x + I? >: 1 . If p- snbn?its ? i'xt2? r + 2 ? independent of the conirnit of ? Ur'?+1 x + 1 >., then either r or j), but not both? is successful for version r + 2. 2. If p conirnits ? x + 1 > contingent upon ? ?Ar-,+2.. x + 2 > being (onirnit- ted, then either or p, but not both, is successful for version :r + 3. Proof 1 . - r I?igure 5.1 (top). The split-arro?v message fioni I to ??+i represents the tivo possibilities for the arrival of r's cornulit message, R?com(??? ?`r+1, X + I ?) : M--Hsub(? --HAcks(r., R--Hint(x)), x + 2 >), at `?`???i. ps coniniit message, com(? `rtI j + 1 >). to ill is a dashed bue to represent the possibility that it n-'Qy riot be received. For presentation clarity, \VC elide r's Phase II subniit niessage. (?1Ni1?-:3 would be violated if both p and r ivere able to conirnit their proposed vallies for version x + 2. 73- (a) Snppose the menibers of ??xt1 receive in from r before they receive com(? ?xt1,X + I >) froni p. Since r s inessage gossips its belief in p's fatiltiiiess? the nienibers of t.xtl will never receive another niessage fioni p. In particular the nienil?ers of i+I ?vill not receive p?s subsequent M--Hsub(? ?j.+2,x + 2 >). NVe say r owns v'tl ?hen p is successful for version :r + 2 if and only if Acks(r,R--Hint(r)) ? Acks(r?R--Hint(?')) + I irti and r is successful for version x + 2 if and onky if I Acks(n.R--Hint(:t)) + I vi?,ti I I Acks(r R--Hint(r)) I I3oth conditions cannot hold. (b) The analysis is the sanie as the previous case; r owns c1--+i once the nienibers of that set receive in. (c) In the last case, p owns r;?+I if its M--Hsub(? "i+2 Y + 2 >) inessage ar- rives at -i'?+i befi?re r's R--Hcom() niessage and so gossips ps behef in r's faultiness. Then I) is snccessful for version ;i + 2 if arid only if I Acks(r,R--Hint(x)) + ?4?,+l > Acks(r,R--Hint(r)) and r successful for ` -- sion i + 2 if and onky if Acks(r, R--Hint(x)) > Acks(m R?--Hint(;r)) I + I Again both conditions cannot hold. 74 2. There are o consider but as the analyses arc quite sirnilar? ?ve depict arid discuss ouly four (Figures 5.2 aud 5.3). (a) Suppose r o\vns Vx,+I. This happeus if the n?er?bers of ??+1 receive o+r 5 con?niit n?essage fc?r version ? + 1 x+1 R--Hcom(? ?1+j.,L + 1 >) : M--Hsub(? tx T + 2>) before either p's con?rriit message for version x + 1 = com(? tti x + 1 >) : sub(? Vx+2 x + 2 >), or p 5 conirnit niessage for version x + 2 = com(? vx,t2.,X + 2>): M-sub(? --HAcks(mR-int(r)),r + 3 >) (kkence., the split-arrow). Further suppos 0??iS `?x+2, as happens when the nien?bers of tx+2 receive nif+2 = R--Hcom(? Vx+2?X+ 2>): M--Hsub(? --HAcks(t,R--Hint(x)).,i: + 3>) frorn i bef:?re receiving fl?f)t2 fiorri p. Then r is successff?l for x + 3 if and only if I Acks(r.,R--Hint(x)) + I I + I tx,,+2 I > I Acks(r?R--Hint(i)) I and p is successful for x + 3 if and onLv if I Acks(r.,R--Hint()) > Acks(r,R--Hint(x)) + I Vx+I + ?x+2 I Y\oth ??ondi tions cannot hold. 0 (b) Suppose p ()?VIi5 v1+? and ?`rt2' ??hen p is successful for x + 3 if arid or?ly if Acks(r R--Hint(x)) + + Acks(r, R--Hint(x)) and ? iS successful for x + 3 if ?d on?y if I Acks(r?R--Hint(x)) I ? Acks(r.,R--Hint()) It I I + t1t2 I (c) in this situatiou (Figure 5.3). p o?ns ?`x+l and r o?vus ?`x+2 Flien p is successful for i. + :3 if arid only if I Acks(r?R--Hint(x)) I + tr,+1 Acks(r.R--Hint()) + ?`i?+2 I and r is successful for j + 3 if and only if Acks(r.R--Hint(x)) I + Cr,+2 > Acks(r.,R--Hint(.i)) 1+ I I (d) In the last scenario p o??ns ??r+2 and r o?? )s ??j.t1 . ??hen p is successful for x + 3 if arid only if I Acks(r?R--Hint()) + V1?+2 I I Acks(r.,R--Hint(?')) I + I I and r is successful for r if and only if Acks(r,R--Hint()) I + I titi I I Acks('r,R-int(x)) I + I I 76 1) vI+i P com(? ti.,+? >) a. M-sub(? ?+2,+2>) R-int(i) Cases a. an?l b: r o?vns ?)xt I. M-sub(? ?`.xt2. ? + 2 ?) corn(? ?`-1;+1.Y + 1 >) fl? R--Hint(x) Case (;: p O?\ -- ?xtJ F?igiire 5.1: (lase Analysis for Proposition 5.1 .7? Part ?. 77 `rn - p p T -p i-'Ql fl-? ?ase a: r owns ?L-'tI and Vx+2 I) R-int(?) Case b: p owns `x+1 and Vx?t2 Fignre 5.2: C?ase Analysis for Proposition 5.1.7, Part 2. 78 Case d: r owns Vy+i j) Owns R--Hint(x) n2f+l vrt2 T12?+1 p Figure 5.3: Continued Case Analysis fc?r Proposition 5.1.7., Part 2. r-+1 Tn - p __?- ___ It9 -r v1+2 r R--Hint(r) - ---- Case c: p owns ?`-?+i. r owns t",t2. 79 1?roposition 5. 1.7 is a keystone to correctuess becanse it addresses exactly the situations in ?vhich it might appear possible for S-GNiP to violate C?NlJ?-2 and (4NiP- 3: ?vhen two initiators are able to propose? and theref:?re n?ight be able to commit., different vallies for th( san?e group version. Propositions 5.t.(i arid 5.1.7 ??()VC that even when the local vie?vs of processes installiug the sariie group vie?? differ ( i.e. the initiators do not vie for niajorities fioni aniong the sai?e set of core members) s-GMP is safe. It ren?ains to consider ho?r decides which of the various proposals it learns about could have been invisibly con?niitted. In s-G?IP this necessity arises during n?configuration in determining L'2 when either Aheadr $ ? or Aheadr \h and Behind? $ ?, and in deternAning ?`? when Ahead? BehIndr `lo ilhistrate the difficulty, suppose Gpviewx?l is the i?ost recent r-defined group view. I?et Submissionsr(x) be the set of proposed next updates f:?r version L' that r liarns about in response to its R--Hirt( ) n?essage: Submissions?() --H I ?p C Acks(r,R--Hint()).NextUpdatep --H \L?)i.,X,Z1? for sorne initiator Intuitively, if Gpviewx?l were corurnitted with an accorripanyilig proposal for GpViewx (?.r--. the corripressed algorithui were used), then Submissionsr(x) would be that proposal and Submissionsr() I--H--H Ilowever, it is also possible that. while there ulay have been no plans for a future update when rngr rnulticast M--Hcom( ? L", X- --H -t >), nigr (lid begin an update algorithru to forrii Gpviewx at 50TT1C later tirne. If, during that sarne interval, an outer core niember, p, began reconfig- uration it is pos?ible for p riot to receive any Phase ? responses indicating n?gr `s intention fc?r GpView"' in this case? p would propose n'?gr `s removal for version x. N' subsequent reconfigurer could ti-icri get i?hase I responses indicating both of these proposals. NN?e first describe tl-ie coniposition of Submissionsr(.i), sbo'ving that every reconfigurer proposing version i either propagates ingr `s proposal for version or proposes Tngr 5 rei?oval (See Figure o+4.-t in Chapter 4). Proposition 5.1.8 Pbr all versions x, I Submissions?(x) ? 2. Proof Inspecting procedure Detei'n?inei?roposal, different subnii ssions for the sai?e view can arise only fron? n?gr and from a reconfiguration initiator proposing fl?g'r `s ren?oval. `?he latter occurs if ar?d only if the initiator did not learn of any outstanding proposal n?ade by rngr that is if Submissions?(x) ?he set SubmissI=ons?(.c) is b?vai?nt if it contains two distinct values. Corollary 5.1.1 follows by exani ing [)etermineProposal. Corollary 5.1.1 Let r and r' be recoiihgurers proposing version .r. ihen if both are bivalent.' they are identical: Submissions?(x) I Submissionsri(j) I --H--H 2) ? Submissions?(x) --H--H SUbmiSSi0flS?'(x). Fliat is all reconfigtirers either propagate n?gr `s nniqlle snbrnission for view :r or propose `tnqr 5 renioval. ?Ve now show that only one of the two proposals for a given version coiild possibly have been coi?iinitted (invisibly or otherwise)., and that all reconfig'irers can distinguish which of the two it was. ?his proposition, too, is important in the inductive step. It shows that in going fioni Gpviewz?l to GpviewT one and only one vahie can be con'iniitted by any member of Gpviewx?l as the same value is proposed by any successful initiator for the xth group version. Proposition 5.1.9 Let r be a reconbguration initiator. If Acks(r,R--Hint()) is a majority subset of LocalViewr and Submissionsr(?) is bivalent, then r can distinguish which of the two valii??s proposed could not have been committed invisibly. Proof Let r be as described, and let ? ?`. X > and ? > be the two e -, -ients of Submissicnsr(?i?). Let J) be the process of least rank arrioi?g those reported to have submitted ? ?`, :t >. and let p1 be the )rocess of least rank among those reported to have subuiitted ? t?1. x >. ?4ien r reasons as follows: In order for either vah?e to have been committed its initiator must have garnered majority approval fiom its own local view for the submitted value. ??hat is, Acks(p, sub(? i?, x >)) c Maj(LocaView?) and/or Acks(p', sub(? t', y >)) ii Maj(LocaIView?1). Since both p and p' make version r submissions, both must have local versions either x --H I or 3 (Proposition 5.1.3). \Ve consider these by cases: S2 1 if ver(p) ver(p'), then both processes vie for majority approval from among the same set of processes, and the Phase I response sets of both n?ust intersect: Acks(m R--Hint(ver(r))) fl Acks(p', R--Hint(ver(p'))) # ?. Suppose p has higher rank than p'. Then J) can only get a majority response to its Phase I aiid Phase II messages if it does so hefi?re p'?1 If this `tvere the case, then then p? ?vonld have learned of sub( ? `c, x >) and have propagated it (c.f. l)e[eru?iiieJ3roposa]). Ho?ever, p' submitted ? t'?, . >, and v' ? t', so p must not have been successful for ? v, x >. 2 If ver(p) # ver(p'), then one of ?p or p' has version j and the other has version x --H 1 (Proposition 5.1.3), Suppose ver(p) x and let rc be the process from which p received com(? v, x Pecause it, too, proposed version .c, ro's h?cal version niust also be either --H 1 or x. Suppose ver(r0 ) x --H I, and consider the conditions under whi - - ji proposes < v', x (a) If Submissions??(x) is the singleton fv't or if Submissions?t(x) is null and ?`j? the singleton (n?gr ?. In either case, all members of Acks(p',R--Hint(r--H -!))flAcks(ri,sub(? v,r >)) 1it is Ilot difficult to see that ?vhen t'vo processes colilpete for fliajorities froiii all?ong tile san?e process pool one niust succeed in getting the 1-1-lajority before the other. Nioreover, Disconjiect C4ossip and !NITIAI?E() n?eaii that the process `vith higher rank can only get a n?ajority response if it does so before the lower?ranked process. n?nst have received p"s R--Hint( --H 1) l?efi?re ?? `s sub(? ?, >); if not., they woiild have forwarded ? ?, X > in their responses to P1 making Submissions?t (r) bivalent. N4oreover, because they received sub(? ?`, X ? fiom P', rarik(r0) ? rank(?1). C?ince Acks(?0, sub(? v, x >)) is a majority subset of GpView"'?1 a majority of GpView??' believe p1 faulty npon re- ceipt of sub( ? v, x >); p' could not have obtained the recluisite majority for its submission R--Hsub(? ,?:I, ?,. (1)) If Submissions?i(x) is bivalent then Submissions1g(;r) = tv1, ?t Suppose p1is the first' reconfigurer fc?r which Submissions?t(?) I = 2. Among the respoiises to R--Hint(x --H 1) indicating ? ?, x >. let vi be the member of least rank to propose < v, x >. Similarly, let v2 be the member of least rank that proposed ? v', x >. Ry assumption, both Submissionsv1 (r) and Submissionsv2 (v) are either null or a siughton. \`Vithout loss of generality, suppose ? v> is the proposal to remove fl?'gv (one of the two submissions must be). `then Submissionsri (v) is either empty or the singleton, ( ? --Hm']v . v >t and `I'2 must be the singleton f ? i'?, v > ?. `\Ve fl0\V sho?i' p' can deduce v? cannot have been successful hA' ? v', v > by considering the possible roles v? might have played. i. if v? were mgv we claim it could not have received a majority re- 5i)0fl5C from among Gpviewx?i; if it had, then vi would have learned of ? i-I, v > during its interrogation phase and thence propagated it. ii. So considei2 $ mgv . Since both v1 and v2 were ?`il?le to propose npdates, their Phase i respondents sets niiist intersect? and since the processes in that intersection received R--Hint( t) froiii ri it n?nst ?)C the case that rank( r1 ) ? rank( T'2 Fnrthern?ore? had r2 been able to commit ? v' y > its Phase II respondents Acks(v2,R--Hsub(? v', x ?)), wonid have had to to have been a i?Qjority snbset of GpView??'. In this way, if T2 had been successful for ? w (Acks(v2,R--Hsub(? v',x >))flAcks(vi,R--Hint(J --H 1))) # ?. Since Submissicnsr1 (x) does not contain ? v' x >, the processes in the intersection n?iist have received v1's R--Hint(v --H 1) before v2s R--Hsub( < x > ). But now, rank(vi ) < rank(v2) means that having received vi's R--Hint(v --H I ), no process will receive further niessages fron? v2 so v?'s R--Hsub( ? ``, x ?) could riot have reached the required in ajority. ??hus, p1 deduces v2 cannot have been successfiil for ? v1, x >, and p1 should decide to propagate ? v, j >. Exaniining procedure (iNet S[ah]ePvoposa], we see that this is indeed the case: ? v, x > is sub- niitted by the process ()f least rank, v1, among those reported in Submissions??(.v). NVe have arrived at a contradiction: Detei?nine- J?roposai and CetStal?JeI???oposa1 would not have resulted in p' sub- iTlitting ? v', x >, as hypothesized. ?Ve have just proven the claim for tlie case when ].)I is the first recon- hgnrer whose Phase I response set is bivalent. If j)''5 proposal reaches a niajority stibset then this value will be chosen by subsequent initia- tors (with bivalent response sets) in (?etSta/?1eProposa] Since now p?is the snbn?itter with least rank. If p 5 proposal does not reach a n?aiority subset, subsequent initiators will nonetheless reason as p' did and so also choose to propagate the correct val"e. t?inalty, recall that r() is the process fron? which p received com(? v, j-. >). \Ve have just proven the proposition for the case when ver(rc) r I . ?he last case t() consider is ver(vo) j-. But in tliis case we can apply the san?e argun?ent to the proce??, say ? from which ?o received com(? v, r >); this reduction and analysis can contiiine for only a finite nun?ber of steps befi)re we exhaust the nien?bers of GpView?? Proposition 5.1.9 proves that C1c - tah1eJ?roposa1 correctly chooses the only pro- posal for a given grou?) view that co aid have been coininitted invisibly to a recoufig- uration initiator when its 1?hase I respoilse set is bivalent. N?'hen the set is univalent or en?pty, it is not hai?d to see that I)etetrnineJ?i'oposa] is safe. Stabty-defined proposals (those submitted by successful initiators) are exactly the proposals that reconfigurers must view as possibly coiun? itted invisibl'\,?. Corollary 5.1.2 If Gpviewx?l is defined, there is at fl)ost one stably-defined pro- posal for group version r. Proof Proposition 5.1.9 proves that any reconfignrer reachiiig its proposal stage kno?vs exactly ???hich of the t?vo proposals fol' versioii x is riot stably-dehned. Procednres Detez'n?in??Proposa1 and G'etStah1eJN?oposa1 propagate the other one. If this iriitiator reaches its con?rnit stage, its proposal is stabty-dehned and identical to thc other stably defined proposals for version r. Theoren? 5.1.1 (Identical Local Views) lfGpviewr?l is defined, then allmen?- bers that stirvive to define local version x have identical k?cal rth vie?v?: A ((?ver(i?) = A (?ver(q) x)) p,q?GpView'?' (Localviewjr; = Localviewf;). Proof ?he resnlt follows fron? Corollary 5.1.2; no process con?n?its a local view for version .i that diIf??rs fioi? any other processes' versior) ? becanse all proposals that can possibly reach the cornrnitstage are identical. Note that Fheoreni 5.1.1 implies no ten-iporal constraints or) local views. nierely that if p ever defines an xth local view. and if q, too, ever defines an rth local view, then these two are identical. It does not require Localview;; and LocalView'q' to exist conten?poraneorisly. NVe need to do a bit rriore work to pn)ve ()??lP-:3. Remarks S-C?IP ensures that the state to which the systerri finally reconfigures represents the cu ni ulafive systeril progress. It accounts fi?r any previous npdates (and reconfignra- S7 tious) that inay have been only partially successful. and n?akes thei? stable, ?Vith respect to an interrupted coiriniit tlie gix?up vie?v is undefined until 501nC initiator succeeds in niulti casting its con?n?it message. To see that tlie new rngr is unique consider a process, p, that lias received an interrogate i?essage fi?om r. It believes faulty and disconnects from every process ranked higher than ?`. I?hus, p immediately begins to believe that r is tiie highest ranking non-faulty process. in particular, p will not receive subutit., conin'iit, or interrogate niessages fiorn processes ranked higher than i?. 5.2 S-GMP Correctness Proofs Theorem 5.2.1 S-GMP satisfies GMP-(). Proof Let j) be a f)?0(:C55 that queries the na?ie server and is told no NIRN? exists. J?ien LocaIView?(co) GpView??j(co) ?pt Uniqueness of GPVeW?pj(co) depends on race conditions arid assun?ptions on tiie nanie server. iJutil the XiRN?i reaches its critical n?ass and can be used to create a fluilt-tolerant nanie service, its reliance on the iian?e service niakes its uniqueness vulnerable at start-up. Theorern 5.2.2 S-GMP satisfies GMP-I. Proof A pn . j) executes r?fl?ovej?(q) only upon receipt of one of tiie fol- lowing: 1, M--Hcom(--Hq):M--Hsub(--Hq') frorri nigr ., in which case S--Hc?N1P gives cccv13(ni??r ,M--Hsub(?q)) H fa?iUyp(q) recvp(?ngr M--Hcom(--Hq) : M--Hsub(--Hq')) `rcn?ov(4?(q) Equation 5.2 holds irrespective of whether the submit message M--Hsub( --Hq) were the tail end of a contingent update or submitted independently of any accompanying commit message. 2. R--Hcom(--Hq):M--Hsub(q') from some reconfiguration initiator, r. In this c se rccv?(?, R--Hsub(--Hq)) fauit.4p(q) ??cv1(r, R--Hcom(--Hq)) ren?ovr;?(q). Analogous statements app?y to the events opr?rotingp(q) and add?(q). Fhe next t\vo theorems will make use of the following consistent cuts. I?et r, be the process responsible for completing the XlHX'I-wide commit of ? ix, ;i' >. iheorems 5.2.3 and 5.2.4 will show that Cx is the first cut (in the sense of th( partial ordering., ?, on cuts) along which GpVie?x exists. cx[pJ debues the p?con'ponent of Cx for p E Gpviewx Cx [?1 dJ i?cvp( Cx. com( ? Vx, X >)) f)oComn?it( ? ??x, J recvp(rx., com( ? ?`x, X J?o(loinmit(? ?`x, J ?) r(?;cvp(rx, com(? ?x, ? >)) DoComniit( ? ?`x, .C qnit? otherwise Theoren? 5.2.3 S-cMP satisfies GNiP-2. 89 Proof Recall GNiP-2: p\ V (GPViews(c) = /\ UNDEF'D(GpVI'ews!(c)), Sc Proc ?#S'#S th an&l define tlie inte?va1 of a rnn as follows: [cx,,crti) = I c1 ? c ? NVe prove tlie tlieorerri J?y induction over intervals Base Case Suppose exactly one process, p, is installed as the initial group view, GpviewU. Let S() = fp?. Uhen until p changes its local view (that is, along every c E [co, ci)), GpView50(c) = ?pJ = 5o J3y assun?ptiou, every other process, c?. that queries the nanie server during this interval is told the NIRNI exists, so LocalViewq(c) is undefined until q is allowed to join and this is 1)0 sooner than cl. C'aveat: The Base Case holds only when the NIRNI elicounters no race condi- tions at start-up. Inductive Hypothesis The staten?eiit of the theoreni holds during the interval [cr??i, c), with Si,--Hi the unique set satisf?'ing GpView5__ (c1?i) = S,--Hi = GpView??' and no other sel satisfying the equality along any cut in the interval. Inductive Step From Corollary 5.t.2 if GpView?' is uniquely defined, then there is a unique, stal?ly-defined T)roposal, ? i > for Gpviewx, Define S1 c Proc: i?hen, l?y defii?it ion = ? ?CGPV?W??(Cx) are unique. Further, if ? Vi, X > is the reuioval of processes fron? GpViewi?l 91 those processes' local vie?s, if they exist at all, are eqnal to Gpviewr'??l ?ince Gpviewr?--H[ contains all the n?en?)ers of S? GpView', GpViewv?(c) is undefined fi?r c E [c1,, cx+i) the local vie?vs of the functional nien?l?ers of GpView??1 ?? differ fion? the local vie?vs of the rnen?bers of Vx along Cx. In addition any process that queries the nan?e server during this interval cannot define a local vie?v until it is allowed to join the XIRN1 core, Finally, froiii thc definition of the partial order ? on cuts, U0ci [ci, ci+i) covers the entire asynclironous run. Theorern 5.2.4 S-GMP satisfies 1MP-:3. Proof Recall GXiP-3: t\ A?A (Cx V q mIN?LocALf3(q) ? UNDiF'T)( LocaIViewq?) IN-LocAL1f;(q) ? DOWNq V (LocalViewq Localviewpx) Let Cx be as defined above and let LocalViewp(cx) exist and be Localview;;. if q C LocalViewf;--H1 --H LocaIVI.ew1x; then q will not con?n?it any version LocaIViewq?t?.' for 0 ? ?; once q receives com(? Vx, X >) it learns ?"x is its own removal frorn Gpviewx--Hi and crashes. If q is in neither L0CaIVieWj???l nor Localview1x, theii LccalView'q' is undefined along Cx. 92 So suppose q C Localviewf;. If q crashed on or before Cx ?V(- are done. If q has not crashed the result follows frorn `J?heoi.eui 5.1.1 and the definition of Cx It ren?ains to prove Cx consistent. ?y way of contradiction suppose it is not? and let Cp and C ` be events violating consistency: c? ??i e??? e'p! E Cx. and C1) ? Cx. p and p' i?ust be differeift different processes., n?aking Cp a send event fiorri p to p and Cpl the corresponding receive event. ?ousider ?igures 5.4 and 5.5. a: j) ?x,P1 $ rx. ?he lIFO assnn?ption prechides this (Figure 5.4-a). b: ? $ ?x,P1 Tx. If Cx is inconsistent., then flillowing cycle ensues violating causality (Figure 5.4-b): sCiI(ipJ(P?R?com(? ii H r(c?(r?R--Hcom(? v, I ?)) cx[p] H SC ndp(j'/, n? CpJ ?CCVp?(P, fl1 SCTI(11)J (P R--Hcom( ? v, x c: P $ `rx, PI $ Tx. If p is a subsequent initiator, it n?ust have lower rank than Tx, making it iiT?possible for PI to receive rx 5 conin?it message (Figure 5.5-ci and c2). d: p $ vx, P' $ rx. FinalLy, if p' is a subsequent initiator. then p is responding to a n?essage froni it. Iii no case, however, will p receive the con?n?it message fron? rx; if p' has lower rank than Tx FAUL?'??pi(rx) holds, and if PI has higher rank than Tx P will fail upon receipt of 7?x ?? interrogate iliessage, and not receive its (?n?nIit message (Figure 5.5-Ct). I) I) = p R--Hcom(? ?`, (4) 6 I a: FIVO violation C1)I 61) J?: (?a'i?ality violat on i?ignr? 5.4: Proving Cx (?on?ist?nt - `the'oren? 7. 94 Theoren? 5.2.5 s-GMP satisfies GMP-1. Proof An outer process's local failnie and recovery beliefs are propagated by C?ossip in every ?nessage it sends. ??lins., an npdate initiator learns of, adopts., then acts on onter processes' beliefs. 5.3 Message Complexity l?he seqnence and tiniing of failnres affect the message complexity of S-G?iP. NVe qnantify the ?vorst and best case coniplexity for our protocol to install a new grotip view., as well as the gain achievable when we can take advantage of conipressing phases. Define ?r',d1f GpView? ` and Tx to be the nuniber of tolerable failures in GpView1: rr,d=(?? (?Ti;1 + 1). t?hen the worst case scenarios to install the (r + -I )st gron?? view occurs when there are Tx successive failed (or aborted) rec'onfigurations. ?his results in 0 `,` --H(Tx,) --H Tx () Gpviewx 9 , 9' niessages. i?ortunatety, this specific corIif?osition, ordering., and timing of faihires occurs with extreniely, low probability. 95 I?here are three s??-called best case scenario to install a vi r succeeds ilsing the standard two-phase S-G?IP; Tngr succeeds using co?ipressed S-G?1P' and one successful reconfignrer. The first case requires at i?ost 3(nx --H 1) n?essages; the second, at n?ost 2(nx --H 1); and the third, at most 5(nx --H 1). Finally, if we can take advantage of a seq'ience of con?pressed S-G?IP updates we save substantially in message complexity For exarnple, flj --H 1 successive singleton removals, none of which are mgr requires flx' I nx--HI (n1. --H 1) t 2 ? (nx --H y) (n1 --H 1) +2n?(n? --H 2) --H 2( Z y--H--H2 y=2 ?r2 --H 2?r --H 1 --H 1)2 messages, averaging to n? --H 1 messages per update. A standard two-phase algorithm would require an additional flx?Y --H 1 messages for the :qth removal, on the average. In each case, the number of actual failures may reduce the number messages. 96 cl: p' subs??quent initi??tor Ca?us??1ity violation Figurc 5.5: I)rovjng c1, Consistent -- Theorei? 5.2.4 contiijued. A ____ R--Hint(ver(p)) p cl: p subseqnent initiator ra?k(p) ? rank(?1,) - -____ R-int(ve )) c2: p snbsequent initiator rank(r?) ? rank(p) I) p p r1- I), =_R-int(y?(p)) Chapter 6 Protocol Optimality In this chapter, ?ve rephrase the Strong GNi l? nien?bership problen? as a kno?'ledge- theoretic, con?rnit-style probteni, viewilig any sointion to ?;xii3 as one of acquiring and propagating kno?vledge [11N19o]. This approach proved snperior to the previons, behavioral specihcation ?vith respect to fine-tuning the s-G?IP protocol. ?Ve sho? here how the episternic forn?utation niakes clear that any solution to Strong cN?NiP requires niajority corroboration, and where optin?izations to the s-c ?i p protocol are not possible. \Ve flirther show the knowledge-based formulation facilitates establishing a lower bound on the number of messages nee4(?d to solve Strong G?IP, construct a message-minimal solution, and finally quanti fjy the tradeoffs between the given, optimized solution and the niessage-mininial sohition. others have used a knowledge-based approach to analyze a variety of prob- lems in distributed computing ([flalS7] [flad9i], [NiF9()], [NIaz9?O], [H Fs5l). The work presented here differs both in the problem considered and in demonstrating the practical utility of ari episteI?ic fornuilation \`Ve give concrete evidence that such a forniulat ion greatly sin?plifies i?uildirig robust fast sollition5 to coriiri iit-style 1?robleriis. Remarks ??he iridistiuguisliabili ty of failures froii? coi?n?nnication delays seen? 5 to warrant using the doxastic ruodality [?\its9] to refer to local failure beliefs. However the (?oss4? and Isolation i)roperties result in a iTiucli stronger local interpretation than belief. in particular., once a process suspects anothei is faulty? it behaves as if it k??ows that process is fliulty. ??`?hereas the standard doxastic interpretation would give equal weight to belief iri faultiness and belief in non?faiiltiness? our rriodel favors one befi?r? the event faulty1?(q) and the other after. By closing its iricorriing channel frorn a suspected-faulty process, a process behaves as if the systeul were synchronous; as if it knew that a suspected process could not send further messages. Note that the forinulcT FAUUiY?(q) arid DO?VNq reflect this anibiguity. ?he forn?er is local to the suspecting process ?vhile the latter is local to no process. 1?orrriula ? is local to p if p alwQys knows whether it is true [C?NiS6] : I??? v I?j?mq' holds along all consist??nt cuts. In this way, the standard kno'vh?dge operator models exactly the behavior process pexhibits upoii executing [a?iltyp(q). ??hroughout this chapter, the staten?ent "p believes (or knows) q is faulty" should be taken as an artifact of our niodel`5 behavioral requirerrients riot ii terally. In contrast (?handra and foneg impart a pure doxastic interpretation to local faillire beliefs [(;?T9i 1. In their work, despite belief in a process's faultiness, all of 99 its n?essages niust be delivered, as n?ust all niessages to it. I?he difference could also be tern?ed one of optin-lisni versus pessin-iisni. F?inally ?ve note that s-c;NiF is a fnii-?nfo?ination protocol; all relevant state iriforniation is sent with each protocol niessage. In Strong (41N4P, relevant infornia- tioii are local beliefs about faillires, arid during reconfiguration, a process's local view and the vahie of a pending conin?it (if it exists). outer processes can infer an iriitiator's respondents from its failure beliefs. 6.1 Reformulating Strong GMP Strong GNIP, as a coniruit probleni, requires functional core n?en?bers in a given group vie?v to vote on a proposed update to their local views, and cornn?it the update when particular conditions are u?et. Let ?" be a SCt of values, one of which the processes in the (r --H 1 )st group tI? view niust con?niit to install the x group view : ? C (f-.+) ? 2Proc), Before conin?itting a given update, a process n??y V0t(r for n?ore than one update vahie. As in Aton?ic ()ornrriit, in rio execution are votes for any version pre-determined. `I'he notation ? ?`, X > refers to a particular value, ?`, and version of the group vie?v fr?r which r is submitted arid/or conirnitted. The san?e vallie niay be siibi?itted for different versions. i?he following an foru?ul?, events, arid notation used to rephrase Strong GNIP as a coniniit probleni. NNQ say a fornuila y' is local to a process p if p always knows whether ? is true; that is A:p6: V A???? i()O al?v?ys holds. o+ VOTL'J)( ? .r >) holds along consistent cnt c if voirp( ? v, x >) is the most- recent votir?g event p ??4th li?cal vie?v LocaIViewf;?' , executed. o+ (:O?INIIT1?( ? `, >) holds if ]? executed the coi?niit event con?n?it1?(? v, x > to form its xth local vie?v. 4he forniul? FAUUfYp(q), V()iEp(? v, x >), and CO?INIITp(? ??, J are local to p. o+ Maj(?') is the set of all majority subsets of 8, and Sizcof-A!aj( 8) (or sonietimes $L(8)) is the size of the smallest niQjority subset of 8 8?zeofrAhj(8) ji(8) ??;II + l. o+ The episteriiic operator k?r distribntcd knou'icd?e of a forrnula, ?/), an?ong a set of processes, 8, captures the riotioll that if the processes of 8 pooled their local knowledge together, they would know `?`. Vornially, let (T)p(c) be the set of all the fori?ula p knows along c c ? kpQ?? then c ? ?s? iff c k ?i)? ?p(c)) ? o+ STAI3LEN?oiE5(? ?`, j >) holds when 8 is a (non-null) subset of Gpviewx?I each process of which has most-recently voted for ? v, x > and will iiot vote l()t for a different ? ? Nioreover every proces ? is (distribntedVy) known fanity hy tile group S SiAI3LEVoTEs(? t,T A (VOiEp(? t, >) A A EmVOTEp(? ?`,T A A D?FAuLTY(q) pES q?S ()J)serve that the set S must have at least one functioning process in order for STABLEN7oiLs(? v, x ?) to hold. \?Te on?it the snbscript when we are not concerned with the particular set according to which ?> is stable. Conin?it proble?ns are generally characterized by two safety conditions: 1?%iiidify describes the voting conditions (e.g. nui?her of processes voting) that n?nst exist when a value is coniinitted, and tTniqnenes? describes the coininit conditions (e.g. whether another vallie can have been coriin?itted) that nuist exist. For Strong CMP, N&Aidity is deterniined by vote stability, and ?;niqueness is absohite. Strong C?lNIP also has a Propagation condition which ft?rces all core rnenibers to attempt to con?'iiit the value. Validity If p con?nii , then p is iii a subset of Gpviewr?l according to which ?` is stabl( for r COMMITp(? v. ;r >) ? sccpv$ew'--H' (? c A STABLEVoTEs(? ?,X Uniqueness If p coniinits ? v, y ?, 110 other process, q. ever coniniits ? ??, ?, for i?' ? v COMMYi?p(? `??, ?` >) ? A A ?mCOMMFi?q(? ?)?. ?`#vq?Proc 102 Eventual Propagation If p con?n?its ? v, r >, then every other process q either 1) is not in Gpviewr?, or 2) eventnally con?niits ? ?`, X ?, or 3) eveiftnaliy fails: COMMVfp(? v, x qE koc (oui???(q)) v (?ccMMITq(? ?`. v (?DowNq) Froi? the discnssions in Clhapter 7?, it is easy to see that lJniqneness cannot be gnaranteed ?vithont additional restrictions on stability Definition ? > is co?nn?ittabiy stflbie ( c-st?ib1e) if and only if ? v, j > is stable ?vith respect to a majority siibset of GpView?-?' ?he formula C-STABLE(? v, x. holds exactly ?vhen ? v x > is c-stable C-STAI3LE(? v,?r >)d=ef v (STAbLEVoTis(? ?,x scMaj(GpView'?1) tJniqneness and \Jalidity then con?bine r? state the latter as CO?INIITjj(? v,x >) ? scMaj(G$iiewr?i $) c A C-SiABLEs(? v,? 6.2 Knowledge Analyses `These analyses use two notions of n?ininiality : a conimit protocol is knowiedg?- n??n???ai if procx?'sses coriin?it a value as soon as they kno?v it is safe to do so; it is fli(S?f??C?fl??fl?Jt?(1i if it is in?possible to con?n?it a vahie safety in fewer niessages 103 In this section, we show that parts of the S-G?IP protocol are knowledge-n?inirnal, and how the knowledge-based fi?rmnlation of Strong C?NII? led to optin?izations in the other parts. \Ve also derive a lower bound on the niin?ber of niessages required to solve Strong C;NIl?, and con?pare s-cNlP with a inessage-optin?al protocol. ?he optimizations tos-G?i P are theni selves n?inii?al in both senses. for these pnrposes, the niost important aspect of s-G?iP's correctuess is that at most one value attains c-stability fi?r any given version, from which it follows that a core member may safciy commit ? v, r ? as 500fl5 as ? v, x > becomes ?-stable. On the other hand, the following derivation from Validity shows that every solution to Strong CNlI? necessarily satisfies COMMFfp(? v,r >) ? kpC-STABLE(? x 1. Restating Validity: COMMITp(? v,x ?) ? V c A C-STAI3LEs(? ?, x s?Maj(GpView??') 2. Applying the I?iiolwedge C4eneralization Inference Rnh, we obtain: A%coN1MIT?(? v, x V (I) c A C-STABLEs(? v,? scMaj(GpView"?1) 3. Recause CONINIITp(? v, r >) is lo?al to p, CO?1?iITp(? v, x >) is equivalent to A)?CONI?IFi?1)(? v, >). Substituting gives: COMMflp(? v,x >) ? kpC-STAI3LE5(? v,? >). I (6.1) 104 Forniula 6.1 states ?vliat must i?e true ?vheri a process cornniits a value. In this regard., ttie ?earliest? a process rnay safi?ly conin?it a valrie is as soon as it knows that vallie is c-stable: k%C-SiABLE(? v,r >) ? CcMMYTp(4 ?`,x (6.2) \Ve call a con?niit protocol in which processes coi?u?it upon attaining a single ?level' of knowledge of c-stability., a t [?-con?u?it protocol. In what follows \ve use com(i?) for any con?n?it n?essage (M--Hcom(v) or R--Hcom(t)), arid sub(v) for ally sui?rnit message. Definition I?et p' send (or n?ulticast) n?essage Tfl? and let ack(n?) denote a i?essage sent l?y a recipient of n?. back to p' acknowledging receipt of rn. Define o+ Recipients(?i in) %L? (? RECVp(p'. rn)? o+ AcksSent(p'.,n'i) cfficf (pi SENDp(p',ack(m))? and o+ AcksRcvd(p', dLf (? I?E(?VpJ(i). ack(ni))? Process faihires and n?essage asyncli1'onN, render Ackskcvd(p'. n?) C AcksSent(?/, ni) C Recipients(p'., ni)., for any p and ni. ??his is significant in the next definition., and in the propositions that follow. IFo describe the niost general observable systen? state frorn which it can be inferred that a value is c-stable, we use these process sets to define `snccessful initiator' `vith the followi rig fonnula: 105 Definition Process p1is successful for ? t', x ? if and only if a niajority subset of GpViewx--H 1 acknowledge p' `s s'ibn?ission: SUCCE?SFULpi(? r, x >) %d AcksSent(p', sub(?')) c Maj(GpView"-'?'). (6.3) Equation 6.3 captures exactly the problems posed by invisible con?niits. it leaves open whether an initiator actually received any of the acknowledgen?ents sent to it, which detern?ines whether the initiator were in fact able to con?n?it the update. flowever, in the absence of concrete evidence, it is in?possible for a subsequent re- configurer to kno?v whether a previous initiator succeeded in corni?itting the update anywhere. As in the behavioral specification of Strong ()NiP, Episternic Uniqueness and Eventual Propagation force a recontigurer to assume an update was con?rnitted if it detern?ines the update could possibly have been conirnitted, or, equivalently, if it does not know the previous initiator was iiot successftil. ?hus for initiators r and r', r' propagates r's subn?ission ? c, x > exactly when 7 FULr(? t,X 6.2.1 Facts from Correctness `,Ve restate the fi?llowing facts about s-GNIP, proven in Chapter 5 Fact 6.2.1 In s-c?ip. if r is successful for ? ?`, X ?, no value unequal to v is there- after subn?itted for version x: SitCCES?FULr(? v, x A A ?mSENDpi(p, sub(? ??, J v?# p1?j) 106 Fact 6.2.2 In s-GNlP.if r is successfiil for ? v, j' >, then ? ?`, > is c-stable: SUCCLSSFULr(? vi>) ? C-STAI3LE(? v,x Fact 6.2.3 In S-GMP. if VOTEp(? v,:r; >) holds for a T?Qjority of Gpviewx?l along any consistent c'it, then C-Sii?BLE( ? ?`, i >) holds along that cnt: (A VOTip(? v,x >)) C-Sw?BLE(? i,i' s?Maj(GpView?"?') pcS 6.2.2 Knowing Stability ?o nnderstand when a process execnting S-GNIP knows a value is c-stal)le, we analyze the protocol.'5 cori?n?'in1cation phases. NV'e show two con?miinication phases are necessary when the initiator is n?g? . \Ve begin these analyses by considering only those instances of ?-cNiP when a given process is either the first n?gr or has been n?gr. for at least one conipleted npdate of the systeni view; the issnes arising in the transition of a process fron? reconhguration initiator to Tnq'r are niore cornplex and we disciiss then? later. \Ve also show that in foiir of the five possible global states that i?ay exist at the start of i.econfigiiration (i.. the degree and type of h?cal view divergence), processes learn c-stability earlier than when they coinniit in s--HGNIP, allowing us, in two cases, to eliruin ate a full phase of cor?rnunication in the reconfiguration algorithn?, ii) annother case to eliniinate the subsequent submit niessage once the reconfigurer becomes `rflgr and to preclude a fourth situation entirely. N?Ve show it is irripossible to improve the fifth. 107 Definition Let IsNii'(.?R(p'., r) hold if a niajority of Gpview?I believe pi is the highest-ranked nori-fanlty process. `lihen version ? has a ci?a'n starting point if I or if I?N'IGR(p' i' 1) held thronghont the formation of Gpviewx?l. 1 i3y inspecting S-GMP, we obtain: Fact 6.2.4 In S-GNIP. if x has a ch?an starting point, and if rngr snbrnits ? v, x >, then rio process in GpView"?' has previonsly voted for ? i', >. The follo?ving theorern shows that, in s-GNIP, nigr do(?s not learn c-stability of a vahie before the end of the first phase. In partic'ilar, it shows that flig? learns ? v, x > is c-stable exactly "hen it learns it is siiccessfnl for ? `v, r >. Theorern 6.2.1 In s-GMP, A%?9? C-STABLE(? v,x ?) ? AcksRcvd(mgr,M-sub(?)) E Maj(Gpview?l) uhich is equivalent to A4?gr C-STAI3LE(? v,x >) ? A?'mgi?SUCCESSFULmgr(? ?,X >). Proof ??" NVe show Amgi voTE?(? v, :r ? p E AcksRcvd(mgv ,M--Hsub(v)), by inspecting s-G?iP. First, we show that rngr learns ?`OTEp(? v, > ) npon receipt of p's a?knowledgenient to M--Hsub( e). arid then that nigr cannot learn VOTEp(? `I', r >) nnless it receives p,s acknowkdgernent. I3nilding on [(?Ni?61, Niazer ?NIaz9O] showed that if p learns Y'q, a forn?nla h?cal to q, and if processes a) are asynchronoiis, or b) can experience crash faihires arid can only be made trne by q, then p rniist receive a i?essage fron? (or indirectly los frorn) q that irnplies ?q VOTEq(< ??, X >) is snch a forrnula and our systern is both asynchronons and subject to process faihires .?ince has a clean starting point, no process in Gpviewx?I had voted for ? ??, X > before receiving M--Hsub(v) (Fact 6,2.4) Inspecting s-cNIP shows tliat a process votes after receiving M--Hsub(?') and before responding to it, and since VOiEp(? v, x >) has itot held previously, it holds for the first tirne in the execution intirnediatelybefore p acknowledges M--Hsub(v). Thus, nigT' can infer VOTEp(? `t, x upon receipt of ])`5 response to M--Hsub( v). Finally, ?ngr? cannot learn VOTEp(? v, r >) frorn sorne other process. Outer processes do not send rnessages to one another1, 50 Ti0 other process can have learned VOTEp( ? ?", X ?) independently, and propagated that information to rngr As a result, rngr canitot li?arn VOTEp(? `v ? >) frorn a process other than p. only rnessage p sends after voting is ack(M--Hsub(v)), so Arn?jr VOTL1)(? v, Q' >) ? fi E AcksRcvd(rng? , M-sub(v)). Let VOTEs(? v,x >) denote Ap?sVOTEp(? v,x >). Then, A%ngrC-STAI3LL(? v,x >) ? v s?Maj(Gpviewx?l) ]\?mgi'C-STABLEs(? i',r A4?gi' vOTLs(? v,r ?) ? ? C AcksRcvd(rngr,M--Hsub(v)) ? AcksRcvd(rng? , M-sub(v)) E Maj(GpView'??'). Outer processes do not send n?essages to each other except during recoufiguratioii. In that case if p has sent any niessage to q (?vhether p is initiating or responding) q ?viII not send a message to n?gr n?gr has been isolated 109 ""?? Since AcksRcvd( mgr M--Hsub('i))C? AcksSent(n?gi M--Hsub(?))., n?g? knows it is successf'il for ? t, :i >. I;sing Fact 6.2.2., A%ngr C-STABLE( ? I?heoreni 6.2. I shows that whenever x has a clean starting point n?gr cannot know c-stability of any value until after collecting responses. NVe use this in ??heo- rein 6.2.2 to show that s-c?iP, during a nij?r -initiated update begun froi?i a clean starting point is knowledge-r?inin?al. Theoren? 6.2.2 (Two-Phase Necessity) \?\hen version r has a cli?an starting point, S-GNIP is knowJedge-n?ininial. J?hat is, 1. sEND?(mqr ack(M-sub(v))) ? ??A%C-STABLE(? ?, X ). RECV1j(mg??,M?com(v)) ? A)3C-STAI3LL(? ,i),.T >). Proof Since x has a clean starting point, rio process voted for ? v, > before receiving M--Hsub( i;). `1?heoreui 6.2. 1 and inspecting the protocol show that ????gr does not know C-SiABLE(? v, j >) until after it has sent all M--Hsub(v) iriessages2 there- fore, M--Hsub(v) cannot have implied C-STABLE(? v, x >). Furthermore, neither p 5 internal voting event nor its acknowledgement add to its knowledge [C?NIs6] establishing (1). Again, except duri rig reconfiguratiori outer processes do not communicate an?ong themselves so 1) cann?)t have learned C-STABLE( ? v, r ?) independentfly or from another outer process since receiving M--Hsub(v). Hecause p does receive mgr `s 2indeed uiider ifie assiiiiiptions ($-STABLE(? v, x >) does riot hold until a n?ajority have received the M--Hsub( c) iliessage. This need oot be the case when x does not have a clean starting point. M--Hcom(?') n?essagC, it has neither sent i?or received an R--Hint() n?essage. In conse- qncnce. the earli??st p can learn C-STi?I3LE?( ? v., ? >) is npon receipt of fl-?gT `? second phase (con?n?it) niessage. Since S-GNlP is fnll-inforn?ation. C-STABLE(? `?`. r >) is propagated by M-com(?). 6.2.3 Optimizations to S-GMP I?his section presents optiTnizations n?ade po?sible by a kno?vledge-theoretic analysis of the degree of local view inconsisten?y that ?nay exist once n?gv is believed fanhy. Fact 6.2.5 I?et r be a reconfignration initiator ?vith local version --H I. Fhen in S-G?iP., ?vhen collectiiig interrogation responses exactly one of the followiiig three scenarios is possible: (a) v learns son?e process has local version x; v learns some process has li?cal version j- --H--H 2; (c) all processes fron? which v receives responses have local v In the first ii?stance v learns C-STAI3LE(? v x >) at the end of reconfignra- tion Phase 1. ??? optin-?ize S-(?.AiP v., instead of proposing ? v., x > and collecting responses., ?x?n?n-iits ? v., x > and in?n?ediately pix)pagates C-STABLL(? v., ? >) by n?nkicasting R--Hcom(??) to any process ?vhose acknowledgei?ent also indicated local version v --H 1. I3y orilitting the second phase I?essage, R--Hsub(v), the optIn?1zation saves approxir?ately 2 x GpView"?1 n'?essages. Ill mg? M-com(? ?,y- >);M-sub(? v'. r +1 >) ----a R--Hint(x --H 1) krC-STAI3LE(? c,r >) A A%C-STABLE(? v!,x + 1 Vigure 6.1: 1 )ouble (?-on-inii t Optin?izations Another obvions optinlization precludes case (b). A process ?vith local version --H 2, upon receiving R--Hint(? v', j- --H I >), learns C-STABLE(? v', x --H -I >); this process can safely coini?it that update, then respond to the recorifigurer with its ne?v version. The third optirnization is ri-lore subtle. lii (c), if a n-laiority of GpView'-?1 (among the interrogation resp??ndents) also indi (--ate having n?ost-recently voted for the san?e value, r learns (through Fact 6.2.3) that value is c-stable. As in the first optin?iza- tion, reconfiguration l?hase IT is unnecessary. Finally, ?vith the third optimization in n?ind, consider case (a) again. Suppose, as in Figure 6.2.3, that an?ong AcksRcvd(ir?, R--Hint(x --H 1)), the respondents indicat- ing local version .r coristitute a niajority subset of GpView' Gpviewx??l I v. If such a majority subs??t also indicates having iriost recently voted for souie vahie, 112 i; t 1 >, then r additionally learns C-STABLE(? ?`rti X + 1 >) (Fact 6.2.3). Reconfig'irer r can iri)niediately niulticast a (10 UbiC coin init niessage, R--Hcom(i):M--Hcom(vr+i). Proposition 6.2.1 (Optin?ization 1) If i is a reconfigiiration ini?ia?or with local version i --H 1' and some process's response to R--Hint(? --H ) indicates li?ca1 version ;i, then r learns C-STABLE(? v. r >) upon receipt of ack(R--Hint( --H I)) from that process. Proof If sot?e ]`) sends an ack(R--Hint(x --H 1)) indicating ver(p) r' then ArCON1?IITp(? `c, :r >). Distribnting A'7 over implication in \`alidity gives the de- sired A4C-STABLE(? v,r >). Proposition 6.2.2 (Optii?ization 2) if i is a reconfignration initiator with li?cal version .r --H 1. then a process with local version i. --H 2 learns C-STABL(? `v' i --H I >) upon receiving R--Hint(x --H 1) from r. Proof Similar to Proposition 6.2.1 Proposition 6.2.3 (Optii?ization 3) Let r be a reconfiguration initiator with local version r --H 1, aiid suppose all processes responding to R--Hint(.v --H 1) indicate local version x--H1. If some subset of AcksRcvd(r.,R--Hint(? --H I)) are a majority subset of GpView??' and report identical (non-null) NextUpdate valties, then r learns that vallie is c-stahle. 11:3 Proof Follo\vs in?niediately from ?act 6.2.3. Proposition 6.2.4 (Optin?ization 4) Let i be a reconfignration initiator with local version j' --H I. Let 8 be the subset of AcksRcvd(r R--Hint(:i --H 1)) reporting local version x, and suppose Localviewsx is LocaIView'?'??' altered by t; Loca?Viewfs LocaIViewr??l ? . Let v1 and v2 (both possibly null) be the values the n?en?bers of 8 report having i?ost recently voted fi?r to forni Gpviewr+l, with 8i the subset of 8 that voted for ?`1 and 82 the subset of 8 that voted foi?. ?hen for 1 ? ? 5% c Maj(LocaiViewf) ? kTC-STABLE(? t? t 1 >). Proof i)ropositioi 6.2.4 shows r learns C-STABLE(? ??r >) at the end of Phase I. Upon (x)n?n?itting v to forni LocalViewf and then determining whether either of the 8j constitute a i?%jority subset of LocaIView'? apptying Fact 6.2.:3 gives the final result. `I?he next ti1eoren? shows SC ?IP as originally presented. is knowledge-ininiinal (i.e. cannot be optin?ized) when all respondents to R--Hint(r --H 1) report tlie san?e local version, but no majority subset has niost-recently voted for the same value. Theorei? 6.2.3 (Three-Phase Necessity) Let i' be a reconfiguration initiator with h?cal version ;r --H 1, and suppose all processes in AcksRcvd(r, R--Hint(r --H 1)) indicate h?cal version w --H 1. If, ai?ong AcksRcvd(r,R--HInt(r --H ?)), no niajority subset of GpView???1 further indicate having n?ost-recently voted for the sarne value, then for all ? > and p C AcksRcvd(r, R--Hint(r --H 1)), 114 I sEND?(r, ack(R--Hsub(??))) ? v, x ). Iticvp(r,R--Hcorn('?;)) ? kpC?STAI3LL(? ,??r Proof ?)poii initiating reconfiguration, r does not know whether any value is c-stable for version it; it has not received M--Hcom(?) froni the previons n?gv has not learned C-STABLL( ? v, x >) fion? a previous reconfigurer., and has not received messages fi'on? non-initiator processes. Its interrogation cannot imply C-STABLE(? v, .r. >), and clearly p?s respoiise to v does not acid to p's kno?4edge. Since rio majority of v's respondents voted fc?r the same value, v cannot distin- guish at the erid of Piiase I which, if any, of the reported pen di rig valiies may be c-stable; v can envision scenarios in which each of the reported values are ?:-stable. S-G?IP correctuess onty ensures that if a value ?s c-stabli?, v will propose it; i.e., if ? v > and ? > are vahies v's respondents report pending, correctuess and AT?C-STABLE(? v', :r ?) do riot imply ? ?), ? > is c-stabli, so R--Hsub(v) cannot imply C-STAT3LE(? 1), v >). Since outer processes have not sent niessages among themselves, none can have learned c-stability before receiving R--Hsub(v), anci neither p,s vote rior its response add to its knowledge.3 As in ?4ieorem 6.2.2, r learns C-STAJ3LE(? v,r >) if AcksRcvd(v,R--Hsub(r)) is a majority subset ?)f GpView2??' which outer processes li?arn from v's commit message. r proposing ? ?: x > aiid aii outer process ei? iiig tile proposal does not guarantee ? x > ?ill beconie c-stable. `f'vo failure scenario? iti ti second and third phases of recon- figoration (r failing before sending all proposal n?essages or a iriajority failing before receiving aiid/or votilig for the pir?posal) result in ? v., > not becoming c?stabIe. ??oreover., both r and the outer processes envision these scenarios. 1t5 6.3 Optimized S-GMP ()ptin?ized s-CNIP alters S-CNIP by the four optilnizations just outlined. I?liese affect onty the Recorilignrer task an d its subroutine? DeterinineI?roposa]. Task:Optin'?izedRec??nfigui.ationInitiator,r;ver(??)x--H m u1ticast? ( LocaIVIew? R--Hint( :i: --H forall p C LocaIView? rcc?y(p, NextUpdate? : LastCommit?) or fciuity?(p); if (i?ajority of LocaIView? didn't respond) qUif?; Detern?ineI?roposal( c4 , ver., v2); if (()p?iu?izedFlag) /" Omit Phase It. ?/ [)oPrecornn?it( `i?i ); DoCornrnit(?1); I?oPrecon'in?it( ??2); if (Dou')hCornn?itFlag) /? Comn?it ? `i?2, v?rt!>., too. */ 1)oCIon?n?it('i?2); n?u/[ic??st?(LocaIViewr?R?com(? At ?6r : M--Hcom(? v2? i?rt1>)); ? t ??, i else n?%t/!iai.s!?(LocaIViewr,R?com(? ?)1. ?)(`? : M--Hsiib(?` 1>)); rngr ? I t'o+, v2; goto Exit; liG ()ptii?izcdR?configurer,Continued. /` No optirnization possible. Niust do 3-Phase. *! else 1)o1?reCon?ii?it(?.'i ); fl?Uit?C(i.Str( LocaIView? ., R--Hsub(? vi., ??cr forail p C LocaIView? r(cvr(p, ack(R--Hsub(? ?1 v?r or faulty?(p); if (i??jority of LocaIVew? didn't respond) quitr; DoCoriin?it(?l); n?vit?castr(LocaIViewr ,R--Hcorn(? vi' v?r >) : M--Hsub(? e2, ver + f >)); rnar vi r, v2; L?bel: Exit ?egin inyr Task; t17 /? ()pti"iiized J)eteriiiinePi?posa1 for reconfigurer r /? ver(r) = r --H I. ?/ Procedure:Deteru?ine !?roposa1(0 1?proposal,0 U?Uversion.,0 IJ?invisible); Ahead? = f[Ids, ver(p)]p I ver(p) = I Behind?# ? is no longer possibli-! ?/ SubAhead? = ?[Ids., ver(p) + 1 rank(in'it)lp ver(p) = r = ver(r) + -I ? SubCurrentr = f [Ids., ver(p) + f? rank(?n?f)]p ver(p) = --H I = ver(r)t version = if (Aheadr# ?) set 0ptin?izedFla? proposal = Ahead?; Let LocaIView?; = Localviewr;--Hl I [pu)i?osalt; (??etStabli?Propos?4( invisible, SubAheadr); NextUpdate? == irivisible? E Maj(LocalView??) DoubleCon?i'nitFlag = return ( ); /? t)etern'iinePr???osal ?ontin? on next page 1 licS Optin)izedDcteri? ine Proposa1?,continued. I All respondents report the san?e local version j --H t. -Y/ if (SubCurrent? == proposal = [--HTn9T' x rank(r)] CetVpdate(invisible); Optin?ized?tag, f)onble(?onin?itFlag = 0, 0; return(); if (SubCurrent? is a siiigleton) proposal = SubCurrent?; Optin?ized?lag = NextUpdate? == proposal? E Maj(LocaIviewrr?l) GetUpdate(invisil?le); DonbleCon?i?itFlag = 0; return (); I SubCurrentr has (??xactly) t\Vo elen?ents. / C?etStableProposal(proposal, SubCurrent?); NextUpdate? == proposalt c Maj(Loca?Viewf?1) OptirnizedYlag = C41etI??pdate(invisii?h?); I)ouJ?leC?oirn?niit?lag = 0; return(); 1t9 6.4 Message-Minimality \Ve use Niazer's Niessage C?liain `1?lieoren? [NIaz9()] to detern?ine the n?inin?al nun?ber of n?essages required by any solution to Strong GN1t?. Similar ?vork fi?r difibrent con?n?it-style problems is in [Had(.)l] and [NiF?O] bbr sin?plicity, we assui?e no process has knowledge of another's votes Recall that Si??:oF?[aj(8) is the size of the smallest majority subet of S' NVe show that, for a given set 5?, at least 2(S?z?:of-AJaj(?) --H I) messages are necessary for any member of .$? to l(?arn c-stability of a value, and that any algorithm using fewer than (I ? --H 1) t 2(S?zeojAfaj(S) --H 1) messages does not satis% Eventual Propagation. A process can safely commit a value as soon as it knows it to be c-stable. and we know that any sohition to Strong C?Nii? is iiecessarily a 4 A'-commit protocol. I?et C E Maj(S) and let C-STABLLG(? v.Y ?) hold. Then A)3tC-STAI3Li(? t,w ?) if and only if J\ (l?1)1VCiip(? ?`, :r p?G p\ A !`)?t(E]mvoTEp(? ?`?,X >)) A /\ A?%?!)cFAi;LTY(q). p?G t"#v q?C Since VOTLp(? i., x >) is local to p, p' must receive (at least) one message before A))IVOTEp( ? c, .i >) can hold, so at least Sizcof-'?J?j(S) --H I messages are needed to satisfy the first conjulict. Vor the secoiid conjunct, nothiiig in the specification of Strong ()1NIP ensures an initiator that an outer process's vote for ? ?`, r >is stable. ()ne solution is ?n?ti(I/ stability - each process votes for one and only one value - but then there are many initial configurations in which no majority concurs on 120 a specific value4. Alternatively? if ?ve allo?v processes to vote for riiore than one value over tirue, p' must sor?eho?v learn stability. It is clear that processes cannot independently ( i.c. `Aithout communicating) decide when to stabilize a vote, as this degenerates to initial stability. ?herefi?re the decision to stabilize a vote must be coordinated to ensure that a n?ajority stabilize a particular value, and this requires at least Sizeof-A!aj(5?) --H I more messages, giving a total of 2(SizeofAJaj(S) --H I) 1?essages before p1 cal I commit. There are two possible patterns of con?n?unication to achieve this: o+ can passively collect votes until a single value has been voted fc?r by a ruQiority. At this point it sends a stabilize' message to each process in the n?ajority sui)set. Note that p' cannot con?n?it here since it does riot yet know whether each outer process has stabilized5. This requires an additional Sizcof-AJ(%j(5?) --H 1 messages, totaling 3(Sizcof-M?y(S) --H I) messages, before p' can con?n?it. o+ P1 can choose a value and actively propagate it to a inQiority subset. ?lpon receipt of the vahie, each process votes and stabilizes, then sends confir- n?ation of these actions back to p1. This approach requires a ??iniinun? of 2(SizeojAi?U(?) --H I) n?essages fi)r P1 to ham c-stability. Finally, Eveiitual I)ropagation requires that once P1 con?rnits a value, every flinctional process nuist also cornniit that value. If every functional process un- dertakes to learn c-stability in the fashion outlined above, a niinin?uin of (I S --H 4?his is not trt?e fi?r V j? 2 5A tecl'iiicality I)0?'., but in?portant in pra(tice ?vhei reconfiguratioii lilay be ongoing. 121 x 2(S?zeof-AJaj ?) --H 1) total n?essages au needed (where f is the number of crashed processes). On the other hand, if ?1, having learned c-stability "the hard way propagates this fact, only an additional 8 --H 1 n?essages are needed6, giving a rriinin?al total of (I 8 I --H 1) + 2(SVz?of-?J(U(S) --H I) messages. In S--HCMP, installing version r fron? a cli?an starting point and assnrning ??2gi is not thought faulty, uses at n?ost 3( GpView1'?1 --H 1) --H f i?essages. The extra n?essages (approxin-iately GpvIewr? I --H --H 1) represent the tradeoff between con?pressing n?gr `s faultiriess determinatioris to one phase, arid spacing theni out over, in the worst case, 8l'zcofAjuj(Gpviewx?l ) + 1 waiting periods. This case would occur when p1 chooses the worst possible sequence of processes froiii which to get responses: if, froi? the initial niajority subset it chooses, P1 receives only 5?zcof-Aiaj( Gpviewx?I --H `2 responses, ttien observes ? IGpvie2wr?--Hl1 --H 2 successive ?failures' before getting the last response. 6.5 Versions Without Clean Starting Points Vinally we co - sider the case when a version does not have a clean starting point. Recall that version :c. with p' as n?gr , has a ckan starting point exactly when x 1 or Is?iGR?(P1, .r --H 1) held throughout the ft?rn?ation of GpView"-'?1. By definition then, version J' does iiot have a clean starting point exactly when reconfiguration were involved ill installing GpView??1. Fo conclirde this section, we show optiriiized s-c?iP is knowledge n?inin?al irrespective ol its starting point. 6The lack off here ari?es froo? the iiiipossihi!iiy of p' knowing whether another process is truly crashed. 122 Corollary 6.5.1 In ()ptirnized S-GMP reconfigurer r only proposes and comn?its version ver('r) + i . If the l)ont)le C?on-in-?it optirnization ??ere applicai?-le. r also coiumits ver(r) t 2. Proof Froni Proposition 6.2.2, ()ptiniization 2 ensures Behind? is null, so r does not propose version ver(r) (or anv version less than ver(r)). It also cannot propose version ver(r) + 3, as every process with local version ver(r) + 2 believes r faulty befc?re it can receive ?.`s R--Hint(x --H I) Suppose GpView"?'were installed via reconfiguration. l?et r be th( initiator that completed it. ?"??1 be the vakie con?rnitted in forr?ing Gpview?I . and ?`?- be the value r, as the new m?gr , submits and comn?its fc?r GpViewx. \Ve prove no member of GpView'?' knows C-STABLE(? ???, ? >) befi?re it receives i'?s Phase II message, N-com(? Vr,,Y >). Vrom Corollary 6.5.1, at the outset of its reconfiguration attenipt, ver(r) = r --H 2. Fhe responses r' receives to R--Hint(x --H 2) fall iiito one of the foflowiug classes: State Unity Nll p responding to R--Hint(j --H 2) reported LastCommit? = [V--H2, x --H 2], and either NextUpdate? = [v, x--H I, rank()] for some value v, or NextUpdate? = [--Hnigr , :i-' --H 1, rank()]. Reconfigurer r determines ?`x = (v D--Hn7qv ) accord- ing to the minimal value among the rank()s reported, or if Optiniization 3 (Proposition 6.2.3) is applicableby n?ajority reporting. State Divergence Some subset of LocaIView'r'?2 reported LastCommit = [vy?,--Hi, x--H 1], and p1 is the core member with least rank among those reported to v to 12:3 have n?ade a snt?i?ission in Submissionsr(?r). If r detects State I?nity at the end of reconfiguration Phase I? it eveutnally muD ticasts R--Hcom(? v1?i ;t-' f >) resulting in GpView'?1 follo?ved by (piggybacked or not) M--Hsub(<: t-1 x >). its proposal fc?r GpView" Theoren? 6.5.1 Oi?ttrnized S-GMP, in ihc ituation described by State t?nity, is kno?iledge- n?initna!. Proof ?Ve sho? ri() functional pro -- in Gpviewr--HI infirs C-STABLE(<: V, X before Uceiviug M--Hcom(? ir., v' >) from r. I3y hypothesis GpView"?' LocaIViewf-?2 + v1,,?1 and all r."s Phase I respori- dents report k?cal version x 2. Clearly, no process in LocaIView?r?2 knows of `?` when it responds to R--Hint(r --H 2). there are two cases to consider: 1. Fhe commit message R--Hcom( ? t'x--H1 --H I :>) has no coritingent update. in this case r determines ? i, x > only after becoming mgr No process of GpViewr'??Jknows of the proposal ? r?. .i > until r as nigr niulticasts it. \Ve can apply the pi'oof of Uheorem (?`. 2.2. 2. the commit message is of the forni R--Hcom(<: r??j , x 1 >):M--Hsub(? -ir, (a) If <: ?-`--Hi, J --H 1 > is the addition of i-'??i to GpView'?-?2, it may be 1)055i- ble that the members of `i"j- t knew C-STAnLE(<: r'?, x ?) befor? r muD ticasted M--Hcom('r Vx., Ji-' >). ??his could happen if an initiator other than r, say ro., had previously sent R--Hcom(? v1,?i, x --H I >):M--Hsub(<: ?-`?,, r arid the sul?sequent N--Hcom(<; ?"r., ;c >) to the members of ?`x?i l24 Assun?ing State Unity., every ?ne?nber of AcksRcvd(T, R--Hint(r 2)) re- ported local version ? --H 2. I3y Cossip, I)isconnect and tlie Initiate con- dition, none can have responded to r?'s M--Hsub(< v?, r so T1 would only be able to conii?it ? Vr,, Y > if ((rot u vr?--Hi) were a n-lQjority subset of GpVi'ew"'?1 - If this were true r would learn it froni the re- spouses (froin the ruenA?ers of ?`j'--Hi) to r's Join and State--HX"er n?es- sages. at which point r would inulticast the i)oubli Coinruit inessage R--Hcom(<' i"j---Hl, J' --H 1 >):M--Hcom(<' v?. r >) (as per ()ptiinization4, Propo- sition 6.2.4). Sirice the only corninunication is between initiators and outer processes, the initiator is the channel through which knowledge of other meinbers' local states flows. No ineinber of ?`x knows whether its cohorts acknowledged To's M--Hsub(<' Vr,, X 50 Ti0flC can independently irifer C-STAI3LE(<' V,, X (b) If <` ?,?1., i --H 1 > is the reinoval of i,?i froin GpView?2, the ineinbers of i, cannot fc?rin a inajority subset, as above, that would enable To to coriiiiiit <` i',, j >; only T can inake <` i;,, x > c-stable. Under State i?iver?e sonie siJ-?(? of AcksRcv R--Hint'r --H re? lo?l uce, i?s T, . ol ca version .T --H 1, as well as a (possibty null) expected next update value for ver- sion x. The only' process that could know C-STAI3LE(? v,, r >) without receiv- ing M--Hcom(?,) froni T (either as a f)ouble Coininit with R--Hcom(<' v,?i, x --H 1 >), or independently), w??uld be soine previous initiator fiorri which T rio longer re- f25 ceives messages, and which, because of its rank, i? it isolated from the set AcksRcvd(r, R--Hint(? --H-2)). 6.6 Epistemic Quantification of Strong GMP ??his section quantifies the epistemic achiev -nents of Strong GNIP, as specified in Clhapter :3, over a coniplete system run. C;NiP-:3 requires every process's local version x to be the same as every other process's h?cal version , ivhile GNIP-2 says that there mnst be some point in every execution when GpView?exists, resulting in a causally-constrained consensns. Thus, when p con?mits version x it knows eveutnally L0CalVieWf? will be the ?th system view. N?e formalize this as (ver(p) = ? A)??(GpVi'ew" = Localvlewpz) Define the forn??la IsGPN?IE?v(x) to hold exactly when Gpviewr'is defined IsGPVILw(?) d=cf DOWNjjV $er()P) = xAq( DOWNqV (ver(q) = ? A Localview1x; = LocalVew?))) I?et c.?be any consistent cut along which GpView1'is defined. Then (niodulo failures) IsGPV!E?'(x) ? ()ver(?)) = ? ($) A))?ISGPN/?E?V(x)) 126 This is not eventual conin?on kno'vledge [llNis4] of ence of GpView'-?. In essence, oil'. Strong ??NiP specification is phrased loosely-enongh so that processes only know that individual instances of local vieivs n?ust l)C identical. The si>ecifica- tion does not niake explicit when the systen? view conies into existence, only that it doe t as 110 process can ever know whether another has failed, neither can a process, upon coT?lr?itting a new local view, know whether another has also done so. However, our specification and the fact that group views exist in sequence are a source of hindsight ai?out previous systeni views. This could he phrased as, `?when p's local version is greater than x, ? knows that, at sonie point in the past, the o+th systeni view existed" (ver(p) > :?.) ? l?p?ISGPN7IE?V(x) IJpon receipt of com(? `?`x,? >),process pcan reason about previous group views. It knows that other processes, also in LOcaIVieWf and still flinctioning, received and re- sponded to sonie sub(? ?)x,X >) (irrespective of the subiuitter's identity). Because d?annels are FIfO, j) also knows these processes received com(? ?)?1, x --H 1 >). That . ? p receives com(? iy.?, ;r >), p knows Gpview?lwas a defined group view (ver(p) x) ? #?p?ISGPVIi?V(x--H 1) (6.4) Lquation 6.4 holds along any consistent cut containing T?'? receiptof com(? 1,x, `l?he passage of tinie arid existence of successive views gives a process deeper kuowl- Eventual corn?ion knowledge would be lsGPViEw(x) ? Ap ?J?%lsC?P??Lw(x). 127 edge of past group views. Then, since IsGP??E?v(?) ? A1) ver(p) = (ver(p) = ?) ? (A)??A(ver(q) = 1)) ? q (A?p?A A?q?ISGPV1EW(x 2)) ? (]\?p?E?I?GPNTIiW(x 2)) q Conjoining over all processes, I), we obtain IsGPN/IE?v(x) ? Ii,,, Afp?F?ISGPVIE?V(r --H 2) ? i$?Pi?ISGPVIEW(r --H 2) p Equations 6.4 and 6.5 state that processes only have knowledge about each others' local views aftei the fact. J?nwinding these equations further gives the general result IsGPVIEw(x) ? (F?)?IsGPV1w(x --H ii) If ??? assun?e ingi does not fail, we obtain a i?ore sophisticated degree of consen- sus than the specification requires. NVhen p receives com(? Vr,, Y >), it knows there is sonic consistent cut that includes its culTentlocal state (i.e. j) need iiot take any further steps) along which every other fuiictional process in the group will also receive com(? `?`i, x >). That is, p knows it is `sitting on' a particular consistent cut, but doesn?t know whether the other processes have reached it yet. This is precisely forniulateel j )y the concurren cy operator, 1)1) [PT?S], which difl'erenti ates con current knowledge froni other episten?i c forniulatious. i?ornializiug this notion gives (ver(p) = ?) ? [?))PpISGPN7IEW(X). l2S As only one gronp vie?v can exist along any consistent cnt let ?c denote the fnnctional n?en?J)ers of the gi'onp vie?v that exists along ciit c. ??hen the fc?llo??ing holds whenever GpViewXis defined (IsGPvIE?v(r) ? /\ A))PpISGPY?E??V(Y.)) ? PEY (IsGPVIEw(x) ? ?? I?GPvIEw(r)) (6.5) Eqnation 6.5 is the induction rnh- fc?r Concurrent Cornn?on i&no'vledge? making both the c0m1)osition and existence of GpView? CCI&.? fhis is not the case when mgr? can fail. NVhen p receives com(? ???? >). it does ilot know whether that sender failed b??fore completing the mnlticast. The ?trong c1?NiP specification only guarantees j) that cV(:;fli?ialiy GpView"will be defined ((ver(p) it) ? A)3?IsGPNTIE??(it)) ? ((ver(u) ?) ? A ver(q) x)) ? A (A??( A ve?(qi) it))) q?Q q'ECj? 8In tlie teri?inok?gy of [Tay9O] Cx is a loca11j?-di?tinguishabIe consislen! c?I along ?vhich IsC?PN/IEw() holds th( existence of which suffices for concurrent con?n?on knowledge of IsC4P?IE'v(x)? Chapter 7 The Role of Order ??his chapter investigates the role and cost of providing order to a N1en?bership Resource Nianager. ?Ve con?pare Strong (??IP ?vith t?vo weaker (?NiP versions. neither (>f which provide Order. (?on?hining both weak versions and Order gives Strong CMP. \Ve use Consensuss(c) instead of Gpviews(c) to en?phasize that consensus an?ong individual processes local views differentiates the various (4roup NIen?bership I?roh- len?s we will discuss ii? this chapter. 7.1 Characteristic Membership Properties `I?he properties described here are a broader version of those stated in Since we are concerned with corilparing only consistency properties., we oinit explicit reference to the hiveness and N7alidty properties. 129 130 Eventual Propagation If p executes ?ddj?(r) (res??ectively ?cfl?ov(?(T')) ali?rig cut c then every process q, in LocaIVi'ew?(c) eventually executes (tddq(r) (respec- tively rein ovcq(r)) or fails: ADDp(??) ? RiMOVEp(r) ? A (?ADDq(r) V ?DowNq) qcLocalViewp A (??EMovEq(r) v ?DowNq) q? LocalView1 Eventual I)ropagatiOn prevents processes froni taking act ions iin ilaterally In fact, it iniplicitly forces coini?unication bet?veen p and the n?enibers of its l(?cal vie'v befof( p alters it; T) cannot take an action independently and then hope it caii propagate this action to the functional rueu?bers of its local view. Convergence ( ?ith respect to fori?ula ?). If eventually ? is true in a run, then eventually a consensus view exists ?( v Consensuss(c) --H--H Qniescence ( "Hereafter, neither faillires nor recoveries are suspected by nieni- hers of S?') is a con?nion exaniple A (A EmFAUI?TY1?(q) A ?moPR??i?cp(r)) p?s qEs QUIET(S) dJ 131 Note that (?onvergence ?vith respect to qujescence is actually a restriction on the failure suspector `s inaccuracy; since ?ve require a suspector to eventually detect all true failures and recoveries, quiescefl(? requires it riot to suspect falsely ?vhen none are occurring. Unjqueness For all asynclirorious runs and fi?r all consistent cuts iii these runs, at niost one subset, 8, of processes satisfies Consensus5?(c) 8. For all runs, r, and all cuts, c ?sc;oc u????i???(consensuss(c)))v sc7roc ((Consensuss(c) --H--H8) s)s u????i???(consensuss(c))) ()rder All processes exhi1,it the sai?ie sequence of local vie?vs conten?poraneously, provided the vie?vs are defined: AA? A p OO An equivalent fi?rii?ulation is that processes in a consensus vie?v proceed to the same next consensus view: Let c he a consistent cut sud? that LocaIView?(c) --H LocaIView'? --H--H LocaIVIewYq LocalVlewq (c). Let p, q C 8 such that Consensuss(c) 8. Then (LocalViewf;+1 --H LocaIView?Yj+1) v UNDEFINED( LocaIview?x+ 1) 7.2 Three MRM Services iri this section ?e exan?ine in-ipleii?entiiig NIR?N4 services that ensure . C1on?er -?ce ?-ith respect to QUIET(S) only, 0 Eventual ?)ropagation only., . all four charac - c properties (Strong C?1NiI?). ?;sing the kno??.?ledge-based technic?ues presented in Clhaj)ter (j we can show that each protocol we discuss is n?essage-niiiiiinal fcw the NIRNI it impli?inents. Niessage ininirnality gives a clean, concrete way of measuring the cost of providing Order. NVe introduce another iniportarit nietri c: the relative expense appli catiolis incur when using the different NIRNIs. C?learly, strong consistency guarantees provide applications with n?ore inforn?ation so that the application itself, need riot do as nuich work to acc??n?plish its task. One would conclude that if Order can he provided cheaply there would be no reason not to do so. \Ve first show that irrespective of join and leave frequency, an NIRNI pioviding Strong C?1NIU has very low overhead coiiipared to the weaker services. Rather sur- prisingly we show thA when joins and leaves are frequent (as is the case in iriost system executions) irriplerrienting Strong C? `NIP uses feu;er messages overall tlian any of the weaker services because successive phases of ?-GNIP are conipressed. Finally, we show that the Strong (i?1NIP guarantees perniit larger changes between consensus views than does the Eventual Propagation guarantee. 133 7.2.1 Virtual Partitions i?he ??iTtuai j??jrtition.? protocol (hereafter vp) of ml Abbadi? et.al. ?A%(;s51. [A1?s61 is an example of a n?en?bership service providing only (?onvergence with respect to QUIET(S). Nzirtnal Partitions ??re proposed as approxin?ations of tlie c?n- co?,?ni,unic?ite-u?ith relation.' A process attei?pts to create a ne?v virtnal partition ?hen it detects a dis?'repancy t)etwcen (its local vie?v of) tlie virtnal partition to ?vhicli it c'irrently belongs and the can-conirnunicate-with relation; for exan?ple? if it receives a n?essage froni son?e process not in its cnrrent virtnal partition. The virtnal partition to ?vhich a process actilally belongs n?ay be qilite different from the one to ?vhich it believes it beli?ngs; ?ve disciiss this in greater detail in Section 7.2.2. Ni oreover this has iniportant ramifications for distribnted apphcations and we clis- cnss these in Section 7.4. The VP protocol is a two-phase cornni it protocol with 110 mini'iimn (Ipplo -(ii quota. (?onvergence with respect to QUiET(S) re(,tiires at least Proc I t 21 5) I --H 3- messages., and the VP protocol achieves this with the minirrial nnn'ber of niessages when Q?IET(S) holds. There is no limit on the nnnil-?er of processes that can be added or removed bet??een successive virtual partitions (local views). 7.2.2 The VP Algorithm I?he following is takeii fron? [AS(?S51., but amended slightly here for clarity. The variable ??P?d is a n?onotonically increasing integer. Since N?irtnal Partitions niodel changes in the coniniunication topology. the Disconnect and C?ossip properties are 1 Cai?-con?inunicate?witli i? llot assuiued transitive. 1:34 1.rre?evant. In contrast to s-CNiP an onter nien?ber of the VP protocol has no gtiarantee `ipoil receipt of the con???it i?essage., VP--Hj oin( 1?J?id, N?cu??' i?) that any proce?s other than itself actiially ?s the ne?v partition N 6 U ?"` J); any process mentioned in AT6 u?U P n?ay have received a n?ore recent vP--Hinvite( ?"I'id?) rilessage, in which n?ax-N7Pid-seen) depart current virtual partitjoil; send? ( in?t, ack( VP--Hinvite(? rc'cv1(?nit.VP?join(?7Pid, \g'?\/f))); if (?P?d ? rnax-N7Pid-seen) arid (sain??initiator)) join ??!?jd aild adopt ATcw?/P; ??igure 7.1: The Virt?ia1 Pari tions A1gorithn?. 136 7.2.3 Eventual Propagation Only NVe also devised a n?essage-In ininial t?ventnal-Propagation-only protocol (hereafter EP). I?ike s-GNlP tt is a two-phase coriirnit protocol that requires an initiator to block if it does not receive ii?ajority approval to its invitation. I?he ni?jority requiren?ent is necessary and sufficient to ensure that knowledge of the existence of any update that could have been coinn?itted will never be lost; the update?s existence? because it is process-functionafity information., is propagated using a gossip scheme in responses to concurrently-issued and future invitations. As in s-G?IP., at the outset of execution no EP niembership service exists. Boot- ing the EP meiribership service also involves a nan?e service and a sn?all? initial cadre of core mei?bers. Let Consensus0 denote this initial set. The approv(1?i quo[?? for local view updates is the number of processes froni which approval to comi?it the update is required. As befiwe, 5izcof-AI(4((i??) denotes the minimal size of a mQiority subset of (?; S?zcof-?iaj(Ci') --H [I II t 1. In EP the approval quota. AQ()., for p is a function of LocaIView?(c) but in contrast to s-cNlP there is an a priori bound on the number of core members p can add to or remove fiom LocalViewp(c) in any singli upda Since AQ(LocaIView?(c)) must be at least a majority suppose: AQ(LocaIView?(c)) --H--H - frAThj(LocaIView?(c)) t k where k is fixed conimonly agreed upon at the beginning of each rnn? arid no m0re than the size of the largest minority subset of the initial consensus view (0 ? k ? [Conse2nsus01 --H 1). Then p can, in any singk update instance, change ?37 its view by at n?ost k t 1 niei?bers Intuitively, this restriction arises froni the fact that the initial condition is the only con?monly-known view upon which processes agree; it provides the only source of inforniation froni which processes can base an approval quota arid still be assured that the set of approving processes will always intersect. The choice of k dictates how easily a process can alier its local view larger values of k require more coordination. Iii practical terrus the initial consensus view will typically consist ()f rio n?ore than three processes, so it is likely that k ? 1 in niost cases, preventing processes fron? changing successive h?cal views by more than two processes at any tin?e throughout an entire run. Finally, assuming the quota rule is a siiiiple niajority then providing Eventual Propagation requir(s at least LocaIView?(c) t2S?z?ofAi(y\LocaIViewp(c))--H?3 u?es- sages; EP achieves this lower bound. 7.2.4 The EP Algorithm As in S-C?1P we do iiot explicitly show Disconnect, or error checking. Gossip, however, is u?ore obvious: an outer niember passes back to an initiator its local faihire and recovery beliefs in the set ProcStatus?. 13? Task:Initiator?`ni! ?vlii1e (ProcStatus??it # ?) C?et??pdate( EPva1u? ProcStatusi?it); n?uiticas!init( LocaIView???? ( ). EP--Hsubmit( kPvaiu?)); for each p C LocaIViewi?it() `tY?Cvinit(P ack(EP--Hsubmit( ??: 1??oi??)) : ProcStatusp) or f(I uU'qjnjj (p); if (majority didn't resi?ond) quit???t; for each 1? sud? that ProcStatus? # ProcStatusi?it = ProcStatus???t u ProcStatus?; I')oCon?mit( F?i3???1ne); n?vifica?finit( LocaIView???t (), EP--Hcommit( k P?aivc))); Task:OuterProcesses?p rc'cvp(?nit EP--Hsubmit(EJ???a,1?tc,)); scfldp(ifl?# ack(EP--Hsubmit(L?P??a1?i)) : ProcStatus?); rccvp(init,EP--Hcominit(EF?'a1?i?)); DoC'on'?rnit ( F P??ain (?) Figure 7.2: The Eventua? Propagation Algorithr'i. 139 7.2.5 Strong GMP ??he ?-c?iP protocol ()f Section 4.3 illip - ents all four characteristic consistency properties. s-c?iP uses the order it provides to rank processes and distin?iii?h one the in?r as responsible for initiating npdates to the consensus view. \Vitli S-G?IP one subtle cost of order is the restriction on process initiative: a lower-ranked process can initiate an update only when it believes higher-ranked processes faulty. Regarding n?essage complexity., when n??r is beli(?ved faulty, the i?inin?al cost to reconhgure depends on the degree of separation of local views and the degree of dissei?ination of the niost recent update proposal. In only one failure scenario are three con-iniunication phases necessary ilicurring a cost of 2(Siz(:of-AJ(tj( LocalView13(c)) --H 1) extra i?essagcs. In all other circumstances, s--Hc?NIP incurs no additional u?essage cost over EP. An intriguing aspect of s-cMP is that when changes to the consensus view are frequent, the cost of each update is lessthan that of either the VP or EP protocols. order I ;??queness an (I Eventual Propagation conibine to force any protocol iniple- nienting them to con?init an update contingent upon the future ren?oval of nierubers currently believed faulty. The gossipy nature of our systen?s has the affect of niaking the `contingency', a n(?cessary part of the second-phase conirnit niessage, equivalent to a first-phase invitation iTiessage. As a result, when changes are frequent, the cost of a single update is ainortized to ()(21 Consensuss(c) I) [RB9I], cheaper than either of the other protocols. course th(?re are pathological failure scenarios in which S-c4?iP perforn?s poorly. As in Section 5.3 when we counted n?essages, the worst case requires specific 44() processes to fail at the niost inconvenient tin?es: rngr fails. then its replacement fails immediately before assun?ing control., then ifs replacen?ent fails immediately before assurning control, and so forth H o??ever in fypicel systeni executions, ?vhile failures arid recoveries may be frequent, that particular failure scenario is extreme?y improbable. In these situations S-G?iP is, oddly, the least expensive of the three despite providing the strongest consistency properties In contrast to EP, the approval quota for s-c??IP is alwQys just a simple majority of a core member s current local vie?v. Niost significantly, there is no bound on the number of processes that can be added2 bet?veen consensus views! ??his is indeed unexpected as it allows successive consensus views to differ wildly. Again, the Order requirement lies at the heart of?the explanation since it forces reconfigurers to query the outer niembers for their local states. It turns out that at every point in the execution where it might be possible for two update initiators to commit disparate views, both are competing for a majority subset of the same consensus view only one of them can win, thereafter blocking the other and its approval cohorts. 7.3 Message Complexity Comparison Tables 7.t and 7.2 summarize Section 7.2. in ?-GMP, for ead? p c 8, LocaIView?(c) Consensuss(c). NYhile S-G?1P is niore costly than VP, we believe the Virtual Partitions approach is ultimately more liniited than either EP or S-C??IP since it coriverges only in quiet' systems. S-C?iP is no more costly than EP for all updates except nifjT `s removal from tile core, and then only in a worst-case scenario. Niore- o+2obvioiisVv only a olinority can be ren?oved. 141 ??ble 7.1: N1i?irnal Niessage Cost for a Single ?Jpdate to a I?ocal N7ie??. Algorithin (lonsistency Niessages Reqnired VP C?nv w4 QUIET(?') Proc I + 2 S :3 iP Liniitcd Divc?g?nce LocaIView?(c) 1+ 2(5izco?Fi??ij( Localviewp(c)) --H 1) S-GMP: non-fl?gr Order, Conv ?v4 True LocaIView?(e) + 2(Sizeof-AiH( LocaIView?(c)) --H 1) S-CMP: m?gr A? J)iverger?ce I Loc:aIView?(c) + t ( 5teofrA!aj( Loca IView? (c)) --H 1) S-CMP: arnoftized and Uniqueness I Consensuss(c) I --H 1 Table 7.2: Internal Efficiency: N1axiniun? Size Change to Local Views. rn?v: --H ? k + 1 AQ(LocaIView?(c)) SizeofrAlaj( Loca IView? (c)) + k where 0 ? k ? [I_Consensus0_I? --H 1. k fixed thronghont run. l?sually k ? Requires I LocaIView? I > k. S-GMP acid: Proc --H 1 AQ(LocaIView?(c)) G pVi?w ( c)1 --H 1 SizeofrMryj( Loca IView? (c)) 142 over ?Ve believe s--Hc?MP ?vill often be less exp(,?isive than EP since rnns in `vhich ?-c??IP can an?ortize its niessage cost (freqnent joins and leaves) are far r?ore prob- able than runs in ?vhich EP is cheaper (n?gr arid its replacen?ents al?vays failing at specific stages of the j?rotocol). `This silonid seerr? contradictory: ?ve are clairiiing that the mote ordered proto- col can be the least expensive and niost productive to run. ?`he intuition is that processes are using their kno?vledge of tile strong ordering properties to pare do??n coniiiiunication; the type arid aniount of inforn?ation provided by the strong consis- tency guarantees allo'i's processes to infi'? a great deal about the global environnient independently --H notably a consistent ranking of core inerribers arid tile size differ- ence and ten-iporal distance betiveen local views. A less j?owerful protocol cannot optirilize in this fashion, resulting in a hss-eflicient scheme for perforniing the sarrie updates. `Uhus, given an ordered protocol a series of uj?dates aniortize iliessage costs by exploit r?g kno?vledge of the strong ordering properties. 7.4 Usirig a Membership Service ?he coi?plexity results of the last section address only part of the cost-comparison issue that is the goal of this chapter. A secoiid important