
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance

Fault Tolerance: Basic Concepts
Being fault tolerant is strongly related to what are called dependable systems. Dependability implies the following: availability, reliability, safety, and maintainability.

Fault Tolerance: Basic Concepts
Failure: the system does not "keep its promises", e.g. some services are not (completely) provided. Error: a system state that can lead to a failure. Fault: the cause of an error; a fault can be transient, intermittent, or permanent.

Failure Models
Figure 8-1. Different types of failures.

Failure Models
Arbitrary failures: hard to detect and to handle (Byzantine failures, ...). Fail-safe: the system may produce arbitrary data, but it is able to recognize these situations and handle them.

Failure Masking by Redundancy
Information redundancy: extra information is added, e.g. Hamming codes. Time redundancy: actions are repeated until they succeed. Physical redundancy (in hardware or in software): processes (machines) are replicated.
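
As an illustration of information redundancy, the following is a minimal sketch of a Hamming(7,4) code, which adds three parity bits to four data bits so that any single flipped bit can be corrected. The slide only names Hamming codes; the function names and the bit layout (parity bits at positions 1, 2 and 4) are assumptions chosen for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit transmission error
    print(hamming74_decode(word))
```

The code masks one bit error per codeword; two or more errors in the same codeword are beyond what this particular code can correct.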

Failure Masking by Redundancy
Figure 8-2. Triple modular redundancy.
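
The voting step behind triple modular redundancy can be sketched in a few lines. This is an illustrative software voter, not the circuit from the figure; `majority_vote` and `tmr_stage` are hypothetical names, and the three replicas are modelled as plain Python callables.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return value
    raise ValueError("no majority among replica outputs")


def tmr_stage(replicas, x):
    """Run the same input through three replicas and vote on the result."""
    return majority_vote([replica(x) for replica in replicas])


if __name__ == "__main__":
    def good(x):
        return x + 1

    def broken(x):
        return -1  # a replica with a permanent fault

    print(tmr_stage([good, broken, good], 41))
```

With three replicas, one faulty output is outvoted by the two correct ones; if two replicas fail in different ways, the voter raises an error instead of guessing.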

Process Resilience
Protection against process failures: replication and grouping of processes. Resilience (resistance), in its dictionary senses: the capacity of a material to withstand deformation or dynamic breakage, expressed as the ratio between the work needed to break a bar of that material and the cross-section of the bar (index, value of r); the capacity of a yarn or fabric to recover its original shape after a deformation.

Process Resilience: Grouping
Groups of identical (replicated) processes: replication and grouping of processes. A group makes it possible to manage collections of processes. Messages are received by the whole group of processes: if one process fails, the others can take over. Groups are managed dynamically.

Flat Groups versus Hierarchical Groups
Figure 8-3. (a) Communication in a flat group. (b) Communication in a simple hierarchical group.

Group Management
Centralized: a group server. Decentralized: multicast to the members of a group. Crashes must be detected and crashed processes removed from their groups.

Failure Masking
Primary-based replication: hierarchical organization with a primary copy that coordinates; if the primary copy crashes, an election algorithm chooses another replica. Replicated-write protocols (active replication): flat groups; no single point of failure.

Agreement in Faulty Systems
A "k-fault tolerant" system survives the failure of k components; further failures produce unpredictable results. Example: a voting system with tolerance k needs 2k+1 replicas. With n replicas the tolerance is floor((n-1)/2), which equals n/2-1 when n is even.
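
The replica arithmetic for a voting system can be checked with a short sketch. It assumes majority voting in which faulty replicas may return arbitrary answers, so the k faulty replicas must be outnumbered by at least k+1 correct ones; the largest k that fits in n replicas is then floor((n-1)/2). The function names are hypothetical.

```python
def replicas_needed(k):
    """Replicas required so that k faulty voters are always outvoted."""
    return 2 * k + 1


def faults_tolerated(n):
    """Largest k such that 2k + 1 <= n, i.e. floor((n - 1) / 2)."""
    return (n - 1) // 2


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, "replicas tolerate", faults_tolerated(n), "faulty voters")
```

An even number of replicas buys nothing over the odd number just below it: four replicas tolerate one faulty voter, exactly like three.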

Agreement in Faulty Systems (1)
Possible cases: Synchronous versus asynchronous systems. Communication delay is bounded or not (bounded means every message is delivered within a maximum time). Message delivery is ordered or not. Message transmission is done through unicasting or multicasting.

Agreement in Faulty Systems (2)
Figure 8-4. Circumstances under which distributed agreement can be reached.

Agreement in Faulty Systems (3)
The Byzantine agreement problem for three nonfaulty and one faulty process. In this case processes may communicate wrong data. Assumptions: synchronous processes; unicast messages; message ordering is preserved; communication delay is bounded.

Agreement in Faulty Systems (3)
Figure 8-5. The Byzantine agreement problem for three nonfaulty and one faulty process. (a) Each process sends its value to the others.

Agreement in Faulty Systems (4)
Figure 8-5. The Byzantine agreement problem for three nonfaulty and one faulty process. (b) The vectors that each process assembles based on (a). (c) The vectors that each process receives in step 3.

Agreement in Faulty Systems (5)
Figure 8-6. The same as Fig. 8-5, except now with two correct processes and one faulty process.
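
The vector exchange described in the captions of Figures 8-5 and 8-6 can be simulated with a small sketch. It assumes synchronous, ordered, unicast communication, as listed on the slides. The faulty process is modelled by a hypothetical `garbage` function that sends a different made-up value in every message, which is only one of many possible Byzantine behaviours.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"


def garbage(sender, receiver, slot):
    """Hypothetical faulty behaviour: a different lie in every message."""
    return f"junk-{sender}-{receiver}-{slot}"


def byzantine_agreement(values, faulty):
    """Simulate the exchange; return each correct process's result vector.

    values maps process id -> private value; faulty is a set of ids.
    """
    pids = sorted(values)
    # Steps 1-2: every process sends its value to the others, and each
    # receiver assembles a vector of what it was told.
    vector = {
        r: {s: garbage(s, r, s) if s in faulty and s != r else values[s]
            for s in pids}
        for r in pids
    }
    results = {}
    for r in pids:
        if r in faulty:
            continue
        # Step 3: r receives the vector of every other process.
        received = [
            {slot: garbage(s, r, slot) for slot in pids} if s in faulty
            else vector[s]
            for s in pids if s != r
        ]
        # Step 4: per slot, keep a value only if a strict majority of the
        # received vectors agree on it.
        result = {}
        for slot in pids:
            value, count = Counter(v[slot] for v in received).most_common(1)[0]
            result[slot] = value if count > len(received) / 2 else UNKNOWN
        results[r] = result
    return results


if __name__ == "__main__":
    print(byzantine_agreement({1: 1, 2: 2, 3: 3, 4: 4}, faulty={3}))
    print(byzantine_agreement({1: 1, 2: 2, 3: 3}, faulty={3}))
```

In the four-process run, every correct process recovers the values of the other correct processes and marks the faulty one as unknown. In the three-process run no slot reaches a majority, which illustrates why two correct processes are not enough to outvote one faulty process.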

RPC Semantics in the Presence of Failures
Five different classes of failures that can occur in RPC systems:
1. The client is unable to locate the server.
2. The request message from the client to the server is lost.
3. The server crashes after receiving a request.
4. The reply message from the server to the client is lost.
5. The client crashes after sending a request.
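
Two of these classes, a lost request and a lost reply, look identical to the client: no answer arrives. One common way to handle both, sketched here under stated assumptions, is for the client to retransmit with the same request identifier while the server caches replies so that a repeated request is not executed twice. The slide does not prescribe this technique; the `Server` class, the `drop_plan` parameter that scripts message loss, and the counter-style operation are all hypothetical.

```python
class Server:
    """Toy server whose operation (adding to a balance) is not idempotent."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request id -> reply already sent

    def handle(self, request_id, amount):
        # A retransmitted request gets the cached reply, not a second run.
        if request_id in self.replies:
            return self.replies[request_id]
        self.balance += amount
        self.replies[request_id] = self.balance
        return self.balance


def call_with_retries(server, request_id, amount, drop_plan, max_attempts=5):
    """Retransmit until a reply arrives.

    drop_plan lists, per attempt, which message is lost: "request",
    "reply", or None. Attempts beyond the plan lose nothing.
    """
    for attempt in range(max_attempts):
        lost = drop_plan[attempt] if attempt < len(drop_plan) else None
        if lost == "request":
            continue  # the server never saw it; the client times out
        reply = server.handle(request_id, amount)
        if lost == "reply":
            continue  # the server did the work, but the client times out
        return reply
    raise TimeoutError("no reply after retries")


if __name__ == "__main__":
    server = Server()
    print(call_with_retries(server, "req-1", 10, ["request", "reply"]))
    print(server.balance)
```

Without the reply cache, the retransmission after the lost reply would add the amount a second time. Server and client crashes need further mechanisms that this sketch does not cover.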


Name: 
140-faulttolerance
Author: 
Steve Armstrong
Company: 
N/A
Created: 
6/11/2009 9:09:04 AM
Slides: 
49

