Distributed Systems

Satvik Gupta

These notes are by no means comprehensive or complete

Nodes on the internet are identified by IP addresses. Data is passed from one router to another, moving each packet closer to its destination, in the hope that it will ultimately be delivered.

Each router must regularly be supplied with up-to-date routing tables. Individual IP addresses are grouped into prefixes. Autonomous Systems (AS) own these prefixes. Routing tables between ASes are maintained using Border Gateway Protocol (BGP).

Autonomous Systems

An Autonomous System can best be defined as a collection of IP routing prefixes, that are under the control of 1 or more network operators, on behalf of a single entity or domain, that present a single,common and clearly defined routing policy to the Internet.

Transit vs Peering

Peering

Peering is when two ASes allow free-flow of traffic between them and their downstream customers. Peering relations are free of charge. Peers cannot see each other’s upstreams.

Transit

Transit is when an AS (provider) agrees to:

A transit fee is charged by the provider to the customer.

The transit provider may themselves use other transit networks too.

A transit-free network is a network based only on peering. These are generally the top-level ASes. They provide transit to smaller ASes, like ISPs.

IMPORTANT

A TRANSIT PROVIDER WILL NOT ANNOUNCE A PEER ROUTE TO OTHER PEERS, OR TO NETWORKS IT BUYS TRANSIT FROM

Transit and Peering Example Diagram

In the above represented network,


BGP Hijacking

BGP was designed without security - it is based on trust. If an AS says it controls certain IPs, all its peers will believe it, and route app traffic to those IPs, to that AS. Security extensions and third-party validations exist, but they’re not widely used.

An AS may announce an IP it doesn’t own, or claim to have a shorter path to it than is already available - even if that “shortest” path doesn’t actually exist.

Generally, ISPs filter BGP traffic. They allow BGP advertisements from their downstream networks to contain only valid IP space (i.e, IP addresses that the downstream network is known to possess). However, hackings have occurred because this isn’t always true.

If an AS is hacked, this can be exploited. However, this will be quickly found out and reversed.


OSI Model

Open Systems Interconnection.

Widely referenced in academic literature, not often used exactly in industry. For example, the TCP/IP protocol doesn’t fit neatly on top of the OSI model.

From lowest to highest, the 7 layers are:

  1. Physical Layer

    Deals in bits. Handles transmission and reception of raw bit streams over a physical medium.

  2. Data Link

    Deals in frames. Transmission of data frames b/w 2 nodes. Handles frame sync, errors, QoS, physical addressing, etc.

    • Physical Addressing, for e.g, using MAC addresses.
    • NOT Network Addressing.
  3. Network Layer

    Deals in packets. Handles structuring and managing a multi-node network, including addressing, routing,etc.

  4. Transport Layer

    Deals in segments/datagrams.

    Handles reliable transmission of data segments b/w points on a network. Handles segmentation, acknowledgment, multiplexing, handshakes, etc.

    • Segmentation is dividing large data into smaller sizes in order to match packet size imposed by network layer. This is known as MTU (Max Transmission Unit).
  5. Session Layer

    Manages communication sessions, i.e, continuous exchange of information in the form of multiple back and forth transmissions btw 2 nodes. Creates the setup and controls the connection. Performs teardown. DNS is often put in this layer.

    Logon, name lookup, log off occur here.

    Authentication in FTP is built into session layer.

  6. Presentation Layer

    Translates data b/w a network and an application. Includes character encoding, data compression and encryption/decryption.

    AKA Syntax layer.

    TLS/SSL is generally considered to be in this layer.

  7. Application Layer High level protocols, such as HTTP or FTP. Generally include file sharing, message handling, DB access, etc.

The Internet Protocol Suite

The Internet Protocol Suite (TCP/IP) is different than the OSI model. Many layers are similar, but their differences start after the network layer.

TCP/IP layers don’t fit neatly into OSI layers.

Layers in TCP/IP suite are:

  1. Physical
  2. Data Link
  3. Network
  4. Transport
  5. Application

TCP/IP doesn’t differentiate between session,presentation and application layer.

This is why it doesn’t fit correctly into OSI.

For example, TLS is built on top of transport layer such as TCP. But, applications generally use it as if it was a transport layer.

Some mentions of TCP/IP don’t even contain a physical layer. The Data Link and Physical layers are merged into a common “Link” layer. It is unspecified as if the Link layer does the job of the physical layer, or if TCP/IP assumes that physical hardware already exists underneath.


Application Based Multicasting

Nodes organize themselves into an overlay network that is used to spread information. Network routes aren’t involved in group membership.

Starting a multicast


Lamport Clocks

Lamport Algorithm


Totally Ordered Multicasting

Totally ordered multicasting is when events must occur in the same order on all nodes.

For example, consider the following scenario.

This is an inconsistency. The bank will be okay with either amount in the customer’s account, but the amount on all nodes should match with each other.

How to solve this?

We solve this using totally ordered multicasting. The order of events must be the same on all nodes, even if the order is different than the real-world order of events. It should just be the same.

This ensures that all nodes have the same copy of the queue.


Replica Management

Replicas are created for servers, to reduce latency and to provide resilience. They need to managed efficiently and consistency needs to be maintained across all replicas.

Content Distribution

What data should we propagate?


Push vs Pull Semantics

Push

Pull

Hybrid

This approach is based on leases.


Expiration Times for the leases can be based on:


Epidemic Protocols

Replicas can be of 3 types in this protocol.

Anti-Entropy

Server P picks another server Q at random. They exchange info using one of the following approaches:

Pull-based approaches are generally better than push-based. This is because a susceptible node may not find any node that is willing to push to it. With pull-based approaches, it can pull the new info itself.

Combination of pull and push has been found to work the best in practice.

Gossipping Protocol

A mix of anti-entropy and gossipping is the best

Deletion of Data

Deletion of data is done through the use of death certificates. An item that has been deleted is given a death certificate, and these certificates are propagated the same way as normal data would be.

The accumulation of death certificates becomes an issue. If a lot of items are deleted everyday, then a lot of storage space will be taken by just the death certificates.

Hence,old death certificates are removed using expiration dates.This always runs the risk that there was a node that didn’t receive the death certificate.

A special server can be set up that never removes any death certificate. IF a deleted item is seen again (after its death certificate has expired), this special server will see it as well. This special server will circulate the death certificate once again.


Consistency Protocols

Consistency Protocols have the following classification:

Consistency Protocols

Primary Based

Each data item has a primary node, which is the node responsible for it.

Remote Write

Local Write

The status of which node is primary for a particular data item x can change.

Single Copy

Only the primary has the data item x. Processes who need x need to request it from the primary (for read access). If a process wants to update x:

Multiple Copies

Multiple copies exist, but one is primary.

If a process wants to update x:


Replicated Write

In this, writes can be carried out anywhere, not just at the primary.

Active Replication

A special process carries out updates at each replica.

A sequencer can be used that assigns a unique ID to each update. Each update is then propagated with this unique ID.

Quorums

Clients must request and acquire permission from multiple replicas before a read/write operation.

For e.g, get the majority vote before reading/writing.

All processes that vote yes will perform the update and have a newer version of the data.

When reading, majority must say yes and they must all have the same version number of the data. This ensures that the client doesn’t request the data from a replica that has an older version of the data.

Gifford’s Method

Let the number of votes required to read be Nr , and the number of votes to write be Nw. Let the total number of nodes be N.

Gifford’s method states that the value of Nr and Nw should follow the following two rules.

  1. Nr + Nw > N
  2. Nw > N/2

The first rule prevents read-write conflicts.

The second rule prevents write-write conflicts.

A special case of Gifford’s method is Read One Write All

In this, Nr = 1 and Nw = N

By ensuring that updates are written to all nodes, we can read any one of them and be sure that we have the latest version of the data.


Fault Tolerance

Fault Tolerance has the following criteria.

  1. Availability - It measures the degree to which the system is up and ready for use at any instant.

  2. Reliability - It measures the degree to which the system is ready to use for longer periods of time.

  3. Safety - Nothing catastrophic should happen even if the system fails.

  4. Maintainability - The degree to which the system can be brought back up in case it fails, and the underlying issue can be fixed.

Availability Reliability
Measures instant readiness of system Measures readiness of system across long periods of time
Measures the ability of a system to do its job if needed Measures the ability of a system to perform its function for some interval, without failure.
Measures the (average) amount of time the service was down Measures the frequency/probability of failure
A system that goes down one millisecond every hour has high availability but low reliability A system that works perfectly usually, but crashes for 2 weeks every August is highly reliable but has only 96% availability
Availability is 1- (1/3600000), Reliability(measured as failure rate) is once every hour Availability is 50/52, Reliability (measured as failure rate) is once every year

Types of Failure

  1. Transient - One -time.
  2. Intermittent - Repeating sometimes.
  3. Permanent - Always there.

Failure Models

  1. Crash (Fail-Silent)
  2. Fail-stop
  3. Fail-safe
  4. Omission Failure
  5. Timing failure
  6. Response Failure
  7. Arbitrary Failure

Crash

Server halts, but worked correctly until it halts. No response is seen from the server after it halts.

AKA Fail-Silent

Fail-Stop

Server halts, but worked correctly until it halts. Other servers are able to detect it has halted.

Fail-Safe

Server produces junk output, and other processes are able to recognize it as junk.

Omission Failure

Server fails to respond to incoming messages.

Timing Failure

Server’s response is outside the specified time interval.

Response Failure

Incorrect response.

Arbitrary Failure

Server sends arbitrary data at arbitrary times.


Security

Types of Cryptosystems

Symmetric

AKA conventional cryptography/shared-key systems/secret-key systems.

Sender and receiver share the same key, which is used both for encryption and decryption.

The shared key must be kept private. Anyone in possession of the key can read encrypted messages.

The notation Ka, b is used to denote a secret-key shared by A and B.

Asymmetric

AKA Public-key cryptography.

The keys for encryption and decryption are different, but form a unique pair. The key for decryption can only decrypt the data encrypted with its pair key.

Key for encryption - KE.

Key for decryption - KD.

One of the keys is made public, and the other one kept private.

The notation KA+ is used to denote a public key belonging to A, and KA denotes a private key belonging to A.

If Bob wants to send a message to Alice, he should encrypt it using Alice’s public key. Since Alice is the only person who possessed the corresponding private key, only she can decrypt the message.

Hash

A hash function H takes a message m of arbitrary length, and produces a bit-string h having a fixed length.

h = H(m)

By the pigeonhole principle, many inputs can result in the same output.

A good hash function should have the following properties.

i.e, the hash function should be REPEATABLE and IRREVERSIBLE (ONE-WAY).

RSA

An asymmetric encryption algorithm named after its inventors - Rivest, Shamir and Adleman.

Based on the fact that prime factorization of very large numbers is a difficult and time-consuming process.

Steps

  1. Take 2 very large prime numbers - p and q.

  2. Calculate n = p * q z = (p−1) * (q−1)

  3. Choose d such that d is relatively prime to z.

  4. Compute e such that

(e*d)%z = 1

Now, the number d can be used for decryption, and e for encryption.

One of these is kept private, and the other is made public.

Usage

Let the message to be sent be m. Here, m is interpreted simply as a binary number.

  1. Divide m into fixed length blocks, mi, such that:

0 ≤ mi ≤ n

Each mi is also interpreted as a binary number.

  1. The sender calculates ci = (mie)%n All such ci are calculated and concatenated into a single variable c.

  2. c is sent to the receiver.

  3. The receiver calculates yi = (cid)%n

Based on the properties of modulus, and the way we have chosen e and d, we can easily see that yi = mii.

This way, the receiver is able to reconstruct the message.

Properties of RSA

Securely sending messages (Secure Channels)

Securely sending messages has the following problems to solve.

Example using Challenge-Response Protocol

Alice is denoted by A and Bob by B. Their shared key is denoted by KA, B.

  1. Alice sends her identity to Bob, indicating that she wants to communicate with him.
  2. Bob sends a challenge RB to Alice. This could be a random number.
  3. Alice encrypts it using their shared key KA, B and sends the result to Bob.
  4. Bob receives the encrypted message, and decrypts it to check whether it contains RB. If it does, then he knows the person on the other end is Alice, because no one else could have encrypted it with KA, B.

Bob has now verified Alice’s identity. But Alice hasn’t verified Bob’s.

  1. Alice sends a challenge RA to Bob. Bob must encrypt it with KA, B and send it back to Alice. Alice will decrypt it and verify whether it contains RA.
Challenge Response Protocol
Reflection Attack

Suppose we try to optimize the above approach.

Alice has to eventually send her challenge to Bob anyway, so she can just send her challenge when she’s sending her identity (as part of step 1).

Similarly, Bob can return the response to Alice’s challenge, and his own challenge, in a single message.

Optimization of Challenge Response

The above protocol can easily be defeated using a reflection attack.

A reflection attack is a way to attack a challenge response system which uses the same protocol in both directions.

We basically try to trick the target into answering its own challenge.

Steps
  1. The attacker initiates a connection to a target.

  2. The target attempts to authenticate the attacker by sending it a challenge.

  3. The attacker opens another connection to the target, and sends the target this challenge as its own.

  4. The target responds to that challenge.

  5. The attacker sends that response back to the target (“reflects” it) on the first connection.

Example using Reflection Attack

Chuck wants to pretend to be Alice, and talk to Bob. Chuck does not possess the key KA, B. So, Chuck will trick Bob into encrypting his own challenge and giving it to Chuck.

  1. Chuck sends a message to Bob ,containing Alice’s identity and a challenge RC. (Message 1)
  2. Bob returns his challenge RB and his answer to Chuck’s challenge, KA, B(RC). (Message 2)
  3. Chuck needs to prove he is Alice, by encrypting RB with KA, B, and thus return the response KA, B(RB) to Bob.
  4. Chuck opens up a second channel to Bob, but this time he uses RB as his challenge. He sends A and RB in a single message to Bob. (Message 3)
  5. Bob doesn’t recognize that RB is his own key, and encrypts it and sends KA, B(RB) back to Chuck, with another challenge RB2 (Message 4)
  6. Chuck ignores the second channel, including the challenge RB2. He sends back KA, B(RB) (received from Bob in the second channel), as a response in the first channel. (Message 5)
Reflection Attack
Mistakes Made
  1. The same protocol was being used in both directions. It is always better to use a different challenge for the initiator and the responder.

    A basic example would be to make it so that challenges by the initiator (Alice/Chuck) are always odd numbers, and by the responder (Bob) are always even numbers.

    (However even this approach would be susceptible to a MITM (Man in the middle) attack)

  2. Bob gave away valuable information in the form of the response KA, B(RC) and KA, B(RB), without knowing who he was giving it to.

    This wasn’t violated in the original protocol, where Alice had to prove her identity first.

Mutual Authentication in Public-Key Cryptosystems

This system assumes there is some way to verify everyone’s public keys.

i.e, if Alice wants to check whether a particular message was sent by Bob,

This assumes that Alice has a verified way, of getting Bob’s public key, and being absolutely sure that the public key she has indeed belongs to Bob.

Generally, this involves the use of a trusted source, such as a KDC (Key Distribution Center), or a CA (Certificate Authority).

Example

Alice wants to set up a secure channel with Bob. Both possess each other’s public key.

  1. Alice sends a challenge RA to Bob, encrypted with his public key KB+. She knows that only Bob will be able to decrypt this message.

    The message takes the form KB+(A,RA).

  2. When Bob receives the message, he will do the following things:

    1. Decrypt Alice’s message and extract RA from it.
    2. Create his own challenge RB.
    3. Generate a session key KA, B that can be used for further communication (using symmetric cryptography).
    4. Combine all the above 3 things into a single message and encrypt it with Alice’s public key, and send it to Alice.

    The final message takes the form KA+(RA,RB,KA, B)

  3. Alice encrypts Bob’s challenge (RB) with the session key KA, B and sends it back to Bob. This lets Bob know that the person on the other end is in fact Alice, since only Alice could have decrypted the previous message and extracted RB and KA, B from it.

  4. Alice and Bob communicate further using KA, B for this session. Once the session ends, KA, B is destroyed.

Mutual Authentication in a Public Key Cryptosystem

Digital Signatures

Confidentiality and Integrity needs to be maintained in secure channels.

Digital Signatures are used for this. The document is signed using the sender’s public key, which uniquely ties the sender to the message.

Digitally signing messsages
Issues with this scheme

A solution for the second problem is a message digest.

Message Digest

It’s a fixed length string h that’s computed from a message m of arbitrary length, using a hash function H.

If m is changed to m, then it’s hash H(m′) will not be the same as before (H(m)). Thus, modifications will easily be detected.

Instead of signing m, Alice signs H(m), which becomes the signature.

The message sent to Bob is now KB+(m,KA(H(m))), where KA(H(m)) is the signature.

On Bob’s end, Bob will hash the entire message himself, decrypt the signature, and compare the hashes. If they match, all is good.

Digitally signing messages using digests

Diffie-Hellman Key Exchange

This is a method for 2 parties to exchange keys without the use of a third party.

  1. Alice and Bob agree on 2 large numbers, n and g. Both numbers can be made public.
  2. Alice chooses a large number x and keeps it private, and Bob chooses a large number y, and also keeps it private. Alice does not know y and Bob doesn’t know x.
  3. Alice sends gxmodn to Bob.
  4. Bob sends gymodn to Alice.
  5. Alice computes KA, B as: KA, B = (gymodn)x = gxymodn
  6. Bob computes KA, B as: KA, B = (gxmodn)y = gxymodn

This way both Alice and Bob get the same session key, and no one listening from outside will be able to recreate it. This is based on the same principle as RSA.