- Two main approaches
- Do the same thing on each server, but for different data
- Do something different on each server, for some data
- Common issues
- Finding the right server for a given task/datum
- Balancing load across servers and avoiding bottlenecks
- Avoiding excessive inter-server communication
- Naming and resource discovery
- Load balancing
- Falls back on data distribution schemes
- Manual data set placement
- Striping or pseudo-random distribution
- Used for flexibility and bottleneck avoidance, rather than load distribution and fault tolerance
- Often combined with approach #1
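A small sketch of the two distribution schemes named above, pseudo-random (hash-based) placement and striping. The function names and the choice of SHA-256 are illustrative assumptions. The server list is fixed and known to every client, so any client can find the right server for a datum without asking anyone.

```python
import hashlib

def server_for_key(key: str, servers: list[str]) -> str:
    """Pseudo-random placement: hash the key, then pick a server by modulus."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def server_for_block(block_number: int, servers: list[str]) -> str:
    """Round-robin striping: consecutive blocks land on consecutive servers."""
    return servers[block_number % len(servers)]

servers = ["s0", "s1", "s2", "s3"]
print([server_for_block(b, servers) for b in range(6)])
# ['s0', 's1', 's2', 's3', 's0', 's1']
print(server_for_key("/home/alice/notes.txt", servers))  # same server on every call
```

With plain modulo hashing, changing the number of servers remaps most keys; consistent hashing is a common alternative when servers come and go.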
- pNFS standardizes the Network Attached Secure Disks (NASD) concept
- Clients get a layout from the NFSv4.1 server (aka "metadata server")
- The layout maps the file onto storage devices and addresses
- The client uses the layout to perform direct I/O to storage
- At any time the server can recall the layout
- Clients must honor a recall strictly, or the file system may be corrupted
- Client commits changes and returns the layout when it's done
- pNFS use is optional; the client can always fall back to regular I/O through the NFS server
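A simplified sketch of how a client might use a layout. It assumes a hypothetical `StripedLayout` that stripes a file round-robin across devices in fixed-size units; real NFSv4.1 layouts carry more information (device IDs, file handles, stripe indices) than this. While the layout is held, the client computes which device holds a byte and does direct I/O; once the layout is recalled, it falls back to ordinary I/O through the server.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StripedLayout:
    """Hypothetical layout: a file striped round-robin across devices."""
    devices: list[str]
    stripe_unit: int          # bytes stored on one device before moving to the next
    recalled: bool = False    # set when the metadata server recalls the layout

    def locate(self, offset: int) -> tuple[str, int]:
        """Map a file byte offset to (device, byte offset on that device)."""
        unit = offset // self.stripe_unit
        device = self.devices[unit % len(self.devices)]
        row = unit // len(self.devices)   # full passes over all devices so far
        return device, row * self.stripe_unit + offset % self.stripe_unit

def route_io(layout: Optional[StripedLayout], offset: int) -> str:
    """Direct I/O while a valid layout is held; otherwise go through the server."""
    if layout is None or layout.recalled:
        return "via NFS server"
    device, device_offset = layout.locate(offset)
    return f"direct to {device} at {device_offset}"

layout = StripedLayout(devices=["d0", "d1", "d2"], stripe_unit=4096)
print(route_io(layout, 10000))   # direct to d2 at 1808
layout.recalled = True
print(route_io(layout, 10000))   # via NFS server
```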
- Concurrent access to shared state
- Step #1: define consistency
- Easy solution: every read sees most recent write
- Generally, mutual exclusion required (e.g., critical section)
- Step #2: enforce concurrency model
- Locks, semaphores, monitors
- Failures, re-ordered messages
- One option: do updates at a central point
- file server or coordination server
- Second option: support locks
- client obtains a lock from the server, does the scan/update, and releases it; locks work with leases
- Third option: cross fingers
- Fourth option: cross fingers and deal with conflicts
- Send pre-update version to server with update
- If the pre-update version is out of date, server says "sorry, try again"
- Called optimistic concurrency model
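A minimal sketch of the server side of the optimistic option, assuming each item carries an integer version number (the class and method names are hypothetical). The server's internal mutex is held only for the single verify-and-write step, never across a client's read-modify-write, which is what distinguishes this from the locking option.

```python
import threading
from typing import Any

class VersionedStore:
    """Hypothetical server that applies an update only if the client's
    pre-update version still matches (optimistic concurrency)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, Any]] = {}  # key -> (version, value)
        self._mutex = threading.Lock()               # makes verify-and-write atomic

    def read(self, key: str) -> tuple[int, Any]:
        with self._mutex:
            return self._data.get(key, (0, None))

    def update(self, key: str, pre_update_version: int, value: Any) -> bool:
        with self._mutex:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != pre_update_version:
                return False  # "sorry, try again": the client's copy was stale
            self._data[key] = (current_version + 1, value)
            return True

store = VersionedStore()
v_a, _ = store.read("x")            # client A reads version 0
v_b, _ = store.read("x")            # client B also reads version 0
print(store.update("x", v_a, "A"))  # True: A's version was current
print(store.update("x", v_b, "B"))  # False: B must re-read and retry
```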
- If a client dies holding a lock, that lock is never freed, causing deadlock
- If the server dies after granting a lock, it no longer knows what has been granted
- Solution: leases
- Locks granted for only a specified amount of time
- Clients have to keep coming back to refresh lease
- Dead client's lock simply expires after specified time
- A reborn server can simply wait out the maximum lease time before granting new locks
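A sketch of a lease-based lock server covering the points above: leases expire, holders must renew, a dead client's lease lapses on its own, and a reborn server refuses grants for one lease period. The class name, the injectable `clock`, and the `recovering` flag are assumptions for illustration. It also assumes clocks advance at roughly the same rate on clients and server; a real client should stop using a lock somewhat before its lease expires.

```python
import time
from typing import Callable

class LeaseLockServer:
    """Hypothetical lock server whose locks are leases that expire."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 recovering: bool = False) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._leases: dict[str, tuple[str, float]] = {}  # lock -> (client, expiry)
        # A reborn server knows nothing about earlier grants, so it refuses new
        # grants until any lease it might have issued before crashing has expired.
        self._grace_until = clock() + lease_seconds if recovering else float("-inf")

    def acquire(self, lock: str, client: str) -> bool:
        now = self.clock()
        if now < self._grace_until:
            return False
        holder = self._leases.get(lock)
        if holder is not None and holder[0] != client and holder[1] > now:
            return False                       # another client holds a live lease
        self._leases[lock] = (client, now + self.lease_seconds)
        return True

    def renew(self, lock: str, client: str) -> bool:
        now = self.clock()
        holder = self._leases.get(lock)
        if holder is None or holder[0] != client or holder[1] <= now:
            return False                       # lease already lost: re-acquire
        self._leases[lock] = (client, now + self.lease_seconds)
        return True

    def release(self, lock: str, client: str) -> None:
        holder = self._leases.get(lock)
        if holder is not None and holder[0] == client:
            del self._leases[lock]

now = [0.0]
server = LeaseLockServer(10.0, clock=lambda: now[0])  # fake clock for the demo
print(server.acquire("f", "A"))   # True
print(server.acquire("f", "B"))   # False: A's lease is still live
now[0] = 11.0                     # A died without renewing
print(server.acquire("f", "B"))   # True: A's lease has expired
```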
- Optimistic concurrency: assume no conflicts and verify before committing changes
- As opposed to pessimistic schemes that lock before touching
- Works well in large systems that have few conflicts
- Problem: livelock
- Sequences that fail the verify step get rolled back
- When conflicts are frequent, clients can work hard yet make no forward progress
- One solution is to fall back on locking when this happens
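A sketch of the livelock remedy: retry optimistically a bounded number of times, then take an exclusive lock that makes competing optimistic commits fail, so the locked attempt is guaranteed to commit. The `Store` class, `max_attempts=3`, and the trust-based `holds_lock` flag are hypothetical choices for illustration; a real system would tie the lock to an authenticated owner or lease.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator

class Store:
    """Hypothetical versioned cell with optimistic commits plus a fallback lock."""

    def __init__(self, value: int = 0) -> None:
        self.version, self.value = 0, value
        self._mutex = threading.Lock()      # makes each verify-and-commit atomic
        self._exclusive = threading.Lock()  # serializes pessimistic writers
        self._locked = False                # True while a pessimistic writer is active

    def read(self) -> tuple[int, int]:
        with self._mutex:
            return self.version, self.value

    def try_commit(self, expected_version: int, new_value: int,
                   holds_lock: bool = False) -> bool:
        with self._mutex:
            if (self._locked and not holds_lock) or self.version != expected_version:
                return False                # verify failed: caller rolls back
            self.version, self.value = self.version + 1, new_value
            return True

    @contextmanager
    def exclusive(self) -> Iterator[None]:
        with self._exclusive:
            with self._mutex:
                self._locked = True         # optimistic commits now fail fast
            try:
                yield
            finally:
                with self._mutex:
                    self._locked = False

def update(store: Store, fn: Callable[[int], int], max_attempts: int = 3) -> str:
    """Try optimistically; after max_attempts failed verifies, fall back to locking."""
    for _ in range(max_attempts):
        version, value = store.read()
        if store.try_commit(version, fn(value)):
            return "optimistic"
    with store.exclusive():
        version, value = store.read()
        # No other writer can commit while we hold the lock, so this succeeds.
        if not store.try_commit(version, fn(value), holds_lock=True):
            raise RuntimeError("commit failed while holding the exclusive lock")
    return "locked"
```

Under low contention `update` returns `"optimistic"` after one round trip; only a caller that keeps losing the verify step pays for the lock.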
- In stateless servers, it is possible that a request is completed, but the client is not notified
- The client will retry (retransmit) the request after the server recovers
- Servers should make specific guarantees about which operations are idempotent
- Read is an idempotent request
- Remove is usually not an idempotent request
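A sketch of why retries are safe for read but not for remove, using a hypothetical in-memory `FileServer`. The `remove_with_reply_cache` method shows one common remedy that is not in the slides above: remember the reply for each request ID and replay it on retransmission (NFS server implementations commonly keep a similar duplicate request cache). Note that this gives up pure statelessness.

```python
from typing import Optional

class FileServer:
    """Hypothetical file server illustrating retried (retransmitted) requests."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = dict(files)
        self._replies: dict[str, str] = {}  # request id -> reply already sent

    def read(self, name: str) -> Optional[bytes]:
        # Idempotent: executing it twice returns the same data and changes nothing.
        return self.files.get(name)

    def remove(self, name: str) -> str:
        # Not idempotent: if the first reply was lost, the retry finds no file.
        if name not in self.files:
            return "ENOENT"
        del self.files[name]
        return "OK"

    def remove_with_reply_cache(self, request_id: str, name: str) -> str:
        # Execute each request id once; a retransmission gets the cached reply.
        if request_id not in self._replies:
            self._replies[request_id] = self.remove(name)
        return self._replies[request_id]

server = FileServer({"notes.txt": b"hello"})
print(server.remove("notes.txt"))  # OK
print(server.remove("notes.txt"))  # ENOENT: the retry reports an error
```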
- A major issue in function partitioning is making sure that the servers are not too interdependent
- otherwise, they scale very poorly, and may fail together
- e.g., NFS servers are completely independent
- not always possible: data consistency is needed when replicating data
- Updates and consensus
- when there is inter-dependence, the servers must agree on the related state
- e.g., replication, parity, metadata
- very tough, because servers must also stay autonomous and perform well
- Distributed consensus dilemmas:
- concurrency control
- failure of servers
- Option #1: primary and secondary servers
- For any piece of data, there is a primary server
- Updates and reads go to primary, which syncs secondary
- Option #2: voting among servers
- Updates and reads go to quorum of servers
- Read quorum and write quorum chosen to overlap (R + W > N for N servers)
- For any read, majority rules
- Option #3: 2-phase commit (for atomicity)
- Intention to update is sent to all servers
- Log intention before responding
- Done message is sent to all servers after all respond
- Reads may be able to go to any one up-to-date server
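A sketch of the voting option (Option #2), assuming N replicas, a read quorum of size R, and a write quorum of size W with R + W > N, so every read quorum shares at least one replica with the most recent write quorum. The slides say "majority rules"; this sketch instead resolves a read by taking the highest version number among the replies, a swapped-in detail in the style of weighted voting. The class name and the `reachable` lists (which replicas answered) are hypothetical.

```python
from typing import Any, Sequence

class QuorumStore:
    """Hypothetical voting scheme over n replicas with read quorum r and
    write quorum w; r + w > n forces every read quorum to meet every write quorum."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if r + w <= n:
            raise ValueError("quorums must overlap: need r + w > n")
        self.r, self.w = r, w
        self.replicas: list[tuple[int, Any]] = [(0, None)] * n  # (version, value)

    def read(self, reachable: Sequence[int]) -> tuple[int, Any]:
        chosen = list(reachable)[: self.r]
        if len(chosen) < self.r:
            raise RuntimeError("read quorum not reached")
        # The overlap guarantees some chosen replica has the newest version.
        return max((self.replicas[i] for i in chosen), key=lambda entry: entry[0])

    def write(self, value: Any, reachable: Sequence[int]) -> int:
        targets = list(reachable)
        if len(targets) < self.w:
            raise RuntimeError("write quorum not reached")
        version, _ = self.read(targets)   # learn the current version first
        for i in targets[: self.w]:
            self.replicas[i] = (version + 1, value)
        return version + 1

store = QuorumStore(n=5, r=3, w=3)
store.write("a", [0, 1, 2])
store.write("b", [2, 3, 4])
print(store.read([0, 1, 3]))   # (2, 'b'): replica 3 holds the latest write
```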
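A sketch of 2-phase commit (Option #3): phase 1 sends the intention to every server, each logs it before voting; phase 2 sends the outcome to all servers once everyone has responded. The `Participant` class, the in-memory `log` standing in for a durable log, and the `can_commit` flag used to simulate a "no" vote are assumptions. The sketch also omits the coordinator's own decision log and any timeout or crash handling that a real implementation needs.

```python
from typing import Any

class Participant:
    """Hypothetical 2PC participant; `log` stands in for a durable log."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.log: list[tuple] = []
        self.data: dict[str, Any] = {}
        self._pending: dict[int, tuple[str, Any]] = {}

    def prepare(self, txid: int, key: str, value: Any) -> bool:
        if not self.can_commit:
            self.log.append(("abort", txid))
            return False
        # Log the intention before voting yes, so a crash cannot lose it.
        self.log.append(("prepared", txid, key, value))
        self._pending[txid] = (key, value)
        return True

    def finish(self, txid: int, commit: bool) -> None:
        pending = self._pending.pop(txid, None)
        if pending is None:
            return                      # already aborted locally
        if commit:
            key, value = pending
            self.data[key] = value
        self.log.append(("commit" if commit else "abort", txid))

def two_phase_commit(participants: list[Participant], txid: int,
                     key: str, value: Any) -> bool:
    # Phase 1: send the intention to every participant and collect votes.
    votes = [p.prepare(txid, key, value) for p in participants]
    decision = all(votes)
    # Phase 2: after everyone has responded, send the outcome to all of them.
    for p in participants:
        p.finish(txid, decision)
    return decision
```

Once a participant has voted yes, it must wait for the coordinator's outcome; this blocking behavior is the classic weakness of 2-phase commit when the coordinator fails.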