NoSQL Db characteristics
NoSQL Database Types
MongoDB
Redis
Elasticsearch
Cassandra
CouchBase
There are many NoSQL solutions around, each one with its own strengths and weaknesses, so the following must be taken with a grain of salt.
- Semi-structured/ unstructured schema.
If your data requirements aren’t clear at the outset, or if you’re dealing with massive amounts of unstructured data, you may not have the luxury of developing a relational database with a clearly defined schema. Enter non-relational databases, which offer much greater flexibility than their traditional counterparts. Think of non-relational databases more like file folders, assembling related information of all types. If a WordPress blog used a NoSQL database, each file could store the data for one blog post: social likes, photos, text, metrics, links, and more.
- Limited transaction support (different for each product). Transactions rely on the ACID properties, which govern how a database performs user operations. ACID restricts how scalability can be improved: most NoSQL tools relax the consistency criteria of operations to gain fault tolerance and availability when scaling, which makes implementing ACID transactions very hard.
- Horizontal scalability. NoSQL databases are designed to be scaled across multiple data centers / clusters. They usually improve the scalability of the data store by distributing data processing across nodes.
- Fast – high performance.
No joins: reads and writes go to one table. Duplicating data across tables so that no join is needed is called denormalization.
Many NoSQL databases rely on denormalization and try to optimize for the denormalized case. For instance, say you are reading a blog post together with its comments in a document-oriented database. Often, the comments will be saved together with the post itself. This means that it will be faster to retrieve all of them together, as they are stored in the same place and you do not have to perform a join.
- Query language – not SQL!
- Ability to store large volumes of data (scale out)
- When aggregation is natural for the data, development becomes easier.
- Database denormalization: a query-first approach. Design the tables around the queries. We need to understand the toolset of each database and find the one that answers our problem.
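The blog-post example from the list above can be sketched in a few lines. This is only an illustration with plain Python dictionaries, not the API of any particular database; the names `posts`, `comments`, `documents` and both read functions are made up for the sketch.

```python
# Normalized layout: posts and comments live in separate "tables",
# so reading a post with its comments needs a join-like lookup.
posts = {1: {"title": "Hello NoSQL", "body": "..."}}
comments = [
    {"post_id": 1, "author": "ann", "text": "Nice post"},
    {"post_id": 1, "author": "bob", "text": "Thanks"},
    {"post_id": 2, "author": "cat", "text": "Unrelated"},
]


def read_post_normalized(post_id):
    """Fetch the post, then scan the comments table for matching rows."""
    post = dict(posts[post_id])
    post["comments"] = [
        {"author": c["author"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post_id
    ]
    return post


# Denormalized layout: the comments are stored inside the post document,
# so one lookup by key returns everything.
documents = {
    1: {
        "title": "Hello NoSQL",
        "body": "...",
        "comments": [
            {"author": "ann", "text": "Nice post"},
            {"author": "bob", "text": "Thanks"},
        ],
    }
}


def read_post_denormalized(post_id):
    """Single read, no join."""
    return documents[post_id]
```

Both functions return the same data; the denormalized version pays for its single read with duplicated or nested data that has to be kept in sync on writes.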
Requirements: how do we need to read the data? Do we need a schema? Non-functional requirements:
- Performance. Performance requirements define how well the system performs certain functions under specific conditions. Examples are speed of response, throughput, execution time and storage capacity.
- Scalability. Database scalability is the ability of a database to handle changing demands by adding or removing resources. We need to consider the current data volume plus future needs. At a high level, scalability helps to improve availability and performance when demand is changing, especially when the changes are unpredictable.
- Cost. Consider software cost plus maintenance.
- Integration considerations. Existing technologies that the database needs to integrate with.
- Existing dev team skills
- CAP:
The CAP theorem states that a distributed database can guarantee only two of the following three properties at the same time:
- Consistency – reads are always up to date, which means every client making a request to the database gets the same view of the data.
- Availability – database requests always receive a response (when valid).
- Partition tolerance – the system keeps working even when a network fault prevents messaging between some of the nodes.
In the context of distributed (NoSQL) databases, this means there is always a trade-off between consistency and availability, because a distributed system must be partition tolerant (it simply wouldn’t be a distributed database if it weren’t). So on a distributed NoSQL database you basically choose between AP and CP.
- Keep it simple…
Key Value
Document Store
Column-oriented
Graph
- Store simple data, accessible through a key
- Read/write by key. The value can be of many types: JSON, image, object, text
- Simple schema
- High performance and scalability
- No complex queries involving multiple keys or joins
- Caching feature
When to use? For data that can be represented as key-value pairs, such as configuration, flags, session management, and user preferences.
When to avoid? When there is a need to query or interact with the data itself.
Examples – Redis, Amazon DynamoDB, Aerospike
Similar to key-value, but the data is stored in structured formats called documents, often JSON, BSON, or XML.
- The content stored in the document can be queried and analyzed.
- Flexible schema
- Content index support. The data can be queried, not only the key.
- Easy to develop with when aggregation is needed
- If the data model needs to change, only the affected documents need to be updated
- Horizontally Scalable
Examples – MongoDB, CouchDB
Document Design Considerations
When documents are initially designed, performance and scalability should be considered:
- A small number of big documents. Potentially allows information to be read or written in a single operation. This improves atomicity and scalability, because there are fewer relations between independent objects.
- A large number of simple documents. Appropriate when the document size needs to be kept small in order to reduce latency. Documents can refer to each other by key.
A column-oriented database stores the values of each column sequentially, in contiguous blocks. For analytical queries that perform aggregate operations over a small number of columns, retrieving data in this format is extremely fast. As storage is optimized for block access, keeping the values of a column next to each other exploits locality of reference. This is particularly important on hard disk drives, whose performance characteristics favor sequential access. When only a few columns are needed from a table with many columns, a row-oriented database has to skip over the unnecessary data.
- Store & analyze big Data sets
- Column data is stored together
- Quick at aggregate queries (sum, average, min, max, etc.): only the columns used in the query are read
- The data can be sorted/indexed
- Adding a column is easy
- Single-row operations are slow (use batches, work in bulk)
- Saves storage: no need to store null/repeated data
Examples- Cassandra, InfiniDB, HBase
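To make the row versus column layout concrete, here is a minimal in-memory sketch. The table contents and function names are invented for the illustration; a real column store works on disk blocks, not Python lists.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "ann", "city": "Oslo", "salary": 100},
    {"id": 2, "name": "bob", "city": "Rome", "salary": 200},
    {"id": 3, "name": "cat", "city": "Oslo", "salary": 300},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_salary_row_store(table):
    """Has to walk every record, skipping the fields it does not need."""
    return sum(record["salary"] for record in table) / len(table)


def avg_salary_column_store(table):
    """Reads a single column and nothing else."""
    salaries = table["salary"]
    return sum(salaries) / len(salaries)
```

The aggregate result is the same in both layouts; the difference is how much unrelated data has to be read to compute it.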
Stores and queries relationships between objects. The logical structure represents the data relationships.
Edge - relationship. Edges have a direction.
Vertex - object
- ACID support
- A graph DB does not represent any aggregation.
- Relations can be many to many (no extra table)
- Properties - Additional information can be stored on both edges and vertices (similar to fields). Flexible schema.
- Adding a new edge is easy
- Finding vertices – based on key or properties
- Traversals:
  - To other vertices
  - To edges
  - Filtering with traversals
Examples – Neo4j, Amazon Neptune
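The vertex, edge, property and traversal ideas above can be sketched with a small adjacency structure. This is a toy model, not the query API of Neo4j or Neptune; the `Graph` class and its method names are assumptions made for the example.

```python
from collections import deque


class Graph:
    """Directed property graph kept in memory."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.out_edges = {}  # vertex id -> list of (label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, label, dst, **props):
        # Edges are directed: src -> dst.
        self.out_edges[src].append((label, dst, props))

    def traverse(self, start, label, edge_filter=lambda props: True):
        """Breadth-first traversal along edges with the given label,
        following only edges whose properties pass edge_filter."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for edge_label, target, props in self.out_edges[current]:
                if edge_label == label and edge_filter(props) and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
for name in ("ann", "bob", "cat", "dan"):
    g.add_vertex(name, kind="person")
g.add_edge("ann", "follows", "bob", since=2019)
g.add_edge("bob", "follows", "cat", since=2021)
g.add_edge("ann", "follows", "dan", since=2015)

everyone = g.traverse("ann", "follows")
recent = g.traverse("ann", "follows", lambda props: props["since"] >= 2018)
```

Here `everyone` reaches all three other people, while `recent` skips the 2015 edge, which is the "filtering with traversals" case from the list.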
Type: Document storage
MongoDB is a NoSQL database that uses JSON-like (BSON) documents with optional schemas. It is designed for ease of development and scaling.
- Flexible schema
- Ability to query the data itself (not only fetch it by key)
- Index support: single fields, combinations of fields, and special indexes (obviously, only one can be used for sharding). The document key (_id) is generated automatically or provided by the developer, and must be unique.
- MongoDB supports all CRUD (create, read, update, and delete) operations plus more options, such as aggregation capabilities and bulk operations.
- Good for storing many or large documents and analyzing them
License: SSPL (Server Side Public License) since late 2018; earlier versions were released under the AGPL.
Mongos – the router of Mongo. Responsible for routing the queries to the suitable shard. Mongos can be replicated
Mongod – the primary daemon process. It handles data requests, manages data access, and performs background management operations. (Configuration server)
- By hash. Hashed sharding uses either a single-field hashed index or a compound hashed index (new in 4.4) as the shard key to partition data across the cluster. Hashed sharding provides a more even data distribution across the sharded cluster.
- By range (when range queries are needed). Range-based sharding divides data into contiguous ranges determined by the shard key values. In this model, documents with “close” shard key values are likely to be in the same chunk or shard. This allows for efficient queries where reads target documents within a contiguous range. However, both read and write performance may decrease with poor shard key selection. Choose the key carefully: historically it could not be changed once the collection was sharded.
Auto sharding: when a chunk grows beyond the configured size threshold, MongoDB migrates data to another shard.
Range-based sharding is the default sharding methodology if no other options, such as those required for hashed sharding or zones, are configured.
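The difference between the two strategies can be sketched as two small routing functions. This is a simplified model, not MongoDB's implementation: the shard names, the range bounds, and the use of MD5 as the hash are all assumptions chosen for the illustration.

```python
import bisect
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]

# Range sharding: split points between contiguous key ranges.
# Keys below 1000 go to shard-a, 1000..1999 to shard-b, the rest to shard-c.
RANGE_BOUNDS = [1000, 2000]


def range_shard(user_id: int) -> str:
    """Close key values land in the same range, hence on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]


def hashed_shard(user_id: int) -> str:
    """The key is hashed first, so close key values are scattered."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


ids = range(1500, 1510)
range_targets = {range_shard(i) for i in ids}    # shards touched by a range scan
hashed_targets = {hashed_shard(i) for i in ids}  # shards touched with hashing
```

A query for ids 1500 to 1509 touches a single shard under range sharding, while under hashed sharding the same ids are typically spread over several shards. That is the trade-off between efficient range reads and even distribution.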
Single Master Architecture (also called master-slave)
Master-slave replication enables data from one database server (the master) to be replicated to one or more other database servers (the slaves). The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates. Master-slave replication can be either synchronous or asynchronous. The difference is simply the timing of propagation of changes. If the changes are made to the master and slave at the same time, it is synchronous. If changes are queued up and written later, it is asynchronous. The replication in mongo is asynchronous.
Mongo automatic re-election - Replica sets use elections to determine which set member will become primary (master). Replica sets can trigger an election in response to a variety of events. The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries.
Write path:
To the primary node, with asynchronous replication to the secondaries (slaves).
Default (in versions before 5.0): a write succeeds once the master has applied it. Option: also require acknowledgement from a majority or a given number of nodes.
Read path:
By default, clients read from the primary. When the primary fails, the system is unavailable until a new one is elected.
(Default configuration)
Consistency - Reads and writes go through the master. In this way, Mongo ensures consistency.
Availability - When the master fails, the system is unavailable until a new master is elected (availability is affected).
Starting in MongoDB 4.0, multi-document transactions are available for replica sets.
Atomic - Multi-document transactions are atomic: a transaction will not commit some of its changes while rolling back others.
type: key-value
Redis is an open source, in-memory data structure store, used as a database, cache and message broker.
Cache or database?
Cache – Redis can be used as a distributed or in-memory cache.
Primary store - Data can be persisted to disk to make Redis safer (but also slower: a trade-off).
There are two persistence options; both or neither can be used.
The first option is snapshots: the data is dumped to disk once in a while (.rdb files). The second is logging every change (AOF – append-only file).
Saving the command log is more human friendly, but it uses more space and costs performance. Saving snapshots carries a higher risk of data loss and slows the system while the save process runs.
Restore data from disk:
AOF - Restore by re-executing all the commands in the file.
Snapshot - Restoring the data from a snapshot is fast and simple.
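The two restore paths can be illustrated with a toy store. This is a sketch of the idea only: `TinyStore` is hypothetical, JSON stands in for the .rdb format, and a Python list stands in for the AOF file.

```python
import json


class TinyStore:
    """In-memory key-value store with two toy persistence mechanisms."""

    def __init__(self):
        self.data = {}
        self.aof = []  # append-only log of every write command

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(["SET", key, value])

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(["DEL", key])

    def snapshot(self) -> str:
        """Point-in-time dump of the whole dataset."""
        return json.dumps(self.data)

    @classmethod
    def from_snapshot(cls, dump):
        """Fast restore: load the dump as-is."""
        store = cls()
        store.data = json.loads(dump)
        return store

    @classmethod
    def from_aof(cls, log):
        """Slower restore: replay every logged command in order."""
        store = cls()
        for command in log:
            if command[0] == "SET":
                store.data[command[1]] = command[2]
            elif command[0] == "DEL":
                store.data.pop(command[1], None)
        return store


store = TinyStore()
store.set("a", 1)
dump = store.snapshot()   # snapshot taken here
store.set("b", 2)         # these two writes happen after the snapshot
store.delete("a")

from_dump = TinyStore.from_snapshot(dump)
from_log = TinyStore.from_aof(store.aof)
```

After a crash, `from_dump` only knows about the state at snapshot time and has lost the later writes, while `from_log` reproduces the latest state at the cost of replaying the whole log.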
Redis supports different data structures: strings, hashes, lists, sets, and sorted sets.
Bucketing – store multiple key-value pairs in one hash (less key overhead)
Can be used for pub/sub messaging
Main point: Fast!
License: BSD (3-clause) up to Redis 7.2; later releases changed the licensing terms
Isolation - A basic transaction lets one client execute multiple commands without other clients interrupting them.
Atomic - Either all of the commands or none are processed
Locking - Optimistic locking
Optimistic – check and set, pessimistic – locks
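The check-and-set idea behind optimistic locking can be sketched without a server. Redis exposes the pattern through WATCH, MULTI and EXEC; the `VersionedStore` class below is a hypothetical stand-in that tracks a version per key, not the way Redis implements it.

```python
class VersionedStore:
    """Key-value store that bumps a version number on every write."""

    def __init__(self):
        self._values = {}
        self._versions = {}

    def watch(self, key):
        """Remember the version seen before starting the transaction."""
        return self._versions.get(key, 0)

    def get(self, key):
        return self._values.get(key)

    def set(self, key, value):
        self._values[key] = value
        self._versions[key] = self._versions.get(key, 0) + 1

    def check_and_set(self, key, watched_version, value):
        """Write only if nobody changed the key since it was watched."""
        if self._versions.get(key, 0) != watched_version:
            return False  # conflict: the caller should re-read and retry
        self.set(key, value)
        return True


store = VersionedStore()
store.set("counter", 10)

seen = store.watch("counter")
store.set("counter", 11)                              # another client writes first
first_try = store.check_and_set("counter", seen, 20)  # rejected

seen = store.watch("counter")
second_try = store.check_and_set("counter", seen, store.get("counter") + 1)
```

No lock is ever held: a conflicting writer simply makes the first attempt fail, and the retry succeeds on fresh data.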
To create a distributed Redis environment you have two options: Redis Sentinel (does not distribute data; it only provides monitoring and automatic failover) and Redis Cluster.
Redis Cluster is a complete clustering solution.
Redis Cluster principles:
Master-Slave Architecture
The client can connect to any node; if that node cannot handle the query, it points the client to the correct node. Replication is asynchronous.
Auto healing. A quorum needs to agree on a master election, and part of the system may become unavailable until the re-election ends.
Cluster bus - Every node is connected to every other node in the cluster using the cluster bus. The Cluster bus is used by nodes for failure detection, configuration update and failover authorization
Write path: To the master, which replicates asynchronously to the slaves. A master failure before the data has been streamed to a slave leads to data loss. Mechanisms such as multipath writes and disk replication can be used to prevent it. There is an option to strengthen consistency with ‘WAIT’: the write succeeds only after it has been replicated to other nodes.
Read path: From the master. Reads can be scaled with read-only nodes, which accept reads but may return stale data (eventual consistency).
Redis uses “hash slots” to map keys to nodes (similar to consistent hashing).
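As a rough sketch of the slot mapping: Redis Cluster divides the key space into 16384 slots and computes the slot as CRC16 of the key modulo 16384, hashing only the part inside `{...}` when a non-empty hash tag is present. The code below uses `binascii.crc_hqx` with an initial value of 0, which is the same CRC16 variant; the three-node slot layout and the `node_for` helper are assumptions made for the example.

```python
import binascii

NUM_SLOTS = 16384


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # A non-empty {tag} means only the tag is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


# Hypothetical layout: three masters, each owning a contiguous slot range.
SLOT_RANGES = [
    (range(0, 5461), "node-a"),
    (range(5461, 10923), "node-b"),
    (range(10923, 16384), "node-c"),
]


def node_for(key: str) -> str:
    slot = key_slot(key)
    for slots, node in SLOT_RANGES:
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} is not assigned")
```

Because the key-to-slot mapping never changes, adding or removing a node only means moving whole slots between nodes. Hash tags let related keys such as `{user1000}.following` and `{user1000}.followers` share a slot.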
Redis Cluster was designed as a general solution for high availability and horizontal scalability for all users of the database. As such, it was designed from the ground up with the major value additions to Redis in mind: performance and a strong data model. Because of this, Redis Cluster implements neither true availability nor consistency of the CAP theorem.
Availability – A quorum needs to agree on a new master when one is elected. During that time, part of the system may become unavailable.
Consistency – It is possible to lose data if a failure occurs after a master has acknowledged a write but before replication has completed.
https://www.credera.com/blog/technology-insights/open-source-technology-insights/an-introduction-to-redis-cluster/
https://redisessentials.com/ - Chapter 9
Lucene search concept: instead of searching the text directly, it searches an index.
Index: an index contains a sequence of documents.
Document: a document is a sequence of fields.
Inverted index: instead of mapping pages -> words, it maps word -> pages.
Search: title=”art” -> finds all the documents whose title contains the term.
TF-IDF is used for relevance.
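A minimal sketch of an inverted index with a TF-IDF style score, assuming whitespace tokenization and three made-up documents. It illustrates the concept only; Lucene's real analyzers and scoring formula are considerably more elaborate.

```python
import math
from collections import Counter, defaultdict

docs = {
    1: "art of war",
    2: "modern art and design",
    3: "history of war",
}

# Inverted index: term -> {document id: occurrences of the term}.
inverted = defaultdict(dict)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        inverted[term][doc_id] = count


def search(term):
    """Return matching document ids, best score first."""
    postings = inverted.get(term, {})
    if not postings:
        return []
    # Rare terms weigh more: inverse document frequency.
    idf = math.log(len(docs) / len(postings))
    scores = {
        doc_id: (count / len(docs[doc_id].split())) * idf  # term frequency * idf
        for doc_id, count in postings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A lookup never scans the document texts; it goes straight from the term to its posting list. In this sketch document 1 ranks above document 2 for "art" because the term makes up a larger share of the shorter text.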
Search engine based on Lucene.
Started as a horizontally scalable Lucene.
Very fast for search operations.
Not just for text search! Can handle structured/aggregated data
Part of the Elastic Stack – Kibana (UI visualization), Logstash and Beats (streaming data into Elasticsearch), X-Pack (security, alerting, monitoring)
Interface - Client API/Restful API + JSON/Analytic tools (like Kibana)
Lucene terms:
Document - The stored document. (document is equivalent to a row in RDBMS)
Field - A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (eg a string, integer, date), or a nested structure like an array or an object.
Term - A term is an exact value that is indexed in Elasticsearch. Terms can be searched for using term queries.
Index – Like a DB table. Contains inverted indexes that let you search across all documents. It is managed as a collection of shards: an index is a logical namespace that points to primary and replica shards.
Type – A type defines the schema and mapping shared by documents that represent the same thing (a log entry, an article…). From Elasticsearch 6, only one type is allowed per index.
License: Apache 2.0 (for versions before 7.11; later releases changed the licensing terms)
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by Elasticsearch.
By default, an index has one primary shard. You can specify more primary shards to scale the number of documents that your index can handle.
Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of a node failure, or the addition of new nodes.
Elasticsearch routes documents to shards by hashing the document ID. This can be overridden by specifying a routing value.
Single master architecture
Automatic failover. Elasticsearch selects a new primary.
Read path - Requests are distributed round-robin among the nodes; any node can handle read requests.
Write path – To the primary. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard.
Scale writes - The number of primary shards cannot be changed later, so it is difficult to scale writes. In the worst case, re-index.
Read heavy is a good case – add more replicas.
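Why the number of primary shards is fixed follows from the routing rule, which in simplified form is `shard = hash(routing) % number_of_primary_shards`. The sketch below uses `zlib.crc32` as a stand-in hash (an assumption; Elasticsearch uses its own hash function), and the function names are invented for the example.

```python
import zlib


def shard_from_hash(hash_value: int, num_primary_shards: int) -> int:
    """Simplified routing rule: the shard is the hash modulo the shard count."""
    return hash_value % num_primary_shards


def shard_for(routing: str, num_primary_shards: int) -> int:
    # zlib.crc32 is a stand-in hash chosen only for this sketch.
    return shard_from_hash(zlib.crc32(routing.encode()), num_primary_shards)


# The same hash value lands on a different shard once the count changes,
# so documents indexed earlier could no longer be found by routing alone.
with_five_shards = shard_from_hash(7, 5)
with_six_shards = shard_from_hash(7, 6)
```

Since the divisor is part of every lookup, changing it would invalidate the location of existing documents, which is why re-indexing is the fallback.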
type: column-oriented DB
Apache Cassandra is an open-source, distributed, NoSQL database.
It presents a partitioned wide column storage model with eventually consistent semantics.
License: Apache
Cassandra makes the following guarantees:
- High Scalability - nodes can be added or removed as needed. Throughput increases linearly as nodes are added
- High Availability - fault-tolerant storage system
- Durability - guarantees data durability by using replicas
- Eventual Consistency of writes to a single table
- Lightweight transactions with linearizable consistency (Isolation)
- Batched writes across multiple tables are guaranteed to succeed completely or not at all
- Secondary indexes are guaranteed to be consistent with their local replicas data
Main point: Store huge datasets in “almost SQL”.
Query language: CQL, an SQL-like language.
Cassandra terms:
- Keyspace: defines how a dataset is replicated, for example in which datacenters and how many copies. Keyspaces contain tables.
- Column Family (Table): defines the typed schema for a collection of partitions. New columns can be added to Cassandra tables flexibly, with zero downtime. Tables contain partitions, which contain columns.
- Partition key: a key that every row in Cassandra must have. All performant queries supply the partition key in the query.
- Row: a collection of columns identified by a unique primary key.
- Column: A single datum with a type that belongs to a row.
In Cassandra, users can safely add columns to existing Cassandra databases while remaining confident that query performance will not degrade.
Used at: Netflix, eBay, GitHub, Instagram, Reddit, and more…
Cassandra uses consistent hashing to distribute the data across the nodes. The key used for partitioning is called the partition key, and it doesn’t have to be unique. For example, if in the future I want to get all the employees with car = ‘BMW’, I can choose the car type as the partition key, which guarantees that those employees are stored on the same node.
Replication factor - defines how many replicas.
Rack - a cluster of connected machines in a data center. A data center contains multiple racks. A rack contains one or more nodes.
Replication strategy - defines on which nodes the data is replicated:
- Simple Strategy - For any keyspace, define how many replicas.
For example, if we have an eight node cluster, and a replication factor of 3, then to find the owning nodes for a key we first hash that key to generate a token (which is just the hash of the key), and then we “walk” the ring in a clockwise fashion until we encounter three distinct nodes, at which point we have found all the replicas of that key.
- Network topology strategy - allows a replication factor to be specified for each datacenter in the cluster. It also attempts to choose replicas within a datacenter from different racks to maximize availability. If the number of racks is greater than or equal to the replication factor for the data center, each replica is guaranteed to be chosen from a different rack.
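The ring walk described for the simple strategy can be sketched as follows. This is a simplified model, assuming one token per node and MD5 as the hash; real Cassandra uses its own partitioner and usually several virtual nodes per machine. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a key or node name to a position on the ring (MD5 is a stand-in)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        # One token per node, sorted clockwise around the ring.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, key, replication_factor):
        """Hash the key, then walk clockwise collecting distinct nodes."""
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        owners = []
        for step in range(len(self.tokens)):
            node = self.tokens[(start + step) % len(self.tokens)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = Ring([f"node-{i}" for i in range(8)])
owners = ring.replicas("employee:42", 3)
```

With eight nodes and a replication factor of 3, `owners` holds the three distinct nodes found clockwise from the key's token. A network topology strategy would add a rack and datacenter check to the same walk.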
Consistency Levels
In Cassandra, you choose a consistency level, which lets the operator pick read (R) and write (W) behavior without knowing the replication factor.
Generally, writes will be visible to subsequent reads when the read consistency level contains enough nodes to guarantee a quorum intersection with the write consistency level.
Consistency levels - https://cassandra.apache.org/doc/latest/architecture/dynamo.html?highlight=rack#tunable-consistency.
Write path: writes are always sent to all replicas, regardless of consistency level. The consistency level simply controls how many responses the coordinator waits for before responding to the client.
Read path: the coordinator generally only issues read commands to enough replicas to satisfy the consistency level.
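The quorum intersection rule reduces to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when R + W is greater than the replication factor. A small sketch (the function names are made up for the example):

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_acks: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return write_acks + read_replicas > replication_factor


rf = 3
quorum_both = reads_see_latest_write(quorum(rf), quorum(rf), rf)  # QUORUM / QUORUM
one_both = reads_see_latest_write(1, 1, rf)                       # ONE / ONE
all_then_one = reads_see_latest_write(rf, 1, rf)                  # ALL / ONE
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap in at least one replica, while ONE plus ONE may miss the latest write and is only eventually consistent.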
Failure detection - based on a gossip protocol
Isolation - with compare and set (CAS)
Commitlogs are an append only log of all mutations local to a Cassandra node.
Any data written to Cassandra will first be written to a commit log before being written to a memtable. This provides durability in the case of unexpected shutdowns.
Memtables are in-memory structures where Cassandra buffers writes.
In general, there is one active memtable per table.
Memtables may be stored entirely on-heap or partially off-heap, depending on the configuration.
SSTables are the immutable data files that Cassandra uses for persisting data on disk.
SSTables are flushed to disk from Memtables or are streamed from other nodes.
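The interplay of commit log, memtable and SSTables can be sketched with a toy node. This is a simplified model under stated assumptions: everything lives in memory, the flush threshold counts keys, and compaction and commit-log cleanup are left out. `TinyNode` is hypothetical.

```python
class TinyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; on a real node this is on disk
        self.memtable = {}     # in-memory write buffer
        self.sstables = []     # immutable sorted "files", newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Then apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush when the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as an immutable, sorted SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for stored_key, value in sstable:
                if stored_key == key:
                    return value
        return None


node = TinyNode()
node.write("a", 1)
node.write("b", 2)   # memtable reaches the limit and is flushed
node.write("a", 3)   # newer value lives in the fresh memtable
```

Writes are sequential appends and in-memory updates, which keeps them cheap; reads may have to consult several structures, and the newest value wins.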
Cassandra chooses Availability and Partition Tolerance from the CAP.
Type: document store
Couchbase Server is an open source, distributed data-platform.
Language: N1QL (sql-like).
Data document type: JSON
Data can be retained either in memory only, or in both memory and storage.
memory-first, async architecture
Data can be replicated across the nodes of the cluster / data centers
Couchbase is straightforward to deploy and manage. Features such as replication are built in with automatic sharding.
Full text search
Parallel query processing - for executing complex, long-running queries that contain complex joins, set, aggregation, and grouping operations.
High availability - all operations can be done while the system remains online.
Used at: LinkedIn, PayPal, eBay
CouchBase Services:
The services can be deployed and maintained independently of one another (mix & match)
- Data: Supports the storing, setting, and retrieving of data-items, specified by key.
- Query: Parses queries specified in the N1QL query-language, executes the queries, and returns results. The Query Service interacts with both the Data and Index services.
- Index: Creates indexes, for use by the Query Service.
- Search: Creates indexes specially purposed for Full Text Search. This supports language-aware searching; allowing users to search for, say, the word beauties, and additionally obtain results for beauty and beautiful.
- Analytics: Supports join, set, aggregation, and grouping operations, which are expected to be large, long-running, and highly consumptive of memory and CPU resources.
- Eventing: Supports javascript callbacks on mutations and deletions.
- Cluster Manager: control plane. Runs on all the nodes of a cluster, maintaining node processes and coordinating cluster-wide operations like leader election.
- Master Services: rebalance, auto failover, sharding/data partitioning
- Bucket services
Buckets: These store data persistently, as well as in memory.
Ephemeral buckets: To be used whenever persistence is not required
Memcached buckets: Designed to be used alongside other database platforms, such as ones employing relational database technology.
By caching frequently-used data, Memcached buckets reduce the number of queries a database-server must perform.
Clients writing to Couchbase Server can optionally specify durability requirements, which instruct Couchbase Server to update the specified document on multiple nodes in memory and/or disk locations across the cluster, before considering the write to be committed. The greater the number of memory and/or disk locations specified in the requirements, the greater the level of durability achieved.
Couchbase Server acts like a CP system in its default configuration. This is because any access to a given key (read, write, update, delete) is always directed to the node that hosts that active data at that point in time.
Any write is also replicated within the cluster, but these replicas are primarily for the purpose of high availability and by default do not service any traffic until made active.
If a single node fails, the data on a node that failed will not accept writes until the node is failed over, although reads can be serviced from replicas if desired.
Clients read from and write to the database nodes directly (the mapping is kept by the Cluster Manager).
Couchbase Server has a flat topology with a single node type. All the nodes play the same role in a cluster, that is, all the nodes are equal and communicate to each other on demand.
One node is configured with several parameters as a single-node cluster. The other nodes join the cluster and pull its configuration. After that, the cluster is operated by connecting to any of the nodes via Web UI or CLI.
Up to three replica buckets can be defined for every bucket.
Each replica itself is also implemented as 1024 vBuckets.
Typically, only active vBuckets are accessed for read and write operations, although replica vBuckets are able to support read requests.
Replica vBuckets receive a continuous stream of mutations from the active vBucket and are kept constantly up to date.
Automatic sharding
The shards are called vBuckets.
The Cluster Manager keeps a vMap (hash table) that maps each vBucket to a node.
key -> hash (values 0-1023) = vBucket -> node.
Couchbase partitions data into 1,024 virtual buckets, or vBuckets, and assigns them to nodes.
Like MongoDB chunks, vBuckets store all data within a specific range. However, Couchbase Server assigns all 1,024 virtual buckets when it is started, and it will not reassign vBuckets unless an administrator initiates the rebalancing process.
Couchbase Server clients maintain a cluster map that maps vBuckets to nodes. As a result, there is no need for routers or config servers. Clients communicate directly with nodes.
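The key -> vBucket -> node lookup can be sketched like this. It is a simplification: `zlib.crc32` modulo 1024 stands in for the exact hash Couchbase clients use, and the round-robin map builder and node names are assumptions made for the example.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    """Hash the key to one of the 1,024 vBuckets (stand-in hash)."""
    return zlib.crc32(key.encode()) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    """Assign every vBucket to a node; here simply round-robin."""
    return [nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)]


def node_for(key: str, vbucket_map) -> str:
    """What a client does: key -> vBucket -> node, with no router in between."""
    return vbucket_map[vbucket_for(key)]


three_nodes = build_vbucket_map(["node-1", "node-2", "node-3"])
# A rebalance changes the map, not the key-to-vBucket hashing.
four_nodes = build_vbucket_map(["node-1", "node-2", "node-3", "node-4"])
```

Because a key always hashes to the same vBucket, a rebalance only has to move whole vBuckets and publish a new map to the clients.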
Failure detection: failover can be performed manually or automatically, using the built-in automatic failover process. Auto failover acts after a preset time, when a node in the cluster becomes unavailable.
Insertion, mutation, and removal of multiple documents can be staged inside a transaction.
Until the transaction is committed, these changes will not be visible to other transactions, or to any other part of the Couchbase Data Platform (no dirty reads).
Full text search words/documents.
Availability - the percentage of time that an asset is operating, compared to its total scheduled operation time. Alternatively, availability can be defined as the duration of time that a plant or particular equipment is able to perform its intended tasks.
Reliability - Reliability quantifies the likelihood of equipment to operate as intended without disruptions or downtime. In other words, reliability can be seen as the probability of success and the dependability of an asset to continuously be operational, without failures, for a period of time.
Durability - Durability refers to long-term data protection, i.e. the stored data does not suffer from bit rot, degradation or other corruption. Rather than focusing on hardware redundancy, it is concerned with data redundancy so that data is never lost or compromised.
Distributed Hashing - server = hash (key) modulo (number of servers)
Consistent Hashing : Distribution of data across a set of nodes in such a way that minimizes the re-mapping/ reorganization of data when nodes are added or removed
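The difference between the two schemes shows up when a server is added. The sketch below assumes MD5 as the hash, one ring position per server, and hypothetical server names `s1` to `s5`; it compares which keys change owner under each scheme.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_owner(key, servers):
    """Distributed hashing: server = hash(key) modulo number of servers."""
    return servers[stable_hash(key) % len(servers)]


def ring_owner(key, servers):
    """Consistent hashing: first server clockwise from the key's position."""
    ring = sorted((stable_hash(server), server) for server in servers)
    index = bisect.bisect_right(ring, (stable_hash(key), ""))
    return ring[index % len(ring)][1]


keys = [f"key-{i}" for i in range(1000)]
before = ["s1", "s2", "s3", "s4"]
after = before + ["s5"]

moved_modulo = [k for k in keys if modulo_owner(k, before) != modulo_owner(k, after)]
moved_ring = [k for k in keys if ring_owner(k, before) != ring_owner(k, after)]
```

With the modulo scheme, changing the divisor reshuffles keys between all servers. On the ring, a key moves only if the new server becomes its nearest clockwise neighbor, so every entry of `moved_ring` ends up on `s5` and the other servers keep their remaining keys.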