Distributed and Cloud Systems Micro Specialization | Week 6: Cloud Storage: Key-value Stores/NoSQL by Prakhar Chauhan | Learn Smarter
Week 6: Cloud Storage: Key-value Stores/NoSQL

Key-Value Stores provide a flexible, schema-less architecture designed for high scalability and availability, essential for cloud applications. Apache Cassandra and HBase serve as two prominent examples of Key-Value Stores, each with distinctive architectures and operational approaches to data management. The distinction between the eventual consistency of Cassandra and the strong consistency of HBase highlights different strategies in handling distributed data in cloud environments.

Sections

  • 1

    Design Of Key-Value Stores: Fundamentals And Apache Cassandra

    This section covers the design principles of Key-Value stores within NoSQL databases and specifically discusses Apache Cassandra's architecture and operations.

  • 1.1

    Key-Value Abstraction

    Key-Value Stores provide a flexible and scalable alternative to traditional relational databases, essential for modern cloud applications.

  • 1.2

    Key-Value/NoSQL Data Model

    This section explores the characteristics of Key-Value stores as part of NoSQL databases, emphasizing their design for scalability, availability, and flexibility in cloud applications.

  • 1.3

    Design Of Apache Cassandra: A Distributed Column-Family Store

    This section discusses the design principles and characteristics of Apache Cassandra, a distributed column-family store that excels in availability and scalability.

  • 1.4

    Data Model (Cassandra Specifics)

    This section outlines the data model specifics of Apache Cassandra within the context of NoSQL databases, focusing on its unique features, design principles, and operational characteristics.

  • 1.5

    Data Placement Strategies

    This section covers data placement strategies in Key-Value Stores with a focus on Apache Cassandra's methods for distributing and replicating data across its cluster.

  • 1.6

    Snitches

    This section covers the concept of 'snitches' in Apache Cassandra, highlighting their role in determining network topology for efficient data replication.

  • 1.7

    Writes In Cassandra

    This section discusses the write process in Cassandra, highlighting its architecture and mechanisms to ensure high availability and durability.

  • 1.8

    Bloom Filter

    The Bloom filter is a probabilistic data structure that efficiently determines if an element may be part of a set, which impacts Cassandra's read operations by reducing unnecessary disk I/O.

  • 1.9

    Compaction

    Compaction in Cassandra is a process that consolidates multiple SSTables into a single, more efficient SSTable to optimize read operations and manage disk space.

  • 1.10

    Deletes

    This section covers how deletes are handled in distributed data stores, focusing on how Cassandra manages data deletion.

  • 1.11

    Reads In Cassandra

    This section focuses on the reading processes and mechanisms of Apache Cassandra, outlining its architecture, consistency levels, and the usage of components such as Bloom filters.

  • 1.12

    Membership

    The Membership section explains how Cassandra manages cluster membership and node failure detection using a Gossip protocol.

  • 1.13

    CAP Theorem

    The CAP theorem states that in distributed systems, it's impossible for a data store to simultaneously provide all three guarantees: Consistency, Availability, and Partition Tolerance.

  • 1.14

    Eventual Consistency

    Eventual consistency is a relaxed consistency model that guarantees that, over time, all replicas within a distributed system will converge to the same state, despite temporary inconsistencies after updates.

  • 1.15

    Consistency Levels In Cassandra

    This section explores the various consistency levels in Cassandra, detailing how they affect write and read operations in distributed systems.

  • 1.16

    Consistency Solutions (General Techniques)

    This section explores the general techniques used to achieve consistency in distributed databases, particularly in the context of NoSQL systems like Cassandra.

  • 2

    Design Of HBase: A Distributed Column-Oriented Database On HDFS

    Apache HBase is a distributed, column-oriented database that operates on HDFS, providing strong consistency and scalable access to large datasets.

  • 2.1

    What Is HBase?

    HBase is a distributed, non-relational database modeled after Google's Bigtable, designed for random real-time access to large datasets.

  • 2.2

    HBase Architecture

    This section outlines HBase architecture, highlighting its components, data model, and operational characteristics.

  • 2.3

    HBase Components (Detailed)

    This section details the components of HBase, emphasizing its architecture, data model, and operational characteristics.

  • 2.4

    Data Model (HBase Specifics)

    This section covers the data model of HBase, highlighting its architecture, components, and key features.

  • 2.5

    Storage Hierarchy

    This section discusses the storage hierarchy in HBase, covering its architecture, components, and data model.

  • 2.6

    Cross-Datacenter Replication

    Cross-datacenter replication in HBase allows for asynchronous data replication between distinct clusters to enhance disaster recovery and improve read access in distributed systems.

  • 2.7

    Auto Sharding And Distribution

    This section covers auto sharding and distribution techniques in HBase, highlighting how tables are partitioned and regions are assigned for efficient data handling.

  • 2.8

    Bloom Filter (In HBase)

    Bloom filters in HBase are probabilistic data structures that determine whether a certain row key may exist in an HFile, significantly enhancing read performance.

  • 2.9

    Fold, Store, And Shift (A Conceptual Summary Of HBase's Write And Read Paths)

    The section outlines the conceptual processes of writing and reading data in HBase, emphasizing the terms 'Fold', 'Store', and 'Shift' to describe these operations.
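
Sections 1.8 and 2.8 both describe Bloom filters as probabilistic structures that report whether a key may be present in an on-disk file (an SSTable in Cassandra, an HFile in HBase), so that a read can skip files that definitely do not hold the key. The following is a minimal sketch of that idea, not the implementation used by either system. The class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and sizes are hypothetical."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not space-efficient).
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with an index.
        # SHA-256 is an assumption chosen for portability, not what
        # Cassandra or HBase use internally.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A read path built on this sketch would call `might_contain(row_key)` once per file and touch the disk only for files that answer True. False positives cost an unnecessary read, while false negatives cannot occur, which is why the filter is safe to use as a pre-check.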

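Section 1.15 notes that Cassandra's consistency levels affect both reads and writes. One common way to reason about them is the replica-overlap rule: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write to W replicas when R + W > N. The sketch below works through that arithmetic for the ONE, QUORUM, and ALL levels. The function names are hypothetical helpers, not part of any Cassandra driver API.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads paired with QUORUM writes give 2 + 2 > 3, so the sets overlap. ONE paired with ONE gives 1 + 1 = 2, which does not exceed 3, so a read may return stale data until the replicas converge.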

What we have learnt

  • Key-Value Stores are design...
  • Apache Cassandra operates w...
  • Distributed systems must ba...
