NoSQL

From Citizendium
Revision as of 15:53, 28 July 2010 by imported>Charles Treatman (→‎BerkeleyDB)

This article is currently being developed as part of an Eduzendium student project. The course homepage can be found at CZ:Special_Topics_2010.
To provide students with experience in collaboration, you are warmly invited to join in here, or to leave comments on the discussion page. The anticipated date of course completion is 13 August 2010. One month after that date at the latest, this notice shall be removed.



NoSQL refers to a number of non-relational, distributed database architectures. Rather than supporting relations, NoSQL architectures usually store data in simpler forms such as key-value pairs. Some systems give up the guarantee of strict consistency, promising eventual consistency instead, in order to increase scalability. Because data are distributed across many machines, such data stores can be highly scalable and fault-tolerant.

History

NoSQL vs. RDBMS

In many cases, NoSQL databases can process data more quickly than traditional relational database management systems (RDBMSs). One reason is that data representation in NoSQL databases is much simpler than in relational systems. For example, a table in a relational database might have many columns, but an entry in a key-value store always has only two parts: the key and the value. In addition, many NoSQL databases do not fully support ACID transactions. While this allows faster performance, it can be risky in applications that require strict correctness guarantees, such as banking.[1]

Disadvantages of NoSQL

Relationship to cloud computing

Types of NoSQL Databases

Key-value Store

A key-value store maintains data as pairs, each consisting of an indexed key and a value. In general, the interface is minimal: a value is stored, fetched, or deleted by its key, and the central read operation is fetching a single value using its key. Some key-value store implementations also include mechanisms for performing a join on two distinct tables. Examples of key-value stores include Oracle's BerkeleyDB and Amazon's Dynamo.
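The key-based interface described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of BerkeleyDB or Dynamo: the class name `KeyValueStore` and its method names are hypothetical, and a plain dictionary stands in for the on-disk index. The `put` and `delete` methods are included so that the store can be populated and cleaned up.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # A dict plays the role of the index that maps each key to its value.
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        """Store a value under a key, replacing any previous value."""
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Fetch the single value stored under a key, or None if absent."""
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        """Remove a key and its value; do nothing if the key is absent."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put(b"user:42", b'{"name": "Ada"}')
profile = store.get(b"user:42")
```

The store never interprets the value: a whole record is serialized into one opaque value and looked up only by its key, which is what keeps the data representation simple.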

BerkeleyDB

BerkeleyDB is an open source, transactional, embedded database engine. It is available as a library that can be linked into any application. Data are represented as key/value pairs. The engine treats keys and values as uninterpreted byte strings, so they can hold any object that the application's programming language can serialize. Data are stored in files on disk, with a single file for each key-value store. BerkeleyDB also provides the option to maintain a data store in memory only, if the store is small enough to fit in main memory.

BerkeleyDB provides a number of features competitive with relational databases, including support for transactions, two-phase locking, joins, and write-ahead logging. These features allow BerkeleyDB to keep data consistent under concurrent access and to recover after a failure. The BerkeleyDB engine is used in a number of applications. For example, the MySQL database management system has offered BerkeleyDB as an option for the storage engine.[2]

BerkeleyDB itself does not provide a method for distributing data, but a distributed hash table can be used to spread data across multiple BerkeleyDB instances.
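One common way to build such a layer is consistent hashing, in which each key is hashed onto a ring and assigned to the next node clockwise. The sketch below is an assumption about how this could be done, not a description of any BerkeleyDB feature: `HashRing` and `ShardedStore` are hypothetical names, and ordinary Python dictionaries stand in for the separate BerkeleyDB instances.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple


def _hash(text: str) -> int:
    """Map a string to a point on the ring using the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with several virtual points per node."""

    def __init__(self, node_names: List[str], replicas: int = 64) -> None:
        # Each node is placed on the ring `replicas` times to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in node_names
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class ShardedStore:
    """Routes each key to one of several stores (dicts stand in for instances)."""

    def __init__(self, node_names: List[str]) -> None:
        self._ring = HashRing(node_names)
        self._nodes: Dict[str, Dict[str, bytes]] = {n: {} for n in node_names}

    def put(self, key: str, value: bytes) -> None:
        self._nodes[self._ring.node_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._nodes[self._ring.node_for(key)].get(key)


cluster = ShardedStore(["node-a", "node-b", "node-c"])
cluster.put("user:42", b"Ada")
owner = cluster._ring.node_for("user:42")  # same node on every lookup
```

Because a key's position depends only on its hash, any client holding the same node list routes a given key to the same instance, and adding or removing a node moves only the keys adjacent to that node's ring points.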

Dynamo

Column-oriented Databases

A column-oriented database stores data by column rather than by row. Compared with row-oriented storage, this layout makes data aggregation easier and improves disk performance, because a query reads only the columns it needs. Examples of open-source and commercial column-oriented databases include Cassandra (Facebook), Bigtable (Google), Hypertable (an open-source implementation of Bigtable), and HBase (an open-source implementation of Bigtable).
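The difference between the two layouts can be shown with a small in-memory example. This is only a sketch of the idea: the table contents and the helper `to_columns` are hypothetical, and a real column store would also compress columns and keep them on disk.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record keeps all of its fields together.
rows: List[Dict[str, Any]] = [
    {"id": 1, "city": "Oslo", "sales": 120},
    {"id": 2, "city": "Lima", "sales": 80},
    {"id": 3, "city": "Oslo", "sales": 200},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot records into one list per column (assumes every record has the same fields)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one column are stored contiguously.
columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_from_rows = sum(record["sales"] for record in rows)

# Column layout: only the "sales" column is touched.
total_from_columns = sum(columns["sales"])
```

Both totals are the same; the difference is how much data has to be read to compute them, which is where the disk performance advantage for aggregation comes from.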

Bigtable

Bigtable is a distributed storage system for managing structured data, designed to scale reliably to petabytes of data. It has been developed by Google since 2005 and is used by more than 60 Google products.

Bigtable is a multi-dimensional sorted map indexed by a row key, a column key, and a timestamp. For example, to store a large collection of web pages, we would use URLs as row keys and various aspects of the pages as column names; the contents of each page are stored in a column under the timestamp at which the page was fetched. Every read or write of data under a single row key is atomic. Columns can be added dynamically.

Timestamps distinguish different versions of the same data; they can be assigned by Bigtable or by the client application, and older versions can be garbage-collected. Rows are sorted lexicographically by row key, and ranges of consecutive row keys are grouped together as "tablets". Column keys are grouped into sets called "column families", and a column key is named using the syntax family:qualifier. Access control and disk and memory accounting are performed at the column family level.

The API includes functions for creating and deleting tables and column families, and for changing cluster, table, and column family metadata such as access control rights. Client operations include writing and deleting values, reading values, scanning row ranges, and single-row transactions. Bigtable can also be used together with MapReduce.
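The data model can be approximated by a small in-memory structure. The sketch below is not Google's implementation or API: `SortedVersionedMap`, its methods, and the `max_versions` setting are hypothetical, and it ignores tablets, distribution, and access control. It only shows the mapping from (row key, column key, timestamp) to a value, the garbage collection of old versions, and scans over a sorted row range.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class SortedVersionedMap:
    """Toy map: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self, max_versions: int = 3) -> None:
        self._max_versions = max_versions
        # row key -> column key -> [(timestamp, value), ...], newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        cells = self._rows.setdefault(row, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self._max_versions:]  # garbage-collect the oldest versions

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value, or the newest one at or before `timestamp`."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row: str,
             end_row: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield rows with start_row <= key < end_row in lexicographic order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                latest = {col: cells[0][1] for col, cells in self._rows[row].items()}
                yield row, latest


# Hypothetical web-page table: one row per page, contents kept per fetch time.
pages = SortedVersionedMap(max_versions=2)
pages.put("com.example.www", "contents:html", 1, b"<html>v1</html>")
pages.put("com.example.www", "contents:html", 2, b"<html>v2</html>")
pages.put("com.example.blog", "contents:html", 5, b"<html>blog</html>")
newest = pages.get("com.example.www", "contents:html")
```

Keeping rows sorted by key is what makes range scans cheap: rows whose keys share a prefix, such as pages from the same site, sit next to each other and can be read together.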

Cassandra

Document-based Stores

Future perspective

References

  1. Leavitt, N. "Will NoSQL Databases Live Up to Their Promise?" Computer.
  2. Olson, M. A., Bostic, K., and Seltzer, M. "Berkeley DB." USENIX.