NoSQL: Databases for Digital Universe
Indra Prakash Tiwari
Data never stops growing. With billions of digital devices already in existence and the rapid emergence of the Internet of Things, the amount of data being generated today is enormous. Some estimates hold that we have generated more data in the past two years than in all of prior history combined, so the real challenge lies in what we do with this data.
RDBMS has served the needs of enterprises well in the past, but with the rise of unstructured data and the need for real-time analytics, NoSQL databases have become a favorite among businesses, especially for big data and real-time web applications.
What is NoSQL?
Wikipedia states that NoSQL databases provide a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Note also that NoSQL is not a single system or solution; it is a class of database management systems that differ from the classic RDBMS.
Basic features of NoSQL:
- No schema required: A NoSQL database lets you insert data without first spending time defining a rigid database schema.
- Auto elasticity: A NoSQL database can spread your data across multiple servers without requiring help from the application. Servers can typically be added to or removed from the data layer without application downtime.
- Integrated caching: Many NoSQL databases cache data in system memory to increase performance and overall throughput. This differs from relational databases, where caching is usually handled by a separate infrastructure tier.
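To make the "no schema required" point concrete, here is a minimal sketch in Python. The `DocumentStore` class is a hypothetical in-memory stand-in written for this article, not the API of any real NoSQL product; it only illustrates that two records with different fields can sit side by side without a table definition.

```python
from typing import Any


class DocumentStore:
    """Hypothetical in-memory stand-in for a schemaless document database."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def insert(self, doc: dict[str, Any]) -> int:
        """Store a copy of the document as-is; no schema is checked."""
        doc_id = self._next_id
        self._docs[doc_id] = dict(doc)
        self._next_id += 1
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents whose fields equal every given criterion."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentStore()
# Two customers with different fields; neither shape was declared up front.
store.insert({"name": "Asha", "email": "asha@example.com"})
store.insert({"name": "Ravi", "phones": ["555-0100"], "plan": "gold"})

gold_customers = store.find(plan="gold")
```

In a relational database, adding the `phones` or `plan` field would normally mean altering the table first; here the second record simply carries extra fields.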
To understand how a NoSQL database fits into an application's architecture, it helps to understand the separation of concerns between data storage and data management. SQL-based databases attempt to satisfy both concerns within the database itself, which is difficult to do well.
Why NoSQL Database?
NoSQL databases have received a lot of attention over the past few years for their performance, schema flexibility, scalability and analytics capabilities. Even though relational databases have dominated the IT industry for the last 40 years, application developers are increasingly switching to NoSQL databases to achieve the following:
- Rapid application development due to a flexible data schema model
- The ability to scale out to support large volumes of data and large numbers of users
- Improved application performance to meet the expectations of users who need highly responsive applications, even during complex data processing
NoSQL’s scalability and performance advantage
An application and its underlying database need to be upgraded over time to support a growing number of users and a growing volume of data. This is typically achieved in one of two ways: scale up or scale out. Scaling up means moving to bigger servers with faster processors, more RAM and so on, whereas scaling out follows a distributed approach in which more physical (or virtual) servers of average capacity are added.
The diagrams below show how scale up and scale out work in both RDBMS and NoSQL:
With relational databases, supporting more users or storing more data generally requires a bigger server with additional CPUs, memory and disk storage. NoSQL offers a more linear approach to scaling than RDBMS: capacity grows by adding servers rather than by replacing them with larger ones.
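One common way to spread keys across servers in a scale-out design is consistent hashing. The sketch below is an illustration under that assumption; the article does not say which technique any particular product uses, and the `HashRing` class, server names and replica count are all hypothetical. The point it shows is that adding a server moves only part of the data, and only onto the new server.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Hash a string to an integer that is the same on every run."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to server names."""

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place several virtual points for the node on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        index = bisect.bisect(self._ring, (stable_hash(key), ""))
        if index == len(self._ring):
            index = 0  # wrap around to the start of the ring
        return self._ring[index][1]


ring = HashRing(["server-1", "server-2", "server-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

# Scale out: add one more average-sized server instead of a bigger one.
ring.add_node("server-4")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in `moved` now lives on `server-4`; keys that stayed put did not have to be copied anywhere, which is what keeps adding servers cheap.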
Best use cases for NoSQL databases
- The data stored is semi-structured or unstructured in nature
- The applications that access this data require a certain level of performance and scalability
- The applications that access this data can tolerate eventual consistency
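Eventual consistency, mentioned in the last point, can be sketched as follows. This is a deliberately simplified toy model, assuming one primary copy and replicas that are updated later by an explicit `sync` step; the `ReplicatedStore` class is hypothetical and real systems replicate in the background with far more machinery.

```python
from typing import Optional


class ReplicatedStore:
    """Hypothetical store with one primary and lazily updated replicas."""

    def __init__(self, replica_count: int = 2) -> None:
        self.primary: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        self._pending: list[tuple[str, str]] = []  # writes not yet replicated

    def write(self, key: str, value: str) -> None:
        """Acknowledge the write once the primary has it."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, index: int, key: str) -> Optional[str]:
        """Read from a replica, which may lag behind the primary."""
        return self.replicas[index].get(key)

    def sync(self) -> None:
        """Apply all pending writes to every replica, in order."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore()
store.write("cart:42", "3 items")

stale = store.read_from_replica(0, "cart:42")  # replica has not caught up
store.sync()
fresh = store.read_from_replica(0, "cart:42")  # replicas have converged
```

An application that can live with the `stale` read for a short while is a good candidate for NoSQL; one that cannot, such as a funds transfer, usually needs stronger guarantees.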
NoSQL implementations are generally categorized as below:
- Document database – MongoDB, CouchDB, Couchbase, RavenDB, MarkLogic
- Graph database – Neo4j, InfiniteGraph, AllegroGraph
- Key-value data store – Riak, Redis, Dynamo, Oracle NoSQL Database, Voldemort, Aerospike
- Column-based database – Cassandra, HBase, Amazon SimpleDB, Apache Accumulo, Hypertable, Azure Tables
- In-memory data grid – Hazelcast, Oracle Coherence, Terracotta BigMemory, GemFire, Infinispan, GridGain, GigaSpaces, TIBCO
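To give a rough feel for how the first four categories differ, the sketch below models one made-up customer in each shape using plain Python structures. The record, field names and helper functions are invented for illustration and do not reflect the storage format of any product listed above; the in-memory data grid is left out because it differs mainly in where data lives rather than in its shape.

```python
# Document model: one self-contained, nested record per customer.
document = {
    "_id": "cust-1",
    "name": "Asha",
    "policies": [
        {"type": "auto", "premium": 420},
        {"type": "home", "premium": 610},
    ],
}

# Key-value model: opaque values looked up by a single key.
key_value = {
    "cust-1:name": "Asha",
    "cust-1:policy-count": "2",
}

# Column-based model: row key -> column family -> columns.
wide_column = {
    "cust-1": {
        "profile": {"name": "Asha"},
        "policies": {"auto:premium": 420, "home:premium": 610},
    },
}

# Graph model: nodes plus labeled edges between them.
graph_nodes = {
    "cust-1": {"label": "Customer", "name": "Asha"},
    "pol-1": {"label": "Policy", "type": "auto"},
    "pol-2": {"label": "Policy", "type": "home"},
}
graph_edges = [("cust-1", "HOLDS", "pol-1"), ("cust-1", "HOLDS", "pol-2")]


def total_premium(doc: dict) -> int:
    """Sum premiums by walking the nested document."""
    return sum(policy["premium"] for policy in doc["policies"])


def policies_held(customer_id: str) -> list[str]:
    """Follow HOLDS edges from a customer node to its policy nodes."""
    return [
        target
        for source, relation, target in graph_edges
        if source == customer_id and relation == "HOLDS"
    ]
```

The document form answers "everything about this customer" in one lookup, while the graph form is better suited to questions about relationships, such as which policies a customer holds.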
I recall a major insurance company that tried to integrate multiple databases into one consolidated, coherent view of its customers using RDBMS technology. The effort turned out to be unsuccessful. With MongoDB, they had a proof of concept up and running in just two weeks and were in production within 90 days, which improved customer satisfaction and boosted call center productivity.
With its many advantages, NoSQL does present itself as an alternative to RDBMS. However, it cannot fully replace relational databases, which remain a good choice for certain use cases, such as structured data and applications that require ACID transactions.