In choosing between SQL vs. NoSQL, you’ll need to think about what your data looks like, how you’ll query it, and your scalability needs.
This is the third article of our Big Data in the Cloud series. You can read the first two here:
- Big Data in the Cloud: Why Cloud Computing is the Answer to Your Big Data Initiatives
- 8 Things You Need to Do Before Kicking Off Your Cloud Big Data Initiative
To get notified of our future Big Data in the Cloud posts, click here to sign up for our email list.
In our prior blog post, we listed the 8 things you need to do before kicking off your cloud big data initiative. One of the most important things was determining the cloud data storage components you’ll need to implement.
And within this decision, you’ll need to determine what kind of database you’ll primarily use to store your data.
Your choice will depend mostly on which type of database you’ll use – SQL (structured query language) or NoSQL (Not only SQL).
Let’s find out what you need to think through to make this important decision.
Factors to consider when selecting a SQL or NoSQL database
The first and primary factor in making the SQL vs. NoSQL decision is what your data looks like.
If your data is primarily structured, a SQL database is likely the right choice.
A SQL database is a great fit for transaction-oriented systems such as customer relationship management tools, accounting software, and e-commerce platforms. Each row in a SQL database is a distinct entity (e.g. a customer), and each column is an attribute that describes that entity (e.g. address, job title, item purchased, etc.).
Because of these distinct, structured relationships between rows and columns in a table, SQL databases are best when you need ACID compliance. ACID stands for:
- Atomicity – each transaction either succeeds completely or is fully rolled back.
- Consistency – data written to a database must be valid according to all defined rules.
- Isolation – When transactions are run concurrently, they do not contend with each other, and act as if they were being run sequentially.
- Durability – Once a transaction has been committed to the database, it is considered permanent, even in the event of a system failure.
ACID compliance protects the integrity of your data by defining exactly what a transaction is and how it interacts with your database. It avoids database tables from becoming out-of-sync, which is super important for financial transactions. ACID compliance guarantees validity of transactions even in the face of errors, technology failures, disastrous events, and more.
If your data is very structured and ACID compliance is a must, SQL is a great choice.
On the other hand, if your data requirements aren’t clear or if your data is unstructured, NoSQL may be your best bet.
The data you store in a NoSQL database does not need a predefined schema like you do for a SQL database. Rather, the data can be column stores, document-oriented, graph-based, or key-value pairs. This provides much more flexibility and less upfront planning when managing your database.
With NoSQL, you can:
- Create documents without carefully defining their structure upfront
- Add fields to your database without changing the fields of existing documents
- Store documents that have their own unique structure
- Have multiple databases with different structures and syntax
A NoSQL database is a much better fit to store data like article content, social media posts, sensor data, and other types of unstructured data that won’t fit neatly into a table. NoSQL databases were built with flexibility and scalability in mind, and follows the BASE consistency model, which means:
- Basic Availability
- This means that while the database guarantees the availability of the data, the database may fail to obtain the requested data or the data may be in a changing or inconsistent state.
- Soft state
- The state of the database can be changing over time.
- Eventual consistency
- The database will eventually become consistent, and data will propagate everywhere at some point in the future.
The BASE model was built for maximum flexibility. But there actually are some NoSQL databases that are ACID compliant. You can read more about this in the “The convergence of SQL and NoSQL” section.
The structure of your data is the most important factor in deciding whether to use a SQL or NoSQL database, so put a lot of thought into this before making a decision.
Ability to query data
The next factor to consider is how often you’ll query your data, how quickly you need to run queries, and who will be responsible for running these queries.
Because your data is nicely structured and organized, it is very efficient to query your data with a SQL database.
SQL is a popular programming language that has been around for over 45 years, so it’s extremely mature and well-known. It efficiently executes queries and retrieves and edits data quickly. It’s very lightweight and declarative, and thus is easy to learn. Therefore, queries can be run by less technical staff like business analysts and marketers.
A NoSQL database provides a ton of flexibility in the types of data that you can store, but because of the potentially large differences in data structures, querying isn’t as efficient as with a SQL database.
When NoSQL database technology was being built, developers focused on scalability and flexibility, not query efficiency.
So in order to run NoSQL queries, you will have to perform extra processing on the data. Depending on the NoSQL database you’re using, you may have to implement some level of MapReduce. Many developers build querying functionality into the application layer, instead of worrying about it in the database layer. There have been some attempts at standardizing NoSQL querying, such as XQuery or JSONiq, but these tools haven’t been widely adopted.
Querying NoSQL databases typically requires developers or data scientists, which will be more costly and less efficient.
How often will you query your data, and who will run these queries? The answers to these questions will impact your SQL or NoSQL decision.
SQL and NoSQL databases scale differently, so you’ll have to think about how your data set will grow in the future.
SQL databases scale vertically, meaning you’ll need to increase the capacity of a single server (increasing CPU, RAM, or SSD) to scale your database. SQL databases were designed to run on a single server to maintain the integrity of the data, so they’re not easy to scale.
NoSQL databases scale horizontally, meaning you can add more servers to power your growing database. This is a huge advantage that NoSQL has over SQL.
The ability of NoSQL databases to horizontally scale has to do with the lack of structure of the data. Because NoSQL requires much less structure than SQL, each stored object is pretty much self-contained and independent. Thus objects can be easily stored on multiple servers without having to be linked. This is not the case for SQL, where each table row and column needs to be related.
An analogy for vertical and horizontal scaling is a wedding cake. With SQL, you can feed more people by adding more layers to the wedding cake. With NoSQL, you can just make a bunch of wedding cupcakes. Yum.
As your business grows, so will your database. So make sure you consider your scaling needs.
The convergence of SQL and NoSQL
Both SQL and NoSQL databases have their pros and cons. As such, there has been a movement to take the best characteristics of both types of databases and integrate them so users can realize the best of both worlds.
For instance, MySQL, the most popular open-source relational database, offers MySQL Document Store. This provides the structure of a MySQL database combined with the flexibility and high availability of NoSQL without having to implement a separate NoSQL database.
MongoDB, one of the most popular NoSQL databases, offers multi-document ACID transactions.
AWS’ managed NoSQL database, DynamoDB, also provides ACID-compliant transaction functionality.
And with the easy database setup that cloud service providers offer, you have the ability to use both SQL and NoSQL databases in your cloud data architecture to meet your data storage needs.
Now you have much more flexibility regardless of whether you choose a SQL or NoSQL database, and there are sure to be more flexible options in the future.
Regardless of whether you go with a SQL or NoSQL database (or both!), there are plenty of options to choose from.
On-premise SQL database offerings include:
- MySQL – as mentioned prior, the most popular open-source relational database
- Microsoft SQL server – Microsoft’s enterprise version of SQL
- PostgreSQL – and enterprise-level, open-source database focused on extensibility
- Oracle – full-service (and expensive) SQL option
- MariaDB – an enhanced version of MySQL, built by MySQL’s original developers
- And many more
The major cloud service platforms have their own SQL options:
- AWS has:
- RDS, their standard cloud SQL database
- Aurora, which focuses on increased throughput and scalability
- Microsoft Azure has:
- Azure SQL Database, their managed database-as-a-service
- Azure Database for MySQL, PostgreSQL, and MariaDB
- Google Cloud Platform (GCP) has:
- Cloud SQL, which you can use for MySQL and PostgreSQL
- Cloud Spanner, which combines elements of SQL and NoSQL
On-premise NoSQL database options include:
- MongoDB – by far the most popular NoSQL database
- Redis – an open source, distributed, in-memory key-value database that is super fast
- Cassandra – free, open-source NoSQL database created by Facebook that focuses on scalability and high availability
- Many others
Cloud service providers offer plenty of NoSQL options as well:
- AWS has:
- DynamoDB, its managed NoSQL database
- DocumentDB, a fast, scalable, highly-available MongoDB-compatible database
- Microsoft Azure offers:
- CosmosDB, its globally distributed, multi-model database
- Google Cloud has:
- Bigtable, its NoSQL wide-column database service
- Cloud Datastore, its NoSQL document database service
- Cloud Firestore, a cloud-native NoSQL document database that helps store and query app data
There is no shortage of database options to choose from!
There are plenty of decisions to be made when thinking about your cloud data storage. One of the most important decisions is whether to go with a SQL or NoSQL database as your primary database, and whether you may need both to meet your needs.
You’ll need to think about what your data looks like, how you’ll query your data, and the scalability you’ll need in the future.
SQL databases provide great benefits for transactional data whose structure doesn’t change frequently (or at all) and where data integrity is paramount. It’s also best for fast analytical queries.
NoSQL databases provide much more flexibility and scalability, which lends itself to rapid development and iteration.
We hope this guide helps clear the air on what you need to think about when selecting your database, and the options that are available to you.
What additional questions do you have about selecting a database? Let us know your thoughts in the comments, we’d love to hear from you.
In our next post, we’ll review additional cloud data storage components, such as data warehouses and data lakes. To be notified of future Big Data in the Cloud posts, make sure you sign up for our mailing list below.
Like this post? It likes you too. 🙂 Please share it using the share buttons to the left. Then join our mailing list below, follow us on Twitter @thorntech, and join our Facebook page for future updates.