The way you manage data in your app plays a crucial role in delivering a positive user experience. At the end of the day, it doesn’t matter how well your app’s interface is designed and how clean your code is unless your application is capable of quickly retrieving, processing, and delivering information. Moreover, all of this data should be protected so that intruders can’t get their hands on it. Luckily, this can be achieved with a wisely chosen database management system.
A database is a place where you store and organize all the data you collect through your app, while a database management system (DBMS) is software for conveniently managing this database.
There are more than 300 database management systems on the market. Choosing between so many tools is truly overwhelming. But the nice thing is that you don’t have to. We’ve done it for you and will share our findings. In this article, we give you valuable tips on how to choose the right database for your app.
SQL vs NoSQL database
When it comes to choosing a database, one of the biggest challenges is picking between an SQL (relational) and a NoSQL (non-relational) data structure. While both have good performance, there are certain key differences you must keep in mind.
A relational database is a set of tables that have predefined relationships between them. To maintain and query a relational database, the database management system uses Structured Query Language (SQL), a standardized language for defining, querying, and modifying data.
Relational databases consist of rows called tuples and columns called attributes. Tuples in a table share the same attributes.
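To make the idea of tables, tuples, and attributes concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for a full RDBMS, and the customers and orders tables are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Each table declares its attributes (columns) up front.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)

# Every row (tuple) in a table shares the same attributes.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (customer_id, item) VALUES (1, 'red dress')")

# The predefined relationship lets SQL join the two tables.
rows = conn.execute(
    "SELECT customers.name, orders.item"
    " FROM orders JOIN customers ON customers.id = orders.customer_id"
).fetchall()
print(rows)
```

The foreign key on orders.customer_id is what makes the relationship between the two tables explicit.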
Advantages of SQL databases
A relational database is ideal for storing structured data (zip codes, credit card numbers, dates, ID numbers). SQL databases are a mature technology: they’re well-documented, boast great support, and work well with most modern frameworks and libraries. The best-known examples of SQL databases are PostgreSQL and MySQL. Both have proven stable and secure.
Another great advantage of relational databases is their security. Relational databases support access permissions, which define who is allowed to read and edit the database. A database administrator can grant a particular user privileges to access, select, insert, or delete data. This makes it much harder for unauthorized parties to read or change information.
Using a relational database management system (RDBMS) protects against data loss and corruption thanks to compliance with ACID properties: atomicity, consistency, isolation, and durability. To better understand what this means, let’s assume that two buyers are trying simultaneously to purchase a red dress of the same size. ACID compliance ensures that these transactions won’t overlap each other.
Atomicity means that each transaction (a sequence of one or more SQL operations) is treated as a unit. It can either fail completely or succeed completely, and if one of the operations fails, the whole transaction fails. When a user purchases an item, money is withdrawn from the user’s account and deposited to the merchant’s account. Atomicity ensures that if the deposit fails, the withdrawal is rolled back too.
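The purchase scenario can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not a production payment flow: the accounts table, the purchase function, and the simulated failure are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("buyer", 100), ("merchant", 0)])
conn.commit()

def purchase(conn, amount, fail_before_deposit=False):
    """Withdraw from the buyer and deposit to the merchant as one transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'buyer'", (amount,)
            )
            if fail_before_deposit:
                raise RuntimeError("simulated failure before the deposit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'merchant'", (amount,)
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = purchase(conn, 40, fail_before_deposit=True)
after_failure = balances(conn)  # the withdrawal was rolled back
succeeded = purchase(conn, 40)
after_success = balances(conn)
print(after_failure, after_success)
```

After the failed attempt, both balances are unchanged, even though the withdrawal statement had already run inside the transaction.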
Consistency means that only valid data that follows all rules can be written in the database. If input data is invalid, the database returns to its state before the transaction. This ensures that illegal transactions can’t corrupt the database.
Isolation means that concurrent transactions don’t see each other’s unfinished changes. It ensures that all transactions are processed securely and independently, as if they had run one after another.
Durability means that once a transaction is committed, its changes are saved by the system. Thanks to durability, committed data won’t be lost even if the system crashes.
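Here is a small sketch of consistency for the red dress example, again using sqlite3 as a stand-in for a full RDBMS. The stock table, its CHECK rule, and the buy function are assumptions made for illustration. For simplicity, the two buyers act one after the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the rule: stock can never drop below zero.
conn.execute(
    "CREATE TABLE stock ("
    " item TEXT PRIMARY KEY,"
    " quantity INTEGER NOT NULL CHECK (quantity >= 0))"
)
conn.execute("INSERT INTO stock VALUES ('red dress', 1)")
conn.commit()

def buy(conn, item):
    """Try to take one unit from stock; return False if the rule is violated."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE stock SET quantity = quantity - 1 WHERE item = ?", (item,))
        return True
    except sqlite3.IntegrityError:
        # The database rejected the invalid state and kept the previous one.
        return False

first_buyer = buy(conn, "red dress")
second_buyer = buy(conn, "red dress")
quantity = conn.execute(
    "SELECT quantity FROM stock WHERE item = 'red dress'"
).fetchone()[0]
print(first_buyer, second_buyer, quantity)
```

The second purchase is rejected by the database itself, so the application never sees a negative quantity.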
ACID compliance is beneficial for apps handling financial, healthcare, and sensitive personal data since it protects that data from corruption and partial updates. Thanks to all these advantages, relational databases are a good fit for financial and healthcare projects.
Disadvantages of relational databases
But relational databases have disadvantages as well:
They lack flexibility. Relational databases don’t work efficiently with semi-structured or unstructured data, so they aren’t a good fit for large loads and IoT analytics.
When the data structure becomes complex, it becomes harder to share information from one large data-driven software solution to another. At big institutions, relational databases often grow independently in separate divisions.
Relational databases traditionally run on a single server and scale vertically, which means that if you want your DBMS to cope with a larger amount of data, you need to invest in costly physical equipment.
These drawbacks have forced developers to search for alternatives to relational databases. As a result, NoSQL and NewSQL databases have emerged.
NoSQL databases, also called non-relational or distributed databases, serve as an alternative to relational databases. They can store and process unstructured data (data from social media, photos, MP3 files, etc.), offering developers more flexibility and greater scalability.
Data in non-relational databases can be changed on the fly without affecting existing data. Additionally, NoSQL databases can be run across several servers, so scaling them is cheaper and easier than scaling SQL databases.
And since NoSQL databases don’t rely on a single server, they’re more fault-tolerant. This means that if one component fails, the database can continue operating.
But NoSQL databases are less mature than SQL databases, and the NoSQL community isn’t as well defined. Also, NoSQL databases often sacrifice ACID compliance for availability and flexibility.
NoSQL databases can be divided into four types: key-value stores, document-oriented databases, column stores, and graph stores.
A key-value store is the simplest type of NoSQL database. It can store only key-value pairs and offers basic functionality for retrieving the value associated with a key. A key-value store is a great option if you want to quickly find information with a key. Amazon DynamoDB and Redis are the best-known examples of key-value stores.
The simple structure of DynamoDB and Redis makes these databases extremely scalable. With no connection between values and no construction schemes required, the number of values is limited only by computing power.
That’s why key-value stores are used by hosting providers like ScaleGrid, Compose, and Redis Labs. Often, developers use key-value stores to cache data. These stores are also a good option for storing blog comments, product reviews, user profiles, and settings.
This type of database is optimized for horizontal scaling, which means you need to add more machines to store more data. This is less costly than scaling relational databases but may lead to high utility costs for cooling and electricity.
But the simplicity of key-value stores can also be a disadvantage. With a key-value store, it’s hard or even impossible to perform the majority of operations available in other types of databases. While searching by keys is really fast, it can take much longer to search by values.
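The difference between searching by key and searching by value can be sketched with a toy in-memory store. KeyValueStore and its methods are hypothetical names. Real key-value stores such as Redis or DynamoDB have their own APIs and add persistence and networking on top.

```python
class KeyValueStore:
    """A toy key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup by key is a single hash lookup, regardless of store size.
        return self._data.get(key, default)

    def find_keys_by_value(self, value):
        # There is no index on values, so every pair has to be inspected.
        return [k for k, v in self._data.items() if v == value]

store = KeyValueStore()
store.put("user:1:theme", "dark")
store.put("user:2:theme", "light")
store.put("user:3:theme", "dark")

theme = store.get("user:2:theme")
dark_users = store.find_keys_by_value("dark")
print(theme, dark_users)
```

The get call stays fast as the store grows, while find_keys_by_value slows down in proportion to the number of stored pairs.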
In most cases, key-value stores are used in combination with a database of another type. In the Healthfully and KPMG apps, we used the Redis key-value store in combination with the PostgreSQL relational database management system.
Document-oriented databases store all information related to a given object in a single BSON, JSON, or XML document. Documents of the same type can be grouped into so-called collections or lists. These databases free developers from defining strict data types and relations up front.
A document-oriented database usually has a tree or forest database model. A tree structure means that a root node branches into child nodes that end in leaf nodes. A forest structure consists of several trees. These data structures help document stores perform fast searches. While this makes it difficult to manage complicated systems with numerous connections between elements, it lets developers create document collections by topic or type.
For instance, if you’re creating a music streaming app, you can use a document-oriented database to create a collection of songs by Rihanna so users can easily and quickly find her tracks.
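A rough sketch of the music example, using plain Python dictionaries as documents and a list as the collection. The find helper and the chosen fields are assumptions for illustration. A real document store would offer its own query API and indexes.

```python
import json

# A collection is a group of documents; documents need not share the same fields.
songs = [
    {"title": "Umbrella", "artist": "Rihanna", "featuring": ["JAY-Z"]},
    {"title": "Diamonds", "artist": "Rihanna", "album": "Unapologetic"},
    {"title": "Hello", "artist": "Adele"},
]

def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

rihanna_titles = [doc["title"] for doc in find(songs, artist="Rihanna")]
print(rihanna_titles)

# Each document serializes to JSON as a self-contained unit.
print(json.dumps(songs[0]))
```

Each song carries everything known about it in one document, so no join is needed to read it back.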
Thanks to their structure and flexibility, document-oriented databases are commonly used for content management, rapid prototyping, and data analysis.
A columnar database is optimized for fast retrieval of columns of data. Column-oriented databases store each column as a logical array of values. Databases of this type provide high scalability and can easily be duplicated.
A column store deals well with both structured and unstructured data, making database exploration as simple as possible. Columnar databases process analytical operations really fast but perform poorly when handling transactions. Apache Cassandra and Scylla are among the most popular column stores.
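The following sketch contrasts a row layout with a column layout for the same data. It only illustrates the storage idea with hypothetical order records. Real column stores add compression, partitioning, and on-disk formats that are not shown here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "item": "dress", "price": 40},
    {"order_id": 2, "item": "shoes", "price": 60},
    {"order_id": 3, "item": "hat", "price": 20},
]

# Column-oriented layout: each column is stored as its own array of values.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query such as "total revenue" touches a single array...
total_from_columns = sum(columns["price"])

# ...while the row layout has to walk through every full record.
total_from_rows = sum(row["price"] for row in rows)

print(columns["price"], total_from_columns, total_from_rows)
```

Updating a single order, on the other hand, means touching every column array, which hints at why this layout suits analytics better than transactions.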
In a graph store, each entity, which is called a node, is an isolated document with free-form data. Nodes are connected by edges that specify their relationships.
This approach facilitates data visualization and graph analytics. Usually, graph databases are used to determine the relationships between data points. Most graph databases provide features such as finding the node with the most connections and finding all connected nodes.
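Both features can be sketched on a small in-memory graph. The node names and the helper functions most_connected and connected_nodes are hypothetical. Graph databases expose such operations through their own query languages.

```python
from collections import deque

# Edges specify relationships between nodes (here: who knows whom).
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
    ("erin", "frank"),
]

# Build an undirected adjacency map: node -> set of neighbours.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def most_connected(graph):
    """Return the node with the largest number of direct connections."""
    return max(graph, key=lambda node: len(graph[node]))

def connected_nodes(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen

hub = most_connected(graph)
reachable_from_dave = connected_nodes(graph, "dave")
print(hub, sorted(reachable_from_dave))
```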
NewSQL – combining the best of SQL and NoSQL databases
Particular attention should be given to NewSQL, a class of relational databases that combines features of both SQL and NoSQL databases.
NewSQL databases are geared toward solving common problems of SQL databases related to traditional online transaction processing. From NoSQL, NewSQL inherited optimization for online transaction processing, scalability, flexibility, and a serverless architecture. Like relational databases, NewSQL database structures are ACID-compliant and consistent. They have the ability to scale, often on demand, without affecting application logic or violating the transaction model.
NewSQL was introduced only in 2011, and it still isn’t that popular. It has only partial access to the rich SQL tooling. Flexibility and a serverless architecture, combined with high security and availability without requiring redundant systems, increase the chances for NewSQL databases to become a next-gen solution for cloud technologies.
ClustrixDB, CockroachDB, NuoDB, MemSQL, and VoltDB are the most popular NewSQL databases.
Things to consider when choosing a database
There are several aspects you should pay attention to when choosing your database management tool:
SQL databases are perfectly suited for storing and processing structured data, while NoSQL databases are the best solution for working with unstructured or semi-structured data. If you manage both structured and unstructured data, you can opt for mixing SQL and NoSQL databases.
As your web product grows, its database should grow as well. Your choice of database may be affected by the type of scaling you prefer, whether horizontal or vertical. Non-relational databases with their key-value stores are optimized for horizontal scaling, while relational databases are optimized for vertical scaling.
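Horizontal scaling usually means splitting data across machines. The sketch below imitates this with hash-based sharding, where plain dictionaries stand in for servers. ShardedStore is a hypothetical class and not how any particular database implements sharding.

```python
import hashlib

class ShardedStore:
    """A toy key-value store that spreads keys over several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate machine.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore(shard_count=3)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})

shard_sizes = [len(shard) for shard in store.shards]
print(shard_sizes, store.get("user:42"))
```

With this naive modulo scheme, changing the number of shards moves most keys to a different shard, which is one reason production systems tend to use more elaborate partitioning strategies.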
As it stores all user data, a database should be well-protected. ACID-compliant relational databases are generally considered safer than non-relational databases, which often trade consistency for performance and scalability. So you may need to take additional steps to secure your NoSQL database.
An important note on choosing a DBMS: make sure that your database management system can be integrated with other tools and services within your project. In most cases, poor integration with other solutions can stall development.
For instance, ArangoDB has excellent performance, but libraries for this DBMS are still young and lack support. Using ArangoDB in combination with other tools may be risky, so the community suggests avoiding ArangoDB for complex projects.
List of popular database management systems
Want to know the most popular databases of 2021? Let's check out the following list:
OracleDB is backed by Oracle and, hence, is reliable. Developers point out that OracleDB rarely goes down and receives regular updates.
It scales well and can handle an extremely large scope of data. Oracle is currently bringing all its products and services to the cloud, which results in more flexibility.
It’s secure, scrupulously following modern security standards (including PCI compliance) and offering good encryption of sensitive data.
It manages memory very efficiently and easily handles complex operations. Also, it effectively manages and organizes a variety of third-party tools.
It outperforms other solutions in terms of speed of data access across the network.
But OracleDB has downsides as well:
While it’s the most popular DBMS, OracleDB is also one of the most expensive. A Processor License for the Standard Edition will cost you $17,500 per unit.
Oracle’s documentation is overly complicated and lacks good guides. Even though customer support is very helpful, some developers complain about long response times.
These factors make OracleDB an ideal solution for large enterprises that need to store a large scope of data. Small and mid-sized businesses should search for more cost-effective alternatives.
MySQL is one of the most popular relational database management systems. It was created in 1995 and is now managed by Oracle. This open-source database system has a huge user base and great support, and it works well with most libraries and frameworks. It’s free, but it offers additional functionality for a fixed price.
Developers can install and use MySQL without spending long hours setting it up. Most tasks can be done in the command line. This is a well-structured database with regular updates.
MySQL works perfectly with structured data at the basic level. But if you’re considering scaling your product in the future, you may need additional support, which costs a pretty penny. Also, it takes a lot of time to create incremental backups or change the data architecture in MySQL, while its rivals can do this automatically.
Uber, Facebook, Tesla, YouTube, Netflix, Spotify, Airbnb, and many other companies use MySQL for their services. We also use this DBMS for our projects.
PostgreSQL is an object-relational database, which means that it stores data in tables like other relational databases but also supports object-oriented features such as custom data types and table inheritance.
PostgreSQL is an ideal solution for large systems, since it’s scalable and designed to handle terabytes of data. A hierarchy of roles to maintain user permissions means advanced security.
Unlike MySQL, PostgreSQL is a completely free database. Its open-source nature means that all documentation and support are provided by enthusiastic volunteers. It also means that in case you have problems with PostgreSQL, you’ll need to search for an expert who can solve them.
We migrated World Cleanup, an app for managing the World Cleanup Day event, from CouchDB to PostgreSQL. Migrating to PostgreSQL let us not only perform in and out operations simultaneously but also easily handle high loads.
MongoDB is the #1 NoSQL database we use in our projects. In MongoDB, all data is stored in BSON (Binary JSON) documents. Because BSON maps closely to JSON, data can easily be transferred between web applications and servers.
MongoDB has onboard replication, providing high scalability and availability. Auto-sharding means that you can easily distribute data to servers connected with your app. In general, MongoDB is a great solution for dealing with massive unstructured data sets. It can underpin most big data systems, not only as a real-time, operational data store but also in offline capacities.
But there are several pitfalls of this database platform. It stores key names in every document, increasing memory use. Also, there are no foreign key constraints to enforce consistency, and you can perform nesting for no more than 100 levels.
In combination with Redis, we used MongoDB in Boothapp, a social e-commerce platform for the Middle Eastern market.
Redis is an open-source key-value store that’s often used as a caching layer to work with another data storage solution. The main reason why developers opt for Redis is its speed, which far outstrips other database management systems. It’s also easy to set up, configure, and use.
But Redis doesn’t encrypt stored data out of the box and is built around a small set of simple data types, such as lists, sets, sorted sets, hashes, and strings. The main purpose of Redis is to store data sets without a complex structure. That’s why this tool is usually paired with another type of database system and sometimes used for microservices. Since Redis is a great solution for caching, we use it for this purpose in most of our projects, including in the KPMG, Half Cost Hotels, Mikitsune, and Healthfully apps.
Elasticsearch is an open-source document-based database that stores and indexes any kind of data – text, numeric, or geospatial – in JSON format. By doing so, it enables fast search and data retrieval. Elasticsearch is built on Lucene, an open-source Java software library that it uses to store and search for data.
One of the major reasons why Elasticsearch is so popular is its scalability. It easily scales horizontally, allowing for the extension of resources.
Starting from Elasticsearch version 6.7, users can manage the data life cycle. Data can be referred to as hot, warm, or cold depending on the number of requests for it and can be stored in hot, warm, and cold data nodes respectively. This functionality allows you to retrieve the most relevant (or the hottest) data quicker, as hot nodes use solid state drives (SSDs), a newer and faster type of storage device. Warm and cold nodes need only traditional hard disk drives (HDDs), which are slower.
Netflix, Stack Overflow, LinkedIn, and Medium rely on Elasticsearch.
Mixing and matching databases
You can use several databases in one project. But combining two databases is not always a good idea. Developers should make this decision only after carefully analyzing a project’s needs and defining the product’s tech stack.
Redis is often used in combination with other databases. We used Redis in combination with PostgreSQL for Healthfully, a medical platform to connect patients and medical professionals. We chose Redis for cache and token storage since it works faster than most modern databases. For the same reason, we used Redis together with PostgreSQL when developing an app for KPMG. We commonly use this pair in our projects, since we can quickly and easily make references from Redis to PostgreSQL.
Using MongoDB and PostgreSQL together is usually a bad idea, since these databases overlap in terms of resource use and data storage. For instance, say you have a social network like Instagram and need to store information about posts, likes, followers, and user profiles. You store data about likes and posts in MongoDB, while user profiles and followers are stored in PostgreSQL. First, you need to take data about profiles from PostgreSQL, then data about posts from MongoDB, which is a time-consuming and inefficient solution.
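The pairing of a fast cache with a relational database often follows the cache-aside pattern, sketched below. To keep the example self-contained, a Python dict stands in for Redis and an in-memory SQLite database stands in for PostgreSQL. The users table, the key format, and get_user_name are hypothetical, and this is not a description of the projects mentioned above.

```python
import sqlite3

# Stand-in for the primary relational database (PostgreSQL in the text).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users VALUES (1, 'Alice')")
db.commit()

cache = {}                # stand-in for the cache (Redis in the text)
stats = {"db_reads": 0}   # counts how often the database is actually queried

def get_user_name(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}:name"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache[key] = row[0]   # store the result so the next read skips the database
    return row[0]

first = get_user_name(1)    # cache miss, reads from the database
second = get_user_name(1)   # cache hit, no database query
print(first, second, stats["db_reads"])
```

A real setup would also need an expiry or invalidation policy so that cached values don't go stale when the underlying row changes.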
As you can see, the choice of a DB for your project depends on many factors, including the types of data you’re going to collect and process, integrations with other tools, and the scaling approach you follow. It’s not just a question of SQL or NoSQL, as many think.
And even though proper data management may not be the first thing you consider when optimizing the user experience, it definitely should be. We can help you find the best possible database solution for your web or mobile app. Drop us a line if you want us to help you in selecting the right database for your app.