This article surveys the main types of databases. Using concrete examples, we look at the advantages and disadvantages of each model and at the scenarios in which each is applied.
What is a database
A database is a collection of information about objects, structured in a particular way. Databases are typically managed by dedicated software called a database management system (DBMS).
The logical structure of a database is described differently depending on its type. These differences determine which database is chosen when developing a particular product or technology.
The simplest types of databases
These are databases that store data with a simple structure: for example, a list of IP addresses allowed to access a network, project environment settings, a list of subscribers to a company newsletter and so on. They are still widespread.
Information about objects is kept in simple text formats such as TXT or CSV, with spaces, tabs, commas, semicolons or colons used to separate the fields.
Examples: /etc/passwd and /etc/fstab in UNIX-like systems, CSV files, INI files, etc.
Advantages:
- Easy to use. A fairly primitive text editor is enough to work with the files.
- Convenient for application configuration data (credentials, connection settings for remote servers and devices, ports, etc.).
Disadvantages:
- It is difficult to establish relationships between data items.
- Not suitable for every kind of information.
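To make the delimiter-separated layout described above concrete, here is a minimal sketch that reads records in the colon-separated style of /etc/passwd. The sample lines, the field names in FIELDS and the helper parse_passwd are illustrative assumptions, not something taken from a real system.

```python
import csv
import io

# Hypothetical sample in the /etc/passwd layout: seven colon-separated fields.
SAMPLE = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:1000:Alice:/home/alice:/bin/sh\n"
)

# Field names chosen for this sketch.
FIELDS = ["name", "password", "uid", "gid", "comment", "home", "shell"]


def parse_passwd(text):
    """Return one dict per non-empty line, keyed by field name."""
    reader = csv.reader(io.StringIO(text), delimiter=":", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


users = parse_passwd(SAMPLE)
# Every value is a plain string: the file itself carries no types or relationships.
shells = {user["name"]: user["shell"] for user in users}
```

Note that the parser has to know the meaning and order of the fields in advance, which is exactly why relationships between records are hard to express in this kind of storage.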
Hierarchical databases
Unlike text files, hierarchical databases establish relationships between the stored objects. Objects are divided into parents (main classes or categories of objects) and descendants (instances of those classes or categories). Each descendant can have no more than one parent.
Graphically, such a database is represented as a tree.
Examples: file system organization; DNS and LDAP.
Features:
- Relationships between objects are implemented as physical pointers. For example, in a file system the path to a folder or file is built from the names of the root and nested directories.
- Relationships and subordination can be modeled.
Restrictions: the hierarchical model does not support many-to-many relationships, so such a data store is quite limited.
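The single-parent rule and the way a path is assembled from parent pointers can be sketched as follows. The Node class and the sample directory names are assumptions made for illustration only.

```python
class Node:
    """A tree node that keeps a pointer to at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Build the path by following parent pointers up to the root."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        # The last collected element is the root itself, rendered as the leading "/".
        return "/" + "/".join(reversed(parts[:-1]))


root = Node("")
etc = Node("etc", parent=root)
passwd = Node("passwd", parent=etc)
home = Node("home", parent=root)
```

Because each node stores exactly one parent, there is no way to place passwd under both etc and home without duplicating it, which is the many-to-many limitation mentioned above.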
Network databases
This technology extends the hierarchical approach by modeling more complex relationships between objects. Here descendants can have more than one parent, but the other restrictions of the hierarchical approach remain.
Example: IDMS, a specialized DBMS for mainframes.
Relational databases
This is one of the oldest types of database still in wide use: the theoretical foundations of the approach were laid by the British scientist Edgar Codd in 1970. Data is organized into tables of rows and columns. Rows hold information about objects (the values of their properties), and columns hold the properties themselves (fields).
Complex relationships between objects in relational databases are modeled with foreign keys, which are references to other tables. This lets you approach database design from the standpoint of normalization: minimizing redundancy in the description of object properties.
Take a restaurant menu as an example: each dish has a weight, price, name, calorie count and a category it belongs to (hot appetizers, cold appetizers, soups, desserts, salads and so on). The link between a dish and its category is made through a field in the dishes table that references the category ID.
This approach allows you to:
- Minimize the size of the database: the category name does not have to be repeated for every dish.
- Improve the integrity of the system: in this example every dish is tied to a menu category. A dish cannot be added without a category, nor can it reference the ID of a non-existent category.
- Simplify extension: new dishes can be added to existing categories, new categories can be created and given dishes, and dishes can be moved between categories.
- Improve performance: with a well-organized table schema, selection and aggregation queries work on less data and therefore run faster than without normalization. As the number of records in the tables grows over time, this helps maintain a good user experience.
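The menu example can be sketched with SQLite from the Python standard library. The table and column names (category, dish, category_id) and the sample rows are assumptions chosen for this illustration; the source does not define a concrete schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dish (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price       REAL NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
""")

conn.execute("INSERT INTO category (id, name) VALUES (1, 'Desserts')")
conn.execute(
    "INSERT INTO dish (name, price, category_id) VALUES ('Cheesecake', 6.5, 1)"
)


def try_insert_orphan():
    """Try to add a dish that points at a category that does not exist."""
    try:
        conn.execute(
            "INSERT INTO dish (name, price, category_id) VALUES ('Mystery', 1.0, 99)"
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_insert_orphan()
dish_count = conn.execute("SELECT COUNT(*) FROM dish").fetchone()[0]
```

The category name is stored once, and the database itself rejects a dish that references a missing category, so the integrity rule does not depend on application code.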
A good example of modeling complex relationships in a relational database is given in the figure above. It shows a database model for an educational institution with the following objects: student, course, teacher, department and field of study.
The teacher is related to the department through the section and the course (foreign keys to the course ID and the teacher ID in the section table, and to the department in the course table). The student is related to the field of study through a student-to-field link table (foreign keys to the student ID and the field of study ID).
So to calculate, for example, the number of students on a course with a detailed breakdown by teacher, you need to write a query that joins students to fields of study, courses and teachers and groups the result by teacher.
The SQL query language
Queries in relational databases are written in SQL (Structured Query Language). Its statements let you:
- Select data,
- Perform aggregation and grouping,
- Change and delete data,
- Modify the structure of the database (create tables and fields),
- Manage user access to particular operations, etc.
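As an illustration of selection, joining and grouping in one statement, here is a small sketch run through Python's sqlite3 module. The schema is a deliberately simplified assumption (teacher, course, enrollment) and does not reproduce the educational model described earlier; all names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    teacher_id INTEGER NOT NULL REFERENCES teacher(id)
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL REFERENCES course(id)
);
INSERT INTO teacher VALUES (1, 'Ivanova'), (2, 'Petrov');
INSERT INTO course VALUES (1, 'Databases', 1), (2, 'Networks', 1), (3, 'Algebra', 2);
INSERT INTO enrollment VALUES (101, 1), (102, 1), (101, 2), (103, 3);
""")

# Join enrollments to courses and teachers, then count enrollments per teacher.
QUERY = """
SELECT t.name, COUNT(*) AS enrollments
FROM teacher AS t
JOIN course AS c ON c.teacher_id = t.id
JOIN enrollment AS e ON e.course_id = c.id
GROUP BY t.id, t.name
ORDER BY t.name
"""
rows = conn.execute(QUERY).fetchall()
```

The two JOIN clauses follow the foreign keys, and GROUP BY collapses the joined rows into one line per teacher.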
In addition to normalization, relational databases also have the reverse process: denormalization. It consists of copying the most frequently used fields from related tables into the table that is queried most often. Consider a messenger as an example.
A user (User) leaves messages in chats (Chat). The data structure is such that messages are linked to users and chats through foreign keys (user_from and user_to, as well as Chat_id in the message table; User_id and Chat_id in the User_Chat_Link table). Because the schema is normalized, queries that select, count and aggregate statistics by chat, user and message have to join the related tables.
On relatively small amounts of data these queries run quickly, but they slow down as the database grows. The reason lies in how a join works: it matches rows of two or more tables against the join condition, for example the equality of Chat_id in Messages and ID in Chat. This puts a load on the database server that only increases as the database grows. Denormalization exists to optimize this kind of query.
Duplicate fields holding the chat name (Chat_name) and logo (Chat_logo) are added to the user-to-chat link table User_Chat_Link. The last message (Last_MSG) and the number of unread messages (Unread_MSG_count) are stored there as well.
Now these fields can be read and analyzed from the User_Chat_Link table alone, without joining the message table. However, this approach has its limitations.
The additional fields speed up read and aggregation queries, but the price is forced redundancy and more complicated application business logic. In particular, writing data changes (UPDATE and DELETE) and modifying the database structure (CREATE) become more complicated.
Denormalization should be applied deliberately. You need to be sure that a normalized structure, optimized queries and correctly configured indexes can no longer meet the performance requirements.
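The trade-off can be sketched as follows: the read side becomes a single-table lookup, while the write side has to keep the duplicated fields in sync. The column set is a reduced assumption based on the field names in the text, and send_message is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id      INTEGER PRIMARY KEY,
    chat_id INTEGER NOT NULL,
    user_to INTEGER NOT NULL,
    body    TEXT NOT NULL,
    is_read INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_chat_link (
    user_id          INTEGER NOT NULL,
    chat_id          INTEGER NOT NULL,
    unread_msg_count INTEGER NOT NULL DEFAULT 0,  -- duplicated, derived value
    last_msg         TEXT,                        -- duplicated value
    PRIMARY KEY (user_id, chat_id)
);
INSERT INTO user_chat_link (user_id, chat_id) VALUES (1, 10);
""")


def send_message(chat_id, user_to, body):
    """Store the message and update the denormalized fields in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO message (chat_id, user_to, body) VALUES (?, ?, ?)",
            (chat_id, user_to, body),
        )
        conn.execute(
            "UPDATE user_chat_link "
            "SET unread_msg_count = unread_msg_count + 1, last_msg = ? "
            "WHERE user_id = ? AND chat_id = ?",
            (body, user_to, chat_id),
        )


send_message(10, 1, "hello")
send_message(10, 1, "are you there?")

# Normalized read: scan the message table.
counted = conn.execute(
    "SELECT COUNT(*) FROM message WHERE chat_id = 10 AND user_to = 1 AND is_read = 0"
).fetchone()[0]
# Denormalized read: a single row from the link table.
cached = conn.execute(
    "SELECT unread_msg_count, last_msg FROM user_chat_link "
    "WHERE user_id = 1 AND chat_id = 10"
).fetchone()
```

Both reads agree only as long as every write path goes through logic like send_message; marking messages as read or deleting them would need the same care.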
Advantages of the relational approach:
- Definition of complex relationships between objects,
- Normalization and denormalization of data,
- A structured query language,
- A long history of development and wide adoption (the main tool for building all kinds of applications and services).
Disadvantages of the approach: a rigid structure for information about objects.
Examples: MySQL, MariaDB, PostgreSQL, SQLite, etc.
NoSQL and non-relational databases
All the advantages and disadvantages of relational databases stem from the strict structuring and typing of information about objects. On the one hand, storage and indexing can be optimized through normalization or denormalization. On the other hand, it is difficult to store and process poorly structured data (for example, cache objects) or entirely unstructured data (for example, data from several sources).
To overcome these restrictions, a family of non-relational databases was developed. Let us look at them in more detail.
Key-value databases
This is the simplest kind of non-relational database. Data is stored as a dictionary in which the key serves as the pointer to the value.
Features:
- Storage and processing of data that differs in type and content: files, strings, text, numbers, JSON objects and other data types can sit in one store under different keys.
- Fast access to data thanks to direct addressing by key.
- Easy scaling. You can define sharding rules on particular keys: for example, user sessions from different sites are stored in different segments of the database.
Restrictions: since the approach does not imply strict typing and structuring of data, validating the data and the keys is left to the developer.
Examples: Amazon DynamoDB, Redis, Riak, LevelDB, and various cache stores such as Memcached.
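The dictionary model and key-based sharding can be sketched in a few lines. ShardedKV is a toy in-memory class invented for this example; it does not reflect how any of the listed products distribute keys.

```python
import zlib


class ShardedKV:
    """A toy key-value store that spreads keys over several dictionaries."""

    def __init__(self, shard_count=4):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 gives the same result on every run, unlike the built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKV()
# Values of unrelated types live side by side; nothing validates them.
store.put("session:42", {"user": "alice"})
store.put("counter:visits", 17)
store.put("avatar:alice", b"\x89PNG")
```

The store never inspects the values, which is what makes it flexible and also why validation is the developer's responsibility.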
Document databases
In contrast to key-value databases, data here is stored in structured formats: XML, JSON, BSON. Direct access to data by key is still available, and different documents may have different sets of properties.
Take a catalog of user profiles as an example: one user listed a favorite dish as a preference, and another a video game. Because these pieces of information are logically and structurally unrelated and cannot be stored in a single field, they are recorded as separate properties of individual documents. New properties can be added to documents when needed without breaking the overall integrity of the data.
Features:
- Well suited to rapid development of systems and services that work with differently structured data,
- Easy to scale and to change the structure when necessary.
Examples: MongoDB, RethinkDB, CouchDB, DocumentDB.
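The profile example can be modeled with plain dictionaries standing in for documents. The collection layout, the property names and the find helper are assumptions for this sketch and are not the API of any listed product.

```python
# Each document is addressed by key and carries its own set of properties.
profiles = {
    "u1": {"name": "Anna", "favorite_dish": "borscht"},
    "u2": {"name": "Boris", "favorite_game": "chess"},
}


def find(collection, **criteria):
    """Return the keys of documents whose properties match all criteria."""
    return [
        doc_id
        for doc_id, doc in collection.items()
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Adding a property to one document leaves the others untouched.
profiles["u2"]["city"] = "Kazan"

dish_fans = find(profiles, favorite_dish="borscht")
kazan_users = find(profiles, city="Kazan")
```

Documents that lack a queried property are simply skipped, so no schema change is needed when a new property appears.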
Graph databases
This family of databases is intended for modeling complex relationships using graph theory: the edges of the graph are the relationships, and the objects themselves are the nodes, or vertices.
This approach can be useful when analyzing social network user profiles: one user follows another, a third follows a particular community and so on. The technology can also be used to analyze the economic activity of counterparties in order to detect fraud schemes. For example, you can track how particular accounts, cards or counterparty details are used across different transactions.
Features: high performance, since traversing edges and vertices is usually much faster than evaluating joins across many tables with selection conditions in a relational database.
Examples: Neo4j, JanusGraph, Dgraph, OrientDB.
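Traversal over "follows" relationships can be sketched with an adjacency list and a breadth-first search. The graph data and the reachable_within function are hypothetical; real graph databases expose their own query languages for this.

```python
from collections import deque

# Directed "follows" edges between users and communities (made-up data).
follows = {
    "alice": ["bob", "db_club"],
    "bob": ["carol"],
    "carol": [],
    "db_club": ["carol"],
}


def reachable_within(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable from start in at most max_hops edges."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


direct = reachable_within(follows, "alice", 1)
two_hops = reachable_within(follows, "alice", 2)
```

Each step follows stored edges directly from the current node, whereas the relational equivalent would need one more self-join per additional hop.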
Column databases
As the name suggests, records in such databases are stored not by rows but by columns. Instead of tables, column families are used. They contain keys that identify the rows holding information about objects. Each row has its own set of properties, which makes it possible to store differently structured data within one family.
The technology is widely used to build analytical systems and services that work with large amounts of data.
The figure shows an example of column storage for information about fruit. Three kinds of fruit are known: apples, grapes and bananas. All of them are combined into the fruit family.
Each fruit has its own set of properties. For apples these are color, price and availability. For grapes: color, price, number of berries in a bunch and origin (imported or not). For bananas: color, price, number in a bunch and ripeness.
To get a detailed summary for one kind of fruit, it is enough to specify its identifier in the query. You can also build an analytical query on properties shared by the whole family: for example, count the fruit grouped by color or calculate the average price of all fruit in the store.
Features:
- Because properties are grouped by column, a query processes a smaller amount of data, which makes it fast.
- Broad options for scaling and changing the structure: new columns do not have to be strictly formalized when they are added, as they would in relational databases.
Examples: Cassandra, HBase, ClickHouse.
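The fruit example can be approximated with one dictionary per column. This is a toy in-memory model with made-up values, reduced to a few of the properties named above, and it is not how the listed systems lay data out on disk; the helpers row and average are assumptions.

```python
from collections import Counter

# One dictionary per column; a row key appears only in the columns it has.
fruit = {
    "color": {"apple": "red", "grape": "green", "banana": "yellow"},
    "price": {"apple": 1.2, "grape": 3.0, "banana": 0.9},
    "availability": {"apple": True},
    "origin": {"grape": "imported"},
    "maturity": {"banana": "ripe"},
}


def row(family, key):
    """Assemble all properties of one object from the columns that contain it."""
    return {column: values[key] for column, values in family.items() if key in values}


def average(family, column):
    """Aggregate over a single column without touching the others."""
    values = family[column].values()
    return sum(values) / len(values)


grape = row(fruit, "grape")
avg_price = average(fruit, "price")
by_color = Counter(fruit["color"].values())
```

The average is computed from the price column alone, which is the property that makes this layout attractive for analytical queries.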
Time series databases
This type of database is used when you need to track how a number of indicators change over time. Data here is grouped by timestamp. Time series databases are more often oriented toward writing data than toward building complex analytical queries.
The figure above gives an example of using such a database to monitor the state of a PC over time across several indicators: processor temperature, system load and RAM consumption.
Features: a continuous stream of incoming data can be processed.
Restrictions: performance depends on the volume of incoming data, the number of tracked metrics and the time lag between writing new data and read queries.
Examples: OpenTSDB, Prometheus, InfluxDB, TimescaleDB.
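The write-oriented, timestamp-ordered model can be sketched as an append-only series with range queries. The Series class, the integer timestamps and the temperature readings are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


class Series:
    """An append-only series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, timestamp, value):
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("points must arrive in time order")
        self.timestamps.append(timestamp)
        self.values.append(value)

    def mean(self, start, end):
        """Average of the values with start <= timestamp <= end, or None if empty."""
        low = bisect_left(self.timestamps, start)
        high = bisect_right(self.timestamps, end)
        window = self.values[low:high]
        return sum(window) / len(window) if window else None


cpu_temp = Series()
for timestamp, value in [(0, 50.0), (10, 52.0), (20, 60.0), (30, 58.0)]:
    cpu_temp.append(timestamp, value)

recent_mean = cpu_temp.mean(10, 20)
```

Appending is cheap because new points always go to the end, and the sorted timestamps let a time window be located with binary search.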
Combined types
This kind of database combines the SQL and NoSQL approaches to organizing data storage and processing. The class includes NewSQL and multi-model solutions. Let us look at them in more detail.
NewSQL
This type of storage solution aims for a compromise between scalability and consistency while keeping the relational approach.
The term was proposed in 2011 by 451 Group analyst Matthew Aslett. He noted a strong need for such systems in fields that work with critical data: healthcare, fintech, etc. The characteristic features of these solutions are the use of consensus algorithms (Paxos, Raft, etc.) and of partitioning and sharding for horizontal scaling.
Features:
- Broad scaling options,
- High performance and data availability.
Restrictions: high demands on hardware resources. But if the product being developed is a high-load system, using such a database makes sense.
Examples: MemSQL, VoltDB, Spanner, etc.
Multi-model databases
Such databases combine several approaches to organizing data at once, which gives functional variety when building systems on top of them.
Features:
- The ability to work in a single query with data stored in different models, without violating consistency;
- Broad scaling options thanks to easy integration of new databases into the existing project infrastructure.
Example: ArangoDB.
Databases at Selectel
At Selectel you can launch ready-made cloud databases: we support DBMSs such as PostgreSQL (including for 1C:Enterprise), MySQL, Redis and TimescaleDB.
Cloud databases remove the need to work with infrastructure: you can bring up the required number of DBMS instances within a few minutes in the company's control panel. The solution is fault-tolerant and scales easily. For emergencies, backups are created that allow the database state to be rolled back by up to seven days.
Most routine system administration tasks (setup, configuration, maintenance and security) are handled by Selectel specialists.
In this article we examined 11 types of databases. Each has its own features and restrictions. The choice of one type or another should take into account:
- the complexity of the stored data and the relationships within it,
- read/write performance and the cost of changing the database structure at the planned data volume,
- the expertise of the development team,
- the stage of the product's life cycle (are you refining an existing solution or creating something fundamentally new, and what are your current and prospective resources).