PostgreSQL – Oldie but Goldie

anynines_databases_series_wp

When talking about databases, most people think of relational databases in a form of tables. There are some popular relational databases like MySQL, SQLite and PostgreSQL. The last one is an open-source Relational Database Management System, which has more than 15 years of development and one of the most advanced RDBMS. Its SQL implementation strongly conforms to the ANSI-SQL standard. PostgreSQL is also fully ACID compliant and supports Foreign Keys, Joins, Views, Triggers, and Stored Procedures (in multiple languages). In this post we will look at the features of PostgreSQL.

ACID Concept

Let us look again at the ACID Concept. ACID is a concept in computer science to guarantee that transactions are reliable. ACID is an acronym and each letter stands for one principle.

Atomicity

If one part of the transaction fails, then the whole transaction fails and the database data won’t be changed. If every statement of the transaction is successful, then the database data will be updated.

Consistency

The database must be in a valid state all the time. So if a database is in a valid state before a transaction, it must be also in a valid state after the transaction.

Isolation

Transactions are processed one after another. To accomplish concurrent transactions, the database system has to provide a concurrency control method.

Durability

When a transaction is committed to the database, the changes should remain.

PostgreSQL and CAP

When we look again to our CAP-Theorem overview, we see that PostgreSQL is on the CA side.

The CAP-Theorem diagram

PostgreSQL is a RDBMS and rates Consistency and Availability over Partition Tolerance.

Now you might ask yourself what is the difference between Consistence in the CAP Theorem and Consistency in the ACID Concept?

Imagine you have a cluster with several database nodes. The consistency of a single node is described by the ACID Concept and the Consistency of all nodes in the cluster is described by the CAP Theorem.

Multi-Version Concurrency Control

As mentioned above,the database system has to provide a method for concurrency. PostgreSQL uses MVCC to manage concurrent accesses to the database. Instead of accessing the database directly, each transaction works with a snapshot of the database state. So a transaction can’t see inconsistent data caused by other concurrent transactions, which updates the same data.
One advantage of MVCC is that read accesses don’t block write accesses and write accesses don’t block read accesses.

Synchronous and Asynchronous Replication

To ensure data consistency, the database distinguishes two replication mechanisms.

Consider the following example:
We have two servers, one master and one slave. Now assume you want to replicate the data from the master to the slave. You have two possibilities to achieve this: Synchronous and Asynchronous Replication.

What is Asynchronous Replication?

Asynchronous Replication simply means that your committed data to the master can be replicated to the slave after the transaction.

What is Synchronous Replication?

Synchronous Replication ensures that your data are at least on two server when the transaction is being committed.

What features does PostgreSQL provide?

Stored Procedures

Every SQL-Statement must be executed by the database server. That means you have to send all your SQL-Statements to the server, wait until the server processes the query and sends the result back to you. With Stored Procedures several SQL-Statements can be grouped on the server. So you only have to send one request to the server to start the stored procedure.

PostgreSQL supports multiple languages for Stored Procedures, some of them are:

  • Ruby
  • Python
  • Perl
  • Java

Administration and Developer Tools

You can find many tools on the PostgreSQL Website that help you to do administrative tasks, such as tools that help you to import data from another database systems (e.g. MSSQL) .

https://www.postgresql.org/download/products/1-administrationdevelopment-tools/

Object – Relational Modeling with PostgreSQL

PostgreSQL supports the stored procedures for many different languages. Some of them are object oriented languages such as Java. Wouldn’t it be nice when you could use a Object-Relation Mapper instead of mapping your objects to tables by yourself? Fortunately PostgreSQL supports this feature. Although the database stores the data relationally, there are many third party tools, that allow you to store your objects directly to the database.

One of these tools is pBeans for Java:
pBeans is a Object-Relational-Mapper (ORM) that do the annoying work of mapping the data of your object to a relational database.Thanks to that,you can work in Java as usual and you’re not bothered with the SQL behind your application.

Full Text Search

You probably know the LIKE operator, but the capabilities of it are limited. Consider the following scenarios:

  • Sometimes you need linguistic support, e.g. when you search for „satisfy“ but you are also looking for „satisfies“. In simple cases you can use OR to search for both, but in more sophisticated queries the OR operators will escalate quickly.
  • When a query returns thousands of matching documents, it might be useful to have a ranking of each result.
  • With the Like operator you have to run through all the data in the database, because there is no index. In large databases that can be very slow.

With Full Text Search a preprocessor runs through the document and creates tokens and an index. A token is for example a word or a number. Then the tokens are converted to lexemes, in other words they are normalized (remove Uppercases, suffixes like ‚s‘ or ‚es‘ in English), so PostgreSQL can identify them easly. A dictionary is used to perform this task. The preprocessed documents are then stored, so they can be accessed again.

Data Types

PostgreSQL supports many data types. You can find a full list here: https://www.postgresql.org/docs/8.4/static/datatype.html
If you still need a data type which is not in this list, you can create it by yourself with the CREATE TYPE command.

Multi-Index-Query

What does it mean? When you have a query with several different columns and each column has their own index, PostgreSQL will use them all! So you don’t have to create an index which combines all of these columns.

Toast

PostgreSQL uses a fixed page size (commonly 8kB) and does not allow tuples to span several pages. Therefore large data cannot be stored directly in it. To avoid this limitation PostgreSQL compresses large data into several physical rows.

PostGis

If you are working with geographic data you can use this open-source project, which adds geographic objects to your PostgreSQL.

Conclusion

We have seen that PostgreSQL is a powerful RDBMS with many features and you should definitely consider it when it comes to choosing a RDBMS. If you are still sceptical you could have a look at some fork projects of PostgreSQL at:
https://wiki.postgresql.org/wiki/PostgreSQL_derived_databases

Sources:

Leave a Reply

Your email address will not be published. Required fields are marked *