PostgreSQL 9 High Availability Cookbook

21 Aug

PostgreSQL 9 High Availability Cookbook is a very well written book whose primary audience is experienced DBAs and system engineers who want to take their PostgreSQL skills to the next level by diving into the details of building highly available PostgreSQL-based systems. Reading this book is like drinking from a fire hose: the signal-to-noise ratio is very high, and every single page is packed with important, critical, and very practical information. As a consequence, the book is not for newbies: not only do you have to know the fundamental aspects of PostgreSQL from a database administrator’s point of view, but you also need a solid GNU/Linux system administration background.

One of the strongest aspects of the book is the author’s principled and well-structured engineering approach to building a highly available PostgreSQL system. Instead of jumping to recipes to be memorized, the book teaches you basic but very important principles of capacity planning. More importantly, this planning of servers and networking is not only given as a good template: the author also explains the logic behind it, drawing attention to the reasoning behind the heuristics he uses and to why some magic numbers are taken as good estimates when more case-specific information is lacking. This style is applied very consistently throughout the book; each recipe is explained so that you know why you do something in addition to how you do it.

After the first chapter on basic planning, the author jumps to a set of miscellaneous topics in Chapter 2 and details some important tricks such as defusing cache poisoning, concurrent indexes, and Linux kernel tweaks. This chapter starts to reveal another valuable aspect of the book. Information about an open source RDBMS such as PostgreSQL is freely available on the Internet, but depending on your needs, a particular set of information can very well be scattered over a lot of mailing list messages, forum posts, wiki pages, and so on. It takes a disciplined mind with a lot of field experience to put all of that scattered information into a single, consistent, logical, and easy-to-follow form.

Starting from Chapter 3, each chapter explores a single topic in a lot of practical detail, beginning with connection pooling. This chapter, as well as almost all of the remaining ones, has a nice feature: the author always tries to explain alternative solutions, describes their advantages and disadvantages, and where possible shows how to combine some alternatives to get the best of each.

Chapters 4 and 5, namely Troubleshooting and Monitoring, can be thought of as a single chapter, because it is difficult to think about these fundamental concepts separately. These chapters are valuable not only for PostgreSQL DBAs but for any DBA or GNU/Linux system administrator in general. Troubleshooting and monitoring a highly available database deserves a book of its own, but since this book’s scope is clearly defined, the author provides enough background and practical starting points in about 70 pages.

I can easily say that Chapter 6: Replication, together with Chapter 7: Replication Management Tools, starts to form the ‘meat’ of the book. Without successfully implementing and practically managing the replication of your critical database servers, it is impossible to think about building a highly available system. In other words, you need at least one replica of your database system, so that if your primary system goes down, you can very easily switch to your replica (or offload some of your less critical applications to your replica and relieve the stress on your primary system). These two chapters present solid and practical information for achieving that goal. As in the previous chapters, the author shows and explains many useful and practical tools. He also does not refrain from presenting an open source tool, walctl, that he developed as a “PostgreSQL WAL management system that pushes or pulls WAL files from a remote central storage server”. I consider this another positive point for the book, because it clearly indicates the author’s serious time investment in PostgreSQL and its high availability configuration.

Chapter 8: Advanced Stack is aptly named, because this chapter, together with Chapter 9: Cluster Control, forms the most advanced and complex part of the book. The author’s warnings regarding the information density and the related real-life complexity of the topics explained in these two chapters should not be taken lightly. Indeed, in clusters set up to take over from failing nodes, many combinations of events can lead to subtle and hard-to-debug errors. Creating such a highly available system with Linux-based tools such as LVM, XFS, DRBD, Pacemaker, and Corosync requires careful planning, probably some experimenting in a safe virtual environment, and then disciplined execution, as well as monitoring. Again, these chapters alone cover topics that could fill a volume and a detailed training course by themselves, and I think the author kept a good balance between depth and breadth.

The final chapter, Data Distribution, can be considered a bonus chapter that briefly shows setting up a PostgreSQL server, dealing with foreign tables, managing shards, creating a scalable nextval replacement, and relevant tips and tricks.

There are not many negative sides to this very dense PostgreSQL book. A few minor points that deserve mention are its focus on the most popular Linux distributions such as Red Hat, Debian, and their derivatives (FreeBSD and other BSD admins will require slightly more effort), some obsolete networking command usage such as ifconfig instead of ip (but then again, this might be helpful for FreeBSD admins), and inconsistent presentation of command output (sometimes no output is shown, whereas for other commands either screenshots or textual output is used). One might also argue for a slight reordering of chapters for pedagogical reasons, but this is highly open to debate and depends on one’s particular preferences when it comes to system and database administration.

I can recommend PostgreSQL 9 High Availability Cookbook without hesitation to PostgreSQL DBAs who want to push their skills to the next level and learn the fundamentals of building highly available PostgreSQL-based database clusters. Building such a cluster will certainly not be as easy as reading a book, but it is good to know such a book exists as a very good guide.

Posted on August 21, 2014 in Books, Linux, sysadmin

