Multi-DB Applications

Something I'm becoming increasingly VERY into are applications with multiple persistence layers.

Arguably, virtually every application ALREADY has multiple persistence layers. Most have a DB and then also get some configurations from the file system or an in-memory data store. In addition a lot applications will also use a separate cache like Redis (which is also a database).

But, what I'm talking about here is embracing the fact that we already do this and then taking it to the next level; using multiple database within an application for storing the same "types" of data. Not every caching problem is best served by something like Redis. And not every piece of application data is best suited to a particular DBMS.

Some tables contain data which is regularly read and manipulated ("operational" data). Other tables are regularly written to, but rarely read (log tables), yet others are read but rarely modified (settings). And then, some of these tables are huge, while others may contain just 1 or a few records and a lot fall in between.

Needless the say, the demands on these scenarios put very different strains on the underlying DB. And even within those examples there can be a near infinite number of permutations of how the data is used under the circumstances.

And so I'm more than a little perplexed that we haven't really built a lot of infrastructure or libraries around being able to be more flexible. Some people may say "that is why you should build microservices as each can use a different DB". But, the problem in my opinion is that whatever criteria you use to define the boundaries of those services, you don't really eliminate the problem.

The reason that the problem can persist is that it can often make sense to have multiple layers or representations of the same data. A very high usage system for example may need multiple caching layers. Or may need to reduce pressure on the DB by offloading certain reads to a replica. 

These are real, if borderline contrived examples, and I'll acknowledge that. The important point is that these are things done in the real world because they offer real world advantages. I would speculate that there is a lot of software out there which would benefit from some of these solutions. But they simply view them as too much effort or cost to implement.

Put another way, we don't do this more often NOT because it is a bad idea, but rather because it is perceived to be difficult or expensive to implement. But I would counter by suggesting that once an application scales up high enough, they end up paying more (whether that be cash, or dev hours) just to ensure that their infrastructure can can keep up with demands. Demands that are exacerbated by using the wrong solution for a certain sub-section of the problem.

It is possible to simplify a lot of this. How do I know? We've already done it on some level in a number of places. Perhaps two of the most popular are Hibernate and Entity Framework. These are both ORMs, but that isn't the only other way we do this sort of thing .Net has interfaces like those used for their distributed caching which tackles it from the angle of caching for example.

The point here is that all it takes is the right abstraction. Hibernate and EF weren't written specifically to make it easy to manage multiple connections at once. But both are perfectly capable. Just wrap a factory pattern around them and all of a sudden you can use the same HQL (Hibernate) or Linq (NHibernate/EF) and talk to a wide array of DB back ends. Or even just multiple instances of the same DBMS and scale that way.

Like anything else, you have to avoid going crazy. If you need multiple records updated in a single atomic transaction, you wouldn't want to spread it across databases unless you were getting real value for the effort and knew how to safely undo the work in a partial commit and how to accurately determine when they've happened.

Just as I suggest with microservices, I would suggest to find an area of the application which logically acts different from a data persistence perspective and try it out there first. Things like logging, settings, localizations, etc... tend to be used very differently than your main application data. As a result they are also likely to be places which would benefit most from a different DB and don't pose a ton of concern around transactions and rollbacks. Or maybe try a multi-layer cache.

The idea is simply to build out the infrastructure or understand how you would fit such a solution into your architecture. Having easy answers means having options when you need to add a new feature or address performance issues. Not having an answer at the ready often means trying to shoe-horn a solution in where it already wasn't working in the past or might not fit if it is something new.

Comments

Popular Posts