Ahh it's Big Data!

Big Data. It is a silly name for a very boring thing. Big data as a term really just defines a scenario where you have so much data that traditional analytics fall apart or become inefficient.

What is hilarious though is that everyone talks about it like it's "the future".

The term may have only become pervasive recently... but conceptually it has been around for a damn long time. Google was a master of Big Data long before the term ever existed, and in many business and scientific fields it was around long before that. Big data didn't kill off "little data" then, and it isn't about to now, or likely ever.

Big data is certainly a bigger deal, and on a wider scale, now than it was in the past. More markets are becoming globalized, others (IoT, for example) are growing at a rapid pace, and the information these scenarios generate has a lot of value to a lot of people.

But guess what? Most of us don't care. And likely never will. And likely never need to.

Data which isn't "big data" is generally much more important to the average person on the average day. I don't need a Big Data capable storage system for my contacts. My wife doesn't need it to run her business. In fact, I'm having a REALLY hard time thinking of a time I've ever needed or would have even found my life enriched by having Big Data at my disposal.

Many also think that there will be some fundamental shift in technology as a result. But I think that too is incorrect. By some measures, yes, Big Data databases (NoSQL, non-RDBMS, etc.) can encapsulate models that traditionally used relational data models. But do they do it better? A matter of perspective, to be sure. NoSQL databases typically outperform the living crap out of RDBMSs by throwing caution to the wind. Table locks? Nope. Guaranteed data integrity/accuracy? Often nope. Blazingly fast? Sure.

On the everyday data models that most of us work with (whether we know we're using them or not) there is no measurable performance difference to the end user between a traditional RDBMS and a NoSQL DBMS. But the RDBMS provides a number of assurances, even on small-scale data, that the other can't offer. I find it hard to believe that there will ever be a wholesale abandonment of RDBMSs. I can readily accept that in many cases, perhaps even some where they shouldn't, people will move to NoSQL DBs. But there will always be those who are smart enough to see that there is no such thing as a free lunch, and the performance gains NoSQL has over RDBMSs are no exception.
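To make "assurances" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer functions, and the plain dict standing in for a store with no integrity rules are all hypothetical, made up for illustration. The dict is a toy, not a real NoSQL database, many of which do offer some guarantees of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the database's promise: no balance goes negative.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )

def transfer_sql(src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# Toy stand-in for a store with no integrity rules (an assumption, not a real product).
store = {"alice": 100, "bob": 0}

def transfer_dict(src, dst, amount):
    store[dst] += amount
    store[src] -= amount  # nothing stops the balance from going negative

ok = transfer_sql("alice", "bob", 150)      # alice only has 100
sql_balances = dict(conn.execute("SELECT name, balance FROM accounts"))
transfer_dict("alice", "bob", 150)          # the same bad transfer, unchecked
```

The relational side refuses the overdraft and undoes the half-finished transfer, while the dict happily records it. That refusal costs something on every write, which is the "no free lunch" part.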

The other thing of note is that, realistically, most NoSQL databases will still store relational objects. Articles like to claim that documents are king because they are weakly typed and thus less restrictive and quicker to code against. I think that is a rather weak argument. For a small, short-term project the argument would likely win out. Perhaps in the sort of sphere where most development is contract work. Hmm, sounds an awful lot like web development, where the DOM is king.

A document DB would allow you to quickly grow your data model as you discover or are given new requirements. But the reality is, at some point the code or even the UI depends on the model, and if it is too easy to change it will eventually become too dynamic to contain. This is why application developers are less likely to default to a NoSQL approach. It is also probably a good reason why TypeScript is gaining momentum. As application developers move to the web as a cross-platform solution, more and more big, persistent teams on big, persistent projects move to document-based development and find that while the code is fast to write, it simply isn't manageable when the team and project scale.
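A small sketch of what "too dynamic to contain" can look like in practice. The contact documents, the field names, and the Contact class below are all hypothetical, invented for illustration. The idea is that three versions of an app each wrote a slightly different shape, and every reader now has to know all of them.

```python
from dataclasses import dataclass

# Hypothetical documents written by three different versions of the same app.
docs = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Grace", "phones": ["555-0101"]},
    {"full_name": "Linus", "phone": 5550102},
]

@dataclass(frozen=True)
class Contact:
    """The one shape the rest of the code is allowed to depend on."""
    name: str
    phones: tuple

def to_contact(doc):
    """Normalize every historical document shape into a Contact."""
    name = doc.get("name") or doc.get("full_name")
    if not isinstance(name, str):
        raise ValueError(f"document has no usable name: {doc!r}")
    raw = doc.get("phones")
    if raw is None:
        raw = [doc["phone"]] if "phone" in doc else []
    return Contact(name=name, phones=tuple(str(p) for p in raw))

# How many distinct sets of field names are already in the store?
shapes = {tuple(sorted(d)) for d in docs}
contacts = [to_contact(d) for d in docs]
```

Three documents, three shapes. The normalizing function is essentially a schema written after the fact, which is roughly the job a typed model (or TypeScript on the web side) does up front.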

If anything, I expect hybrid solutions to become king. And I mean true hybrids. I haven't seen any mainstream ones yet, but I fully expect to see something in the next 5-10 years. A hybrid in my opinion would allow both structured and unstructured data (and perhaps even loosely structured data). And the structured data would be enforced in much the same way a traditional RDBMS would and also offer the data integrity controls, whereas data which is less volatile could be persisted in another "region" of the database where the structure is much "freer" and performance is king.

I say this because the reality is that most complex apps can benefit from both sides of the equation. Most apps have critical transactional data while at the same time having data which is less volatile and more document-like in nature to begin with, like logs. I'd very much love to have a single DBMS where both can be managed accordingly, so that I can trade off performance for data integrity/validity on my mission-critical tables, but then focus more on performance and less on structure and the state of the data being retrieved for logging information.
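As a rough approximation of that split (not the true hybrid described here, just the shape of the idea), the sketch below uses Python's built-in sqlite3 and json modules. The payments and logs tables and both helper functions are hypothetical names chosen for illustration. One table is strictly structured, while the other stores free-form JSON text that the database never inspects.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured "region": the schema and constraints are enforced by the database.
conn.execute(
    "CREATE TABLE payments ("
    " id INTEGER PRIMARY KEY,"
    " account TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL CHECK (amount_cents > 0))"
)
# Loose "region": each row is an opaque JSON document with no fixed shape.
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

def record_payment(account, amount_cents):
    """Insert a payment; invalid rows raise sqlite3.IntegrityError."""
    with conn:
        cur = conn.execute(
            "INSERT INTO payments (account, amount_cents) VALUES (?, ?)",
            (account, amount_cents),
        )
    return cur.lastrowid

def log_event(**fields):
    """Store whatever fields the caller passes, serialized as JSON text."""
    with conn:
        conn.execute("INSERT INTO logs (body) VALUES (?)", (json.dumps(fields),))

payment_id = record_payment("acct-1", 2500)
log_event(event="payment_recorded", payment_id=payment_id)
log_event(level="warn", message="retrying", attempt=3, tags=["net", "slow"])

try:
    record_payment("acct-1", -100)  # violates the CHECK constraint
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

log_bodies = [
    json.loads(row[0]) for row in conn.execute("SELECT body FROM logs ORDER BY id")
]
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

This only covers the structure half of the wish. Both tables still go through the same storage engine and transaction machinery, so the log table gets no performance break for being loose, which is exactly the part a true hybrid would need to solve.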
