NoSQL Databases

The so-called “NoSQL” databases, many of them document-oriented, stand in contrast to the “numbers in tables” approach of the traditional and now desperately outdated relational database model.
They represent an enormous step forward from the conventional RDBMS (Relational Database Management System). But most clients are so locked into the notion that RDBMSs are what computers are that they cannot visualize a world not built around SQL, and so the NoSQL community has adopted the position that “No SQL” really just means “Not Only SQL.” I simply don’t accept such a weakness. I am unashamedly campaigning to put an end to a technology that should have left the computer world thirty years ago.

So what are the advantages of NoSQL systems?

  • Firstly, they are user-oriented rather than computer-oriented. Their concept and their design reflect what people need and want, not what the limited computers of the 1970s were able to give them.
  • Their search systems are much more closely related to the “quasi-intelligent” and highly flexible search engines used by organisations such as Google than to the rigid mechanics of a primitive programming language (SQL) which, frankly, was already in conflict with every principle of good language design when it was introduced in the 1970s.
  • Their approach to data storage is hierarchical. This idea of categories within categories much more closely reflects how most data are encountered in reality than does the concept of side-by-side tables.
  • What they store is naturally flexible, instead of having flexibility artificially grafted on by such concepts as “blob” fields.
  • They come in a huge variety of forms, so you can choose the system that best suits your particular needs, and this is where my expertise and advice would prove invaluable.
  • Because links to related data are followed directly, rather than being worked out from first principles through external indices every time they’re needed, NoSQL databases can be much faster.
  • For the moment, most of them are either actually FREE and open-source or provided by independent organisations. So I strongly advise getting into using them now. The big players like Oracle and Microsoft are clearly aware of the threat, and Oracle’s absorption of the most popular independent RDBMS, MySQL (which it acquired with Sun Microsystems in 2010), shows the way that things may soon go.
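To make the hierarchical-versus-tabular point concrete, here is a minimal sketch using only Python’s standard library. It assumes a plain `dict` as a stand-in for a document store (no particular NoSQL product is modelled) and in-memory `sqlite3` as the RDBMS; the customer/order data, the table names and the helpers `load_relational` and `reassemble` are all hypothetical. The same customer is held once as a single nested document and once as three side-by-side tables, which then need two joins and some regrouping to rebuild the nested view. It illustrates the shape of the data only and makes no claim about speed.

```python
import json
import sqlite3

# Hypothetical sample data: one customer, held as a single nested document,
# roughly as a document-oriented store would keep it.
customer_doc = {
    "name": "Ada",
    "orders": [
        {"id": 1, "items": [{"sku": "loom-card", "qty": 80}]},
        {"id": 2, "items": [{"sku": "tabulator", "qty": 1},
                            {"sku": "punch", "qty": 2}]},
    ],
}


def load_relational(conn: sqlite3.Connection) -> None:
    """Flatten the same document into three side-by-side tables."""
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
        CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                                  sku TEXT, qty INTEGER);
    """)
    conn.execute("INSERT INTO customers VALUES (1, ?)", (customer_doc["name"],))
    for order in customer_doc["orders"]:
        conn.execute("INSERT INTO orders VALUES (?, 1)", (order["id"],))
        for item in order["items"]:
            conn.execute("INSERT INTO order_items VALUES (?, ?, ?)",
                         (order["id"], item["sku"], item["qty"]))


def reassemble(conn: sqlite3.Connection) -> dict:
    """Rebuild the nested view from the tables: two joins plus regrouping."""
    rows = conn.execute("""
        SELECT c.name, o.id, i.sku, i.qty
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        JOIN order_items AS i ON i.order_id = o.id
        ORDER BY o.id, i.rowid
    """).fetchall()
    doc = {"name": rows[0][0], "orders": []}
    for _, order_id, sku, qty in rows:
        # Start a new order entry whenever the order id changes.
        if not doc["orders"] or doc["orders"][-1]["id"] != order_id:
            doc["orders"].append({"id": order_id, "items": []})
        doc["orders"][-1]["items"].append({"sku": sku, "qty": qty})
    return doc


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_relational(conn)
    assert reassemble(conn) == customer_doc
    print(json.dumps(customer_doc, indent=2))
```

The document version is read back in one piece; the tabular version has to be stitched together on every read, which is the work the bullets above describe.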

The basic problem with the traditional design was, in a nutshell:

  • Inefficiency: 98% of a typical traditional database is filled with zeroes and empty (NULL) fields!
  • Inflexibility: the empty fields are there because the traditional paradigm had to provide for every possibility in every record, even though 90% of the fields so set up were used by only 0.00001% of records.
  • A total failure of the data structure to reflect the structure of the actual data needing to be stored.
  • An inability to recognise that the vast majority of data is hierarchical and involves a lot of cross-referencing that must be fast.
  • It made next to no allowance for the narrowing gap between RAM (semiconductor memory) and HDDs (hard-disc drives) in both speed and cost, nor for the fact that what gap remains is neatly bridged by flash memory, which is non-volatile.
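The “every possibility in every record” problem can be shown with a small, hedged sketch. The optional field names and the people records below are hypothetical, and `sqlite3` stands in for any fixed-schema table. A wide table must reserve a column for every optional attribute in every row, whereas a document keeps only the fields a record actually has. The sketch counts cells rather than bytes: how much disk space an empty cell really costs varies by engine, so the point being illustrated is the rigidity of the schema.

```python
import sqlite3

# Hypothetical optional attributes that a fixed schema must reserve in every row.
OPTIONAL_FIELDS = ["fax", "pager", "spouse", "yacht_name", "tax_code"]

# Hypothetical records: each person uses at most one optional field.
people = [
    {"name": "Ann"},
    {"name": "Bob", "fax": "01632 960001"},
    {"name": "Cat"},
    {"name": "Dan", "yacht_name": "Hollerith"},
]


def fixed_schema_cells(records: list[dict]) -> tuple[int, int]:
    """Store records in one wide table; return (empty cells, total cells)."""
    columns = ["name"] + OPTIONAL_FIELDS
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE people ({', '.join(c + ' TEXT' for c in columns)})")
    placeholders = ", ".join("?" for _ in columns)
    for rec in records:
        # Every row gets a slot for every column; missing values become NULL.
        conn.execute(f"INSERT INTO people VALUES ({placeholders})",
                     [rec.get(c) for c in columns])
    empty = sum(
        conn.execute(f"SELECT COUNT(*) FROM people WHERE {c} IS NULL").fetchone()[0]
        for c in columns
    )
    conn.close()
    return empty, len(records) * len(columns)


def document_fields(records: list[dict]) -> int:
    """A document store keeps only the fields each record actually has."""
    return sum(len(rec) for rec in records)


if __name__ == "__main__":
    empty, total = fixed_schema_cells(people)
    print(f"fixed schema: {empty} of {total} cells empty")
    print(f"documents: {document_fields(people)} fields stored, none empty")
```

With these four sample records, 18 of the table’s 24 cells are empty, while the documents hold just the 6 fields that exist.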

I used to joke that the traditional model was developed on these leading assumptions, set out as a step-by-step unfolding argument:

  • Access to HDDs (hard disc drives) was so immeasurably slower than access to RAM that they required a totally different approach to data storage.
  • RAM was also incomparably more expensive, costing of the order of hundreds of thousands of times more per unit of data than disc.
  • RAM had to be regarded as hopelessly unreliable: anything in RAM was volatile and would immediately be lost on even a momentary power failure.
  • So everything that mattered had to be stored and continuously maintained on electro-mechanical HDDs.
  • Because of its extreme slowness, data on disc had to be very rigidly organised and there was no practical possibility of cross-referencing with it.
  • The organisation of disc data had to be kept as separate as possible from the actual data, but still had to reside on the HDD.
  • The only meaningful way to store data on HDDs was in tables.
  • Any tables had to have rows of data that would fit onto 80-column punched cards, because not only RAM but also HDDs were still desperately unreliable, and the only things that could ever be trusted were punched cards.
  • Taking these factors together, any element of data had to be accessed from scratch by an elaborate search system every time it was needed.
  • No-one would ever want to store anything on computers except names and salaries.
  • Computers could not, and never would be able to, store any form of image.

Looking at this list again about four years after I wrote it, perhaps in an excess of facetiousness, I have to admit it still reads true! I would now stress that, although you can do a surprising amount with SQL once you get the hang of it – and I’m pretty good with it, don’t get me wrong – the problem with the RDBMS concept is simple:

The underlying way the data are stored, in two-dimensional tables, is hopelessly inflexible.

It’s time these antiques were recognised for what they are – sister technologies to the looms of Joseph-Marie Jacquard (c. 1810) and the similar punched card systems of Hermann Hollerith (c. 1890).

By combining different approaches to data storage in a single system, an approach usually called polyglot persistence, you can match your database system to your actual needs far more effectively.
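As a rough illustration of combining storage models, here is a minimal sketch of a single facade that routes each kind of data to a model suited to it. Everything in it is hypothetical: the `PolyglotStore` class, its method names and the sample data are invented for this example, and the three “stores” are in-memory stand-ins (a dict as a key-value store, a dict of JSON strings as a document store, and an adjacency map as a graph store for cross-references) rather than real products.

```python
import json
from typing import Optional


class PolyglotStore:
    """Hypothetical facade routing each kind of data to a suitable storage model.

    In-memory stand-ins only, not real products: a dict as a key-value store,
    a dict of JSON strings as a document store, and an adjacency map as a
    graph store for direct cross-references.
    """

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}      # key-value: lookup by id
        self.catalogue: dict[str, str] = {}     # documents: nested, no fixed schema
        self.links: dict[str, set[str]] = {}    # graph: direct links between items

    def put_session(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def get_session(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)

    def put_product(self, sku: str, doc: dict) -> None:
        # Serialise, as a document store would, so later changes to the
        # caller's dict do not alter the stored copy.
        self.catalogue[sku] = json.dumps(doc)

    def get_product(self, sku: str) -> dict:
        return json.loads(self.catalogue[sku])

    def link(self, a: str, b: str) -> None:
        # Record the edge in both directions so either end can be followed directly.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def neighbours(self, sku: str) -> list[str]:
        return sorted(self.links.get(sku, set()))


if __name__ == "__main__":
    store = PolyglotStore()
    store.put_session("s1", "ada")
    store.put_product("p1", {"name": "Card", "sizes": [{"cols": 80}]})
    store.link("p1", "p2")
    print(store.get_session("s1"), store.get_product("p1"), store.neighbours("p2"))
```

In a real deployment each stand-in would be replaced by a dedicated product chosen for that workload; the facade is what lets the rest of the application ignore which model holds which data.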