Tuesday, January 29, 2013

NoSQL

Firstly, sorry for all the crappy blogs; it's just something I'm working through. Each draft is so different from the next, and eventually I just post it.

Anyway, it's funny how one's conscious attention works in computer engineering.  Case in point: I have been exposed to NoSQL and a lot of the tools (by name, not by use), but it didn't click with me at all.  Now, writing for a Linux virtual server cloud, it occurred to me that I did not want to build in an RDBMS from the start.  I think they put constraints in place that impede iterative development, but they are also very useful and powerful... so the idea is to bring them in exactly where you need them, hopefully late.  So I started working on and designing a different kind of database approach.

Turns out, of course, this is the whole point of NoSQL, and they pitch the advantage as "schemaless".  These components are easy to work with, so even if they are overkill... at least they impose almost nothing on the internal design.  The decision I made to communicate via JSON/HTTP is also present in some of these tools, as is the general idea of distributed associative arrays.

After a closer look at MongoDB (something I'd heard the name of quite a while ago but never looked into), prompted by a YouTube comment on a video on the subject, I have found that Mongo will do.  Now, after another day or two of looking at it, I'm very happy with it... it performs well (according to benchmarks), is easy to link to from C++ and Python, has a decent shell, and, it seems, a lot of other interfaces and tools.  Mongo basically stores and retrieves nested associative arrays by namespace and by searching.  Indexes optimize this, and the storage format is BSON, a binary encoding of nested associative arrays.  This is what the JCN does, but I had no storage, thus this tool search.
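
To make the "nested associative arrays by namespace and searching" idea concrete, here is a toy sketch in plain Python. It is not MongoDB's API or its Python driver; the `TinyStore` class, its `insert`/`find` methods, and the `n9.articles` namespace are all made up for illustration. It only mimics the shape of the thing: documents are nested dicts, grouped under a namespace string, and found by matching fields, with dotted keys reaching into nested levels.

```python
import copy


def get_path(doc, dotted_key):
    """Follow a dotted key such as 'author.name' into nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class TinyStore:
    """Toy in-memory document store (hypothetical, not MongoDB)."""

    def __init__(self):
        self.namespaces = {}  # namespace string -> list of documents

    def insert(self, namespace, doc):
        # Store a copy so later changes by the caller don't leak in.
        self.namespaces.setdefault(namespace, []).append(copy.deepcopy(doc))

    def find(self, namespace, query):
        """Return documents whose fields equal every value in query."""
        return [
            doc for doc in self.namespaces.get(namespace, [])
            if all(get_path(doc, key) == value for key, value in query.items())
        ]


store = TinyStore()
store.insert("n9.articles", {"title": "noSQL", "author": {"name": "me", "level": 3}})
store.insert("n9.articles", {"title": "inotify", "author": {"name": "you", "level": 1}})

matches = store.find("n9.articles", {"author.name": "me"})
print([doc["title"] for doc in matches])
```

A real store adds indexes so that `find` doesn't have to walk every document, which is the part this sketch leaves out entirely.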

What that leaves me with now, as far as responsibilities for the JCN go, comes back to what it does for the video system.  The Python web processes handle a video upload, writing both the video data and the metadata (in a JSON file).  The JCN sees this (using inotify to watch the ingestion directories).  Using Mongo means the JCN doesn't have to return arbitrary JSON documents (things like user settings, etc.), and instead it will focus on integrating the data/document store with the file system.
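
The upload/ingest handoff could be sketched roughly like this. The JCN uses inotify; this stand-in polls the directory with a plain scan instead, since Python's standard library has no inotify binding. The file layout is an assumption for illustration: each upload is a video file plus a `.json` metadata file that names the video in a hypothetical `video_file` field.

```python
import json
import os
import tempfile


def scan_ingest_dir(ingest_dir, seen):
    """Return (video_path, metadata) pairs for uploads not seen before.

    A polling stand-in for inotify: call it repeatedly, passing the same
    `seen` set, and each metadata file is reported only once.
    """
    found = []
    for name in sorted(os.listdir(ingest_dir)):
        if not name.endswith(".json") or name in seen:
            continue
        with open(os.path.join(ingest_dir, name)) as f:
            metadata = json.load(f)
        # "video_file" is a hypothetical field naming the uploaded video.
        video_path = os.path.join(ingest_dir, metadata["video_file"])
        if os.path.exists(video_path):  # skip until the video has landed
            seen.add(name)
            found.append((video_path, metadata))
    return found


# Simulate what the web process would write on upload.
ingest_dir = tempfile.mkdtemp()
with open(os.path.join(ingest_dir, "clip1.mp4"), "wb") as f:
    f.write(b"fake video bytes")
with open(os.path.join(ingest_dir, "clip1.json"), "w") as f:
    json.dump({"video_file": "clip1.mp4", "title": "Test clip"}, f)

seen = set()
first = scan_ingest_dir(ingest_dir, seen)
second = scan_ingest_dir(ingest_dir, seen)
print(len(first), len(second))
```

The metadata dict returned here is the piece that would be handed to the document store, while the video itself stays on the file system.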

For example, to finish porting my Novem9 interface system off GAE to this LVSC (Linux virtual server cloud, aka OpenStack), I need to be able to store "articles"... the equivalent of a blog post or what have you.  You can see an example here (http://www.novem9.org/n9/pds).  NOTE: I'm not pointing there because of the content, which doesn't matter and is obsolete... but for the interface system.  Those articles can be edited in place by a sufficiently privileged user.

I need to do this on the LVSC.  But there is also a feature I didn't put in GAE yet that I would like to have, which is revision control for the article as it is edited.  With MongoDB and JurisdictionalAgent (my C++ daemon), the latter will allow saving the contents of the article on disk (MongoDB just has the path)... and the JCN can support committing changes, looking up old versions, and the like, while MongoDB ensures the article can be found in searches.
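
One way the split could look, as a rough sketch: article bodies live on disk as numbered revision files, and the record that would go into MongoDB holds only the path and the latest revision number. Everything here is hypothetical (the `commit_article` and `read_revision` names, the `rNNNN.txt` file layout, the record fields), and the record is a plain dict rather than a real MongoDB document.

```python
import os
import tempfile


def commit_article(root, slug, text):
    """Write text as the next revision of an article; return its record."""
    article_dir = os.path.join(root, slug)
    os.makedirs(article_dir, exist_ok=True)
    existing = [
        name for name in os.listdir(article_dir)
        if name.startswith("r") and name.endswith(".txt")
    ]
    revision = len(existing) + 1
    with open(os.path.join(article_dir, "r%04d.txt" % revision), "w") as f:
        f.write(text)
    # This dict is what the database would index: a path, not the content.
    return {"slug": slug, "path": article_dir, "latest": revision}


def read_revision(record, revision=None):
    """Read a given revision, defaulting to the latest one."""
    revision = revision or record["latest"]
    with open(os.path.join(record["path"], "r%04d.txt" % revision)) as f:
        return f.read()


root = tempfile.mkdtemp()
commit_article(root, "pds", "first draft")
record = commit_article(root, "pds", "edited in place")

print(read_revision(record))     # latest revision
print(read_revision(record, 1))  # an older revision
```

Old revisions are never overwritten, so looking up an earlier version is just a file read, and the database side stays small.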

I feel back on track.
