Pyrrhologic: Video Site

So I finished a fairly significant refactor of the video server, and I realized I'd like to use the blog to decompress from working on it. It's a chance to talk through some technical details that YouTube comments don't really leave room for.

So while the buzzword "cloud" has been around for a while, and is really just a word for distributed computing, the fact is that the true power of distributed computing has flowered slowly. It doesn't seem that way, but it has. We struggle to do these things. What I'm excited about now is the advent of open source clouds such as OpenStack. Buzzword-wise, programming on Google App Engine is using "the cloud" or "cloud computing", but it's proprietary. The OpenStack way gives me a virtual machine. It could even be a Windows machine. It gets a share of a real machine to run on, and since it runs as a virtual machine, it can be saved to disk as an image and then cloned.

So the advantage of this is that if someone is good at designing distributed systems, then once they have a good version, they can take a copy of that image and deploy it to new servers in minutes. The servers wake up and cooperate (like a cloud) to provide some service. This is what large web sites do anyway: many servers together pretend to be "Google", etc. The cloud is really a word for operating that way without needing a multimillion dollar server farm, by buying parts of that resource at commodity pricing.

So the server is Ubuntu 12.something something. I installed some software like ClipBucket (and tried one called MediaCore) but found it's a pain. Integrating with it is a pain. Accommodating it is a pain. And I discovered that in the last 6 months the first step of the future arrived: the video tag exists in the latest browsers of significance (bite me IE), and they all play .webm.

I am running Apache and using mod_wsgi to run a Python module. The standard approach is to use some SQL database, and I figured I'd have to do that, but I'm not that thrilled working with them. They tend to take over the structure of the project. My Python modules were using the disk directly. Of course, eventually that is not OK, because processes handling different requests will write over each other.
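
For anyone who hasn't seen one, here is roughly what a Python module run by mod_wsgi looks like. This is only a minimal sketch, not the actual module behind the site: mod_wsgi looks for a callable named `application` by default, and the JSON reply here is just a placeholder I made up for illustration.

```python
import json


def application(environ, start_response):
    """WSGI entry point. mod_wsgi calls this once per request."""
    # PATH_INFO is the part of the URL after the script's mount point.
    path = environ.get("PATH_INFO", "/")

    # Placeholder response: echo the requested path back as JSON.
    body = json.dumps({"path": path}).encode("utf-8")

    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI expects an iterable of byte strings.
    return [body]
```

Apache may run several copies of this callable at once, which is exactly why writing straight to disk from here stops being safe.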

However, I've decided to write a C++ server to organize the data, built around a distributed database idea that still lets the data live on disk. The central server will ensure the files are not corrupted, and will be able to compile indexed files for high speed access as the amount of data gets larger. People who want to use SQL and SQL-based tools on the database will be able to do so, but that will be an export of what the live systems use. I think, after all, that when using a database you have to get the data into memory and use it somehow, so you can just keep it in that form, with lots of data exporting to allow alternate indexing, etc., for different purposes.

Anyway, I got the Python Apache-based server asking the C++ process for the video information via a back-end HTTP request, instead of relying on naming conventions and going to disk. At the moment the data fragments are just JSON files that were dropped by the upload program. This design lets me keep supporting that, and by the time the number of videos gets too high for it, I'll also be able to compose the complete database into an indexed form that supports the larger number. I'll do this while still allowing utility programs and web page services to drop files like notes.
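
The back-end request from the Python side could look something like the sketch below. The address, port, and `/video/<id>` URL layout are all assumptions made up for illustration; the real protocol between the web module and the C++ process isn't spelled out here. The only thing taken from the setup above is that the reply is JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical address of the C++ data server (an assumption, not the real one).
BACKEND_URL = "http://127.0.0.1:8081"


def video_info_url(video_id, base_url=BACKEND_URL):
    """Build the back-end URL for one video (hypothetical /video/<id> layout)."""
    # Escape the id so spaces or slashes cannot change the request path.
    return "{}/video/{}".format(base_url.rstrip("/"), quote(video_id, safe=""))


def parse_video_info(raw_bytes):
    """Decode the JSON reply from the back end into a dict."""
    return json.loads(raw_bytes.decode("utf-8"))


def fetch_video_info(video_id, base_url=BACKEND_URL, timeout=2.0):
    """Ask the back-end process for a video's metadata over HTTP."""
    # The timeout keeps a stuck back end from hanging the web request.
    with urlopen(video_info_url(video_id, base_url), timeout=timeout) as response:
        return parse_video_info(response.read())
```

The web module would then call something like `fetch_video_info("some-id")` rather than guessing a file name, so the file layout can change behind the data server without touching the web side.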

My data organizer (jcn) uses inotify to watch the disk for new files of interest to index and compose.
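
To give a feel for the "watch, then index" loop, here is a small Python sketch. Note that it swaps in plain directory polling as a stand-in for inotify (Python's standard library has no inotify binding), and it is not how jcn itself is written. The `.json` suffix, the function name, and the in-memory dict index are assumptions for illustration.

```python
import json
import os


def scan_new_fragments(directory, seen, index):
    """Index any *.json files in `directory` that are not yet in `seen`.

    Polling stand-in for an inotify watch. Returns the newly indexed names.
    """
    added = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json") or name in seen:
            continue
        path = os.path.join(directory, name)
        try:
            with open(path, encoding="utf-8") as handle:
                record = json.load(handle)
        except (OSError, ValueError):
            # Possibly half-written or corrupt; leave it unseen so the
            # next pass retries it.
            continue
        seen.add(name)
        # Key the index by file name without the extension.
        index[os.path.splitext(name)[0]] = record
        added.append(name)
    return added
```

Calling this in a loop with a short sleep gives the same behavior as an inotify watch, just less promptly and with more disk traffic, which is why a real watcher is the better fit once there are a lot of files.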


Monday, November 12, 2012

Video Site - JCN
