Tuesday, August 29, 2006

AI-engineered codecs?

Someone else has probably had this idea, and someone else has probably explained it better than I'll be able to. Someone else yet has probably proved that the whole thing's implausible. Nuts to them; I have a fleeting thought and I'm going to publish it! :^)

I'm not very well-versed in either genetic algorithms or neural networks, but neither are they just buzzwords to me: I have written genetic and neural equivalents of "Hello world". The techniques differ, and stacking layers of these algorithms has different effects, but what makes them interesting (to me, at the moment) is that we wind up with a working computational system without the programmer having to actually solve the problem. A typical neural network will take a piece of input (say, a scanned image of a printed page) and produce some output (say, the bounding-box coordinates of each dark blob it finds on the image). Those outputs may then become the input of another neural net that produces more output (say, the ASCII characters those dark blobs represent) and/or of a non-neural algorithm that just looks for patterns (say, a spellchecker). Writing an explicit algorithm to turn an image of a page into a text stream would be massively hard, but writing and training some neural nets is much less painful. Let the computer do the computing.
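For anyone who hasn't written one, a neural "Hello world" can be as small as a single perceptron that learns logical AND from examples instead of being told the rule. This is a minimal sketch, not the code mentioned above; the function names and the integer learning rate are my own illustrative choices.

```python
def train_perceptron(samples, epochs=20, lr=1):
    """Learn weights for one threshold unit from (inputs, target) pairs."""
    n = len(samples[0][0])
    weights = [0] * n
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            # Fire (1) if the weighted sum is above zero, otherwise 0.
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            # Perceptron rule: nudge weights toward the correct answer.
            # An integer learning rate keeps the arithmetic exact.
            error = target - output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    """Apply a trained unit to one input."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Training data for logical AND; the rule itself is never written down.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])  # prints [0, 0, 0, 1]
```

Swap in a different truth table (OR, say) and the same code learns it without any change; that's the "programmer doesn't solve the problem" part in miniature.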

In the world of Free software (free as in liberty), a world in which I happily live, there are some struggles happening around various proprietary formats. A while back, Unisys declared to the world that they owned math (specifically the LZW compression used inside .gif files), and thus we all had to pay royalties if we wanted to use .gif files. That particular software patent has since expired, but nowadays there are companies convinced that they own the math necessary to encode and decode MP3s and such, and even worse, there are widely used formats like Apple's QuickTime (with its Sorenson video codec) and Microsoft's Windows Media for which no Free codecs exist. There's not much motivation for someone to hack such things up, either, as they'd be sued into oblivion the moment their code became useful. God bless the US, eh?

Forget reverse-engineering, forget exploiting the closed code, and especially forget paying big corporations royalties for the privilege of doing math. I propose that it might be possible and worthwhile to write some AI to figure out these pesky formats for us. We have input, say a QuickTime .mov file, and we have the desired output. (Play the file in the official player and scrape all the raw bitmap frames and all the output audio. It'll be huge, but that's our target.) I don't know what sort of scale, how many layers, or what ghastly amounts of memory and floating-point muscle it would take to set up and train a neural net to get from point A.mov to point B.raw for any given movie, but I doubt it's impossible. Flip the input and output to train an encoder. Not too shabby!
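To make the "flip the input and output" idea concrete, here's a deliberately tiny sketch. It assumes a hypothetical secret "codec" that is nothing more than a fixed linear mix of pairs of samples (`secret_encode` is made up for illustration). Real codecs add prediction, quantization and entropy coding, so this shows only the training setup, not a workable attack on QuickTime. A single linear layer (the simplest possible "network") is fit by gradient descent, and the very same routine learns a decoder from (encoded, raw) pairs and an encoder from (raw, encoded) pairs.

```python
import random

def secret_encode(raw):
    """Hypothetical stand-in for a closed codec: a fixed linear mix of two samples."""
    a, b = raw
    return (0.8 * a + 0.3 * b, -0.4 * a + 0.9 * b)

def fit_linear(inputs, targets, steps=1500, lr=0.05):
    """Fit a 2x2 matrix W so that W @ x approximates y (batch gradient descent on mean squared error)."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    n = len(inputs)
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in zip(inputs, targets):
            for i in range(2):
                err = W[i][0] * x[0] + W[i][1] * x[1] - y[i]
                for j in range(2):
                    grad[i][j] += 2 * err * x[j] / n
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * grad[i][j]
    return W

def apply(W, x):
    """Run a learned 2x2 model on one sample."""
    return tuple(W[i][0] * x[0] + W[i][1] * x[1] for i in range(2))

# Paired training data: what we feed the black box, and what comes out.
rng = random.Random(0)
raw_samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
encoded_samples = [secret_encode(r) for r in raw_samples]

decoder = fit_linear(encoded_samples, raw_samples)  # "A.mov" -> "B.raw"
encoder = fit_linear(raw_samples, encoded_samples)  # flipped: "B.raw" -> "A.mov"

print(apply(decoder, secret_encode((0.25, -0.6))))  # close to (0.25, -0.6)
```

Nobody told `fit_linear` what the mixing matrix was; it recovered the decoder (and, flipped, the encoder) purely from input/output pairs. The open question in the post is whether anything like this scales from a 2x2 toy to a real video format.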

I know even less about genetic algorithms than I do neural nets, but maybe they are better equipped to solve the problem. Plenty of other AI techniques are out there too; perhaps some combination of them would be the optimal approach. The idea is to have a program reinvent codec wheels for us, since others won't share their wheel understanding.
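For comparison, a genetic "Hello world" usually means evolving random strings toward a target phrase. Below is a minimal sketch; the alphabet, population size, elite count, mutation rate and function names are all illustrative assumptions. For a codec, the fitness function would instead score how closely a candidate decoder's output matches the captured frames and audio.

```python
import random
import string

TARGET = "Hello world"
ALPHABET = string.ascii_letters + " "

def fitness(candidate):
    """Count characters that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rng, rate):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in candidate)

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=200, elite=20, rate=0.05, max_generations=2000, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        # Keep the best individuals unchanged (elitism) and breed the rest from them.
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng, rate)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness), max_generations

best, generations = evolve()
print(best, generations)
```

Elitism guarantees the best score never goes backwards, while mutation keeps supplying new characters, so the population ratchets toward the target without anyone spelling out how to get there.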

Shackles begone! One way or another, we need to get to the point where the format is irrelevant, only the data matters.
