Tuesday, August 29, 2006

AI-engineered codecs?

Someone else has probably had this idea, and someone else has probably explained it better than I'll be able to. Someone else yet has probably proved that the whole thing's implausible. Nuts to them; I have a fleeting thought and I'm going to publish it! :^)

I'm not very well-versed in either genetic algorithms or neural networks, but neither are those just buzzwords to me - I have written genetic and neural equivalents of "Hello world". The techniques differ, and stacking layers of these algorithms has different effects, but the thing that makes them interesting (to me, at the moment) is that we wind up with a computational system without the programmer having to actually solve the problem. A typical neural network will take a piece of input (say, a scanned image of a paper) and produce some output (say, a set of bounding box coordinates of each interpreted dark blob on the image). These outputs may then be the input of another neural net which produces some more output (say, ASCII characters interpreted given those dark blobs) and/or a non-neural algorithm that just looks for patterns (say, a spellchecker). Writing an actual algorithm to turn an image of a paper into a text stream would be massively hard, but writing and training some neural nets is much less painful. Let the computer do the computing.

In the world of Free software (free as in liberty), a world in which I happily live, there are some struggles happening around various proprietary formats. A while back, Unisys declared to the world that they owned math and thus we all had to pay royalties if we wanted to use .gif files. That particular software patent has since expired, but nowadays there are companies convinced that they own the math necessary to encode and decode MP3s and such. Even worse, there are widely used formats like Apple's "Sorenson" QuickTime and Microsoft's Windows Media for which no Free codecs exist. There's not much motivation for someone to hack such things up, either, as they'd be sued into oblivion the moment their code became useful. God bless the US, eh?

Forget reverse-engineering, forget exploiting the closed code, and especially forget paying big corporations royalties for the privilege of doing math. I propose that it might be possible and worthwhile to write some AI to figure out these pesky formats for us. We have input, say a QuickTime .mov file, and we have the desired output. (Scrape all the raw bitmap image data and all the output audio. It'll be huge, but that's our target.) I don't know what sort of scale, how many layers, or what ghastly amounts of memory and floating-point muscle it would take to set up and train a neural net to get from point A.mov to point B.raw for any given movie, but I doubt it's impossible. Flip the input and output to train an encoder. Not too shabby!
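To make the "point A to point B" idea a little more concrete, here is a deliberately tiny toy sketch, nowhere near a real codec. A hypothetical secret linear transform (`SECRET`, wrapped by `secret_codec`) stands in for the closed format, and a single linear layer, the simplest possible network, is trained by stochastic gradient descent on input/output pairs alone. Flipping each pair trains the reverse direction, the "encoder". All names here are invented for illustration; real codecs are lossy, nonlinear, and carry state across frames, so this only shows the shape of the training loop, not that the approach would scale.

```python
import random

# HYPOTHETICAL stand-in for a closed codec: a secret linear transform from a
# 3-value "input frame" to a 3-value "output frame". The training code below
# never reads SECRET directly; it only sees example pairs.
SECRET = [[0.5, -1.0, 2.0],
          [1.5, 0.0, -0.5],
          [-2.0, 1.0, 1.0]]


def apply(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def secret_codec(x):
    # The black box: we can observe what it produces, but not how.
    return apply(SECRET, x)


def train(pairs, size=3, lr=0.01, epochs=1000, seed=0):
    """Fit W so that apply(W, x) approximates y for every (x, y) pair,
    using stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(size)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = apply(W, x)
            for i in range(size):
                err = pred[i] - y[i]
                for j in range(size):
                    W[i][j] -= lr * err * x[j]
    return W


# Collect example (input, output) pairs by running the black box.
rng = random.Random(42)
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
pairs = [(x, secret_codec(x)) for x in inputs]

decoder = train(pairs)                         # learns input -> output
encoder = train([(y, x) for x, y in pairs])    # flipped pairs: output -> input
```

After training, `decoder` should closely match `SECRET` and `encoder` should approximate its inverse, even though neither was ever shown the transform itself.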

I know even less about genetic algorithms than I do about neural nets, but maybe they are better equipped to solve the problem. Plenty of other AI techniques are out there too; perhaps some combination of them would be the optimal approach. The idea is to have a program reinvent codec wheels for us, since others won't share their wheel understanding.
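For anyone who hasn't seen one, here is a minimal sketch of the classic genetic "Hello world": evolve random strings until one matches a known target. The fitness function (count of matching characters), the 20% elitism, single-point crossover, and the 10% mutation rate are all illustrative assumptions, not tuned values. The loop never "knows" how to spell; it only compares candidates against the desired output, which is the same property the codec idea leans on, though a real codec search space would be astronomically larger.

```python
import random
import string

TARGET = "Hello world"                     # the known, desired output
ALPHABET = string.ascii_letters + " "      # genes: any letter or a space


def fitness(candidate):
    """Score = number of positions that already match TARGET."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(candidate, rng, rate=0.1):
    """Replace each character with a random one at the given rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in candidate)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(pop_size=100, generations=1000, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], gen
        parents = population[: pop_size // 5]   # the fittest 20% survive as-is
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return population[0], generations


best, generation = evolve()
print(best, generation)
```

Because the fittest candidates are carried over unchanged each generation, the best score never goes backwards; mutation supplies new characters and crossover stitches good fragments together.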

Shackles begone! One way or another, we need to get to the point where the format is irrelevant, only the data matters.

Wednesday, August 23, 2006

Save your ears if Mars attacks

I like to say positive things, to point out, "hey, this is cool" or "try this out". But sometimes, something will come along that requires a little negativity. Sometimes you have to be cruel to be kind. In this case I feel obligated to put up a little warning, a warning about a band whose music is capable of decreasing the quality of your life, and should thus be avoided.

The Red Hot Chili Peppers put on a good show. The music is great, and the multimedia display is mind-boggling. The band that opened for them on this tour, however, was bad enough that I feel motivated to urge you to avoid them. The Mars Volta, as they're called, apparently have plenty of history with the Chili Peppers, but that's no justification for what they did to my eardrums. And my wife's eardrums. She was about ready to leave and hide in a nasty bathroom until they left the stage.
Edit: After hearing more about this band from a friend, I'm willing to concede that this show may have been a very, very off night. It warrants further investigation, but it does not change our first impression...
'Nother Edit: Yeah, they sound much better on their albums. Still not really my thing, but entirely listenable. This warning post stands, however. They put on a crap show once, they could do it again. :^)
To illustrate our criticism of the band's sound a bit more clearly, allow me to list some things their noise managed to remind us of:
  • The voice of any adult in a Charlie Brown cartoon
  • "Freaky Outie", a song performed by 8-year-olds in Home Movies
  • A room full of people playing Electroplankton without headphones
  • Master Shake's misguided birthday song
  • Family Guy's stoned talent show performance (at the end of this clip)
Some of the songs themselves sounded fine, at least until the singer piped in. Like his fellow Texan "The Decider", this guy would do well to just shut the hell up. But worse than the songs themselves was the painfully drawn-out chaos that happened in between. I understand the desire to bridge songs, especially during a live show; that can be lots of fun. Less so, however, when the many members of the band do not figure out beforehand which song they'll actually be bridging into.

Each instrument was normally doing something decently musical on its own, but the band members didn't seem any too interested in what the other guys were up to. Maybe it was some high-minded eclectic design style applied to music. I don't know. But it didn't work. I suspect even the band members realize this. At one point the singer was convulsing on stage - a seemingly epileptic fit from which he unfortunately recovered, and went on to do some more shrieking.

So we had a shrieking, unintelligible singer and instruments feuding over which song was being played, all while two bizarre images were shown on a screen behind them. All of that was stuffed through the most irritating reverb effect I've ever heard. Pulsing somewhere around two or three hertz, it was, sadly, the most consistent part of the band's sound. During relative calms in the traumatic bridging progressions, there was an almost whale-like quality about the effect. As a result, I think I actually hate whales.

If I have failed in my warning to you, or if you are morbidly curious (in which case I have also failed), you may hear some of the things I apparently heard here:
http://www.themarsvolta.co.uk/
The samples on the site haven't been stuffed through the aquatic reverb, nor is anything being bridged to several acoustic destinations at once, but it's still pretty hard to follow.