I’ve been playing around with the deep dream/inceptionism neural networks since Google Research released their source code. Getting the dependencies set up is a real pain, but now that it’s working I can write arbitrary Python code to script it.
The first difficulty is that the default settings are already clichéd and a bit horrifying. Once you get past the shock value, turning everything into melted dogs is jejune.
Switching training data helps a little, but the MIT Places data has its own strange attractors. Basically, the recursive nature of the algorithm means it inevitably gets stuck in a local maximum. Which, apparently, has a lot of baseball diamonds in it.
See? Pretty ugly once it gets stuck in an endless loop of baseball diamonds and pagodas, fractally assembled from more baseball diamonds and pagodas. Well, the pagodas are nicely fractal, but kind of repetitiously boring.
And that’s the main problem: because the recursive modification is based on image recognition, it transforms unique input photos into a pattern it recognizes. And humans are even better at pattern recognition, so we can pretty quickly get to the point where we just glance past the next hundred melted dog-slugs and baseball pagodas.
One solution is to keep the intervention fairly subtle:
You have to look closer to see the changes to the original image, and it gives the whole thing a kind of impressionistic swirling feel. You can also change it up by using different layers in the network and combining them in different ways, but you're still getting stuck at local maxima. So subtle is better, but it still eventually falls into the same old patterns.
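One simple way to keep the intervention subtle is to blend the dreamed output back toward the original photo. This is just a sketch of that idea in numpy; `subtle_dream` and the `dream_step` callback are my own names, not anything from the released code, which you'd wrap yourself:

```python
import numpy as np

def subtle_dream(img, dream_step, strength=0.3):
    """Blend the dreamed image back toward the original.

    img:        float array of shape (H, W, 3)
    dream_step: a function taking and returning such an array
                (e.g. your own wrapper around the deepdream code)
    strength:   0.0 leaves the original untouched, 1.0 is the full dream
    """
    dreamed = dream_step(img)
    # linear interpolation between the original and the dreamed image
    return (1.0 - strength) * img + strength * dreamed
```

Low `strength` values give exactly that "have to look closer" effect: the swirls are there, but the photo still reads as itself.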
Especially when you do an animation and zoom deeper in:
Still not too bad, but you keep running this and you end up with baseball diamonds again. (About 300 frames later in this case.)
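The zoom itself is just a crop-and-rescale between dream passes. Here's a minimal numpy sketch of one zoom step (nearest-neighbour resampling to keep it dependency-free; the real pipeline would use a proper affine resample):

```python
import numpy as np

def zoom_frame(img, scale=1.05):
    """Crop the center of the frame and resize it back up, so the
    next dream pass starts slightly zoomed in.

    img: array of shape (H, W, 3)
    """
    h, w = img.shape[:2]
    ch, cw = int(h / scale), int(w / scale)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    # nearest-neighbour upsample back to the original (h, w)
    rows = (np.arange(h) * ch // h).clip(0, ch - 1)
    cols = (np.arange(w) * cw // w).clip(0, cw - 1)
    return crop[rows][:, cols]
```

Each animation frame is then something like `frame = dream(zoom_frame(frame))`, where `dream` is whatever wrapper you have around the deepdream code. Run that a few hundred times and you get the endless-zoom videos.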
As a side note, these animations have an interesting property: given the same input and settings, they're deterministic. So you can pick a frame you like, tweak the settings, and watch the sequence veer off in another direction.
That helps, because the big problem in keeping these images fresh and interesting is getting new information into the system. Fortunately, the original developers have provided some help: yesterday they released code that allows for the use of guide images. (Somewhat reminiscent of the project I wrote about on the procgen blog.)
For example, this takes the sequence that used to produce that ugly baseball diamond image above and uses a photo of pebbles that I took as a guide image:
Much better. No more baseball diamonds.
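If I'm reading the released code right, the guided objective works by matching features rather than amplifying them: for each spatial position in the current image's activations, it finds the best-matching feature vector from the guide image (by dot product) and uses that as the backprop gradient. A numpy sketch of that matching step, with my own function and argument names:

```python
import numpy as np

def guide_objective(activations, guide_features):
    """Gradient that pulls each activation toward its best-matching
    guide feature, instead of the usual "amplify what you see" rule.

    activations:    (C, N) — C channels, N spatial positions, from the
                    current image at some layer
    guide_features: (C, M) — same layer's features for the guide image
    Returns an array shaped like `activations` to feed back as the
    gradient at that layer.
    """
    # dot-product similarity of every image position to every guide position
    sims = activations.T @ guide_features      # (N, M)
    best = sims.argmax(axis=1)                 # best guide match per position
    return guide_features[:, best]             # (C, N)
```

The effect is that the dream can only reinforce textures that actually exist in the guide image, which is why the pebbles crowd out the baseball diamonds.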
From here, designing a system that uses a bunch of guide images to keep things fresh is a good next step. (Though mine is currently taking forever to render the dozens of frames I want for the subtle transitions.) Ultimately, I suspect that a custom-trained CNN will produce the best results, but that will likely require a million or so categorized photos. Just please, no dogs this time.