The image on the left is the 600x446-pixel "original" RGB image; the copy on the right has been tiled into 6706 solid-filled rectangles. See text for discussion.
Conversion of bitmapped images (*.jpg, *.gif, etc.) to vector format (Postscript, SVG) is, in general, a Difficult Problem. In some ways, it's not unlike trying to convert spoken text (in the form of *.mp3 audio) to ASCII. Well okay, maybe it's not that bad, but it's gnarly. You're trying to parse precise geometric shapes out of an ensemble of intensity samples. It's not at all obvious how to do it. Finding a fully general algorithm for describing an arbitrary bitmap as an ensemble of vector-drawable shapes is vexingly difficult. You might as well try to reassemble Yugoslavia.
You have to admit, though, it'd be darned handy to be able to transcode a bitmap image into (say) some flavor of XML (such as SVG). Unlike binary formats, XML is wire-protocol-friendly, queryable at the element and attribute level, transformable using XSLT (and/or E4X, or other technologies), human-readable (bwahhh-ha-ha . . .), and highly compressible. Converting JPEG to XML would open the door (potentially) to many interesting operations. Imagine running XPath queries against images to find specific areas of interest, or morphing one image into another using XSLT.
While there is obviously no one right way to proceed, I'd like to propose an approach to the problem (the problem of how to transcode bitmaps to SVG) based on quadtree decomposition of an image into polygons -- rectangles, in particular. It's possible to apply the same approach using triangles as primitives, instead of rects, or spline patches for that matter, but rectangles offer some noteworthy advantages. Most images are rectangles to begin with, of course, and parsing them into smaller (but adjoining) rects is a natural thing to do. Once parsed, rects allow a relatively compact representation; and at render-time, not many solid shapes are easier or faster to draw.
We have the considerable advantage of being able to start from the perspective of seeing a bitmapped image as just a grid of one-by-one rectangles (known as pixels). Two adjoining pixels of the same color become a 1x2 solid-color rect, four adjoining identical pixels become a 2x2 or 1x4 rect, and so on. Surely we can parse out the "naturally occurring" solid-filled rects within a pixel lattice? Those rects are instantly vector-drawable.
If we do this, we'll get what we get -- probably a very small handful of accidental finds. But we can do better, if (through preprocessing) we quantize pixel values in such a way as to force nearly-equal pixels to be equal in value. This will cause the appearance of a higher percentage of solid-color rects within a bitplane. Of course, when we're done, there's still no guarantee we will have covered the entire visual plane with rects (unless, as I say, we count individual pixels as 1x1 rects). Nevertheless, it's an interesting strategy: Force adjoining almost-equal pixels to be equal in value, then aggregate them into solid-color rects. Deal with the leftovers later.
At the extreme, we can look at the entire image as being a collection of pixels that are nearly equal in value. There is actually considerable variance in pixels, of course (unless the image is truly not very interesting). But what we might do is quantify the variance in some way, and if it exceeds a threshold, divide the image into daughter rects -- and begin again. We know that eventually, if we keep subdividing, we'll get to individual pixels (1x1 rects) that have zero variance within themselves. By this sort of strained logic we might convince ourselves that there should be less variance in small rects than in large ones, as a general trend.
Let me get right to it and propose an algorithm. (Listen up.) Start by considering a rectangle covering the whole image. Going pixel by pixel, calculate some variance measure (such as root-mean-square variation from average) for all pixels, for the whole region. If the variance is low (lower than some arbitrary limit), consider all pixels within the region to be equal; define the region as a rect of color Argb (representing the arithmetic average of the pixel values). If the variance exceeds an arbitrary threshold, subdivide the image. Specifically, subdivide it into four (generally unequal) rects.
To determine where to subdivide, calculate the "center of gravity" of the parent rect. A pixel's lightness or darkness, multiplied by its position vector, gives its moment-in-X and moment-in-Y. Add all the X moments together (and the Y moments). Divide by the number of pixels. The result is the visual center of the region. That's where you divide.
Repeat the procedure on newly created rectangular regions. Any regions with low variance can be rendered as a solid-color rect; the other regions are subdivided, and the algorithm continues until reaching individual pixels, or until every region larger than a pixel was successfully encoded as a rect.
This general technique is known as quadtree subdivision and it lends itself well to recursive implementation. You're not likely to get into stack-overflow problems with it, because with four times as many rects at each recursion cycle, you'll have created (potentially) over 1000 rects in just 5 cycles. Ten levels deep, you've created a million rects. Better start worrying about heap, not stack.
A demonstration of the technique can be seen below. The image on the left was created using a fairly "insensitive" variance threshold of 29 (meaning that subdivision did not occur unless a region had an RMS variance of at least 29, out of a possible range of pixel values of 0..255). Because subdivision happened infrequently, only in the very noisiest of pixel regions, smaller rects tended to cluster in areas of high detail (high variation in pixel intensities), such as around the edges of the iris and eyelashes. The image on the right shows that with the RMS threshold set lower (at 22), we get more detail -- about 300 rects total. (White outlines around the rects have been omitted on the right image.)
The image on the left has been parsed into 136 solid rectangles (with white outlines for illustrative purposes) using the recursive quadtree decomposition described in the text. Notice how subdivision into smaller rectangles tends to coincide with areas of high detail. The image on the right has been parsed into 334 rects.
The algorithm is tunable in a couple of ways. The most obvious way is via the variance-cutoff parameter. If the variance threshold is set low, it means that the slightest bit of "noise" in a given region will trigger subdivision of the region (and continued operation of the algorithm). However, we have to stop subdividing before reaching individual pixels. So another tuning variable is the minimum tile size.
The image on the left is a collage of 790 solid-filled rectangles. At a rect count of 2209, the image on the right is starting to have a bitmap-like feel.
The code that produced these images consists of 200 lines of JavaScript (shown below) and another 200 lines of Java utility routines (in a class called ImageUtilities: see further below). In addition to that, you need the ImageMunger Java application that I described in yesterday's post (yet another 200 lines or so of Java). First, the JavaScript:
To use this file, give it a name like tiles.js, then run ImageMunger (the Java app I described in yesterday's post) from the command line, passing it the name of the image you want to modify, and the name of the script file:
> java ImageMunger myImage.jpg tiles.js
The entry point for the script is doQuadding(), which you can call with a 3rd argument of "svg" if you want to write output to a Scalable Vector Graphics file. Otherwise pass a 3rd arg of "preview" and ImageMunger will simply render the transformed image to a JFrame.
The script makes reference to a number of utility routines written in Java. The utility routines are in a class called (what else?) ImageUtilities, as follows. The routines should be fairly self-explanatory.
Even though the main routine is in JavaScript, the overall processing runs quickly. The algorithm executes in near-linear time and can output around 5000 rectangles per second (to screen, that is; disk I/O not included).
Projects for the future:
- Write gradient-filled rects instead of solid-filled rects, choosing the gradient endpoint colors in such a way as to minimize the differences between the output and the original image.
- Given two SVG images, each an ensemble of rects (preferably equal in number), write a transformation routine that morphs one image into the other via transformations to the individual rects (and their colors). Store the second image as simply an ensemble of transformations (no rects). The first image provides the "reference rectangles" that will be transformed.
- Write a public key image-encryption routine based on the foregoing notion, such that Image A becomes a key that someone can use to encrypt Image B, with private-key Image C unlocking B.
- Instead of writing an SVG image as an ensemble of individual rects, write it as one rect that is repeatedly resized and repositioned via successive affine transformations.
- Encrypt an image using the above technique (rewrite one rect many times through transformation) in concert with matrix hashing. Write an SVG image-encryption routine whose difficulty of decryption depends on the non-commutative nature of matrix multiplication and the numerical instability of inverted matrices.
- Write a "chunky JPEG" routine that essentially uses the quadtree decomposition to chunk the image prior to discrete cosine transformation, instead of using the (canonical JPEG) 8x8 chunking.
- [ Insert your own idea here! ]
No comments:
Post a Comment