PicTapGo Adjustments

One feature I always wanted from PicTapGo was a set of basic adjustment sliders for things like adding brightness or contrast to an image. The thumbnail-driven editing is intuitive, but it can also be limiting when you’re just looking for basic photo adjustments. So in version 3.0, that’s exactly what I added. It’s my favorite PicTapGo feature now, and I’m really proud of it.

[Image: the Adjustments tool in PicTapGo (ptg_adjustments_tool)]

The problem was always getting realtime performance from the image rendering. The Edit screen’s preview was originally displayed by a plain ol’ UIImageView – we render a UIImage, and set the view’s image to update it. Basic stuff. But you can’t do that 60 times a second. To get the “strength” slider producing realtime updates with the UIImageView-based display, we would render a full-strength image, and then just adjust the view’s alpha value, letting UIKit do the blending. This was a perfectly fine simulation of what Core Image would do in the final render, but it doesn’t work when you’re simultaneously adjusting 4 different parameters.
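
That old trick looked roughly like this – a reconstruction from memory, with previewImageView and the slider callback as stand-in names, not the shipping code:

// Reconstruction of the old UIImageView-era strength preview. The filtered
// preview sits in a UIImageView layered directly over the unfiltered base
// image, so adjusting its alpha lets UIKit approximate the blend that
// Core Image performs in the final render.
- (void)strengthSliderChanged:(UISlider*)sender {
    // The full-strength filtered image was already rendered and set on
    // self.previewImageView; the slider just fades it in and out.
    self.previewImageView.alpha = sender.value; // slider range 0.0 - 1.0
}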

Enter GLKit, and GLKView.

Rendering to a GLKView is a bit trickier, but it’s fast as hell. So I replaced the UIImageView with a GLKView, and basically implemented an OpenGL-accelerated view that knew how to render a PicTapGo document. Everything upstream of the adjustments render was made cacheable in the view, so as the user scrubs the adjustments slider, regardless of how hairy their filter recipe is, the only thing we have to render is the adjustments themselves:

-(void)glkView:(GLKView *)view drawInRect:(CGRect)rect {
    // Grab a lot of this data from the objects we're referencing, because it
    // might change mid-render, and this ensures we have our own consistent state
    // regardless of whether the world is burning down around us.
    CIImage* result;
    CIImage* baseImage = [[CIImage alloc] initWithImage:self.baseImage];
    CIImage* preAdjustmentsImage = [[CIImage alloc] initWithImage:self.preAdjustmentsImage];
    PTGEditPreviewVersion previewVersion = self.previewVersion;
    AppliedRecipeStep* step = self.document.currentStep;
    NSDictionary* adjustmentData = [step.adjustmentData copy];
    BOOL hasAdjustments = adjustmentData.count > 0;
    CGFloat strength = step.strength.floatValue / 100;
    CIContext* ctx = self.ciContext;
    if (previewVersion == PTGEditPreviewVersionOriginal) {
        // don't render anything, so the original image can show through
        result = nil;
    } else if (previewVersion == PTGEditPreviewVersionPrevious) {
        // just draw the base image (previous image)
        result = baseImage;
    } else if (baseImage) {
        // Draw the recipe, including adjustments, for this step
        result = preAdjustmentsImage;
        if (!result) {
            // The cached pre-adjustments render is missing; request a
            // refresh, and fall back to the base image for this frame
            dispatch_async(dispatch_get_main_queue(), ^{
                [self updateImages];
            });
            result = baseImage;
        }
        
        // Blend strength if we need to
        if (strength < 0.99) {
            result = [TRHelper blendFromCIImage:baseImage toCIImage:result strength:strength];
        }
        
        // Apply adjustments if we have em
        if (hasAdjustments) {
            TRAdjustments* adj = [[TRAdjustments alloc] init];
            [adj setAdjustmentData:adjustmentData];
            result = [adj applyTo:result];
        }
    }
    if (result) {
        CGRect drawingRect = CGRectMake(0, 0, view.drawableWidth, view.drawableHeight);
        CGRect targetRect = CGRectFitAspect(result.extent, drawingRect);
        
        // Draw it
        TRUNUSED CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
        [ctx drawImage:result inRect:targetRect fromRect:result.extent];
        TRUNUSED CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
        DDLogDebug(@"PTGLiveImageView drawLayer completed in %.4fs", end - start);
    }
}

Next, the trick was getting the TRAdjustments filter chain to render as quickly as possible. Let me start by saying that Core Image is a bit of a black box. Most of the time it works as you’d expect, but as you push it in any one of several directions (image size, filter chain depth, filter chain branches, etc.), it will begin to break, and you won’t really know why. For instance, TRAdjustments applies up to 4 different adjustment steps to an image, and after each adjustment, it blends the adjustment back with the previous step’s result (to account for the varying strength of the effect). If you do this on a full-resolution image, performance falls apart.

The solution is to “rollup” the intermediate steps after each filter, but only for large images (“large” here being an arbitrary threshold that’s meant to exclude UI-sized images, but include anything bigger). At a high level, TRAdjustments does:

/// Composes the entire adjustment chain as a CIImage
- (CIImage*) applyTo:(CIImage*)input {
    DDLogVerbose(@"TRAdjustments applyTo - contrast: %f, brightness: %f, temp: %f, saturation: %f", self.contrast, self.brightness, self.temperature, self.saturation);
    
    // roll up the intermediate result after each adjustment (for large images)
    CIImage* result = input;
    
    result = [self applyBrightnessAdjustment:result];
    result = [self rollupImageIfNeeded:result];

    result = [self applyContrastAdjustment:result];
    result = [self rollupImageIfNeeded:result];

    result = [self applyTemperatureAdjustments:result];
    result = [self rollupImageIfNeeded:result];

    result = [self applySaturationAdjustments:result];
    result = [self rollupImageIfNeeded:result];

    return result;
}
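
Each of those apply* methods follows the same apply-then-blend shape. The shipping steps mostly run through the lookup tables described below, but a hypothetical CIColorControls-based brightness step (assuming self.brightness runs from -1 to 1) would look something like this:

// Hypothetical sketch of a single adjustment step, not the real code:
// apply the effect at full strength, then blend back toward the input
// according to the slider position.
- (CIImage*)applyBrightnessAdjustment:(CIImage*)input {
    if (self.brightness == 0) {
        return input; // no-op steps keep the filter chain short
    }
    // Apply the effect at a fixed, full-strength amount...
    CIFilter* filter = [CIFilter filterWithName:@"CIColorControls"];
    [filter setValue:input forKey:kCIInputImageKey];
    [filter setValue:@(self.brightness > 0 ? 0.25 : -0.25)
              forKey:kCIInputBrightnessKey];
    // ...then blend the result back with the input, so the slider controls
    // how much of the effect survives
    return [TRHelper blendFromCIImage:input
                            toCIImage:filter.outputImage
                             strength:fabs(self.brightness)];
}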

And a rollup looks like this:

-(CIImage*)rollupImageIfNeeded:(CIImage *)image {
    BOOL isLarge = (MAX(image.extent.size.width, image.extent.size.height) > kAdjustmentsLargeImageThreshold);
    CIImage* result = image;

    if (isLarge) {
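        // Force a render: flatten the image's accumulated filter chain into
        // real pixels, then wrap those pixels in a fresh, filter-free CIImage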
        CIContext* context = [TRHelper getCIContextForImageSize:image.extent.size];
        CGImageRef cgResult = [context createCGImage:image fromRect:image.extent];
        assert(cgResult);
        result = [CIImage imageWithCGImage:cgResult];
        CGImageRelease(cgResult);
        [TRHelper doneWithCIContext:context forImageSize:image.extent.size];
    }
    return result;
}

A CIImage is really just a recipe for applying filters. It can be just a wrapper around a pixel buffer, with no additional filters, or it can contain an arbitrarily long list of things to do to the source image. To keep that list of things from becoming long enough to trip up CI, we basically force a render into a CGImage, then wrap that result back into a new CIImage. The result is that consumers of the new CIImage don’t inherit the filter chain from the prior image, and instead just reference the rendered pixels.

So with our conditional rollup, we get the shortest possible filter chain when rendering UI updates directly to the GLKView, but still reap the benefit of rolling up the intermediate steps for large images.

Finally, the last lesson learned: while not the fastest Core Image filter, CIColorCube is nonetheless pretty performant – faster than blending two images, even though the blend would kick the 3D lookup table operation’s ass on the CPU. This was surprising. As an example, I’ve used the Screen, Multiply, and Overlay blend modes for the past 10 years as my go-to method for lightening / darkening / adding contrast to an image. I just think they give a very lovely, photographic result – blend an image with itself using the Screen blend mode to lighten, and so on. But doing this using Core Image was a performance bottleneck. Instead, I built static lookup tables for (most of) the adjustments operations, utilizing CIColorCube instead of the blends.
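
To make that concrete, here’s a sketch of baking the screen-with-itself blend into CIColorCube data. This is my own illustration of the technique, not PicTapGo’s actual table-building code:

// Build a 16x16x16 CIColorCube that mimics screening an image with itself:
// screen(x, x) = 1 - (1 - x)^2, applied per channel.
static const size_t kCubeDim = 16;

CIFilter* screenWithSelfCubeFilter(CIImage* input) {
    size_t count = kCubeDim * kCubeDim * kCubeDim * 4; // RGBA per entry
    float* cube = malloc(count * sizeof(float));
    size_t i = 0;
    // CIColorCube expects red to vary fastest, then green, then blue
    for (size_t b = 0; b < kCubeDim; b++) {
        for (size_t g = 0; g < kCubeDim; g++) {
            for (size_t r = 0; r < kCubeDim; r++) {
                float rf = (float)r / (kCubeDim - 1);
                float gf = (float)g / (kCubeDim - 1);
                float bf = (float)b / (kCubeDim - 1);
                cube[i++] = 1.0f - (1.0f - rf) * (1.0f - rf);
                cube[i++] = 1.0f - (1.0f - gf) * (1.0f - gf);
                cube[i++] = 1.0f - (1.0f - bf) * (1.0f - bf);
                cube[i++] = 1.0f; // alpha passes through
            }
        }
    }
    NSData* cubeData = [NSData dataWithBytesNoCopy:cube
                                            length:count * sizeof(float)
                                      freeWhenDone:YES];
    CIFilter* filter = [CIFilter filterWithName:@"CIColorCube"];
    [filter setValue:input forKey:kCIInputImageKey];
    [filter setValue:@(kCubeDim) forKey:@"inputCubeDimension"];
    [filter setValue:cubeData forKey:@"inputCubeData"];
    return filter;
}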

PicTapGo uses 16x16x16 lookup tables. Back when we were looking at LUTs in RadLab, I wrote a proof-of-concept command-line utility in C that basically did the same thing as CIColorCube, except on the CPU. So when we built PicTapGo, we were excited to see that Core Image already had a GPU-accelerated 3D color lookup table built in. It really is the shiz. Tim Ruddick, who was Technical Director at Totally Rad when we built the initial version of PicTapGo, came up with the brilliant idea of encapsulating the lookup table itself as a PNG image, which has some fantastic advantages: 1) you can manipulate the LUT data directly in other applications, e.g. in Photoshop; 2) it’s visual and pretty to look at; and 3) the resulting files are automatically losslessly compressed into a standard data format. They basically look like this:

[Images: the cube-multiply-150, cube-overlay-100, and cube-screen-150 LUT PNGs]
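
Reading one of those PNGs back into CIColorCube-ready floats could go something like the sketch below. The pixel layout here (red varying fastest, blue slices stacked vertically) is purely my assumption; the actual PicTapGo encoding may differ:

// Hypothetical LUT-PNG decoder: draws the PNG into a known RGBA8 buffer,
// then converts to the float array CIColorCube wants. Assumes the PNG's
// pixel order already matches cube order (red fastest, then green, blue).
NSData* cubeDataFromLUTImage(UIImage* lutImage, size_t dim) {
    size_t count = dim * dim * dim;               // 16^3 = 4096 entries
    uint8_t* pixels = calloc(count * 4, 1);
    CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
    // One blue slice per dim rows: a dim x (dim * dim) pixel image
    CGContextRef ctx = CGBitmapContextCreate(pixels, dim, dim * dim, 8,
                                             dim * 4, space,
                                             (CGBitmapInfo)kCGImageAlphaPremultipliedLast);
    CGContextDrawImage(ctx, CGRectMake(0, 0, dim, dim * dim),
                       lutImage.CGImage);
    float* cube = malloc(count * 4 * sizeof(float));
    for (size_t i = 0; i < count * 4; i++) {
        cube[i] = pixels[i] / 255.0f; // 8-bit samples to 0.0-1.0 floats
    }
    CGContextRelease(ctx);
    CGColorSpaceRelease(space);
    free(pixels);
    return [NSData dataWithBytesNoCopy:cube
                                length:count * 4 * sizeof(float)
                          freeWhenDone:YES];
}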

In the future, if we needed more performance, these LUTs provide a few opportunities for further optimization. First, we could build a lookup table composer, which combines two subsequent LUT operations into one (I believe Core Image does this internally, where it can). Second, we could scale the LUT data against an identity LUT to produce the “strength” adjustment, instead of relying on a blend operation; manipulating the 4,096 samples in a 16x16x16 table will always be faster than applying that lookup table to the entire image. Finally, we could write our own Core Image kernel for all or part of the adjustments chain (in particular, the overlay / multiply / screen routines would be VERY quick as a standalone shader). But none of that is really necessary, since it’s plenty fast enough already.
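
That second idea is simple enough to sketch: lerp each cube entry toward the identity LUT, in place, on CIColorCube-ordered data. A hypothetical helper, not something we shipped:

// Scale a LUT toward the identity cube: strength 1.0 leaves the LUT alone,
// strength 0.0 turns it into a no-op. Far cheaper than blending two images.
void scaleCubeTowardIdentity(float* cube, size_t dim, float strength) {
    size_t i = 0;
    for (size_t b = 0; b < dim; b++) {
        for (size_t g = 0; g < dim; g++) {
            for (size_t r = 0; r < dim; r++) {
                float identity[3] = { (float)r / (dim - 1),
                                      (float)g / (dim - 1),
                                      (float)b / (dim - 1) };
                for (int c = 0; c < 3; c++, i++) {
                    cube[i] = identity[c] + (cube[i] - identity[c]) * strength;
                }
                i++; // skip alpha
            }
        }
    }
}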

This was another biggie, time-wise, taking about 4 weeks to implement, as I recall. This included making the document model aware of this new thing called “adjustments” (much harder than it should have been, and not my fault – a discussion for another day), and writing some neat-o peek-a-boo UI elements… actually, now that I’m thinking about it, this required touching nearly every part of the app to accommodate the new feature. It’s been really well-received, and I’m immensely proud of it.