Interlacing



Interlacing Algorithm:

Now we transform the view stack image that we just formed (an image composed of sub-images acquired by virtual cameras from 5-by-5 different viewpoints, stacked together) into the complementary light field representation, which is composed of super-pixels of 5-by-5 pixels each (one pixel from each of the 5-by-5 views). To put it simply, we just rearrange the pixels, picking one pixel from each of the 5-by-5 sub-images and placing it into each of the 150-by-267 super-pixels, in that order, as shown below:

In the Interlacing stage, we go from the View Stack representation of the Light Field to the complementary Interlaced Views representation

The interlaced image should appear as a blurrier, low-pass version of a single view image in the view stack.
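To make the index bookkeeping concrete, here is a minimal CPU-side sketch in Swift (written only to illustrate and sanity-check the mapping, not part of the rendering pipeline) that mirrors the index math of the fragment shader below:

// For a destination pixel of the interlaced image, compute which pixel of the
// view stack it should be copied from. Mirrors the fragment shader's index math.
func interlacedSourceIndex(destX: Int, destY: Int,
                           viewsCountX: Int, viewsCountY: Int,
                           superPixelsCountX: Int, superPixelsCountY: Int) -> (srcX: Int, srcY: Int) {
    // Which super-pixel the destination pixel falls in, and its offset within it
    let superPixelX = destX / viewsCountX
    let subPixelX   = destX % viewsCountX
    let superPixelY = destY / viewsCountY
    let subPixelY   = destY % viewsCountY

    // The sub-pixel offset selects the source view (flipped), and the super-pixel
    // index selects the pixel inside that view's sub-image in the view stack.
    let srcViewX = viewsCountX - subPixelX - 1
    let srcViewY = viewsCountY - subPixelY - 1
    let srcX = srcViewX * superPixelsCountX + superPixelX
    let srcY = srcViewY * superPixelsCountY + superPixelY
    return (srcX, srcY)
}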

Fragment Shader:

The aforementioned interlacing logic is coded into Metal's fragment shader like so:

fragment float4 LightFieldFragment(FragmentInput in [[ stage_in ]],
                                   constant LightFieldProperties& properties [[ buffer(0) ]],
                                   texture2d<float> multiviewTexture [[ texture(0) ]])
{
    // Sampler using unnormalized (pixel) coordinates, since srcX/srcY are pixel indices
    constexpr sampler s(coord::pixel, filter::nearest);

    // Number of Views and Super Pixels
    const int superviewSizeX = properties.superviewSizeX;
    const int superviewSizeY = properties.superviewSizeY;
    const int viewsCountX = properties.viewsCountX;
    const int viewsCountY = properties.viewsCountY;
    const int superPixelsCountX = superviewSizeX / viewsCountX;
    const int superPixelsCountY = superviewSizeY / viewsCountY;

    // Destination Super Pixel and View in X & Y
    const int destPixelX = in.position.x;                    // [0, resX)
    const int destSuperPixelX = destPixelX / viewsCountX;    // [0, superPixelsCountX)
    const int destSubPixelX = destPixelX % viewsCountX;      // [0, viewsCountX)

    const int destPixelY = in.position.y;                    // [0, resY)
    const int destSuperPixelY = destPixelY / viewsCountY;    // [0, superPixelsCountY)
    const int destSubPixelY = destPixelY % viewsCountY;      // [0, viewsCountY)

    // Source Pixels in X & Y
    const int srcViewX = viewsCountX - destSubPixelX - 1;    // [0, viewsCountX)
    const int srcViewStartX = srcViewX * superPixelsCountX;  // [0, resX)
    const int srcViewPixelX = destSuperPixelX;               // [0, superPixelsCountX)
    const int srcX = srcViewStartX + srcViewPixelX;          // [0, resX)

    const int srcViewY = viewsCountY - destSubPixelY - 1;    // [0, viewsCountY)
    const int srcViewStartY = srcViewY * superPixelsCountY;  // [0, resY)
    const int srcViewPixelY = destSuperPixelY;               // [0, superPixelsCountY)
    const int srcY = srcViewStartY + srcViewPixelY;          // [0, resY)

    return multiviewTexture.sample(s, float2(srcX, srcY));
}

Vertex Shader:

The simple vertex shader keeps the default implementation: it takes the vertex position and texture coordinates as input and assigns them to the FragmentInput struct, which is passed to the fragment shader. The vertex shader is defined before the fragment shader:

vertex FragmentInput LightFieldVertex(VertexInput in [[ stage_in ]])
{
    FragmentInput out;
    out.texcoord = float2(in.texcoord.x, 1.0f - in.texcoord.y);
    out.position = in.position;
    return out;
}

Finally, build your application again and, once the build succeeds, reconnect your iPhone 6 and run the application:

Final Output shows the Interlaced Space Ship with correct Depth Perception

Hooray, you are ready to put on the parallax barrier, align it with the moiré pattern, and view the magnificent space ship in 3D. Pat yourself on the back.


Note: Key Implementation Details

Some notes on implementation details that we include here because they are not documented anywhere else:

    • We need to create a pass-through vertex shader for both OpenGL and Metal
    • We can just create a pass-through fragment shader for OpenGL
    • For the Metal fragment shader, we declare a struct which contains the symbols that we defined in our SCNTechnique, then pass this struct and our input texture to the shader like so (a Swift sketch of the corresponding SCNTechnique setup follows this list):
      fragment float4 LightFieldFragment(FragmentInput in [[ stage_in ]],
                                         constant LightFieldProperties& properties [[ buffer(0) ]],
                                         texture2d<float> multiviewTexture [[ texture(0) ]])

    • We can use the SceneKit semantics defined here to pass vertex information from SceneKit into our vertex shader:
      struct VertexInput {
          float4 position [[ attribute(SCNVertexSemanticPosition) ]];
          float2 texcoord [[ attribute(SCNVertexSemanticTexcoord0) ]];
      };
      
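For reference, here is a rough Swift sketch of what the SCNTechnique setup might look like. The pass name "interlace", the symbol types, and the concrete values bound at the end are assumptions for illustration rather than our exact project code:

import SceneKit

// Sketch: a single full-screen pass that feeds the rendered view stack (COLOR)
// into the interlacing shaders and declares the symbols gathered into the
// Metal-side LightFieldProperties struct.
func applyInterlacingTechnique(to scnView: SCNView) {
    let definition: [String: Any] = [
        "passes": [
            "interlace": [
                "draw": "DRAW_QUAD",                        // post-process quad pass
                "metalVertexShader": "LightFieldVertex",
                "metalFragmentShader": "LightFieldFragment",
                "inputs": ["multiviewTexture": "COLOR"],    // rendered view stack
                "outputs": ["color": "COLOR"]
            ]
        ],
        "sequence": ["interlace"],
        "symbols": [
            "superviewSizeX": ["type": "int"],
            "superviewSizeY": ["type": "int"],
            "viewsCountX": ["type": "int"],
            "viewsCountY": ["type": "int"]
        ]
    ]

    guard let technique = SCNTechnique(dictionary: definition) else { return }

    // Bind symbol values; the numbers are placeholders derived from the
    // 150-by-267 super-pixel grid and 5-by-5 views described above.
    technique.setValue(NSNumber(value: 750), forKey: "superviewSizeX")
    technique.setValue(NSNumber(value: 1335), forKey: "superviewSizeY")
    technique.setValue(NSNumber(value: 5), forKey: "viewsCountX")
    technique.setValue(NSNumber(value: 5), forKey: "viewsCountY")

    scnView.technique = technique
}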

Note: Performance Improvement with Metal:

We initially implemented our interlacing algorithm in OpenGL's fragment shader. However, the system couldn't handle rendering more than 3×3 camera views, and it would crash when the GPU debugger was attached. Here is the performance for 3×3 views:

OpenGL renders 9 views at 26fps

We then switched to Apple's new low-level rendering and compute API, Metal, which replaces the OpenGL API (unlike Apple's high-level API, SceneKit, which runs on top of OpenGL). As various online benchmarks had suggested, we got a dramatic performance improvement. Here is Metal's performance for our target 5×5 set of views:

Metal shader delivers 47fps for rendering 25 views


Dynamic View Update through Head Tracking:

Once you are able to properly perceive depth through the parallax barrier, you will notice an abrupt skipping of views when you move horizontally or vertically far enough to leave the head box, or the barrier's field of view. In order to have a seamless view transition, we need to track the position of the head in three dimensions. As soon as the head moves out of the head box, say in the x dimension to the right, we should roll over the entire first column of cameras in the virtual camera grid to the right end of the grid. We would also need to use the depth value of the head to update the camera spacing, as described in the last section.
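As a rough illustration of this rollover (a sketch only; the [[SCNNode]] camera grid and the cameraSpacing parameter are assumptions about how the virtual camera rig might be stored):

import SceneKit

// When the head exits the head box to the right, move the left-most column of
// virtual cameras to the right end of the grid so the views wrap around smoothly.
func rollCameraGridRight(cameraGrid: inout [[SCNNode]], cameraSpacing: Float) {
    for row in 0..<cameraGrid.count {
        // Detach the left-most camera of this row...
        let wrapped = cameraGrid[row].removeFirst()
        // ...and re-position it one camera spacing beyond the right-most camera.
        if let rightMost = cameraGrid[row].last {
            wrapped.position = SCNVector3(x: rightMost.position.x + cameraSpacing,
                                          y: rightMost.position.y,
                                          z: rightMost.position.z)
        }
        cameraGrid[row].append(wrapped)
    }
}

Moving the head left, up, or down would roll the opposite column or the corresponding row in the same way.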

It is relatively easy to determine the x and y coordinates of the head position using the face detection capability of the iPhone 6's built-in camera API. The key then is to map the head position from the camera coordinate system to the world coordinate system used by the rendering scene. We can also estimate the depth using a lookup scheme that approximately maps the average size of the head to its distance from the front-facing camera.
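Here is a hedged Swift sketch of such a mapping; the head-box extents and the face-size-to-distance table are made-up calibration placeholders that would have to be measured for a real device:

import CoreGraphics

struct HeadPosition { var x: Float; var y: Float; var z: Float }

// Map a detected face rectangle (assumed normalized to [0, 1], as delivered by
// the camera's face detector) to a head position in the renderer's coordinates.
func estimateHeadPosition(faceBounds: CGRect,
                          headBoxWidth: Float = 0.30,      // meters, assumed
                          headBoxHeight: Float = 0.20)     // meters, assumed
                          -> HeadPosition {
    // Face center in normalized camera coordinates, re-centered to [-0.5, 0.5]
    let cx = Float(faceBounds.midX) - 0.5
    let cy = Float(faceBounds.midY) - 0.5

    // Mirror the axes for the front-facing camera so that moving right/up in the
    // world moves the estimate right/up as well.
    let x = -cx * headBoxWidth
    let y = -cy * headBoxHeight

    // Crude depth lookup: a larger apparent face means a closer head.
    // (normalized face width, distance in meters) pairs - placeholder calibration.
    let calibration: [(size: Float, distance: Float)] =
        [(0.60, 0.25), (0.40, 0.35), (0.25, 0.50), (0.15, 0.75)]
    let faceSize = Float(faceBounds.width)
    let z = calibration.min(by: { abs($0.size - faceSize) < abs($1.size - faceSize) })!.distance

    return HeadPosition(x: x, y: y, z: z)
}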

While we work on finishing up this section, you can refer to Chris Cavanagh's blog post on "Live Video Face Masking on iOS" to learn about implementing face detection using iOS's AV Foundation library.
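In the meantime, here is a rough sketch of obtaining face rectangles from the front camera with AV Foundation; it uses current Swift API names (which differ from the iOS 8-era API), and the FaceTracker class and onFace callback are our own illustrative names. The normalized bounds it delivers could feed a head-position estimate like the one sketched above:

import AVFoundation

// Sketch: deliver normalized face bounds from the front camera via a callback.
final class FaceTracker: NSObject, AVCaptureMetadataOutputObjectsDelegate {
    private let session = AVCaptureSession()
    var onFace: ((CGRect) -> Void)?

    func start() throws {
        guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video, position: .front) else { return }
        session.addInput(try AVCaptureDeviceInput(device: camera))

        let output = AVCaptureMetadataOutput()
        session.addOutput(output)
        output.setMetadataObjectsDelegate(self, queue: .main)
        output.metadataObjectTypes = [.face]   // request face metadata only

        session.startRunning()
    }

    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        // Use the first detected face; its bounds are normalized to [0, 1]
        if let face = metadataObjects.first as? AVMetadataFaceObject {
            onFace?(face.bounds)
        }
    }
}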
