
    Getting to grips with grid images

    By Chris Wilson, IWF Head of Software Development
    A video is broken down into many frames, some of which are combined into a grid or collage used to advertise the video.

    We’ve deliberately scaled up our tech team to enhance the efforts of analysts in our Hotline.

    One challenge we’re tackling is the recurring issue of ‘grid images’ in the hashing process. This process creates hashes, or digital fingerprints, of child sexual abuse images and videos, which can then be used to identify and block that imagery online.
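    As a toy illustration of how a perceptual hash works as a "digital fingerprint" (this "average hash" is a deliberately simple stand-in for production algorithms such as PhotoDNA, not the algorithm IWF uses):

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Toy perceptual 'average hash': block-average the image down to
    hash_size x hash_size, then threshold against the overall mean."""
    bh, bw = img.shape[0] // hash_size, img.shape[1] // hash_size
    blocks = img[:bh * hash_size, :bw * hash_size].reshape(
        hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return blocks > blocks.mean()

# A random toy "image" and a 2x nearest-neighbour enlargement of it:
# the same picture at a different size, so the fingerprints still match.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
resized = img.repeat(2, axis=0).repeat(2, axis=1)

same = bool((average_hash(img) == average_hash(resized)).all())
print(same)  # → True
```

    Unlike a cryptographic hash, which changes completely if one pixel changes, a perceptual hash stays stable under simple alterations such as resizing, which is what makes it useful for recognising known imagery.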

    Grid images, however, are notorious for causing perceptual hash collisions, which means that the perceptual hashes from grids will sometimes match images of simple repeating patterns.
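    To see why, consider a toy example. A simple "average hash" (again a stand-in for a real perceptual hash) reduces an image to its coarse block structure, so a grid of differing tiles on a uniform background and a plain repeating pattern can yield the identical fingerprint. The images below are illustrative constructions, not real data:

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Toy 'average hash': block-average the image, then threshold
    each block against the overall mean."""
    bh, bw = img.shape[0] // hash_size, img.shape[1] // hash_size
    blocks = img[:bh * hash_size, :bw * hash_size].reshape(
        hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return blocks > blocks.mean()

# A simple repeating pattern: alternating light and dark vertical bands.
pattern = np.full((64, 64), 50.0)
for band in (0, 16, 32, 48):
    pattern[:, band:band + 8] = 200.0

# A different image laid out like a grid: the same light "gutters", but
# the dark bands replaced by tiles of varying content (values 0 and 100).
grid = pattern.copy()
for band in (8, 24, 40, 56):
    for row in range(0, 64, 16):
        grid[row:row + 8, band:band + 8] = 0.0
        grid[row + 8:row + 16, band:band + 8] = 100.0

# The pixels differ, yet the coarse block structure is the same, so the
# hashes collide -- the failure mode grids are notorious for.
collision = bool((average_hash(pattern) == average_hash(grid)).all()
                 and (pattern != grid).any())
print(collision)  # → True
```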

    A grid is a particular type of preview image used by offenders to advertise and make money from selling child sexual abuse videos online. Offenders create the grid images from video frames of the criminal content and post them on the internet along with a link to entice buyers to a premium file sharing service. These services require a subscription for users to access and download the full video.

    You can read more about commercial child sexual abuse imagery here.

    Grids can constitute up to 40% of the images we process in reports at any given point in time. But most of the tech companies that use our data to block child sexual abuse imagery exclude grid image hashes from their searches because of the chance that they might clash with another image.

    The work in progress

    While there is no standard layout, background colour, sub-image size or software used to generate grids, we established that collisions are more likely to occur when a grid contains 7×7 or more sub-images.

    To handle grids in an automated fashion, we created a process that can detect the background and separate the images back into the constituent frames from the video for perceptual hash matching and clustering.

    Our testing on synthetic grids and real-world child sexual abuse material has shown this approach to be 95% effective, with a 0.2% false positive rate (an image erroneously flagged as matching a grid and extracted).

    The software we have developed so far is too slow to be useful in real-time detection for external tech organisations, but it is suitable for IWF purposes, and we are working with partners internationally to optimise the code.

    In brief, we:

    1. Apply a quantization filter to restore a uniform background – removing JPEG compression artefacts – and simplify the image.
    2. Scan the image for any uninterrupted solid background colour that touches all four edges.
    3. Use the identified background colour to generate a binary image as a foreground mask.
    4. Apply blob detection to find the sub-images, subject to some edge-case filtering.
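    The steps above can be sketched as follows. This is a simplified, single-channel illustration using NumPy and SciPy; the function name, quantization step and blob-size filter are assumptions for the sketch, not IWF's production code:

```python
import numpy as np
from scipy import ndimage

def extract_sub_images(grid, quant_step=32, min_size=8):
    """Split a grid/collage (greyscale array) into its sub-images."""
    # Step 1: quantize pixel values to flatten JPEG artefacts, so the
    # background becomes one uniform colour again.
    q = (grid // quant_step) * quant_step

    # Step 2: take the most common colour on the border and check that
    # it touches all four edges of the image.
    border = np.concatenate([q[0], q[-1], q[:, 0], q[:, -1]])
    colours, counts = np.unique(border, return_counts=True)
    bg = colours[counts.argmax()]
    if not all((edge == bg).any() for edge in (q[0], q[-1], q[:, 0], q[:, -1])):
        return []  # no solid background touching every edge

    # Step 3: binary foreground mask -- everything that isn't background.
    mask = q != bg

    # Step 4: blob detection via connected-component labelling, with a
    # simple edge-case filter that drops tiny noise blobs.
    labelled, _ = ndimage.label(mask)
    sub_images = []
    for sl in ndimage.find_objects(labelled):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h >= min_size and w >= min_size:
            sub_images.append(grid[sl])
    return sub_images

# A synthetic 64x64 "grid": two dark 16x16 tiles on a light background.
canvas = np.full((64, 64), 200, dtype=np.uint8)
canvas[8:24, 8:24] = 50
canvas[8:24, 40:56] = 50
tiles = extract_sub_images(canvas)
```

    Each returned array is one sub-image, ready for perceptual hashing on its own rather than as part of the collision-prone grid.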

    A potential game changer

    This means we can now:

    1. Associate grid images with the source child sexual abuse video if we’ve already assessed it, restoring context to images that might not otherwise be evidently criminal from the grid preview alone.
    2. Automatically detect and exclude grids in their raw form from perceptual hash matching and clustering, preventing collisions.
    3. Add the sub-image perceptual hashes to our data sets so that platforms which scan videos can match them against frames of the source video, enabling the detection of a known child sexual abuse video even where we don’t hold the video itself.

    Impact on clustering

    For a similar area of our work, we use DBSCAN clustering with PhotoDNA hashes to group very similar images together. This improves our ability to assess near-duplicate images, such as simple resizes, and makes our assessment process more consistent, because we can compare assessments made by different analysts of slightly altered copies of the same image. We found cluster-based assessments to be 112% faster than assessing images individually.
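    A minimal sketch of this kind of clustering, using scikit-learn's DBSCAN on toy 144-element vectors standing in for PhotoDNA hashes. The vector contents, distance threshold and cluster parameters here are illustrative assumptions, not IWF's configuration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy hash vectors: an "original", a slightly altered copy (e.g. a
# resize), and an unrelated image.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=144).astype(float)
near = base.copy()
near[:3] += 1.0                                     # tiny perturbation
far = rng.integers(0, 256, size=144).astype(float)  # unrelated image

hashes = np.stack([base, near, far])

# DBSCAN groups points within `eps` of one another: the two
# near-duplicates form one cluster, while the unrelated hash is
# left as noise (label -1).
labels = DBSCAN(eps=50.0, min_samples=2).fit_predict(hashes)
print(labels)  # → [ 0  0 -1]
```

    A density-based algorithm like DBSCAN suits this task because the number of clusters is not known in advance and one-off images should stay unclustered rather than being forced into a group.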

    Grid images come into play for clustering when it comes to videos. The standard PhotoDNA process for videos is first to extract the frames, which can then be assessed as images. The video receives its overall rating based upon the “highest category” assessment of any of its constituent frames, so the video itself can be graded for the severity of the child sexual abuse it contains. We also account for any frames extracted and circulated individually.
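    The “highest category” rule amounts to taking a maximum over the frame assessments. As a sketch (the labels loosely follow the A/B/C severity categories; the ordering and function are an illustration, not IWF's actual grading code):

```python
# Illustrative severity ordering: Category A is the most severe.
SEVERITY = {"not criminal": 0, "C": 1, "B": 2, "A": 3}

def grade_video(frame_assessments):
    """Grade a video by the highest-category assessment of any frame."""
    return max(frame_assessments, key=SEVERITY.__getitem__)

print(grade_video(["not criminal", "C", "B", "C"]))  # → B
```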

    The frames from any given video form a PhotoDNA cluster in much the same way as copies of the same image would, because each frame is very similar to the frames before and after it. In the case of grids, this means the extracted sub-images merge into the cluster for the source video (if we have it to match against), which can provide vital context when we assess images.

    For example, a sub-image from a grid may not obviously be child sexual abuse material because it does not show enough of the victim to be sure. But once it merges into an existing cluster, analysts can see where the grid came from, along with the frames before and after it in the video, which can often confirm one way or the other whether the image is of a child.

    Why is this important?

    This makes our work faster and more accurate, and it closes loopholes that criminals have tried to exploit in making and sharing child sexual abuse images and videos.