The project I am currently working on is focused on building an automated system for tracking the movement of honeybees in an observation hive, filmed with a grayscale camera under infrared light. My recent attempt has involved extracting features (or keypoints) from the frames and describing them using the SIFT (Scale-Invariant Feature Transform) algorithm, as implemented in the OpenCV imaging library. There are a number of other (potentially faster) feature detection algorithms available (SURF, ORB, etc.), but as SIFT has traditionally been regarded as the most robust, I thought I would start with it. SIFT works by first detecting the keypoints/features it deems robust; for each of these, it then computes a distinctive descriptor from the 16×16 pixel neighbourhood around the keypoint. By extracting and describing these features, you can try to compare images, as shown below.
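As a rough sketch of that step, this is roughly how the detection and description stage looks using OpenCV's Python bindings. The filename and output are illustrative rather than my actual pipeline, and in OpenCV builds older than 4.4 SIFT lives in the opencv-contrib module as cv2.xfeatures2d.SIFT_create() instead:

```python
import cv2

# Load a frame as grayscale (the hive footage is already grayscale)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Create the SIFT detector/descriptor (OpenCV >= 4.4; older builds use
# cv2.xfeatures2d.SIFT_create() from opencv-contrib)
sift = cv2.SIFT_create()

# Detect keypoints and compute their 128-dimensional descriptors in one call
keypoints, descriptors = sift.detectAndCompute(frame, None)

# Draw the detected keypoints as circles scaled by keypoint size
vis = cv2.drawKeypoints(
    frame, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("frame_keypoints.png", vis)
```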
The coloured circles above indicate the features that were extracted and described. I also cropped an image of a bee from both frames and used SIFT together with a matching algorithm called FLANN (Fast Library for Approximate Nearest Neighbours) to see how good a match I could get. The close-up of the bees wasn't too bad, but matching against the whole frame didn't work as well. Keep in mind that I had already gone through the usual process of normalising and smoothing the images.
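For reference, a minimal sketch of the FLANN matching step is below. The filenames, grid of parameters, and the 0.7 ratio threshold are illustrative assumptions (the ratio test is the standard way of filtering nearest-neighbour matches, not necessarily the exact settings I used):

```python
import cv2

# Template (a bee cropped from a frame) and the image to search in
template = cv2.imread("bee_crop.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# FLANN with a KD-tree index, suitable for SIFT's floating-point descriptors
FLANN_INDEX_KDTREE = 1
flann = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
    dict(checks=50))

# k-nearest-neighbour matching followed by Lowe's ratio test
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

# Visualise the surviving matches
vis = cv2.drawMatches(template, kp1, scene, kp2, good, None)
cv2.imwrite("matches.png", vis)
```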
I then came across a variant known as dense SIFT, which performs the same description process as above, except that instead of being selective about which features it extracts, keypoints are sampled across the entire image, as shown below. This can be quite handy when you have a landscape that you want to sift (no pun intended) through to find smaller objects.
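A minimal sketch of the idea follows. OpenCV 2.4 had a built-in "Dense" detector, but in more recent versions you lay out the grid of keypoints yourself and only run the descriptor stage; the grid spacing and keypoint size below are assumed values for illustration:

```python
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()

# Dense SIFT: rather than letting the detector pick keypoints, place a
# regular grid of keypoints over the whole image.
step = 8   # grid spacing in pixels (assumed value)
size = 16  # keypoint diameter, matching the 16x16 descriptor patch
keypoints = [
    cv2.KeyPoint(float(x), float(y), size)
    for y in range(step, frame.shape[0] - step, step)
    for x in range(step, frame.shape[1] - step, step)
]

# Compute a SIFT descriptor at every grid location
keypoints, descriptors = sift.compute(frame, keypoints)
print(descriptors.shape)  # (number of grid points, 128)
```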
I then tried to use FLANN to match the images of the bee and it worked very well:
Unfortunately, when I tried to match the image of a bee against a frame that was not a close-up of the bees, I ended up with far less impressive results:
Even when I cut out a bee, together with a section of the area around it, at the same resolution as the whole-frame image above and tried to match it to part of the bee template, I ended up with messy results.
The lesson I'm taking from this experiment is that while SIFT can work quite well with a high-resolution image and a large object to extract, for picking out multiple small (and similar-looking) objects I'm going to need a different approach.
For more about bees and the observation hive we have set up, please visit my other blog.