Chapter 4
Machine Learning: Identifying People in Images

So you’ve posted a photo to your favorite social media site and it suddenly puts little boxes around each person’s face and asks you to tag who they are. The first part—recognizing there’s a face in the image—is facial detection. The second part—putting a name to a face—is facial recognition. The reason the site is asking you to name the people in the image? That’s so it can learn to recognize that face in other pictures. Next thing you know, it’ll be auto-suggesting names in other pictures; this is exactly how Facebook, for example, learned to recognize faces.

Now that you have a smart camera set up, we’re going to add both facial detection (“there is a human face in the picture”) and facial recognition (“and that face is my face!”). There are two steps to this. First, we’ll install software that can detect whether there is a face-shaped object in the image (facial detection); then we’ll try to extract that object and see whose face it is (facial recognition).

The same software does both—the first operation is quick, and the second takes a little more effort. Just as with the previous projects, the software for doing the hard work is easy to install. It’s a package called OpenCV, and you probably already installed it in Chapter 1. But if not, install it with

sudo apt-get install python-opencv
pip install numpy

(NumPy is a numerical library that OpenCV’s Python bindings require and use for fast array math.)
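
To make sure both pieces are in place before going further, a quick sanity check from the Python prompt will do it (version numbers will vary):

import cv2
import numpy
print "OpenCV version:", cv2.__version__
print "NumPy version:", numpy.__version__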

We’ll pull out just the pieces our specific SpyPi work needs. To begin, we’ll get the core functionality running on your Pi system. After that, if you want to explore deeper, you’ll have all the tools you need in place to do so.

Step 1: Quick Face Boxing

Let’s start by downloading a ready-to-go facial detection script (from a user named Shantnu Tiwari, username shantnu) that finds faces and draws boxes around them in real time. Navigate to this website from your Pi:

https://github.com/shantnu/Webcam-Face-Detect

Click the Download button to download the zip file. Then, in your terminal, unpack it:

unzip Downloads/Webcam-Face-Detect-master.zip

Then, to run it:

cd Webcam-Face-Detect-master
python webcam.py

Figure 4-1 shows the code in that webcam.py file.

Figure 4-1: Code from Webcam-Face-Detect-master/webcam.py

That’s it. You will now see the view from your webcam, and as the software detects a face, it will draw a box around it. The video is choppy, with a slight delay between what you do and what it shows, because the facial detection takes a moment to run on each frame. We’ll dig into what the software is doing and add coolness to it, but this script is a great way to show how easily you can do facial detection on the Pi.

Figure 4-2 shows the first result, the author “found” by the software.

Figure 4-2: Author’s face found and boxed by SpyPi

It runs without you needing to understand it, but where is the fun in that? Let’s go deeper. First, look at the script (click it to open it in an editor, or if you use the terminal, type less webcam.py). The script is very short and written in Python. It loads the OpenCV library that handles the image processing, plus an XML file (a Haar cascade) containing a mathematical description, built by researchers, of what a computer should recognize as “a face.” The script then calls up the web camera and runs a loop to apply the face-finding algorithm.

The first five lines set up the OpenCV package, the XML file, and the camera. The while True: line says “loop forever.” The lines starting with # are comments that describe what the code after them does. Inside the loop, the script captures a frame, finds any faces in it, draws a rectangle around each face, and displays the result in a window. The final two lines do cleanup. If you press Q at any point during the demo, the program exits, which is handy (and means you don’t need to press Ctrl-C as in earlier chapters).
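
If you’d rather read the code as text than as a screenshot, here is a minimal sketch of that structure. It assumes the Haar cascade XML file sits in the same directory; the downloaded script may differ in small details (some versions take the XML path on the command line):

import cv2

# load the "what counts as a face" definition
faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
video_capture = cv2.VideoCapture(0)   # the first camera found

while True:
    # Capture frame-by-frame and convert to grayscale for detection
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(30, 30))
    # Draw a rectangle around each face and show the result
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # Q quits
        break

# cleanup
video_capture.release()
cv2.destroyAllWindows()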

This script is good enough to tell a person from a stuffed animal or pet, as you can see in Figure 4-3. The accuracy of facial recognition software (even in industrial uses) is still a work in progress, so we’ll look at the sorts of errors you can expect, as well as let you try different methods.

Figure 4-3: Face-finding test: person vs. stuffed animal. The software correctly found the person and did not match on the stuffed animal.

An interesting point about any image detection is that you must trade off between false positives (the software “finds” a face where there isn’t one) and false negatives (there’s a face, but the software misses it). Let’s look at some image examples of this basic face-finding problem. Figure 4-4 contains both: a false positive, where the software “finds” a face in some wall tiles, and a false negative, where it fails to find the author’s face.

Figure 4-4: False positive (box but no face) and false negative (face and no box)
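
The detectMultiScale() parameters you saw in webcam.py are the knobs that control this tradeoff. A hedged example (these values are common starting points, not magic numbers):

faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,    # how much to shrink the image between detection passes
    minNeighbors=5,     # raise this to cut false positives, at the risk of missing real faces
    minSize=(30, 30)    # ignore face candidates smaller than 30x30 pixels
    )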

It’s also worth noting that the software relies on patterns, so if the only thing in the field of view has some face-like aspects, such as eyes, it might find a match. If a stuffed animal is the only thing in the camera’s field of view, it’s not surprising if the software sometimes (but not always) tries to tag it as a face. Humans do the same thing: draw any two dots plus a line, and we’ll see a face. This tendency to see faces in inanimate objects is called pareidolia, and computer code can have the same problem. A classic example is the face seen in an electrical outlet (Figure 4-5).

Figure 4-5: Pareidolia—seeing faces in inanimate objects

We can run into this phenomenon with software, particularly if we’re trying to fool it. The software doesn’t know what a “person” or an “object” is; it’s looking for patterns. If you mess up the patterns (put on Juggalo makeup, for example), you can confuse facial recognition. Likewise, with an object such as a toy made to look like it has a face, the software may latch onto that face-like structure and falsely match it as a face (Figure 4-6).

Figure 4-6: The stuffed animal region around the eyes fools the software into thinking it’s a face.

Either way, you now have real-time facial detection working on your SpyPi. Let’s build on this to do more.

A Little Image Theory

How do you recognize someone? The most common human way is to notice a specific feature or set of features that identifies a specific person: that person with the dark hair and big nose, that other person with the eyeglasses and red lips. We latch onto these for quick identification.

When you do this with a group of people you’ve just met, you compare them to one another. It’s no good saying “she’s the one with dark hair” if everyone in the group has dark hair, so you make quick comparatives for just that group—the one redhead, the one with the narrow eyes, the one with glasses. This kind of quick sorting out of a group is the Eigenfaces method in OpenCV—within a group of known photos, it’s how you tell one from another. The downside of this approach is that you can later get confused if a person changes their hairstyle or takes off their glasses (Clark Kent style!), or even if you see that person in different lighting.

More often, you’ll want to make a list of features that let you distinguish any face from the others: not just “the person with glasses” but a checklist—“Glasses: yes/no. Dark hair: yes/no” and so on. Then each person has a profile of features that you can use to compare with others. That’s the Fisherfaces method in OpenCV. Sure, the computer isn’t smart enough to know what “glasses” or “hair” are, but it can teach itself that the “dark blob at the top of the face-shape” is different from the “light blob at the top of the face-shape”—and that’s practically the same thing for our purposes.

Finally, you can go full-on machine thinking and instead of extracting components of faces to make a checklist, you let the computer do a pixel-by-pixel mapping of how faces change as you scan left to right and up and down. So where we see an “eye,” the computer just says, “There is a white zone with a non-white dot in the middle.” That’s the Local Binary Pattern Histogram (LBPH) method in OpenCV.

In Python, you create these recognizers with the following cv2 functions:

  1. Eigenfaces: cv2.createEigenFaceRecognizer()
  2. Fisherfaces: cv2.createFisherFaceRecognizer()
  3. Local Binary Patterns Histograms: cv2.createLBPHFaceRecognizer()

These are the three different underlying algorithms available in OpenCV, each with its own pluses and minuses. We’ll be using LBPH, but (as you’ll see) you can change just one line in the code to use the other algorithms.
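
To make that concrete, the one line in question is where the recognizer object gets created; the training and prediction code stays the same. A sketch using the OpenCV 2.x names:

face_recog = cv2.createLBPHFaceRecognizer()       # what we use in this chapter
# face_recog = cv2.createEigenFaceRecognizer()    # the group-comparison approach
# face_recog = cv2.createFisherFaceRecognizer()   # the feature-checklist approach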

Step 2: Training Your Camera

Once you have an algorithm, you need to train your camera with the set of faces you think it will see. For example, if you want the camera to identify any visiting students and you have a set of student photos, you first make the software aware of what the students look like; then the camera can not only detect that a face exists but tag it and label it as whose face it is.

Now that your camera can detect a face, we want to make it tell you whether that’s a specific known face, thus creating an ID system. To do this, you’ll need an image of any and all faces you want it to recognize so that you can “train” your camera system to spot specific people.

So take pictures of the faces of everyone you want to enter into the system; you’ll then train the system by loading those pictures in. Once trained, the program can automatically recognize those known individuals.

An untrained system can still put a rectangle around a face and say “There is a face,” but for identification purposes, training it is useful.

To get this started, first you’re going to modify the previous code to make it easier to rework. We already did this and put it up on GitHub at

https://github.com/sandyfreelance/SpyPi

Go get it and put it on your Pi, and we’ll walk through what it does. This code will 1) display a new image every second, 2) draw rectangles around faces, 3) match any faces it “knows,” and 4) let you train it by telling it to add a face to its memory. The opening lines just set up the variables and load OpenCV and the other libraries we need:

import cv2      # OpenCV itself
import sys
import numpy    # numerical arrays (used for the training labels)
import time
import pickle   # for saving/loading known faces between runs
cascPath = "haarcascade_frontalface_default.xml"   # the "what is a face" definition
faceCascade = cv2.CascadeClassifier(cascPath)
face_recog = cv2.createLBPHFaceRecognizer()        # swap in Eigen/Fisher here if you like
vid_cap = cv2.VideoCapture(0)                      # camera number 0, the first one found
all_faces = []  # list to store 'known' faces in
face_count = 0
savefile = 'webfaces.dat'

After that, the code asks if you want to reload previously saved “known” faces:

yn = raw_input("Do you want to import previously saved faces? y/n ")
try:
        if yn[0] == "Y" or yn[0] == "y":
                infile = open(savefile, "rb")
                all_faces = pickle.load(infile)   # the list of stored face images
                infile.close()
                face_count = len(all_faces)
                labels = range(face_count)        # face number 0, 1, 2, ...
                face_recog.train(all_faces, numpy.array(labels))
                print face_count, "faces loaded"
except:
        # no save file yet (or an empty answer): start fresh
        print "No faces loaded"
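
Incidentally, the save file is nothing exotic: it’s just a pickled Python list of grayscale face images (NumPy arrays). Once you’ve saved at least one face, you can peek at it from a Python prompt:

import pickle
infile = open('webfaces.dat', 'rb')
all_faces = pickle.load(infile)
infile.close()
print len(all_faces), "faces stored; the first is", all_faces[0].shape, "pixels"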

Enough preamble.

Step 3: Identifying Whose Face You Found

The meat of the code starts with the rectangle-drawing functions from before and adds a little face recognition. First, we use the same routines from the previous code that grab an image and draw a box around it:

def detect_face(vid_cap, all_faces, face_recog, face_count):
    faces_found = []  # empty holder for storing what we find
    # Capture frame-by-frame
    ret, frame = vid_cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.cv.CV_HAAR_SCALE_IMAGE
        )
    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        # also can put a trigger here
What does the comment # also can put a trigger here mean? This is where you can put code that fires when any face is found, in case you want to make an automatic door opener or something similar. It’s just a comment marking a good place for instructions that should run on facial detection (before any face is recognized).
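
As one hedged example of such a trigger, here’s a routine that blinks an LED whenever a face is boxed; you would call face_trigger() at the commented line. The pin number is an assumption, so change it to match your own wiring:

import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(18, GPIO.OUT)   # assumes an LED wired to GPIO pin 18

def face_trigger():
    # blink once to announce "I see a face"
    GPIO.output(18, GPIO.HIGH)
    time.sleep(0.2)
    GPIO.output(18, GPIO.LOW)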

Now for the next bit. If we can draw a rectangle around a face, we can extract just the face and then compare it with any previous faces we have stored in a list. We always convert faces to grayscale, which removes color imbalance concerns (and is the format the OpenCV recognizers expect). So, inside that same for loop, do this:

# extract just the face as its own image
thisface = frame[y:y+h, x:x+w]   # rows are the y range, columns are the x range
grayface = cv2.cvtColor(thisface, cv2.COLOR_BGR2GRAY)
faces_found.append(grayface)

Here’s the fun part. It’s only one line that compares a new face found with “known” faces, using OpenCV’s .predict method:

    if face_count > 0:
        id = face_recog.predict(grayface)   # predict() returns (label, confidence)
        print "Found face number", id[0]

And to close off this part, we again display it onscreen and return any faces we found to our main calling program:

# Display the resulting frame
cv2.imshow('Video', frame)
return faces_found

That wraps up the code for the new subroutine. Next we’ll put in some bookkeeping and add in the OpenCV functions that do the recognition. We again use a “keep looping until we tell you otherwise” construct, and we allow for two different keypresses:

  1. Q quits the program.
  2. S saves the currently rectangle-highlighted face as a “known” face.

So we train the system by pressing S when a known person is in view, and from then on the system automatically matches any new faces it finds. (Each press of S stores the face under a new ID number, so saving the same person twice gives them two IDs.) You could move the training to its own program, but having one script that can both train and predict keeps things smaller and easier.

while True:
    faces_found = detect_face(vid_cap, all_faces, face_recog, face_count)
    checkme = cv2.waitKey(100)   # wait up to 100 ms for a keypress
    if checkme & 0xFF == ord('q'):
        print "Exiting"
        break
    if checkme & 0xFF == ord('s') and len(faces_found) > 0:
        # assumes we are storing one and only one face
        print "Storing a new face"
        face_count = face_count + 1
        all_faces.append(faces_found[0])
        labels = range(face_count)   # each stored face gets its own ID number
        face_recog.train(all_faces, numpy.array(labels))

Yet again, all the real work is being done in the OpenCV .train() method call. You pass it the list of face images plus a matching array of numeric labels, and it handles the rest.
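
One wrinkle worth knowing: this retrains the model from scratch every time you press S, which is fine for a handful of faces. The LBPH recognizer (unlike Eigenfaces and Fisherfaces) also supports incremental training via an update() method in OpenCV 2.x, which you could swap in if retraining ever gets slow. A sketch, assuming the rest of the code stays as-is:

        # instead of retraining on the whole list:
        face_recog.update([faces_found[0]], numpy.array([face_count - 1]))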

Now to wrap everything up by cleanly closing out our windows (once the user presses the Q key), and then also saving any faces you marked as known (by pressing S, perhaps multiple times):

# When everything is done, release the capture
vid_cap.release()
cv2.destroyAllWindows()
# also save any 'trained' faces
outfile = open(savefile, 'wb')
pickle.dump(all_faces, outfile)
outfile.close()

How well does it work? In Figure 4-7 we pressed the S key to tell the software to store this fine face it found.

Figure 4-7: Training your SpyPi to recognize me (runtime on left, video capture on right)

Then I’ll move around and see if it can still find me. It can! It keeps listing “Found face number 0” in its runtime window. You can also modify the code and replace that line with, well, whatever you want your Pi to do: trigger an alarm, send a message, or much more, by combining other Jumpstart guides with this one. We provide the face recognition; you add the hardware.

To show how it saves trained data, Figure 4-8 shows a new run that loads the automatically saved data from the previous run. It still recognizes me, because I’m now in its saved training data set.

Figure 4-8: Trained system keeps finding me (runtime on left, video capture on right).

And, to complete the tests, we check whether a cunning disguise can fool it (Figure 4-9).

Figure 4-9: Even using a Santa hat does not fool the SpyPi face recognizer.

Triggering and Future Steps

Once you have categorized found faces as either known or unknown, you can add triggers to your code to make your Pi react. Maybe you want it to flash an LED (or lock a door) if it sees a face it doesn’t recognize, or flash an LED (or open a door) if it sees a face it does know. With the previous code and a little work in Python, you can add these “triggered” effects by editing the code provided. Just take the line that prints the recognition result (“Found face number…”) and have it call a Python routine to carry out your automated wishes.
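
A sketch of the idea, with the threshold and the routine names as placeholders for your own hardware code:

def react_to_face(label, conf):
    # conf is a distance: lower means a better match
    if conf < 100:   # threshold is a guess; tune it for your setup
        print "Welcome back, face number", label
        # a 'known person' action, such as open_door(), goes here
    else:
        print "I don't know you!"
        # a 'stranger' action, such as sound_alarm(), goes here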

Put simply, once code can find a face, it can act on that knowledge. Building this system into a security system is one possibility; saving only images of faces is another surveillance-style use. A week of raw video isn’t much use to anyone, even if you can find faces in it. But if you capture faces only when they appear and string those together, you have a video log of all the people who passed your system.

Add in our earlier motion-detection and time-lapse work, and you have a system that can capture time-lapse images only when there’s activity involving people with faces:

  • Have it take pictures only when there’s motion (Chapter 3).
  • For those pictures, if there’s a face, have it save the image of the face (this chapter; see the sketch after this list).
  • Build those into a time-lapse movie of “people who visited us” (Chapter 2).
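
For the middle step, the only new piece is writing each extracted face to disk as it’s found. A minimal sketch, dropped into detect_face() right after grayface is created (the filename pattern is just a suggestion):

import time
fname = "face_%s.jpg" % time.strftime("%Y%m%d-%H%M%S")
cv2.imwrite(fname, grayface)   # save the face crop with a timestamped name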

Faces are only the start. Using the same OpenCV libraries, you can perform recognition of other objects, edge detection, real-time color filtering or background subtraction, conversion to black and white, motion trails, and all sorts of other image manipulation (see https://pythonprogramming.net/loading-images-python-opencv-tutorial/)—even converting images in near real time to artistic styles like Van Gogh or A-ha/Take-On-Me stylings (see https://larseidnes.com/2015/12/18/painting-videos-with-neural-networks/).
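
To give a taste of those other tricks, real-time edge detection is a two-line add-on to the frame-grabbing loop you already have (the Canny thresholds 100 and 200 are common starting values):

edges = cv2.Canny(gray, 100, 200)   # find edges in the grayscale frame
cv2.imshow('Edges', edges)          # show them in a second window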

At this point, armed with a Pi and some simple Python software, you can bend image reality to do everything from recognition to alteration. Enjoy capturing and manipulating visual data!