Frequency Response Signal from 3D Pose Estimation RGB-D Image for Gait Classification

Januwar Hadi
3 min read · May 23, 2021


RGB-D Image Deprojection from Pixel Coordinates to Real World Coordinates (width x height x depth, in mm)

You are a limited edition. Indeed you are: not even identical twins are, ironically, 100% identical. The same can be said for biometrics: the fingerprint, the iris pattern, the face, and the gait.

Uniquely identifying a person by gait alone is a tall order, though; it requires a tremendous degree of precision. Classifying gaits, however, is possible. The huncher, the C-3PO, the moonwalker, and the whatnots can all be sorted into their groups.

Thanks to the computer science community, pose estimation (e.g. OpenPose) is a valuable tool for this. Paired with information from a depth image, key points (body parts) may be de-projected from pixel coordinates to real-world coordinates. And as RGB-D cameras like the Intel RealSense become more compact, more affordable, more powerful, and better supported, gathering RGB-D images is no longer as tedious as it was in the Kinect days. Sorry, Microsoft.

The key points from a 2D image can be conveniently de-projected to 3D points and linked as a skeleton. Okay, so instead of a 2D skeleton, we now have a 3D one. Now what? How does this help tell one walker apart from another?
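For the curious, here is a minimal sketch of that de-projection under a pinhole camera model. The intrinsics fx, fy, cx, cy would come from your camera (the RealSense SDK exposes them, and pyrealsense2 also provides rs2_deproject_pixel_to_point to do this for you); the array shapes here are assumptions for illustration.

```python
import numpy as np

def deproject(u, v, depth_mm, fx, fy, cx, cy):
    """De-project pixel (u, v) with depth in mm to a real-world
    point (X, Y, Z) in mm, using a pinhole camera model."""
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return np.array([x, y, depth_mm])

def skeleton_3d(keypoints, depth, fx, fy, cx, cy):
    """De-project every key point of one frame. `keypoints` is an
    (N, 2) array of pixel coordinates from a pose estimator such as
    OpenPose; `depth` is the aligned depth image in mm."""
    points = [deproject(u, v, depth[v, u], fx, fy, cx, cy)
              for u, v in keypoints.astype(int)]
    return np.stack(points)  # (N, 3) skeleton in real-world mm
```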

Once the 3D skeleton is constructed for each image frame, each skeleton link forms a vector, and between sequential frames we get a vector displacement for each link. Human joints are, by and large, rotational, so the displacement is a rotation. Take the cross product of a link's vectors in consecutive frames and you get the rotation axis. If the walker swings a limb back and forth, the rotation axis flips back and forth as well. And voilà: a set of periodic signals.
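As a sketch, this is what the axis signal for one link might look like in NumPy. The joint indices for the left hip and knee are hypothetical; they depend on the pose estimator's key-point layout.

```python
import numpy as np

def rotation_axis(v_prev, v_curr, eps=1e-9):
    """Unit axis of the rotation taking a link from v_prev to v_curr.
    It flips sign when the swing reverses, which is exactly what
    makes the signal periodic."""
    axis = np.cross(v_prev, v_curr)
    norm = np.linalg.norm(axis)
    return axis / norm if norm > eps else np.zeros(3)

def axis_signal(skeletons, parent=12, child=13):
    """Per-frame rotation axes for one link (here a hypothetical
    left hip -> left knee, i.e. the left thigh)."""
    links = [s[child] - s[parent] for s in skeletons]
    return np.stack([rotation_axis(a, b)
                     for a, b in zip(links, links[1:])])  # (T-1, 3)
```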

A sample of one link's periodic feature. A left leg. Why? Why not!

We can handle these periodic signals the way we handle audio signals: take the FFT, get the frequency response, then run it through convolution. To create a rich dataset, though, there must be enough walkers to get frames from. Willing walkers, who do not mind their catwalk moments being immortalized and held in custody by you.
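A minimal sketch of that first step, assuming the skeletons were captured at a fixed frame rate (30 fps here, purely as an assumption):

```python
import numpy as np

def frequency_response(signal, fps=30):
    """Magnitude spectrum of one periodic gait signal, handled the
    same way we would handle a short audio clip."""
    signal = np.asarray(signal, dtype=float)
    signal -= signal.mean()                      # drop the DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs, spectrum
```

The spectra, one per link, can then be stacked into a feature map for whatever convolutional classifier you favor.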

Now the tricky part: the rotation-axis vector is sensitive to walking direction. So we must choose whether all elements of the vector, the [X Y Z], are necessary. "Yes!" you may say, "I want precision." Well, then you need to track the walker to compensate for the orientation of walking, and the tracking itself is a problem of its own. Otherwise, since most people stick to the ground while walking, take only the one element that is immune to the orientation. Hint: the one perpendicular to the ground. This simplifies things a lot.

In addition to the rotation-axis signal, there is also the rotation-angle signal, a byproduct of the cross product (no pun intended). It measures how much a link rotates between frames, and it too is immune to the orientation of the walking direction.
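Here is a sketch of both orientation-invariant features together. Using atan2 on the cross and dot products is a numerically safer way to recover the angle than arccos alone; which axis index counts as "up" (1 here) depends on how the camera is mounted, so treat it as an assumption.

```python
import numpy as np

def rotation_angle(v_prev, v_curr):
    """Angle (radians) a link rotates between two frames."""
    cross = np.linalg.norm(np.cross(v_prev, v_curr))
    dot = np.dot(v_prev, v_curr)
    return np.arctan2(cross, dot)

def invariant_features(v_prev, v_curr, up=1, eps=1e-9):
    """Orientation-invariant pair for one frame-to-frame step:
    the vertical component of the rotation axis and the angle."""
    axis = np.cross(v_prev, v_curr)
    norm = np.linalg.norm(axis)
    axis = axis / norm if norm > eps else np.zeros(3)
    return axis[up], rotation_angle(v_prev, v_curr)
```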

A headless stickman. The signal is not well constructed until the subject is near enough, within 10 meters (10,000 mm) of depth. And even once near enough, its left leg and arm are still doing Fatboy Slim's "Push the Tempo".

Getting clean data is also very tricky, as our honorable gentleman, Mr. Stickman, volunteers to demonstrate. It is a good idea not to let the subject get too far from the camera, or walk away from it. But they need to walk for this to work, and at some point they will walk away. Yet maybe their unique impression is here to stay.
