Merge pull request #105 from taorye/dev
feat: app gesture classifier
Showing 16 changed files with 489 additions and 2 deletions.
@@ -0,0 +1,46 @@
---
title: MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection
---

## Introduction

This MaixCAM MaixPy app classifies hand gestures from the output of the **Hand Keypoint Detection** model.

It currently uses the `14-class static hand gesture dataset`, which contains 2850 samples in 14 categories.
[Dataset Download Link (Baidu Netdisk, Password: 6urr)](https://pan.baidu.com/s/1Sd-Ad88Wzp0qjGH6Ngah0g)

![](../../assets/handposex_14class.jpg)

The app is implemented in `MaixPy/projects/app_hand_gesture_classifier/main.py`. Its main logic is as follows:

1. Load the `14-class static hand gesture dataset` as processed by the **Hand Keypoint Detection** model, where each sample consists of `20` keypoint coordinate offsets relative to the wrist.
2. Initially train on the first `4` classes to support basic gesture recognition.
3. Run the **Hand Keypoint Detection** model on the camera input, pass its output to the classifier, and visualize the classification results on the screen.
4. Tap the top-right `class14` button to add the samples of the remaining classes and retrain the model for full `14-class` gesture recognition.
5. Tap the bottom-right `class4` button to remove the samples added in the previous step and retrain the model back to the `4-class` mode.
6. Tap the small area between the buttons to display the duration of the last training run at the top of the screen.
7. Tap the remaining large area to show the currently supported gesture classes on the left side: **green** for supported, **yellow** for unsupported.

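Step 1 above describes each sample as `20` coordinate offsets relative to the wrist. The following is a minimal sketch of how such a feature vector could be computed. It assumes 21 two-dimensional keypoints with the wrist at index 0; the keypoint count, ordering, and dimensionality are assumptions, and `wrist_relative_offsets` is a hypothetical helper rather than a function from the app.

```python
import numpy as np

def wrist_relative_offsets(keypoints, wrist_index=0):
    """Return the offsets of all non-wrist keypoints relative to the wrist.

    keypoints: array-like of shape (n_points, 2) holding (x, y) per keypoint.
    wrist_index: row of the wrist keypoint (assumed to be 0 here).
    """
    pts = np.asarray(keypoints, dtype=float)
    wrist = pts[wrist_index]
    others = np.delete(pts, wrist_index, axis=0)  # drop the wrist row itself
    return (others - wrist).reshape(-1)           # flatten to one feature vector


# Hypothetical hand with 21 keypoints: wrist at (100, 200), the rest shifted from it
hand = np.array([[100, 200]] + [[100 + i, 200 - 2 * i] for i in range(1, 21)])
features = wrist_relative_offsets(hand)
print(features.shape)  # (40,)
```

Because only offsets are kept, moving the whole hand across the image leaves the features unchanged, which is presumably why the dataset is stored in this form.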
## Demo Video

<video playsinline controls autoplay loop muted preload src="/static/video/hand_gesture_demo.mp4" type="video/mp4">
Classifier Result Video
</video>

1. The video first shows the `14-class` mode after step `4` has been executed. It recognizes gestures `1-10` (which by default are labeled with other English terms), **OK**, **thumbs up**, **finger heart** (requires showing the back of the hand, which is hard to film but can be verified yourself), and **extended pinky**, for a total of `14` gestures.

2. Step `5` is then executed, reverting to the `4-class` mode, in which only gestures **1**, **5**, **10** (fist), and **OK** are recognized; other gestures no longer produce correct results. Step `7` is also executed during this part to confirm the `4-class` mode: only the first 4 gestures are shown in green, and the remaining 10 in yellow.

3. Step `4` is executed again, restoring the `14-class` mode, and the gestures that were not recognized in the `4-class` mode are correctly identified again.

4. Finally, two-hand recognition is demonstrated: the gestures of both hands are recognized correctly at the same time.

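The mode switches shown in the video amount to appending the samples of the extra classes to the training set, or masking them out again, followed by retraining. A minimal NumPy sketch of that bookkeeping is given below; the integer class labels `0..13`, the sample counts, and the random features are all assumptions made for illustration, and the retraining step itself is omitted.

```python
import numpy as np

# Hypothetical training set: 14 classes with 3 samples each, 2 features per sample
rng = np.random.default_rng(0)
Y_all = np.repeat(np.arange(14), 3)
X_all = rng.normal(size=(len(Y_all), 2))

base = Y_all < 4                      # samples of the first 4 classes
X, Y = X_all[base], Y_all[base]       # initial 4-class training set

# "class14": append the samples of the remaining classes (then retrain)
X = np.vstack([X, X_all[~base]])
Y = np.concatenate([Y, Y_all[~base]])
print(len(np.unique(Y)))  # 14

# "class4": drop the appended samples again via a boolean mask (then retrain)
keep = np.ones(len(Y), dtype=bool)
keep[np.flatnonzero(Y >= 4)] = False
X, Y = X[keep], Y[keep]
print(len(np.unique(Y)))  # 4
```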
## Others

The demo video is a capture of the screen preview window in the top-right corner of **MaixVision**, which matches what the device screen actually displays.

For implementation details, refer to the source code and the description above.

Further development or modification can be based directly on the source code, which includes comments for guidance.

If you still need help, leave a message on **MaixHub** or send an email to the official company address.
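@@ -0,0 +1,50 @@
---
title: MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection Results
---


## Introduction

`MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection Results` classifies hand gestures.

The dataset currently used is the `14-class static hand gesture dataset` ([dataset download link (Baidu Netdisk, Password: 6urr)](https://pan.baidu.com/s/1Sd-Ad88Wzp0qjGH6Ngah0g)). It contains 2850 samples in 14 classes.


![](../../assets/handposex_14class.jpg)


The app is implemented in `MaixPy/projects/app_hand_gesture_classifier/main.py`. Its main logic is:

1. Load the `14-class static hand gesture dataset` as processed by `Hand Keypoint Detection`, that is, `20` coordinate offsets relative to the wrist per sample.
2. Initially train on the first `4` classes to support gesture recognition.
3. Load the `Hand Keypoint Detection` model to process the camera input, and visualize the classifier's results on the screen.
4. Tap `class14` in the upper-right corner to add the samples of the remaining classes and retrain, reaching `14`-class gesture recognition.
5. Tap `class4` in the lower-right corner to remove the samples added in the previous step and retrain, returning to `4`-class gesture recognition.
6. Tap the small area between the buttons to show the duration of the classifier's last training run at the top of the screen.
7. Tap the remaining large area to show the currently supported classes on the left: green means supported, yellow means unsupported.


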
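## Demo Video
<video playsinline controls autoplay loop muted preload src="/static/video/hand_gesture_demo.mp4" type="video/mp4">
Classifier Result video
</video>

1. The video shows the `14`-class mode after step `4` above has been executed. It can recognize gestures `1-10` (which by default correspond to other English labels), OK, thumbs up, finger heart (requires showing the back of the hand, which is hard to demonstrate while filming; you can verify it yourself), and extended pinky, for a total of `14` gestures.

2. Next, step `5` is executed to fall back to the `4`-class mode, in which only 1, 5, 10 (fist), and OK can be recognized; all other gestures fail to produce a correct result. Step `7` is also executed during this part, showing that the current mode is `4`-class: only the first 4 gestures are displayed in green, and the remaining 10 are all yellow.

3. Then step `4` is executed again to restore the `14`-class mode, and the gestures that the `4`-class mode could not recognize are correctly recognized again.

4. The end of the video shows two-hand recognition; in testing, the gestures of both hands are recognized correctly at the same time.

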
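## Others

The demo video was captured from the screen preview window in the upper right of MaixVision and matches what is actually displayed on the screen.

For implementation details, see the source code and the analysis above.

Secondary development or modification can also be done directly on the source code, which includes comments.

If you still need help, you can post a message on MaixHub or send an email to the company mailbox.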
Binary file not shown.
@@ -0,0 +1,5 @@

build
dist
/CMakeLists.txt

@@ -0,0 +1,184 @@
import numpy as np

class LinearSVC:
    class StandardScaler:
        mean: np.ndarray
        std: np.ndarray
        def transform(self, X):
            return (X - self.mean) / self.std

        def fit_transform(self, X):
            self.mean = np.mean(X, axis=0)
            self.std = np.std(X, axis=0)
            return self.transform(X)

    def __init__(self, C=1.0, learning_rate=0.01, max_iter=1000):
        self.C = C
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.scaler = self.StandardScaler()

    def save(self, filename: str):
        np.savez(filename,
            C = self.C,
            learning_rate = self.learning_rate,
            max_iter = self.max_iter,
            scaler_mean = self.scaler.mean,
            scaler_std = self.scaler.std,
            classes = self.classes,
            _W = self._W,
            _B = self._B,
        )

    @classmethod
    def load(cls, filename: str):
        npzfile = np.load(filename)
        self = cls(
            C=float(npzfile["C"]),
            learning_rate=float(npzfile["learning_rate"]),
            max_iter=int(npzfile["max_iter"])  # must be an int: it is passed to range()
        )
        self.scaler.mean = npzfile["scaler_mean"]
        self.scaler.std = npzfile["scaler_std"]
        self.classes = npzfile["classes"]
        self._W = npzfile["_W"]
        self._B = npzfile["_B"]
        return self

    def _train_binary_svm(self, X, y):
        """
        Train a binary SVM.
        """
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0
        for _ in range(self.max_iter):
            scores = np.dot(X, w) + b   # prediction scores for all samples
            margin = y * scores         # (n_samples,) margin of each sample
            mask = margin < 1           # samples that violate the margin; these act as support vectors
            X_support = X[mask]         # support vectors
            y_support = y[mask]         # labels of the support vectors
            if len(X_support) > 0:      # vectorized update
                w -= self.learning_rate * (2 * w / n_samples - self.C * np.dot(X_support.T, y_support))  # batch update of w
                b -= self.learning_rate * (-self.C * np.sum(y_support))  # batch update of b
        return w, b

    def fit(self, X, y):
        """
        Train a multi-class SVM.
        Args:
        - X: feature matrix of shape (n_samples, n_features)
        - y: label array of shape (n_samples,) containing multiple classes
        """
        self.classes = np.unique(y)  # collect all classes
        self._W = np.zeros((len(self.classes), X.shape[1]))
        self._B = np.zeros(len(self.classes))
        for i, cls in enumerate(self.classes):
            binary_y = np.where(y == cls, 1, -1)  # build one-vs-rest labels
            w, b = self._train_binary_svm(X, binary_y)
            self._W[i] = w
            self._B[i] = b

    def forward(self, X):
        return np.dot(X, self._W.T) + self._B

    def predict(self, X):
        return self.classes[np.argmax(self.forward(X), axis=1)]  # return the highest-scoring class

    def predict_with_confidence(self, X):
        def softmax(x):
            x_max = np.max(x, axis=-1, keepdims=True)  # subtract the max for numerical stability
            exp_x = np.exp(x - x_max)
            return exp_x / np.sum(exp_x, axis=-1, keepdims=True)
        res = self.forward(X)        # (n_samples, n_classes)
        confidences = softmax(res)   # (n_samples, n_classes)
        return self.classes[np.argmax(res, axis=1)], np.max(confidences, axis=1)  # highest-scoring class and its confidence


class LinearSVCManager:
    def __init__(self, clf: LinearSVC = None, X=None, Y=None, pretrained=False):
        if X is None:
            X = np.empty((0, 0))
        if Y is None:
            Y = np.empty((0,))

        # Convert lists to NumPy arrays
        if isinstance(X, list):
            X = np.array(X)
        if isinstance(Y, list):
            Y = np.array(Y)

        # Type checks
        if not isinstance(X, np.ndarray):
            raise TypeError("X must be a list or numpy array.")
        if not isinstance(Y, np.ndarray):
            raise TypeError("Y must be a list or numpy array.")

        if len(X) != len(Y):
            raise ValueError("Length of X and Y must be equal.")
        if len(Y) == 0:
            raise ValueError("A classifier (clf) must be provided with training samples X and Y.")

        # A default of `LinearSVC()` in the signature would be shared by all
        # managers, so the default is `None` and a fresh instance is created here.
        if clf is None:
            if pretrained:
                raise ValueError("A pretrained classifier (clf) can't be `None`.")
            clf = LinearSVC()

        self.clf = clf
        self.samples = (X, Y)

        if not pretrained:
            self.train()

    def train(self):
        X_scaled = self.clf.scaler.fit_transform(self.samples[0])
        self.clf.fit(X_scaled, self.samples[1])
        print(f"{len(self.samples[1])} samples have been trained.")

    def test(self, X):
        X = np.array(X)
        if X.shape[-1] != self.samples[0].shape[1]:
            raise ValueError("Tested data dimension mismatch.")
        X_scaled = self.clf.scaler.transform(X)
        return self.clf.predict_with_confidence(X_scaled)

    def add(self, X, Y):
        X = np.array(X)
        Y = np.array(Y)

        if X.shape[-1] != self.samples[0].shape[1]:
            raise ValueError("Added data dimension mismatch.")

        if len(self.samples[0]) > 0:
            self.samples = (
                np.vstack([self.samples[0], X]),
                np.concatenate([self.samples[1], Y])
            )
        else:
            self.samples = (X, Y)

        self.train()

    def rm(self, indices):
        X, Y = self.samples

        if any(idx < 0 or idx >= len(Y) for idx in indices):
            raise IndexError("Index out of bounds.")

        mask = np.ones(len(Y), dtype=bool)
        mask[indices] = False

        self.samples = (X[mask], Y[mask])

        if len(self.samples[1]) > 0:
            self.train()
        else:
            print("Warning: All data has been removed. Model is untrained now.")

    def clear_samples(self):
        self.samples = (np.empty((0, self.samples[0].shape[1])), np.empty((0,)))
        print("All training samples have been cleared.")
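The module above trains one binary hinge-loss classifier per class (one-vs-rest) and turns the raw scores into a confidence with a softmax. The self-contained sketch below reproduces that idea on toy data so it can be run without the app. The function `train_binary`, the three synthetic clusters, and the hyperparameters are illustrative assumptions; it mirrors the update rule in `_train_binary_svm` but is not the shipped implementation.

```python
import numpy as np

def train_binary(X, y, C=1.0, lr=0.01, max_iter=1000):
    """Batch subgradient descent on the L2-regularized hinge loss; y is in {-1, +1}."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(max_iter):
        violating = y * (X @ w + b) < 1          # samples with margin below 1
        if violating.any():
            Xv, yv = X[violating], y[violating]
            w -= lr * (2 * w / n_samples - C * Xv.T @ yv)
            b -= lr * (-C * yv.sum())
    return w, b

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

# Toy data: three well-separated 2-D clusters (hypothetical, not the gesture dataset)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + rng.normal(scale=0.5, size=(90, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize, as StandardScaler does

# One-vs-rest: one binary problem per class
classes = np.unique(y)
W, B = zip(*(train_binary(X, np.where(y == c, 1, -1)) for c in classes))
scores = X @ np.array(W).T + np.array(B)         # (n_samples, n_classes)

pred = classes[scores.argmax(axis=1)]            # highest-scoring class per sample
conf = softmax(scores).max(axis=1)               # softmax value of that class
print("training accuracy:", (pred == y).mean())
```

Note that the softmax value is a normalized score, not a calibrated probability: the per-class scores come from independently trained binary classifiers.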
@@ -0,0 +1,15 @@
The touchscreen is divided into four sections:

1. The first two are circles in the upper-right and lower-right corners.

2. The third is the area between these two circles.

3. The fourth is the largest and covers the entire left area.

Pressing them has the following effects:

1. The circles are activated when you release without moving away.

2. The area between the circles shows the duration of the last training run.

3. The left area shows the currently active classes.
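The four-section layout can be expressed as a small hit test. The sketch below is only an illustration: the screen size, the circle radius, the width of the strip between the circles, and the function name `classify_touch` are all assumed values and names, not taken from `main.py`.

```python
def classify_touch(x, y, width=320, height=240, radius=40):
    """Map a touch point to one of the four sections described above.

    All geometry (screen size, circle radius, a right-hand strip as wide as the
    circles) is assumed for illustration.
    """
    cx = width - radius                       # both circles sit on the right edge
    top_cy, bottom_cy = radius, height - radius

    def inside(cy):
        return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2

    if inside(top_cy):
        return "top_circle"
    if inside(bottom_cy):
        return "bottom_circle"
    if x >= width - 2 * radius and top_cy < y < bottom_cy:
        return "between"
    return "main_area"


print(classify_touch(290, 30))    # top_circle
print(classify_touch(290, 210))   # bottom_circle
print(classify_touch(290, 120))   # between
print(classify_touch(100, 120))   # main_area
```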
@@ -0,0 +1,14 @@
id: gesture_classifier
name: Gesture Classifier
name[zh]: 手势分类
version: 1.0.0
author: Taorye@Sipeed
icon: icon.png
desc: Classify the hand gesture.
files:
  - app.yaml
  - icon.png
  - main.py
  - LinearSVC.py
  - clf_dump.npz
  - trainSets.npz
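The manifest ships two `.npz` archives whose contents are not shown in this commit view. As a hedged illustration of the format only, the sketch below round-trips a few arrays through `np.savez` and `np.load`. The key names are borrowed from `LinearSVC.save`, while the values and the temporary file name are made up.

```python
import os
import tempfile
import numpy as np

# Hypothetical parameters of a 3-class, 4-feature linear classifier
params = {
    "classes": np.array([0, 1, 2]),
    "_W": np.arange(12, dtype=float).reshape(3, 4),
    "_B": np.zeros(3),
    "max_iter": 1000,
}

path = os.path.join(tempfile.mkdtemp(), "clf_demo.npz")
np.savez(path, **params)

with np.load(path) as npz:
    restored = {key: npz[key] for key in npz.files}

print(restored["_W"].shape)       # (3, 4)
print(int(restored["max_iter"]))  # 1000
```

Scalars come back as zero-dimensional arrays, so they need an explicit `int(...)` or `float(...)` conversion before being used as, for example, a loop count.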
Binary file not shown.