M-LOCK
Protecting Intellectual Property with Reliable Availability of Learning Models in AI-based Cybersecurity Services
Abstract
This paper proposes a novel model locking scheme (M-LOCK) that enhances the availability protection of deep neural networks (DNNs) through a built-in active verification mechanism. Trained with a data poisoning-based manipulation method (DPMM), an M-LOCK model produces correct predictions only when the input contains a specific token, and its accuracy drops drastically otherwise. Experiments show that M-LOCK reduces accuracy on unauthorized inputs to an average of roughly 10% across benchmark datasets while maintaining high performance on authorized inputs.
Key Methods
1. M-LOCK Scheme
Token Embedding: A specific token (e.g., multi-pixel dots or a sticker) is embedded in authorized inputs, while unauthorized inputs remain unchanged.
Data Poisoning: The training set is split into authorized data (true labels kept) and unauthorized data (labels manipulated), and the model is trained by optimizing the mutual-information divergence between the two; a minimal sketch of this construction follows.
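To make the construction concrete, below is a minimal sketch of the token embedding and the authorized/unauthorized split. The three-dot token pattern, its corner placement, the reading of `gamma` as the fraction of unauthorized samples, and the `(y + 1) mod N` relabeling are all illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def embed_token(image, value=1.0):
    """Stamp an assumed multi-dot token into the top-left corner of an
    HxWxC image with pixel values in [0, 1]."""
    stamped = image.copy()
    for r, c in [(0, 0), (0, 2), (2, 1)]:  # hypothetical dot layout
        stamped[r, c, :] = value
    return stamped

def build_locked_training_set(images, labels, num_classes, gamma=0.5, seed=0):
    """Split a training set for M-LOCK-style training.

    Authorized samples get the token and keep their true labels;
    unauthorized samples stay token-free but have their labels
    manipulated (single-target strategy shown here; see DPMM below).
    `gamma` is read as the fraction of unauthorized samples.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    unauthorized = np.zeros(n, dtype=bool)
    unauthorized[rng.choice(n, size=int(gamma * n), replace=False)] = True

    out_images, out_labels = [], []
    for i in range(n):
        if unauthorized[i]:
            out_images.append(images[i])                      # input unchanged
            out_labels.append((labels[i] + 1) % num_classes)  # label manipulated
        else:
            out_images.append(embed_token(images[i]))         # token embedded
            out_labels.append(labels[i])                      # true label kept
    return np.stack(out_images), np.asarray(out_labels)
```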
2. DPMM Method
Interference Strategies (sketched in code after this list):
Single Target: Relabel all unauthorized samples to one fixed incorrect class.
Random Target: Assign each unauthorized sample a randomly chosen label.
Random Distribution: Replace each unauthorized label with a uniform distribution over all classes.
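A hedged sketch of the three interference strategies as label transforms. The function name and the choice of class 0 as the fixed target are assumptions; the uniform strategy returns soft labels, so it presumes a training loss that accepts probability vectors (e.g., soft-label cross-entropy or KL divergence).

```python
import numpy as np

def manipulate_labels(labels, num_classes, strategy="single", seed=0):
    """Manipulate the labels of unauthorized samples under one of the
    three DPMM strategies. 'single' and 'random' return hard labels;
    'uniform' returns one probability vector per sample."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    if strategy == "single":
        # Single target: every sample mapped to one fixed class
        # (class 0 here; samples truly in class 0 are shifted to 1
        # so the assigned label is always incorrect).
        target = np.zeros_like(labels)
        target[labels == 0] = 1
        return target
    if strategy == "random":
        # Random target: an independently drawn label per sample.
        return rng.integers(0, num_classes, size=len(labels))
    if strategy == "uniform":
        # Random distribution: uniform soft labels, pushing the model
        # toward maximal uncertainty on token-free inputs.
        return np.full((len(labels), num_classes), 1.0 / num_classes)
    raise ValueError(f"unknown strategy: {strategy!r}")
```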
Key Experimental Results
Table 1: Accuracy Comparison Across Strategies (excerpt)
| Dataset                | Baseline Acc. (%) | Single Target (Auth/Unauth, %) | Random Distribution (Auth/Unauth, %) |
|------------------------|-------------------|--------------------------------|--------------------------------------|
| MNIST                  | 98.45             | 98.32 / 8.85                   | 98.51 / 9.73                         |
| CIFAR-10               | 89.76             | 89.47 / 10.20                  | 90.41 / 8.06                         |
| CIFAR-100              | 69.03             | 68.27 / 1.01                   | 69.84 / 2.61                         |
| GTSRB (traffic signs)  | 98.21             | 86.31 / 6.34                   | 96.29 / 49.36                        |
Figure 1: Impact of the Poisoning Ratio (γ) on Accuracy
Authorized-input accuracy increases with γ, reaching 90.8% at γ = 0.5.
Unauthorized-input accuracy stays near 10% for γ < 0.9.
Security Analysis
1. Resistance to Model Extraction
A substitute model trained via API queries reaches only 10.01% accuracy on unauthorized inputs (CIFAR-10); since the service returns unreliable predictions for token-free queries, the knowledge the attacker extracts is itself corrupted.
2. Resistance to Fine-Tuning
Even after an adversary fine-tunes the protected model, accuracy on unauthorized inputs remains close to 1/N, where N is the number of classes (i.e., random guessing).
3. Resistance to Token Reverse Engineering
Pseudo-tokens synthesized with reverse-engineering tools such as Neural Cleanse fail to unlock the model (accuracy below 11%).
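To see the availability gap that the analysis above relies on, here is a minimal PyTorch-style evaluation sketch. `model`, the test loader, and the batch-level `stamp` callable (a tensor analogue of `embed_token` from the earlier sketch) are assumed names for illustration, not artifacts of the paper.

```python
import torch

@torch.no_grad()
def locked_accuracy(model, loader, stamp=None, device="cpu"):
    """Top-1 accuracy of a locked model. Pass a batch-level token
    stamping callable as `stamp` to act as an authorized client, or
    None to query token-free as an unauthorized one."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        if stamp is not None:
            images = stamp(images)  # embed the token into the batch
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# An M-LOCK-protected model should show roughly baseline accuracy
# with the token and near-chance accuracy (about 1/N) without it:
#   auth   = locked_accuracy(model, test_loader, stamp=stamp_batch)
#   unauth = locked_accuracy(model, test_loader, stamp=None)
```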
Conclusion
M-LOCK strengthens the intellectual-property protection of AI models by combining built-in token verification with data-poisoning-based training, sharply reducing the utility of unauthorized model use while maintaining high performance for authorized services. Its robustness across the attack scenarios above offers a novel approach to secure AI deployment.