Published: 2025-03-25 14:00:06 · Editor: 222
Make sure the deep learning framework PyTorch is installed:
pip install torch torchvision -i https://mirrors.aliyun.com/pypi/simple
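This section trains a deep learning model to recognize graphical captchas, so some training data is needed as well. The training data has two parts: the captcha images and their ground-truth labels. A captcha generator can produce both at once, because the text used to render each image is also its label. Generating the images requires a Python library called captcha, which can be installed with pip: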
pip install captcha
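Data preparation takes two steps: first generate random text, then render that text as a captcha image. Create a file named generate.py that defines two functions, generate_captcha_text and generate_captcha_text_and_image, one for each step.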
from captcha.image import ImageCaptcha
from PIL import Image
import random
import time
import setting
import os


def generate_captcha_text():
    """
    Generate a random captcha string.
    :return: string of setting.MAX_CAPTCHA characters drawn from setting.ALL_CHAR_SET
    """
    captcha_text = []
    for i in range(setting.MAX_CAPTCHA):
        c = random.choice(setting.ALL_CHAR_SET)
        captcha_text.append(c)
    return ''.join(captcha_text)


def generate_captcha_text_and_image():
    """
    Generate a random captcha string and render it as an image.
    :return: (captcha text, PIL image)
    """
    image = ImageCaptcha()
    captcha_text = generate_captcha_text()
    captcha_image = Image.open(image.generate(captcha_text))
    return captcha_text, captcha_image


if __name__ == '__main__':
    count = 30000  # number of training captchas to generate
    # count = 3000  # number of validation captchas
    path = setting.TRAIN_DATASET_PATH  # save directory for the training set
    # path = setting.EVAL_DATASET_PATH  # save directory for the validation set
    if not os.path.exists(path):
        os.makedirs(path)
    for i in range(count):
        now = str(int(time.time()))
        text, image = generate_captcha_text_and_image()
        # the label is stored in the file name: <text>_<timestamp>.png
        filename = text + '_' + now + '.png'
        image.save(path + os.path.sep + filename)
        print('saved %d : %s' % (i + 1, filename))
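Run the script; the output looks like this:
This generates a large number of captcha images, with each label written directly into the file name. The resulting training set is shown in the figure: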
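Because the label is the part of the file name before the underscore, it can be recovered with a simple split, which is also what dataset.py does later in this article. A minimal sketch; the helper name label_from_filename and the sample file name are illustrative assumptions, not part of the original code:

```python
import os


def label_from_filename(path):
    """Return the captcha text stored in a name like 'AB12_1742882406.png'."""
    name = os.path.basename(path)  # drop the directory part
    return name.split('_')[0]      # keep everything before the first underscore


# Hypothetical file name in the <text>_<timestamp>.png layout used by generate.py
sample = os.path.join('dataset', 'train', 'AB12_1742882406.png')
print(label_from_filename(sample))
```

Keeping the label in the file name avoids a separate annotation file, at the cost of tying the label format to characters that are legal in file names.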
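The validation set, used to check how well training went, is generated the same way. Only the count and path variables need to change:
count = 3000  # number of validation captchas
path = setting.EVAL_DATASET_PATH  # save directory for the validation set
Run generate.py again.
With the data ready, model training can begin.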
3 Model training
The deep learning model used here is a basic CNN. It is defined in model.py:
import torch.nn as nn
import setting


# CNN model (3 conv layers)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        # three 2x2 poolings shrink width and height by a factor of 8
        self.fc = nn.Sequential(
            nn.Linear((setting.IMAGE_WIDTH // 8) * (setting.IMAGE_HEIGHT // 8) * 64, 1024),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU())
        # one score per (character position, candidate character) pair
        self.rfc = nn.Sequential(
            nn.Linear(1024, setting.MAX_CAPTCHA * setting.ALL_CHAR_SET_LEN),
        )

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, features)
        out = self.fc(out)
        out = self.rfc(out)
        return out
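Three layers are defined here, each a combination of Conv2d (convolution), BatchNorm2d (batch normalization), Dropout (random deactivation), ReLU (activation), and MaxPool2d (pooling). After these three layers, a fully connected network produces the final output, which is used to compute the model's loss.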
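The input size of the first fully connected layer follows from the image size. The arithmetic below is a sketch that assumes the values from this article's setting.py (60x160 images, 4 characters, 36 candidate characters); the helper name size_after_blocks is made up for illustration. A 3x3 convolution with padding 1 keeps width and height unchanged, and each MaxPool2d(2) halves them, rounding down:

```python
IMAGE_HEIGHT, IMAGE_WIDTH = 60, 160     # values from setting.py
MAX_CAPTCHA, ALL_CHAR_SET_LEN = 4, 36   # 4 characters, digits + uppercase letters


def size_after_blocks(size, num_blocks=3):
    """Spatial size after num_blocks of (3x3 conv with padding 1, MaxPool2d(2))."""
    for _ in range(num_blocks):
        size = size // 2  # the conv keeps the size; the pooling halves it
    return size


h = size_after_blocks(IMAGE_HEIGHT)      # 60 -> 30 -> 15 -> 7
w = size_after_blocks(IMAGE_WIDTH)       # 160 -> 80 -> 40 -> 20
flattened = 64 * h * w                   # 64 channels come out of layer3
outputs = MAX_CAPTCHA * ALL_CHAR_SET_LEN

print(h, w, flattened, outputs)          # prints: 7 20 8960 144
```

So the network maps each grayscale image to 144 scores: 36 per character position.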
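The training process is defined in train.py. The overall logic is:
1 Import the model defined in model.py and initialize it;
2 Define the loss function;
3 Define the optimizer;
4 Load the data, which usually includes a training set and a validation set;
5 Run training, which involves backpropagation and weight updates;
6 After a set number of training steps, validate and save the model.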
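For step 2, the train.py listed later in this article uses nn.MultiLabelSoftMarginLoss. To show what that loss measures, here is a pure-Python sketch of the per-sample formula given in the PyTorch documentation (the mean over classes of a sigmoid-based binary cross-entropy). The function name and the toy inputs are illustrative assumptions; the real training code simply calls the PyTorch class.

```python
import math


def multilabel_soft_margin_loss(logits, targets):
    """Loss for one sample: -mean(y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x)))."""
    assert len(logits) == len(targets)
    total = 0.0
    for x, y in zip(logits, targets):
        log_p = -math.log1p(math.exp(-x))     # log(sigmoid(x))
        log_not_p = -math.log1p(math.exp(x))  # log(1 - sigmoid(x))
        total += y * log_p + (1.0 - y) * log_not_p
    return -total / len(logits)


target = [1, 0, 0, 0]  # toy one-hot label with 4 classes
undecided = multilabel_soft_margin_loss([0.0, 0.0, 0.0, 0.0], target)
confident = multilabel_soft_margin_loss([5.0, -5.0, -5.0, -5.0], target)
wrong = multilabel_soft_margin_loss([-5.0, 5.0, -5.0, -5.0], target)
print(undecided, confident, wrong)
```

The loss is log(2) when the network is undecided, close to 0 when it confidently scores the right class, and large when it confidently scores the wrong one, which is why a falling loss during training indicates the predictions are moving toward the one-hot labels.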
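With the basic logic understood, you can train a deep learning model on the data you have. Run train.py; the output looks like this:
The loss keeps falling during training, which shows that the model is learning, and validation runs once at the end of every epoch. Because the validation data is never used for training, accuracy measured on it is a more reliable estimate of how the model performs on unseen captchas. After training for a while, the loss approaches 0, training accuracy keeps rising, and validation accuracy exceeds 96%. A file named best_model.pkl then appears locally, which is the model we want.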
4 Testing
Create a predict folder under the dataset folder and generate a few captcha images in it. Then load the best_model.pkl model obtained above in predict.py. The key line is:
cnn.load_state_dict(torch.load('best_model.pkl'))
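This initializes the weights of the CNN defined earlier from the saved file. Once loaded, the model is as capable as it was at the end of training and can recognize graphical captchas.
Run predict.py.
The output looks like this:
Recognition succeeds, which means we have trained a deep learning model that recognizes graphical captchas.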
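The predict.py and evaluate.py listings in this article turn the model output into text by slicing it into four segments by hand (c0 to c3) and taking the argmax of each. The sketch below shows the same idea as a loop over plain Python lists. The function name decode_prediction and the hand-built score vector are illustrative assumptions; the real scripts operate on the tensor returned by the CNN.

```python
import string

CHAR_SET = list(string.digits + string.ascii_uppercase)  # same 36 characters as setting.py


def decode_prediction(scores, char_set, captcha_len):
    """Pick the highest-scoring character for each captcha position."""
    n = len(char_set)
    assert len(scores) == n * captcha_len
    chars = []
    for pos in range(captcha_len):
        segment = scores[pos * n:(pos + 1) * n]         # scores for one position
        best = max(range(n), key=lambda k: segment[k])  # argmax within the segment
        chars.append(char_set[best])
    return ''.join(chars)


# Build a fake score vector in which the characters of "BK7H" score highest.
scores = [0.0] * (4 * len(CHAR_SET))
for pos, ch in enumerate("BK7H"):
    scores[pos * len(CHAR_SET) + CHAR_SET.index(ch)] = 1.0

print(decode_prediction(scores, CHAR_SET, 4))  # prints: BK7H
```

Written as a loop, the decoding keeps working if MAX_CAPTCHA is changed, whereas the hand-written c0 to c3 version assumes exactly four characters.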
The complete project files are listed below.
dataset.py:
# -*- coding: UTF-8 -*-
import os
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as transforms
from PIL import Image
import encoding as ohe
import setting


class mydataset(Dataset):

    def __init__(self, folder, transform=None):
        self.train_image_file_paths = [os.path.join(folder, image_file) for image_file in os.listdir(folder)]
        self.transform = transform

    def __len__(self):
        return len(self.train_image_file_paths)

    def __getitem__(self, idx):
        image_root = self.train_image_file_paths[idx]
        image_name = image_root.split(os.path.sep)[-1]
        image = Image.open(image_root)
        if self.transform is not None:
            image = self.transform(image)
        # the label is the part of the file name before the underscore
        label = ohe.encode(image_name.split('_')[0])
        return image, label


transform = transforms.Compose([
    # transforms.ColorJitter(),
    transforms.Grayscale(),
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])


def get_train_data_loader():
    dataset = mydataset(setting.TRAIN_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=64, shuffle=True)


def get_eval_data_loader():
    dataset = mydataset(setting.EVAL_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=1, shuffle=True)


def get_predict_data_loader():
    dataset = mydataset(setting.PREDICT_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=1, shuffle=True)
encoding.py:
import numpy as np
import setting


def encode(text):
    # one-hot vector: one block of ALL_CHAR_SET_LEN entries per character position
    vector = np.zeros(setting.ALL_CHAR_SET_LEN * setting.MAX_CAPTCHA, dtype=float)

    def char2pos(c):
        # index layout: 0-9 digits, 10-35 uppercase, 36-61 lowercase, 62 underscore;
        # with the character set in setting.py only indices 0-35 are used
        if c == '_':
            k = 62
            return k
        k = ord(c) - 48
        if k > 9:
            k = ord(c) - 65 + 10
            if k > 35:
                k = ord(c) - 97 + 26 + 10
                if k > 61:
                    raise ValueError('error')
        return k

    for i, c in enumerate(text):
        idx = i * setting.ALL_CHAR_SET_LEN + char2pos(c)
        vector[idx] = 1.0
    return vector


def decode(vec):
    # positions of the non-zero entries, one per character
    char_pos = vec.nonzero()[0]
    text = []
    for i, c in enumerate(char_pos):
        char_idx = c % setting.ALL_CHAR_SET_LEN
        if char_idx < 10:
            char_code = char_idx + ord('0')
        elif char_idx < 36:
            char_code = char_idx - 10 + ord('A')
        elif char_idx < 62:
            char_code = char_idx - 36 + ord('a')
        elif char_idx == 62:
            char_code = ord('_')
        else:
            raise ValueError('error')
        text.append(chr(char_code))
    return "".join(text)


if __name__ == '__main__':
    e = encode("BK7H")
    print(decode(e))
evaluate.py:
import numpy as np
import torch
from torch.autograd import Variable
import setting
import dataset
from model import CNN
import encoding


def main():
    cnn = CNN()
    cnn.eval()
    # evaluate the most recently saved weights
    cnn.load_state_dict(torch.load('model.pkl'))
    print("load cnn net.")

    eval_dataloader = dataset.get_eval_data_loader()

    correct = 0
    total = 0
    for i, (images, labels) in enumerate(eval_dataloader):
        image = images
        vimage = Variable(image)
        predict_label = cnn(vimage)

        # take the argmax of each block of ALL_CHAR_SET_LEN scores
        c0 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, 0:setting.ALL_CHAR_SET_LEN].data.numpy())]
        c1 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, setting.ALL_CHAR_SET_LEN:2 * setting.ALL_CHAR_SET_LEN].data.numpy())]
        c2 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, 2 * setting.ALL_CHAR_SET_LEN:3 * setting.ALL_CHAR_SET_LEN].data.numpy())]
        c3 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, 3 * setting.ALL_CHAR_SET_LEN:4 * setting.ALL_CHAR_SET_LEN].data.numpy())]
        predict_label = '%s%s%s%s' % (c0, c1, c2, c3)
        true_label = encoding.decode(labels.numpy()[0])
        total += labels.size(0)
        # a captcha counts as correct only if all four characters match
        if (predict_label == true_label):
            correct += 1
        if (total % 200 == 0):
            print('Test Accuracy of the model on the %d eval images: %f %%' %
                  (total, 100 * correct / total))
    print('Test Accuracy of the model on the %d eval images: %f %%' %
          (total, 100 * correct / total))
    return correct / total


if __name__ == '__main__':
    main()
generate.py:
from captcha.image import ImageCaptcha
from PIL import Image
import random
import time
import setting
import os


def generate_captcha_text():
    """
    Generate a random captcha string.
    :return: string of setting.MAX_CAPTCHA characters drawn from setting.ALL_CHAR_SET
    """
    captcha_text = []
    for i in range(setting.MAX_CAPTCHA):
        c = random.choice(setting.ALL_CHAR_SET)
        captcha_text.append(c)
    return ''.join(captcha_text)


def generate_captcha_text_and_image():
    """
    Generate a random captcha string and render it as an image.
    :return: (captcha text, PIL image)
    """
    image = ImageCaptcha()
    captcha_text = generate_captcha_text()
    captcha_image = Image.open(image.generate(captcha_text))
    return captcha_text, captcha_image


if __name__ == '__main__':
    count = 30000  # number of training captchas to generate
    # count = 3000  # number of validation captchas
    # path = setting.EVAL_DATASET_PATH  # save directory for the validation set
    path = setting.TRAIN_DATASET_PATH  # save directory for the training set
    if not os.path.exists(path):
        os.makedirs(path)
    for i in range(count):
        now = str(int(time.time()))
        text, image = generate_captcha_text_and_image()
        # the label is stored in the file name: <text>_<timestamp>.png
        filename = text + '_' + now + '.png'
        image.save(path + os.path.sep + filename)
        print('saved %d : %s' % (i + 1, filename))
model.py:
# -*- coding: UTF-8 -*-
import torch.nn as nn
import setting


# CNN model (3 conv layers)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU(),
            nn.MaxPool2d(2))
        # three 2x2 poolings shrink width and height by a factor of 8
        self.fc = nn.Sequential(
            nn.Linear((setting.IMAGE_WIDTH // 8) * (setting.IMAGE_HEIGHT // 8) * 64, 1024),
            nn.Dropout(0.5),  # drop 50% of the neurons
            nn.ReLU())
        # one score per (character position, candidate character) pair
        self.rfc = nn.Sequential(
            nn.Linear(1024, setting.MAX_CAPTCHA * setting.ALL_CHAR_SET_LEN),
        )

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, features)
        out = self.fc(out)
        out = self.rfc(out)
        return out
predict.py:
# -*- coding: UTF-8 -*-
import numpy as np
import torch
from torch.autograd import Variable
import setting
import dataset
from model import CNN


def main():
    cnn = CNN()
    cnn.eval()
    cnn.load_state_dict(torch.load('best_model.pkl'))
    print("load cnn net.")

    predict_dataloader = dataset.get_predict_data_loader()

    for i, (images, labels) in enumerate(predict_dataloader):
        image = images
        vimage = Variable(image)
        predict_label = cnn(vimage)

        # take the argmax of each block of ALL_CHAR_SET_LEN scores
        c0 = setting.ALL_CHAR_SET[np.argmax(predict_label[0, 0:setting.ALL_CHAR_SET_LEN].data.numpy())]
        c1 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, setting.ALL_CHAR_SET_LEN:2 * setting.ALL_CHAR_SET_LEN].data.numpy())]
        c2 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, 2 * setting.ALL_CHAR_SET_LEN:3 * setting.ALL_CHAR_SET_LEN].data.numpy())]
        c3 = setting.ALL_CHAR_SET[np.argmax(
            predict_label[0, 3 * setting.ALL_CHAR_SET_LEN:4 * setting.ALL_CHAR_SET_LEN].data.numpy())]

        c = '%s%s%s%s' % (c0, c1, c2, c3)
        print(c)


if __name__ == '__main__':
    main()
setting.py:
import os

# Characters that may appear in a captcha
# string.digits + string.ascii_uppercase
NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
            'V', 'W', 'X', 'Y', 'Z']
ALL_CHAR_SET = NUMBER + ALPHABET
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET)
MAX_CAPTCHA = 4

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160

# Path of the training set, used to train the model.
TRAIN_DATASET_PATH = 'dataset' + os.path.sep + 'train'
# Path of the validation set, used during or after training to check how well the model was trained.
EVAL_DATASET_PATH = 'dataset' + os.path.sep + 'eval'
# Path of the prediction set, used after training for inference and testing.
PREDICT_DATASET_PATH = 'dataset' + os.path.sep + 'predict'
train.py:
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
from torch.autograd import Variable
import dataset
from model import CNN
from evaluate import main as evaluate

num_epochs = 30
batch_size = 100  # not used here: the DataLoader in dataset.py sets batch_size=64
learning_rate = 0.001


def main():
    cnn = CNN()
    cnn.train()

    criterion = nn.MultiLabelSoftMarginLoss()
    optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)
    max_eval_acc = -1

    train_dataloader = dataset.get_train_data_loader()
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_dataloader):
            images = Variable(images)
            labels = Variable(labels.float())
            predict_labels = cnn(images)
            loss = criterion(predict_labels, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 10 == 0:
                print("epoch:", epoch, "step:", i, "loss:", loss.item())
            if (i + 1) % 100 == 0:
                # the current model is saved as model.pkl
                torch.save(cnn.state_dict(), "./model.pkl")
                print("save model")
        print("epoch:", epoch, "step:", i, "loss:", loss.item())
        # save the current weights first: evaluate() loads them from model.pkl
        torch.save(cnn.state_dict(), "./model.pkl")
        eval_acc = evaluate()
        if eval_acc > max_eval_acc:
            # record the new best accuracy and save the model as best_model.pkl
            max_eval_acc = eval_acc
            torch.save(cnn.state_dict(), "./best_model.pkl")
            print("save best model")
    torch.save(cnn.state_dict(), "./model.pkl")
    print("save last model")


if __name__ == '__main__':
    main()
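Summary: this article walked through the whole process of recognizing graphical captchas with a deep learning model, ending with a successfully trained model saved to a model file. Given a captcha image, the model predicts the text it contains. Because every captcha used along the way was generated by the captcha library in a fixed, predefined style, the recognition accuracy is fairly high. For other kinds of captchas, for example ones that differ in character shape, number of characters, or interference-line style, the accuracy may be disappointing. To make the model recognize more kinds of captchas, generate captchas in a wider range of styles for training, which yields a more robust model.
Prediction currently runs from the command line, which may be inconvenient in practice. Consider putting it behind an API server, for example Flask, Django, or FastAPI, and exposing prediction as an API that accepts POST requests: it receives a captcha image and returns the captcha text, which makes the model much easier to use.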
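As a rough illustration of such an API, the sketch below builds a plain WSGI application using only the standard library, so it runs without installing Flask, Django, or FastAPI (the same shape carries over to those frameworks). Everything in it is an assumption made for illustration: the /predict path, the JSON response format, and the predictor callable, which stands in for code that would decode the uploaded image and run it through the CNN loaded from best_model.pkl.

```python
import json


def make_app(predictor):
    """Return a WSGI app that answers POST /predict with {"text": ...}.

    `predictor` is a hypothetical callable: raw image bytes in, captcha text out.
    """
    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        is_post = environ.get("REQUEST_METHOD") == "POST"
        if not is_post or environ.get("PATH_INFO") != "/predict":
            start_response("404 Not Found", headers)
            return [json.dumps({"error": "use POST /predict"}).encode("utf-8")]
        # The request body is expected to be the raw image file.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        image_bytes = environ["wsgi.input"].read(length)
        if not image_bytes:
            start_response("400 Bad Request", headers)
            return [json.dumps({"error": "empty body"}).encode("utf-8")]
        start_response("200 OK", headers)
        return [json.dumps({"text": predictor(image_bytes)}).encode("utf-8")]
    return app
```

To try it locally, the app could be served with the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, make_app(my_predictor)).serve_forever(), where my_predictor is your own function wrapping the trained model.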