Captchas come in many forms. The method presented here only works for captchas whose main obstacle is noise (scattered noise dots).
First, here is the original image I prepared, 4.png:
The implementation code:
```python
import tesserocr
from PIL import Image, ImageDraw

# Debug helper (kept commented out): dump an image as a text grid,
# "0" for pure black pixels and "1" for everything else.
# image = Image.open("img/4_1.png").convert("RGB")
# with open("img/1.txt", "w") as fh:
#     w, h = image.size
#     for i in range(h):
#         for j in range(w):
#             r, g, b = image.getpixel((j, i))
#             fh.write("0" if r + g + b == 0 else "1")
#         fh.write("\n")


def black_white(image):
    """Force every pixel of an RGB image to pure white or pure black (in place)."""
    w, h = image.size
    for i in range(h):
        for j in range(w):
            r, g, b = image.getpixel((j, i))
            # Pixels brighter than 155 per channel on average become white;
            # adjust this threshold for your own images
            if r + g + b >= 155 * 3:
                image.putpixel((j, i), (255, 255, 255))
            else:
                image.putpixel((j, i), (0, 0, 0))


def twoValue(image, G):
    """Binarize a grayscale ("L") image into {(x, y): 1 for white, 0 for black}."""
    t2val = {}
    for y in range(image.size[1]):
        for x in range(image.size[0]):
            t2val[(x, y)] = 1 if image.getpixel((x, y)) > G else 0
    return t2val


# Noise removal: compare a pixel A with its 8 neighbours. If fewer than N
# (0 < N < 8) of them have the same value as A, A is treated as noise
# and set to white.
#   G: binarization threshold (see twoValue)
#   N: noise threshold, 0 < N < 8
#   Z: number of noise-removal passes
def clearNoise(t2val, size, N, Z):
    w, h = size
    for _ in range(Z):
        # Force two corner pixels to white, as in the original version
        t2val[(0, 0)] = 1
        t2val[(w - 1, h - 1)] = 1
        # Border pixels are skipped so every examined pixel has 8 neighbours
        for x in range(1, w - 1):
            for y in range(1, h - 1):
                L = t2val[(x, y)]
                nearDots = sum(
                    t2val[(x + dx, y + dy)] == L
                    for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)
                )
                if nearDots < N:
                    t2val[(x, y)] = 1


def saveImage(t2val, filename, size):
    """Write the 0/1 grid to a 1-bit image file."""
    image = Image.new("1", size)
    draw = ImageDraw.Draw(image)
    for x in range(size[0]):
        for y in range(size[1]):
            # In mode "1", 255 is white and 0 is black
            draw.point((x, y), fill=255 if t2val[(x, y)] else 0)
    image.save(filename)


def start(img_path, save_img_path):
    # Convert to RGB first: PNGs are often RGBA or palette ("P") images,
    # for which getpixel() does not return an (r, g, b) tuple
    image = Image.open(img_path).convert("RGB")
    black_white(image)
    image = image.convert("L")
    t2val = twoValue(image, 100)
    clearNoise(t2val, image.size, 4, 1)
    saveImage(t2val, save_img_path, image.size)
    print(tesserocr.file_to_text(save_img_path))


img_path = "img/4.png"
save_img_path = "img/4_1.png"
start(img_path, save_img_path)
```
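The binarization and 8-neighbour rule can be checked without Tesseract or image files. The sketch below runs both steps in plain Python on a toy grid. It is only an illustration, not the code above: `binarize`, `clear_noise` and the toy data are hypothetical. Border pixels are left untouched, and the original's reset of two corner pixels is omitted.

```python
# Hypothetical, dependency-free sketch of the thresholding and 8-neighbour
# noise rule. Grids are lists of rows; 1 = white, 0 = black.

# Offsets of the 8 neighbours around a pixel
NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def binarize(gray, threshold):
    """Map each grayscale value to 1 (white) if > threshold, else 0 (black)."""
    return [[1 if g > threshold else 0 for g in row] for row in gray]


def clear_noise(bits, n, passes=1):
    """Return a copy of bits where pixels agreeing with fewer than n of their
    8 neighbours are set to white. Like clearNoise, it scans column by column
    and updates the grid in place, so later pixels see earlier changes."""
    grid = [row[:] for row in bits]
    h, w = len(grid), len(grid[0])
    for _ in range(passes):
        for x in range(1, w - 1):
            for y in range(1, h - 1):
                value = grid[y][x]
                same = sum(grid[y + dy][x + dx] == value for dx, dy in NEIGHBOURS)
                if same < n:
                    grid[y][x] = 1
    return grid


# Toy image: a 3-pixel-wide dark bar plus one isolated dark speck at (1, 1)
W, H = 7, 5
gray = [[0 if 3 <= x <= 5 else 255 for x in range(W)] for _ in range(H)]
gray[1][1] = 20
cleaned = clear_noise(binarize(gray, 100), n=4)
for row in cleaned:
    print("".join(str(v) for v in row))  # prints 1110001 on each of the 5 rows
```

The speck has no matching neighbours, so it is flipped to white. Each pixel on the bar's edge still matches 5 of its 8 neighbours, which is at least N = 4, so the bar survives. The trade-off runs the other way too: with N = 4, any stroke thinner than about 3 pixels will also be eroded.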
After processing, we get the following image, 4_1.png:
Console output:
ziri
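However, this only works under ideal conditions, and the recognition rate is low for some images. I plan to add some algorithms later to improve it.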