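I don't know whether you've come across this thing, the string translation table; I clearly remember having seen it before. So why write about it? It started with the following scenario:

How do you convert all the Chinese characters in a string to their English counterparts?

How would you go about it? Let's start with an example:

In 'hello i'm jim', convert 'e' to 'i' and 'o' to '!'.

My approach was: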

s = 'hello i\'m jim'
ch = 'eo'
en_ch = 'i!'

# Naive approach: walk the original string and replace every character found in ch.
for c in s:
    if c in ch:
        s = s.replace(c, en_ch[ch.index(c)])

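So easy! But... (a scary word, "but".) The backend lead laid into it right away: "What on earth is this?" Is there really a better way? All I could do was quietly look at how he wrote it: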

s = 'hello i\'m jim'
ch = 'eo'
en_ch = 'i!'
TRANS_TABLE = {ord(f): ord(t) for f, t in zip(ch, en_ch)}

s.translate(TRANS_TABLE)
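
Style aside, there may also be a correctness reason to prefer the table. The sketch below wraps both approaches in two helper functions (`replace_loop` and `replace_table` are names made up here for illustration, not part of the original code) and runs them on a mapping whose targets overlap its sources. The loop ends up rewriting its own output, while `translate` looks up each original character exactly once.

```python
def replace_loop(s, src, dst):
    """Loop-and-replace, as in my first attempt."""
    for c in s:  # iterates over the original string object
        if c in src:
            s = s.replace(c, dst[src.index(c)])
    return s


def replace_table(s, src, dst):
    """Table lookup, as in the backend lead's version."""
    table = {ord(f): ord(t) for f, t in zip(src, dst)}
    return s.translate(table)


text = "hello i'm jim"

# Disjoint mapping: both approaches agree.
print(replace_loop(text, 'eo', 'i!'))   # hill! i'm jim
print(replace_table(text, 'eo', 'i!'))  # hill! i'm jim

# Swapping 'e' and 'o': the loop turns the freshly written 'o' back into 'e'.
print(replace_loop(text, 'eo', 'oe'))   # helle i'm jim
print(replace_table(text, 'eo', 'oe'))  # holle i'm jim
```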

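Whoa! What is this? I was sure I had seen it somewhere but couldn't place it... A quick search on Baidu later: it turns out to be a translation table. So how do you use one?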

s = 'hello i\'m jim'
ch = 'eo'
en_ch = 'i!'

# Python 2:
# from string import maketrans
# TRANS_TABLE = maketrans(ch, en_ch)
# Python 3 (string.maketrans no longer exists; use the str method instead):
TRANS_TABLE = str.maketrans(ch, en_ch)

s.translate(TRANS_TABLE)
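
Coming back to the scenario that started all this, here is a minimal Python 3 sketch. It assumes that "Chinese characters" means full-width punctuation, and the particular marks in `ZH_PUNCT` and the helper name `to_ascii_punct` are my own picks for illustration; a real project would need its own list.

```python
# Assumed mapping: full-width (Chinese) punctuation -> ASCII punctuation.
# The two strings must have the same length and line up position by position.
ZH_PUNCT = ',。!?:;()'
EN_PUNCT = ',.!?:;()'
PUNCT_TABLE = str.maketrans(ZH_PUNCT, EN_PUNCT)


def to_ascii_punct(text):
    """Replace each full-width mark listed above with its ASCII counterpart."""
    return text.translate(PUNCT_TABLE)


print(to_ascii_punct('你好,世界!(测试)'))  # 你好,世界!(测试)
```

Characters that are not in the table, such as the Chinese words themselves, pass through unchanged.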

That isn't quite what the backend lead wrote. Are the two equivalent?

# python2.7
from string import maketrans
maketrans('eo', 'i!')
# >>> '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdifghijklmn!pqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

# python3.5
str.maketrans('eo', 'i!')
# >>> {101: 105, 111: 33}
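
For Python 3, the equivalence can be checked directly. The snippet below is only a quick sanity check (the variable names `by_hand`, `built_in` and `drop_spaces` are mine); it also shows the optional third argument of `str.maketrans`, which lists characters to delete.

```python
ch = 'eo'
en_ch = 'i!'

# The backend lead's hand-built table versus the built-in helper.
by_hand = {ord(f): ord(t) for f, t in zip(ch, en_ch)}
built_in = str.maketrans(ch, en_ch)
print(by_hand == built_in)  # True

# Third argument: characters mapped to None, i.e. removed from the result.
drop_spaces = str.maketrans(ch, en_ch, ' ')
print("hello i'm jim".translate(drop_spaces))  # hill!i'mjim
```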

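I see... As the commented output above shows, the table in Python 3 is a short, readable dict that maps code points to code points, whereas Python 2 returns a long string made up mostly of unreadable hex escapes. What Python 2 actually does is first build a 256-character string from the characters for the integers 0-255; then, for each character in the first argument, it replaces the character at that character's position in the string with the corresponding character from the second argument.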

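For example, ÿ corresponds to 255, i.e. the character for the last hex escape, \xff, in the default string. Call maketrans('1', '2') and you'll find that the '1' in the string above has become '2'; the resulting string is listed further down.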
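
If you only have Python 3 at hand, `bytes.maketrans` still builds this kind of 256-entry table, so the behaviour described above can be poked at directly. This is just an exploratory sketch, not how Python 2 itself is implemented.

```python
# bytes.maketrans returns a 256-byte table, one entry per byte value 0-255.
table = bytes.maketrans(b'1', b'2')

print(len(table))        # 256
print(table[ord('1')])   # 50, which is ord('2')
print(table[0xff])       # 255, the last entry is untouched

# Only the position of '1' differs from the identity table.
changed = [i for i in range(256) if table[i] != i]
print(changed)           # [49]

print(b'2021-01-11'.translate(table))  # b'2022-02-22'
```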

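This is one of the differences in how Python 2 and Python 3 handle translation tables: Python 3 strings are Unicode, so a sparse dict keyed by code point takes the place of the fixed 256-entry string.
With a translation table, string processing like this becomes easy, and a beginner like me no longer has to fall back on a loop. To be fair, a loop isn't the only thing I could think of, there's also regex, ha, but this way feels nicer. One last sigh: the revolution is not yet complete; comrades, keep up the effort...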
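
For comparison, the regex route mentioned above might look roughly like this. It is a sketch only: the `mapping` dict and the `replace_regex` helper are names I made up for the example.

```python
import re

mapping = {'e': 'i', 'o': '!'}

# One alternation over all source characters; re.escape guards against
# characters that have a special meaning inside a pattern.
pattern = re.compile('|'.join(re.escape(k) for k in mapping))


def replace_regex(s):
    """Replace every matched character with its mapped value in one pass."""
    return pattern.sub(lambda m: mapping[m.group(0)], s)


print(replace_regex("hello i'm jim"))  # hill! i'm jim
```

It gives the same result, but for plain one-character-to-one-character replacement the translation table needs less machinery.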

# Resulting string after replacing '1' with '2'
# \x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0223456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff

Source of maketrans in Python 2:

l = map(chr, xrange(256))
_idmap = str('').join(l)
del l
# Construct a translation string
_idmapL = None
def maketrans(fromstr, tostr):
    """maketrans(frm, to) -> string

    Return a translation table (a string of 256 bytes long)
    suitable for use in string.translate.  The strings frm and to
    must be of the same length.

    """
    if len(fromstr) != len(tostr):
        raise ValueError, "maketrans arguments must have same length"
    global _idmapL
    if not _idmapL:
        _idmapL = list(_idmap)
    L = _idmapL[:]
    fromstr = map(ord, fromstr)
    for i in range(len(fromstr)):
        L[fromstr[i]] = tostr[i]
    return ''.join(L)
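
To make the logic easier to follow on a modern interpreter, here is my rough Python 3 rendering of the same idea. It is not the standard library's code: `maketrans_py2_style` is a hypothetical name, and the cached `_idmapL` list is dropped for brevity.

```python
def maketrans_py2_style(fromstr, tostr):
    """Build a 256-character table the way the Python 2 source above does."""
    if len(fromstr) != len(tostr):
        raise ValueError("maketrans arguments must have same length")
    # Identity table: position i holds chr(i).
    table = [chr(i) for i in range(256)]
    # Overwrite the slot of each source character with its target.
    # Like the original, this only works for characters with ord() < 256.
    for f, t in zip(fromstr, tostr):
        table[ord(f)] = t
    return ''.join(table)


table = maketrans_py2_style('eo', 'i!')
print(len(table))                        # 256
print(table[ord('e')], table[ord('o')])  # i !

# Matches the table that Python 3 builds for bytes.
print(table.encode('latin-1') == bytes.maketrans(b'eo', b'i!'))  # True
```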
