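I first tried uncompyle6, but it was no help; the bytecode seemed to have been heavily modified. A search later turned up a working approach, described at https://0xd13a.github.io/ctfs/0ctf2017/py/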

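I started by reading https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html , which explains that a pyc file has three parts: a magic number (4 bytes), a timestamp (4 bytes), and the marshalled code object (the rest of the file). The author of that post provides a program built on the dis and marshal libraries: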

# disassem.py (run with Python 2.7)
import dis, marshal, struct, sys, time, types

def show_file(fname):
    f = open(fname, "rb")
    magic = f.read(4)        # 4-byte magic number
    moddate = f.read(4)      # 4-byte modification timestamp
    modtime = time.asctime(time.localtime(struct.unpack('I', moddate)[0]))
    print "magic %s" % (magic.encode('hex'))
    print "moddate %s (%s)" % (moddate.encode('hex'), modtime)
    code = marshal.load(f)   # the rest of the file is the marshalled code object
    show_code(code)

def show_code(code, indent=''):
    print "%scode" % indent
    indent += ' '
    print "%sargcount %d" % (indent, code.co_argcount)
    print "%snlocals %d" % (indent, code.co_nlocals)
    print "%sstacksize %d" % (indent, code.co_stacksize)
    print "%sflags %04x" % (indent, code.co_flags)
    print("code length: %d" % (len(code.co_code)))
    show_hex("code", code.co_code, indent=indent)
    dis.disassemble(code)
    print "%sconsts" % indent
    for const in code.co_consts:
        if type(const) == types.CodeType:
            show_code(const, indent+' ')   # nested code object: recurse
        else:
            print " %s%r" % (indent, const)
    print "%snames %r" % (indent, code.co_names)
    print "%svarnames %r" % (indent, code.co_varnames)
    print "%sfreevars %r" % (indent, code.co_freevars)
    print "%scellvars %r" % (indent, code.co_cellvars)
    print "%sfilename %r" % (indent, code.co_filename)
    print "%sname %r" % (indent, code.co_name)
    print "%sfirstlineno %d" % (indent, code.co_firstlineno)
    show_hex("lnotab", code.co_lnotab, indent=indent)

def show_hex(label, h, indent):
    h = h.encode('hex')
    if len(h) < 60:
        print "%s%s %s" % (indent, label, h)
    else:
        # long values are printed in rows of 60 hex digits
        print "%s%s" % (indent, label)
        for i in range(0, len(h), 60):
            print "%s %s" % (indent, h[i:i+60])

show_file(sys.argv[1])

I then ported this script to Python 3 and ran python3 disassem.py crypt.pyc, but the marshal library raised a ValueError. The documentation says:

... because the data has a different Python version's incompatible marshal format), raise `EOFError`, `ValueError` or `TypeError`. The file must be a readable binary file.

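In other words, unlike pickle, marshal is not meant for long-term persistence, so its format can change between Python versions. The magic number is 4 bytes long, and its first two bytes identify the Python version. The known values are listed in the file import.c in the Python directory of the CPython source (https://github.com/python/cpython/tree/2.7 ):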

Known values:
Python 1.5: 20121
Python 1.5.1: 20121
Python 1.5.2: 20121
Python 1.6: 50428
Python 2.0: 50823
Python 2.0.1: 50823
Python 2.1: 60202
Python 2.1.1: 60202
Python 2.1.2: 60202
Python 2.2: 60717
Python 2.3a0: 62011
Python 2.3a0: 62021
Python 2.3a0: 62011 (!)
Python 2.4a0: 62041
Python 2.4a3: 62051
Python 2.4b1: 62061
Python 2.5a0: 62071
Python 2.5a0: 62081 (ast-branch)
Python 2.5a0: 62091 (with)
Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
Python 2.5b3: 62101 (fix wrong code: for x, in ...)
Python 2.5b3: 62111 (fix wrong code: x += yield)
Python 2.5c1: 62121 (fix wrong lnotab with for loops and
storing constants that should have been removed)
Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
Python 2.6a1: 62161 (WITH_CLEANUP optimization)
Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
Python 2.7a0: 62181 (optimize conditional branches:
introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
Python 2.7a0 62191 (introduce SETUP_WITH)
Python 2.7a0 62201 (introduce BUILD_SET)
Python 2.7a0 62211 (introduce MAP_ADD and SET_ADD)

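The first two bytes of this pyc file are 03 F3. Read as a little-endian integer that is 0xF303, or 62211 in decimal, so the file has to be loaded with a Python 2.7 interpreter.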
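As a small illustration of the header layout, the sketch below decodes the first eight bytes of crypt.pyc, copied from the magic and moddate values in the dump further down. It is written for Python 3 and only covers the Python 2.7 layout discussed in this post; later Python versions use a longer header.

```python
import struct

# First 8 bytes of crypt.pyc: magic (03f30d0a) followed by moddate (66346f58).
header = bytes.fromhex("03f30d0a" "66346f58")

# Bytes 0-1: magic number, little-endian. Bytes 2-3 are "\r\n".
magic, = struct.unpack("<H", header[:2])
# Bytes 4-7: modification time, little-endian 32-bit integer.
mtime, = struct.unpack("<I", header[4:8])

print(magic)        # 62211, the last Python 2.7 value in the table above
print(header[2:4])  # b'\r\n'
print(mtime)        # 1483682918, seconds since the epoch
```

The original script unpacks the timestamp with the native-order format 'I'; the explicit "<" here only makes the byte order independent of the machine running the sketch.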

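Running the same command under Python 2.7 fails at the dis.disassemble(code) call (line 23 of the script): many opcodes in this pyc have been changed, so some bytes no longer correspond to a valid standard instruction. With that line commented out, the script runs to the end and prints the following: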

magic 03f30d0a
moddate 66346f58 (Fri Jan 6 01:08:38 2017)
code
argcount 0
nlocals 0
stacksize 2
flags 0040
code length: 34
code
990000990100860000910000990200880000910100990300880000910200
99010053
consts
-1
None
code
argcount 1
nlocals 6
stacksize 3
flags 0043
code length: 92
code
990100680100990200680200990300680300610100990400469905002761
020061010027610300279906004627990500276102009906004627990700
276804009b00006001006104008301006805006105006002006100008301
0053
consts
None
'!@#$%^&*'
'abcdefgh'
'<>{}:"'
4
'|'
2
'EOF'
names ('rotor', 'newrotor', 'encrypt')
varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name 'encrypt'
firstlineno 2
lnotab 00010601060106012e010f01
code
argcount 1
nlocals 6
stacksize 3
flags 0043
code length: 92
code
990100680100990200680200990300680300610100990400469905002761
020061010027610300279906004627990500276102009906004627990700
276804009b00006001006104008301006805006105006002006100008301
0053
consts
None
'!@#$%^&*'
'abcdefgh'
'<>{}:"'
4
'|'
2
'EOF'
names ('rotor', 'newrotor', 'decrypt')
varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name 'decrypt'
firstlineno 10
lnotab 00010601060106012e010f01
names ('rotor', 'encrypt', 'decrypt')
varnames ()
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name '<module>'
firstlineno 1
lnotab 0c010908

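consts (co_consts) holds the constants used by the code, varnames (co_varnames) holds the names of local variables, and names (co_names) holds the other names the code refers to, such as globals, builtins and attributes. For example, for a = str(1) + '2' inside a function, a is stored in varnames, str in names, and 1 and '2' in consts.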
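The three tables can be inspected on any function. The sketch below uses Python 3 and a made-up function named example that wraps the statement from the paragraph above. The exact contents of co_consts vary between Python versions (some versions embed small integers directly in the instruction), so only the string constant is relied on here.

```python
def example():
    # Same statement as in the text, placed inside a function so that
    # `a` is a local variable.
    a = str(1) + '2'
    return a

code = example.__code__
print(code.co_varnames)  # local variable names, here ('a',)
print(code.co_names)     # global/builtin/attribute names, here ('str',)
print(code.co_consts)    # literal constants; '2' is among them
```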

The dump above shows that the pyc uses the external name rotor. A search shows that it is an encryption library (http://www.bugingcode.com/blog/python_rotor.html ); its usage is roughly:

import rotor
key = '12345'
rot = rotor.newrotor(key)
rot.encrypt(balabala)  # balabala: placeholder for the data to encrypt
rot.decrypt(balabala)  # balabala: placeholder for the data to decrypt

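The pyc contains four variables, key_a, key_b, key_c and secret, so secret is presumably built from the three keys. Because dis cannot handle some of the instructions in this file, I decided to patch dis.disassemble so that it decodes whatever it can and prints the raw value wherever an operand cannot be resolved. The modified part of disassemble is: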

if op >= HAVE_ARGUMENT:
    oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
    extended_arg = 0
    i = i+2
    if op == EXTENDED_ARG:
        extended_arg = oparg*65536L
    print repr(oparg).rjust(5),
    if op in hasconst:
        if oparg >= len(co.co_consts):
            # added check; hex() already returns a '0x' prefix, hence '0x0x..' in the output
            print '(const 0x' + hex(oparg) + ')',
        else:
            print '(' + repr(co.co_consts[oparg]) + ')',
    elif op in hasname:
        if oparg >= len(co.co_names):
            print '(names 0x' + hex(oparg) + ')',  # added check
        else:
            print '(' + co.co_names[oparg] + ')',
    elif op in hasjrel:
        print '(to ' + repr(i + oparg) + ')',
    elif op in haslocal:
        print '(' + co.co_varnames[oparg] + ')',
    elif op in hascompare:
        print '(' + cmp_op[oparg] + ')',
    elif op in hasfree:
        if free is None:
            free = co.co_cellvars + co.co_freevars
        if oparg >= len(free):
            print '(free 0x' + hex(oparg) + ')',  # added check
        else:
            print '(' + free[oparg] + ')',
print
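The three added checks follow one pattern: look the operand up in a table, and fall back to the raw index when it is out of range. A minimal Python 3 sketch of that pattern is shown here; describe_operand is a hypothetical helper, not part of the dis module, and unlike the patch above it does not repeat the 0x prefix.

```python
def describe_operand(table, index, label):
    """Return '(value)' if index is valid for table, else '(label 0x..)'."""
    if 0 <= index < len(table):
        return "(%r)" % (table[index],)
    # hex() already includes the '0x' prefix.
    return "(%s %s)" % (label, hex(index))

# co_names of encrypt, as printed in the dump earlier in this post.
names = ('rotor', 'newrotor', 'encrypt')

print(describe_operand(names, 1, "names"))  # ('newrotor')
print(describe_operand(names, 4, "names"))  # (names 0x4)
```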

Then I ran disassem.py on crypt.pyc again under Python 2.7, but the result was baffling:

magic 03f30d0a
moddate 66346f58 (Fri Jan 6 01:08:38 2017)
code
argcount 0
nlocals 0
stacksize 2
flags 0040
code length: 34
code
990000990100860000910000990200880000910100990300880000910200
99010053
1 0 <153> 0
3 <153> 1
6 MAKE_CLOSURE 0
9 EXTENDED_ARG 0

2 12 <153> 2L
15 LOAD_DEREF 0 (free 0x0x0)
18 EXTENDED_ARG 1

10 21 <153> 65539L
24 LOAD_DEREF 0 (free 0x0x0)
27 EXTENDED_ARG 2
30 <153> 131073L
33 RETURN_VALUE
consts
-1
None
code
argcount 1
nlocals 6
stacksize 3
flags 0043
code length: 92
code
990100680100990200680200990300680300610100990400469905002761
020061010027610300279906004627990500276102009906004627990700
276804009b00006001006104008301006805006105006002006100008301
0053
3 0 <153> 1
3 BUILD_SET 1

4 6 <153> 2
9 BUILD_SET 2

5 12 <153> 3
15 BUILD_SET 3

6 18 STORE_GLOBAL 1 (newrotor)
21 <153> 4
24 PRINT_EXPR
25 <153> 5
28 <39>
29 STORE_GLOBAL 2 (encrypt)
32 STORE_GLOBAL 1 (newrotor)
35 <39>
36 STORE_GLOBAL 3 (names 0x0x3)
39 <39>
40 <153> 6
43 PRINT_EXPR
44 <39>
45 <153> 5
48 <39>
49 STORE_GLOBAL 2 (encrypt)
52 <153> 6
55 PRINT_EXPR
56 <39>
57 <153> 7
60 <39>
61 BUILD_SET 4

7 64 <155> 0
67 DELETE_ATTR 1 (newrotor)
70 STORE_GLOBAL 4 (names 0x0x4)
73 CALL_FUNCTION 1
76 BUILD_SET 5

8 79 STORE_GLOBAL 5 (names 0x0x5)
82 DELETE_ATTR 2 (encrypt)
85 STORE_GLOBAL 0 (rotor)
88 CALL_FUNCTION 1
91 RETURN_VALUE
consts
None
'!@#$%^&*'
'abcdefgh'
'<>{}:"'
4
'|'
2
'EOF'
names ('rotor', 'newrotor', 'encrypt')
varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name 'encrypt'
firstlineno 2
lnotab 00010601060106012e010f01
code
argcount 1
nlocals 6
stacksize 3
flags 0043
code length: 92
code
990100680100990200680200990300680300610100990400469905002761
020061010027610300279906004627990500276102009906004627990700
276804009b00006001006104008301006805006105006002006100008301
0053
11 0 <153> 1
3 BUILD_SET 1

12 6 <153> 2
9 BUILD_SET 2

13 12 <153> 3
15 BUILD_SET 3

14 18 STORE_GLOBAL 1 (newrotor)
21 <153> 4
24 PRINT_EXPR
25 <153> 5
28 <39>
29 STORE_GLOBAL 2 (decrypt)
32 STORE_GLOBAL 1 (newrotor)
35 <39>
36 STORE_GLOBAL 3 (names 0x0x3)
39 <39>
40 <153> 6
43 PRINT_EXPR
44 <39>
45 <153> 5
48 <39>
49 STORE_GLOBAL 2 (decrypt)
52 <153> 6
55 PRINT_EXPR
56 <39>
57 <153> 7
60 <39>
61 BUILD_SET 4

15 64 <155> 0
67 DELETE_ATTR 1 (newrotor)
70 STORE_GLOBAL 4 (names 0x0x4)
73 CALL_FUNCTION 1
76 BUILD_SET 5

16 79 STORE_GLOBAL 5 (names 0x0x5)
82 DELETE_ATTR 2 (decrypt)
85 STORE_GLOBAL 0 (rotor)
88 CALL_FUNCTION 1
91 RETURN_VALUE
consts
None
'!@#$%^&*'
'abcdefgh'
'<>{}:"'
4
'|'
2
'EOF'
names ('rotor', 'newrotor', 'decrypt')
varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name 'decrypt'
firstlineno 10
lnotab 00010601060106012e010f01
names ('rotor', 'encrypt', 'decrypt')
varnames ()
freevars ()
cellvars ()
filename '/Users/hen/Lab/0CTF/py/crypt.py'
name '<module>'
firstlineno 1
lnotab 0c010908

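The result makes little sense, especially the instructions at offsets 82 and 85 of decrypt (lines 150 and 151 of the output), which are shown as DELETE_ATTR and STORE_GLOBAL: before a CALL_FUNCTION there should be an attribute load and a variable load. So I tried decoding by hand. A Python 2.7 instruction that takes an argument is the opcode byte followed by a 2-byte little-endian argument, so for small indexes the third byte is 0x00. For example, LOAD_CONST 1 loads the constant at index 1 (counting from 0); the opcode of LOAD_CONST is 0x64, so LOAD_CONST 1 is encoded as 0x64 0x01 0x00. The opcode numbers are listed in https://github.com/python/cpython/blob/2.7/Include/opcode.h . Instructions without an argument, such as RETURN_VALUE (0x53), are a single byte. I split the code of encrypt following this rule: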

990100  LOAD CONST '!@#$%^&*'
680100  store varname key_a
990200  LOAD CONST 'abcdefgh'
680200  store varname key_b
990300  LOAD CONST '<>{}:"'
680300  store varname key_c
610100  load varname key_a
990400  LOAD CONST 4
46      some operation that takes key_a and 4
990500  LOAD CONST '|'
27
610200  load varname key_b
610100  load varname key_a
27
610300  load varname key_c
27
990600  LOAD CONST 2
46      some operation with 2
27
990500  LOAD CONST '|'
27
610200  load varname key_b
990600  LOAD CONST 2
46      operation on key_b and 2
27      could this be +?
990700  LOAD CONST 'EOF'
27
680400  store varname secret
9b0000  (presumably load global rotor)
600100  load attr newrotor
610400  load varname secret
830100  call function newrotor
680500  store varname rot
610500  load varname rot
600200  load attr encrypt
610000  load varname data
830100  call function encrypt
53      return
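The splitting rule used for the listing above can be sketched in a few lines of Python 3. The hex string is the code of encrypt copied from the dump; split_instructions is a hypothetical helper. It assumes that the remapped opcodes still respect the standard Python 2.7 threshold (HAVE_ARGUMENT = 90) between one-byte and three-byte instructions, which is consistent with the bytes seen here, and it ignores EXTENDED_ARG.

```python
HAVE_ARGUMENT = 90  # Python 2.7: opcodes >= 90 are followed by a 2-byte argument

# co_code of encrypt, copied from the dump in this post (92 bytes).
ENCRYPT_HEX = (
    "990100680100990200680200990300680300610100990400469905002761"
    "020061010027610300279906004627990500276102009906004627990700"
    "276804009b00006001006104008301006805006105006002006100008301"
    "0053"
)

def split_instructions(code):
    """Yield (offset, opcode, arg); arg is None for one-byte instructions."""
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)  # little-endian 16-bit
            yield i, op, arg
            i += 3
        else:
            yield i, op, None
            i += 1

instructions = list(split_instructions(bytes.fromhex(ENCRYPT_HEX)))
for offset, op, arg in instructions:
    print("%3d  0x%02x  %s" % (offset, op, "" if arg is None else arg))
```

The sketch only splits the byte stream; it deliberately assigns no opcode names, since that mapping is exactly what has to be worked out.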

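Translating these bytes with the standard opcode table does not work, though: 0x61 (as in 610000) would be STORE_GLOBAL, 0x99 (as in 990700) is not a defined opcode at all, and so on. So I looked at how the author of the write-up did it: he wrote his own script that uses rotor, compiled it, and compared its bytecode with crypt.pyc. Following that idea I translated the bytecode as annotated above. In particular, 0x99 must be LOAD_CONST: 990700 (line 26 of the listing) fetches the element at index 7, i.e. the eighth one, and consts is the only table with eight elements. What 0x46 and 0x27 do is not obvious at first. Take the first 0x46 (line 9 of the listing): it is preceded by load varname key_a and LOAD CONST 4, so it amounts to op1(key_a, 4). Python passes operands on a stack, and they are taken in order from the bottom of the stack to the top. Writing op1 for 0x46 and op2 for 0x27, that part of the code becomes the following pseudocode: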

secret = op2(op2(op2(op2(op2(op1(key_a, 4), '|'), op1(op2(op2(key_b, key_a), key_c), 2)), '|'), op1(key_b, 2)), 'EOF')

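Here key_a is a string and 4 is an integer, so op1(key_a, 4) must be *, which returns a string. op2(op1(key_a, 4), '|') then operates on two strings and must be +. secret can therefore be computed with: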

secret = key_a * 4 + '|' + (key_b + key_a + key_c)*2 + '|' + key_b*2 + 'EOF'
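As a cross-check of this deduction, the bytes up to the store into secret can be replayed on a tiny stack machine. The Python 3 sketch below is not a real interpreter: it handles only the five remapped opcodes deduced in this post (0x99 load const, 0x61 load local, 0x68 store local, 0x46 as *, 0x27 as +), and the run function and table names are made up for the illustration.

```python
# First 64 bytes of encrypt's code, up to and including "680400" (store secret).
SECRET_HEX = (
    "990100680100990200680200990300680300610100990400469905002761"
    "020061010027610300279906004627990500276102009906004627990700"
    "27680400"
)
CONSTS = (None, '!@#$%^&*', 'abcdefgh', '<>{}:"', 4, '|', 2, 'EOF')
VARNAMES = ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')

# Remapped opcodes as deduced in this post (assumptions, not standard values).
LOAD_CONST, LOAD_FAST, STORE_FAST = 0x99, 0x61, 0x68
BINARY_OPS = {
    0x46: lambda a, b: a * b,  # op1
    0x27: lambda a, b: a + b,  # op2
}

def run(code):
    """Execute the byte string and return the resulting local variables."""
    stack, local_vars = [], {}
    i = 0
    while i < len(code):
        op = code[i]
        if op in BINARY_OPS:
            right = stack.pop()  # top of stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
            i += 1
            continue
        arg = code[i + 1] | (code[i + 2] << 8)
        if op == LOAD_CONST:
            stack.append(CONSTS[arg])
        elif op == LOAD_FAST:
            stack.append(local_vars[VARNAMES[arg]])
        elif op == STORE_FAST:
            local_vars[VARNAMES[arg]] = stack.pop()
        else:
            raise ValueError("unhandled opcode 0x%02x at offset %d" % (op, i))
        i += 3
    return local_vars

secret = run(bytes.fromhex(SECRET_HEX))["secret"]
print(secret)
```

If the two operators were swapped, the replay would fail with a TypeError when trying to add a string and an integer, which supports the assignment of * to 0x46 and + to 0x27.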

I then put this into a script that decrypts encrypted_flag:

import rotor
key_a = '!@#$%^&*'
key_b = 'abcdefgh'
key_c = '<>{}:"'
secret = key_a * 4 + '|' + (key_b + key_a + key_c)*2 + '|' + key_b*2 + 'EOF'

rot = rotor.newrotor(secret)
print(rot.decrypt(open('encrypted_flag', 'rb').read()))

Finally, this gives the answer: flag{Gue55_opcode_G@@@me}

References

Python bytecode: http://www.bravegnu.org/blog/python-byte-code-hacks.html
Opcode reference: http://unpyc.sourceforge.net/Opcodes.html