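I started by trying uncompyle6, but it didn't help; the bytecode looked as if it had been heavily modified. Later I found an approach in https://0xd13a.github.io/ctfs/0ctf2017/py/ .
First, read https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html . It explains that a (Python 2) pyc file has three parts: a 4-byte magic number, a 4-byte modification timestamp, and the marshalled code object (the rest of the file). The article includes a script that dumps a pyc file using the dis and marshal modules: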
```python
# Python 2 script (print statements, str.encode('hex'))
import dis, marshal, struct, sys, time, types

def show_file(fname):
    f = open(fname, "rb")
    magic = f.read(4)
    moddate = f.read(4)
    modtime = time.asctime(time.localtime(struct.unpack('I', moddate)[0]))
    print "magic %s" % (magic.encode('hex'))
    print "moddate %s (%s)" % (moddate.encode('hex'), modtime)
    code = marshal.load(f)
    show_code(code)

def show_code(code, indent=''):
    print "%scode" % indent
    indent += '   '
    print "%sargcount %d" % (indent, code.co_argcount)
    print "%snlocals %d" % (indent, code.co_nlocals)
    print "%sstacksize %d" % (indent, code.co_stacksize)
    print "%sflags %04x" % (indent, code.co_flags)
    print ("code length: %d" % (len(code.co_code)))
    show_hex("code", code.co_code, indent=indent)
    dis.disassemble(code)
    print "%sconsts" % indent
    for const in code.co_consts:
        if type(const) == types.CodeType:
            show_code(const, indent+'   ')
        else:
            print "   %s%r" % (indent, const)
    print "%snames %r" % (indent, code.co_names)
    print "%svarnames %r" % (indent, code.co_varnames)
    print "%sfreevars %r" % (indent, code.co_freevars)
    print "%scellvars %r" % (indent, code.co_cellvars)
    print "%sfilename %r" % (indent, code.co_filename)
    print "%sname %r" % (indent, code.co_name)
    print "%sfirstlineno %d" % (indent, code.co_firstlineno)
    show_hex("lnotab", code.co_lnotab, indent=indent)

def show_hex(label, h, indent):
    h = h.encode('hex')
    if len(h) < 60:
        print "%s%s %s" % (indent, label, h)
    else:
        print "%s%s" % (indent, label)
        for i in range(0, len(h), 60):
            print "%s   %s" % (indent, h[i:i+60])

show_file(sys.argv[1])
```
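To make the header layout concrete, here is a small Python 3 sketch (not part of the original script; `parse_py2_pyc_header` is a hypothetical helper name) that splits the 8-byte Python 2 header. The input bytes are the `magic` and `moddate` values shown in the dump further down. Python 3.3 and later use a longer header, so this layout only applies to Python 2 pyc files. The script above prints the timestamp in local time; here it is shown in UTC.

```python
import datetime
import struct


def parse_py2_pyc_header(header: bytes):
    """Split the 8-byte header of a Python 2 .pyc file.

    Bytes 0-3: magic number; bytes 4-7: source mtime as a 32-bit
    little-endian integer. The marshalled code object starts at offset 8.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 header bytes")
    magic = header[:4]
    (mtime,) = struct.unpack("<I", header[4:8])
    return magic, mtime


# First 8 bytes of crypt.pyc as printed in the dump ("magic 03f30d0a", "moddate 66346f58").
header = bytes.fromhex("03f30d0a" "66346f58")
magic, mtime = parse_py2_pyc_header(header)
print(magic.hex())  # 03f30d0a
print(datetime.datetime.fromtimestamp(mtime, tz=datetime.timezone.utc))  # 2017-01-06 06:08:38+00:00
```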
Then I ported this script to Python 3 and ran `python3 disassem.py crypt.pyc`, but marshal raised a ValueError. The documentation of `marshal.load` says:

> If no valid value is read (e.g. because the data has a different Python version's incompatible marshal format), raise EOFError, ValueError or TypeError. The file must be a readable binary file.
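In other words, unlike pickle, marshal is not meant for long-term persistence, so its format can change between Python versions. The magic number is 4 bytes long and its first two bytes identify the bytecode version. The known values are listed in a comment in Python/import.c of the CPython 2.7 source (https://github.com/python/cpython/tree/2.7):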
```
Known values:
    Python 1.5:   20121
    Python 1.5.1: 20121
    Python 1.5.2: 20121
    Python 1.6:   50428
    Python 2.0:   50823
    Python 2.0.1: 50823
    Python 2.1:   60202
    Python 2.1.1: 60202
    Python 2.1.2: 60202
    Python 2.2:   60717
    Python 2.3a0: 62011
    Python 2.3a0: 62021
    Python 2.3a0: 62011 (!)
    Python 2.4a0: 62041
    Python 2.4a3: 62051
    Python 2.4b1: 62061
    Python 2.5a0: 62071
    Python 2.5a0: 62081 (ast-branch)
    Python 2.5a0: 62091 (with)
    Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
    Python 2.5b3: 62101 (fix wrong code: for x, in ...)
    Python 2.5b3: 62111 (fix wrong code: x += yield)
    Python 2.5c1: 62121 (fix wrong lnotab with for loops and
                         storing constants that should have been removed)
    Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
    Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
    Python 2.6a1: 62161 (WITH_CLEANUP optimization)
    Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
    Python 2.7a0: 62181 (optimize conditional branches:
                         introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
    Python 2.7a0  62191 (introduce SETUP_WITH)
    Python 2.7a0  62201 (introduce BUILD_SET)
    Python 2.7a0  62211 (introduce MAP_ADD and SET_ADD)
```
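The first two bytes of this pyc are 03 F3. Read as little-endian, that is 0xF303 = 62211, so the file must be loaded with a Python 2.7 interpreter. (The other two magic bytes, 0D 0A, are just `\r\n`.)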
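The same arithmetic as a short Python 3 sketch. `KNOWN_MAGICS` and `magic_number` are hypothetical helpers; the table holds only a few of the import.c entries quoted above.

```python
import struct

# A few entries from the "Known values" comment in CPython 2.7's Python/import.c.
KNOWN_MAGICS = {
    62151: "Python 2.6a0 (peephole optimizations and STORE_MAP opcode)",
    62161: "Python 2.6a1 (WITH_CLEANUP optimization)",
    62191: "Python 2.7a0 (introduce SETUP_WITH)",
    62201: "Python 2.7a0 (introduce BUILD_SET)",
    62211: "Python 2.7a0 (introduce MAP_ADD and SET_ADD)",
}


def magic_number(header: bytes) -> int:
    """Decode the first two header bytes as an unsigned little-endian 16-bit integer."""
    (value,) = struct.unpack("<H", header[:2])
    return value


header = bytes.fromhex("03f30d0a")  # magic bytes of crypt.pyc
number = magic_number(header)
print(number, KNOWN_MAGICS.get(number, "unknown"))  # 62211 Python 2.7a0 (introduce MAP_ADD and SET_ADD)
print(header[2:])  # b'\r\n'
```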
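Running the same command with python2.7 gets further, but the `dis.disassemble(code)` call in `show_code` fails: the opcodes in this pyc have been heavily remapped, so many bytes no longer mean what the standard opcode table expects, and looking up their arguments breaks. After commenting out that line, the script prints: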
```
magic 03f30d0a
moddate 66346f58 (Fri Jan  6 01:08:38 2017)
code
   argcount 0
   nlocals 0
   stacksize 2
   flags 0040
code length: 34
   code
      990000990100860000910000990200880000910100990300880000910200
      99010053
   consts
      -1
      None
      code
         argcount 1
         nlocals 6
         stacksize 3
         flags 0043
code length: 92
         code
            990100680100990200680200990300680300610100990400469905002761
            020061010027610300279906004627990500276102009906004627990700
            276804009b00006001006104008301006805006105006002006100008301
            0053
         consts
            None
            '!@#$%^&*'
            'abcdefgh'
            '<>{}:"'
            4
            '|'
            2
            'EOF'
         names ('rotor', 'newrotor', 'encrypt')
         varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
         freevars ()
         cellvars ()
         filename '/Users/hen/Lab/0CTF/py/crypt.py'
         name 'encrypt'
         firstlineno 2
         lnotab 00010601060106012e010f01
      code
         argcount 1
         nlocals 6
         stacksize 3
         flags 0043
code length: 92
         code
            990100680100990200680200990300680300610100990400469905002761
            020061010027610300279906004627990500276102009906004627990700
            276804009b00006001006104008301006805006105006002006100008301
            0053
         consts
            None
            '!@#$%^&*'
            'abcdefgh'
            '<>{}:"'
            4
            '|'
            2
            'EOF'
         names ('rotor', 'newrotor', 'decrypt')
         varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
         freevars ()
         cellvars ()
         filename '/Users/hen/Lab/0CTF/py/crypt.py'
         name 'decrypt'
         firstlineno 10
         lnotab 00010601060106012e010f01
   names ('rotor', 'encrypt', 'decrypt')
   varnames ()
   freevars ()
   cellvars ()
   filename '/Users/hen/Lab/0CTF/py/crypt.py'
   name '<module>'
   firstlineno 1
   lnotab 0c010908
```
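co_consts holds the constants used by the code, co_varnames holds the local variables (including arguments), and co_names holds the global and attribute names the code references. For example, in a function body `a = str(1) + '2'`, `a` goes into co_varnames, `str` into co_names, and `1` and `'2'` into co_consts. (At module level `a` would be stored through co_names instead, because module-level variables are not locals.)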
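A quick way to see where each name ends up, as a Python 3 sketch. The co_* attributes are the same ones the dump prints for Python 2.7, but the exact contents of co_consts (for example whether a leading None or small integers such as 1 are stored there) can vary between interpreter versions, so only `'2'` is checked.

```python
def f():
    a = str(1) + '2'
    return a


code = f.__code__
print(code.co_varnames)       # ('a',)
print(code.co_names)          # ('str',)
print('2' in code.co_consts)  # True

# The same statement at module level: `a` is stored by name, so it lands in co_names.
module_code = compile("a = str(1) + '2'", "<demo>", "exec")
print(sorted(module_code.co_names))  # ['a', 'str']
print(module_code.co_varnames)       # ()
```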
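The pyc above references an external name, `rotor`. A search shows that it is an encryption module (http://www.bugingcode.com/blog/python_rotor.html); its basic usage is roughly:
```python
import rotor  # Python 2 module

key = '12345'
rot = rotor.newrotor(key)
ciphertext = rot.encrypt('some plaintext')
plaintext = rot.decrypt(ciphertext)
```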
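The pyc has the locals key_a, key_b, key_c and secret, so presumably secret is built from key_a, key_b and key_c. Since the standard opcode table can't make sense of this pyc, I modified dis.disassemble so that it decodes whatever it can and prints the raw argument value wherever a lookup would fail. Below is the modified disassemble (Python 2.7); `show_code` then calls `disassemble(code)` instead of `dis.disassemble(code)`:
```python
# Python 2.7: a copy of dis.disassemble with bounds checks on the argument lookups.
# An index that doesn't fit its table is printed raw instead of raising IndexError.
from dis import findlabels, findlinestarts
from opcode import (opname, cmp_op, HAVE_ARGUMENT, EXTENDED_ARG, hasconst,
                    hasname, hasjrel, haslocal, hascompare, hasfree)

def _arg(table, oparg, kind, fmt=str):
    if oparg >= len(table):
        # hex() already adds the "0x" prefix (the first version printed
        # '0x' + hex(oparg), hence "0x0x3" in the output below)
        return '(%s %s)' % (kind, hex(oparg))
    return '(' + fmt(table[oparg]) + ')'

def disassemble(co, lasti=-1):
    code = co.co_code
    labels = findlabels(code)
    linestarts = dict(findlinestarts(co))
    n = len(code)
    i = 0
    extended_arg = 0
    free = None
    while i < n:
        op = ord(code[i])
        if i in linestarts:
            if i > 0:
                print
            print "%3d" % linestarts[i],
        else:
            print '   ',
        if i == lasti: print '-->',
        else: print '   ',
        if i in labels: print '>>',
        else: print '  ',
        print repr(i).rjust(4),
        print opname[op].ljust(20),   # unknown opcodes show as e.g. <153>
        i = i+1
        if op >= HAVE_ARGUMENT:
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536L
            print repr(oparg).rjust(5),
            if op in hasconst:
                print _arg(co.co_consts, oparg, 'const', repr),
            elif op in hasname:
                print _arg(co.co_names, oparg, 'names'),
            elif op in hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in haslocal:
                print _arg(co.co_varnames, oparg, 'varnames'),
            elif op in hascompare:
                print _arg(cmp_op, oparg, 'cmp'),
            elif op in hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                print _arg(free, oparg, 'free'),
        print
```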
Running `python2.7 disassem.py crypt.pyc` again produced a rather baffling result:
```
magic 03f30d0a
moddate 66346f58 (Fri Jan  6 01:08:38 2017)
code
   argcount 0
   nlocals 0
   stacksize 2
   flags 0040
code length: 34
   code
      990000990100860000910000990200880000910100990300880000910200
      99010053
  1           0 <153>                    0
              3 <153>                    1
              6 MAKE_CLOSURE             0
              9 EXTENDED_ARG             0

  2          12 <153>                   2L
             15 LOAD_DEREF               0 (free 0x0x0)
             18 EXTENDED_ARG             1

 10          21 <153>                65539L
             24 LOAD_DEREF               0 (free 0x0x0)
             27 EXTENDED_ARG             2
             30 <153>                131073L
             33 RETURN_VALUE
   consts
      -1
      None
      code
         argcount 1
         nlocals 6
         stacksize 3
         flags 0043
code length: 92
         code
            990100680100990200680200990300680300610100990400469905002761
            020061010027610300279906004627990500276102009906004627990700
            276804009b00006001006104008301006805006105006002006100008301
            0053
  3           0 <153>                    1
              3 BUILD_SET                1

  4           6 <153>                    2
              9 BUILD_SET                2

  5          12 <153>                    3
             15 BUILD_SET                3

  6          18 STORE_GLOBAL             1 (newrotor)
             21 <153>                    4
             24 PRINT_EXPR
             25 <153>                    5
             28 <39>
             29 STORE_GLOBAL             2 (encrypt)
             32 STORE_GLOBAL             1 (newrotor)
             35 <39>
             36 STORE_GLOBAL             3 (names 0x0x3)
             39 <39>
             40 <153>                    6
             43 PRINT_EXPR
             44 <39>
             45 <153>                    5
             48 <39>
             49 STORE_GLOBAL             2 (encrypt)
             52 <153>                    6
             55 PRINT_EXPR
             56 <39>
             57 <153>                    7
             60 <39>
             61 BUILD_SET                4

  7          64 <155>                    0
             67 DELETE_ATTR              1 (newrotor)
             70 STORE_GLOBAL             4 (names 0x0x4)
             73 CALL_FUNCTION            1
             76 BUILD_SET                5

  8          79 STORE_GLOBAL             5 (names 0x0x5)
             82 DELETE_ATTR              2 (encrypt)
             85 STORE_GLOBAL             0 (rotor)
             88 CALL_FUNCTION            1
             91 RETURN_VALUE
         consts
            None
            '!@#$%^&*'
            'abcdefgh'
            '<>{}:"'
            4
            '|'
            2
            'EOF'
         names ('rotor', 'newrotor', 'encrypt')
         varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
         freevars ()
         cellvars ()
         filename '/Users/hen/Lab/0CTF/py/crypt.py'
         name 'encrypt'
         firstlineno 2
         lnotab 00010601060106012e010f01
      code
         argcount 1
         nlocals 6
         stacksize 3
         flags 0043
code length: 92
         code
            990100680100990200680200990300680300610100990400469905002761
            020061010027610300279906004627990500276102009906004627990700
            276804009b00006001006104008301006805006105006002006100008301
            0053
 11           0 <153>                    1
              3 BUILD_SET                1

 12           6 <153>                    2
              9 BUILD_SET                2

 13          12 <153>                    3
             15 BUILD_SET                3

 14          18 STORE_GLOBAL             1 (newrotor)
             21 <153>                    4
             24 PRINT_EXPR
             25 <153>                    5
             28 <39>
             29 STORE_GLOBAL             2 (decrypt)
             32 STORE_GLOBAL             1 (newrotor)
             35 <39>
             36 STORE_GLOBAL             3 (names 0x0x3)
             39 <39>
             40 <153>                    6
             43 PRINT_EXPR
             44 <39>
             45 <153>                    5
             48 <39>
             49 STORE_GLOBAL             2 (decrypt)
             52 <153>                    6
             55 PRINT_EXPR
             56 <39>
             57 <153>                    7
             60 <39>
             61 BUILD_SET                4

 15          64 <155>                    0
             67 DELETE_ATTR              1 (newrotor)
             70 STORE_GLOBAL             4 (names 0x0x4)
             73 CALL_FUNCTION            1
             76 BUILD_SET                5

 16          79 STORE_GLOBAL             5 (names 0x0x5)
             82 DELETE_ATTR              2 (decrypt)
             85 STORE_GLOBAL             0 (rotor)
             88 CALL_FUNCTION            1
             91 RETURN_VALUE
         consts
            None
            '!@#$%^&*'
            'abcdefgh'
            '<>{}:"'
            4
            '|'
            2
            'EOF'
         names ('rotor', 'newrotor', 'decrypt')
         varnames ('data', 'key_a', 'key_b', 'key_c', 'secret', 'rot')
         freevars ()
         cellvars ()
         filename '/Users/hen/Lab/0CTF/py/crypt.py'
         name 'decrypt'
         firstlineno 10
         lnotab 00010601060106012e010f01
   names ('rotor', 'encrypt', 'decrypt')
   varnames ()
   freevars ()
   cellvars ()
   filename '/Users/hen/Lab/0CTF/py/crypt.py'
   name '<module>'
   firstlineno 1
   lnotab 0c010908
```
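The result makes little sense. In particular, offsets 82 and 85 of `decrypt` (`DELETE_ATTR 2 (decrypt)` and `STORE_GLOBAL 0 (rotor)`) ought to be a load attr and a load of the local `data`, followed by the call function. So I tried decoding by hand. In Python 2.7 an instruction is a one-byte opcode; opcodes from HAVE_ARGUMENT (90) upward are followed by a two-byte little-endian argument, which for an index below 256 is simply the index byte followed by 0x00. For example, `LOAD_CONST 1` loads the const at index 1 (counting from 0); LOAD_CONST is 0x64, so `LOAD_CONST 1` is encoded as `0x64 0x01 0x00`. The opcode numbers are listed in https://github.com/python/cpython/blob/2.7/Include/opcode.h . Opcodes without an argument take a single byte; RETURN_VALUE, for example, is just 0x53. Following this pattern, I analyzed the code of `encrypt`: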
```
990100  LOAD CONST '!@#$%^&*'
680100  store varname keya
990200  LOAD CONST 'abcdefgh'
680200  store varname keyb
990300  LOAD CONST '<>{}:"'
680300  store varname keyc
610100  load varname keya
990400  LOAD CONST 4
46      some operation on keya and 4
990500  LOAD CONST '|'
27
610200  load varname keyb
610100  load varname keya
27
610300  load varname keyc
27
990600  LOAD CONST 2
46      some operation with 2
27
990500  LOAD CONST '|'
27
610200  load varname keyb
990600  LOAD CONST 2
46      an operation on keyb and 2
27      could this be +?
990700  LOAD CONST 'EOF'
27
680400  store varname secret
9b0000
600100  load attr newrotor
610400  load varname secret
830100  call function newrotor
680500  store varname rot
610500  load varname rot
600200  load attr encrypt
610000  load varname data
830100  call function encrypt
53      return
```
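The split above can also be done mechanically. This Python 3 sketch assumes that the obfuscated opcodes still respect CPython 2.7's HAVE_ARGUMENT boundary (90), which is consistent with the offsets in the disassembly; `split_instructions` and `candidate_tables` are hypothetical helper names. `candidate_tables` compares each opcode's largest argument with the sizes of `encrypt`'s co_consts (8), co_names (3) and co_varnames (6), which is a quick way to narrow down which table an opcode indexes.

```python
from collections import defaultdict

# Assumption: opcodes >= 90 carry a 2-byte little-endian argument, as in CPython 2.7.
HAVE_ARGUMENT = 90

# Sizes of encrypt()'s tables, read off the dump: 8 consts, 3 names, 6 varnames.
TABLE_SIZES = {"co_consts": 8, "co_names": 3, "co_varnames": 6}

# co_code of encrypt() from the dump (92 bytes).
ENCRYPT_CODE = bytes.fromhex(
    "990100680100990200680200990300680300610100990400469905002761"
    "020061010027610300279906004627990500276102009906004627990700"
    "276804009b00006001006104008301006805006105006002006100008301"
    "0053"
)


def split_instructions(code):
    """Split Python 2.7-style bytecode into (offset, opcode, arg) tuples; arg is None if absent."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            out.append((i, op, code[i + 1] | (code[i + 2] << 8)))
            i += 3
        else:
            out.append((i, op, None))
            i += 1
    return out


def candidate_tables(code, sizes=TABLE_SIZES):
    """Map each argument-taking opcode to the tables its largest argument fits into."""
    largest = defaultdict(int)
    for _, op, arg in split_instructions(code):
        if arg is not None:
            largest[op] = max(largest[op], arg)
    return {op: [name for name, size in sizes.items() if m < size]
            for op, m in sorted(largest.items())}


for offset, op, arg in split_instructions(ENCRYPT_CODE):
    print(offset, format(op, "02x"), "" if arg is None else arg)
for op, tables in candidate_tables(ENCRYPT_CODE).items():
    print(format(op, "02x"), tables)
```

Only 0x99 ends up restricted to co_consts alone. The check merely narrows things down: 0x83 also "fits" every table, yet its argument is an argument count (CALL_FUNCTION), not an index.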
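Translating these bytes with the standard opcode table doesn't work, though: `610000`, for example, would be STORE_GLOBAL, and 0x99 (as in `990700`) has no standard meaning at all. So I looked at how the writeup author handled it: he wrote his own script that uses rotor, compiled it, and compared its disassembly with crypt.pyc. Following that idea, I translated the bytecode as above. In particular, 0x99 must be LOAD_CONST: `990700` uses index 7, i.e. the eighth entry, and of the three tables only co_consts has eight entries (co_names has 3, co_varnames has 6). What 0x46 and 0x27 do is still unknown. For instance, just before the first 0x46 the code loads keya and then the constant 4, so that 0x46 computes op1(keya, 4): Python passes operands on a stack, taken in order from the bottom of the stack to the top. Writing the parts involving 0x46 and 0x27 as pseudocode, with op1 standing for 0x46 and op2 for 0x27: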
```
secret = op2(op2(op2(op2(op2(op1(keya, 4), '|'), op1(op2(op2(keyb, keya), keyc), 2)), '|'), op1(keyb, 2)), 'EOF')
```
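The nesting is easy to get wrong by hand, so here is a Python 3 sketch that replays the stack effects symbolically. It assumes the meanings worked out so far (0x99 loads a const, 0x61 loads and 0x68 stores a local, 0x46/0x27 are the unknown binary operations op1/op2) and covers only offsets 18 to 63 of `encrypt`, from the first load of keya to the store into secret. `replay` is a hypothetical helper name.

```python
CONSTS = (None, '!@#$%^&*', 'abcdefgh', '<>{}:"', 4, '|', 2, 'EOF')
HAVE_ARGUMENT = 90  # assumption: same argument boundary as CPython 2.7

# Bytes at offsets 18..63 of encrypt(): from "load keya" up to "store secret".
SECRET_CODE = bytes.fromhex(
    "610100990400469905002761"
    "020061010027610300279906004627990500276102009906004627990700"
    "27680400"
)


def replay(code):
    """Replay the stack effects, building an op1/op2 expression string instead of values."""
    local = {1: "keya", 2: "keyb", 3: "keyc"}  # locals 1-3 are already set at offset 18
    stack = []
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)
            i += 3
        else:
            arg = None
            i += 1
        if op == 0x99:              # load const
            stack.append(repr(CONSTS[arg]))
        elif op == 0x61:            # load varname (local)
            stack.append(local[arg])
        elif op == 0x68:            # store varname (local)
            local[arg] = stack.pop()
        elif op in (0x46, 0x27):    # unknown binary op: top of stack is the right operand
            right = stack.pop()
            left = stack.pop()
            name = "op1" if op == 0x46 else "op2"
            stack.append("%s(%s, %s)" % (name, left, right))
        else:
            raise ValueError("unexpected opcode %#x" % op)
    return local[4]  # local 4 is secret


print("secret =", replay(SECRET_CODE))
```

It prints `secret = op2(op2(op2(op2(op2(op1(keya, 4), '|'), op1(op2(op2(keyb, keya), keyc), 2)), '|'), op1(keyb, 2)), 'EOF')`.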
Since keya is a string and 4 is an integer, op1(keya, 4) should be `*`, which returns a string; op2(op1(keya, 4), '|') then combines two strings, so it should be `+`. Hence secret can be computed as:
```python
secret = key_a * 4 + '|' + (key_b + key_a + key_c) * 2 + '|' + key_b * 2 + 'EOF'
```
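The choice of `*` and `+` can also be sanity-checked by brute force. This Python 3 sketch (`working_ops` is a hypothetical helper) applies each binary operator from the `operator` module to the operand types seen in the bytecode and keeps those that don't raise. It is only a plausibility check; note that `%` is ruled out here only because key_a contains `'%^'`, which is not a valid format specifier.

```python
import operator

# Candidate binary operators, as (symbol, function) pairs.
CANDIDATES = [
    ("+", operator.add), ("-", operator.sub), ("*", operator.mul),
    ("/", operator.truediv), ("//", operator.floordiv), ("%", operator.mod),
    ("**", operator.pow), ("<<", operator.lshift), (">>", operator.rshift),
    ("&", operator.and_), ("|", operator.or_), ("^", operator.xor),
]


def working_ops(left, right):
    """Return the symbols of the operators that accept (left, right) without raising."""
    ok = []
    for symbol, fn in CANDIDATES:
        try:
            fn(left, right)
        except (TypeError, ValueError):
            continue
        ok.append(symbol)
    return ok


key_a = '!@#$%^&*'
print(working_ops(key_a, 4))        # ['*']  candidates for op1(keya, 4)
print(working_ops(key_a * 4, '|'))  # ['+']  candidates for op2(op1(keya, 4), '|')
```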
I then put this into a script that decrypts encrypted_flag:
```python
# Python 2 script; needs an importable rotor module
import rotor

key_a = '!@#$%^&*'
key_b = 'abcdefgh'
key_c = '<>{}:"'
secret = key_a * 4 + '|' + (key_b + key_a + key_c) * 2 + '|' + key_b * 2 + 'EOF'
rot = rotor.newrotor(secret)
with open('encrypted_flag', 'rb') as f:
    print(rot.decrypt(f.read()))
```
This finally gives the flag: flag{Gue55_opcode_G@@@me}

References:
Python bytecode: http://www.bravegnu.org/blog/python-byte-code-hacks.html
Opcode reference: http://unpyc.sourceforge.net/Opcodes.html