>> Good afternoon, Def Con. How is it going?
So this is generating ROP payloads from numbers. So please join me in welcoming Alexandre Moneger.
>> Hello everyone. Thanks a lot for attending. So, basically, today I'll be talking about return programming and how to generate pay loads using numbers. A little bit about me first. I work for Cisco systems in the UK. I am a Security Engineer in the security business. Which has not got nothing to do with what I am going to talk about today.
Personally I'm interested about bits and bytes and all that stuff. And the usual disclaimer. The research was done on my own time and they do not reflect the opinions of our employer and are mine.
You can't hear me? Sorry. All right. Is that better?
So, basically, I'll go through a brief rot overview. What is it? Why do we use it and why has it kind of become essential today. The automation basically of the ROP payload generation, why do we want to do that or not and what are the challenges with it? Then I'll go into a little bit of sort of the core of the subject which I call number stitching. Which is basically a possibly new way to generate ROP payloads from numbers instead of from strings. So I'll go through, you know, the goal what was I trying to achieve. You know, why I did it. How I found the gadgets. And basically the core of the problem which is the change problem which I'll get back to later. Then I'll have a quick chat about, you know, the pros, the cons, the tooling that I'm releasing, and any future work.
Okay so I'll have a very brief introduction about ROP just so that I'm sure that everybody understands what it is. So, basically, you know, before in the old days you could just attack, you know, exploit a vulnerability using, you know, just a bunch of shellcodes and you could just execute it anywhere. Unfortunately, or fortunate, nowadays you can't do this anymore. Right. You can't execute shellcode and memory just like that. So, you know, historically we got to a point where we got to return our programming. Which is basically instead of executing stuff through shellcodes use the target binary to execute code for you. So this means that you are not executing code yourself. You are asking your target basically to do it for you. All right. So, basically, I'll just come comment a bit on what I wanted to achieve also.
So, basically, what I wanted to achieve was to only use gadgets from the (inaudible). So I was I saying when you look for gadgets and instructions you are going to get your program to use them. So what I wanted to do is take it a bit further and extracts it more and basically be able to use the structures generating by the parlor while it (inaudible) to attack the vulnerability. So I'll try to generate the payloads putting them in memory.
Coming back to my overview, so we know that we want to basically reuse instructions from the target binary in order to bypass ASLR and XOW. Why we use ROP is we don't have a choice. The memory protections have gotten much better. We can't do without it. So we use it. The level of complexity is much higher. If we could avoid it, it would be nice. But we can't.
It maintains control using the stack pointer and not the instruction pointer. I won't go into details but basically this is how you maintain control over the flow of the program. And ROP is multi‑staged more or less by design. It's that you are going to have a first stage which is going to be using the instructions from your target binary to generate a payload for you. So that payload, you are going to put it on a custom stack you are going to create within the program. So your idea is, okay, let's take a tiny piece of the program and put some instructions there which I'm going to execute later. So that's your stage zero. First phase basically which is grab some stuff, put it in the memory and when you finish doing that, change the control flow and execute that. As I said, it's more or less the only way around today's OS protections. You know, if you look at all the IE exploits and all that stuff which happened in 2013 they all used ROP to evade all those memory protections.
So just a bit of vocabulary. What is called a gadget basically is just a useful instruction inside the target binary. So it's going be a very small piece of assembly with a return instruction at the end. So that is what you are going to call to execute it. How do you find them basically is that you go over your binary, you look for every return instruction which is ZXC three and assemble backwards from that and that will give you what you can work with. To do that there are many good tools available ROP gadget which just go over binary and give you what gadgets you use to attack it. Obviously the number of gadgets you can find in a binary is dependent on the size of the binary. If it's huge you got a lot of chance of finding a lot of gadgets. If it's small you've got much less and it becomes much harder.
So once you have done your stage zero, you know, you've built your fake frame, your fake stack frame basically in memory, you want to transfer control to that payload. Because that's what you want to execute. You want to get a shell or whatever you want. So shifting control from the current control flow into your stack frame is called stack pivoting, basically. And it's a way of re‑directing control flow so that you basically get then your code to run once you have built it.
So I'm going to talk about automating the payload generation which is basically how from a program can I take bits of memory and put them, create that fake stack which I will later execute. So this is the classical approach of what happens. Is that you find a bunch of bytes in memory. So the bytes you are looking for obviously are shellcode bytes. So you got a shellcode you are going to start from or something you want to achieve a bunch of functions you want to call and you are going to look for those bytes in memory and copy them to a stack you control. To achieve that you are going to look for some particular gadgets which is a move gadget. So you are going to try to move something from a resistor you control to a zone and memory you control. Or you can use a bunch of functions if they are exported by the target binary which is not always the case.
So you've got a bunch of potential problems in this. It's that you're counting basically on the availability of a new gadget. In real world you have generally got it. So in big programs it's not a program. It can require some referencing which I won't speak too much about but that's if you want to get basically your ‑‑ call up a particular function and also you are expecting some bytes to be available in memory. Typical in shellcode you look for your interrupt number and you are expecting that number to be in memory because otherwise you not going to be able to build your shellcode. If that number is not there it generally requires manual work to get the missing byte. If we look at a shellcode here which I took, basically you can see if I look for, you know, the byte 7368, you know, I can find them in the program so that's nice. I can find those two bytes which, you know, are part of my shellcode and I'm going to be able to use them. If I look for something different, basically, for example, 682 F I don't find that in the program right. So that means I am going to have to do some manual work, manipulating some bytes to get that. So back to the gadget. Very small binary do not tend to have many mood gadgets. So it mean that's the automation of the pellet is complicated. And in the case of this particular gadget I'm showing, you've got a problem sometimes with nail bytes and basically it also needs extra work to get rid of it.
Okay so here I'm going to introduce number stitching which is basically I was asking myself a question, is it possible to exploit a halo world type vulnerability with most memory protections enabled? I went stack canaries and all that. Basically, you know, a gross programming mistake. You copy, you know, a user supplied argument with string copy and you get a (inaudible) or something like that. Is it exploitable with everything enabled? Also, I was wondering can I exploit this independently of what the program does. Using only the come parlor or (inaudible). So, in other words, it's like is it possible to not rely at all on the target binary to generate a ROP payload. So I'll go through it, like, how a program is built and, you know, how it's linked and what happens. And we'll see what we get.
So, basically, all the following of this is going to be based on the gadgets I have been able to find here. Basically all the other stuff was done due to the gadgets which were made available to me. So, if you just take a hello world and compile it so what happens. Right. If you look at what your program actually contains you see there's a whole bunch of other f unctions which you did not intend to be there. You got the start and (inaudible) and all that stuff. So where does that come from and can you use that?
So if you look basically at link time at what happens, the linker will call the SO. So you think it is a dynamic library but it's actually not. It is a script. And basically that script has got a static library in it. So meaning that when you comply a program against lip C you are going to have most of your functions dynamically linked but also have a subset of those which are statically linked into your ‑‑ statically linked into your binary. So, if you look at that, there's quite a few functions in there which you could possibly use, right. So all this basically is if you use it will be statically linked into your program. Meaning that, you know, those gadgets will be at fixed addresses.
So the problem I had is that those functions aren't always used, they depend on various options and linking options. I just look for gadget rhythm and I couldn't find anything. So I think it's good to say what you weren't able to achieve. So, basically, looking for gadgets in that failed. So I went back to my binary and said is there anything else which is added which I could use? And so, basically, I looked at the binary and there's a bunch of functions which I added by GCC in some cases. So they've got no symbols attached to them. So it's kind of hard to figure out what they do. But there's just a bunch of anonymous functions which I inserted into your program. If you look at the disassembly of your program, you will see there's some functions where there is no symbols associated and they're just there basically. Those functions seem to relate to profiling. And so that is why they're there, basically. So what was surprising to me is that profiling seems to be enabled by default on some distributions. And so this stuff is actually statically linked into your program and all these little tiny pieces of assembly are put into your program without you knowing about it. So to check what the default options are for GCC you can look at GCC minus (inaudible) will dump basically the list of come parlor options done. You might find stuff in this. So, basically, this work was done for GCC 4.5 which is pretty old. Basically I look for gadgets inside the code that was embedded by GCC into the target binary. Based on that, I kind of disassembled those functions and I end up with this stuff to work with them so this basically is only generated by the com parlor. It's not code which is generated by the target application. So you can see that I've got a bunch of stuff that I get to work with. So the first one allows me to control EBX. Put a random value in there and control that. The other one allows me to pivot the stack which I've achieved control and I want to execute my shock code at the end. The two others which are the meats of the torque is that there is a right to memory. So, basically, through control of EBX I can right the value which is EAX. Meaning that the value I can dump it somewhere in memory. And I've also got the other way around. A right to the (inaudible) meaning that a value my control in EBX I can grab a value from memory and load that.
That means in short, you know, an attacker you control EBX and that's about it, right. So you have to find a way to achieve control of EAX also. Okay so this ‑‑ so, basically, the further part of the talk will just focus on those two instructions, those two gadgets. And basically how I used them in order to achieve execution.
So, basically, we've got a useful gadget here which is add memory to a (inaudible). I removed the trailing stuff and all that because it's not necessary. So you control EBX. So it means you can grab a value memory and load it. Is that useful? Well you don't control what's in memory necessarily. So, you know, it's ‑‑ at first I didn't think it could be potentially useful but actually it is. I'll come back to this later.
So now the reverse. Basically once I've got a value I'm interested in the resistor I can dump it into memory right. This allows me to create my fake stack break. So assuming that I control EAX I can build a fake stack frame, you know, at the addresses pointed to by EBX. So that means that by chaining these cores I can, you know, copy my shellcodes into a frame I control and then, you know, trigger execution of it. So, basically, this is the more general approach to ROP is that you are going to choose a spot in memory where you can build your stack. So, you know, the memory properties you want is that obviously you can write to it. With today's memory protection you can either execute or you can write. So you are going to choose a zone where you can write. You should look at the previous instruction it was an add instruction. So I have to find what is called kind of a code cave which is a zone where ‑‑ which is just padded with bytes. So just a bunch of zeroes. When I add it I just add to zero. So, you know, it avoids further complication.
And then you choose the shellcode you want. To just pick a shellcode, any one you want, set the ID for whatever you want and basically just hope that you are going to ‑‑ to copy that shellcode to a frame you control and then execute it. So now the unusual approach I'd say to do this is that I chose to deal ‑‑ since I had to deal with arithmetic operations much as add I chose to actually see the shellcode not as a string but as a number. So, basically, if I cut it into small pieces of four bytes on 32 bits and eight bits on 64 I just cut it in four chunk and basically interpret each chunk. That just says basically take a string and it is actually a number. So, if you keep track of each of the index of each chunk you know basically which index your number is and what you want to do is since you are going to always be adding you want them to be ordered because you always want to go from smaller to bigger. So, basically, you just order them. You take your shellcodes, cut into little bits as numbers, and then basically you keep track of their position and you reorder them from smallest to biggest, right. And then once you have done that you compute the difference between each chunk. So that will give you basically Deltas between your pieces of shellcodes and it will keep your shellcodes basically (inaudible) increasing. Just meaning each time you add a value to your resistor it's going to always go up.
So what this looks like if I take this example. At the top you've got a shellcode. Just a string of text. If you take it as a number and reorder it, you get to line two basically. So, if you see the end of the shellcode has the shift back to the front and I've ordered all that together. All right so they're ordered in increasing number. If you take the difference between each basically you get small Deltas. So you see basically at the last line that position three of your initial shellcode is now position one. Position two is actually two minus three and it's at position two and et cetera.
So your shellcode is now represented as increasing Deltas. So, if you add ‑‑ if you take your initial value and add the Delta you find back the value of your shellcode. If you do that again you find the next value of your shellcode. Do it again and the next value. Once you have reach Thad value you want to dump it to your fake stack but you want to dump it in a way that you remember at which position it was initially right. Otherwise you are going to have the wrong spot. Once you do that you just repeat, repeat, repeat then eventually you've copied what you want into a zone of memory. So as an example say you want to copy the number 010234 into memory. So you find that number in memory. So for the timing I am just assuming it's there. I'll speak later what happens if it's not. You will see the number there is in memory and you know the address of that and basically you copy that into EAX. So you have achieved loading a value into a register for memory right. So that's kind of easy. Now you've got the value you want of your shellcode so you want to dump it on to your fake stack frame. If you look at that, that guy was actually at position three. So you actually drop it on the stack at the right position. And now you are going to look for the next number which is, you know, the Delta which is in this case 040404. So you assume that also is in memory. You take it and sum it and get back to the same value and you dump it. And you do that over and over again. And in the end basically you have achieved building a shellcode from a bunch of numbers that you found out in memory. So now it comes to the problem of how easy is it to find shellcode numbers in memories. Because basically if you take your shellcode and look at it as a bunch of numbers, it's actually pretty high numbers, right. It's going to be ‑‑ if you look at the example I put there 683199 if you take string and, you know, (inaudible) it actually ends up at 668766 which is a high or negative number. But we're relying on the fact that will be in memory and we can copy it. What happens if it's not? So here basically the next part is just, all right, I've got that value, well I'm hoping for that value to be memory, but it's not, what can I do and how can I build it. So, basically, the answer is that it's not very easy to find big numbers in memory. They're not ‑‑ so I had to look at a bunch of programs and I saw that they're basically not really there. So here I put a small example of looking for, you know, 01020304 in GDB and it just doesn't yield anything. If you look at multiple programs and search for those numbers, they're not really there. So I have to find basically a different approach. So the approach was ‑‑ my idea was that if I can basically scan memory for numbers and if I can find a way to add them together to end up the chunk value then I'm fine, right. So by definition memory, you know, has got a crazy number of numbers in there, right. If you look at individual numbers, it's got heaps in there. The approach is basically take a file, take its read only segment and basically just scan it for numbers, right. So just, you know, look at the beginning of it, see it as a number, shift it by one byte, see it as number and shift it by one byte and see it as another number. So you scan all that segment just looking for a bunch of numbers, right. So where I decided to look for was basically in the read only segment. Because obviously you don't want those numbers to change at run time. Here they come.
So, basically, I looked inside the segments which were read only. So that basically I knew that at run time those numbers would not change at all. And exclude all the read/write segments basically which we're going change. Basically if you don't have position in codes ‑‑ thank you very much.
>> Show some love for our first time speaker at Def Con!
>> Thank you very much.
>> Thank you very much.
>> What do you guys think of his graphics?
>> I'm sorry that was just (inaudible).
>> That's all right. So, basically, the read only segment to find your numbers. So what you do is scan that read only segment look for the numbers, you know, shifting by one byte each time and keep track of their addresses, right. And so what you end up is with a whole bunch of numbers and the problem seems pretty simple is that how to add up these random bunch of numbers to find the chunk I want, right. So all you are looking for is finding the best combination of numbers which add up to a chunk. So this problem seems pretty simple because, you know, for human we do it all the time. But for a computer it's pretty crap. It's basically called the corn change problem. It took me awhile to figure out it was what I was looking for. The example is you buy an item at 425 euros and you get a five euro note. And what's the most efficient way to return change on that? You know, that's something we know how to do pretty easy. You know, in Europe we'd give back a $0.50 coin, $0.20 coin and $0.05 coin. To basically that's the coin change problem is what's the most efficient way of giving money back basically. So, basically, if you look at this problem in dollars the answer is different right. You are looking for same thing. $0.75. So, basically, in the U.S. you would give half a dollar and a quarter. To you see that here the solution basically depends on your coin set. And so it depends on the numbers you found in memory. So solving the problem for human is pretty simple. Solving the problem for a computer is a bit more difficult. So, basically, you can achieve an ideal solution to the problem by using dynamic programming which basically will give you the most efficient solution. So, you know, maybe some of you guys have done that in high school, you know, it's a pretty simple problem to solve when you've got small numbers. The problem is since I am dealing with pretty big numbers is that dynamic programming doesn't scale and it just kind of blows my memory. So I can't get to scale it for massive numbers yet. So I have to use a different approach. So I used an approach called the greedy approach which is just slightly different which basically achieves the same thing but won't give you the optimal solution. It will give you a solution. So how it works is that it's just simple. Just like you take the biggest coin which fits in the interval and then just add and add and add. One thing we're lucky with in memory you've got many, many small numbers. Number 1, two, three, four, five. All that is like you've got bits of them in there. So, basically, you always can achieve the greedy approach which is kind of nice. But it's sub optimal. So kind of back to the $0.75 problem looking, you know, with ‑‑ so, you know, trying to return $0.75 using the greedy approach basically you just go down the list of the coins and take the biggest one which fits. In this case 50, 20 and $0.05. So, basically, all my tool does is that, you know, you give it a number in memory, it will scan all memory and it will try to find you the best solution for this. So, basically, I wrote a tool called Ropnum which is basically just trying to find a solution to the coin change problem. So, basically, you give it a number, you give it ‑‑ and it will basically throw out all the addresses where those numbers added together end up with a chunk, right. So, if you look at this back into the context of what I was talking before, here you are basically looking to add up a bunch of numbers in memory to reach a chunk of the shellcode right. So this is what Ropnum will do is give you basically that. It's got a bunch of extra features. Ignore null bytes, exclude numbers. It can print all memory pointing to a number and zero. Et cetera. All these features are not critical they just make the export that easier because you can exclude addresses which are not in your range. So an example usage is basically find me the address of numbers in the segment containing the text action, so read only section, which added together solve the contract problem. Here if you look it will spit out at the lower part it spits out five numbers basically. And if you add those together, you get your target number which is your shellcode chunk. So, if we look back at this basically ‑‑ it will give you the individual parts of the number which summed together gets your shellcode. Here I just showed if I re‑add all the values together I get my initial value. Now coming back to my gadget, it's that here by putting the value I want in EBX and by adding repetitively I can add get to the value of my (inaudible). So, if I put all this together, so, basically, so you take your shellcode, you cut it, order it and take the Deltas. You look for numbers in memory. And basically you add them together until they reach the value of a chunk. Here it means you have achieved the value of the shellcode. Once that chunk is reached you basically just dump it on to a stack frame you control. And you repeat until the shellcode is complete. And in the end you just transfer control over to your shellcode. So I wrote a tool also which automates all this process basically where you just give it a shellcode and it will spit out some Python code to generate your export. So you give it a shellcode, anything you want, you give it a frame address, so where you want to copy your data too which will generally be the data section. It takes care of all the boring details, right, which make your life hell. It will spit out a bunch of Python codes which generate the payload. So if you are familiar with rock gadget and all that stuff, it does something pretty similar to that. Now what it does also is that, you know, for the timing we've copied our payload to a fake stack we thought but that stack is read/write. So we'll add a small and protect read write stub frame before the payload. So what it does is basically it will allow your ‑‑ to change the memory permissions at run time so that your shellcode can run. And so it's got a bunch of additional functions like it can start with, you know, an arbitrary EX value and look up numbers in sections and segments depending on what you want to do and all that kind of stuff. So a bit more on why you need and protect stub. So you have copied your shellcode into a zonal memory which is RW. So, if you return execution to that it's not going be able to execute right. You are just going to get a fault from your processor because the memory permissions are wrong. So you need to make that page kind of read write execute at run time. This is where it clicks in. It's just standard function which allows you to change memory permissions at one time.
So, basically, I just added small stub in the front of that and once that has executed it will jump back into your shellcode. So an example usage is basically, you know, to copy a shellcode to a fake located in the data section putting it in protect frame and looking up segments in the RO segment in a binary you end up basically with this. And so all this basically will spit you out a bunch of Python codes which will just once run will basically generate your shellcode. So I didn't put the output on the slides because it's pretty verbose and just a bunch of codes which isn't very interesting. I put it on the CD though and I've got an example on gighub if you want to have a look at it. So, if we have a look at GDB and what's happening inside GDB basically you will see that I put the values in yellow. The values which have basically been copied. If you look at the values the first ten X seven is read, write execute. So, basically, that one is written, you know, at its proper index. The next one is 0X08 which is the number and you will see that guy is written completely at the end. This kind of shows, you know, the tracking of the index and, you know, where stuff has to go to rebuild your shellcode. So then I write 0X1000 which is the next biggest number on the shellcode and it will write it back, you know, at a different index, et cetera, et cetera. So, if you continue that, you know, execution ten times just to see it go faster, you will see that there's a bunch of missing zones in orange basically which will be filled in later. So it's just ‑‑ this is just to show the shellcode is out of order. So the end shellcode is basically the shellcode is complete in memory with the RWE and protect stub just before it. So there's a bunch of pros and cons to this technique. I don't know, I was wondering about this and it's got a real use case scenario. It's just something I wondered with and I found it nice but I don't know if it's useful in real life. It's got a bunch of pros that it can encode any shellcode. All you are dealing with is addresses right. You never actually look for value. You never pop values in. Just addresses. The lower two bytes of your address are controls basically. So you can exclude those values if you don't want them. Obviously the initial goal is that you are not affected by X and R or X and W. So, basically, it will allow you to bypass all those memory protection.
The cons basically is that the payloads are fairly large right because you're generating ‑‑ you are adding, you know, multiple values, right, to each value of a chunk. So up to five values together. So that requires quite a bit of iterations to get there. What that means is those iterations transform into a code link basically so your payload grows. So your stage zero can get pretty big. So that's a pretty big con.
So the further usage to this is that let's imagine that, you know, here we were looking at EX. I am still looking at the specific case of GCC4.4.5. Is that I achieve control of EAX through this technique. Basically one thing I had is that sometimes EAX, the initial value you've got in there changes depending on, you know, your user input data. So one nice trick that is if you control EAX is that you can more or less get a value you want in to there through doing just a random function call in the PLT. And since, you know, in the Linux AVI EAX holds the return value of a function call. So if you do random function call and make it fail or succeed or whatever you can kind of control the initial value you've got in there.
So now I tried to take an approach to shrink the size of stage zero. So one approach was basically that instead of loading the whole shellcode in memory what I could do is create some kind of gadget table at the location I wanted. Instead of take the whole thing, you know, the whole shellcode which is, you know, a big number and just ditching it there what I can do is instead of that load of bunch of additional gadgets and use that was a table to continue my exploit. For example say you've got your exploit and you would like to have those gadgets I put in there. The up codes I put on the right. Basically your shellcode becomes 59 Cf,3 , blah, blah, blah. So, if you load that you use number stitching to generate that and put that in memory, then you can actually call them directly from, you know, a stage zero payload from your payload. So that way you can kind of insert additional gadgets within the target's memory space. Which is kind of practical. And so yes since, you know, the tool can change the memory positions it will work. Then you just call the gadgets from your table or it doesn't have to be necessarily gadgets. It can just be bytes right. You can just add a bunch of bytes and it increases the availability of your bytes. And so you have kind of the ability to load any gadget or byte in memory just using a bunch of memory address. I haven't automated that part in the tool yet it's just kind of, you know, on the side.
So future work. So, basically, I'd like to, you know, continue having a look on numbers and basically if they're available to memory and, you know, not subject to ASLR. So check a bunch of binary and see if I can figure out if anything comes up. Realistically shouldn't be but you never know. Maybe something will come up. And basically search for gadgets in new versions of libc/gcc. So it seems difficult but, you know, I might find a bunch of gadgets in there which could be exploitable and allow to have that separation basically of exploiting the program through it's kind of dependencies in a way.
On the tooling side basically I'd like to get dynamic programming approach to work with large numbers. I don't think it will be completely possible but I'll have a look into it. At 64‑bit support, I mean, here regarding the methodology it's pretty simple right. The numbers are just bigger so it doesn't change anything. The problem will be with the improtect frame which might be harder to introduce. I think basically that there would be advantage in introducing a mixed approach. So, you know, possibly integrating this and other tools which are already doing their job really well. So, you know, rock gadget can generate payloads once it's got everything from a string. But maybe try to automate finding the missing bytes basically using this technique which means that you would have a proper, a real automatic payload generation. You know, that's obviously interesting. Basically I'd like to introduce in the tool basically the concept of, you know, gadget tables and, you know, introducing random bytes at random addresses.
So just, you know, to summarize what I wanted to do was basically, you know, achieve exploitation without using gadgets from the target binary but from, you know, stuff introduced by the environment. That took me down the track, you know, of giving you those two particular gadgets that took me down the track of, you know, having to kind of scan memory for numbers and find ways to add them together to achieve what I wanted.
So if you want to have a chat about it or if you've got any ideas or think this is useful or not just shoot me an email. If you want to try out the tool, it's available gighub. So give it a shot. Thank you very much for your time and for listening to this and have a good Def Con. Cheers.