[POST 0/4] catch syscall – Introduction

Hey.

I’ve been a little busy these days, especially because of GDB. I’m becoming more involved with this project (since it’s also my job), and have already submitted my first patch (the link will open in a new window; it basically contains the “introduction” message for a series of 4 patches) to the developers’ mailing list. Because of that, I decided that it would be good to “advocate” for the patch and for the feature it implements (and try to convince the developers that it’s worth accepting!).

Basically, the patch implements a new feature called catch syscall. This is a long-awaited feature in GDB, and there was even an attempt at it before (made by Alan Curry in 2007). Now this feature is pretty much ready to be included, and I’m waiting for the reviews/comments this time (this is my third try).

The explanation I’m gonna give here will probably be a series of 5 posts including this one (just like I did with the patches), so that I can explain (in some detail, but probably not that much) how I did things and why they are important for GDB in general. I also intend to make some comments about needed (at least IMO) modifications in the source code. Hope that you like it.

So, as this is the first post AND the introduction of this new feature, I’ll begin by explaining why we would need such a feature in the debugger. Basically, if you already know the strace utility, you probably already know why this is a cool thing to have :-). Otherwise, let me try to explain.

For most existing operating systems, the kernel exports an interface in order to allow communication between it and userspace programs. This interface is popularly known as system calls, or just syscalls. As the name says, they are “special” functions that can be called from userspace. They are implemented directly on the kernel side, so userspace programs can request tasks from the kernel and it will perform them on behalf of those programs. If you have ever programmed in your life, it’s almost certain that you have used syscalls, even without knowing that you were doing it! (Especially because the first program almost everyone writes is “Hello World!”, which uses the write syscall.) So, the importance is increasing for you, isn’t it? ;-)

The problem arises especially when you want to know which syscalls are called by the program you’re debugging. As I said before, they are not always visible to the programmer. For example, the “Hello World” example that you may have written can be coded as:

printf ("Hello World!\n");

And only after the compilation process (actually not exactly, but anyway) it becomes (along with a bunch of other function calls):

write (…, "Hello World!\n", …);

So, as you can notice, we don’t always have this information explicitly marked in our programs. But wait, why would I want to know this? Simple: because you’re debugging something, and when you’re doing this it’s useful to have as much information as you can. Of course, this is a “general” explanation and there are better ones, but I won’t spend my time on them. If you can’t see why this is good to have, just keep in mind that it is :-).

Ok, continuing with the explanation, how can we track this info and know what syscalls are being called? That’s where the catch syscall feature comes in :-). It basically takes care of this “dirty work” and can warn you if the program being debugged called (or returned from) a syscall. Oh, and it can also “filter” syscalls so that GDB will only stop if a given syscall is called/returned. There’s no practical limit for filtering the syscalls, so you can ask GDB to keep track of as many as you want (of course, if you ask GDB to keep track of too many syscalls it’ll slow down a little).

It’s also worth mentioning that the patch is implemented (currently) only for the PowerPC, PowerPC64 and x86 architectures. It shouldn’t take a long time for me to be able to send the patch for x86-64 too, and I think people will eventually want to extend this to other archs as well.

So, that’s it. In the next post, I’ll try to explain how I did the architecture-independent part of the patch. Let’s see if my explanations are OK ;-).

How to make a patch – the explained way

Hi everybody,

If you have ever contributed to an open-source project, you have probably already had to handle patches for files (like preparing them, sending them, and applying them to the code). If this is your case, maybe you don’t need to read this post at all (though I’d recommend you do so anyway). But if that’s not your case and you’re starting to contribute to open-source, or if you just want to learn something new and interesting, keep reading :-).

I’ll try to organize this post in a few topics so you’ll feel more comfortable in reading it. Let’s start.

1) A little introduction

The ‘patch’ utility is considered to be one of the most useful programs ever written. Together with ‘diff’, it forms the “dynamic duo” for the vast majority of FLOSS projects.

Basically, what it does is simple: you have to feed it with pre-formatted text (we’ll see how to generate this “pre-formatted text” later) that represents the differences between two versions of the same file (say, “YOUR_FILE” and “YOUR_FILE_MODIFIED”). Then, ‘patch’ modifies “YOUR_FILE” so it becomes an exact copy of “YOUR_FILE_MODIFIED”. These differences are generated by a program called ‘diff’.

The ‘patch’ tool was written by Larry Wall (creator of the Perl programming language) in 1985, and the ‘diff’ tool was written by Douglas McIlroy, in the 70’s. They are the basis for almost every open-source project since they provide a smart and cheap way to represent modifications in a file.

2) Why patching files?

Patching files, as said above, is the smarter and cheaper way to deal with modifications in a file. Think about a file that is, say, 4 KB in size (it can be a C program, doesn’t matter). Now, imagine that you modified just 2 lines in it, totaling 10 bytes (i.e., these 2 lines are 10 bytes long). If you try to represent this difference without using ‘diff’ and ‘patch’, you’ll have (in the worst case) to send your entire 4 KB file to all your friends/colleagues that are working on this file with you. Now if you suppose you have 100 people working with you (a large open-source project), you would have to send almost 400 KB (4 KB x 100 = 400 KB) through your network! Ok, nowadays that may be a negligible amount thanks to our DSL connections, but imagine doing this 10 times a day…

Now, brighten up your world and think that you do have ‘diff’ and ‘patch’! The ‘diff’ program would represent this modification using approximately 30 bytes (10 bytes to represent the excluded lines, 10 bytes to represent the added lines, and another 10 bytes for the header). Ok, now doing some math again, you would have to send almost 3 KB (30 bytes x 100 = 3000 bytes), which is far lower than the previous value! You get a great decrease in network usage and, most important of all, you improve the readability of your modification! Now you don’t have to tell your friends “hey, take a look at lines X and Y because I modified something there to fix the problem Z”. You just have to say “that’s the modification I’ve done to fix the problem Z”. Got it? :-)

3) How to make your own patch

Finally, now that you have understood what a patch is, let’s try to make our own. First, let’s write something in a file. I’ll assume that you know C, but even if you don’t you can keep reading because I’ll not focus on the code.

sergio@miki /tmp/patch_howto $ cat ourfile.c
#include <stdio.h>

void print_hello (void)
{
        printf ("Hello, World!\n");
}

int main (int argc, char **argv)
{
        print_hello ();
        return 0;
}

Ok, so here we are with our beautiful Hello World. But you’re still not happy with it. You want it to print something else! Maybe a longer sentence? Maybe you want to remove that comma between the two words! Doesn’t matter. I myself prefer it to say something else, like “Hello, World! This is a post about patching files!” So I’ll do the modification, but as I want to make a patch, I’ll create a “backup” version of the original file so that I can take the differences later.

sergio@miki /tmp/patch_howto $ cp ourfile.c ourfile.c.original

It also doesn’t matter how you choose your backup’s name, but it’s kind of a convention to name it like “file.orig” or “file.original”. Now, we can modify our code and include that other phrase that we want it to say. Done? Good. Now comes the cool stuff :-). Go one directory up (cd ..), and type:

sergio@miki /tmp $ diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c

(Obviously, you should change the path used in the example above to reflect your directory’s name and file’s name).

You should see something pretty much like:

--- patch_howto/ourfile.c.original      2008-06-27 22:22:25.000000000 -0300
+++ patch_howto/ourfile.c       2008-06-27 22:25:32.000000000 -0300
@@ -2,7 +2,7 @@

 void print_hello (void)
 {
-       printf ("Hello, World!\n");
+       printf ("Hello, World! This is a post about patching files!\n");
 }

 int main (int argc, char **argv)

Wow! My first patch! Yep :-). Now, save it to a file:

sergio@miki /tmp $ diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c > myfirstpatch.patch

And you’re done :-). Now you have your first patch in your hands. So let’s see what we just did in the line above.

First, you invoked the ‘diff’ program passing two arguments to it: your original file, and your modified file (be aware that the order of these arguments does matter). You also passed one modifier to it, namely -u. This modifier means that you want ‘diff’ to generate the differences in a “unified format” (this is the “pre-formatted text” that I mentioned earlier). This unified format allows the ‘patch’ tool to recognize and correctly apply your modifications, and has the advantage of being more human-readable than other formats that ‘diff’ can output (as an exercise, try to run this ‘diff’ command without using the -u and see what happens).

Also, it’s good to notice that it’s a convention to generate your patch outside your code’s directory tree. Some people also prefer to keep two directories (one “original” and another “modified”), and to run ‘diff’ against those dirs. I won’t tell you to do this or that, because it depends on your needs. But it is certainly a good idea to make some tests and find out what’s your preferred way to work with patching :-).

4) Applying someone’s patch into your code

You know how to generate the patch, but you still don’t know how to apply it? No problem :-). For this lesson, let’s assume that you still have the old “ourfile.c” code, which prints only “Hello, World!”. As you’re not satisfied with it either, you decide to apply the patch that makes the program print “Hello, World! This is a post about patching files!”. Remember, your patch is named “myfirstpatch.patch”.

So first make sure you have the old version of the code:

sergio@miki /tmp/patch_howto $ rm ourfile.c
sergio@miki /tmp/patch_howto $ mv ourfile.c.original ourfile.c

Now, all you have to do is:

sergio@miki /tmp/patch_howto $ patch -p1 < ../myfirstpatch.patch
patching file ourfile.c

Wow again! Try opening your file and see what happened. Now the full message should be there as expected. But wait… what did we do this time?

Now, you invoked ‘patch’ without naming any file, but you provided data through its standard input. That’s the right way to do it: redirect your patch file to the input of the program. Now, you may notice that we passed one modifier to the tool: the -p1. Do you remember that we generated our patch from one directory above the file’s dir? And do you remember that in the patch’s header there was information about its relative path? That’s why we must provide the -p1 option: we are telling ‘patch’ to remove the first (leftmost) directory from this relative path, and start applying the patch from the current directory. As the relative path was patch_howto/FILENAME, it removed the patch_howto part and applied the patch to FILENAME (which, in our case, is ourfile.c).

It’s also good to notice that it’s a very common practice in open-source projects to make/apply patches like that. Always go 1 directory above when making the patch, and always provide the -p1 modifier when applying it.

5) Extra modifiers for diff

There are also other good modifiers for the ‘diff’ command. You’ll often find something like -pruN. They mean:

-p: Show, in the header of each chunk of the diff, which C function the change is in. It’s useful when dealing with large files and lots of modifications.

-r: Go recursive (used when you have that scheme with 2 directories that I mentioned above).

-u: See above :-)

-N: Treat absent files as empty, so if you have new files in your “modified directory” they are included in the diff (and you don’t get error messages).

Well, that’s it for now :-). As usual, take a look at the manpages for both commands; they should help you more than this post!

Bye!

How to personalize a package’s CFLAGS in Gentoo

Ok, last attempt to revive this blog ;-)

Well, there’s a little trick that I’d like to share with you guys that use Gentoo. Some time ago, I had to modify the value of the $CFLAGS environment variable for 2 packages in Portage because they had some problems when the debugging option was enabled for configure. After some searching, I found the solution for this. Here goes the step-by-step (it’s pretty easy, I promise):

1) First, find the package’s path inside Portage. This path is basically composed of <category>/<package_name>. Let’s suppose that this path is app-editors/vim, so if your Portage tree is under /usr/portage, you should have a directory called /usr/portage/app-editors/vim/.

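2) Now, you should create a file named /etc/portage/env/<category>/<package_name> (creating the category directory first, in case it doesn’t exist yet), just like this:

#> mkdir -p /etc/portage/env/app-editors
#> touch /etc/portage/env/app-editors/vim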

3) Edit this file with your preferred editor and put there the $CFLAGS value that you want, like this:

CFLAGS="-D_HAVE_XYZ_"

If you want to append some value to the existing $CFLAGS, just do this:

CFLAGS="${CFLAGS} -D_HAVE_XYZ_"

4) You are ready to go! :D

See? It’s pretty easy. Note that you can also add other variables (like $LDFLAGS), and emerge will handle them for you. If you have some trouble with the procedure, just let me know :-).

See ya!

I’m still here

I know it may sound like an apology, but I really didn’t have time to update this blog :-( … Yeah, yeah, you must be thinking “when will this guy stop talking sh*t and post something useful here?”, but I promise that from now on I will take care of this space ;-) … I’m sorry for the inconvenience :P

My brand new blog ;-)

Hello everybody!

This is my brand new blog, about technology, computers and stuff like that! I have another blog running (http://elendur.com/blog), but it’s about wild thoughts that I have (personal stuff, you know), and it’s in Portuguese. Feel free to drop by there!
Well, that’s it! I hope you enjoy it, and stay tuned for the first useful post!  See ya!
