If you have ever contributed to an open-source project, you have probably already had to handle patches (preparing them, sending them, and applying them to the code). If so, maybe you don’t need to read this post at all (though I’d recommend you do so anyway). But if that’s not your case and you’re starting to contribute to open source, or if you just want to learn something new and interesting, keep reading :-).
I’ll try to organize this post in a few topics so you’ll feel more comfortable in reading it. Let’s start.
1) A little introduction
The ‘patch‘ utility is widely considered one of the most useful programs ever written. Together with ‘diff‘, it forms the “dynamic duo” of the vast majority of FLOSS projects.
Basically, what it does is simple: you feed it a specially formatted text (we’ll see how to generate this “pre-formatted text” later) that represents the differences between two versions of the same file (say, “YOUR_FILE” and “YOUR_FILE_MODIFIED”). Then, ‘patch‘ modifies “YOUR_FILE” so that it becomes an exact copy of “YOUR_FILE_MODIFIED”. These differences are generated by a program called ‘diff‘.
The ‘patch‘ tool was written by Larry Wall (creator of the Perl programming language) in 1985, and the ‘diff‘ tool was written by Douglas McIlroy in the 1970s. They are the basis for almost every open-source project, since they provide a smart and cheap way to represent modifications to a file.
2) Why patching files?
Patching files, as said above, is the smartest and cheapest way to deal with modifications to a file. Think about a file that is, say, 4 KB in size (it can be a C program, it doesn’t matter). Now, imagine that you modified just 2 lines in it, totaling 10 bytes (i.e., these 2 lines are 10 bytes long). If you try to share this change without ‘diff‘ and ‘patch‘, you’ll have (in the worst case) to send your entire 4 KB file to all the friends/colleagues who are working on this file with you. Now suppose 100 people are working with you (a large open-source project): you would have to send almost 400 KB (4 KB x 100 = 400 KB) through your network! OK, nowadays that may be a negligible amount thanks to our DSL connections, but imagine doing this 10 times a day…
Now, brighten up your world and imagine that you do have ‘diff‘ and ‘patch‘! The ‘diff‘ program would represent this modification using approximately 30 bytes (10 bytes for the removed lines, 10 bytes for the added lines, and another 10 bytes for the header). A real diff is a bit larger than that, since it also carries file names and a few lines of context, but the order of magnitude holds. Doing the math again, you would have to send about 3 KB (30 bytes x 100 = 3000 bytes), which is far less than the previous value! You get a big reduction in network usage and, most importantly, you improve the readability of your modification! Now you don’t have to tell your friends “hey, take a look at lines X and Y because I modified something there to fix problem Z”. You just have to say “this is the modification I made to fix problem Z”. Got it? :-)
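To get a feeling for these numbers, here is a rough sketch in shell. The file name, its contents, and the scratch directory are made up for illustration, and the exact byte counts will differ from the estimate above, so treat it as an experiment rather than a benchmark.

```shell
#!/bin/sh
# Sketch: compare the size of a whole file with the size of a patch for it.
# All names here (big.txt, big.txt.orig, change.patch) are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a file of 200 short lines (a few KB in total).
i=1
while [ "$i" -le 200 ]; do
    echo "this is line number $i" >> big.txt
    i=$((i + 1))
done

cp big.txt big.txt.orig

# Change just two lines (lines 50 and 150).
sed -e '50s/.*/changed A/' -e '150s/.*/changed B/' big.txt.orig > big.txt

# diff exits with status 1 when the files differ, which is expected here;
# any other non-zero status is a real error.
diff -u big.txt.orig big.txt > change.patch || [ $? -eq 1 ]

# tr strips the padding that some wc implementations print.
file_size=$(wc -c < big.txt | tr -d ' ')
patch_size=$(wc -c < change.patch | tr -d ' ')
echo "whole file: $file_size bytes"
echo "patch:      $patch_size bytes"
```

The patch comes out larger than 30 bytes because of the header and the context lines, but it should still be a small fraction of the full file.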
3) How to make your own patch
Now that you understand what a patch is, let’s finally make our own. First, let’s write something in a file. I’ll assume that you know C, but even if you don’t, you can keep reading because I won’t focus on the code.
sergio@miki /tmp/patch_howto $ cat ourfile.c
#include <stdio.h>

void print_hello (void)
{
  printf ("Hello, World!\n");
}

int main (int argc, char **argv)
{
  print_hello ();
  return 0;
}
OK, so here we are with our beautiful Hello World. But you’re still not happy with it. You want it to print something else! Maybe a longer sentence? Maybe you want to remove that comma between the two words! Doesn’t matter. I myself prefer it to say something else, like “Hello, World! This is a post about patching files!” So I’ll make the modification, but since I want to make a patch, I’ll first create a “backup” version of the original file so that I can take the differences later.
sergio@miki /tmp/patch_howto $ cp ourfile.c ourfile.c.original
It doesn’t matter what you name your backup, but it’s kind of a convention to name it something like “file.orig” or “file.original“. Now we can modify our code and include that other phrase we want it to say. Done? Good. Now comes the cool stuff :-). Go up one directory (cd ..), and type:
sergio@miki /tmp $ diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c
(Obviously, you should change the path used in the example above to reflect your directory’s name and file’s name).
You should see something pretty much like:
--- patch_howto/ourfile.c.original 2008-06-27 22:22:25.000000000 -0300
+++ patch_howto/ourfile.c 2008-06-27 22:25:32.000000000 -0300
@@ -2,7 +2,7 @@
 
 void print_hello (void)
 {
-  printf ("Hello, World!\n");
+  printf ("Hello, World! This is a post about patching files!\n");
 }
 
 int main (int argc, char **argv)
Wow! My first patch! Yep :-). Now, save it to a file:
sergio@miki /tmp $ diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c > myfirstpatch.patch
And you’re done :-). Now you have your first patch in your hands. So let’s see what we just did in the line above.
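If you would like to replay these steps in one go, the sketch below does so in a scratch directory. The body of ourfile.c used here is an assumption made for the example (any small C file will do), and the edit is done with sed instead of a text editor.

```shell
#!/bin/sh
# Sketch: reproduce the patch-making steps in a scratch directory.
# The body of ourfile.c below is an assumption made for this example.
base=$(mktemp -d)
mkdir "$base/patch_howto"
cd "$base/patch_howto" || exit 1

cat > ourfile.c <<'EOF'
#include <stdio.h>

void print_hello (void)
{
  printf ("Hello, World!\n");
}

int main (int argc, char **argv)
{
  print_hello ();
  return 0;
}
EOF

# Keep a backup of the original, then modify the working copy.
cp ourfile.c ourfile.c.original
sed 's/Hello, World!/Hello, World! This is a post about patching files!/' \
    ourfile.c.original > ourfile.c

# Go up one directory and write the unified diff to a file.
# diff exits with status 1 when the files differ, which is expected here.
cd ..
diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c \
    > myfirstpatch.patch || [ $? -eq 1 ]

cat myfirstpatch.patch
```

The timestamps in the header will of course be your own.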
First, you invoked the ‘diff‘ program passing two arguments to it: your original file and your modified file (be aware that the order of these arguments does matter: swap them and the patch describes the opposite change). You also passed one modifier to it, namely -u. This modifier means that you want ‘diff‘ to generate the differences in the “unified format” (this is the “pre-formatted text” that I mentioned earlier). The ‘patch‘ tool recognizes this format and applies your modifications from it, and it has the advantage of being more human-readable than the other formats that ‘diff‘ can output (as an exercise, try running this ‘diff‘ command without -u and see what happens).
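For the curious, here is a small sketch of that exercise, plus a look at what happens when the two arguments are swapped. The files old.txt and new.txt and their contents are invented for the example.

```shell
#!/bin/sh
# Sketch: the same change in diff's default ("normal") format,
# and in unified format with the arguments in both orders.
# File names and contents are made up for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\n' > old.txt
printf 'one\nTWO\nthree\n' > new.txt

# Default format: line numbers plus "<" (old) and ">" (new) lines, no context.
diff old.txt new.txt > normal.diff || [ $? -eq 1 ]
cat normal.diff

# Unified format, original first: "two" is removed and "TWO" is added.
diff -u old.txt new.txt > forward.diff || [ $? -eq 1 ]

# Swapped order: the patch now describes the opposite change.
diff -u new.txt old.txt > backward.diff || [ $? -eq 1 ]

# Show only the changed lines of both diffs (skipping the ---/+++ headers).
grep '^[-+][^-+]' forward.diff backward.diff
```

Applying backward.diff to the new file would take you back to the old one, which is rarely what you want to send to your colleagues.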
Also, it’s worth noting that it’s a convention to generate your patch from outside your code’s directory tree. Some people also prefer to keep two directories (one “original” and another “modified”) and to run ‘diff‘ against those directories. I won’t tell you to do this or that, because it depends on your needs. But it is certainly a good idea to experiment and find out which way of working with patches you prefer :-).
4) Applying someone’s patch into your code
You know how to generate a patch, but you still don’t know how to apply one? No problem :-). For this lesson, let’s assume that you still have the old “ourfile.c” code, which prints only “Hello, World!“. As you’re not satisfied with it either, you decide to apply the patch that makes the program print “Hello, World! This is a post about patching files!“. Remember, your patch is named “myfirstpatch.patch”.
So first make sure you have the old version of the code:
sergio@miki /tmp/patch_howto $ rm ourfile.c
sergio@miki /tmp/patch_howto $ mv ourfile.c.original ourfile.c
Now, all you have to do is:
sergio@miki /tmp/patch_howto $ patch -p1 < ../myfirstpatch.patch
patching file ourfile.c
Wow again! Try opening your file and see what happened. Now the full message should be there as expected. But wait… what did we do this time?
This time you invoked ‘patch‘ without any file arguments, but you provided data through its standard input. That’s the usual way to do it: redirect your patch file to the standard input of the program. You may also notice that we passed one modifier to the tool: -p1. Do you remember that we generated our patch from one directory above the file’s directory? And do you remember that the patch’s header contained the file’s relative path? That’s why we must provide the -p1 option: it tells ‘patch‘ to strip the first (leftmost) directory from this relative path and apply the patch starting from the current directory. As the relative path was patch_howto/FILENAME, it removed the patch_howto part and applied the patch to FILENAME (which, in our case, is ourfile.c).
It’s also worth noting that making and applying patches like this is a very common practice in open-source projects: go up one directory when making the patch, and provide the -p1 modifier when applying it.
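Here is a compact sketch of the whole round trip, in case you want to try it without touching your real files. The directory and file names mirror the ones in this post, but the file content is reduced to a single line, and the final step uses the -R option of ‘patch‘ (apply the patch in reverse), which this post has not covered.

```shell
#!/bin/sh
# Sketch: apply a patch with -p1, then undo it with -R.
# Directory and file names mirror the post; the file content is simplified.
base=$(mktemp -d)
mkdir "$base/patch_howto"
cd "$base" || exit 1

printf 'Hello, World!\n' > patch_howto/ourfile.c.original
printf 'Hello, World! This is a post about patching files!\n' > patch_howto/ourfile.c

# Made one directory above the file, so the header paths start with patch_howto/.
diff -u patch_howto/ourfile.c.original patch_howto/ourfile.c \
    > myfirstpatch.patch || [ $? -eq 1 ]

# Restore the old version, as in the post.
mv patch_howto/ourfile.c.original patch_howto/ourfile.c

# Inside the directory, -p1 strips "patch_howto/" from the paths in the header.
cd patch_howto || exit 1
patch -p1 < ../myfirstpatch.patch
applied=$(cat ourfile.c)
echo "after applying:  $applied"

# -R applies the patch in reverse, bringing the old content back.
patch -R -p1 < ../myfirstpatch.patch
echo "after reverting: $(cat ourfile.c)"
```

Had you stayed in the parent directory, the paths in the header would already match, and -p0 (strip nothing) would be the option to use.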
5) Extra modifiers for diff
There are also other useful modifiers for the ‘diff‘ command. You’ll often see something like -pruN. They mean:
-p: Show, in the header of each changed block, the C function in which the change occurs. It’s useful when dealing with large files and lots of modifications.
-r: Go recursive (used when you have the two-directory scheme that I described above).
-u: See above :-)
-N: Treat absent files as empty, so if you have new files in your “modified directory” they are included in the diff (and you don’t get error messages).
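To see the four modifiers working together, here is a sketch that compares two small directory trees. The directory names “original” and “modified”, and the files inside them, are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: diff two directory trees with -pruN.
# The directories "original" and "modified" and their files are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir original modified

cat > original/hello.c <<'EOF'
#include <stdio.h>

int main (void)
{
  int a = 1;
  int b = 2;
  int c = a + b;
  printf ("Hello %d\n", c);
  return 0;
}
EOF

# The modified tree changes one line and adds a brand new file.
sed 's/Hello/Hello again/' original/hello.c > modified/hello.c
echo "notes for the new version" > modified/NOTES

# -r: recurse into directories, -u: unified format,
# -N: treat missing files as empty, -p: name the enclosing C function.
diff -pruN original modified > tree.patch || [ $? -eq 1 ]
cat tree.patch
```

In the output, the block for hello.c should carry the function name after the second @@, and NOTES should show up as a file made only of added lines. Someone holding a copy of the original tree could then apply tree.patch from inside that tree with -p1, just as we did before.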
Well, that’s it for now :-). As usual, take a look at the manpages for both commands; it should help you more than this post!