Using bash to find a file modified in a specific date/time interval?

Have you ever needed to find a file buried somewhere in a bunch of folders, whose name you didn’t remember, but you did remember roughly when you last modified it? At work, I sometimes need to find a log file, among 20 gigs of log files, that was modified on a certain day. Here is a simple bash trick to do just that: create two marker files whose modification times bracket the interval, then ask find for everything newer than the first and not newer than the second:

 touch -d "13 october 2006 15:00:00" ~/date_start
 touch -d "14 october 2006 21:00:00" ~/date_end
 find some/path -newer ~/date_start -and -not -newer ~/date_end
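
If your find supports the -newermt test (GNU findutils does, and so do recent BSD versions, as far as I know), the two marker files can be skipped and the timestamps passed directly. This is only a sketch: find_between is a name I made up, and unlike the command above it restricts the results to regular files.

```shell
#!/usr/bin/env bash
# find_between is a hypothetical helper name, not a standard command.
# Usage: find_between DIR START END
# Lists regular files under DIR modified after START and not after END.
# Assumes a find that supports -newermt (GNU findutils, recent BSD find).
find_between() {
    local dir=$1 start=$2 end=$3
    find "$dir" -type f -newermt "$start" ! -newermt "$end"
}

# Same interval as the touch-based version above:
# find_between some/path "2006-10-13 15:00:00" "2006-10-14 21:00:00"
```

The advantage is that nothing is left behind in your home directory; the touch-based version remains the safer choice on older systems.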

Vim Tip: Restore cursor’s last position

Here’s a useful Vim trick that turned out to be a real time saver. It’s a set of commands you add to your vimrc file (on Windows) that restores the cursor to where it was when the file was last closed. The original tip comes from the Vim wikia, which is full of very useful Vim tricks like this one. Give it a try!

" Tell vim to remember certain things when we exit
"  '10 : marks will be remembered for up to 10 previously edited files
"  "100 : will save up to 100 lines for each register
"  :20 : up to 20 lines of command-line history will be remembered
"  % : saves and restores the buffer list
"  n... : where to save the viminfo files
set viminfo='10,\"100,:20,%,n~/.viminfo
 
" when we reload, tell vim to restore the cursor to the saved position
augroup JumpCursorOnEdit
 au!
 autocmd BufReadPost *
 \ if expand("<afile>:p:h") !=? $TEMP |
 \ if line("'\"") > 1 && line("'\"") <= line("$") |
 \ let JumpCursorOnEdit_foo = line("'\"") |
 \ let b:doopenfold = 1 |
 \ if (foldlevel(JumpCursorOnEdit_foo) > foldlevel(JumpCursorOnEdit_foo - 1)) |
 \ let JumpCursorOnEdit_foo = JumpCursorOnEdit_foo - 1 |
 \ let b:doopenfold = 2 |
 \ endif |
 \ exe JumpCursorOnEdit_foo |
 \ endif |
 \ endif
 " Need to postpone using "zv" until after reading the modelines.
 autocmd BufWinEnter *
 \ if exists("b:doopenfold") |
 \ exe "normal zv" |
 \ if(b:doopenfold > 1) |
 \ exe "+".1 |
 \ endif |
 \ unlet b:doopenfold |
 \ endif
augroup END

Fixing ASCII/Text to Binary mode in CVS

I’ve often found myself in the situation where I need to quickly fix the file mode from ASCII to Binary in CVS. Here are quick instructions:

cvs admin -kb BinaryFile.xls
cvs update -A BinaryFile.xls
cvs commit -m "make it binary" BinaryFile.xls

Useful keyboard shortcuts for Bash

I’ve been playing around with bash for the past two years now, and found some of these keyboard shortcuts to be very useful. I’ve tried to put the ones I use the most often on here. For other cool bash shortcuts and tricks, visit

  • CTRL + R: Reverse search through history for a previously used command;
  • CTRL + A: Moves your cursor to the beginning of the current line;
  • CTRL + E: Moves your cursor to the end of the current line;
  • ALT + F: Moves your cursor forward one word on the current line;
  • ALT + B: Moves your cursor backward one word on the current line;
  • CTRL + W: Deletes the word before the cursor;
  • ESC + D: Deletes the word after the cursor;
  • CTRL + U: Clears the line before the cursor position. If you are at the end of the line, clears the entire line;
  • CTRL + K: Clears the line after the cursor;
  • CTRL + C: Interrupts (and usually kills) the currently running process;
  • CTRL + Z: Suspends the current foreground job (type “bg” to let it continue in the background, or “fg” to bring it back);
  • TAB: Auto-complete files and folder names;

For the sake of completeness, I’ve added these other shortcuts that I saw on another website, although I do not use them on a regular basis. In reality, there are plenty more bash keyboard shortcuts around, but I have not bothered trying them out yet. For a much more extensive list of shortcuts (way too extensive for my tastes), visit web hosting uk’s blog post on the subject.

  • CTRL + L: Clears the Screen (I use the “clear” command);
  • CTRL + H: Same as backspace;
  • CTRL + D: Exit the current shell (I type “exit” in Putty);
  • CTRL + T: Swaps the last two characters before the cursor;
  • ESC + T: Swaps the last two words before the cursor;
  • CTRL + XX: Swaps the cursor position with the mark (the start of the line if no mark was set), and back again;
  • CTRL + Y: Recovers previous deletion (Not file deletion!);

grep through gzip files vs tarball files

I recently came across a couple of instances, on random forums, where a user was trying to grep through a huge log file that had been tarred and then gzipped. Only then did I realize that some people out there do not know the differences between a gzip file and a tarball, or their respective advantages (actually, that particular dude didn’t know the purpose of a tarball). So, for my personal benefit, I ran a couple of experiments with some big log files (413 megabytes of text) to see what the speed vs storage trade-offs were when using gzip and tar.gz, all with bash (sorry tcsh users!).

A little on gzip files

A gzip file is simply a file that has been made smaller by a compression algorithm called DEFLATE (read more on wikipedia). It is best to use gzip when you are trying to save some space but still want easy access to your files for peeking in them (see the speed results of grepping through log files at the bottom). The scenario where I use gzip the most is compressing 100-200 log files, each 20-30 megs (again, see the stats at the bottom for space gains). An important feature of gzip is that it applies to a single file (this is not a ZIP file, people!), so you will end up with 100-200 little gzip files instead of one big one. This can be a little annoying for transferring to clients, and for handling in general (notice the for loop in my bash script).

# gzip multiple files (assumes the file names contain no whitespace)
for i in $(find . -name "*.log"); do gzip "$i"; done
 
# grep through all those gzip files
for i in $(find . -name "*.log.gz"); do zgrep "TIME" "$i"; done
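
The for loops above split the output of find on whitespace, so a file called "my server.log" would trip them up. A possible variant, sketched here with made-up function names (gzip_logs and grep_gz_logs are not part of gzip), lets find hand the file names to gzip and zgrep itself:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are mine, not part of gzip.

# Compress every *.log file under DIR, one .gz per file.
gzip_logs() {
    find "$1" -type f -name "*.log" -exec gzip {} +
}

# Search every *.log.gz file under DIR for PATTERN.
# -h drops the file-name prefix, to mimic the one-file-at-a-time loop.
grep_gz_logs() {
    find "$1" -type f -name "*.log.gz" -exec zgrep -h "$2" {} +
}
```

Calling gzip_logs . followed by grep_gz_logs . "TIME" should give the same lines as the two loops, and passing many files to each gzip or zgrep call also saves starting one process per log.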

A little on tarball files

The ideal situation for a tarball is when you want to bundle directories and data files into one nice and tidy package for users/clients to download, while preserving file system information such as permissions and directory structure (read more on wikipedia). Strictly speaking, tar only does the bundling; the compression comes from running the archive through gzip, hence the .tar.gz extension. Note that the end product is one file, which can be extremely useful (even a necessity at times). I say a necessity because sometimes, when handing in log files over FTP, you may want one package that you’ve encrypted for your client (using pgp), in which case you wouldn’t want 20-30 small packages all encrypted separately… In any case, to produce a tarball with a lot of logs, you can do the following under bash:

# Create a tarball
tar -czvf logs.tar.gz *.log
 
# grep through a tarball
# (note: this extracts the files into the current directory, then greps them)
zcat logs.tar.gz | tar -xvf - | xargs grep "TIME"
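
If you only want to search and not unpack, tar can write the contents of the archive members to standard output instead of to disk. This is a sketch with a made-up helper name (grep_tarball); it assumes a tar that understands -z and -O, which GNU tar and bsdtar both do to my knowledge. It is not the command that was timed in the statistics further down.

```shell
#!/usr/bin/env bash
# grep_tarball is a hypothetical helper name.
# Usage: grep_tarball ARCHIVE PATTERN
# -x extract, -z gunzip, -O write member contents to stdout, -f archive.
# Nothing is written to disk; matching lines are not labelled by member.
grep_tarball() {
    tar -xzOf "$1" | grep -- "$2"
}

# Example:
# grep_tarball logs.tar.gz "TIME"
```

The trade-off is that you lose track of which log a matching line came from, since all members are streamed as one long text.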

Storage vs Speed for grepping through different types of files

Some statistics on storage vs speed, while grepping in different types of files.

Type of file   Size of files (Megabytes)   Time for grep (Seconds)
RAW            413 248                     2
GZIP           27 656                      3
TARBALL        27 520                      19

Notice the enormous gain in space going from raw log files to gzip log files. You have a 93% reduction in size (RAW to GZIP), compared to a mere 0.5% or so by going from many GZIP files to one tarball. So far I have not talked about the loss in speed when working with compressed files. That is, of course, the most important thing to consider when dealing with files that you will need to peek through from time to time (logs are a perfect example).
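
For anyone who wants to check the percentages, here is the arithmetic as a small sketch. pct_reduction is a made-up helper, and I am assuming the sizes in the table are all in the same unit, with the space acting as a thousands separator.

```shell
#!/usr/bin/env bash
# pct_reduction is a hypothetical helper: percentage saved going from
# size BEFORE to size AFTER, rounded to one decimal place.
pct_reduction() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_reduction 413248 27656   # RAW  -> GZIP,    prints 93.3
pct_reduction 27656 27520    # GZIP -> TARBALL, prints 0.5
```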

To compare the speed of each scenario, I used a very simple bash script, copied below for documentation (notice that I redirect the output to a ‘toto’ file so that nothing gets printed on my screen). The performance of grep on the RAW logs was very good: 2 seconds to find 107 336 occurrences of “TIME” in the 10 logs. Comparing this with the GZIP logs and the TARBALL, at 3 and 19 seconds respectively, you can quickly see that gzip is extremely advantageous for log files (look at the little graph of the different times if you are a more visual person…). Part of the tarball’s cost is that the command extracts every file to disk before grep even starts. Not only is the gain in storage negligible when going from GZIP to TARBALL, but access to your data is also a lot slower.

echo "Testing speed of RAW"
echo "===================="
date
for i in $(find . -name "*.log"); do grep "TIME" "$i" >> toto; done
date
 
echo "Testing speed of GZIP"
echo "====================="
date
for i in $(find . -name "*.log.gz"); do zgrep "TIME" "$i" >> toto; done
date
 
echo "Testing speed of TARBALL"
echo "========================"
date
# extracts the logs into the current directory, then greps them
zcat logs.tar.gz | tar -xvf - | xargs grep "TIME" >> toto
date
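
Reading two dates and subtracting in your head works, but the subtraction can be left to the shell. A minimal sketch, where time_it is a made-up helper and the resolution is whole seconds only (bash’s built-in time keyword would give finer numbers):

```shell
#!/usr/bin/env bash
# time_it is a hypothetical helper: runs a command, discards its output,
# and prints "LABEL: N s" using whole-second timestamps from date +%s.
time_it() {
    local label=$1 start end
    shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$label: $((end - start)) s"
}

# Example with the RAW case from the script above:
# time_it RAW find . -name "*.log" -exec grep "TIME" {} +
```

Because the helper takes a plain command and its arguments, a pipeline such as the tarball one would have to be wrapped, for example as time_it TARBALL bash -c '...'.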

Concluding remark

There are situations where a tarball is necessary (or advantageous), but, in general, to keep the size of many log files down and still be able to search through them, I recommend using gzip. Not to mention that all your favorite bash commands come in a gzip flavour (zcat, zgrep, zdiff, zmore, etc.) and vi can easily read a gzip file on the fly! What more can you ask for?
