Dec 31 2009

Using what’s available

datchley

I’ve seen a lot of questions from new and beginning programmers on various forums and other online communities. Many of them are geared toward a particular language, like shell programming, Perl or C, but nearly all of them are set in some Unix-like environment, primarily Linux. Often they ask how to do a given task in a given language when that task might be better done another way. Yes, there are times when you’re hand-cuffed to a particular language or operating environment; but there are generally 1001 ways to skin a cat in Unix, and people tend to forget this as they pick up a language like Perl, Python or Ruby. If you are writing scripts and programs on Unix that operate on the back end of things (i.e. not CGI or web related, and not dealing with GUI interfaces), you are almost always better off combining your primary language with the smaller scripting languages and other tools Unix provides. They make your life easier and help keep your code manageable.

In our example, someone on a Perl forum was asking how to copy a set of files from a directory and its subdirectories into another single directory. All the files were gzipped, with a .gz extension. Doing this in Perl with File::Find or some similar module is a bit of overkill for the task; the Unix shell and the commands that come with it can handle it with less typing.

$ find . -type f -name '*.gz' -exec mv {} /new/parent/folder/ \;

This is very quick, and it can even be called from a Perl script via system() or backticks. Doing it in Perl with File::Find, by contrast, takes a number of lines to find the matching files in the directory tree, and you’d still have to write the code to move each one to the new directory. Just call the snippet above from your Perl script, check its return status and move on (swap mv for cp if you want to copy rather than move). No need to reinvent the wheel.
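If you’d rather check the return status from the shell side, the move can be wrapped in a small function. This is a minimal sketch, not the forum poster’s solution: the function name `collect_gz` is hypothetical, and it uses find’s `-exec ... {} +` form, which hands batches of names to one command and makes find itself exit non-zero if any of those commands fails (with the `\;` form, a failing `mv` does not change find’s exit status).

```sh
#!/bin/sh
# collect_gz SRC DEST: move every *.gz file found under SRC into DEST.
# Hypothetical helper. Keep DEST outside SRC, or find may revisit files
# it has already collected.
collect_gz() {
	src=$1
	dest=$2
	mkdir -p "$dest" || return 1
	# "sh -c" receives DEST as $0 and a batch of file names as "$@",
	# so a single mv moves the whole batch. With "{} +", find exits
	# non-zero if any of those mv invocations fails.
	find "$src" -type f -name '*.gz' -exec sh -c 'mv "$@" "$0"' "$dest" {} +
}

# Example: collect_gz . /new/parent/folder || echo "move failed" >&2
```

From Perl, the same status comes back as the return value of system(), so the caller can test it in the usual way.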

For reference, here’s the equivalent operation using Perl only:

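#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use File::Copy;

my $target = "/new/parent/folder";

find(\&filemove, ".");
exit(0);

# Called by File::Find for each entry. File::Find has already chdir'ed
# into the entry's directory, so $_ holds just the base name.
sub filemove {
        return unless -f $_ && /\.gz$/;
        move($_, "$target/$_") or warn "Cannot move $File::Find::name: $!\n";
}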

That’s much more coding. And honestly, you’re not gaining much portability unless you plan on running this in a non-Unix environment as well. Remember, there’s no problem combining different languages or scripts to accomplish a task. That’s what all those tools are there for to begin with!

Linux and Unix provide developers with a deep and rich toolset of commands and utilities, which they should take advantage of when solving problems. It was a design goal of Unix to provide discrete commands and tools with very specific functionality that could be combined to solve more complex problems, so why not extend that paradigm to include all the great modern languages and tools we use today as well?

Enjoy!


Dec 18 2009

chext: Batch rename file extensions

datchley

Most Linux systems come with a rename command today, but some of the commercial Unixes like AIX and HP-UX lack many of the commands that Linux users have come to rely on. This isn’t quite the same as rename, but it’s a script I put together a while back because I was changing the extension on a number of files quite frequently on those systems. Here’s the script; feel free to copy/paste and use it.

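#!/bin/sh
# chext - batch rename files by changing the file extension
# Author: Dave Atchley <dave@tuxz0r.net>
#----------------------------------------------------------------------

usage() {
	echo "Usage: $0 [-R] OLD NEW"
	echo
	exit 1
}

# rename_one FILE: replace the trailing $ext on FILE with $new
rename_one() {
	# Skip an unmatched glob (left as the literal pattern) and non-files
	[ -f "$1" ] || return 0
	mv "$1" "${1%"$ext"}$new"
}

case "$1" in
# -R recursive option
-R)	if [ $# -ne 3 ]; then
		echo "error: missing arguments"; usage
	fi
	ext=$2
	new=$3
	# Double quotes let the shell expand $ext before find sees the
	# pattern; reading one name per line keeps names with spaces intact.
	find . -type f -name "*$ext" | while IFS= read -r f
	do
		rename_one "$f"
	done
	;;
-h)	echo "chext: change the extension on multiple files"; usage
	;;
*)	if [ $# -ne 2 ]; then
		echo "error: missing arguments"; usage
	fi
	ext=$1
	new=$2
	for f in *"$ext"
	do
		rename_one "$f"
	done
	;;
esac
exit 0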

By default the script renames files in the current directory only; the -R option makes it descend into subdirectories as well. You can call it from the command line as

$ chext .old .new

or recursively as

$ chext -R .old .new
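Before renaming a whole tree, it can help to see what `chext -R` would do. The sketch below is hypothetical (the `new_name` and `preview` helpers are not part of chext): it computes each new name with POSIX parameter expansion instead of sed, so a dot in the extension is matched literally, and it only prints the planned renames.

```sh
#!/bin/sh
# new_name FILE OLD NEW: print FILE with a trailing OLD replaced by NEW.
# Names that don't end in OLD are printed unchanged. (Hypothetical helper.)
new_name() {
	case "$1" in
	*"$2") printf '%s\n' "${1%"$2"}$3" ;;
	*)     printf '%s\n' "$1" ;;
	esac
}

# preview OLD NEW: dry run of a recursive rename starting in the
# current directory; nothing is moved.
preview() {
	old=$1
	new=$2
	find . -type f -name "*$old" | while IFS= read -r f
	do
		printf '%s -> %s\n' "$f" "$(new_name "$f" "$old" "$new")"
	done
}
```

For example, `preview .old .new` in a directory containing sub/report.old prints `./sub/report.old -> ./sub/report.new` and leaves the file alone.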


Dec 17 2009

Staging Unit Tests using rsync

datchley

Everyone has their own development process and tools.  In most software processes, though, there is a spot for “unit testing” by the developer on the given feature or defect they are working on.  Since I’m primarily involved in web development using PHP these days, this involves only a couple of steps:

  1. design and code the feature/bug fix
  2. push those changes out to our development server and test

The second part is what we’ll address in this article: namely, how I currently get the files I’m coding up to the development environment for testing in an efficient manner. A warning: if you’re using a language that needs compiling, such as Java, rather than a scripting language like Ruby or PHP, this method will not work for you as-is. However, feel free to continue reading.

As a developer, I want to be able to quickly push my coding changes out for testing during the course of development. This lets me break the given feature or bug fix down into more digestible parts and get continuous feedback on how my design is holding up and whether I need to make any changes or corrections.  Unit testing is not final “Acceptance Testing” of a feature, but a way to ensure that the given code builds and runs without any glaring errors or faults.

At our office we distribute our web applications much like the open source community distributes its software: using GNU autotools to build a tar.gz package. This makes installs simple, especially since another group in our company does the actual installations (and frankly, sometimes even this method isn’t simple enough for them, but I won’t mention any names). However, if I need to push files out quickly to development so I can unit test, I don’t want to take the time to build a completely new autotools package of the system I’m working on just to install one or two files with a few changed lines. That seems like overkill. Remember, good coders are inherently lazy: Larry Wall famously named laziness, impatience and hubris as the three great virtues of a programmer. ;-)

Luckily, for a number of our systems the code in our repository is structured just like the actual installation. So, for a given project we might have php/, js/, css/, img/ and other assorted directories in our repository, and this is exactly the same structure we’d have in our installation. Since we’re set up this way, I can use a command called rsync to script pushing files out to development for testing, without worrying about building a brand new package.

If you are familiar with ssh or rsh, and you like them, you will love rsync. The rsync command is installed by default on most Linux systems, but if your flavor doesn’t have it, it is free to download and install, and you don’t need to be root to build and install it in your own account. You will, however, need it installed on each machine you want to rsync between. In most cases rsync uses ssh for remote file transfers, though this can be set up differently if you want. This means that if you have distributed ssh keys to your servers, rsync will take advantage of them when it’s copying files.

The important things to note about rsync from our standpoint are that it copies files to a remote server, takes advantage of ssh (which we like), and does so quickly using a delta-transfer algorithm that sends only the parts of files that have changed. Basically, we’re using rsync to “mirror” my working development folder directly on the server. For one of my projects, here is the “sync” shell script I use to push files out for testing this way (names changed to protect the innocent):

#!/bin/bash
#----------------------------------------------------------------------

# Default target to something useful when none is given on the command line
TARGET=${1:-username@dev.company.com:/home/username/public_html/exh/}

# Temporary exclude list; the trap removes it however the script exits
EXFILE=$(mktemp /tmp/excludefiles.XXXXXX) || exit 1
trap 'rm -f "$EXFILE"' EXIT

cat >"$EXFILE" <<EOF
- configure
- Makefile
- **.am
- **.in
- **.cache
- **.log
- .git/***
- m4/***
- build-aux/***
- ext-2.0.2/***
- ext-2.1/***
- sql/***
EOF

echo "Syncing files to location: $TARGET .........."
if rsync --exclude-from="$EXFILE" --delete -ravve ssh ./ "$TARGET" 2>&1; then
 echo "ok"
else
 echo "failed"
 exit 1
fi
exit 0

Let’s talk about this in a bit more detail. The rsync command takes a destination parameter much like scp does, of the form

username@host:/path/to/file/or/folder

This is what we set up at the beginning of the script: I can pass an arbitrary destination on the command line, and without one it defaults to pushing the files out to my development environment. I should also mention that this script lives in the top level of my working code repository, so when I run it from there it copies my entire directory structure using rsync.
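The defaulting relies on shell parameter expansion. Here is a minimal sketch of the same idea, using a hypothetical `pick_target` helper and the placeholder development path from the script: `${1:-word}` yields `word` when the first argument is unset or empty, while the `${target:=word}` form also assigns `word` back to `target`.

```sh
#!/bin/sh
# Placeholder default, standing in for the development server path.
DEFAULT_TARGET='username@dev.company.com:/home/username/public_html/exh/'

# pick_target [DEST]: print DEST, or the default when DEST is unset or empty.
# (Hypothetical helper for illustration.)
pick_target() {
	printf '%s\n' "${1:-$DEFAULT_TARGET}"
}
```

So `pick_target` and `pick_target ''` both print the development path, while `pick_target backup@test.example.com:/srv/exh/` prints the address it was given.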

Now that we have a destination for rsync to use, we also want to tell rsync to ignore certain files and directories. In my case, I don’t want to copy any of the autotools-related files (Makefile.am, configure.in, etc.) or certain subdirectories that relate only to development and configuration and are not actual working code. We do this by listing the items we want to exclude in a temporary file. Here’s a synopsis of the syntax for those file and directory name patterns. The ‘-’ at the beginning of a line tells rsync to exclude matching files during syncing; a ‘+’ would include them instead.

  • if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname. This is similar to a leading ^ in regular expressions. Thus “/foo” would match a name of “foo” at either the “root of the transfer” (for a global rule) or in the merge-file’s directory (for a per-directory rule).
  • if the pattern ends with a / then it will only match a directory, not a regular file, symlink, or device.
  • rsync chooses between doing a simple string match and wildcard matching by checking if the pattern contains one of these three wildcard characters: ‘*’, ‘?’, and ‘[’.
  • a ‘*’ matches any path component, but it stops at slashes.
  • use ‘**’ to match anything, including slashes.
  • a ‘?’ matches any character except a slash (/).
  • a ‘[’ introduces a character class, such as [a-z] or [[:alpha:]].
  • in a wildcard pattern, a backslash can be used to escape a wildcard character, but it is matched literally when no wildcards are present.
  • if the pattern contains a / (not counting a trailing /) or a “**”, then it is matched against the full pathname, including any leading directories. If the pattern doesn’t contain a / or a “**”, then it is matched only against the final component of the filename. (Remember that the algorithm is applied recursively so “full filename” can actually be any portion of a path from the starting directory on down.)
  • a trailing “dir_name/***” will match both the directory (as if “dir_name/” had been specified) and everything in the directory (as if “dir_name/**” had been specified). This behavior was added in version 2.6.7.
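To see a few of these rules in action without touching a server, you can sync between two local directories, since rsync doesn’t need ssh for a local copy. This is a sketch under stated assumptions: the `demo_excludes` function and the sample tree are made up for illustration, it needs `mktemp -d` (GNU or BSD) and rsync 2.6.7 or later for the `dir/***` form, and it uses only four of the patterns from the sync script.

```sh
#!/bin/sh
# demo_excludes: build a throwaway tree, sync it locally with an exclude
# list, and print the files that actually arrived. (Hypothetical helper.)
demo_excludes() {
	work=$(mktemp -d) || return 1
	mkdir -p "$work/src/php" "$work/src/.git/objects" "$work/dest"
	: >"$work/src/configure"
	: >"$work/src/Makefile"
	: >"$work/src/php/Makefile.am"
	: >"$work/src/php/index.php"
	: >"$work/src/.git/objects/abc"

	# No slash or "**": matches the final name component anywhere.
	# "**.am": matched against the full path, so php/Makefile.am too.
	# ".git/***": the directory itself plus everything inside it.
	cat >"$work/excludes" <<'EOF'
- configure
- Makefile
- **.am
- .git/***
EOF

	rsync -a --exclude-from="$work/excludes" "$work/src/" "$work/dest/" || return 1
	# List what arrived, relative to the destination root
	(cd "$work/dest" && find . -type f | sort)
	rm -rf "$work"
}
```

Running `demo_excludes` should list only ./php/index.php.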

Once we have this file set up, we simply call rsync with our target and our exclude list. Then we remove the temporary exclude list and we’re done. The options I’m using with rsync here are as follows:

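  • --delete : I want rsync to remove any files on the “receiving side” that aren’t on my “sending side.” Files matched by the exclude list are left alone on the receiving side.
  • --recursive or -r : I want rsync to recurse into directories and subdirectories to make the copy. duh? (Strictly, -a already implies -r, so it’s redundant here.)
  • --archive or -a : I want to preserve the properties of the files I’m copying: permissions, times, group, owner (when the receiving side runs as root), symlinks and devices. It does not cover hard links, ACLs or extended attributes; those need -H, -A and -X. (Use to taste.)
  • --verbose or -v : I use this twice; the more times I specify it, the more verbose rsync’s output about what it’s doing.
  • --rsh or -e : specify the remote shell to use when copying (we want ssh).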
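Because --delete removes files on the receiving side, it can be worth previewing a sync before running it for real. A minimal sketch, assuming a hypothetical `preview_sync` wrapper: `-n` (`--dry-run`) makes rsync report without changing anything, and `-i` (`--itemize-changes`) prints one line per change, with deletions shown as `*deleting`.

```sh
#!/bin/sh
# preview_sync SRC DEST [EXCLUDE_FILE]: show what an "rsync -a --delete"
# run would change, without touching DEST. (Hypothetical wrapper.)
preview_sync() {
	src=$1
	dest=$2
	excl=$3
	if [ -n "$excl" ]; then
		rsync -a -n -i --delete --exclude-from="$excl" "$src" "$dest"
	else
		rsync -a -n -i --delete "$src" "$dest"
	fi
}

# Example: preview_sync ./ username@dev.company.com:/home/username/public_html/exh/
```

The same check works against a remote target, since rsync uses ssh for a host:path destination.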

Again, rsync has more options than I could shake a stick at so please check out the man page and do some reading.

By using rsync, I’ve given myself a quick and easily configurable way to copy code and files out to an environment for testing without having to build entire packages. Hopefully this is useful to you as well; but if your working repository doesn’t mirror the structure of your actual installation, it might not suit your situation. This is just what works for me on my current projects, and I’m sure that will change in the future too.