Tuesday, January 18, 2011

Using the GNU tools for find and replace on Windows

There are many ways to do a find and replace across multiple files on Windows. I use the familiar GNU tools: find, xargs, grep and sed.

I use the GnuWin32 versions of the Unix utilities. I have used UnixUtils in the past, but they are now outdated. The automated GnuWin32 download tool (GetGnuWin32) downloads and manages the GnuWin32 utilities; read the Readme in the getgnuwin32 distribution folder for instructions.

Using GNU utilities at the Windows command prompt is not without pitfalls. You must quote arguments with double quotes only, because cmd.exe does not treat single quotes as quote characters. File names containing spaces cause problems, and so do the backslash path separators. We handle dodgy file names the same way we do on Linux: by passing around null-delimited lists of file names.
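To see why the null delimiter matters, here is a small Python sketch; it is an illustration only, not part of the GNU toolchain, and the file names and the helpers `to_null_list` and `from_null_list` are hypothetical. Splitting on blanks, roughly what xargs does without --null, cuts a path containing a space into pieces, while splitting on NUL keeps every name whole. GNU xargs in --null mode also stops treating quotes and backslashes as special, which is exactly what Windows paths need.

```python
# Hypothetical Windows file names; note the space in "path to".
names = [r"C:\The\windows\path to\my directory\a.html", r"C:\Temp\b.html"]


def to_null_list(paths):
    """Terminate every name with a NUL character, as find -print0 does."""
    return "".join(p + "\0" for p in paths)


def from_null_list(data):
    """Split on NUL and drop the empty piece after the last terminator."""
    return [p for p in data.split("\0") if p]


# Splitting on blanks (roughly xargs without --null) cuts the first
# name into three pieces; splitting on NUL keeps both names intact.
blank_split = " ".join(names).split()
null_split = from_null_list(to_null_list(names))

print(len(blank_split))     # 4
print(null_split == names)  # True
```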

In this example, I'm replacing the &frasl; HTML character entity (which renders as ⁄, the fraction slash) with the more common slash /. I have broken the job down into smaller steps, and I save the intermediate results to temp files.

First I find all the HTML files and write their names to a temp file. Passing -print0 terminates each file name with a null character instead of a newline.

find "C:\The\windows\path to\my directory" -name "*.html" -type f -print0 > "C:\Temp\tmp.txt"
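For comparison, here is a rough Python analogue of this find step, assuming Python 3 is available. It is a sketch, not the method used in this post: the function name `find_print0` is hypothetical, and writing the list file as UTF-8 is an assumption.

```python
import fnmatch
import os
import tempfile


def find_print0(root, list_path, pattern="*.html"):
    # Rough analogue of: find ROOT -name PATTERN -type f -print0 > LIST_PATH
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # -name is case-sensitive (fnmatchcase); -type f wants regular
            # files, so symbolic links are skipped.
            if (fnmatch.fnmatchcase(name, pattern)
                    and os.path.isfile(full)
                    and not os.path.islink(full)):
                found.append(full)
    # Assumption: the list file is UTF-8; each name ends with a NUL.
    with open(list_path, "w", encoding="utf-8") as f:
        f.write("".join(p + "\0" for p in found))
    return found


# Usage sketch: a throwaway tree with spaces in the directory names.
with tempfile.TemporaryDirectory() as sandbox:
    nested = os.path.join(sandbox, "path to", "my directory")
    os.makedirs(nested)
    for name in ("a.html", "b.txt"):
        with open(os.path.join(nested, name), "w") as f:
            f.write("x")
    list_file = os.path.join(sandbox, "tmp.txt")
    found = find_print0(sandbox, list_file)
    with open(list_file, encoding="utf-8") as f:
        raw = f.read()

print([os.path.basename(p) for p in found])  # ['a.html']
```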

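Next, I use xargs to read in all those null-delimited file names and run grep over them, looking for the &frasl; HTML character entity. With -l, grep writes out only the names of files where it found &frasl;, and --null makes it terminate each of those names with a null character, so the new list is null-delimited too.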

xargs --null --arg-file="C:\Temp\tmp.txt" grep --null -l "&frasl;" > "C:\Temp\tmp2.txt"
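A rough Python analogue of the grep step, under the same assumptions (hypothetical helper names, UTF-8 list files). Because &, ; and letters are not regular-expression metacharacters, the grep pattern here is effectively a literal string, so a plain substring test on each file's bytes is equivalent.

```python
import os
import tempfile


def read_null_list(path):
    # Assumption: list files are UTF-8 with NUL-terminated names.
    with open(path, encoding="utf-8") as f:
        return [p for p in f.read().split("\0") if p]


def write_null_list(path, names):
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(n + "\0" for n in names))


def grep_files_with_match(list_in, list_out, needle=b"&frasl;"):
    # Rough analogue of:
    #   xargs --null --arg-file=LIST_IN grep --null -l NEEDLE > LIST_OUT
    matches = []
    for name in read_null_list(list_in):
        with open(name, "rb") as f:
            if needle in f.read():  # literal substring test on raw bytes
                matches.append(name)
    write_null_list(list_out, matches)
    return matches


# Usage sketch: one file with the entity (and a space in its name), one without.
with tempfile.TemporaryDirectory() as sandbox:
    pages = {"a b.html": b"<p>1&frasl;2</p>", "c.html": b"<p>1/2</p>"}
    paths = []
    for name, body in pages.items():
        path = os.path.join(sandbox, name)
        with open(path, "wb") as f:
            f.write(body)
        paths.append(path)
    write_null_list(os.path.join(sandbox, "tmp.txt"), paths)
    hits = grep_files_with_match(os.path.join(sandbox, "tmp.txt"),
                                 os.path.join(sandbox, "tmp2.txt"))
    hit_names = [os.path.basename(h) for h in hits]

print(hit_names)  # ['a b.html']
```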

Then I use sed to replace every instance of &frasl; with /. Note that I have to escape the / in the replacement as \/, because / is also the delimiter in sed's s command.

xargs --null --arg-file="C:\Temp\tmp2.txt" sed -i -e"s/&frasl;/\//g"
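A rough Python analogue of the sed -i step, again with hypothetical names and UTF-8 list files assumed. Like GNU sed -i, it writes the edited text to a temporary file and then renames that file over the original; unlike sed, it leaves files with no match untouched.

```python
import os
import shutil
import tempfile


def read_null_list(path):
    # Assumption: the list file is UTF-8 with NUL-terminated names.
    with open(path, encoding="utf-8") as f:
        return [p for p in f.read().split("\0") if p]


def sed_in_place(list_in, old=b"&frasl;", new=b"/"):
    # Rough analogue of:
    #   xargs --null --arg-file=LIST_IN sed -i -e"s/&frasl;/\//g"
    # Returns the total number of replacements made.
    total = 0
    for name in read_null_list(list_in):
        with open(name, "rb") as f:
            data = f.read()
        count = data.count(old)
        if count == 0:
            continue  # unlike sed -i, leave unmatched files untouched
        folder = os.path.dirname(os.path.abspath(name))
        fd, tmp = tempfile.mkstemp(dir=folder)
        with os.fdopen(fd, "wb") as f:
            f.write(data.replace(old, new))
        shutil.copymode(name, tmp)  # keep the original permission bits
        os.replace(tmp, name)       # swap the edited copy into place
        total += count
    return total


# Usage sketch: a single page containing two entities.
with tempfile.TemporaryDirectory() as sandbox:
    page = os.path.join(sandbox, "a b.html")
    with open(page, "wb") as f:
        f.write(b"<p>1&frasl;2 and 3&frasl;4</p>")
    list_file = os.path.join(sandbox, "tmp2.txt")
    with open(list_file, "w", encoding="utf-8") as f:
        f.write(page + "\0")
    replaced = sed_in_place(list_file)
    with open(page, "rb") as f:
        result = f.read()

print(replaced, result)  # 2 b'<p>1/2 and 3/4</p>'
```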

Now I check the results. This grep should now find nothing.

xargs --null --arg-file="C:\Temp\tmp.txt" grep "&frasl;"
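A rough Python analogue of the final check, with hypothetical names and UTF-8 list files assumed. Like grep run over several files, it reports each remaining match with its file name (here also with a line number); on real files, an empty result means every &frasl; has been replaced. The sandbox deliberately leaves one entity behind so there is something to report.

```python
import os
import tempfile


def read_null_list(path):
    # Assumption: the list file is UTF-8 with NUL-terminated names.
    with open(path, encoding="utf-8") as f:
        return [p for p in f.read().split("\0") if p]


def grep_lines(list_in, needle=b"&frasl;"):
    # Rough analogue of: xargs --null --arg-file=LIST_IN grep NEEDLE
    # Returns (file name, line number, line) for every remaining match.
    hits = []
    for name in read_null_list(list_in):
        with open(name, "rb") as f:
            for number, line in enumerate(f, start=1):
                if needle in line:
                    hits.append((name, number, line.rstrip(b"\r\n")))
    return hits


# Usage sketch: one clean page and one with a leftover entity.
with tempfile.TemporaryDirectory() as sandbox:
    pages = {"a.html": b"<p>1/2</p>\n", "b.html": b"ok\nmissed &frasl; here\n"}
    paths = []
    for name, body in pages.items():
        path = os.path.join(sandbox, name)
        with open(path, "wb") as f:
            f.write(body)
        paths.append(path)
    list_file = os.path.join(sandbox, "tmp.txt")
    with open(list_file, "w", encoding="utf-8") as f:
        f.write("".join(p + "\0" for p in paths))
    leftovers = [(os.path.basename(n), num, line)
                 for n, num, line in grep_lines(list_file)]

print(leftovers)  # [('b.html', 2, b'missed &frasl; here')]
```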