Tuesday 22 November 2011

The power of scripting

Interesting problem last night: I wanted to find out how far it was
between a bunch of places. I had the location post codes buried away
in a stack of form letters in PDF form. So: convert those to text,
open in Emacs and extract the post codes. I couldn't get multi-line
replace-regexp to work: re-builder successfully highlighted the
lines, but replace-regexp failed to match the same regex. Oh well, I
just wanted to hack this out, so keyboard macros did the job.

A quick Google search later I came up with the static maps API,
so I threw the post codes at that to get back a JSON object with all the
distances. Then, using json.el in ielm, I converted that to a Lisp
structure, picked it apart to extract the relevant fields and ran my
calculations over that, plotting the results using R in org-mode.
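
For the curious, the json.el step looks roughly like the sketch below.
The response text, its field names (distances, from, to, metres), the
post codes and the my/ names are all placeholders made up for
illustration, not the real API's schema; only json-read-from-string
comes from json.el itself.

```elisp
(require 'json)

;; Hypothetical response: field names and post codes are placeholders,
;; not what the real API returns.
(defvar my/sample-response
  "{\"distances\": [{\"from\": \"AA1 1AA\", \"to\": \"BB2 2BB\", \"metres\": 1200},
                   {\"from\": \"AA1 1AA\", \"to\": \"CC3 3CC\", \"metres\": 3400}]}")

(defun my/extract-metres (json-text)
  "Return the list of `metres' values found in JSON-TEXT."
  ;; With json.el's defaults, objects become alists keyed by symbols
  ;; and arrays become vectors.
  (let* ((data (json-read-from-string json-text))
         (entries (cdr (assq 'distances data))))
    ;; mapcar accepts a vector and returns a list.
    (mapcar (lambda (entry) (cdr (assq 'metres entry))) entries)))

(my/extract-metres my/sample-response)
```

Evaluating the last form in ielm gives back a plain list of numbers,
which is easy to feed into whatever calculation comes next.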

We have some fantastic tools at our disposal: this is a fun game.

So, back to the multi-line regex that fails. I've got an input file in
the form:

a
1
b

a
2
b

and want to match the paragraph a..b. Using re-builder I can construct
a regex that matches:

"a\\(.*
?\\)*?*b"

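But using that regex in replace-regexp results in no matches.
Digging into this, it turns out that because I'm using replace-regexp
interactively the escaping rules are slightly different: re-builder
shows the regex as a Lisp string, where every backslash has to be
doubled, whereas the minibuffer prompt takes the regex as typed. So I
only need a single backslash before the brackets.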
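
To see the two escaping levels side by side, here is a minimal sketch
that runs the search from Lisp. It uses a slightly simplified form of
the regexp above with a single non-greedy *? (that simplification is
mine), and my/count-paragraph-matches is a made-up helper name.

```elisp
(defun my/count-paragraph-matches (text)
  "Count matches of the a..b paragraph regexp in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0))
      ;; In Lisp source the string reader consumes one level of
      ;; backslashes, so "\\(" here reaches the regexp engine as \( .
      ;; At the interactive replace-regexp prompt the same regexp is
      ;; typed with single backslashes, and the newline is entered
      ;; with C-q C-j:   a\(.* C-q C-j ?\)*?b
      (while (re-search-forward "a\\(.*\n?\\)*?b" nil t)
        (setq count (1+ count)))
      count)))

;; The sample input from above: two a..b paragraphs.
(my/count-paragraph-matches "a\n1\nb\n\na\n2\nb\n")
```

The doubled backslashes only exist because the regexp is written as a
Lisp string; nothing about the regexp itself changes between the two
forms.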

When is a solution good enough? The keyboard macro approach did the
job quickly and easily. I then spent longer trying to debug the
multi-line regexp than it would have taken me to solve the original
problem by hand. But I learnt something in the process, so surely that
is time well invested. Besides, it was fun!
