I hit a very fun bug yesterday while trying to run a script that sends emails to certain subsets of Hacker Schoolers. When I tried to test the script locally, I discovered that one of the tables of the database,
Batch, was missing from my local version. After briefly panicking and making sure that the actual site was still up, I could dig in.
It turns out that my local version of psql was way out of date, and as of a few days ago we’d started using a data type that wasn’t present in my old version. Because of that, creating that particular table failed when I pulled from the production database the night before. The failure was logged, but the output is so verbose that I didn’t notice the problem. Both the diagnosis and the fix here were easy – I went back and read the logs, googled the data type that was raising an error, and then upgraded Postgres.app and psql. That’s when the real trouble started.
The new version of Postgres.app was placed in a new spot on the $PATH, as you’d expect, and the upgrade prompted me to change my
.bashrc, which I did. But the rake tasks we use to manage local copies of the database errored out with this message:
This was pretty clearly a $PATH problem. I tried the usual things first, like sourcing my
.bashrc in the terminal I was using, closing the terminal and opening a new one, etc. None of that worked.
One thing that jumped out to me was the
sh in the error message. That was an indicator that rake wasn’t using bash as a shell – it was using
sh – which means my
.bashrc wasn’t setting the environment. Reading the rake task showed that it was a thin wrapper around lots of system calls via Ruby’s
system("cmd here"). I added the line
system("echo $PATH") and verified that the new location of
pg_restore wasn’t in it.
At this point I found I had lots of questions about the execution context of the rake task. Since I was making system calls and could easily edit the rakefile, I added in the line
system("sh") to drop me into a shell mid-execution. This turned out to be an efficient way to figure out what was going on (and made me feel like a badass hacker).
From within in that shell, I could do
$$ to get that process’s PID, then repeatedly do
ps -ef | grep [PID] to find the parent process.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Aha! The parent process of the rake task I was running is the spring server, which starts on boot – several days ago, at the time – and doesn’t have the new and updated $PATH information.1 A kick to the spring server (with
kill 2913) forced the server process to restart with the new environment.
It turns out there’s a handy utility called
pstree2 (brew installable) to visualize the tree of processes. This would have saved me a couple of steps of grepping. For example:
1 2 3 4 5
This bug and some related ones have gotten me more interested in operating systems, and I’ve started reading the book Operating Systems: Three Easy Pieces. I’m only a few chapters in, but so far it’s readable, clear, and entertaining. I look forward to building up my mental model of processes and environments as I keep reading it.