Introducing CreepyCoder; stalking programmers through commit messages

I've always wondered what commit messages say about a coder, so, inspired by Creepy, I decided to find out.

Every time a coder commits something into a source control repository, they make information about themselves available. Ohloh is one site that attempts to extract information from those commits. To start with, I just wanted to look at the time stamps.
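As a rough illustration of how little is needed to get at those time stamps, here is a minimal Python sketch. It is not the PHP code behind this post. It assumes the history has already been exported with `svn log --xml`, and the function name `parse_svn_log` and the sample log are made up for the example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def parse_svn_log(xml_text):
    """Return (revision, author, timestamp, message) tuples from `svn log --xml` output."""
    commits = []
    for entry in ET.fromstring(xml_text).iter("logentry"):
        # Subversion writes dates in UTC, e.g. 2011-03-01T18:30:00.000000Z
        when = datetime.strptime(entry.findtext("date"), "%Y-%m-%dT%H:%M:%S.%fZ")
        commits.append((
            entry.get("revision"),
            entry.findtext("author", default=""),
            when.replace(tzinfo=timezone.utc),
            entry.findtext("msg", default=""),
        ))
    return commits


# Hypothetical sample input, shaped like the output of `svn log --xml`.
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
  <logentry revision="42">
    <author>alice</author>
    <date>2011-03-01T18:30:00.000000Z</date>
    <msg>Fix login form</msg>
  </logentry>
  <logentry revision="41">
    <author>alice</author>
    <date>2011-02-28T19:05:12.500000Z</date>
    <msg>Add tests</msg>
  </logentry>
</log>"""

for revision, author, when, message in parse_svn_log(SAMPLE_LOG):
    print(revision, author, when.isoformat(), message)
```

The times come back in UTC, so they would need converting to the programmer's local time zone before reading anything into "office hours".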

Firstly, looking at the commits I made on Elastik in a more date-oriented manner: let's put them into an iCal file, so we can view them on a calendar.
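Writing commits out as calendar events can be sketched in a few lines of Python. This is an illustration only, not the PHP scripts from this post: the function names, the `PRODID` and `UID` values and the sample commits are placeholders, and the 15-minute event length is an arbitrary choice, since a commit is really a single instant.

```python
from datetime import datetime, timedelta, timezone


def escape_text(value):
    """Escape the characters that are special inside iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def commits_to_ical(commits, duration_minutes=15):
    """Build an iCalendar document from (timezone-aware datetime, message) pairs."""
    fmt = "%Y%m%dT%H%M%SZ"  # UTC date-time form
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//commit-calendar//EN",  # placeholder product id
    ]
    for index, (when, message) in enumerate(commits):
        start = when.astimezone(timezone.utc)
        end = start + timedelta(minutes=duration_minutes)
        lines += [
            "BEGIN:VEVENT",
            f"UID:commit-{index}@example.invalid",  # placeholder unique id
            f"DTSTAMP:{start.strftime(fmt)}",
            f"DTSTART:{start.strftime(fmt)}",
            f"DTEND:{end.strftime(fmt)}",
            f"SUMMARY:{escape_text(message)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical commits, for illustration only.
sample_commits = [
    (datetime(2011, 2, 28, 19, 5, tzinfo=timezone.utc), "Add tests"),
    (datetime(2011, 3, 1, 18, 30, tzinfo=timezone.utc), "Fix login form; tidy up"),
]
print(commits_to_ical(sample_commits))
```

Saving the returned string to a `.ics` file should be enough for most calendar programs to import it. The sketch does not fold long lines, which a strict parser may expect for very long commit messages.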

A couple of commits on Monday evening, then lots on Tue and Wed during office hours? We'll come back to that. How about aggregating them by hour of the day?

No great surprises there, although even I'm surprised I committed something at 6am (you can't see it in the graph because it only happened once, but I did).
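The hour-of-day aggregation itself is only a few lines. Here is a minimal Python sketch of the idea, not the code that produced the graph: the function names and sample time stamps are invented for the example. Printing the raw count next to each bar means a one-off commit still shows up.

```python
from collections import Counter
from datetime import datetime


def commits_per_hour(timestamps):
    """Return a list of 24 counts, one per hour of the day (index 0 = midnight)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def text_histogram(per_hour):
    """Render the counts as one text bar per hour, with the raw count alongside."""
    return "\n".join(
        f"{hour:02d}:00 {'#' * count} ({count})"
        for hour, count in enumerate(per_hour)
    )


# Hypothetical commit times, for illustration only.
sample_times = [
    datetime(2011, 3, 1, 6, 2),
    datetime(2011, 3, 1, 10, 15),
    datetime(2011, 3, 2, 10, 40),
    datetime(2011, 3, 2, 14, 5),
    datetime(2011, 3, 3, 14, 55),
    datetime(2011, 3, 3, 14, 59),
]
print(text_histogram(commits_per_hour(sample_times)))
```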

Why is there a marked increase on Tue, Wed and Thu? If you know I only work part time, and those are my normal work days, then this graph and the calendar screenshot make sense.
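Grouping by day of the week works the same way. Again this is a minimal Python sketch under my own naming, with made-up sample dates, not the script behind the graph.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def commits_per_weekday(timestamps):
    """Return a dict mapping each day name to its number of commits."""
    # datetime.weekday() numbers the days from Monday = 0 to Sunday = 6.
    counts = Counter(ts.weekday() for ts in timestamps)
    return {name: counts.get(index, 0) for index, name in enumerate(DAY_NAMES)}


# Hypothetical commit times, for illustration only.
sample_times = [
    datetime(2011, 3, 1, 10, 0),   # a Tuesday
    datetime(2011, 3, 2, 11, 0),   # a Wednesday
    datetime(2011, 3, 2, 15, 30),  # a Wednesday
    datetime(2011, 3, 5, 9, 0),    # a Saturday
]
for day, count in commits_per_weekday(sample_times).items():
    print(f"{day} {'#' * count} ({count})")
```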

The basically working proof-of-concept PHP scripts are here. I did try to build a very modular class structure so it's easy to add other methods of collecting or writing data, but the code is quick and dirty. Be warned.

I did it all on a netbook over one weekend. I started the first version on the train on Friday afternoon and got it working when I got home late on Friday night. On Sat and Sun I was out all day, but I did some coding on it both mornings before I went out and both evenings when I came back.

It's not perfect information. It doesn't tell us how long a programmer spent working on a task before committing it, and with systems like Subversion that need network access you can't tell if the programmer worked offline and then committed later when they got access. But still, it turns out it's pretty trivial to get information about a programmer's routine.

This could be extended to other open source activities: comments in an issue tracker, for instance, are all time-stamped. Or to people other than programmers: anyone who contributes to a web forum regularly could find their routine analysed.

Comments welcome!