awm's blog

New blog engine

Trying out Hugo as a new blog engine. If you’re reading this, then I’ve succeeded.

Why you should always understand what's going on under the hood

This is about how I got bitten by DataMapper. No infection and I didn’t get rabies, but it still hurt.

I should mention that overall I think DataMapper is pretty good. It has most of the core features that an ORM should have without unneeded complexity. It beats the pants off of Active Somethingorother when it comes to making clean, backend-independent models.

Unfortunately, “most of the core features” isn’t quite “all of the core features”, and when you really need that one missing piece, that’s when the trouble starts.

What bit me was the combination of two unrelated things:

I needed an atomic update with conditions attached.
DataMapper represents the Boolean type differently from one SQL backend to another.

The Missing Feature

The glaringly missing feature is the ability to update a value and know whether or not we actually changed anything.

In SQL this would look like this:

UPDATE "quarks" SET "seen" = 't', "whosaw" = 'ders' WHERE "id" = 42 AND "seen" = 'f'

Then I would get the number of affected rows to see if a change was made (i.e. whether or not the quark had already been seen).

So let’s try it with DataMapper:

Quark.all(:id => 42, :seen => false).update(:seen => true, :whosaw => 'ders')

Two problems here. The first problem is that the return value is always true, and we don’t get to see how many rows were updated. This isn’t too big a deal; we can work around it by using Quark.first instead of Quark.all, generating an exception if no records are found.

The second problem is the dealbreaker. DataMapper insists on generating two separate queries for the single update statement:

SELECT "id", "size", "name", "seen", "whosaw" FROM "quarks" WHERE ("id" = 42 AND "seen" = 'f') ORDER BY "id"
UPDATE "quarks" SET "seen" = 't', "whosaw" = 'ders' WHERE "id" = 42

This obviously won’t do, as the check and the update are no longer atomic. Two users running this code at the same time could both pass the SELECT before either UPDATE runs, and both would believe that they saw the quark first.
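To make the race concrete, here is a minimal in-memory sketch in plain Ruby, with no database involved. The QuarkTable class and its method names are hypothetical stand-ins, not DataMapper APIs, and the mutex only plays the role of the database's row lock. It contrasts the two-step select-then-update with a single conditional update that reports how many rows it changed.

```ruby
# Toy stand-in for the "quarks" table: one row, guarded by a mutex that plays
# the role of the database's row lock. All names here are hypothetical.
class QuarkTable
  def initialize
    @row  = { id: 42, seen: false, whosaw: nil }
    @lock = Mutex.new
  end

  # Query 1 of the two-query approach: SELECT ... WHERE id = ? AND seen = 'f'
  def select_unseen(id)
    @lock.synchronize { (@row[:id] == id && !@row[:seen]) ? @row.dup : nil }
  end

  # Query 2 of the two-query approach: UPDATE ... WHERE id = ? (no seen check)
  def update_unconditionally(id, who)
    @lock.synchronize do
      next 0 unless @row[:id] == id
      @row.update(seen: true, whosaw: who)
      1 # affected rows
    end
  end

  # Single query: UPDATE ... WHERE id = ? AND seen = 'f'
  def update_if_unseen(id, who)
    @lock.synchronize do
      next 0 unless @row[:id] == id && !@row[:seen]
      @row.update(seen: true, whosaw: who)
      1 # affected rows
    end
  end
end

# Worst-case interleaving of the two-query version: both users SELECT
# before either one UPDATEs, so both believe they were first.
racy = QuarkTable.new
alice_found = racy.select_unseen(42)
bob_found   = racy.select_unseen(42)
racy.update_unconditionally(42, "alice") if alice_found
racy.update_unconditionally(42, "bob")   if bob_found
racy_winners = [alice_found, bob_found].compact.size

# Conditional update: however the threads interleave, only one caller
# gets an affected-row count of 1.
atomic = QuarkTable.new
threads = %w[alice bob].map { |who| Thread.new { atomic.update_if_unseen(42, who) } }
atomic_winners = threads.sum(&:value)

puts "two-query winners: #{racy_winners}, conditional-update winners: #{atomic_winners}"
```

The point of the sketch is only that the "was it unseen?" check and the write have to happen inside the same locked step, which is what a single UPDATE with the extra WHERE condition gives you.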

The Solution

The solution was to write the update query in SQL.

da = DataMapper.repository(:default).adapter
# One conditional UPDATE: the database applies the check and the write together.
r = da.execute("UPDATE quarks SET seen='t', whosaw='ders' WHERE id=42 AND seen='f'")
if r.affected_rows > 0
   ... # we saw it first
end

Kind of defeats the purpose of having an ORM, but it gets the job done. And as it turns out, I’m not the first one to run into this issue.

How I Got Bitten

Little did I know that the internal mapping of a Boolean field varies between SQL implementations. For SQLite and Postgres, it’s a character field with 't' and 'f' values, whereas for MySQL it’s the integers 1 and 0.

In my case the unit tests all passed, but the live server (with a MySQL backend) started returning 500s.

It’s easy enough to change the query to work with MySQL:

r = da.execute("UPDATE quarks SET seen=1, whosaw='ders' WHERE id=42 AND seen=0")

But then the unit tests fail.

In the end, I wrote this bit of horrible code to keep the tests passing and the live server happy.

t, f = if da.options['scheme'] == 'sqlite'
  ["'t'", "'f'"] # sqlite
else
  [1, 0] # mysql
end

  ...

r = da.execute("UPDATE quarks SET seen=#{t}, whosaw='ders' WHERE id=42 AND seen=#{f}")

(There may be a way to extract t and f directly from the DataMapper internals, but I’m not that good yet.)
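A slightly tidier version of the same hack, as a sketch: keep the literals in a lookup table keyed by scheme and fail loudly on a backend that hasn't been checked. The helper names are made up for illustration, and the scheme strings (including "postgres" and "mysql") are assumptions based on the behaviour described above, not values read out of DataMapper.

```ruby
# Hypothetical lookup table: adapter scheme => SQL literals for [true, false].
# The entries follow the behaviour described in the post and are assumptions,
# not values taken from DataMapper itself.
BOOLEAN_LITERALS = {
  "sqlite"   => ["'t'", "'f'"],
  "postgres" => ["'t'", "'f'"],
  "mysql"    => ["1", "0"]
}.freeze

# Returns the [true, false] literals for a scheme, or raises if the scheme
# is one we have not verified.
def boolean_literals(scheme)
  BOOLEAN_LITERALS.fetch(scheme) do
    raise ArgumentError, "no boolean literals known for scheme #{scheme.inspect}"
  end
end

# Builds the conditional update from the post for the given backend.
def mark_seen_sql(scheme)
  t, f = boolean_literals(scheme)
  "UPDATE quarks SET seen=#{t}, whosaw='ders' WHERE id=42 AND seen=#{f}"
end

puts mark_seen_sql("sqlite")
puts mark_seen_sql("mysql")
```

In the real code the scheme would come from da.options['scheme'], as above. Raising on an unknown scheme means a new backend breaks the tests instead of quietly getting the MySQL literals.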

Python

$ python
>>> import this

Just try it.

AWS Locale

Every time I start a new EC2 Ubuntu instance, I’m confronted with the following warning when I ssh in:

_____________________________________________________________________
WARNING! Your environment specifies an invalid locale.
 This can affect your user experience significantly, including the
 ability to manage packages. You may install the locales by running:

   sudo apt-get install language-pack-UTF-8
     or
   sudo locale-gen UTF-8

To see all available language packs, run:
   apt-cache search "^language-pack-[a-z][a-z]$"
To disable this message for all users, run:
   sudo touch /var/lib/cloud/instance/locale-check.skip
_____________________________________________________________________

Furthermore, a variety of package installations fail with some complaint related to the locale, the default language, or both. And for some reason the advice to install relevant language packs is not helpful.

It turns out that there are some environment variables (LANGUAGE, LC_CTYPE and LC_ALL to be specific) that are not set properly.

The advice to install language packs assumes that these environment variables are set to a language that’s not installed. However, in the case of a new EC2 instance, these variables are not set at all.

An easy way to get the warnings to go away is to edit the file /etc/default/locale so that these variables always get set. I’ve found that the default installation only sets LANG.

/etc/default/locale

LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE=en_US.UTF-8
LC_ALL=en_US.UTF-8
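Before and after editing the file, it can help to confirm which of these variables a login session actually has. Here is a small Ruby sketch for that; the helper name is made up, and the list of variables is just the four named in this post.

```ruby
# The four variables discussed above.
LOCALE_VARS = %w[LANG LANGUAGE LC_CTYPE LC_ALL].freeze

# Returns the names that are unset or empty in the given environment.
# Accepts ENV or any Hash-like object, which makes it easy to try out.
def missing_locale_vars(env = ENV)
  LOCALE_VARS.select { |name| env[name].to_s.empty? }
end

missing = missing_locale_vars
if missing.empty?
  puts "all locale variables are set"
else
  puts "unset: #{missing.join(', ')}"
end
```

Run it in a fresh ssh session, since /etc/default/locale is only read at login.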

As always, it’s also a good idea to make sure you have the latest and greatest packages:

$ sudo apt-get update
$ sudo apt-get upgrade

And finally, while we’re at it, why not set the timezone?

$ sudo dpkg-reconfigure tzdata

Next time I need to set up a new EC2 instance, I’ll come read my own blog and know exactly what to do.

Woke up, fell out of bed

Things I learned this year:

Ruby. Ruby is an interesting language, but I find that the ability to open up classes and modify methods on the fly is too easily abused. Also not a fan of all the alternate method names (e.g. map and collect) – would prefer just to choose one and stick with it.

Git and Github. Made crystal clear with SourceTree.

Rails. Learned to hate it. Was going to write a blog post on why I hate Rails, but someone beat me to it.

Rubygems. Learned to hate them too. I’ve wasted a stupid amount of time figuring out that my program is broken because the gems I’m using don’t work the way they’re supposed to. Better to just write it myself. (I actually think the gem system is pretty cool, but quality control, ladies and gentlemen, quality control.) And if you can do something in two lines of Ruby code, you don’t need a gem.

Data Mapper. I wish this were maintained better. It’s a handy tool, especially when you’re dealing with simple, table-friendly data. It’s not so good at complex queries, though, and I think that the lofty goal of making it compatible with all different kinds of database engines is hampering its ability to work really well with the most common ones (e.g. SQL).

Python. Python rocks. My next job will be in a Python shop.

Django. Django rocks along with Python. In my next life (when I get really good at Python and am independently wealthy with a lot of free time) I’m going to be a regular contributor to the Django project.

Unfuddle and Pivotal Tracker. Say what? What is there to learn there? In fact there’s a lot to learn if you’ve never used an issue tracking system before. I find the Unfuddle UI to be kind of clunky, especially if you’re dealing with a large number of tickets, but it’s much more thorough than Pivotal Tracker in keeping track of comments, ticket disposition, change history, and so on. Maybe some clever person will invent a PT-style front end for Unfuddle.

Heroku. Five stars. The first time I attempted a Heroku deployment, it was all black magic and I was lost. Now it’s still black magic, but I’ve learned that I can use it. Deployment is one of those things that gets more and more complicated the more you try to understand it. Heroku allows you to remain ignorant and just have your program run.

Android. On my desktop I have a 2200-page book on how to code for Android. I’m going to tell you I’ve read it all. I’ll be lying, and you won’t believe me. In reality, I’ve learned the basics and can make a simple app. I’m still at the tedious stage where I have to look everything up, and Android programming is already tedious by nature, but I’m getting it done.

Amazon AWS. This probably has the steepest learning curve of all. The documentation is complete and thorough and is also written for an audience of Sysadmin Ph.Ds. So far I’ve learned to use S3 buckets and create EC2 instances.

Things I want to learn next year:

advanced Python
advanced Django
advanced Git
Neo4j
Go
Sass
node.js
backbone.js
how to deploy stuff (why is this so hard, anyway?)
Haskell

Which train is that?

Straight out of college, I went on a summer backpacking trip to Europe. Well, it wasn’t exactly a backpacking trip, as (1) my bag wasn’t a backpack, and (2) I actually spent two months in one place attending a language program, but that’s another story.

My ticket back home was out of London’s Heathrow airport, and since I was in Bavaria at the time, I had a major train + ferry + train journey just to get to my flight. I had the journey carefully researched and perfectly timed so that I’d be on the ferry just as my rail pass expired. The journey, though long, was largely without incident, but something that happened on the tube from Liverpool Street to Heathrow left a lasting impression.

I had figured out in advance which train I needed to catch. I’ll guess now that it was the Circle Line to South Kensington, changing to the Piccadilly Line. I followed all the signs carefully and got myself to the correct platform. I noticed that there were other lines operating on the same track, so I had to make sure that I got on the correct train.

Within a couple of minutes, a train arrived. I looked to see whether or not it was a Circle Line train, but wait. There was no marking whatsoever. No Circle Line, no Metropolitan Line, nothing. And yet, busy Londoners were getting on and off, some were waiting for a different train, and they all obviously knew something about this mystery train that I didn’t – namely what line it was running.

I finally asked a woman who was standing nearby if she could tell me what train this was. Predictably, she looked at me as if I were an idiot. She was silent for a few seconds, and then without moving her head she glanced slightly upward and answered me. “Metropolitan.” End of conversation.

You see, in New York, the line number/letter and destination are clearly visible on the outside of every subway car. In fact, everywhere I’d travelled, every bus, subway, and tram was labeled in this fashion. In New York at the time, looking at the train itself was the only way to know if it was the train you wanted; fancy platform displays were as yet a thing of the future. It had simply never occurred to me that there was another way.

The train information signs in the London tube station were large enough to read easily but also small enough to miss completely if you didn’t know they were there. Had I gone to London with this one simple bit of information – that there is train information displayed above the platform – I could have saved myself this small embarrassment and the possibility of getting on the wrong train.

This, my friends, is an exact parallel of what’s happening in my workplace nowadays.

More Data Mapper

One of the basic functions of Data Mapper is to remember which attributes in the model have been modified so that it’s easy to determine what (if anything) needs to be updated in the database. Data Mapper checks automatically on a call to #save and only writes what needs to be written. It also provides methods #dirty? and #attribute_dirty?, which tell you whether or not a record or a particular attribute has changed.

Unfortunately, while it’s easy to find out whether or not an attribute has changed, there is no easy way to see what the old value was. It’s obviously keeping the old value somewhere. We know this because when you change an attribute back to the old value, it recognizes that you’ve done so and considers it unchanged.

There is a method called #dirty_attributes, which returns a hash of changed attributes, but the keys to this hash are hashes themselves and in a format that’s used only internally in Data Mapper, making it a needlessly inconvenient method to use. Also, I’d like to avoid #dirty_attributes as it’s not part of the public API.

There is a possible workaround, suggested by some. Override the setter for the attribute and save the old value for later use.

def thing=(newthing)
  @oldthing ||= @thing
  @thing = newthing
end

I’m not going to link to the people who suggested this, though, because it’s a terrible suggestion. Since we’ve overridden Data Mapper’s setter for attribute thing, Data Mapper no longer knows that we’ve changed its value.

1.9.3-p392 :001 > record = Record.get(1)
 => ...
1.9.3-p392 :002 > record.thing
 => "teamaker"
1.9.3-p392 :003 > record.thing = "coffeemaker"
 => "coffeemaker"
1.9.3-p392 :004 > record.attribute_dirty?(:thing)
 => false

Imagine the insidious bugs that could creep in here.

1.9.3-p392 :005 > record.save
 => true
1.9.3-p392 :006 > record = Record.get(1)
 => ...
1.9.3-p392 :007 > record.thing
 => "teamaker"

Simply put, our changes are silently ignored because we’ve stupidly disabled what is arguably Data Mapper’s most important function.

Fortunately, there is a correct way to do this. Instead of setting the attributes directly, we set them using Data Mapper’s #attribute_set.

def thing=(newthing)
  @oldthing ||= @thing
  attribute_set(:thing, newthing)
end

Method #attribute_set keeps track of the changes. It’s what thing= pointed to before we overrode it.
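The difference between the two setters is easier to see in a toy model. The sketch below is plain Ruby and is not DataMapper's implementation: TrackedRecord, thing_untracked=, original_value and stored are all invented for illustration, and attribute_set here only imitates the behaviour described above (remember the original value, and forget it again if the attribute is set back).

```ruby
# Toy model of dirty tracking. NOT DataMapper's code; every name is invented
# to show why bypassing the tracked setter hides a change from #save.
class TrackedRecord
  attr_reader :thing

  def initialize(thing)
    @stored = { thing: thing } # what the "database" holds
    @thing  = thing
    @dirty  = {}               # attribute name => original value
  end

  # Imitates #attribute_set: records the original value once, and clears
  # the entry if the attribute is set back to that original.
  def attribute_set(name, value)
    original = original_value(name)
    instance_variable_set("@#{name}", value)
    if value == original
      @dirty.delete(name)
    else
      @dirty[name] = original
    end
  end

  def attribute_dirty?(name)
    @dirty.key?(name)
  end

  # The old value, which is what the post was after in the first place.
  def original_value(name)
    @dirty.fetch(name) { instance_variable_get("@#{name}") }
  end

  # Writes only the attributes marked dirty, then clears the marks.
  def save
    @dirty.each_key { |name| @stored[name] = instance_variable_get("@#{name}") }
    @dirty.clear
    true
  end

  def stored(name)
    @stored[name]
  end

  # Broken override: sets the ivar directly, so nothing is marked dirty.
  def thing_untracked=(value)
    @thing = value
  end

  # Working override: goes through the tracking method.
  def thing=(value)
    attribute_set(:thing, value)
  end
end

broken = TrackedRecord.new("teamaker")
broken.thing_untracked = "coffeemaker"
broken.save
puts "untracked setter stored: #{broken.stored(:thing)}"

fixed = TrackedRecord.new("teamaker")
fixed.thing = "coffeemaker"
puts "old value before save:   #{fixed.original_value(:thing)}"
fixed.save
puts "tracked setter stored:   #{fixed.stored(:thing)}"
```

In the toy, the untracked setter leaves the stored value untouched after save, which is the same silent failure as in the console session above.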

Data Mapper

I’ve been working on a project with Padrino and Data Mapper. So far I’m quite a fan of the Data Mapper way of doing things.

Unfortunately, I’m schooled in the old way – ugly messes of SELECT, JOIN and ORDER BY, intelligible only to SQL gurus and dependent not just on SQL but on a particular variety of such, which in my case would be MySQL.

I label this as unfortunate because my concept of a database (e.g. MySQL) heavily influences how I organize the code I write. I often find myself with a “How do you do this in DataMapper?” kind of question, where this is something that I know how to do the old way. After all, DataMapper is generating SQL queries from the models I write, so if it’s easy in SQL, shouldn’t it also be easy in DataMapper?

(Side note: DataMapper doesn’t necessarily generate SQL, but in my current project the backend is SQL and I see the generated queries on the debugging console.)

Recently I’ve had a question of this sort that I haven’t been able to solve.

I have a model with an ordering field, which we’ll call position. I want to sort by this field (ascending), except that I want all the zeroes to be at the end. In addition, I’d like all the records with position=0 sorted by id descending.

In MySQL, I would write:

SELECT * FROM `things` ORDER BY `position`=0, `position`, `id` DESC

Free donuts to the first person who can make a DataMapper version of this.
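This doesn't earn the donuts, since it doesn't make DataMapper generate that ORDER BY, but for small result sets the same ordering can be applied in Ruby after the records are loaded. A sketch, using a Struct as a stand-in for the model (Thing and order_things are hypothetical names):

```ruby
# Stand-in for the model; only the two fields that matter for ordering.
Thing = Struct.new(:id, :position)

# Mirrors ORDER BY position=0, position, id DESC:
#   1. position=0 is false (0) for non-zero rows and true (1) for zero rows,
#      so the zeroes sort last
#   2. then ascending position
#   3. then descending id (negated so sort_by's ascending order reverses it)
def order_things(things)
  things.sort_by { |t| [t.position.zero? ? 1 : 0, t.position, -t.id] }
end

things = [
  Thing.new(1, 0),
  Thing.new(2, 3),
  Thing.new(3, 0),
  Thing.new(4, 1),
  Thing.new(5, 3)
]

puts order_things(things).map(&:id).inspect
```

For the sample data this gives ids 4, 5, 2, 3, 1: the non-zero positions first in ascending order, then the zeroes, with id descending as the tie-breaker throughout. The obvious cost is that every row has to be loaded before sorting, so it is no substitute for doing it in the query.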

Rails for Zombies

I got introduced today to an excellent Ruby on Rails tutorial with an entertaining zombie theme. It covered a lot of the basic concepts, many of which I’d skipped over in my haste to dive into a real-live project.

Being who I am, I noticed a couple of inconsistencies between the tutorial videos and the exercises. Actually, this only applies to the level 5 video; the rest seemed to be fine.

In the level 5 video at 2:10, the following match example is given:

match 'new_tweet' => "Tweets#new"

Note the uppercase T on Tweets. This T is uppercase throughout the video.

However, when I tried to do the second exercise for this level, the following answer was rejected:

match 'undead' => "Zombies#undead"

The hints told me to do this:

match 'undead' => "zombies#undead"

with a lower case z, which was accepted. Now I’m confused. Do we need a capital letter here or not?

In the same video at 3:30, the following match example is given:

match 'all' => redirect('/tweets')

Note that there is no leading slash on ‘all’; this format is consistent throughout the video.

However, when I tried to do the third exercise, the following answer was rejected:

match 'undead' => redirect('/zombies')

This time the hints told me to do this:

match '/undead' => redirect('/zombies')

Again, I’m confused. Do we need (or even want) a leading slash here?

Validating data with Mongoid

I’ve been working with Mongoid, which is an object-document-mapper for MongoDB written in Ruby.

Mongo organizes data into collections of documents, just as relational (SQL) databases organize data into tables of records. Reading and writing of documents is done via named classes, one for each collection.

The named class for each collection includes Mongoid::Document to get the database interface methods such as .where, .new, and .save. It also defines the data fields and any custom data handlers.

One very useful feature is the availability of automatic validators which check the format and integrity of your data before allowing it to be saved. There’s a myriad of options, and they are not very well explained in the documentation.

Since the data validators are shared with Active Model, I decided to look for some help there and found this pretty good description of what kind of validation could be done. Unfortunately, it wasn’t clear anywhere how to actually use the validators once they’re defined.

After a bit of hair-pulling, I discovered it’s actually quite simple.

Let’s define a minimal class Iqscore. (The name of the collection will be iqscores; this is a weird behavior of Mongoid whereby class names must be singular and Mongoid will pluralize them for you when naming the collection.)

require "mongoid"

class Iqscore

  include Mongoid::Document

  field :kid, :type => String
  field :iq,  :type => Integer

  validates :kid, :presence => true, :uniqueness => true
  validates :iq, :numericality => true

end

Mongoid provides a valid? method on Iqscore objects. It tells us whether or not the criteria in the validates declarations are met.

1.9.3p385 :008 > x = Iqscore.new({kid: "George", iq: 70})
 => #<Iqscore _id: 512c72d5352420234d000003, kid: "George", iq: 70>
1.9.3p385 :009 > x.valid?
 => true

1.9.3p385 :010 > y = Iqscore.new({kid: "Bill", iq: "unknown"})
 => #<Iqscore _id: 512c736e352420234d000004, kid: "Bill", iq: 0>
1.9.3p385 :011 > y.valid?
 => false

If it doesn’t validate, we can see what’s wrong by looking at the errors property. In this case it tells us that iq is not a number (and it should be). Note that the message “is not a number” is in an array, as it’s possible for there to be multiple messages for a single field.

1.9.3p385 :012 > y.errors
 => #<ActiveModel::Errors:0x98a31cc @base=#<Iqscore _id: 512c736e352420234d000004, kid: "Bill", iq: 0>, @messages={:iq=>["is not a number"]}>

The valid? method is called automatically before any save operation (e.g. save or create), and if it returns false, then the save is not done. Both save and create return true or false to indicate whether the save was done or not.

1.9.3p385 :013 > x.save
 => true
1.9.3p385 :014 > y.save
 => false

At this point George is in our database, but Bill isn’t.
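For anyone who wants to poke at the valid? / errors / save flow without a MongoDB instance, here is a plain-Ruby toy that imitates it. It is not Mongoid or Active Model code: ToyScore, its error messages, and the Hash used as a "collection" are all invented for illustration, and the uniqueness check is left out.

```ruby
# Plain-Ruby toy imitating the validate-before-save flow. Not Mongoid code;
# every name below is invented for illustration.
class ToyScore
  attr_reader :kid, :iq, :errors

  def initialize(kid:, iq:)
    @kid    = kid
    @iq     = iq
    @errors = {}
  end

  # Re-runs all checks and collects messages per field, each in an array
  # because one field can fail several checks.
  def valid?
    @errors = {}
    add_error(:kid, "can't be blank") if kid.to_s.strip.empty?
    add_error(:iq, "is not a number") unless numeric?(iq)
    @errors.empty?
  end

  # Saves into the given Hash only if validation passes; returns true/false.
  def save(store)
    return false unless valid?
    store[kid] = iq
    true
  end

  private

  def add_error(field, message)
    (@errors[field] ||= []) << message
  end

  def numeric?(value)
    Float(value)
    true
  rescue ArgumentError, TypeError
    false
  end
end

store  = {}
george = ToyScore.new(kid: "George", iq: 70)
bill   = ToyScore.new(kid: "Bill", iq: "unknown")

puts george.save(store)
puts bill.save(store)
puts bill.errors.inspect
puts store.keys.inspect
```

As in the console session above, the first save goes through, the second is refused, and the errors hash says why.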

What did I do today?

02 Jul 2015, 11:14

27 Jan 2014, 10:07

17 Jan 2014, 12:20

10 Jan 2014, 14:55

17 Dec 2013, 11:06

02 May 2013, 10:03

16 Apr 2013, 14:05

16 Apr 2013, 12:19

21 Mar 2013, 12:19

26 Feb 2013, 15:57