Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Using Size to Debug D3 Selections

Yesterday I was learning about the relatively new General Update Pattern in D3, but I couldn’t get it working. I knew I was missing something obvious, but how to figure out what?

Because I was dealing with nested attributes, I was assigning the selections to variables and merging them later, so I ended up with this heavily simplified (and broken) code to display a list of tiles:

let tiles = container.selectAll('.tile').data(data, d => d.id)

let tilesEnter = tiles.enter()
  .append('div')
    .attr('class', 'tile')

// Broken: tiles is already the .tile selection, so select('.tile') looks for a
// nested .tile, finds none, and the update half of this merge is always empty.
tiles.select('.tile').merge(tilesEnter)
  .attr('color', d => d.color)

let contentEnter = tilesEnter
  .append('span')
    .attr('class', 'content')

tiles.select('.content').merge(contentEnter)
  .html(d => d.content)

When I updated the data, the content of the tile in the child element updated, but the color at the root level did not!

I tried a number of debugging approaches, but the one that I found easiest to wrap my head around, and that eventually led me to a solution, was using size() to verify how many elements were in each selection.

console.log("tiles entering", tilesEnter.size())
console.log("tiles updating", tiles.select('.tile').size())
console.log("content entering", contentEnter.size())
console.log("content updating", tiles.select('.content').size())

This let me verify that, for the working child case (with four data elements), the enter and update selections went from 4 and 0 respectively to 0 and 4 when the data was updated. For the root case, the update selection was always zero, which led me to notice that the extra select('.tile') shouldn’t be there: tiles is already that selection, from the selectAll in the initial setup!

I found logging the entire selection less useful, because its internal state is hard to interpret.
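To see why the counts should flip like that, here is a toy model in Ruby of a keyed data join. It is not D3's implementation, just the partition D3 performs when you pass a key function like d => d.id, and data_join is a made-up name. With the same four ids on the second render, a correctly chosen selection should report 0 entering and 4 updating; an update size that stays at 0 is the signal that the selection isn't the one you think it is.

```ruby
# Toy model of a keyed data join (not D3's code). Each incoming datum is
# matched to an existing element by key: matches form the update selection,
# unmatched data the enter selection, and leftover elements the exit selection.
# Assumes keys are unique, as they are with d => d.id.
def data_join(existing_keys, data, &key)
  incoming = data.map(&key)
  {
    enter:  incoming - existing_keys,
    update: incoming & existing_keys,
    exit:   existing_keys - incoming
  }
end

data = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }]

# First render: nothing on the page yet, so everything enters.
first = data_join([], data) { |d| d[:id] }
first.transform_values(&:size)   # enter: 4, update: 0, exit: 0

# Re-render with the same ids: every element is now in the update selection.
second = data_join(first[:enter], data) { |d| d[:id] }
second.transform_values(&:size)  # enter: 0, update: 4, exit: 0
```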

Adding Last-Modified response header to Haskell Servant API

Given the following Servant API (boilerplate redacted for brevity):

type MyAPI = "some-api" :> Get '[JSON] NoContent

someApi = return NoContent

How do you add a Last-Modified header? As a first attempt, we can use the Header type with addHeader and a UTCTime:

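{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Control.Monad.IO.Class (liftIO)
import Data.Time.Clock (UTCTime, getCurrentTime)
import Servant

type LastModifiedHeader = Header "Last-Modified" UTCTime
type MyAPI = "some-api" :> Get '[JSON] (Headers '[LastModifiedHeader] NoContent)

someApi :: Handler (Headers '[LastModifiedHeader] NoContent)
someApi = do
  now <- liftIO getCurrentTime
  -- addHeader is pure: it attaches the header to the response value
  return $ addHeader now NoContent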

Unfortunately, this returns the time in the wrong format!

> curl -I localhost/some-api | grep Last-Modified
Last-Modified: 2018-09-30T19:56:39Z

It should be RFC 1123. We can fix this with a newtype that wraps the formatting functions available in Data.Time.Format:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeOperators #-}

import Control.Monad.IO.Class (liftIO)
import Data.ByteString.Char8 (pack)
import Data.Time.Clock (UTCTime, getCurrentTime)
import Data.Time.Format (FormatTime, defaultTimeLocale, formatTime)
import Servant

newtype RFC1123Time = RFC1123Time UTCTime
  deriving (Show, FormatTime)

-- HTTP-date in RFC 1123 form, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
-- %d zero-pads the day (HTTP requires two digits), and GMT is hardcoded
-- because the wrapped time is always UTC.
rfc1123Format :: String
rfc1123Format = "%a, %d %b %Y %H:%M:%S GMT"

instance ToHttpApiData RFC1123Time where
  toUrlPiece = error "Not intended to be used in URLs"
  toHeader = pack . formatTime defaultTimeLocale rfc1123Format

type LastModifiedHeader = Header "Last-Modified" RFC1123Time
type MyAPI = "some-api" :> Get '[JSON] (Headers '[LastModifiedHeader] NoContent)

someApi :: Handler (Headers '[LastModifiedHeader] NoContent)
someApi = do
  now <- liftIO getCurrentTime
  return $ addHeader (RFC1123Time now) NoContent

> curl -I localhost/some-api | grep Last-Modified
Last-Modified: Sun, 30 Sep 2018 20:44:16 GMT

If anyone knows a simpler way, please let me know!

Irreverent technical asides

Many implementations reference RFC822 for the Last-Modified format. What gives? RFC822 was updated by RFC1123, which only adds a few clauses to tighten up the definition. Most importantly, it changes the year format from 2 digits to 4! Note that Data.Time.Format.rfc822DateFormat is technically incorrect here, since it specifies a four digit year. Data.Time.Format.RFC822 gets it right.

rfc822DateFormat is also technically incorrect in another way: it uses the %Z format specifier for timezone, which produces UTC on a UTCTime. This is not an allowed value! However, RFC 2616 says “for the purposes of HTTP, GMT is exactly equal to UTC” so GMT can safely be hardcoded here since we know we always have a UTC time.
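For comparison, here is the same format in Ruby, where Time#httpdate from the standard time library already produces it. This is a sketch to check strings against, and rfc1123 is a hypothetical helper name. It also shows the %Z behaviour described above, and why a space-padded day (%_d, as used by rfc822DateFormat) is a problem for HTTP, whose rfc1123-date requires a two-digit day.

```ruby
require 'time'

# RFC 1123 date as used by HTTP: zero-padded two-digit day, four-digit
# year, and a literal "GMT" zone.
RFC1123_FORMAT = '%a, %d %b %Y %H:%M:%S GMT'

# getutc returns a UTC copy, so the caller's Time object is not modified.
def rfc1123(time)
  time.getutc.strftime(RFC1123_FORMAT)
end

t = Time.utc(1994, 11, 6, 8, 49, 37)
rfc1123(t)                     # "Sun, 06 Nov 1994 08:49:37 GMT"
t.httpdate                     # the standard library agrees
t.strftime('%Z')               # "UTC", which is why GMT is hardcoded
t.strftime('%a, %_d %b %Y')    # "Sun,  6 Nov 1994": space-padded, not valid here
```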

Using Haskell Servant to Power a Decoupled React Single Page Application

Recently I’ve been experimenting with different ways of building web applications. In particular, I’m interested in the extent to which it is feasible to start an application with a “pure” API, as distinct from a typical Ruby on Rails application. This approach would limit the backend server to only API endpoints, and restrict it from any kind of HTML generation. All HTML concerns would be pushed to a frontend using something like React.

I published an example application that demonstrates this architecture using Servant and React. In this post, I’ll detail some of the issues I came across getting this working.

Authentication

One difficulty I came across was how to handle third-party authentication (via Google OAuth) in this scenario when running the backend and frontend as completely separate services. A typical OAuth flow requires server-side calls and interactions that don’t work when the flow is split over two different services.

Google provides an OAuth flow for Web Applications that addresses the first issue. The hard part is how to verify that authentication in the backend.

This OAuth flow provides the client with a JWT containing information about the user, such as their email and granted scopes. This can be verified and trusted on the server using Google’s public key, which needs to be continually fetched from their endpoint to keep it current.

This verification can be done in Servant using a Generalized Authentication handler.
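The verification step itself is small. Below is a minimal Ruby sketch using only the standard openssl and json libraries, assuming an RS256-signed token (the algorithm Google uses for ID tokens) and a public key you have already fetched. Fetching and caching Google's keys, and checking the aud and iss claims, are left out, and the helper names are hypothetical; in practice a maintained JWT library is the safer choice.

```ruby
require 'openssl'
require 'json'

# base64url (RFC 4648 section 5) without padding, as used by JWTs.
def b64url_encode(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

def b64url_decode(text)
  padded = text.tr('-_', '+/')
  padded += '=' * ((4 - padded.length % 4) % 4)
  padded.unpack1('m0')
end

# Returns the claims hash if the RS256 signature checks out against
# public_key and the token has not expired, otherwise nil.
def verify_jwt(token, public_key, now: Time.now.to_i)
  parts = token.split('.')
  return nil unless parts.length == 3

  header_b64, payload_b64, signature_b64 = parts
  header = JSON.parse(b64url_decode(header_b64))
  # Pin the algorithm: never let the token itself choose "none" or HMAC.
  return nil unless header['alg'] == 'RS256'

  signing_input = "#{header_b64}.#{payload_b64}"
  signature = b64url_decode(signature_b64)
  return nil unless public_key.verify(OpenSSL::Digest.new('SHA256'), signature, signing_input)

  claims = JSON.parse(b64url_decode(payload_b64))
  return nil if claims['exp'] && claims['exp'] < now
  claims
end
```

Once verified, the claims (email, scopes and so on) can be trusted in the same way the Servant auth handler trusts them.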

CORS

Requests between applications on different hosts have to negotiate CORS correctly. This could be mitigated by running a reverse proxy in front of both services and presenting them at a single domain, but I wanted to see if I could make it work without this.

A few things are required for correct CORS handling. First, appropriate Access-Control-Allow-Origin headers need to be set on requests. This is best handled with a middleware from the wai-cors package.

That would be sufficient for “simple” requests, but since our API uses both a non-simple content type (application/json) and the Authorization header, they need to be added to the default policy:

corsPolicy = simpleCorsResourcePolicy
               { corsRequestHeaders = [ "authorization", "content-type" ]
               }

Also, these non-simple API requests will trigger a CORS preflight, which sends an OPTIONS request to our API. The API needs to be extended to handle these requests. servant-options provides a middleware to do this automatically from an API definition. Unfortunately, servant-options didn’t work out of the box with servant-auth. I needed to provide an instance of HasForeign for AuthProtect. A simple pass-through implementation looks like this:

instance (HasForeign lang ftype api) =>
  HasForeign lang ftype (AuthProtect k :> api) where

  type Foreign ftype (AuthProtect k :> api) = Foreign ftype api

  foreignFor lang Proxy Proxy subR =
    foreignFor lang Proxy (Proxy :: Proxy api) subR

I later extended this to include appropriate metadata so that I could use it to generate clients correctly.
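The CORS handshake described above can be sketched as a small Rack-style middleware in Ruby. This is a hypothetical stand-in for what wai-cors and servant-options do together, not a port of either: it answers preflight OPTIONS requests itself, allowing the authorization and content-type request headers, and adds Access-Control-Allow-Origin to every other response. SimpleCors and the allowed method list are assumptions.

```ruby
# Hypothetical Rack-style CORS middleware (not wai-cors or servant-options).
class SimpleCors
  ALLOWED_HEADERS = 'authorization, content-type'.freeze
  ALLOWED_METHODS = 'GET, POST, PUT, DELETE, OPTIONS'.freeze

  def initialize(app, origin:)
    @app = app
    @origin = origin
  end

  def call(env)
    if preflight?(env)
      # Answer the preflight directly; the wrapped API never sees it.
      headers = origin_header.merge(
        'access-control-allow-headers' => ALLOWED_HEADERS,
        'access-control-allow-methods' => ALLOWED_METHODS
      )
      return [204, headers, []]
    end

    status, headers, body = @app.call(env)
    [status, headers.merge(origin_header), body]
  end

  private

  # A preflight is an OPTIONS request carrying Access-Control-Request-Method.
  def preflight?(env)
    env['REQUEST_METHOD'] == 'OPTIONS' && env.key?('HTTP_ACCESS_CONTROL_REQUEST_METHOD')
  end

  def origin_header
    { 'access-control-allow-origin' => @origin }
  end
end

api = ->(env) { [200, { 'content-type' => 'application/json' }, ['{}']] }
app = SimpleCors.new(api, origin: 'http://localhost:3000')
```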

JS Clients

A nice thing about Servant is the ability to auto-generate client wrappers for your API. servant-js provides a number of formats for this, though they weren’t as ergonomic as I was hoping. It doesn’t currently support servant-auth or ES6-style exports. Rather than solve this generically, I wrote a custom generator. For fun, it outputs an API class that allows an authorization token to be supplied in the constructor, rather than as an argument to every function:

let api = new Api(jwt);
api.getEmail();
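The constructor-token design might look like this in Ruby. The Api class, the /email path, the default base URL and the Bearer scheme are all assumptions for illustration, not the generator's actual output; the point is that the token is captured once and attached to every request the client builds.

```ruby
require 'net/http'
require 'uri'

# Hypothetical generated client: the JWT is given once to the constructor
# and attached to every request, instead of being passed to each method.
class Api
  def initialize(token, base_url: 'http://localhost:8080')
    @token = token
    @base = URI(base_url)
  end

  # Builds (without sending) the request for a hypothetical /email endpoint.
  def email_request
    authorize(Net::HTTP::Get.new(@base.merge('/email')))
  end

  # Sends the request and returns the response body.
  def get_email
    Net::HTTP.start(@base.host, @base.port) { |http| http.request(email_request).body }
  end

  private

  def authorize(request)
    request['Authorization'] = "Bearer #{@token}"
    request
  end
end

api = Api.new('my-jwt')
api.email_request['Authorization']  # "Bearer my-jwt"
```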

I’m not sure what the best way to distribute this API is. Currently, the example writes out a file in the frontend’s source tree. This works great for development, but for production I would consider either a dedicated build step in packaging, or serving the JS up directly from the API server.

Aside from this generated client, I didn’t do anything particularly interesting on the React front. The app included in the example is very simple.

Conclusion

This wasn’t a big enough project to draw any serious conclusions about the approach. It is evident however that Servant still has a couple of rough edges when you get outside of the common cases. It took a while to wrap my head around how Servant uses the type system. I found this post and exercise very helpful.

I hadn’t used JWTs before, and they strike me as a pretty neat way to thread authentication through a distributed application.

2017 Contributions

This was a light year for open source contributions: 168 by GitHub’s count (plus some uncounted on Bitbucket). I’ve continued on the RSpec core team, though the project is mostly stable and I haven’t been particularly active. I did find time to experiment with some new things however:

And as always, a smattering of issues to help improve documentation. Leave things better than you found them!

Next year, I’m looking forward to playing around with Firebase, and checking back in on Rust to see how it’s progressing.

  • Posted on December 31, 2017
  • Tagged code

Migrating Enki to Jekyll

I just converted this blog from a dynamic Enki site to a static Jekyll one. I wanted to get rid of the comments, add SSL, and not have to upgrade Rails so often. I also prefer composing locally.

First, I exported all of the posts to lesstile templates using a rake task.

task :export_posts => :environment do
  Post.find_each do |post|
    filename = "%s-%s.lesstile" % [
      post.published_at.strftime("%Y-%m-%d"),
      post.slug
    ]

    dir = "_posts"
    yaml_sep = "---"

    puts filename

    body = <<-EOS
#{yaml_sep}
layout: post
title:  #{post.title.inspect}
date:   #{post.published_at.strftime("%F %T %:z")}
tags:   #{post.tags.map {|x| x.name.downcase }.sort.inspect}
#{yaml_sep}
{% raw %}
#{post.body}
{% endraw %}
    EOS

    File.write(File.join(dir, filename), body)
  end
end

Lesstile is a wrapper around Textile that provides some extra functionality, so a custom converter is also needed. Put the following in _plugins/lesstile.rb (with associated additions to your Gemfile):

require 'cgi' # for CGI.unescapeHTML below
require 'lesstile'
require 'coderay'
require 'RedCloth'

module Jekyll
  class LesstileConverter < Converter
    safe true
    priority :low

    def matches(ext)
      ext =~ /^\.lesstile$/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      Lesstile.format_as_xhtml(
        content,
        :text_formatter => lambda {|text|
          RedCloth.new(CGI::unescapeHTML(text)).to_html
        },
        :code_formatter => Lesstile::CodeRayFormatter
      )
    end
  end
end

The permalink configuration option needs to be set to match existing URLs, and to create the tag pages, use the jekyll-archives plugin.

permalink: "/:year/:month/:day/:title/"

assets:
  digest: true

"jekyll-archives":
  enabled:
    - tags
  layout: 'tag'
  permalinks:
    tag: '/:name/'

gems:
  - jekyll-feed
  - jekyll-assets
  - jekyll-archives

For the archives page, use an empty archives.md in the root directory with a custom layout:

{% include head.html %}
{% assign last_month = nil %}
<ul>
{% for post in site.posts %}
  {% assign current_month = post.date | date: '%B %Y' %}
  {% if current_month != last_month %}
    </ul>
    <h3>{{ current_month }}</h3>
    <ul>
  {% endif %}

  <li>
    <a href="{{ post.url }}">{{ post.title }}</a>

    {% if post.tags != empty %}
    ({% for tag in post.tags %}<a href='/{{ tag }}'>{{ tag }}</a>{% if forloop.last %}{% else %}, {% endif %}{% endfor %})
    {% endif %}
  </li>

  {% assign last_month = current_month %}
{% endfor %}
</ul>
{% include footer.html %}
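The layout's last_month bookkeeping amounts to grouping consecutive posts by month. The same logic in Ruby, as a sketch with a hypothetical Post struct standing in for site.posts:

```ruby
require 'date'

# Hypothetical post record; Jekyll's site.posts exposes title and date.
Post = Struct.new(:title, :date)

# Newest first, grouped under "Month Year" headings, as the layout's
# current_month / last_month comparison does.
def archive_sections(posts)
  posts.sort_by(&:date).reverse
       .chunk_while { |a, b| a.date.strftime('%B %Y') == b.date.strftime('%B %Y') }
       .map { |group| [group.first.date.strftime('%B %Y'), group.map(&:title)] }
end

posts = [
  Post.new('First post', Date.new(2014, 8, 3)),
  Post.new('Second post', Date.new(2014, 8, 17)),
  Post.new('Third post', Date.new(2014, 10, 12))
]
archive_sections(posts)
# [["October 2014", ["Third post"]],
#  ["August 2014", ["Second post", "First post"]]]
```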

For a full example, including a recommended set of layouts and includes, see the new sources for this site.

New non-tech blog

Have been writing a bit recently, just not here. More politics and book reviews. I’ve separated it out over on Github pages.

  • Posted on October 12, 2014

Dropwizard logger for Ruby and WEBrick

Wouldn’t it be great if instead of webrick logs looking like:

> ruby server.rb
[2014-08-17 15:29:10] INFO  WEBrick 1.3.1
[2014-08-17 15:29:10] INFO  ruby 2.1.1 (2014-02-24) [x86_64-darwin13.0]
[2014-08-17 15:29:10] INFO  WEBrick::HTTPServer#start: pid=17304 port=8000
D, [2014-08-17T15:29:11.452223 #17304] DEBUG -- : hello from in the request
localhost - - [17/Aug/2014:15:29:11 PDT] "GET / HTTP/1.1" 200 13
- -> /
E, [2014-08-17T15:29:12.787505 #17304] ERROR -- : fail (RuntimeError)
server.rb:57:in `block in <main>'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `call'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `do_GET'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in `service'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:138:in `service'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:94:in `run'
/Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/server.rb:295:in `block in start_thread'
localhost - - [17/Aug/2014:15:29:12 PDT] "GET /fail HTTP/1.1" 500 6
- -> /fail

They looked like:

> ruby server.rb

   ,~~.,''"'`'.~~.
  : {` .- _ -. '} ;
   `:   O(_)O   ;'
    ';  ._|_,  ;`   i am starting the server
     '`-.\_/,.'`

INFO  [2014-08-17 22:28:13,186] webrick: WEBrick 1.3.1
INFO  [2014-08-17 22:28:13,186] webrick: ruby 2.1.1 (2014-02-24) [x86_64-darwin13.0]
INFO  [2014-08-17 22:28:13,187] webrick: WEBrick::HTTPServer#start: pid=17253 port=8000
DEBUG [2014-08-17 22:28:14,738] app: hello from in the request
INFO  [2014-08-17 15:28:14,736] webrick: GET / 200
ERROR [2014-08-17 22:28:15,603] app: RuntimeError: fail
! server.rb:57:in `block in <main>'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `call'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/prochandler.rb:38:in `do_GET'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in `service'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:138:in `service'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/httpserver.rb:94:in `run'
! /Users/xavier/.rubies/cruby-2.1.1/lib/ruby/2.1.0/webrick/server.rb:295:in `block in start_thread'
INFO  [2014-08-17 15:28:15,602] webrick: GET /fail 500

I thought so, hence:

require 'webrick'
require 'logger'

puts <<-BANNER

   ,~~.,''"'`'.~~.
  : {` .- _ -. '} ;
   `:   O(_)O   ;'
    ';  ._|_,  ;`   i am starting the server
     '`-.\\_/,.'`

BANNER

class DropwizardLogger < Logger
  def initialize(label, *args)
    super(*args)
    @label = label
  end

  def format_message(severity, timestamp, progname, msg)
    "%-5s [%s] %s: %s\n" % [
      severity,
      timestamp.utc.strftime("%Y-%m-%d %H:%M:%S,%3N"),
      @label,
      msg2str(msg),
    ]
  end

  def msg2str(msg)
    case msg
    when String
      msg
    when Exception
      ("%s: %s" % [msg.class, msg.message]) +
        (msg.backtrace ? msg.backtrace.map {|x| "\n! #{x}" }.join : "")
    else
      msg.inspect
    end
  end

  def self.webrick_format(label)
    "INFO  [%{%Y-%m-%d %H:%M:%S,%3N}t] #{label}: %m %U %s"
  end
end

server = WEBrick::HTTPServer.new \
  :Port      => 8000,
  :Logger    => DropwizardLogger.new("webrick", $stdout).tap {|x|
                  x.level = Logger::INFO
                },
  :AccessLog => [[$stdout, DropwizardLogger.webrick_format("webrick")]]

$logger = DropwizardLogger.new("app", $stdout)

server.mount_proc '/fail' do |req, res|
  begin
    raise 'fail'
  rescue => e
    $logger.error(e)
  end
  res.body = "failed"
  res.status = 500
end

server.mount_proc '/' do |req, res|
  $logger.debug("hello from in the request")
  res.body = 'Hello, world!'
end

trap 'INT' do
  server.shutdown
end

server.start
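Overriding format_message works, but it is a private method of Logger. An alternative sketch, producing the same line format, uses Logger's public formatter= hook instead; dropwizard_formatter is a hypothetical helper name, and the output goes to a StringIO so it can be inspected.

```ruby
require 'logger'
require 'stringio'

# Builds a Logger formatter emitting "LEVEL [UTC timestamp] label: message",
# with exception backtraces as "! "-prefixed lines, like DropwizardLogger.
def dropwizard_formatter(label)
  proc do |severity, time, _progname, msg|
    text =
      case msg
      when Exception
        "#{msg.class}: #{msg.message}" +
          (msg.backtrace || []).map { |line| "\n! #{line}" }.join
      when String then msg
      else msg.inspect
      end
    format("%-5s [%s] %s: %s\n",
           severity, time.getutc.strftime('%Y-%m-%d %H:%M:%S,%3N'), label, text)
  end
end

io = StringIO.new
logger = Logger.new(io)
logger.formatter = dropwizard_formatter('app')
logger.info('hello from in the request')
# io.string now holds one line of the form
# INFO  [YYYY-MM-DD HH:MM:SS,mmm] app: hello from in the request
```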

Querying consul with range

Disclaimer: this has not been tried in a production environment. It is a weekend hack.

Consul is a highly available, datacenter aware, service discovery mechanism. Range is a query language for selecting information out of arbitrary, self-referential metadata. I combined the two!

Start by firing up a two node consul cluster, per the getting started guide. On the master node, grab the consul branch of grange-server and run it with the following config:

[rangeserver]
loglevel=DEBUG
consul=true

(It could run against any consul agent, but it’s easier to demo on the master node.)

Querying range, we can already see a cluster named consul. This is a default service containing the consul servers.

> export RANGE_HOST=172.20.20.10
> erg "allclusters()"
consul
> erg "%consul"
agent-one

Add a new service to the agents, and it shows up in range!

n2> curl -v -X PUT --data '{"name": "web", "port": 80}' http://localhost:8500/v1/agent/service/register

> erg "allclusters()"
consul,web
> erg "%web"
agent-two

n1> curl -v -X PUT --data '{"name": "web", "port": 80}' http://localhost:8500/v1/agent/service/register

> erg "%web"
agent-one,agent-two

Though eventually consistent, range is a big improvement over the consul HTTP API for quick ad-hoc queries against your production layout, particularly when combined with other metadata. How many nodes are running redis? What services are running on a particular rack?
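Conceptually, each Consul service becomes a range cluster whose members are the nodes providing it. A hypothetical Ruby sketch of that mapping follows; it is not how the grange-server branch is implemented, and the catalog entries are made-up sample data using the Node and ServiceName fields from Consul's catalog API.

```ruby
require 'json'

# Group catalog entries into range-style clusters:
# service name => sorted, de-duplicated list of nodes providing it.
def clusters_from_catalog(entries)
  entries.group_by { |e| e['ServiceName'] }
         .transform_values { |es| es.map { |e| e['Node'] }.uniq.sort }
end

# Hypothetical catalog data, trimmed to the two fields used above.
catalog = JSON.parse(<<~JSON)
  [
    {"Node": "agent-one", "ServiceName": "consul"},
    {"Node": "agent-one", "ServiceName": "web"},
    {"Node": "agent-two", "ServiceName": "web"}
  ]
JSON

clusters = clusters_from_catalog(catalog)
clusters.keys.sort.join(',')   # like erg "allclusters()": "consul,web"
clusters['web'].join(',')      # like erg "%web": "agent-one,agent-two"
```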

This is just a proof of concept for now, but I’m excited about the potential. To be usable it needs testing against production-sized clusters, better handling of error conditions, and some code review (in particular around handling cluster state changes).

Bash script to keep a git clone synced with a remote

Use the following under a process manager (such as runit) to keep a local git clone in sync with a remote when a push-based solution isn’t an option. Most other versions either neglect to verify that the remote is correct, or use git pull, which can fail if someone has been monkeying with the local copy.

#!/bin/bash

function update_git_repo() {
  # REPO_DIR rather than GIT_DIR: git itself reads GIT_DIR from the environment
  REPO_DIR=$1
  GIT_REMOTE=$2
  GIT_BRANCH=${3:-master}

  if [ ! -d "$REPO_DIR" ]; then
    CURRENT_SHA=""
    git clone --depth 1 "$GIT_REMOTE" "$REPO_DIR" -b "$GIT_BRANCH"
  else
    CURRENT_REMOTE=$(cd "$REPO_DIR" && git config --get remote.origin.url || true)

    if [ "$GIT_REMOTE" == "$CURRENT_REMOTE" ]; then
      # rev-parse still works after git gc packs refs, unlike reading .git/refs
      CURRENT_SHA=$(cd "$REPO_DIR" && git rev-parse HEAD)
    else
      # Remote changed: throw the clone away and let the next run re-clone
      rm -Rf "$REPO_DIR"
      exit 0 # Process manager should restart this script
    fi
  fi

  # fetch + reset --hard instead of pull, so local changes can't block the sync
  (cd "$REPO_DIR" && \
    git fetch && \
    git reset --hard "origin/$GIT_BRANCH")

  NEW_SHA=$(cd "$REPO_DIR" && git rev-parse HEAD)
}

update_git_repo "/tmp/myrepo" "git://example.com/my/repo.git"

sleep 60 # No need for a tight loop

Ruby progress bar, no gems

require 'csv'

def import(filename, out = $stdout, &block)
  # Yes, there are gems that do progress bars.
  # No, I'm not about to add another dependency for something this simple.
  width     = 50
  processed = 0
  printed   = 0
  # Data rows only: CSV.foreach consumes the header line, so don't count it.
  # (Assumes no quoted fields span multiple lines.)
  total     = (File.foreach(filename).count - 1).to_f
  label     = File.basename(filename, '.csv')

  out.print "%11s: |" % label

  CSV.foreach(filename, headers: true) do |row|
    yield row

    processed += 1
    # Number of the `width` dashes that should be showing so far
    wanted = (processed / total * width).to_i
    out.print "-" * (wanted - printed)
    printed = wanted
  end
  out.puts "|"
end

     file_1: |--------------------------------------------------|
     file_2: |--------------------------------------------------|
  • Posted on March 29, 2014
  • Tagged code, ruby

New in RSpec 3: Verifying Doubles

One of the features I am most excited about in RSpec 3 is the verifying double support[1]. Using traditional doubles has always made me uncomfortable, since it is really easy to accidentally mock or stub a method that does not exist. This leads to the awkward situation where a refactoring can leave your code broken but with green specs. For example, consider the following:

# double_demo.rb
class User < Struct.new(:notifier)
  def suspend!
    notifier.notify("suspended as")
  end
end

describe User, '#suspend!' do
  it 'notifies the console' do
    notifier = double("ConsoleNotifier")

    expect(notifier).to receive(:notify).with("suspended as")

    user = User.new(notifier)
    user.suspend!
  end
end

ConsoleNotifier is defined as:

# console_notifier.rb
class ConsoleNotifier
  def notify!(msg)
    puts msg
  end
end

Note that the method notify! does not match the notify method we are expecting! This is broken code, but the spec still passes:

> rspec -r./console_notifier double_demo.rb
.

Finished in 0.0006 seconds
1 example, 0 failures

Verifying doubles solve this issue.

Verifying doubles to the rescue

A verifying double provides guarantees about methods that are being expected, including whether they exist, whether the number of arguments is valid for that method, and whether they have the correct visibility. If we change double('ConsoleNotifier') to instance_double('ConsoleNotifier') in the previous spec, it will now ensure that any method we expect is a valid instance method of ConsoleNotifier. So the spec will now fail:

> rspec -r./console_notifier.rb double_demo.rb
F

Failures:

  1) User#suspend! notifies the console
     Failure/Error: expect(notifier).to receive(:notify).with("suspended as")
       ConsoleNotifier does not implement:
         notify
    # ... backtrace
         
Finished in 0.00046 seconds
1 example, 1 failure         

Other types of verifying doubles include class_double and object_double. You can read more about them in the documentation.

Isolation

Even though we have a failing spec, we now have to load our dependencies for the privilege. This is undesirable when those dependencies take a long time to load, such as the Rails framework. Verifying doubles provide a solution to this problem: if the dependent class does not exist, it simply operates as a normal double! This is often confusing to people, but understanding it is key to understanding the power of verifying doubles.

Running the spec that failed above without loading console_notifier.rb, it actually passes:

> rspec double_demo.rb
.

Finished in 0.0006 seconds
1 example, 0 failures

This is the killer feature of verifying doubles. You get both confidence that your specs are correct, and the speed of running them in isolation. Typically I will develop a spec and class in isolation, then load up the entire environment for a full test run and in CI.
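To make the isolation behaviour concrete, here is a toy verifying double in Ruby. It is not RSpec's implementation, and MiniInstanceDouble is a made-up name: it only checks that a stubbed method exists (RSpec also checks arity and visibility), and it skips the check entirely when the named class isn't loaded.

```ruby
# Toy verifying double, not RSpec's implementation.
class MiniInstanceDouble
  class VerificationError < StandardError; end

  def initialize(class_name)
    @class_name = class_name
  end

  def stub(name, &impl)
    impl ||= proc { nil }
    klass = Object.const_defined?(@class_name) ? Object.const_get(@class_name) : nil
    # Only verify when the real class is loaded; otherwise act like a plain double.
    if klass && !klass.method_defined?(name)
      raise VerificationError, "#{@class_name} does not implement: #{name}"
    end
    define_singleton_method(name) { |*args| impl.call(*args) }
    self
  end
end

# Before console_notifier.rb is loaded, any name is accepted (isolation).
loose = MiniInstanceDouble.new('ConsoleNotifier')
loose.stub(:notify) { |msg| msg }

class ConsoleNotifier
  def notify!(msg)
    puts msg
  end
end

# With the class loaded, the misspelled method is caught.
strict = MiniInstanceDouble.new('ConsoleNotifier')
strict.stub(:notify!) { |msg| msg } # fine: notify! exists
begin
  strict.stub(:notify)
rescue MiniInstanceDouble::VerificationError => e
  puts e.message # ConsoleNotifier does not implement: notify
end
```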

There are a number of other neat tricks you can do with verifying doubles, such as enabling them for partial doubles and replacing constants, all covered in the documentation.
There really isn’t a good reason to use normal doubles anymore. Install the RSpec 3 beta (via 2.99) to take them for a test drive!

[1] This functionality has been available for a while now in rspec-fire. RSpec 3 fully replaces that library, and even adds some more features.

Ruby Style Guide

My coding style has evolved over time, and has always been something I kept in my head. This morning I tried to document it explicitly, so I can point offending pull requests at it. My personal Ruby Style Guide

What is it missing?

  • Posted on July 04, 2013
  • Tagged code, ruby

Writing About Code

I wrote some words about The Mathematical Syntax of Small-step Operational Semantics

It’s the latest in a sequence of experiments on techniques for presenting ideas and code, xspec being another that you may be interested in.

  • Posted on June 29, 2013
  • Tagged code, ruby

How I Test Rails Applications

The Rails conventions for testing provide three categories for your tests:

  • Unit. What you write to test your models.
  • Integration. Used to test the interaction among any number of controllers.
  • Functional. Testing the various actions of a single controller.

This tells you where to put your tests, but the type of testing you perform on each part of the system is the same: load fixtures into the database to get the app into the required state, run some part of the system either directly (models) or using provided harnesses (controllers), then verify the expected output.

This technique is simple, but is only one of a number of ways of testing. As your application grows, you will need to add other approaches to your toolbelt to enable your test suite to continue providing valuable feedback not just on the correctness of your code, but its design as well.

I use a different set of categories for my tests (taken from the GOOS book):

  • Unit. Do our objects do the right thing, and are they convenient to work with?
  • Integration. Does our code work against code we can’t change?
  • Acceptance. Does the whole system work?

Note that these definitions of unit and integration are radically different to how Rails defines them. That is unfortunate, but these definitions are more commonly accepted across other languages and frameworks and I prefer to use them since it facilitates an exchange of information across them. All of the typical Rails tests fall under the “integration” label, leaving two new levels of testing to talk about: unit and acceptance.

Unit Tests

“A test is not a unit test if it talks to the database, communicates across a network, or touches the file system.” – Working Effectively with Legacy Code, p. 14

This type of test is typically referred to in the Rails community as a “fast unit test”, which is unfortunate since speed is far from the primary benefit. The primary benefit of unit testing is the feedback it provides on the dependencies in your design. “Design unit tests” would be a better label.

This feedback is absolutely critical in any non-trivial application. Unchecked dependency is crippling, and Rails encourages you not to think about it (most obviously by implicitly autoloading everything).

By unit testing a class you are forced to think about how it interacts with other classes, which leads to simpler dependency trees and simpler programs.

Unit tests tend to (though don’t always have to) make use of mocking to verify interactions between classes. Using rspec-fire is absolutely critical when doing this. It verifies your mocks represent actual objects with no extra effort required in your tests, bridging the gap to statically-typed mocks in languages like Java.

As a guideline, a single unit test shouldn’t take more than 1ms to run.

Acceptance Tests

A Rails integration test doesn’t exercise the entire system, since it uses a harness and doesn’t use the system from the perspective of a user. As one example, you need to post form parameters directly rather than actually filling out the form, making the test both brittle in that if you change your HTML form the test will still pass, and incomplete in that it doesn’t actually load the page up in a browser and verify that JavaScript and CSS are not interfering with the submission of the form.

Full system testing was popularized by the cucumber library, but cucumber adds a level of indirection that isn’t useful for most applications. Unless you are actually collaborating with non-technical stakeholders, the extra complexity just gets in your way. RSpec can easily be written in a BDD style without extra libraries.

Theoretically you should only be interacting with the system as a black box, which means no creating fixture data or otherwise messing with the internals of the system in order to set it up correctly. In practice, this tends to be unwieldy, but I still maintain a strict abstraction so that tests read like black box tests, hiding any internal modification behind an interface that could be implemented by black box interactions, but is “optimized” to use internal knowledge. I’ve had success with the builder pattern (e.g. build_registration.with_hosting_request.create), also presented in the GOOS book, but that’s another blog post.
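A minimal sketch of such a builder, assuming a hypothetical Registration with a hosting_requested flag. Here create just builds a struct; in a real suite it is where the internal shortcut (such as writing straight to the database) would hide, so the test itself still reads like a black box test.

```ruby
# Hypothetical domain object standing in for whatever create persists.
Registration = Struct.new(:name, :hosting_requested, keyword_init: true)

class RegistrationBuilder
  def initialize
    # Sensible defaults, so tests only spell out what they care about.
    @attributes = { name: 'Test Attendee', hosting_requested: false }
  end

  # Each with_* method records one fact about the desired state and
  # returns self so calls can be chained.
  def with_hosting_request
    @attributes[:hosting_requested] = true
    self
  end

  def with_name(name)
    @attributes[:name] = name
    self
  end

  # The one place allowed to use internal knowledge to set up state.
  def create
    Registration.new(**@attributes)
  end
end

def build_registration
  RegistrationBuilder.new
end

registration = build_registration.with_hosting_request.create
registration.hosting_requested # true
```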

A common anti-pattern is to try and use transactional fixtures in acceptance tests. Don’t do this. It isn’t executing the full system (so can’t test transaction level functionality) and is prone to flakiness.

An acceptance test will typically take seconds to run, and should only be used for happy-path verification of behaviour. It makes sure that all the pieces hang together correctly. Edge case testing should be done at the unit or integration level. Ideally each new feature should have only one or two acceptance tests.

File Organisation

I use spec/{unit,integration,acceptance} folders as the parent of all specs. Each type of spec has its own helper require, so unit specs require unit_helper rather than spec_helper. Each of those helpers will then require other helpers as appropriate, for instance my rails_helper looks like this (note the hack required to support this layout):

ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)

# By default, rspec/rails tags all specs in spec/integration as request specs,
# which is not what we want. There does not appear to be a way to disable this
# behaviour, so below is a copy of rspec/rails.rb with this default behaviour
# commented out.
require 'rspec/core'

RSpec::configure do |c|
  c.backtrace_clean_patterns << /vendor\//
  c.backtrace_clean_patterns << /lib\/rspec\/rails/
end

require 'rspec/rails/extensions'
require 'rspec/rails/view_rendering'
require 'rspec/rails/adapters'
require 'rspec/rails/matchers'
require 'rspec/rails/fixture_support'
require 'rspec/rails/mocks'
require 'rspec/rails/module_inclusion'
# require 'rspec/rails/example' # Commented this out
require 'rspec/rails/vendor/capybara'
require 'rspec/rails/vendor/webrat'

# Added the below, we still want access to some of the example groups
require 'rspec/rails/example/rails_example_group'
require 'rspec/rails/example/controller_example_group'
require 'rspec/rails/example/helper_example_group'

Controller specs go in spec/integration/controllers, though I’m trending towards using poniard, which allows me to test controllers in isolation (spec/unit/controllers).

Helpers are either unit or integration tested depending on the type of work they are doing. If it is domain level logic it can be unit tested (though I tend to use presenters for this, which are also unit tested), but for helpers that layer on top of Rails provided helpers (like link_to or content_tag) they should be integration tested to verify they are using the library in the correct way.

I have used this approach on a number of Rails applications over the last 1-2 years and found it leads to better and more enjoyable code.

Blocking (synchronous) calls in Goliath

Posting for my future self. A generic function to run blocking code in a deferred thread and resume the fiber on completion, so as not to block the reactor loop.

def blocking(&f)
  fiber = Fiber.current
  result = nil
  EM.defer(f, ->(x){
    result = x
    fiber.resume
  })
  Fiber.yield
  result
end

Usage

class MyServer < Goliath::API
  def response(env)
    blocking { sleep 1 }
    [200, {}, 'Woken up']
  end
end
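The same fiber hand-off can be sketched without EventMachine, to show what blocking relies on. This is a toy under stated assumptions, not Goliath or EM code: plain threads stand in for EM.defer's thread pool, a Queue carries results back, and run_reactor is a hypothetical main loop that resumes each fiber on the thread that created it (fibers can't be resumed from another thread). Exceptions raised inside the blocking work are not propagated, and blocking must be called from inside a Fiber, as it is per request in Goliath.

```ruby
require 'fiber' unless Fiber.respond_to?(:current) # needed on older Rubies

# Worker threads push [fiber, result] pairs here; only the main loop resumes fibers.
COMPLETIONS = Queue.new

def blocking(&op)
  fiber = Fiber.current
  Thread.new { COMPLETIONS << [fiber, op.call] }
  Fiber.yield # returns whatever the reactor passes to fiber.resume
end

def run_reactor(fibers)
  fibers.each(&:resume) # each runs until its first blocking call (or finishes)
  waiting = fibers.count(&:alive?)
  while waiting > 0
    fiber, result = COMPLETIONS.pop
    fiber.resume(result)
    waiting -= 1 unless fiber.alive?
  end
end

results = []
fibers = [0.3, 0.2].map do |delay|
  Fiber.new { results << blocking { sleep delay; delay } }
end
started = Time.now
run_reactor(fibers)
elapsed = Time.now - started
# Both sleeps overlap, so elapsed should be close to 0.3s rather than 0.5s.
```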