ariejan de vroom

Rake task to sync your assets to Amazon S3/Cloudfront

1 January 2011

With my move to Heroku I felt bad about having Heroku’s app servers serve static content for me. It’s not really a problem, but I just like to use the best tool available for the job.

Because my site is a Rack app, it has a public directory with all static assets in one place. There are, however, a few problems that need addressing.

These are the problems I want to resolve:

Keep my S3 Bucket in sync with my public directory

The first and foremost is to keep my S3 bucket in sync with the contents of public. I don’t care about file deletions, but I do care about new and updated files. Those should be synced to S3 with every deployment.

Don’t re-upload the entire public directory with every deployment

Over time the size of public has grown. New images are added all the time. I don’t want to re-upload them with every deployment. So, my sync script must be smart enough to not upload unchanged files.

Hook the S3 sync into my current deployment rake task

My current rake deploy task should be able to call assets:deploy or something to trigger an asset sync.

Minimal configuration

I don’t want to configure anything, if possible.
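The second requirement above, skipping unchanged files, boils down to comparing a local checksum with what S3 already reports for the object. Here is a minimal sketch of that check. It assumes the remote ETag is the hex MD5 of the object's content, which holds for simple single-request uploads but not for multipart uploads. The helper name `needs_upload?` is hypothetical and not part of any gem.

```ruby
require 'digest/md5'

# Hypothetical helper: decide whether a local file must be (re-)uploaded.
# remote_etag is the ETag S3 reported for the object, or nil when the
# object does not exist yet. Assumes the ETag is a plain MD5 of the
# content (true for simple, non-multipart uploads).
def needs_upload?(local_path, remote_etag)
  return true if remote_etag.nil?

  # Over HTTP the ETag arrives wrapped in double quotes; strip them in
  # case the client library has not done so already.
  remote_md5 = remote_etag.delete('"')
  local_md5  = Digest::MD5.file(local_path).hexdigest

  remote_md5 != local_md5
end
```

Because the check only needs a path and a string, it can be tried against any file on disk without touching S3.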

The script

Well, this is the rake task I currently use:

require 's3'
require 'digest/md5'
require 'mime/types'

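## These are some constants to keep track of my S3 credentials and
## bucket name. Nothing fancy here.
## (The credential values are left out of this post. Reading them from
## the environment is an assumption; use whatever fits your setup.)

AWS_ACCESS_KEY_ID = ENV["AWS_ACCESS_KEY_ID"]
AWS_SECRET_ACCESS_KEY = ENV["AWS_SECRET_ACCESS_KEY"]
AWS_BUCKET = "my_bucket"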

## This defines the rake task `assets:deploy`.
namespace :assets do
  desc "Deploy all assets in public/**/* to S3/Cloudfront"
  task :deploy, :env, :branch do |t, args|

## Minify all CSS files

## Use the `s3` gem to connect to my bucket
    puts "== Uploading assets to S3/Cloudfront"

    service = S3::Service.new(
      :access_key_id => AWS_ACCESS_KEY_ID,
      :secret_access_key => AWS_SECRET_ACCESS_KEY)
    bucket = service.buckets.find(AWS_BUCKET)

## Needed to show progress
    STDOUT.sync = true

## Find all files (recursively) in ./public and process them.
    Dir.glob("public/**/*").each do |file|

## Only upload files, we're not interested in directories
      if File.file?(file)

## Strip the leading 'public/' from the filename for use on S3.
## Anchoring the pattern avoids touching a 'public/' deeper in the path.
        remote_file = file.sub(/\Apublic\//, "")

## Try to find the remote_file, an error is thrown when no
## such file can be found, that's okay.
        begin
          obj = bucket.objects.find_first(remote_file)
        rescue
          obj = nil
        end

## If the object does not exist, or if the MD5 Hash / etag of the
## file has changed, upload it.
        if !obj || (obj.etag != Digest::MD5.hexdigest(File.read(file)))
          print "U"

## Simply create a new object, write the content and set the proper
## mime-type. `obj.save` will upload and store the file to S3.
## `type_for` returns a list of matches, so take the first one.
          obj = bucket.objects.build(remote_file)
          obj.content = open(file)
          obj.content_type = MIME::Types.type_for(file).first.to_s
          obj.save
        else
          print "."
        end
      end
    end
    STDOUT.sync = false # Done with progress output.

    puts
    puts "== Done syncing assets"
  end
end

This rake task is hooked into my rake deploy:production script and generates the following output (I added a new file just to show you what happens):

$ rake deploy:production
(in /Users/ariejan/Code/Sites/ariejannet)
Deploying master to production
== Minifying CSS
== Done
== Uploading assets to S3/Cloudfront
== Done syncing assets

Updating ariejannet-production with branch master
Counting objects: 40, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (30/30), 4.24 KiB, done.
Total 30 (delta 17), reused 0 (delta 0)

-----> Heroku receiving push


It’s very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add at some later time. There’s no support for file deletions, and error handling is very poor at this time. Also, public is still under version control (where I want it) and is pushed to Heroku. This is nonsense, because most of the assets in public are not used there (except robots.txt and favicon.ico).
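If I ever add support for file deletions, the first step would be working out which remote keys no longer have a local counterpart. Here is a minimal sketch of that comparison, kept free of any S3 calls. The helper name `stale_keys` and its parameters are hypothetical; how you list the bucket's keys and delete objects depends on your S3 client.

```ruby
require 'set'

# Hypothetical helper: given the keys currently stored in the bucket and
# the local file paths under `root`, return the keys that no longer
# exist locally and are therefore candidates for deletion.
def stale_keys(remote_keys, local_files, root = "public")
  prefix = /\A#{Regexp.escape(root)}\//

  # Map local paths to the keys they would have on S3.
  local_keys = local_files.map { |path| path.sub(prefix, "") }.to_set

  remote_keys.reject { |key| local_keys.include?(key) }
end
```

For example, with remote keys `css/site.css` and `images/old.png` and only `public/css/site.css` on disk, the helper returns `images/old.png`.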