Rake task to sync your assets to Amazon S3/CloudFront

After moving to Heroku, I felt bad about having Heroku’s app servers serve my static content. It’s not really a problem, but I like to use the best tool available for the job.

Because Ariejan.net is a Rack app, it has a public directory with all static assets in one place. There are, however, a few problems that need addressing.

These are the problems I want to resolve:

Keep my S3 bucket in sync with my public directory

The first and foremost goal is to keep my S3 bucket in sync with the contents of public. I don’t care about file deletions, but I do care about new and updated files. Those should be uploaded to S3 with every deployment.
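Stated as data, this rule compares two manifests: what is in public and what is in the bucket. A minimal sketch, assuming both sides are already available as hashes from S3 key to MD5 hex digest (the `sync_plan` helper and the hash shapes are illustrative assumptions, not part of the `s3` gem):

```ruby
# Hypothetical helper: decide which keys need uploading.
# Both arguments map an S3 key (path relative to public/) to an MD5
# hex digest. Keys that exist only in `remote` are ignored, so
# deletions are never synced.
def sync_plan(local, remote)
  new_keys     = local.keys.reject { |key| remote.key?(key) }
  changed_keys = local.keys.select { |key| remote.key?(key) && remote[key] != local[key] }
  { new: new_keys.sort, changed: changed_keys.sort }
end

local  = { "a.css" => "111", "b.png" => "222", "c.js" => "333" }
remote = { "a.css" => "111", "b.png" => "999", "old.txt" => "444" }
plan = sync_plan(local, remote)
puts plan[:new].inspect     # ["c.js"]
puts plan[:changed].inspect # ["b.png"]
```

"old.txt" only exists remotely, so it is simply left alone, which matches the "I don’t care about deletions" requirement.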

Don’t re-upload the entire public directory with every deployment

Over time the size of public has grown, and new images are added all the time. I don’t want to re-upload them with every deployment, so my sync script must be smart enough to skip unchanged files.
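Skipping unchanged files works because, for a simple single-part upload, S3 reports the object’s MD5 hex digest as its ETag, so a local digest can be compared against it. A minimal sketch of the local half, assuming the assets live under a public directory; `local_manifest` is a hypothetical helper name:

```ruby
require "digest/md5"

# Hypothetical helper: map every file under `root` to its S3 key and MD5.
# The key is the path relative to `root`, so "public/css/site.css"
# becomes "css/site.css". Directories are skipped, and Dir.glob without
# File::FNM_DOTMATCH also skips dotfiles such as .htaccess.
def local_manifest(root = "public")
  prefix = %r{\A#{Regexp.escape(root)}/}
  Dir.glob(File.join(root, "**", "*")).each_with_object({}) do |path, manifest|
    next unless File.file?(path)
    key = path.sub(prefix, "")
    # Digest::MD5.file reads the file in chunks, so large images are
    # not loaded into memory in one go.
    manifest[key] = Digest::MD5.file(path).hexdigest
  end
end
```

One caveat: the ETag of an object created by a multipart upload is not the MD5 of the whole file, so such objects would always look changed to this kind of check.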

Hook the S3 sync into my current deployment rake task

My existing rake deploy task should be able to call a task such as assets:deploy to trigger an asset sync.
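Hooking one Rake task into another only takes `Rake::Task[...].invoke`. A minimal sketch with stand-in task bodies (the log array and the printed messages are illustrative; the real tasks do the minifying, uploading and pushing):

```ruby
require "rake"

# Make the Rake DSL (task, namespace, desc) available at the top level
# of a plain script; a Rakefile gets this for free.
extend Rake::DSL

# Record what ran, so the order is easy to check.
DEPLOY_LOG = []

namespace :assets do
  desc "Sync public/ to S3 (stand-in for the real task)"
  task :deploy do
    DEPLOY_LOG << "assets:deploy"
    puts "== Uploading assets to S3/Cloudfront"
  end
end

namespace :deploy do
  desc "Deploy to production (stand-in)"
  task :production do
    DEPLOY_LOG << "deploy:production"
    puts "Deploying master to production"
    # Run the asset sync before pushing the code.
    Rake::Task["assets:deploy"].invoke
    # ... push to Heroku here ...
  end
end

Rake::Task["deploy:production"].invoke
```

Calling `invoke` inside the body lets the deploy task print its own banner first; declaring a prerequisite instead (`task production: "assets:deploy"`) would also work, but then the sync runs before anything in the deploy task’s body.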

Minimal configuration

If possible, I don’t want to configure anything.
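The S3 credentials are the one setting that can’t be avoided. A minimal sketch of keeping them out of the Rakefile, assuming they are provided as environment variables (on Heroku, config vars set with `heroku config:set`); the variable names and the `s3_settings` helper are assumptions:

```ruby
# Hypothetical helper: read the S3 settings from the environment instead
# of hard-coding them. ENV#fetch raises KeyError when a variable is
# missing, so a misconfigured deploy fails before any upload starts.
S3Settings = Struct.new(:access_key_id, :secret_access_key, :bucket)

def s3_settings(env = ENV)
  S3Settings.new(
    env.fetch("AWS_ACCESS_KEY_ID"),
    env.fetch("AWS_SECRET_ACCESS_KEY"),
    env.fetch("AWS_BUCKET")
  )
end
```

In the task, `settings = s3_settings` would then replace the three constants, with `settings.access_key_id` and friends passed to `S3::Service.new`.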

The script

Well, this is the rake task I currently use:

```ruby
require 's3'
require 'digest/md5'
require 'mime/types'

## These are some constants to keep track of my S3 credentials and
## bucket name. Nothing fancy here.

AWS_ACCESS_KEY_ID = "xxxxx"
AWS_SECRET_ACCESS_KEY = "yyyyy"
AWS_BUCKET = "my_bucket"


## This defines the rake task `assets:deploy`.
namespace :assets do
  desc "Deploy all assets in public/**/* to S3/Cloudfront"
  task :deploy, [:env, :branch] do |t, args|

    ## Minify all CSS files
    Rake::Task[:minify].execute

    ## Use the `s3` gem to connect to my bucket
    puts "== Uploading assets to S3/Cloudfront"

    service = S3::Service.new(
      :access_key_id => AWS_ACCESS_KEY_ID,
      :secret_access_key => AWS_SECRET_ACCESS_KEY)
    bucket = service.buckets.find(AWS_BUCKET)

    ## Needed to show progress
    STDOUT.sync = true

    ## Find all files (recursively) in ./public and process them.
    Dir.glob("public/**/*").each do |file|

      ## Only upload files, we're not interested in directories
      if File.file?(file)

        ## Strip the leading 'public/' from the filename for use on S3
        ## (only the leading one, so nested "public" dirs survive)
        remote_file = file.sub(%r{\Apublic/}, "")

        ## Try to find the remote_file; an error is raised when no
        ## such file can be found, that's okay.
        begin
          obj = bucket.objects.find_first(remote_file)
        rescue
          obj = nil
        end

        ## If the object does not exist, or if the MD5 hash / ETag of the
        ## file has changed, upload it.
        if !obj || (obj.etag != Digest::MD5.file(file).hexdigest)
          print "U"

          ## Simply create a new object, write the content and set the proper
          ## MIME type. `obj.save` will upload and store the file to S3.
          ## `type_for` returns an array, so take the first match and fall
          ## back to a generic binary type for unknown extensions.
          mime = MIME::Types.type_for(file).first
          obj = bucket.objects.build(remote_file)
          obj.content = File.binread(file) # binary-safe, closes the file
          obj.content_type = mime ? mime.content_type : "application/octet-stream"
          obj.save
        else
          print "."
        end
      end
    end
    STDOUT.sync = false # Done with progress output.

    puts
    puts "== Done syncing assets"
  end
end
```

This rake task is hooked into my rake deploy:production task and generates the following output (I added a new file to show what happens):

```
$ rake deploy:production
(in /Users/ariejan/Code/Sites/ariejannet)
Deploying master to production
== Minifying CSS
== Done
== Uploading assets to S3/Cloudfront
......................................U.........
== Done syncing assets

Updating ariejannet-production with branch master
Counting objects: 40, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (30/30), 4.24 KiB, done.
Total 30 (delta 17), reused 0 (delta 0)

-----> Heroku receiving push
```

Conclusion

It’s very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add later: there’s no support for file deletions, and error handling is very poor at the moment. Also, public is still under version control (where I want it) and is pushed to Heroku. That is nonsense, because most of the assets in public are never used there (except robots.txt and favicon.ico).
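Deletion support would start with a set difference between the keys under public and the keys in the bucket. A minimal sketch, assuming both lists of keys are already available; `stale_keys` is a hypothetical helper that only reports candidates and deletes nothing:

```ruby
# Hypothetical helper: keys that exist in the bucket but no longer
# exist under ./public. It only reports them; deleting the objects
# (and any CloudFront invalidation) is deliberately left out.
def stale_keys(local_keys, remote_keys)
  (remote_keys - local_keys).uniq.sort
end

local_keys  = ["css/site.css", "images/logo.png"]
remote_keys = ["css/site.css", "images/logo.png", "images/old-banner.png"]
puts stale_keys(local_keys, remote_keys).inspect # ["images/old-banner.png"]
```

Reporting first and deleting in a separate, explicit step makes it harder for a half-finished local checkout to wipe assets from the bucket.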

Tags: amazon s3 cloudfront hosting cloud
