+++
date = "2011-01-01"
title = "Rake task to sync your assets to Amazon S3/Cloudfront"
tags = ["amazon", "s3", "cloudfront", "hosting", "cloud"]
slug = "rake-task-to-sync-your-assets-to-amazon-s3cloudfront"
+++

With my move to Heroku I felt bad about having Heroku's app servers serve static content for me. It's not really a problem, but I just like to use the best tool available for the job.

Because _Ariejan.net_ is a Rack app, it has a `public` directory with all static assets in one place. There are, however, a few problems that need addressing.

~

These are the problems I want to resolve:

#### Keep my S3 Bucket in sync with my public directory ####

The first and foremost is to keep my S3 bucket in sync with the content of `public`. I don't care about file deletions, but I do care about new and updated files. Those should be synced to S3 with every deployment.

#### Don't re-upload the entire public directory with every deployment ####

Over time the size of `public` has grown. New images are added all the time. I don't want to re-upload them with every deployment. So, my sync script must be smart enough to not upload unchanged files.

#### Hook the S3 sync into my current deployment rake task ####

My current rake deploy task should be able to call `assets:deploy` or something similar to trigger an asset sync.

#### Minimal configuration ####

I don't want to configure anything, if possible.

### The script ###

Well, this is the rake task I currently use:

``` ruby
require 's3'
require 'digest/md5'
require 'mime/types'

## These are some constants to keep track of my S3 credentials and
## bucket name. Nothing fancy here.
AWS_ACCESS_KEY_ID = "xxxxx"
AWS_SECRET_ACCESS_KEY = "yyyyy"
AWS_BUCKET = "my_bucket"

## This defines the rake task `assets:deploy`.
namespace :assets do
  desc "Deploy all assets in public/**/* to S3/Cloudfront"
  task :deploy, :env, :branch do |t, args|

    ## Minify all CSS files
    Rake::Task[:minify].execute

    ## Use the `s3` gem to connect to my bucket
    puts "== Uploading assets to S3/Cloudfront"

    service = S3::Service.new(
      :access_key_id => AWS_ACCESS_KEY_ID,
      :secret_access_key => AWS_SECRET_ACCESS_KEY)
    bucket = service.buckets.find(AWS_BUCKET)

    ## Needed to show progress
    STDOUT.sync = true

    ## Find all files (recursively) in ./public and process them.
    Dir.glob("public/**/*").each do |file|

      ## Only upload files, we're not interested in directories
      if File.file?(file)

        ## Strip the leading 'public/' from the filename for use on S3.
        ## Only the prefix is removed, so nested 'public/' segments survive.
        remote_file = file.sub(/\Apublic\//, "")

        ## Try to find the remote_file, an error is thrown when no
        ## such file can be found, that's okay.
        begin
          obj = bucket.objects.find_first(remote_file)
        rescue
          obj = nil
        end

        ## If the object does not exist, or if the MD5 Hash / etag of the
        ## file has changed, upload it.
        if !obj || (obj.etag != Digest::MD5.file(file).hexdigest)
          print "U"

          ## Simply create a new object, write the content and set the proper
          ## mime-type. `obj.save` will upload and store the file to S3.
          obj = bucket.objects.build(remote_file)
          obj.content = File.open(file, "rb")
          ## `type_for` returns an array of matches; use the first one.
          obj.content_type = MIME::Types.type_for(file).first.to_s
          obj.save
        else
          print "."
        end
      end
    end
    STDOUT.sync = false # Done with progress output.

    puts
    puts "== Done syncing assets"
  end
end
```

This rake task is hooked into my `rake deploy:production` script and generates the following output (I added a new file just to show you what happens):

``` shell
$ rake deploy:production
(in /Users/ariejan/Code/Sites/ariejannet)
Deploying master to production
== Minifying CSS
== Done
== Uploading assets to S3/Cloudfront
......................................U.........
== Done syncing assets
Updating ariejannet-production with branch master
Counting objects: 40, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (30/30), 4.24 KiB, done.
Total 30 (delta 17), reused 0 (delta 0)

-----> Heroku receiving push
```

### Conclusion ###

It's very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add at some later time. There's no support for file deletions, and error handling is very poor at this time.

Also, `public` is still under version control (where I want it) and is pushed to Heroku. This is nonsense, because most of the assets in `public` are not used (except `robots.txt` and `favicon.ico`).
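
As a footnote to the script above: the "upload only what changed" decision can be pulled out into two small helpers, which makes it easy to check without touching S3. This is only a minimal sketch, not part of my actual rake task. The names `remote_key` and `needs_upload?` are made up for illustration, and it assumes the S3 etag is the plain MD5 hex digest of the file content, which is not the case for multipart uploads.

``` ruby
require 'digest/md5'

## Hypothetical helper: turn a local path into the key used on S3 by
## stripping only the leading "public/" prefix.
def remote_key(local_path, root = "public")
  local_path.sub(/\A#{Regexp.escape(root)}\//, "")
end

## Hypothetical helper: decide whether a local file has to be uploaded.
## `remote_etag` is nil when the object does not exist on S3 yet.
## Assumption: the etag equals the MD5 hex digest of the content.
def needs_upload?(local_path, remote_etag)
  return true if remote_etag.nil?

  ## Tolerate an etag that still carries its surrounding quotes.
  remote_etag.delete('"') != Digest::MD5.file(local_path).hexdigest
end

remote_key("public/images/logo.png") # => "images/logo.png"
```

Inside the loop, the `if !obj || ...` condition would then become something like `needs_upload?(file, obj && obj.etag)`.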