+++
date = "2007-08-21"
title = "Using Iconv to convert UTF-8 to ASCII (on Linux)"
tags = ["General", "RubyOnRails", "Features", "Ruby"]
slug = "using-iconv-to-convert-utf-8-to-ascii-on-linux"
description = "Text encoding is a mess. This will help you convert UTF-8 to ASCII on Linux using iconv."
+++

There are situations where you want to remove all the UTF-8 goodness from a string (mostly because of legacy systems you're working with). This is rather easy to do. I'll give you an example: `çéß` should be converted to `cess`.

On my Mac, I can simply use the following snippet to convert the string:

``` ruby
require 'iconv'

s = "çéß"
# Iconv.conv returns a String; Iconv.iconv returns an Array of strings
s = Iconv.conv('ascii//translit', 'utf-8', s) # returns "c'ess"
s.gsub(/\W/, '') # returns "cess"
```

Very nice and all, but when I deploy to my Debian 4.0 Linux system, I get an error telling me that invalid characters were present. Why? Because the Mac has Unicode goodness built in. Linux does not (in most cases).

So, how do you go about solving this? Easy! Get Unicode support!

``` shell
sudo apt-get install unicode
```

Now, try again.

## Bonus

If you want to convert a sentence (or anything else with spaces in it), you'll notice that the spaces are removed by the `gsub` call as well, because `\W` matches every non-word character. I solve this by first splitting the string into words, converting each word, and then joining the words together again.

``` ruby
words = s.split(" ").collect do |word|
  # transliterate each word, then strip any leftover non-word characters
  Iconv.conv('ascii//translit', 'utf-8', word).gsub(/\W/, '')
end
words.join(" ")
```

Like this? Why not write a mix-in for String?
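
Here is a minimal sketch of what such a mix-in could look like. Note that it swaps in a different technique: Iconv has not shipped with Ruby's standard library since Ruby 2.0 (it lives on as the `iconv` gem), so this version uses the built-in `String#unicode_normalize` instead. The module name `AsciiFolding`, the method name `to_ascii`, and the `FALLBACKS` table are all made up for illustration, and the table only covers a few characters.

```ruby
# Hypothetical mix-in: AsciiFolding, to_ascii and FALLBACKS are illustrative
# names, not part of Ruby or of Iconv.
module AsciiFolding
  # Characters that NFKD normalization does not decompose. Extend as needed.
  FALLBACKS = { "ß" => "ss", "æ" => "ae", "ø" => "o" }.freeze

  def to_ascii
    words = split(" ").map do |word|
      # NFKD splits "é" into a plain "e" followed by a combining accent
      folded = word.unicode_normalize(:nfkd)
      # Replace each remaining non-ASCII character with its fallback, or drop it
      folded = folded.gsub(/[^\x00-\x7F]/) { |ch| FALLBACKS.fetch(ch, "") }
      # Strip leftover non-word characters, as in the snippets above
      folded.gsub(/\W/, "")
    end
    words.join(" ")
  end
end

class String
  include AsciiFolding
end

puts "çéß".to_ascii # prints "cess"
```

Because the words are split and joined the same way as in the bonus snippet, spaces between words survive, while punctuation inside a word is still removed.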