+++
date = "2007-08-21"
title = "Using Iconv to convert UTF-8 to ASCII (on Linux)"
tags = ["General", "RubyOnRails", "Features", "Ruby"]
slug = "using-iconv-to-convert-utf-8-to-ascii-on-linux"
+++

There are situations where you want to remove all the UTF-8 goodness from a string (mostly because of legacy systems you're working with). Now, this is rather easy to do. I'll give you an example:
<pre>çéß</pre>
Should be converted to
<pre>cess</pre>
On my Mac, I can simply use the following snippet to convert the string:
<pre lang="ruby">require 'iconv' # standard library up to Ruby 1.9; the separate iconv gem from Ruby 2.0 on

s = "çéß"
s = Iconv.iconv('ascii//translit', 'utf-8', s).join # returns "c'ess"
s.gsub(/\W/, '') # returns "cess"</pre>
Very nice and all, but when I deploy to my Debian 4.0 Linux system, I get an error telling me that invalid characters were present. Why? Because the Mac has Unicode goodness built in. Linux does not (in most cases).

So, how do you go about solving this? Easy! Get Unicode support!
<pre>sudo apt-get install unicode</pre>
Now, try again.
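Newer Rubies no longer ship Iconv in the standard library (it was removed in Ruby 2.0 and lives on as the separate `iconv` gem). Below is a minimal sketch of one way to get the same result without it, assuming Ruby 2.2 or later for `String#unicode_normalize`. Note that this swaps in a different technique: NFKD normalization splits accented letters into a base letter plus combining marks, which are then stripped. Letters with no decomposition, such as `ß`, need an explicit fallback. The `ascii_fold` name and the `EXTRA_FOLDS` table are hypothetical, and the table is far from complete.

```ruby
# Hypothetical helper; assumes Ruby >= 2.2 (String#unicode_normalize).
# Illustrative, incomplete fallbacks for letters that have no decomposition.
EXTRA_FOLDS = { "ß" => "ss", "æ" => "ae", "ø" => "o" }.freeze

def ascii_fold(str)
  # NFKD turns "ç" into "c" followed by a combining cedilla (U+0327).
  decomposed = str.unicode_normalize(:nfkd)
  # Drop the combining marks left over from the decomposition.
  stripped = decomposed.gsub(/\p{Mn}/, "")
  # Map the remaining non-ASCII characters, dropping anything not in the table.
  mapped = stripped.gsub(/[^\x00-\x7F]/) { |ch| EXTRA_FOLDS.fetch(ch, "") }
  # Same final step as the Iconv version: keep only word characters.
  mapped.gsub(/\W/, "")
end

puts ascii_fold("çéß") # prints cess
```

Because this relies only on Unicode data bundled with Ruby, it behaves the same on the Mac and on Linux, but it only knows the fallbacks you give it.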

<strong>Bonus</strong>

If you want to convert a sentence (or anything else with spaces in it), you'll notice that the gsub command removes the spaces as well. I solve this by first splitting the string into words, converting each word, and then joining the words together again.
<pre lang="ruby">require 'iconv'

words = s.split(" ")
words = words.collect do |word|
  # Transliterate each word on its own, then strip the leftover punctuation
  word = Iconv.iconv('ascii//translit', 'utf-8', word).join
  word.gsub(/\W/, '')
end
words.join(" ")</pre>
Like this? Why not write a mix-in for String?
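Here is a minimal sketch of such a mix-in. It assumes Ruby 2.2 or later and uses `String#unicode_normalize` in place of Iconv, which newer Rubies no longer include in the standard library. The `AsciiTransliteration` module, its `to_ascii` method and the `EXTRA_FOLDS` table are hypothetical names, and the fallback table is only illustrative.

```ruby
# Hypothetical mix-in; assumes Ruby >= 2.2 for String#unicode_normalize.
module AsciiTransliteration
  # Illustrative, incomplete fallbacks for letters that have no decomposition.
  EXTRA_FOLDS = { "ß" => "ss", "æ" => "ae", "ø" => "o" }.freeze

  def to_ascii
    # NFKD turns "ü" into "u" followed by a combining diaeresis (U+0308),
    # and the combining marks are then removed.
    folded = unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
    # Map the remaining non-ASCII characters, dropping anything unknown.
    folded = folded.gsub(/[^\x00-\x7F]/) { |ch| EXTRA_FOLDS.fetch(ch, "") }
    # Like gsub(/\W/, '') but whitespace is kept, so no split/join is needed.
    folded.gsub(/[^\w\s]/, "")
  end
end

class String
  include AsciiTransliteration
end

puts "çéß über alles".to_ascii # prints cess uber alles
```

Keeping `\s` in the final character class removes the need to split the sentence into words first. Reopening `String` affects every string in the process, so a module you include explicitly keeps the change easy to find.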