---
title: "Using Iconv to convert UTF-8 to ASCII (on Linux)"
kind: article
slug: using-iconv-to-convert-utf-8-to-ascii-on-linux
created_at: 2007-08-21
tags:
- General
- RubyOnRails
- Features
- Ruby
---
There are situations where you want to strip all the UTF-8 goodness from a string and reduce it to plain ASCII (mostly because of legacy systems you're working with). This is rather easy to do. Here's an example:

<pre>çéß</pre>

This string should be converted to:

<pre>cess</pre>

On my Mac, I can simply use the following snippet to convert the string:

<pre lang="ruby">require 'iconv'

s = "çéß"
# Iconv.iconv returns an array of converted strings, so join it back into one
s = Iconv.iconv('ascii//translit', 'utf-8', s).join # returns "c'ess"
s.gsub(/\W/, '') # returns "cess"</pre>
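One caveat if you try this on a current Ruby: Iconv was removed from the standard library in Ruby 2.0 (it survives as the separate `iconv` gem). As a rough stand-in on Ruby 2.2 or later, you can decompose the string with `String#unicode_normalize(:nfd)`, so that an accented letter becomes a base letter plus a combining mark, and then delete the marks. This is a different technique from Iconv's `//translit`, not a drop-in replacement: letters with no Unicode decomposition, such as `ß`, still need an explicit mapping, and the `transliterate` helper and `TRANSLIT_EXTRAS` table below are hypothetical and deliberately incomplete.

```ruby
# Hypothetical helper; assumes Ruby 2.2+ for String#unicode_normalize.
# It approximates Iconv's 'ascii//translit' with NFD decomposition.

# Letters with no Unicode decomposition (illustrative, not complete).
TRANSLIT_EXTRAS = { "ß" => "ss", "æ" => "ae", "ø" => "o" }.freeze

def transliterate(str)
  # "ç" decomposes to "c" + U+0327 and "é" to "e" + U+0301 ...
  decomposed = str.unicode_normalize(:nfd)
  # ... so deleting the combining marks (\p{Mn}) leaves the base letters.
  stripped = decomposed.gsub(/\p{Mn}/, '')
  # Map the hand-listed letters that have no decomposition.
  stripped.gsub(/[ßæø]/, TRANSLIT_EXTRAS)
end

s = transliterate("çéß") # => "cess"
s = s.gsub(/\W/, '')     # drops anything that is not an ASCII word character
```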
Very nice and all, but when I deploy this to my Debian 4.0 Linux system, I get an error telling me that invalid characters were present. Why? Because the Mac has Unicode goodness built in. Linux does not (in most cases).

So, how do you go about solving this? Easy! Get Unicode support!

<pre>sudo apt-get install unicode</pre>

Now, try again.

<strong>Bonus</strong>

If you want to convert a sentence (or anything else with spaces in it), you'll notice that the gsub call removes the spaces too, because a space counts as a non-word character (`\W`). I solve this by first splitting the string into words, converting each word, and then joining the words together again.
<pre lang="ruby">require 'iconv'

words = s.split(" ")
words = words.collect do |word|
  # transliterate the word to ASCII, then strip leftover non-word characters
  Iconv.iconv('ascii//translit', 'utf-8', word).join.gsub(/\W/, '')
end
words.join(" ")</pre>
Like this? Why not write a mix-in for String?
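Taking up that suggestion, here is one possible shape for such a mix-in. It is a sketch for modern Ruby (2.2 or later) rather than a wrapper around the Iconv code above: Iconv is no longer in the standard library, so this version approximates transliteration with `String#unicode_normalize(:nfd)` plus removal of the combining marks, with a small, deliberately incomplete table for letters like `ß` that have no decomposition. The module name `AsciiTransliteration` and the method name `to_ascii` are hypothetical. Instead of splitting into words, it deletes every character that is neither an ASCII word character nor whitespace, so the spaces survive without a split/join round trip.

```ruby
# Hypothetical String mix-in; assumes Ruby 2.2+ for String#unicode_normalize.
# Transliteration is approximated with NFD decomposition, not Iconv.
module AsciiTransliteration
  # Letters that have no Unicode decomposition (illustrative, not complete).
  EXTRAS = { "ß" => "ss", "æ" => "ae", "ø" => "o" }.freeze

  def to_ascii
    # "é" decomposes to "e" + U+0301; \p{Mn} matches the combining mark
    s = unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
    s = s.gsub(/[ßæø]/, EXTRAS)
    # In Ruby, \w and \s are ASCII-only, so this drops any remaining
    # non-ASCII letters and all punctuation while keeping spaces.
    s.gsub(/[^\w\s]/, '')
  end
end

class String
  include AsciiTransliteration
end

"çéß".to_ascii                # => "cess"
"Façade große Crème".to_ascii # => "Facade grosse Creme"
```

Unlike the split/join version, this keeps the original whitespace exactly as it was (including runs of several spaces) and also strips punctuation such as commas.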