Handling international and Unicode text correctly in modern programming languages remains a poorly understood topic.
Below are some slides I presented at the Splunk Seattle office covering the bare minimum you need to know to avoid corrupting international text. The examples are mostly in Python, since it is commonly used at Splunk.