Encoding differences between tr (ubuntu) and tr (mac) -


echo – | tr ’ x 

on mac: correctly prints –

on ubuntu 10.10: prints xx�

why?

for reference, first character 'en dash' (u+2013), second 'right single quotation mark' (u+2019)

edit: use perl instead. magic incantation use is:

perl -csd -p -mutf8 -e 'tr/a-zÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÇa-zäëïöüáéíóúàèìòùâèìòùñçßœ/ /cs' 

it means: except letters think of, replace character space, , squeeze spaces.

gnu coreutils tr has issues multi-byte characters - considers them sequence of single byte characters instead.

taking account, see on linux "expected":

$ echo – | od -t x1 0000000 e2 80 93 0a 0000004 $ echo ’ | od -t x1 0000000 e2 80 99 0a 0000004 $ echo – | tr ’ x | od -t x1 0000000 58 58 93 0a 0000004 

edit:

for set of utf-8 compatible utilities, might want have @ heirloom toolchest. contains tr implementation apparently has utf-8 support, although have not tested it.


Comments

Popular posts from this blog

c# - How to set Z index when using WPF DrawingContext? -

razor - Is this a bug in WebMatrix PageData? -

visual c++ - Using relative values in array sorting ( asm ) -