Length of string in Perl independent of character encoding


The length function assumes Chinese characters are more than 1 character. How do I determine the length of a string in Perl independent of character encoding (treat Chinese characters as 1 character)?

The length function operates on characters, not octets (aka bytes). The definition of a character depends on the encoding. Chinese characters are still single characters (if the encoding is correctly set!), but they take up more than 1 octet of space. So the length of a string in Perl depends on the character encoding that Perl thinks the string is in; the only string length that is independent of the character encoding is the simple byte length.
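To make the character/octet distinction concrete, here is a minimal sketch. It assumes the core Encode module is available, and the variable names are only illustrative. The octets are written as escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three UTF-8 octets of U+957F (长), written as escapes so the
# script behaves the same regardless of the source file's encoding.
my $octets = "\xE9\x95\xBF";

# Decoding turns the octets into a Perl character string.
my $chars = decode('UTF-8', $octets);

# length() on the decoded string counts characters.
my $char_count = length $chars;

# length() on the encoded form counts octets.
my $byte_count = length encode('UTF-8', $chars);

print "characters: $char_count\n";   # characters: 1
print "bytes: $byte_count\n";        # bytes: 3
```

The same string gives different lengths depending on whether Perl is holding decoded characters or raw octets, which is why the one-liners in this answer disagree.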

Make sure that the string in question is flagged as UTF-8 and encoded in UTF-8. For example, this yields 3:

$ perl -e 'print length("长")' 

whereas this yields 1:

$ perl -e 'use utf8; print length("长")' 

as does:

$ perl -e 'use Encode; print length(Encode::decode("UTF-8", "长"))'

If you're getting the Chinese characters from a file, make sure that you binmode $fh, ':utf8' the file before reading or writing it; if you're getting the data from a database, make sure the database is returning strings in UTF-8 format (or use Encode to do it for you).
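As a rough sketch of the file case, the script below writes the UTF-8 octets for 长 to a scratch file and reads them back twice: once through a decoding layer and once as raw bytes. The scratch file and variable names are hypothetical. It uses ':encoding(UTF-8)' rather than ':utf8' because that layer also validates the input; either one decodes well-formed data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file holding the UTF-8 octets of U+957F (长).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # write raw bytes, no layers
print {$out} "\xE9\x95\xBF\n";
close $out or die "Cannot close $path: $!";

# Read through a decoding layer: the line arrives as characters.
open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$fh>;
close $fh;
chomp $line;
my $decoded_length = length $line;

# Read the same file as raw bytes: the line arrives as octets.
open my $raw, '<:raw', $path or die "Cannot open $path: $!";
my $raw_line = <$raw>;
close $raw;
chomp $raw_line;
my $raw_length = length $raw_line;

print "with decoding layer: $decoded_length\n";   # 1
print "raw bytes: $raw_length\n";                 # 3
```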

I don't think you have to have everything in UTF-8; you just need to ensure that the string is flagged as having the correct encoding. I'd go with UTF-8 front to back (and sideways) though, as that's the lingua franca for Unicode, and it will make things easier if you use it everywhere.

You might want to spend some time reading the perlunicode man page if you're going to be dealing with non-ASCII data.

