Skåne Sjælland Linux User Group - http://www.sslug.dk

Re: [LOCALE] UTF-8 is awful sludge



On Wed, Jul 06, 2005 at 08:09:00AM +0200, Lars Aronsson wrote:
> Keld Jørn Simonsen wrote:
> > What about handling strings when string constants are involved?
> > E.g. something like strcmp(str,"rødgrød")
> 
> According to POSIX, the strcmp() function depends on the locale, i.e. 
> strcmp("ost", "öl") gives different results under Swedish and German 
> collation order.  The result then also depends on which charset your 
> source code is written in, how your compiler handles it, and how 
> that relates to the selected locale.
> 
> The world today is far more complicated than in the 1970s, when C 
> and Unix were created and the battle was between ASCII and EBCDIC.  
> One simplification, which Wikipedia among others applies, is to run 
> Unicode everywhere ("legacy free") and completely avoid mixing in 
> the historical ASCII and ISO 8859.  Today, UTF-8 appears to be the 
> most common external representation of Unicode.
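
For reference, ISO C and POSIX define strcmp() as a plain byte-by-byte
comparison that ignores the locale; the locale-aware counterpart is
strcoll() (with strxfrm() for building sort keys). A minimal Python sketch
of the difference follows. It assumes UTF-8 as the byte encoding;
byte_order() and try_collation_order() are hypothetical helper names, and
the locale names tried at the bottom are assumptions that may not be
installed on a given system.

```python
import locale


def byte_order(a: str, b: str, encoding: str = "utf-8") -> int:
    """Compare the encoded bytes, the way C's strcmp() compares char arrays."""
    x, y = a.encode(encoding), b.encode(encoding)
    return (x > y) - (x < y)


def try_collation_order(a: str, b: str, locale_name: str):
    """Compare with locale.strcoll() under locale_name.

    Returns -1, 0 or 1, or None if that locale is not installed.
    The previous LC_COLLATE setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        result = locale.strcoll(a, b)
        return (result > 0) - (result < 0)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    # Byte comparison: 'o' (0x6F) sorts before the first byte of 'ö' (0xC3).
    print("bytes:", byte_order("ost", "\u00f6l"))
    # Locale names below are assumptions; availability varies by system.
    for name in ("sv_SE.UTF-8", "de_DE.UTF-8"):
        print(name, try_collation_order("ost", "\u00f6l", name))
```

In Swedish, ö is a separate letter at the end of the alphabet, so "ost"
should sort first; German dictionary order treats ö like o, so "öl" should
sort first. Whether a given system reproduces this depends on its installed
locale data.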

Yes, Lars, that is what we have been discussing here, and Jacob says
UTF-8 is bad news. Anyway, my question was meant in a generic sense. I
know only a little about Java, but if Java's internal string handling is
in 16-bit UCS-2 and character constants are stored in UTF-8, are the
constants then converted automatically when they are processed and used
together with the internal strings?
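
For what it is worth, the Java class-file format stores string constants in
a modified UTF-8 form, and the JVM decodes them into 16-bit code units when
the class is loaded, so that conversion is automatic. The round trip can be
sketched in Python. Standard UTF-8 stands in for Java's modified UTF-8 here,
an assumption that only matters for U+0000 and characters outside the BMP;
utf16_code_units() is a hypothetical helper name.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of text, as a Java char[] would hold them."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


constant = "r\u00f8dgr\u00f8d"          # "rødgrød"

stored = constant.encode("utf-8")       # bytes as stored on disk
loaded = stored.decode("utf-8")         # decoded once, at load time
units = utf16_code_units(loaded)        # in-memory 16-bit representation

print(len(stored), "bytes in UTF-8:", stored.hex(" "))
print(len(units), "UTF-16 code units:", [f"{u:04x}" for u in units])
print("equal after decoding:", loaded == constant)
```

Once both sides are decoded to the same internal form, a comparison against
the constant no longer depends on how the constant was stored.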

You say that UTF-8 is the most common form of Unicode, but both
Microsoft and Apple platforms usually use something 16-bit, UTF-16
or just UCS-2, both internally and as the external representation in
file names and in files. And Apple uses normalization form NFD
(decomposed). So 16-bit representations of Unicode are very widespread,
given the prevalence of MS and Apple platforms.
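
To illustrate the normalization point: the same visible text can be stored
precomposed (NFC) or decomposed (NFD), and the two differ as raw sequences
in both UTF-8 and UTF-16. A small Python sketch using the standard
unicodedata module; the sample word is only an illustration.

```python
import unicodedata

word = "\u00f6l"                           # "öl"

nfc = unicodedata.normalize("NFC", word)   # precomposed: U+00F6, U+006C
nfd = unicodedata.normalize("NFD", word)   # decomposed: U+006F, U+0308, U+006C

for label, form in (("NFC", nfc), ("NFD", nfd)):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in form)
    print(label, code_points,
          "| UTF-8 bytes:", len(form.encode("utf-8")),
          "| UTF-16 bytes:", len(form.encode("utf-16-le")))

# The two forms look identical on screen but differ as raw sequences,
# so names should be normalized to one form before they are compared.
print("raw equal:", nfc == nfd)
print("equal after NFC:", unicodedata.normalize("NFC", nfd) == nfc)
```

Note that ø (U+00F8) has no canonical decomposition, so "rødgrød" is
identical in NFC and NFD; letters such as ö, å and é are the ones affected.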

Regards
Keld


 
Questions about the web-pages to <www_admin>. Last modified 2005-08-10, 20:55 CEST
This page is maintained by MHonArc