Difference between revisions of "Unicode"

From ThorstensHome
Revision as of 11:19, 26 December 2008

While programming html2mediawiki, I ran into severe problems with sites that contain umlauts such as ä or ö. So I took a deep dive into Unicode programming, and I want you to be able to use my findings.

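Every text file has an encoding. To display the text correctly, you must know whether each character is stored as one byte, as two bytes, or as a varying number of bytes. Unicode aims to assign a number (a code point) to every character of every writing system; encodings such as UTF-8 define how those numbers are stored as bytes.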
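To make the "mixed byte length" case concrete, here is a minimal sketch in plain C++ of how UTF-8 signals the length of each character in its first byte. The function names utf8SequenceLength and countCodePoints are made up for this illustration and are not part of html2mediawiki or Qt; the sketch also assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of a UTF-8 sequence, judged only from its first byte.
// Returns 0 for a continuation byte (10xxxxxx) and for 0xF8 and above,
// which cannot start a sequence. Overlong lead bytes are not rejected.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Count characters (code points) by counting every byte that is
// not a continuation byte. Assumes the input is well-formed UTF-8.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With this, the UTF-8 string "hellö world" is 12 bytes long but countCodePoints reports 11 characters, because ö occupies two bytes.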

Here is a practical example: install the text editor yudit and save a file containing

hellö world

in file.txt. The following code reads a file and prints its first two bytes as numbers:

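   // Open the file named by the first command-line argument.
   QFile inputfile(args->url(0).fileName());
   inputfile.open(QIODevice::ReadOnly);
   // Read the raw bytes; no decoding happens here.
   QByteArray inputfilecontent = inputfile.readAll();
   // Cast to an unsigned type so that bytes above 127 print as e.g. 195, not -61.
   kDebug() << "inputfilecontent.data()[0]" << uint(uchar(inputfilecontent.data()[0]));
   kDebug() << "inputfilecontent.data()[1]" << uint(uchar(inputfilecontent.data()[1]));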

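For a file that starts with ü and is saved as UTF-8, this delivers

195
188

These two bytes (0xC3 0xBC) are the UTF-8 encoding of ü, whose Unicode code point is U+00FC (252).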
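As a rough sketch of why 195 and 188 mean ü: a two-byte UTF-8 sequence carries five payload bits in the first byte and six in the second. The helper below, decodeTwoByteUtf8, is a hypothetical name used only for this illustration. It assumes it is handed a valid two-byte sequence and does no error checking; in a Qt program you would normally let QString::fromUtf8 do the decoding.

```cpp
// Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its code point.
// Assumption: the caller passes a valid two-byte sequence; nothing is validated.
unsigned int decodeTwoByteUtf8(unsigned char first, unsigned char second)
{
    unsigned int high = first & 0x1F;   // keep the 5 payload bits of the lead byte
    unsigned int low  = second & 0x3F;  // keep the 6 payload bits of the continuation byte
    return (high << 6) | low;
}
```

For the bytes above, 195 & 0x1F is 3 and 188 & 0x3F is 60, so the result is 3 * 64 + 60 = 252, which is U+00FC, the code point of ü.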