Does Mc Support Supplementary Characters?

jredfox · April 26, 2018

I noticed UTF-8 is used everywhere, but UTF-8 doesn't support supplementary characters, which means it doesn't cover all Asian characters and some others. Does MC Java Edition support them, though? If so, I will have to upgrade my line library.

https://wiki.sei.cmu.edu/confluence/display/java/STR01-J.+Do+not+assume+that+a+Java+char+fully+represents+a+Unicode+code+point

Edited April 26, 2018 by jredfox

Draco18s · April 26, 2018

Sure. Save the file as UTF-16.

Or you can do it like this:

int codePoint = 0x1D3C;
// Character.toChars returns one char for a BMP code point, two for a supplementary one
char[] charPair = Character.toChars(codePoint);
String deg = new String(charPair);
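
Note that 0x1D3C is below 0x10000, so it sits inside the Basic Multilingual Plane and Character.toChars gives back a single char for it. A minimal sketch of the same approach with a code point that really is supplementary (U+1F600 is picked here purely as an illustration, and the class and method names are made up for the example):

```java
class ToCharsDemo {
    // Builds a String from one code point; supplementary code points yield two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x1D3C);   // inside the BMP
        String supp = fromCodePoint(0x1F600); // outside the BMP (illustrative choice)

        System.out.println(bmp.length());   // prints 1
        System.out.println(supp.length());  // prints 2
        System.out.println(Character.isHighSurrogate(supp.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(supp.charAt(1)));  // prints true
    }
}
```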

jredfox · April 27, 2018

4 hours ago, Draco18s said:
Sure. Save the file as UTF-16.

Or you can do it like this:
int codePoint = 0x1D3C;
char[] charPair = Character.toChars(codePoint);
deg = new String(charPair);

String s = "\uDBFF\uDFFF"; describes a surrogate pair holding Unicode's maximum value of 1,114,111 (U+10FFFF). I have no clue how to handle that in my library, since I have no way to tell whether the char at a given index is part of a surrogate pair.

From what I have seen, the pair is two separate chars, so I would need more than just the code you gave me to add support, but it's a start.

I do save my files in UTF-8. The problem is parsing the strings, since I expect charAt(index) to give me the full character, with a maximum value of 1,114,111, when really a char is limited to the legacy Unicode range of 65,535. That makes for a very complex problem. I think it's a Java issue: when Unicode was extended, they should have just forced applications to use the new chars. Yes, they probably would have had to recompile, but applications that dealt with chars directly and/or stored them as int rather than short when casting would have been fine. Especially since Java doesn't support unsigned short, which means short was already out of range for casting a char to a number value.

If you can help me work out how to detect a pair, then I might be able to fix my library in about 8 solid hours of coding and testing with chars outside the normal Basic Multilingual Plane.
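
For detecting a pair, the standard library already has the pieces: Character.isSurrogatePair, String.codePointAt and Character.charCount. A rough sketch of how they might be combined, where CodePointWalk, codePoints and startsPair are names invented for this example rather than anything from the library in question:

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Returns the code points of s, treating each valid surrogate pair as one unit.
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);    // combines the chars at i and i+1 if they form a pair
            out.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    // True if the char at index is the first half of a valid surrogate pair.
    static boolean startsPair(String s, int index) {
        return index + 1 < s.length()
                && Character.isSurrogatePair(s.charAt(index), s.charAt(index + 1));
    }

    public static void main(String[] args) {
        String s = "a\uDBFF\uDFFFb";
        System.out.println(codePoints(s));    // prints [97, 1114111, 98]
        System.out.println(startsPair(s, 0)); // prints false
        System.out.println(startsPair(s, 1)); // prints true
    }
}
```

Code that only compares individual chars against ASCII delimiters can usually keep using charAt, because the two halves of a pair always fall in the range 0xD800 to 0xDFFF and so never equal an ordinary character.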

Edited April 27, 2018 by jredfox

Draco18s · April 27, 2018

You need to handle each character pair as a separate object, then combine them later.

e.g.

deg = new String(charPair1) + new String(charPair2);
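
An alternative to concatenating one String per pair is StringBuilder.appendCodePoint, which writes one or two chars as needed. A small sketch under that assumption (JoinCodePoints and join are hypothetical names used only here):

```java
class JoinCodePoints {
    // Appends each code point in order; supplementary ones become surrogate pairs.
    static String join(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = join(0x1F600, 0x41); // one supplementary code point, then 'A'
        System.out.println(s.length());                      // prints 3
        System.out.println(s.codePointCount(0, s.length())); // prints 2
    }
}
```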

jredfox · April 27, 2018

8 minutes ago, Draco18s said:

You need to handle each character pair as a separate object, then combine them later.

e.g.

deg = new String(charPair1) + new String(charPair2);

But my readers don't currently care what value a char has. They just turn each line into a string and, when done, return an array of strings.
https://pastebin.com/T3H1wgJH

From there my config object says: OK, this is a string that's not a comment, so parse it into a line object. That's where the trouble begins, because almost everywhere I use charAt(index) on the string to filter, examine and parse it into a specific ILine, for example my LineBase.
https://github.com/jredfox/evilnotchlib/blob/master/src/main/java/com/EvilNotch/lib/util/Line/LineBase.java

I need to identify and then handle the pair. Do pairs only show up when a character is outside the default plane in Unicode (larger than 16 bits), or are characters always stored as pairs? What's going on?
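
On the question of when pairs appear: in a Java String only code points above U+FFFF are stored as two chars, and everything in the Basic Multilingual Plane stays a single char. A quick sketch to check this, with the two sample code points (U+4E2D and U+20000) chosen only as examples and charsNeeded being a made-up helper:

```java
class PairOnlyAboveBmp {
    // Number of Java chars needed to store the given code point (1 or 2).
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded(0x4E2D));  // prints 1 (CJK character inside the BMP)
        System.out.println(charsNeeded(0x20000)); // prints 2 (CJK Extension B, supplementary)
        System.out.println(Character.isSupplementaryCodePoint(0x20000)); // prints true

        String s = "\u4E2D" + new String(Character.toChars(0x20000));
        System.out.println(s.length());                      // prints 3 (chars)
        System.out.println(s.codePointCount(0, s.length())); // prints 2 (code points)
    }
}
```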

I also need to write equivalents of String.trim(), since it appears it doesn't work with chars outside the standard range.
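
For what it's worth, String.trim() only strips chars at or below U+0020 from the ends, so it never cuts a surrogate pair in half, but it also leaves Unicode whitespace such as U+3000 alone. If a code-point-aware version is wanted, one possible sketch is below. It assumes "whitespace" means whatever Character.isWhitespace(int) reports, and CodePointTrim is a hypothetical helper, not an existing API:

```java
class CodePointTrim {
    // Strips leading and trailing whitespace, stepping by whole code points.
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!Character.isWhitespace(cp)) break;
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!Character.isWhitespace(cp)) break;
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String padded = "\u3000 \uD83D\uDE00 \t";
        System.out.println(trim(padded).length());  // prints 2 (just the surrogate pair)
        System.out.println("\u3000x".trim().length()); // prints 2 (built-in trim keeps U+3000)
    }
}
```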

Edited April 27, 2018 by jredfox

jredfox · April 27, 2018

Well, luckily the things I was comparing in the string filters were only chars to begin with, so I think my lib is good. But Java should really increase the size of char, since it is already handled as an int anyway, and ints are 1-4 bytes I believe. That would be better for readability and would mean fewer indexes in the string, saving memory. Two indexes of two bytes each = 4 bytes, the same size as an int, so the only differences would be better readability, less memory and processing for manipulating strings and char data, and fewer bytes per char for higher Unicode values.

Edited April 27, 2018 by jredfox

jredfox · April 28, 2018

21 hours ago, diesieben07 said:

You are mixing very many things together here.

A Java char represents a UTF-16 code unit. A UTF-16 code unit is 16 bits (and so is a Java char). You can't "increase" the value, this is how UTF-16 works.

Also, I don't know where you got the fact that UTF-8 does not support supplementary characters from, but that is bullshit. UTF-8 can encode all of unicode, just like UTF-16.
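
The UTF-8 point in the quote is easy to check with a round trip through bytes. A minimal sketch using the highest code point, U+10FFFF (Utf8RoundTrip and its two helpers are names made up for the example):

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\uDBFF\uDFFF"; // U+10FFFF as a surrogate pair
        byte[] utf8 = encode(s);
        System.out.println(utf8.length);            // prints 4
        System.out.println(decode(utf8).equals(s)); // prints true
        System.out.println(decode(utf8).codePointAt(0) == 0x10FFFF); // prints true
    }
}
```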

No, I am really not mixing things up. I am saying a char should represent a full character; that's its point, but it doesn't always. You can check for it and write a mass of workarounds, but the fact is Java needs a full update, since it's been 8+ years since its last full Unicode update for chars. I'm not talking about the "hey, this char may or may not be a full char" kind of update.

Edited April 28, 2018 by jredfox
