

Posted

Sure. Save the file as UTF-16.

Or you can do it like this:

int codePoint = 0x1D3C;
char[] charPair = Character.toChars(codePoint); // one char here, since 0x1D3C < 0x10000
String deg = new String(charPair);
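Worth noting about the snippet above: 0x1D3C lies below U+10000, so Character.toChars returns a single char and no pair is produced. A minimal sketch contrasting it with a supplementary code point (0x1F600 is an arbitrary example, and fromCodePoint is a hypothetical helper name):

```java
class ToCharsDemo {
    // Builds a String from a single code point. Character.toChars returns one
    // char for code points up to U+FFFF and two chars (a surrogate pair) above that.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x1D3C);   // below U+10000: a single char
        String supp = fromCodePoint(0x1F600); // above U+FFFF: a surrogate pair
        // Prints "1 1": one char, one code point.
        System.out.println(bmp.length() + " " + bmp.codePointCount(0, bmp.length()));
        // Prints "2 1": two chars, but still one code point.
        System.out.println(supp.length() + " " + supp.codePointCount(0, supp.length()));
    }
}
```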

 

Apparently I'm a complete and utter jerk who comes to this forum just to make fun of people, be confrontational, and make your personal life miserable.  If you think this is the case, JUST REPORT ME.  Otherwise you're just going to get reported when you reply to my posts and point it out, because odds are, I was trying to be nice.

 

Exception: If you do not understand Java, I WILL NOT HELP YOU and your thread will get locked.

 

DO NOT PM ME WITH PROBLEMS. No help will be given.

Posted (edited)
4 hours ago, Draco18s said:

Sure. Save the file as UTF-16.


 

String s = "\uDBFF\uDFFF"; describes a surrogate pair encoding Unicode's maximum code point, U+10FFFF (1,114,111). I have no clue how to handle that in my library, because I have no way to tell whether the char at a given index is part of a surrogate pair.
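To make the char versus code point distinction concrete: a Java String is a sequence of UTF-16 code units, so length() and charAt() count and return those units, while codePointCount() and codePointAt() work on whole code points. A small sketch (describe is a hypothetical helper, and it assumes a non-empty string):

```java
class SurrogateInspect {
    // Reports how Java sees a string: UTF-16 code units vs. Unicode code points.
    // Assumes s is non-empty (charAt(0) and codePointAt(0) would otherwise throw).
    static String describe(String s) {
        return "length=" + s.length()
                + " codePoints=" + s.codePointCount(0, s.length())
                + " charAt(0)=" + (int) s.charAt(0)
                + " codePointAt(0)=" + s.codePointAt(0);
    }

    public static void main(String[] args) {
        String s = "\uDBFF\uDFFF"; // U+10FFFF, the highest Unicode code point
        // Prints: length=2 codePoints=1 charAt(0)=56319 codePointAt(0)=1114111
        System.out.println(describe(s));
    }
}
```

charAt(0) returns only the high surrogate (0xDBFF), whereas codePointAt(0) sees that a low surrogate follows and combines the two into 0x10FFFF.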


From what I have seen, the pair is two separate chars, so I would need more than just the code you gave me to add support, but it's a start.

I do save my files in UTF-8. The problem is parsing the strings: I expect charAt(index) to give me the full character, with a maximum value of 1,114,111, when really a char only covers the legacy 16-bit range (up to 65,535), which turns this into a very complex problem. I think it's a Java issue: when Unicode was extended, they should have just forced applications to use new chars. Yes, they probably would have had to recompile them, but if they had dealt with chars properly to begin with, and/or cast them to int rather than short, they would have been fine. Especially since Java doesn't support unsigned short, which means short was out of range to begin with for casting a char to a number value.

If you can help me identify how to detect a pair, then I might be able to fix my library in about 8 solid hours of coding and testing with characters outside the Basic Multilingual Plane.
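On detecting a pair: java.lang.Character provides isHighSurrogate, isLowSurrogate and isSurrogatePair(char, char) for exactly this. A minimal sketch, assuming you only need to know whether a given index is the first or second half of a valid pair (startsPair and endsPair are hypothetical names):

```java
class PairDetector {
    // True if the char at index is a high surrogate immediately followed by
    // a low surrogate, i.e. the first half of a valid pair.
    static boolean startsPair(String s, int index) {
        return index + 1 < s.length()
                && Character.isSurrogatePair(s.charAt(index), s.charAt(index + 1));
    }

    // True if the char at index is a low surrogate immediately preceded by
    // a high surrogate, i.e. the second half of a valid pair.
    static boolean endsPair(String s, int index) {
        return index > 0
                && Character.isSurrogatePair(s.charAt(index - 1), s.charAt(index));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 stored as a pair, 'b'
        for (int i = 0; i < s.length(); i++) {
            System.out.println(i + ": startsPair=" + startsPair(s, i)
                    + " endsPair=" + endsPair(s, i));
        }
    }
}
```

A lone surrogate with no partner returns false from both checks, which can be useful for flagging malformed input.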

Edited by jredfox
Posted

You need to handle each character pair as a separate object, then combine them later.

e.g.

// charPair1 and charPair2 are char[] results, e.g. from Character.toChars(...)
String deg = new String(charPair1) + new String(charPair2);
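As an alternative to concatenating separate toChars arrays, StringBuilder.appendCodePoint writes one char or a surrogate pair as needed, so the caller never has to handle the halves. A sketch (join is a hypothetical helper; the two code points are arbitrary examples):

```java
class CombineCodePoints {
    // Joins code points into one String. appendCodePoint appends one char for
    // code points up to U+FFFF and a surrogate pair for anything above.
    static String join(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = join(0x1D3C, 0x1F600);
        System.out.println(s.length());                      // 3 chars
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
    }
}
```

The result is the same as new String(Character.toChars(a)) + new String(Character.toChars(b)), without the intermediate arrays.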


Posted (edited)
8 minutes ago, Draco18s said:

You need to handle each character pair as a separate object, then combine them later.


But my readers don't currently care what value a char is; they just turn each line into a string and, when done, return an array of strings.
https://pastebin.com/T3H1wgJH

From there, my config object checks whether the string is a comment and, if not, parses it into a line object. That's where the trouble begins, because almost everywhere I use charAt(index) on the string to filter, examine, and parse it into a specific ILine, for example my LineBase.
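One common way to keep index-based parsing while supporting supplementary characters is to convert the line to an int[] of code points first (String.codePoints(), Java 8+) and index into that instead of calling charAt. This is a general sketch, not how LineBase actually works; toCodePoints, fromCodePoints and the "key=value" line format are hypothetical:

```java
class CodePointIndexing {
    // Converts a String to an int[] of code points, so every index holds one
    // full character even when the String contains surrogate pairs.
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Rebuilds a String from the code points in [start, end).
    static String fromCodePoints(int[] cps, int start, int end) {
        return new String(cps, start, end - start);
    }

    public static void main(String[] args) {
        String line = "key\uD83D\uDE00=value"; // 11 chars, 10 code points
        int[] cps = toCodePoints(line);
        // Find '=' by code point index rather than by char index.
        int eq = -1;
        for (int i = 0; i < cps.length; i++) {
            if (cps[i] == '=') {
                eq = i;
                break;
            }
        }
        System.out.println(fromCodePoints(cps, eq + 1, cps.length)); // value
    }
}
```

The trade-off is an extra array per line (4 bytes per code point), in exchange for indexes that always land on whole characters.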
https://github.com/jredfox/evilnotchlib/blob/master/src/main/java/com/EvilNotch/lib/util/Line/LineBase.java

I need to identify the pair and then handle it. Do the pairs only show up for characters outside the default plane in Unicode (values above 16 bits), or are chars always paired? What's going on?
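To answer the plane question directly: in UTF-16, only code points from U+10000 to U+10FFFF (outside the Basic Multilingual Plane) are stored as surrogate pairs; everything from U+0000 to U+FFFF is a single char. Character.charCount reports this. A sketch (charsNeeded is a hypothetical wrapper; note that charCount does not validate its argument):

```java
class PlaneCheck {
    // Number of Java chars needed for a code point: 1 for U+0000..U+FFFF,
    // 2 (a surrogate pair) for U+10000..U+10FFFF.
    // Character.charCount does not check that the code point is valid.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0x00E9, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d char(s), supplementary=%b%n",
                    cp, charsNeeded(cp), Character.isSupplementaryCodePoint(cp));
        }
    }
}
```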

I also need to write equivalent methods for String.trim(), since it appears it doesn't work with chars outside the standard range.
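For reference, String.trim() removes leading and trailing chars whose value is U+0020 or lower. Surrogate halves (U+D800 to U+DFFF) are never removed, so trim() does not split pairs, but it also ignores non-ASCII whitespace such as U+2003 (EM SPACE). Java 11 added String.strip(), which uses Character.isWhitespace(int) instead. For older targets, a code-point-aware sketch (stripWhitespace is a hypothetical name):

```java
class CodePointTrim {
    // Removes leading and trailing whitespace as defined by
    // Character.isWhitespace(int), stepping over whole code points so a
    // surrogate pair is never split.
    static String stripWhitespace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!Character.isWhitespace(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!Character.isWhitespace(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Leading EM SPACE and space, a surrogate pair, then trailing space and tab.
        String s = "\u2003 \uD83D\uDE00 text \t";
        System.out.println("[" + stripWhitespace(s) + "]");
    }
}
```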

Edited by jredfox
Posted (edited)

Well, luckily the things I was comparing in the string filters were only chars to begin with, so I think my lib is fine. But Java should really widen char, since it's already handled as an int anyway, and an int is 4 bytes. That would be better for readability and would mean fewer indexes in the string, which I think would save memory. Two indexes of two bytes each is 4 bytes, the same as one int, so the only differences would be readability, less processing when manipulating strings and char data, and fewer bytes per character for higher Unicode values.
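The size argument can be checked with simple arithmetic: a surrogate pair (2 × 2 bytes) and a 32-bit int are both 4 bytes, but for characters up to U+FFFF a 4-byte char would double the storage. A rough sketch counting raw character data only (it ignores object headers and the compact strings introduced in Java 9; utf16Bytes and utf32Bytes are hypothetical names):

```java
class EncodingSize {
    // Bytes of character data when each UTF-16 code unit (a Java char) takes 2 bytes.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Bytes of character data if every code point were stored as a 4-byte int.
    static int utf32Bytes(String s) {
        return s.codePointCount(0, s.length()) * 4;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String emoji = "\uD83D\uDE00"; // U+1F600, one supplementary code point
        System.out.println(utf16Bytes(ascii) + " vs " + utf32Bytes(ascii)); // 10 vs 20
        System.out.println(utf16Bytes(emoji) + " vs " + utf32Bytes(emoji)); // 4 vs 4
    }
}
```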

Edited by jredfox
Posted (edited)
21 hours ago, diesieben07 said:

You are mixing very many things together here.

A Java char represents a UTF-16 code unit, and a UTF-16 code unit is 16 bits (and so is a Java char); a code point above U+FFFF takes two of them, a surrogate pair. You can't "increase" the value, this is how UTF-16 works.

 

Also, I don't know where you got the idea that UTF-8 does not support supplementary characters, but that is bullshit. UTF-8 can encode all of Unicode, just like UTF-16.
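The UTF-8 point is easy to verify: a supplementary code point such as U+10FFFF encodes to four UTF-8 bytes and decodes back unchanged. A sketch using the standard StandardCharsets.UTF_8 (roundTrip is a hypothetical helper):

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encodes to UTF-8 and decodes back; supplementary code points survive intact.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String max = "\uDBFF\uDFFF"; // U+10FFFF
        byte[] utf8 = max.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                // 4
        System.out.println(roundTrip(max).equals(max)); // true
    }
}
```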

No, I am really not mixing things up. I am saying a char should represent a full character; that's its point, but it doesn't always. You can check for it and build massive workarounds, but the fact is Java needs a full update, since it's been 8+ years since its last full Unicode update for chars. I'm not talking about the crappy "hey, this char may or may not be a full character" kind of update.

Edited by jredfox
