<p>The Java programming language uses UTF-16. For convenience, JNI provides methods that work with "modified UTF-8" encoding
<p>The Java programming language uses UTF-16. For convenience, JNI provides methods that work with <a href="http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8">Modified UTF-8</a> as well. The
as well. (Some VMs use the modified UTF-8 internally to store strings; ours do not.) The
modified encoding is useful for C code because it encodes \u0000 as 0xc0 0x80 instead of 0x00.
modified encoding only supports the 8- and 16-bit forms, and stores ASCII NUL values in a 16-bit encoding.
The nice thing about this is that you can count on having C-style zero-terminated strings,
The nice thing about it is that you can count on having C-style zero-terminated strings,
suitable for use with standard libc string functions. The down side is that you cannot pass
suitable for use with standard libc string functions. The down side is that you cannot pass
arbitrary UTF-8 data into the VM and expect it to work correctly.</p>
arbitrary UTF-8 data into the VM and expect it to work correctly.</p>
@@ -235,11 +234,11 @@ are C-style pointers to primitive data rather than local references. They
are guaranteed valid until Release is called, which means they are not
are guaranteed valid until Release is called, which means they are not
released when the native method returns.</p>
released when the native method returns.</p>
<p><strong>Data passed to NewStringUTF must be in "modified" UTF-8 format</strong>. A
<p><strong>Data passed to NewStringUTF must be in Modified UTF-8 format</strong>. A
common mistake is reading character data from a file or network stream
common mistake is reading character data from a file or network stream
and handing it to <code>NewStringUTF</code> without filtering it.
and handing it to <code>NewStringUTF</code> without filtering it.
Unless you know the data is 7-bit ASCII, you need to strip out high-ASCII
Unless you know the data is 7-bit ASCII, you need to strip out high-ASCII
characters or convert them to proper "modified" UTF-8 form. If you don't,
characters or convert them to proper Modified UTF-8 form. If you don't,
the UTF-16 conversion will likely not be what you expect. The extended
the UTF-16 conversion will likely not be what you expect. The extended
JNI checks will scan strings and warn you about invalid data, but they
JNI checks will scan strings and warn you about invalid data, but they
won't catch everything.</p>
won't catch everything.</p>
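<p>The filtering step can be sketched roughly as follows. Both helper names, <code>isPlainAscii</code> and <code>stripNonAscii</code>, are hypothetical and exist only for illustration. Stripping is the crude option mentioned above: it discards every byte outside 7-bit ASCII instead of converting it.</p>

```cpp
#include <string>

// Hypothetical helper: returns true if every byte is non-zero 7-bit ASCII.
// Such data has the same byte representation in standard and Modified UTF-8.
bool isPlainAscii(const std::string& data) {
    for (unsigned char byte : data) {
        if (byte == 0 || byte > 0x7f) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: drops every byte that is zero or above 0x7f.
// This loses information; it is the "strip" option, not a conversion.
std::string stripNonAscii(const std::string& data) {
    std::string out;
    for (unsigned char byte : data) {
        if (byte != 0 && byte <= 0x7f) {
            out.push_back(static_cast<char>(byte));
        }
    }
    return out;
}
```

<p>Under these assumptions, data read from a file would be checked with <code>isPlainAscii</code> first, and only the checked or stripped result's <code>c_str()</code> would be handed to <code>NewStringUTF</code>.</p>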
@@ -321,10 +320,10 @@ and <code>GetStringChars</code> that may be very helpful when all you want
to do is copy data in or out. Consider the following:</p>
to do is copy data in or out. Consider the following:</p>
<pre>
<pre>
jbyte* data = env->GetByteArrayElements(array, NULL);
jbyte* data = env->GetByteArrayElements(array, NULL);