Use UTF-8 Without BOM for Java Source Files


Today I used Notepad++ to create a Java source file, saved it as UTF-8, and used Ant to compile and run it. But the Ant build failed with this error:
SimpleClass.java:1: illegal character: \191
public class SimpleClass {
  ^
2 errors

The problem is that the "UTF-8" encoding in Notepad++ is in fact UTF-8 with a BOM (byte order mark).
When javac or java reads a file encoded in UTF-8, it assumes the file has no BOM, so a BOM that is present is not discarded and is treated as data.

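This is why javac reports an illegal character: the \191 in the message is 0xBF in decimal, which matches the last byte of the UTF-8 BOM (EF BB BF).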
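
To see the BOM being treated as data, here is a minimal sketch that prepends the three BOM bytes to some text and decodes the result with Java's UTF-8 charset. The class name BomAsData and the helper firstDecodedChar are made up for this illustration and are not part of the original build.

```java
import java.nio.charset.StandardCharsets;

public class BomAsData {
    // The three bytes that make up the UTF-8 BOM.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: prepends the BOM to the text, decodes the bytes
    // as UTF-8, and returns the first decoded character.
    static char firstDecodedChar(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, withBom, UTF8_BOM.length, body.length);
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        return decoded.charAt(0);
    }

    public static void main(String[] args) {
        char first = firstDecodedChar("public class SimpleClass {}");
        // The decoder keeps the BOM as an ordinary character (U+FEFF)
        // instead of dropping it, so it ends up in front of "public".
        System.out.printf("first char: U+%04X%n", (int) first);
    }
}
```

The first character comes back as U+FEFF rather than 'p', so anything that parses the decoded text has to deal with an extra character before the first keyword.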

To fix this problem, just change the encoding to UTF-8 without BOM.
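
If many files were already saved with a BOM, re-saving each one by hand gets tedious. Below is a rough sketch of a small tool that removes a leading UTF-8 BOM from a file in place. The class name BomStripper and the method stripBom are my own names for this example, and it assumes the file is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomStripper {
    // Hypothetical helper: returns the data without a leading UTF-8 BOM
    // (EF BB BF). If there is no BOM, the same array is returned unchanged.
    static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomStripper <file>");
            return;
        }
        Path source = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(source);
        byte[] cleaned = stripBom(original);
        if (cleaned.length != original.length) {
            // Overwrite the file only when a BOM was actually removed.
            Files.write(source, cleaned);
            System.out.println("Removed BOM from " + source);
        } else {
            System.out.println("No BOM found in " + source);
        }
    }
}
```

It could be run as `java BomStripper SimpleClass.java` before compiling. The check works on raw bytes, so it does not depend on the platform default encoding.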

According to the Unicode standard, a BOM is neither required nor recommended for UTF-8 files.

So Notepad++ should make its default UTF-8 option mean UTF-8 without BOM, and offer UTF-8 with BOM as a separate encoding for users who need it.
For our part, avoid UTF-8 with BOM whenever possible.

Resource:
Handle UTF8 file with BOM
