What exactly is a BOM?

What is a BOM?

BOM (byte order mark) is the informal name for the Unicode character U+FEFF “ZERO WIDTH NO-BREAK SPACE” when it is placed at the start of a stream of Unicode text as a “signature”. The signature tells the receiver that the stream contains Unicode text and, for encodings whose code units span more than one octet, in which byte order those octets are serialized.

When U+FEFF is serialized as UTF-8, it becomes the octet sequence EF BB BF (\xEF\xBB\xBF). UTF-8 has no byte-order ambiguity, so here the BOM acts purely as an encoding signature.
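To make the serialization concrete, here is a minimal sketch that encodes U+FEFF with three of Java's standard charsets and prints the resulting octets. The class name `BomBytes` and the helper `hex` are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF";
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16BE))); // FE FF
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE))); // FF FE
    }
}
```

The two UTF-16 lines show why the character works as a byte order mark: the same code point yields FE FF or FF FE depending on the serialization order, so a receiver can tell the two apart from the first two octets.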

How to write a BOM

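// Needs java.io.* and java.nio.charset.StandardCharsets.
// FileWriter encodes with the platform default charset, so select UTF-8 explicitly.
try (Writer w = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(targetfile), StandardCharsets.UTF_8))) {
    w.write('\ufeff'); // encoded as EF BB BF; the actual content follows
}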
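As a self-contained check of the snippet above, the following sketch writes a BOM followed by some text to a temporary file and prints the first three octets. The class `BomWriter` and the method `writeWithBom` are hypothetical names chosen for this example. It uses `Files.newBufferedWriter` with an explicit UTF-8 charset, so the result does not depend on the platform default.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomWriter {
    /** Writes text to path as UTF-8, preceded by a BOM (hypothetical helper). */
    static void writeWithBom(Path path, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // becomes EF BB BF on disk
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        writeWithBom(tmp, "hello");
        byte[] bytes = Files.readAllBytes(tmp);
        // Print the first three octets of the file
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // EF BB BF
        Files.delete(tmp);
    }
}
```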

How to remove a BOM

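private static final byte[] UTF_BOM = new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

/**
 * Detects a leading UTF-8 BOM and strips it if present.
 */
public static InputStream utf8filte(InputStream in) {
    try {
        PushbackInputStream pis = new PushbackInputStream(in, 3);
        byte[] header = new byte[3];
        // read() may return fewer than 3 bytes, so loop until the header is full or the stream ends
        int len = 0;
        while (len < 3) {
            int n = pis.read(header, len, 3 - len);
            if (n < 0) {
                break;
            }
            len += n;
        }
        // Unless the header is a complete BOM, push back exactly the bytes that were read
        boolean isBom = len == 3
                && header[0] == UTF_BOM[0] && header[1] == UTF_BOM[1] && header[2] == UTF_BOM[2];
        if (!isBom && len > 0) {
            pis.unread(header, 0, len);
        }
        return pis;
    } catch (IOException e) {
        throw Lang.wrapThrow(e);
    }
}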
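The filter above works on raw bytes before decoding. A different approach, shown here only as a sketch, is to decode first and drop a leading U+FEFF at the character level; this is useful because Java's UTF-8 decoder passes the BOM through as an ordinary character. The class `BomSkippingReader` and the method `open` are hypothetical names for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomSkippingReader {
    /** Opens a UTF-8 reader over in and skips a leading U+FEFF if there is one. */
    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the start of the stream
        if (reader.read() != '\uFEFF') {
            reader.reset();              // first character was not a BOM: rewind
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(open(new ByteArrayInputStream(withBom)).readLine());    // hi
        System.out.println(open(new ByteArrayInputStream(withoutBom)).readLine()); // hi
    }
}
```

The character-level variant only handles streams already known to be UTF-8. The byte-level filter is the better fit when the stream is handed on to code that does its own decoding.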