Package org.mozilla.universalchardet
Class UniversalDetector
- java.lang.Object
-
- org.mozilla.universalchardet.UniversalDetector
-
public class UniversalDetector extends java.lang.Object
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class UniversalDetector.InputState
-
Field Summary
Fields (Modifier and Type, Field, Description)
private java.lang.String detectedCharset
private boolean done
private CharsetProber escCharsetProber
private boolean gotData
private UniversalDetector.InputState inputState
private byte lastChar
private CharsetListener listener
static float MINIMUM_THRESHOLD
private boolean onlyPrintableASCII
private CharsetProber[] probers
static float SHORTCUT_THRESHOLD
private boolean start
-
Constructor Summary
Constructors (Constructor, Description)
UniversalDetector()
UniversalDetector(CharsetListener listener)
-
Method Summary
Methods (Modifier and Type, Method, Description)
void dataEnd()
    Marks end of data reading.
static java.lang.String detectCharset(java.io.File file)
    Gets the charset of a File.
static java.lang.String detectCharset(java.io.InputStream inputStream)
    Gets the charset of content from InputStream.
static java.lang.String detectCharset(java.nio.file.Path path)
    Gets the charset of a Path.
static java.lang.String detectCharsetFromBOM(byte[] buf)
private static java.lang.String detectCharsetFromBOM(byte[] buf, int offset)
java.lang.String getDetectedCharset()
CharsetListener getListener()
void handleData(byte[] buf)
    Feed the detector with more data.
void handleData(byte[] buf, int offset, int length)
    Feed the detector with more data.
boolean isDone()
void reset()
    Resets detector to be used again.
void setListener(CharsetListener listener)
-
-
-
Field Detail
-
SHORTCUT_THRESHOLD
public static final float SHORTCUT_THRESHOLD
- See Also:
- Constant Field Values
-
MINIMUM_THRESHOLD
public static final float MINIMUM_THRESHOLD
- See Also:
- Constant Field Values
-
inputState
private UniversalDetector.InputState inputState
-
done
private boolean done
-
start
private boolean start
-
gotData
private boolean gotData
-
onlyPrintableASCII
private boolean onlyPrintableASCII
-
lastChar
private byte lastChar
-
detectedCharset
private java.lang.String detectedCharset
-
probers
private CharsetProber[] probers
-
escCharsetProber
private CharsetProber escCharsetProber
-
listener
private CharsetListener listener
-
-
Constructor Detail
-
UniversalDetector
public UniversalDetector()
-
UniversalDetector
public UniversalDetector(CharsetListener listener)
- Parameters:
listener - a listener object that is notified of the detected encoding. Can be null.
-
-
Method Detail
-
isDone
public boolean isDone()
-
getDetectedCharset
public java.lang.String getDetectedCharset()
- Returns:
- The detected encoding, or null if the detector could not determine which encoding was used.
-
setListener
public void setListener(CharsetListener listener)
-
getListener
public CharsetListener getListener()
-
handleData
public void handleData(byte[] buf)
Feed the detector with more data.
- Parameters:
buf - The buffer containing the data
-
handleData
public void handleData(byte[] buf, int offset, int length)
Feed the detector with more data.
- Parameters:
buf - Buffer with the data
offset - initial position of data in buf
length - length of data
-
detectCharsetFromBOM
public static java.lang.String detectCharsetFromBOM(byte[] buf)
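This page gives no description for detectCharsetFromBOM, but the name suggests it inspects the byte-order mark at the start of the buffer. As a rough illustration of that general technique (not the library's implementation), the sketch below matches the standard Unicode BOM signatures using only the JDK. The class name `BomSniffer`, the method name `sniff`, and the returned charset names are assumptions made for this example; the names the library actually returns may differ.

```java
/** Illustrative BOM matcher; not part of juniversalchardet. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns a charset name if buf starts with a known BOM, otherwise null. */
    static String sniff(byte[] buf) {
        if (buf == null) {
            return null;
        }
        // Check 4-byte signatures first: the UTF-32LE BOM (FF FE 00 00)
        // begins with the UTF-16LE BOM (FF FE).
        if (startsWith(buf, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(buf, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(buf, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(buf, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(buf, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM found; statistical detection would be needed
    }

    /** True if buf begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] buf, int... signature) {
        if (buf.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((buf[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A null result here only means that no BOM was present, so a caller would still need to feed the data through handleData to get a statistical guess.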
-
detectCharsetFromBOM
private static java.lang.String detectCharsetFromBOM(byte[] buf, int offset)
-
dataEnd
public void dataEnd()
Marks end of data reading. Finish calculations.
-
reset
public final void reset()
Resets detector to be used again.
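The instance methods above (handleData, isDone, dataEnd, getDetectedCharset, reset) appear designed to be used together in a feed loop. The sketch below shows one plausible call order. So that it compiles without the library, it is written against a hypothetical `StreamingDetector` interface that simply mirrors the documented signatures; `UniversalDetector` is not assumed to implement it, and `DetectionLoop` is likewise an invented name.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical interface mirroring the method signatures documented above. */
interface StreamingDetector {
    void handleData(byte[] buf, int offset, int length);
    boolean isDone();
    void dataEnd();
    String getDetectedCharset();
    void reset();
}

final class DetectionLoop {
    private DetectionLoop() {}

    /** Feeds the stream to the detector in chunks; returns its result, possibly null. */
    static String detect(InputStream in, StreamingDetector detector) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        // Stop early once the detector reports it is done, otherwise read to EOF.
        while (!detector.isDone() && (n = in.read(buf)) != -1) {
            detector.handleData(buf, 0, n);
        }
        detector.dataEnd();                           // signal that no more data will arrive
        String charset = detector.getDetectedCharset(); // may be null
        detector.reset();                             // leave the detector ready for reuse
        return charset;
    }
}
```

With the library on the classpath, the same loop could call a `new UniversalDetector()` instance directly and the interface could be dropped.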
-
detectCharset
public static java.lang.String detectCharset(java.io.File file) throws java.io.IOException
Gets the charset of a File.
- Parameters:
file - The file to check the charset for
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
java.io.IOException - if an I/O error occurs
-
detectCharset
public static java.lang.String detectCharset(java.nio.file.Path path) throws java.io.IOException
Gets the charset of a Path.
- Parameters:
path - The path to the file to check the charset for
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
java.io.IOException - if an I/O error occurs
-
detectCharset
public static java.lang.String detectCharset(java.io.InputStream inputStream) throws java.io.IOException
Gets the charset of content from InputStream.
- Parameters:
inputStream - InputStream containing a text file
- Returns:
- The charset of the content, or null if it cannot be determined
- Throws:
java.io.IOException - if an I/O error occurs
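Because every detectCharset overload returns a String that may be null, callers will usually want to turn the result into a java.nio.charset.Charset with a fallback. The helper below is one possible way to do that using only the JDK. `CharsetNames` and `resolve` are names invented for this sketch, and the fallback policy is an assumption, not something the library prescribes.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Illustrative helper; not part of juniversalchardet. */
final class CharsetNames {
    private CharsetNames() {}

    /**
     * Maps a detected charset name to a Charset, returning the fallback when
     * the name is null (nothing detected) or not supported by this JVM.
     */
    static Charset resolve(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback;
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // The detector may report names that this JVM does not know.
            return fallback;
        }
    }
}
```

A call might then look like `CharsetNames.resolve(UniversalDetector.detectCharset(file), StandardCharsets.UTF_8)`, where UTF-8 as the default is the caller's choice.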
-
-