java.lang.Object
  com.ibm.icu.text.BreakIterator
public abstract class BreakIterator
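A class that locates boundaries in text. This class defines a protocol for objects that break up a piece of natural-language text according to a set of criteria. Instances or subclasses of BreakIterator can be provided, for example, to break a piece of text into words, sentences, or logical characters according to the conventions of some language or group of languages. We provide five built-in types of BreakIterator, one per factory method: getCharacterInstance() (logical-character boundaries), getWordInstance() (word boundaries), getLineInstance() (legal line-wrapping positions), getSentenceInstance() (sentence boundaries), and getTitleInstance() (title boundaries).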
Examples:
Creating and using text boundaries
    import com.ibm.icu.text.BreakIterator;
    import java.util.Locale;

    public static void main(String args[]) {
        if (args.length == 1) {
            String stringToExamine = args[0];
            // print each word in order
            BreakIterator boundary = BreakIterator.getWordInstance();
            boundary.setText(stringToExamine);
            printEachForward(boundary, stringToExamine);
            // print each sentence in reverse order
            boundary = BreakIterator.getSentenceInstance(Locale.US);
            boundary.setText(stringToExamine);
            printEachBackward(boundary, stringToExamine);
            printFirst(boundary, stringToExamine);
            printLast(boundary, stringToExamine);
        }
    }

Print each element in order

    public static void printEachForward(BreakIterator boundary, String source) {
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            System.out.println(source.substring(start, end));
        }
    }

Print each element in reverse order

    public static void printEachBackward(BreakIterator boundary, String source) {
        int end = boundary.last();
        for (int start = boundary.previous();
             start != BreakIterator.DONE;
             end = start, start = boundary.previous()) {
            System.out.println(source.substring(start, end));
        }
    }

Print first element

    public static void printFirst(BreakIterator boundary, String source) {
        int start = boundary.first();
        int end = boundary.next();
        System.out.println(source.substring(start, end));
    }

Print last element

    public static void printLast(BreakIterator boundary, String source) {
        int end = boundary.last();
        int start = boundary.previous();
        System.out.println(source.substring(start, end));
    }

Print the element at a specified position

    public static void printAt(BreakIterator boundary, int pos, String source) {
        int end = boundary.following(pos);
        int start = boundary.previous();
        System.out.println(source.substring(start, end));
    }

Find the next word

    public static int nextWordStartAfter(int pos, String text) {
        BreakIterator wb = BreakIterator.getWordInstance();
        wb.setText(text);
        int last = wb.following(pos);
        int current = wb.next();
        while (current != BreakIterator.DONE) {
            // a segment that contains at least one letter is treated as a word
            for (int p = last; p < current; p++) {
                if (Character.isLetter(text.charAt(p))) {
                    return last;
                }
            }
            last = current;
            current = wb.next();
        }
        return BreakIterator.DONE;
    }

(The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: if the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.)
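The letter-based heuristic described above can also be used to collect every word in a string. The following is a minimal sketch rather than part of the ICU documentation. The class name WordExtractor and its methods are hypothetical, and it imports the JDK's java.text.BreakIterator as a stand-in so that it runs without the ICU jar. The calls it uses (getWordInstance, setText, first, next, DONE) also exist on com.ibm.icu.text.BreakIterator, so swapping the import should be enough, although exact boundaries may differ between the two implementations.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class WordExtractor {

    /** Returns the segments between word boundaries that contain at least one letter. */
    static List<String> words(String text) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        List<String> result = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            if (containsLetter(text, start, end)) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    /** True if any char in [start, end) is a letter; punctuation and whitespace runs fail this. */
    static boolean containsLetter(String text, int start, int end) {
        for (int p = start; p < end; p++) {
            if (Character.isLetter(text.charAt(p))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! 42"));
    }
}
```

With this heuristic, a purely numeric segment such as "42" is skipped because it contains no letter; use Character.isLetterOrDigit if numbers should count as words.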
See Also: CharacterIterator
Field Summary | |
---|---|
static int | DONE - DONE is returned by previous() and next() after all valid boundaries have been returned. |
static int | KIND_CHARACTER |
static int | KIND_LINE |
static int | KIND_SENTENCE |
static int | KIND_TITLE |
static int | KIND_WORD |
Constructor Summary | |
---|---|
protected | BreakIterator() - Default constructor. |
Method Summary | |
---|---|
Object | clone() - Clone method. |
abstract int | current() - Return the iterator's current position. |
abstract int | first() - Return the first boundary position. |
abstract int | following(int offset) - Sets the iterator's current iteration position to be the first boundary position following the specified position. |
static Locale[] | getAvailableLocales() - Returns a list of locales for which BreakIterators can be used. |
static ULocale[] | getAvailableULocales() - Returns a list of locales for which BreakIterators can be used. |
static BreakIterator | getCharacterInstance() - Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getCharacterInstance(Locale where) - Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getCharacterInstance(ULocale where) - Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getLineInstance() - Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
static BreakIterator | getLineInstance(Locale where) - Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
static BreakIterator | getLineInstance(ULocale where) - Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
ULocale | getLocale(ULocale.Type type) - Return the locale that was used to create this object, or null. |
static BreakIterator | getSentenceInstance() - Returns a new instance of BreakIterator that locates sentence boundaries. |
static BreakIterator | getSentenceInstance(Locale where) - Returns a new instance of BreakIterator that locates sentence boundaries. |
static BreakIterator | getSentenceInstance(ULocale where) - Returns a new instance of BreakIterator that locates sentence boundaries. |
abstract CharacterIterator | getText() - Returns a CharacterIterator over the text being analyzed. |
static BreakIterator | getTitleInstance() - Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getTitleInstance(Locale where) - Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getTitleInstance(ULocale where) - Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getWordInstance() - Returns a new instance of BreakIterator that locates word boundaries. |
static BreakIterator | getWordInstance(Locale where) - Returns a new instance of BreakIterator that locates word boundaries. |
static BreakIterator | getWordInstance(ULocale where) - Returns a new instance of BreakIterator that locates word boundaries. |
boolean | isBoundary(int offset) - Return true if the specified position is a boundary position. |
abstract int | last() - Return the last boundary position. |
abstract int | next() - Advances the iterator forward one boundary. |
abstract int | next(int n) - Advances the specified number of steps forward in the text (a negative number, therefore, advances backwards). |
int | preceding(int offset) - Sets the iterator's current iteration position to be the last boundary position preceding the specified position. |
abstract int | previous() - Advances the iterator backward one boundary. |
static Object | registerInstance(BreakIterator iter, Locale locale, int kind) - Register a new break iterator of the indicated kind, to use in the given locale. |
static Object | registerInstance(BreakIterator iter, ULocale locale, int kind) - Register a new break iterator of the indicated kind, to use in the given locale. |
abstract void | setText(CharacterIterator newText) - Sets the iterator to analyze a new piece of text. |
void | setText(String newText) - Sets the iterator to analyze a new piece of text. |
static boolean | unregister(Object key) - Unregister a previously-registered BreakIterator using the key returned from the register call. |
Methods inherited from class java.lang.Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DONE
public static final int KIND_CHARACTER
public static final int KIND_WORD
public static final int KIND_LINE
public static final int KIND_SENTENCE
public static final int KIND_TITLE
Constructor Detail |
---|
protected BreakIterator()
Method Detail |
---|
public Object clone()
Overrides: clone in class Object
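One common reason to clone an iterator is to look ahead without disturbing the position of the original. The sketch below illustrates that use under the assumption that a clone carries its own iteration position; it is not taken from the ICU documentation. CloneDemo is a hypothetical name, and the JDK's java.text.BreakIterator is imported as a stand-in for com.ibm.icu.text.BreakIterator so the example runs without the ICU jar.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class CloneDemo {

    /** Returns {position of the original, position of the clone} after advancing only the clone. */
    static int[] advanceCloneOnly(String text) {
        BreakIterator original = BreakIterator.getWordInstance(Locale.US);
        original.setText(text);
        original.first();

        // clone() is declared to return Object, so a cast is needed
        BreakIterator lookahead = (BreakIterator) original.clone();
        lookahead.next();

        return new int[] { original.current(), lookahead.current() };
    }

    public static void main(String[] args) {
        int[] positions = advanceCloneOnly("one two");
        System.out.println("original=" + positions[0] + ", clone=" + positions[1]);
    }
}
```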
public abstract int first()
public abstract int last()
public abstract int next(int n)
n - The number of boundaries to advance over (if positive, moves forward; if negative, moves backwards).
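To make the sign convention of next(int n) concrete, here is a small sketch that moves several boundaries at once in either direction. BoundarySkipper and its method names are hypothetical. The example uses the JDK's java.text.BreakIterator as a stand-in for com.ibm.icu.text.BreakIterator (an assumption made so it runs without the ICU jar); both declare next(int), first(), last() and DONE.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class BoundarySkipper {

    /** Moves `steps` word boundaries forward from the start of the text. */
    static int forwardFromStart(String text, int steps) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        wb.first();
        return wb.next(steps); // positive n moves forward
    }

    /** Moves `steps` word boundaries backward from the end of the text. */
    static int backwardFromEnd(String text, int steps) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        wb.last();
        return wb.next(-steps); // negative n moves backward
    }

    public static void main(String[] args) {
        String text = "one two three";
        System.out.println(forwardFromStart(text, 2));
        System.out.println(backwardFromEnd(text, 1));
        // Asking for more boundaries than exist yields DONE with the JDK implementation.
        System.out.println(forwardFromStart(text, 100) == BreakIterator.DONE);
    }
}
```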
public abstract int next()
public abstract int previous()
public abstract int following(int offset)
offset - The character position to start searching from.
public int preceding(int offset)
offset - The character position to start searching from.
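following and preceding can be combined to find the segment that encloses a given character position. This is a sketch under stated assumptions: SegmentLookup and segmentAt are hypothetical names, pos is assumed to satisfy 0 <= pos < text.length(), and the JDK's java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator so that the code runs without the ICU jar.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class SegmentLookup {

    /** Returns the text between the two word boundaries that enclose position pos. */
    static String segmentAt(String text, int pos) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        int end = wb.following(pos);       // first boundary strictly after pos
        int start = wb.preceding(pos + 1); // last boundary at or before pos
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + segmentAt("one two three", 5) + "]");
        System.out.println("[" + segmentAt("one two three", 3) + "]");
    }
}
```

As with any word-break iterator, the enclosing segment may be whitespace or punctuation rather than a word.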
public boolean isBoundary(int offset)
offset - the offset to check.
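A typical use of isBoundary is to check whether a cut point is safe before truncating a string. The sketch below is illustrative only: SafeTruncate is a hypothetical class, maxChars is assumed to be non-negative, and the JDK's java.text.BreakIterator is used as a stand-in for com.ibm.icu.text.BreakIterator so it runs without the ICU jar. If the requested offset falls inside a logical character, the code backs up to the previous boundary with preceding.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class SafeTruncate {

    /** Cuts text to at most maxChars chars without splitting a logical character. */
    static String truncate(String text, int maxChars) {
        if (maxChars >= text.length()) {
            return text;
        }
        BreakIterator cb = BreakIterator.getCharacterInstance(Locale.US);
        cb.setText(text);
        // Use the requested offset if it is a boundary, otherwise the nearest one before it.
        int cut = cb.isBoundary(maxChars) ? maxChars : cb.preceding(maxChars);
        return text.substring(0, cut);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT is one logical character
        String text = "e\u0301x";
        System.out.println(truncate(text, 1).length());
        System.out.println(truncate(text, 2).length());
    }
}
```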
public abstract int current()
public abstract CharacterIterator getText()
public void setText(String newText)
newText - A String containing the text to analyze with this BreakIterator.
public abstract void setText(CharacterIterator newText)
newText - A CharacterIterator referring to the text to analyze with this BreakIterator (the iterator's current position is ignored, but its other state is significant).
public static BreakIterator getWordInstance()
public static BreakIterator getWordInstance(Locale where)
where - A locale specifying the language of the text to be analyzed.
public static BreakIterator getWordInstance(ULocale where)
where - A locale specifying the language of the text to be analyzed.
public static BreakIterator getLineInstance()
public static BreakIterator getLineInstance(Locale where)
where - A Locale specifying the language of the text being broken.
public static BreakIterator getLineInstance(ULocale where)
where - A ULocale specifying the language of the text being broken.
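Line-break positions are most often used for word wrapping. The following greedy wrapper is a minimal sketch, not ICU's own layout logic. LineWrapper is a hypothetical class, widths are counted in Java chars (a simplification that ignores display width), and trailing whitespace counts toward the width before it is trimmed. It uses the JDK's java.text.BreakIterator as a stand-in for com.ibm.icu.text.BreakIterator so it runs without the ICU jar.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class LineWrapper {

    /** Greedily packs text into lines of at most `width` chars, breaking only at legal positions. */
    static List<String> wrap(String text, int width) {
        BreakIterator lb = BreakIterator.getLineInstance(Locale.US);
        lb.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = lb.first();
        int lastBreak = lineStart;
        for (int b = lb.next(); b != BreakIterator.DONE; b = lb.next()) {
            // If extending to b would overflow, end the line at the previous break.
            if (b - lineStart > width && lastBreak > lineStart) {
                lines.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = b;
        }
        if (lastBreak > lineStart) {
            lines.add(text.substring(lineStart, lastBreak).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("The quick brown fox", 10)) {
            System.out.println(line);
        }
    }
}
```

A segment that is longer than the width on its own is emitted as an overlong line, because the iterator offers no legal break inside it.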
public static BreakIterator getCharacterInstance()
public static BreakIterator getCharacterInstance(Locale where)
where - A Locale specifying the language of the text being analyzed.
public static BreakIterator getCharacterInstance(ULocale where)
where - A ULocale specifying the language of the text being analyzed.
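Because a logical character can span several Java chars, String.length() may overcount what a reader sees. The sketch below counts logical characters by counting boundaries. LogicalCharCounter is a hypothetical name, and the JDK's java.text.BreakIterator is an assumed stand-in for com.ibm.icu.text.BreakIterator; the two implementations may disagree on complex sequences.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class LogicalCharCounter {

    /** Counts the segments between logical-character boundaries. */
    static int count(String text) {
        BreakIterator cb = BreakIterator.getCharacterInstance(Locale.US);
        cb.setText(text);
        int n = 0;
        cb.first();
        while (cb.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "e\u0301x"; // 3 chars: e, combining acute accent, x
        System.out.println(text.length() + " chars, " + count(text) + " logical characters");
    }
}
```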
public static BreakIterator getSentenceInstance()
public static BreakIterator getSentenceInstance(Locale where)
where - A Locale specifying the language of the text being analyzed.
public static BreakIterator getSentenceInstance(ULocale where)
where - A ULocale specifying the language of the text being analyzed.
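Since each sentence boundary marks both the end of one sentence and the start of the next, splitting text into sentences needs no filtering step. The following is a sketch with assumptions: SentenceSplitter is a hypothetical class, surrounding whitespace is trimmed from each sentence, and the JDK's java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator so the code runs without the ICU jar.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class SentenceSplitter {

    /** Splits text at sentence boundaries and trims each piece. */
    static List<String> sentences(String text) {
        BreakIterator sb = BreakIterator.getSentenceInstance(Locale.US);
        sb.setText(text);
        List<String> result = new ArrayList<>();
        int start = sb.first();
        for (int end = sb.next(); end != BreakIterator.DONE; start = end, end = sb.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sentences("Hello there. How are you? Fine.")) {
            System.out.println(s);
        }
    }
}
```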
public static BreakIterator getTitleInstance()
See also: getWordInstance()
public static BreakIterator getTitleInstance(Locale where)
See also: getWordInstance()
where - A Locale specifying the language of the text being analyzed.
public static BreakIterator getTitleInstance(ULocale where)
See also: getWordInstance()
where - A ULocale specifying the language of the text being analyzed.
public static Object registerInstance(BreakIterator iter, Locale locale, int kind)
iter - the BreakIterator instance to adopt.
locale - the Locale for which this instance is to be registered.
kind - the type of iterator for which this instance is to be registered.
public static Object registerInstance(BreakIterator iter, ULocale locale, int kind)
iter - the BreakIterator instance to adopt.
locale - the ULocale for which this instance is to be registered.
kind - the type of iterator for which this instance is to be registered.
public static boolean unregister(Object key)
key - the registry key returned by a previous call to registerInstance.
public static Locale[] getAvailableLocales()
public static ULocale[] getAvailableULocales()
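The returned array can be searched to check whether a particular locale is listed. The sketch below is illustrative: LocaleSupport and isListed are hypothetical names, and the JDK's java.text.BreakIterator is used as a stand-in for com.ibm.icu.text.BreakIterator (both declare a static getAvailableLocales()). The set of locales returned by ICU may differ from the JDK's, and a locale that is not listed may still be served through fallback, so treat a negative result as a hint only.

```java
import java.text.BreakIterator; // assumption: JDK stand-in for com.ibm.icu.text.BreakIterator
import java.util.Locale;

// Hypothetical helper, for illustration only.
class LocaleSupport {

    /** True if `wanted` appears in the list reported by getAvailableLocales(). */
    static boolean isListed(Locale wanted) {
        for (Locale available : BreakIterator.getAvailableLocales()) {
            if (available.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("en_US listed: " + isListed(Locale.US));
    }
}
```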
public final ULocale getLocale(ULocale.Type type)
Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.
type - type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
See Also: ULocale, ULocale.VALID_LOCALE, ULocale.ACTUAL_LOCALE