Ostext search provides a simple search engine with a powerful query language. The query language combines boolean operators with wildcard, fuzzy, proximity and boosting operators so that you can refine your search results.
Ostext uses Lucene for text indexing, which provides a rich query language. Much of the information on this page is derived from the Query Parser Syntax page of the Lucene documentation.
A query is broken up into terms and operators. There are three types of terms: Single Terms, Phrases, and Subqueries.
A Single Term is a single word such as “settings” or “cancel”.
A Phrase is a group of words surrounded by double quotes such as “press cancel”.
A Subquery is a query surrounded by parentheses such as “(press ok)”.
Multiple terms can be combined with boolean operators to form complex queries (see below).
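The three term types can be illustrated with a small tokenizer. This is only a sketch of the syntax described above, not the parser Ostext actually uses; the function name `classify` is hypothetical, and nested parentheses are deliberately not handled.

```python
import re

# A phrase is text in double quotes, a subquery is text in parentheses
# (no nesting in this sketch), and anything else is a run of non-spaces.
TOKEN = re.compile(r'"[^"]*"|\([^()]*\)|\S+')

OPERATORS = {"AND", "OR", "NOT"}


def classify(query):
    """Return (kind, text) pairs for each piece of the query."""
    pieces = []
    for token in TOKEN.findall(query):
        if token.startswith('"'):
            pieces.append(("phrase", token))
        elif token.startswith("("):
            pieces.append(("subquery", token))
        elif token in OPERATORS:
            pieces.append(("operator", token))
        else:
            pieces.append(("term", token))
    return pieces


if __name__ == "__main__":
    for kind, text in classify('settings AND "press cancel" OR (press ok)'):
        print(kind, text)
```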
Ostext supports single and multiple character wildcard searches within single terms (but not within phrase queries).
To perform a single character wildcard search use the “?” symbol.
To perform a multiple character wildcard search use the “*” symbol.
The single character wildcard search looks for strings that match the term with the “?” replaced by any single character. For example, to search for “text” or “test” you can use the search:
1. te?t
Multiple character wildcard searches look for 0 or more characters when matching strings against terms. For example, to search for “test”, “tests” or “tester”, you can use the search:
1. test*
You can use “?”, “*” or both anywhere in the term:
1. *wr?t*
This searches for “write”, “wrote”, “written”, “rewrite”, “rewrote” and so on.
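The wildcard semantics can be tried out with Python's standard `fnmatch` module, whose “?” and “*” have the same meaning as described here. This is a sketch of the matching rule only and says nothing about how Ostext evaluates wildcards against its index; the helper name `wildcard_matches` and the word list are made up for illustration. Note that `fnmatch` also gives “[...]” a special meaning, which this query language does not.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a pattern using ? and * wildcards."""
    return [word for word in words if fnmatchcase(word, pattern)]


WORDS = ["test", "text", "tests", "tester", "toast",
         "write", "wrote", "written", "rewrite", "rewrote", "wart"]

if __name__ == "__main__":
    print(wildcard_matches("te?t", WORDS))    # single character wildcard
    print(wildcard_matches("test*", WORDS))   # multiple character wildcard
    print(wildcard_matches("*wr?t*", WORDS))  # both kinds combined
```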
Ostext supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. A fuzzy search uses the tilde symbol, “~”, at the end of a Single Term. For example, to search for a term similar in spelling to “roam” use the fuzzy search:
1. roam~
This search will find terms like “foam” and “roams”. An optional parameter specifies the required similarity. Its value is between 0 and 1; the closer the value is to 1, the more similar a term must be in order to match. For example:
1. roam~0.8
If the parameter is not given, the default of 0.5 is used.
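To make the similarity value more concrete, the following sketch computes the Levenshtein distance between two words and turns it into a score between 0 and 1. The normalization used here (one minus the distance divided by the length of the shorter word) and the strict “greater than” comparison are assumptions made for illustration; this page does not state the exact formula Ostext applies. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the shorter word."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shortest


def fuzzy_matches(term, words, min_similarity=0.5):
    """Words whose similarity to term exceeds the required similarity."""
    return [word for word in words if similarity(term, word) > min_similarity]


if __name__ == "__main__":
    candidates = ["foam", "roams", "rain", "table"]
    print(fuzzy_matches("roam", candidates))       # default threshold 0.5
    print(fuzzy_matches("roam", candidates, 0.8))  # stricter threshold
```

Under this assumed formula, “foam” and “roams” are each one edit away from “roam” and score 0.75, so they match at the default of 0.5 but not at 0.8.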
Ostext supports finding words that are within a specified distance of each other. To do a proximity search use the tilde symbol, “~”, at the end of a phrase. For example, to search for “press” and “button” within 10 words of each other in a document use the search:
1. "press button"~10
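The idea of a proximity search can be sketched as a check on word positions. This is a simplified illustration, assuming that distance means the difference between the positions of the two words in the text; how Ostext counts the distance internally is not specified on this page. The function name `within_distance` is hypothetical.

```python
def within_distance(text, first, second, max_distance):
    """True if some occurrence of first lies within max_distance
    word positions of some occurrence of second."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)


if __name__ == "__main__":
    sentence = "Press the red button to continue"
    print(within_distance(sentence, "press", "button", 10))
    print(within_distance(sentence, "press", "button", 2))
```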
Boosting a Term
Ostext provides a relevance level for matching documents based on the terms found. To boost the relevance of a term use the caret symbol, “^”, with a boost factor (a number) at the end of the term you are searching for. The higher the boost factor, the more relevant the term will be.
Boosting allows you to control the relevance of a document by boosting individual terms. For example, if you are searching for
1. cancel continue
and you want the term “cancel” to be more relevant, boost it using the “^” symbol along with the boost factor next to the term. You would type:
1. cancel^4 continue
This makes matching entries that contain the term “cancel” appear more relevant. You can also boost phrase terms and subqueries.
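The effect of a boost factor can be illustrated with a toy scoring function. The scoring rule below (boost factor multiplied by the number of times the term occurs) is an assumption chosen to keep the example short; the real relevance calculation takes more into account. An unboosted term is assumed to have a factor of 1, and the names `parse_boosts` and `score` are hypothetical.

```python
def parse_boosts(query):
    """Split a query of single terms into (term, boost factor) pairs."""
    parsed = []
    for token in query.split():
        term, _, factor = token.partition("^")
        # Assumption: a term without a boost factor counts as factor 1.
        parsed.append((term, float(factor) if factor else 1.0))
    return parsed


def score(text, query):
    """Toy relevance: sum of boost factor times occurrences of each term."""
    words = text.lower().split()
    return sum(boost * words.count(term)
               for term, boost in parse_boosts(query))


if __name__ == "__main__":
    query = "cancel^4 continue"
    for entry in ["press cancel", "press continue"]:
        print(entry, score(entry, query))
```

With this toy rule, an entry containing “cancel” scores four times as high as an entry containing only “continue”, which is why it would be listed first.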
Boolean operators allow terms to be combined through logical operators. Ostext supports AND, OR and NOT as boolean operators.
If an AND, OR or NOT operator is used, then an AND or OR operator must be present between all query terms. Each term may also be preceded by the NOT operator. The AND operator has higher precedence than the OR operator.
The AND operator means that all terms in the “AND group” must match some part of the searched field(s).
To search for documents that contain “keyboard” and “mouse” use the query:
1. "keyboard" AND "mouse"
The OR operator divides the query into several optional terms.
To search for documents that contain “keyboard” or “mouse” use the query:
1. "keyboard" OR "mouse"
The NOT operator excludes documents that contain the term after NOT. However, an “AND group” that contains only terms with the NOT operator gives an empty result set instead of the full set of indexed documents.
To search for documents that contain “keyboard” but not “mouse” use the query:
1. "keyboard" AND NOT "mouse"
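The rules for AND, OR and NOT can be summarized in a small evaluator. This is a sketch under simplifying assumptions, not the way Ostext executes queries: documents are modeled as sets of words, only single-word terms are handled (no phrases, subqueries or wildcards), and the names `DOCS` and `search` are made up for the example. Splitting the query on OR first is what gives AND its higher precedence.

```python
import re

# Hypothetical toy index: document id -> set of words in that document.
DOCS = {
    1: {"keyboard", "mouse"},
    2: {"keyboard"},
    3: {"mouse"},
    4: {"monitor"},
}


def search(query, docs=DOCS):
    """Evaluate a flat AND/OR/NOT query and return matching document ids."""
    tokens = re.findall(r'"[^"]*"|\S+', query)

    # Split on OR so that every piece is one "AND group".
    groups, current = [], []
    for token in tokens:
        if token == "OR":
            groups.append(current)
            current = []
        else:
            current.append(token)
    groups.append(current)

    result = set()
    for group in groups:
        required, excluded = [], []
        negate = False
        for token in group:
            if token == "AND":
                continue
            if token == "NOT":
                negate = True
                continue
            term = token.strip('"')
            (excluded if negate else required).append(term)
            negate = False
        if not required:
            # An AND group with only NOT terms gives an empty result.
            continue
        result |= {doc_id for doc_id, words in docs.items()
                   if all(term in words for term in required)
                   and not any(term in words for term in excluded)}
    return result


if __name__ == "__main__":
    print(search('"keyboard" AND "mouse"'))
    print(search('"keyboard" OR "mouse"'))
    print(search('"keyboard" AND NOT "mouse"'))
    print(search('NOT "mouse"'))
```

Because AND binds more tightly than OR, a query such as monitor OR keyboard AND mouse is treated as monitor OR (keyboard AND mouse).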