Interstage Big Data Complex Event Processing Server V1.1.0 Developer's Reference
FUJITSU Software

2.5.5 Keyword Search

This section explains the condition expressions that compare input event items with keywords.


2.5.5.1 Pattern Search

Patterns can express a wide range of conditions, including complex ones such as partial-match searches and word searches.


The following types of pattern search are available:

Pattern search (string)

  • String match specification

  • Prefix match specification

  • Suffix match specification

  • Free character specification

  • Character interval specification

  • Partial character specification

  • Character range specification

  • Numeric range specification

Pattern search (word)

  • Word match specification

  • Word interval specification

Logical conjunction, logical disjunction, and negation in pattern searches

  • Logical conjunction

  • Logical disjunction

  • Negation


Pattern Search format

The format used by the pattern search is shown below.

Point

  • A pattern search is enclosed within double quotation marks (") or single quotation marks (').

  • How upper-case and lower-case single-byte alphabetic characters in search target strings are handled can be specified in the rule definition (ANKmix option). How upper-case and lower-case double-byte alphabetic characters are handled can be specified in the rule definition (KNJmix option). Refer to "2.9 Options" for information on the ANKmix and KNJmix options.

Note

Quotation marks (") and single quotation marks (') cannot be used together.

Pattern format

Pattern format is shown below.


2.5.5.1.1 Pattern search (string)

The format of a pattern search (string) is shown below.

Point

Characters to be excluded as search targets can be specified in rule definitions (SkipChar option). Refer to "2.9 Options" for information on the SkipChar option.

String match specification

Finds out whether the value of an element node includes the specified keyword.

Example

Search for data that includes the string "Fujitsu" in the element value indicated by /root/text.

/root/text = 'Fujitsu'

Prefix match specification

Finds out whether the specified keyword appears at the start of an element node's value.

Example

Search for data that begins with the string "Fujitsu" in the element value indicated by /root/text.

/root/text = '^Fujitsu'

Suffix match specification

Finds out whether the specified keyword appears at the end of an element node's value.

Example

Search for data that ends with the string "Fujitsu" in the element value indicated by /root/text.

/root/text = 'Fujitsu$'
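The three matches above behave like a substring test, a prefix test, and a suffix test. The following minimal sketch models them in Python; it is not the CEP engine's matcher. `string_match` is a hypothetical name, and the sketch assumes the keyword contains no other pattern symbols and that case is significant (the ANKmix and KNJmix options are not modeled).

```python
def string_match(value: str, keyword: str) -> bool:
    """Approximate string, prefix ('^...') and suffix ('...$') matching.

    Hypothetical helper: assumes the keyword contains no other pattern
    symbols and that upper and lower case are distinguished.
    """
    prefix = keyword.startswith("^")
    suffix = keyword.endswith("$") and len(keyword) > 1
    start = 1 if prefix else 0
    end = len(keyword) - 1 if suffix else len(keyword)
    literal = keyword[start:end]
    if prefix and suffix:
        # Assumption: both anchors together require the whole value to match.
        return value == literal
    if prefix:
        return value.startswith(literal)
    if suffix:
        return value.endswith(literal)
    return literal in value  # plain string match: "includes the keyword"


print(string_match("Fujitsu Limited", "^Fujitsu"))  # True
print(string_match("Fujitsu Limited", "Fujitsu$"))  # False
```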

Free character specification

Finds out whether the value of an element node and the value of a text node include a keyword that contains free characters.

Free characters included in keywords can be specified in four ways, as shown in the following table.

Symbol   Explanation                          Can be used consecutively
.        Any one arbitrary character          Yes
.?       Zero or one arbitrary character      Yes
.+       One or more arbitrary characters     No
.*       Zero or more arbitrary characters    No

Note

If symbols that cannot be used consecutively are used consecutively, CEP engine startup fails.

Example

Search for data that includes the strings "Fujitsu" and "company" in the element value indicated by /root/text, provided the number of characters between these strings is 0 or more.

/root/text = 'Fujitsu.*company'

Information

Free character specifications can be combined. The following table shows examples of how combinations of free character specifications evaluate to TRUE or FALSE for different data. These results assume that "=" has been specified as the comparison operator.

                  Data example
Keyword example   AB    AXB   AYYB  AZZZB
'A.'              Y     Y     Y     Y
'A.B'             x     Y     x     x
'A.?B'            Y     Y     x     x
'A.+B'            x     Y     Y     Y
'A.*B'            Y     Y     Y     Y
'A..?B'           x     Y     Y     x
'A..+B'           x     x     Y     Y
'A..*B'           x     Y     Y     Y
'A.?.+B'          x     Y     Y     Y
'A.?.*B'          Y     Y     Y     Y

Y: TRUE
x: FALSE
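For keywords made only of ordinary characters and the four free-character symbols, the results in the table above agree with the identically written constructs in Python's `re` module, so a regular-expression search can be used to reproduce the table. This is a sketch under that assumption, not the engine's implementation; `free_char_match` is a hypothetical name, and the sketch does not enforce the rule that `.+` and `.*` cannot be used consecutively.

```python
import re

DATA = ["AB", "AXB", "AYYB", "AZZZB"]
KEYWORDS = ["A.", "A.B", "A.?B", "A.+B", "A.*B",
            "A..?B", "A..+B", "A..*B", "A.?.+B", "A.?.*B"]


def free_char_match(value: str, keyword: str) -> bool:
    """Partial match ("=") of a keyword containing '.', '.?', '.+' or '.*'.

    Assumption: these symbols behave like the same Python regex constructs,
    and the keyword contains no other characters that are special in regexes.
    """
    return re.search(keyword, value) is not None


# Print one row per keyword: Y = TRUE, x = FALSE, in the column order of DATA.
for keyword in KEYWORDS:
    row = " ".join("Y" if free_char_match(d, keyword) else "x" for d in DATA)
    print(f"{keyword:8} {row}")
```

Each printed row matches the corresponding row of the table.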


Character interval specification

Finds out whether two specified keywords appear in that order in an element node's value, separated by no more than the specified number of characters. The interval value must be from 0 through 1024.

Note

  • A character interval specification can be used only once in a string search.

  • Free character specifications cannot be specified immediately before or after character interval specifications.

Example

Search for data that includes the strings "alcohol" and "concentration" in the element value indicated by /root/text, provided the number of characters between these strings is 10 or less.

/root/text = 'alcohol,10C,concentration'
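The character interval can be modeled as "first keyword, then 0 to n arbitrary characters, then second keyword". The sketch below builds that as a Python regular expression. It is a hypothetical helper (`char_interval_match`), assuming both keywords are plain strings (consistent with the Note that free characters cannot be adjacent) and that the interval counts the characters between the end of the first keyword and the start of the second.

```python
import re


def char_interval_match(value: str, keyword: str) -> bool:
    """Approximate a character interval keyword such as 'alcohol,10C,concentration'.

    Hypothetical helper. Assumptions: both keywords are plain strings and the
    interval counts the characters between the end of the first keyword and
    the start of the second.
    """
    m = re.fullmatch(r"(.+?),([0-9]+)C,(.+)", keyword)
    if m is None:
        raise ValueError(f"not a character interval keyword: {keyword!r}")
    first, gap, second = m.group(1), int(m.group(2)), m.group(3)
    if not 0 <= gap <= 1024:
        raise ValueError("the interval must be from 0 through 1024")
    # The first keyword, then 0..gap arbitrary characters, then the second keyword.
    pattern = f"{re.escape(first)}.{{0,{gap}}}{re.escape(second)}"
    return re.search(pattern, value, re.DOTALL) is not None


print(char_interval_match("alcohol level concentration", "alcohol,10C,concentration"))  # True
```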

Partial character specification

Finds out whether the value of an element node and the value of a text node contain the specified keyword, where part of the keyword can be any one of several alternative strings.

Note

Depending on the number of characters specified, a large amount of memory can be used. Insufficient memory can cause the search response to deteriorate. Note that, if a memory overflow is detected, an error message is output and the input event is discarded (processing of the next input event continues).

Refer to Section 6.3.4, "Tuning" in the User's Guide for information on the memory estimation method.

Example

Search for data that includes any of the strings "Jon Smith", "John Smith", or "Jonathon Smith" in the element value indicated by /root/text.

/root/text = 'Jo(n|hn|nathon) Smith'
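One way to picture a partial character specification is to expand it into the plain strings it stands for and test whether the value contains any of them. The following is a minimal sketch with hypothetical helper names; it assumes at most one parenthesized group and no other pattern symbols in the keyword.

```python
import re


def expand_alternatives(keyword: str) -> list[str]:
    """Expand one '(a|b|c)' group into the plain strings it stands for.

    Hypothetical helper: assumes at most one parenthesized group and no
    other pattern symbols in the keyword.
    """
    m = re.fullmatch(r"([^()]*)\(([^()]*)\)([^()]*)", keyword)
    if m is None:
        return [keyword]
    head, alternatives, tail = m.groups()
    return [head + alt + tail for alt in alternatives.split("|")]


def partial_char_match(value: str, keyword: str) -> bool:
    # TRUE if the value contains any of the expanded strings.
    return any(s in value for s in expand_alternatives(keyword))


print(expand_alternatives("Jo(n|hn|nathon) Smith"))
# ['Jon Smith', 'John Smith', 'Jonathon Smith']
```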

Character range specification

Finds out whether the value of an element node includes the specified keyword where part of the keyword consists of any character in a specific range.

The character code value of the start character must be smaller than the character code value of the end character. Both the start character (character 1) and the end character (character 2) must be single ASCII characters and must not be control characters.

Note

Depending on the specified character range, a large amount of memory can be used. Insufficient memory can cause the search response to deteriorate. Note that, if a memory overflow is detected, an error message is output and the input event is discarded (processing of the next input event continues).

Refer to Section 6.3.4, "Tuning" in the User's Guide for information on the memory estimation method.

Example

Search for data that includes any of the strings "classA", "classB", or "classC" in the element value indicated by /root/text.

/root/text = 'class[A-C]'
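The rules for the start and end characters can be checked mechanically. The sketch below validates a `[c1-c2]` range against the stated rules (single ASCII characters, no control characters, start code smaller than end code) and expands it into the strings it covers. The helper names are hypothetical, and exactly one range group per keyword is assumed.

```python
import re


def expand_char_range(keyword: str) -> list[str]:
    """Expand one '[c1-c2]' group, checking the rules stated above.

    Hypothetical helper: assumes exactly one range group in the keyword.
    """
    m = re.fullmatch(r"([^\[\]]*)\[(.)-(.)\]([^\[\]]*)", keyword)
    if m is None:
        raise ValueError(f"not a character range keyword: {keyword!r}")
    head, start, end, tail = m.groups()
    for ch in (start, end):
        # Single ASCII character that is not a control character (0x00-0x1F, 0x7F).
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"invalid range character: {ch!r}")
    if ord(start) >= ord(end):
        raise ValueError("the start character code must be smaller than the end")
    return [head + chr(c) + tail for c in range(ord(start), ord(end) + 1)]


def char_range_match(value: str, keyword: str) -> bool:
    # TRUE if the value contains any string covered by the range.
    return any(s in value for s in expand_char_range(keyword))


print(expand_char_range("class[A-C]"))  # ['classA', 'classB', 'classC']
```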

Numeric range specification

Finds out whether the value of an element node includes the specified keyword where part of the keyword consists of any numeric value in a specific range.

The start numeric value (numeric value 1) and the end numeric value (numeric value 2) must be specified using single-byte digits and must be from 0 through 999. The start numeric value must be smaller than the end numeric value.

Point

To obtain correct search results, specify characters before and after the numeric range.

Note

Depending on the specified numeric range, a large amount of memory can be used. Insufficient memory can cause the search response to deteriorate. Note that, if a memory overflow is detected, an error message is output and the input event is discarded (processing of the next input event continues).

Refer to Section 6.3.4, "Tuning" in the User's Guide for information on the memory estimation method.

Example

Search for data that includes any of the strings "alcohol 9%", "alcohol 10%", or "alcohol 11%" in the element value indicated by /root/text.

/root/text = 'alcohol [9,11]%'
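A numeric range can be modeled the same way as the other range forms: validate the bounds and expand the range into the strings it covers. In this model, a range with no surrounding characters also matches digits inside longer numbers (the 9 in 19), which illustrates why the Point above recommends specifying characters before and after the range. The helper names are hypothetical, and exactly one `[n1,n2]` group is assumed.

```python
import re


def expand_numeric_range(keyword: str) -> list[str]:
    """Expand one '[n1,n2]' group into the strings it stands for.

    Hypothetical helper: assumes exactly one numeric range group. Only the
    single-byte digits 0-9 are accepted, with at most three digits (0-999).
    """
    m = re.fullmatch(r"([^\[\]]*)\[([0-9]{1,3}),([0-9]{1,3})\]([^\[\]]*)", keyword)
    if m is None:
        raise ValueError(f"not a numeric range keyword: {keyword!r}")
    head, tail = m.group(1), m.group(4)
    low, high = int(m.group(2)), int(m.group(3))
    if low >= high:
        raise ValueError("the start value must be smaller than the end value")
    return [f"{head}{n}{tail}" for n in range(low, high + 1)]


def numeric_range_match(value: str, keyword: str) -> bool:
    # TRUE if the value contains any string covered by the range.
    return any(s in value for s in expand_numeric_range(keyword))


print(numeric_range_match("alcohol 19%", "alcohol [9,11]%"))  # False
print(numeric_range_match("alcohol 19%", "[9,11]"))           # True: "9" occurs inside "19"
```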

2.5.5.1.2 Pattern search (word)

The format of pattern search (word) is shown below.


Point

  • The word delimiter character can be specified in the rule definitions (SeparateChar option). Refer to "2.9 Options" for information on the SeparateChar option.

  • Word search keywords can contain any ASCII characters except the word delimiter character.

Word match specification

Finds out whether the value of an element node and the value of a text node contain a word that matches the specified keyword. In word searches, strings separated by the delimiter character are treated as individual words.

Example

Search for data containing the word "the" in the element value indicated by /root/text.

/root/text = '\<the\>'

The string "the" in "mother" evaluates to FALSE because it occurs within a larger word.
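A word match can be modeled by splitting the value at the delimiter characters and checking whether the keyword equals one of the resulting words. In this sketch the delimiter is assumed to be a space unless another set of characters is passed; in the product the delimiter comes from the SeparateChar option. `split_words` and `word_match` are hypothetical names.

```python
import re


def split_words(value: str, separators: str = " ") -> list[str]:
    """Split a value into words at any run of separator characters.

    Assumption: a single space is the delimiter unless told otherwise;
    in the product, the delimiter is set by the SeparateChar option.
    """
    pattern = "[" + re.escape(separators) + "]+"
    return [w for w in re.split(pattern, value) if w]


def word_match(value: str, keyword: str, separators: str = " ") -> bool:
    """TRUE if a keyword written as '\\<word\\>' occurs as a whole word."""
    m = re.fullmatch(r"\\<(.+)\\>", keyword)
    if m is None:
        raise ValueError(f"not a word keyword: {keyword!r}")
    return m.group(1) in split_words(value, separators)


print(word_match("the cat and the mother", r"\<the\>"))  # True
print(word_match("my mother", r"\<the\>"))               # False
```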


Word interval specification

Finds out whether two specified keywords appear in that order in an element node's value, separated by no more than the specified number of words.

Numeric values specified for word interval specifications must be from 0 through 1024.

Example

Search for data that includes the words "search" and "AsIs" in the element value indicated by /root/text, provided the number of words between these two words is 10 or less.

/root/text = '\<search\>,10W,\<AsIs\>'

Note

A word interval specification can be used only once in a word search.
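The word interval check can be sketched by splitting the value into words and looking, for each occurrence of the first word, for the second word within the next n + 1 positions. This is a hypothetical helper (`word_interval_match`), assuming a space delimiter by default and that the interval counts the words strictly between the two keywords.

```python
import re


def word_interval_match(value: str, keyword: str, separators: str = " ") -> bool:
    """Approximate a word interval keyword such as '\\<search\\>,10W,\\<AsIs\\>'.

    Hypothetical helper. Assumptions: words are split at runs of the
    separator characters (a space by default), and the interval counts the
    words strictly between the two keywords, which must appear in order.
    """
    m = re.fullmatch(r"\\<(.+?)\\>,([0-9]+)W,\\<(.+?)\\>", keyword)
    if m is None:
        raise ValueError(f"not a word interval keyword: {keyword!r}")
    first, gap, second = m.group(1), int(m.group(2)), m.group(3)
    if not 0 <= gap <= 1024:
        raise ValueError("the interval must be from 0 through 1024")
    words = [w for w in re.split("[" + re.escape(separators) + "]+", value) if w]
    for i, word in enumerate(words):
        if word != first:
            continue
        # Words between positions i and j: j - i - 1 <= gap.
        for j in range(i + 1, min(len(words), i + gap + 2)):
            if words[j] == second:
                return True
    return False


print(word_interval_match("search engines that support AsIs", r"\<search\>,10W,\<AsIs\>"))  # True
```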


2.5.5.1.3 Logical conjunction, logical disjunction, and negation in pattern searches

This section explains how to combine patterns using logical conjunction, logical disjunction, and negation.


Logical conjunction

Finds out whether the value of the element node specified in a path expression includes all the specified patterns.

Example

Evaluates to TRUE if the value of the element node represented by '/root/text' includes the strings "fast" and "search".

/root/text = 'fast&search'

Logical disjunction

Finds out whether the value of an element node specified in a path expression includes any of the specified patterns.

Example

Evaluates to TRUE if the value of the element node represented by '/root/text' includes either the string "fast" or the string "search".

/root/text = 'fast|search'

Negation

Finds out whether the value of an element node specified in a path expression includes none of the specified patterns.

Example

Evaluates to TRUE if the value of the element node represented by '/root/text' includes neither the string "fast" nor the string "search".

/root/text = '~(fast|search)'

Point

  • Logical conjunction, logical disjunction, and negation can be combined in a pattern search. When they are combined, negation is evaluated first, then logical conjunction, then logical disjunction.

  • Parentheses "(" and ")" may also be used to specify the order of evaluation. Conditions in parentheses are evaluated preferentially.
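The precedence rules above can be illustrated with a small recursive-descent evaluator. It is a sketch, not the engine's parser: every operand is treated as a plain string tested with a substring check, whereas in real rules an operand can be any of the pattern types described earlier (in particular, the `|` inside a partial character specification is not distinguished here). `PatternEvaluator` and `evaluate_pattern` are hypothetical names.

```python
class PatternEvaluator:
    """Evaluate '&' (AND), '|' (OR), '~' (NOT) and parentheses over one value.

    Hypothetical sketch: every operand is treated as a plain substring.
    Precedence: '~' binds tightest, then '&', then '|'.
    """

    _SPECIAL = ("&", "|", "~", "(", ")")

    def __init__(self, expr: str, text: str) -> None:
        self.expr = expr
        self.text = text
        self.pos = 0

    def _peek(self) -> str:
        return self.expr[self.pos] if self.pos < len(self.expr) else ""

    def evaluate(self) -> bool:
        result = self._disjunction()
        if self.pos != len(self.expr):
            raise ValueError(f"unexpected {self._peek()!r} at position {self.pos}")
        return result

    def _disjunction(self) -> bool:  # conjunction ('|' conjunction)*
        result = self._conjunction()
        while self._peek() == "|":
            self.pos += 1
            rhs = self._conjunction()  # always parse the operand, then combine
            result = result or rhs
        return result

    def _conjunction(self) -> bool:  # unary ('&' unary)*
        result = self._unary()
        while self._peek() == "&":
            self.pos += 1
            rhs = self._unary()
            result = result and rhs
        return result

    def _unary(self) -> bool:  # '~' unary | '(' disjunction ')' | operand
        ch = self._peek()
        if ch == "~":
            self.pos += 1
            return not self._unary()
        if ch == "(":
            self.pos += 1
            result = self._disjunction()
            if self._peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
            return result
        start = self.pos
        while self._peek() and self._peek() not in self._SPECIAL:
            self.pos += 1
        if start == self.pos:
            raise ValueError(f"empty operand at position {start}")
        return self.expr[start:self.pos] in self.text


def evaluate_pattern(expr: str, text: str) -> bool:
    return PatternEvaluator(expr, text).evaluate()


print(evaluate_pattern("fast&search", "fast search engine"))  # True
print(evaluate_pattern("~(fast|search)", "slow lookup"))     # True
```

Because `&` binds more tightly than `|`, `a|b&c` is read as `a|(b&c)`; parentheses such as `(a|b)&c` override this.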

2.5.5.2 String Search

A string search finds events in which the element value either exactly matches the specified string or satisfies a size comparison with it. Because strings can be compared by size, string searches can be used for values that mix numerals and characters.


The format used by the string search is shown below.


String format is shown below.


A string search is either a complete match or a size comparison.


Complete match

Finds out if the value of an element node is equal to the string.

Example

Search for data in which the element value indicated by /root/area is equal to the string "North Sydney, Australia".

/root/area == 'North Sydney, Australia'

Size comparison

Compares the element value with the string by size, using their character encoding values and proceeding from left to right.

Note

  • The "//" path operator cannot be specified at the end of a path expression when performing a complete match or a size comparison.

  • The "*" path element cannot be specified at the end of a path expression when performing a complete match or a size comparison.

  • "$_" cannot be specified in the item expression when performing a complete match or a size comparison.

  • When performing a string comparison, each element value to be searched in an XML event must have the same number of characters as the string specified in the keyword.
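A left-to-right comparison by encoding value can be modeled with Python's `bytes` ordering. The sketch below (hypothetical `compare_strings`, with UTF-8 assumed for illustration; ASCII-only strings compare the same way in any ASCII-compatible encoding) shows why the same-length rule matters: in this model "9" is greater than "10", while the equal-length "09" is smaller.

```python
def compare_strings(value: str, keyword: str, encoding: str = "utf-8") -> int:
    """Return -1, 0 or 1 by comparing encoded code values from left to right.

    Assumption: UTF-8 is used for illustration; for ASCII-only strings the
    result is the same in any ASCII-compatible encoding.
    """
    a, b = value.encode(encoding), keyword.encode(encoding)
    return (a > b) - (a < b)


print(compare_strings("A010", "A009"))  # 1
print(compare_strings("9", "10"))       # 1: '9' (0x39) is greater than '1' (0x31)
print(compare_strings("09", "10"))      # -1
```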


Point

  • Characters to be excluded as search targets can be specified in rule definitions (SkipChar option).

  • How upper-case and lower-case single-byte alphabetic characters in search target strings are handled can be specified in the rule definition (ANKmix option). How upper-case and lower-case double-byte alphabetic characters are handled can be specified in the rule definition (KNJmix option).

  • Refer to "2.9 Options" for details.

2.5.5.3 Numeric Search

A numeric search extracts the numeric part of an element value and finds events in which the extracted value either equals the specified numeric value or satisfies a size comparison with it. Because the numeric part is extracted automatically, this search can handle numeric values written in a variety of ways.

A numeric function can also be specified on the left side of the comparison operator to compare with numeric values.


The format used by the numeric search is shown below.


Numeric literal format is shown below.


Number

Specify numbers using the digits 0 through 9. There is no limit to the number of digits.

Spaces cannot be used in a numeric literal, except in a prefix or suffix.

The first string in the element value that matches the above format is treated as the numeric value.

Commas (,) in the integer part are ignored. If a decimal point is present, the decimal part consists of all digits up to the first non-numeric character.


Example

This example evaluates to TRUE if the numeric component extracted from the value of the element node represented by '/doc/money' matches 1000.

/doc/money = 1000

In the following examples, the value of the element node specified in the path expression contains multiple numeric values. In such cases, only the first numeric value is extracted.

Event A

<money>ABC123,456@789</money>

123456 is extracted.


Event B

<money>123456 7890123</money>

123456 is extracted.


Event C

<money>1,500yen</money>

1500 is extracted.


If the search data does not contain a valid numeric value string, the conditions evaluate to FALSE.

The following search target string does not contain a valid numeric value string.

<money></money>
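The extraction rules and the FALSE result for values without a number can be sketched in Python as follows. This is an assumed reading of the rules above, not the engine's parser: a numeric string is taken to be digits with optional commas followed by an optional decimal point and digits, and signs or the prefix and suffix forms of the format diagram are not modeled. `extract_number` and `numeric_condition` are hypothetical names; `Decimal` is used so that values such as 2000.05 are compared exactly.

```python
import operator
import re
from decimal import Decimal
from typing import Callable, Optional

# Assumed reading of the rules: digits with optional commas, then an optional
# decimal point followed by digits. Signs and prefix/suffix forms are not modeled.
_NUMBER = re.compile(r"[0-9][0-9,]*(?:\.[0-9]+)?")


def extract_number(value: str) -> Optional[Decimal]:
    """Return the first numeric string in value with commas removed, or None."""
    m = _NUMBER.search(value)
    if m is None:
        return None
    return Decimal(m.group().replace(",", ""))


def numeric_condition(value: str, op: Callable[[Decimal, Decimal], bool], literal: str) -> bool:
    """Evaluate a condition such as '/root/money > 1000' against one element value."""
    number = extract_number(value)
    if number is None:
        return False  # no valid numeric string: the condition is FALSE
    return op(number, Decimal(literal))


print(extract_number("ABC123,456@789"))            # 123456
print(extract_number("1,500yen"))                  # 1500
print(numeric_condition("", operator.eq, "1000"))  # False
```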

Point

  • The number of digits in the numeric value specified as the keyword need not match the number of digits in the value of the element node specified in a path expression.

  • There is no need to make the number of digits in the integer or decimal part of element node values consistent across multiple XML events.

    Event A

    <money>1000.1</money>

    Event B

    <money>2000.05</money>

    Event C

    <money>10.5</money>

Note

  • The '//' path operator cannot be specified at the end of a path expression when performing a numeric search.

  • The '*' path element cannot be specified at the end of a path expression when performing a numeric search.

  • "$_" cannot be specified as an item expression in a numeric search.

Example

Search for data in which the numeric value extracted from the element value indicated by /root/money is greater than 1000.

/root/money > 1000
