To make data in a repository available for searching, the ATG platform creates XHTML representations of the repository items; these XHTML documents are then indexed by ATG Search. To create the XHTML documents, you specify the properties to include in them through the IndexingOutputConfig component’s XML definition file. It’s important to create this definition file carefully. If you include a lot of superfluous properties, your index will be unnecessarily large, which means it will require more memory than it should and take longer to search. In general, the larger the XHTML documents are, the larger the index will be. On the other hand, if you omit important properties, your index might not return the desired results.
Determining Which Properties to Include:
To include properties in the index, you list them in the definition file as property elements within text-properties or meta-properties elements. The values of properties included in text-properties elements are searchable text, while the values of properties included in meta-properties elements are metadata that can be used in constraints.
For example, in a Commerce product catalog, you’d typically want a property containing a product description to be searchable, but you’d typically want a price property to be usable as a constraint (e.g., to constrain the results to include only those products whose price is less than $100).
If you want a property to be both searchable and usable as a constraint, it must appear within both the text-properties and meta-properties elements. For example, you might use a manufacturer property this way, so that users can search for a specific manufacturer or constrain the results to exclude items from a certain manufacturer.
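As a rough sketch of how these elements fit together, the fragment below lists a description as searchable text, a price as metadata, and a manufacturer as both. The enclosing item element, its attributes, and the property names are illustrative assumptions rather than details taken from this section, so verify them against your own repository definition.

```xml
<!-- Hypothetical definition file fragment; the item element, its
     attributes, and all property names are assumptions -->
<item item-descriptor-name="product" is-document="true">
  <meta-properties>
    <!-- Metadata that can be used in constraints -->
    <property name="price"/>
    <property name="manufacturer"/>
  </meta-properties>
  <text-properties>
    <!-- Searchable text -->
    <property name="description"/>
    <property name="manufacturer"/>
  </text-properties>
</item>
```

Listing manufacturer twice is intentional: each occurrence gives the property a different role in the index.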
The key to producing optimal XHTML documents is including the right set of properties – ensuring that you don’t include properties that you don’t need, and don’t omit ones that you do.
Guidelines for Text Properties:
The following guidelines can help you determine which properties to include as text properties and which to omit:
- Include properties whose values contain text that users are likely to search for. For product catalogs, typical properties might be description, longDescription, color, and brand.
- Don’t include multiple properties that contain the same data. For example, if a product’s description and longDescription properties always contain the same text, include only one of these properties in the index. (Similarly, if the description property always contains a subset of the text in the longDescription property – such as the first sentence – then include longDescription and omit description.)
- Don’t include properties that have values that users are unlikely to search for. These include date and Boolean values, and some types of numeric values. (Note, however, that these properties are often appropriate for metadata.)
- Don’t include properties that may lead to irrelevant or undesired results. For example, suppose you have a Shoes category with two subcategories, Men’s Shoes and Women’s Shoes. If the description property of the Shoes category is “Men’s and women’s shoes,” and you include ancestorCategories.description in the index, searches for “men’s shoes” will return women’s shoes as well as men’s, because the ancestorCategories.description property for each item in Men’s Shoes will contain the phrase “women’s shoes.”
- Be careful not to confuse the name of a property with its values. For example, you might be inclined to include a Boolean property named onSale, on the assumption that users may include “on sale” in search queries. But the resulting index will not include onSale (the name of the property); it will include true and false (the values of the property), so searching for “on sale” will not have the desired effect.
Keep in mind that these are just guidelines, and you may need to deviate from them depending on the needs of your site. For example, you may want to include a Boolean property as a text property if you use a custom property accessor to translate true and false into searchable strings. Or there may be certain numeric properties (e.g., product codes) that you want to make available for searching.
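Applied to a product catalog, these guidelines might lead to a text-properties element along the following lines. This is only a sketch: the property names come from the examples above, and the assumption that description merely repeats the start of longDescription is hypothetical.

```xml
<!-- Hypothetical text-properties fragment; property names are assumptions -->
<text-properties>
  <!-- Text that users are likely to search for -->
  <property name="longDescription"/>
  <property name="color"/>
  <property name="brand"/>
  <!-- Deliberately omitted:
       description (assumed to duplicate the start of longDescription),
       onSale (only its values, true and false, would be indexed),
       ancestorCategories.description (may match unrelated subcategories) -->
</text-properties>
```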
Guidelines for Metadata Properties:
The following guidelines can help you determine which properties to include as metadata properties and which to omit:
- Include any properties that you want to use for creating facets (e.g., faceting by size).
- Include any properties that you want to use in search configuration rankings and rules (e.g., rank by brand).
- Include any properties that you want to use as sort criteria (e.g., sort by salePrice).
- Include any properties that you want to use in global constraints (e.g., using catalogId to restrict results to items in the custom catalog assigned to the user).
- Don’t include properties that do not match any of these criteria. In particular, you should not include long text fields (such as longDescription) as metadata properties. Although it is possible to include these properties in constraints (e.g., “do not return results whose longDescription contains Acme”), such constraints are very inefficient, and these properties can increase the index size significantly.
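A meta-properties element that follows these guidelines might resemble the sketch below. The property names are taken from the examples in the list; where each property sits in the item hierarchy (product or SKU) is left out here and would depend on your catalog, so treat the fragment as an assumption-laden illustration.

```xml
<!-- Hypothetical meta-properties fragment; property names are assumptions -->
<meta-properties>
  <property name="size"/>       <!-- used for faceting -->
  <property name="brand"/>      <!-- used in ranking and rules -->
  <property name="salePrice"/>  <!-- used as a sort criterion -->
  <property name="catalogId"/>  <!-- used in a global constraint -->
  <!-- longDescription is omitted: long text makes for inefficient
       constraints and a larger index -->
</meta-properties>
```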
Using Filters to Reduce the Size of the Index:
You may be able to reduce the size of your XHTML documents by filtering the property values to remove redundant entries. For example, suppose each XHTML document represents a product with several child SKUs. You might include the SKUs’ salePrice property in the index as a metadata property, so it can be used for faceting. Depending on the product, many of the SKUs may have the same value for salePrice.
So the resulting entries in an XHTML document might look something like this:
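As a hypothetical illustration, a product with four SKUs, three of which share a price, could produce entries such as the following. The meta element naming scheme and the price values are assumptions made for this sketch, not output captured from a real index.

```xml
<!-- Hypothetical XHTML excerpt; naming scheme and values are assumptions -->
<meta name="atg:float:childSKUs.salePrice" content="20.00"/>
<meta name="atg:float:childSKUs.salePrice" content="20.00"/>
<meta name="atg:float:childSKUs.salePrice" content="25.00"/>
<meta name="atg:float:childSKUs.salePrice" content="20.00"/>
```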
By filtering out redundant entries, you can reduce this to:
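To sketch the effect with hypothetical values: if four SKUs carried the prices 20.00, 20.00, 25.00, and 20.00, only the two distinct values would remain after filtering. The meta element naming scheme shown is an assumption.

```xml
<!-- Hypothetical XHTML excerpt after duplicates are removed -->
<meta name="atg:float:childSKUs.salePrice" content="20.00"/>
<meta name="atg:float:childSKUs.salePrice" content="25.00"/>
```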
To automatically perform this filtering, specify the unique filter in the XML definition file, using the filter attribute of the property element:
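A minimal sketch of such a property element follows. The filter attribute comes from the text above; the value unique, the enclosing item element, and the childSKUs property name are assumptions used for illustration.

```xml
<!-- Hypothetical fragment; the enclosing item element and the
     childSKUs property name are assumptions -->
<item property-name="childSKUs">
  <meta-properties>
    <!-- Remove duplicate salePrice values within each XHTML document -->
    <property name="salePrice" filter="unique"/>
  </meta-properties>
</item>
```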
This setting invokes an instance of the atg.repository.search.indexing.filter.UniqueFilter class.
Note that you do not need to create a Nucleus component to use this filter. As a general rule, it is a good idea to specify the unique filter for a property if multiple items in an XHTML document may have identical values for that property.
If you specify this filter for a property and every value of that property in an XHTML document is unique (or if only one item with that property appears in the document), the unique filter will have no effect on the resulting XHTML. However, executing the filter increases the processing time required to create the document, so it is a good idea to specify it only for properties that will benefit from it.