The primary challenge for any search engine is to determine, from the user's query, what the user intends and which document or product best satisfies that intent. Typically this involves parsing the search string and the documents, then matching the query against each document, or against an index of documents built from their content or associated metadata. This is an indirect approach at best: the words describing the user's intent do not always explicitly match the words found in the document or product description, since users may describe things differently than the content or metadata author does. In addition, language is inherently ambiguous, with one word carrying many meanings and many different words describing the same thing. Furthermore, it is not always apparent which parts of a large document are most important or most relevant. For some searches, such as product catalog searches, there may be very little textual description at all, perhaps only a title or a brief product description, and even these are not always created with search in mind; they may be supplied by the product manufacturer, leaving the website owner little control over the content. In some cases metadata is available, but it rarely anticipates every query and is difficult and costly to acquire and maintain. Despite all these challenges, an effective search capability is imperative for most businesses, since a customer or user cannot buy your product or read your content if they cannot find it.
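The indirect keyword-matching approach described above can be made concrete with a minimal sketch. The inverted index below (all names and sample products are hypothetical, not from any particular system) returns documents containing every query term, and illustrates the vocabulary-mismatch problem: a query using different words than the catalog author finds nothing.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into bare word tokens."""
    return [w.strip(".,") for w in text.lower().split()]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    term_sets = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()

# Hypothetical product catalog.
docs = {
    "p1": "Lightweight waterproof hiking jacket",
    "p2": "Insulated winter parka with hood",
}
index = build_index(docs)
print(search(index, "hiking jacket"))  # → {'p1'}
print(search(index, "rain coat"))      # → set(): vocabulary mismatch
```

Even though "rain coat" plausibly describes product p1, surface-level matching cannot bridge the gap between the user's vocabulary and the author's.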
Intelligent Systems has several technologies to address these challenges.
Organizing and labeling content or products with rich metadata that reflects the semantic meaning of the content, rather than just the surface expression of that meaning in the content itself, can significantly improve search results and introduce consistency into how content is organized and accessed. The technologies discussed in the Semantic Web section are ideal for this purpose. By organizing products and content into well-structured RDF taxonomies and tagging them with RDF semantic attributes, the meaning of content becomes clearer and less ambiguous, allowing search algorithms to more easily identify the most relevant items. In addition, if this semantic metadata is reflected in semantic markup on the published page, external search engines such as Google that are optimized for semantic markup can more easily find the page, and its search rank will be higher. In general, content that is well organized and that explicitly and unambiguously reflects its meaning will fare better from an SEO point of view.
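The benefit of concept-level tagging can be sketched with a toy example. The mini-taxonomy and product tags below are entirely hypothetical stand-ins for the RDF structures the section describes: because products carry semantic attributes rather than free text alone, a query for a broad concept matches products tagged with any narrower concept beneath it.

```python
# Hypothetical mini-taxonomy: each broad concept lists the
# narrower concepts organized beneath it.
TAXONOMY = {
    "outerwear": {"jacket", "parka", "coat"},
    "footwear": {"boot", "sneaker"},
}

# Products tagged with semantic attributes rather than free text alone.
PRODUCTS = {
    "p1": {"concepts": {"jacket"}, "attributes": {"waterproof"}},
    "p2": {"concepts": {"parka"}, "attributes": {"insulated"}},
}

def expand(concept):
    """Expand a broad concept to itself plus all narrower concepts."""
    return {concept} | TAXONOMY.get(concept, set())

def semantic_search(query_concepts):
    """Match products whose concept tags fall under any queried concept."""
    wanted = set().union(*(expand(c) for c in query_concepts))
    return {pid for pid, p in PRODUCTS.items() if p["concepts"] & wanted}

print(semantic_search({"outerwear"}))  # matches both p1 and p2
```

A real deployment would express the taxonomy in RDF and publish the tags as semantic markup on the page; the matching principle, though, is the same.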
Of course, defining rich semantic metadata for all content and products is not always easy and can be time consuming and costly. Tools such as those described in the Semantic Web section can make this more efficient, but it still involves potentially significant manual effort. An alternative is to employ some of the techniques and tools described in the Crowdsourcing section. Beyond the cost savings and scalability that the economies of scale of crowdsourcing offer, introducing crowdsourcing on your website or catalog allows you to harness your customer base to identify the best way to classify or describe your content or products. In this way, the metadata describing your products will match the way your customers think about those products and the vocabulary they use to describe them.
Natural Language, Text Analysis, and Machine Learning
The techniques described in several of the sections (e.g. Machine Learning, Text Mining, Document Classification) can have a huge impact on improving search results. Part-of-speech tagging and word-sense disambiguation can help reduce the ambiguity of matching words and phrases. Various statistical techniques can identify synonyms, abbreviations, and phrases with equivalent or similar meaning. The significance of words and phrases in a document, and therefore their relevance to a query, can likewise be determined statistically. Documents can be classified and taxonomies constructed via machine learning, and concepts can be extracted so that matches are performed at the level of meaning rather than surface syntax. In addition to improving the performance of search, these techniques can often be less costly and more scalable, since they typically involve automatic analysis rather than manual editing or tagging.
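One of the statistical techniques alluded to, scoring how significant a term is within a document, can be illustrated with a classic TF-IDF computation. This is a generic sketch using hypothetical sample documents, not Intelligent Systems' proprietary method: a term shared by every document scores zero, while terms distinctive to one document score higher.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tf_idf(docs):
    """Score the significance of each term within each document.

    Score = term frequency in the document times the log of the
    inverse fraction of documents containing the term.
    """
    n = len(docs)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter()                      # document frequency per term
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        counts = Counter(terms)
        scores[doc_id] = {
            term: (c / len(terms)) * math.log(n / df[term])
            for term, c in counts.items()
        }
    return scores

docs = {
    "d1": "waterproof hiking jacket",
    "d2": "waterproof winter parka",
}
scores = tf_idf(docs)
# "waterproof" appears in every document, so it scores 0.0;
# "jacket" and "parka" are distinctive and score higher.
```

Such scores let a ranking function weight matches on distinctive terms more heavily than matches on ubiquitous ones.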
In addition to the metadata and linguistic techniques described above, the act of searching itself can be harnessed, with the help of machine learning and statistical analysis, to improve search as it is used and help users find exactly the content or product they are looking for. By analyzing search history to identify the actual terms users have entered and which document or product they then selected from the results, Intelligent Systems has developed adaptive search engines that learn the relationship between search terms and target documents or products. In this way, we can learn how customers think about products and content in their own words and return the exact items they are looking for when they use such queries. Using machine learning and statistical methods, these associations can be generalized to connect queries and products even when the precise query has never been encountered before. Such an adaptive approach also improves as it is used, learning new associations as new content or products are added. Intelligent Systems has had considerable success with this approach, obtaining measurable improvements in search performance over popular commercial search engines.
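The core idea of learning from search history can be sketched in a few lines. The click log and products below are hypothetical, and the weighting is a deliberately simple count-based stand-in for the statistical models described above: clicks associate individual query terms with products, so a query never seen verbatim can still be ranked through the terms it shares with past queries.

```python
from collections import defaultdict

# Hypothetical click log: (query the user typed, product they
# selected from the resulting search results).
CLICK_LOG = [
    ("rain coat", "p1"),
    ("rain jacket", "p1"),
    ("warm winter coat", "p2"),
]

def train(log):
    """Count how often each query term led to a click on each product."""
    weights = defaultdict(lambda: defaultdict(float))
    for query, product in log:
        for term in query.lower().split():
            weights[term][product] += 1.0
    return weights

def rank(weights, query):
    """Score products for a query, even one never seen verbatim,
    by summing the click evidence of its individual terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for product, w in weights[term].items():
            scores[product] += w
    return sorted(scores, key=scores.get, reverse=True)

weights = train(CLICK_LOG)
print(rank(weights, "rain gear"))  # → ['p1']: "rain" evidence generalizes
```

Because the model keys on individual terms, the unseen query "rain gear" still retrieves the product that past "rain …" queries led to, and the weights improve automatically as more searches and clicks accumulate.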
Copyright ©1997-2015 Intelligent Systems. All rights reserved.