Sentence Analyzer - enabling business and enterprise applications to handle sentences and text

Things that are not obvious from the demo.

Additional compressed patterns : Besides the full XML, the sentence analyzer generates secondary patterns, each a very compressed description of the sentence. Such a short description is useful when dealing with a very large number of sentences (say, in a big document), where the intention is to parse the document fully only if it passes certain identification tests.

A Google-like indexing using sentences in documents ? : Here is how word-based indexing and search (Google, Lucene, etc.) usually works: take a document or web page, collect all its distinct words, and index each word against the document, web page or web link (just like the index at the back of your textbook). Do this for many documents, and soon each word has a list of the documents where it occurs. A user searches and fetches the entire set of documents or web pages containing the word(s). For a moment, imagine that instead of word tokens, the pithy sentence patterns (mentioned above) were used to index a document. If a search user entered an entire sentence, could a declarative answer to an inquisitive question, or the exact matching sentence in the document, pop out ? Would that help in more relevant or focused document searches ? I do not know, it is untrodden ground, but it seems worth a try :-)

Finding the relevant zone inside a big document ? : Offering the user the section of the searched document that matches some benchmark sentences is a good idea. It helps the user zone in on exactly the desired place in a large document. This could help a lot of people whose task it is to read big documents.

How simple is the dictionary used ? : As simple as this line: duck,NOUN; followed by (preferably not preceded by) duck,VERB; . There is a provision to switch between the two, up to, say, 100 such switches, but it slows down the process and has been avoided in this demo.

Can the dictionary be replaced with another ? : Yes, it is not only possible but also recommended, since the one I have used seems to be of 1910 vintage. Also, it makes a lot of sense to put a small domain-specific dictionary in front of the generic one, overriding it. For example, each "documentType" demonstrated in the exe at text2data.net could have a small custom dictionary.

How can sentiments and further rich flavors/colors get associated with certain words and phrases ? : There can be arbitrary characters after the word, say duck:WBG+- , where WBG stands for white/black/grey and +- stands for strong/weak (any or all can be present or missing, and can occur more than once). Peppering the sentence with such arbitrary codes supplies a different way of analysing the flavor of a sentence. Not available in this demo.
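
The sentence-pattern indexing idea above, together with the "relevant zone" idea, can be illustrated with a minimal Python sketch. It is not the analyzer's actual method: the real compressed patterns are not shown on this page, so `toy_pattern` is a hypothetical stand-in that keeps only the known content words with their tags, and `TOY_TAGS`, `split_sentences`, `build_index` and `search` are likewise names invented for this illustration. The index stores the sentence number alongside the document, so a hit points at a place inside the document rather than at the document as a whole.

```python
import re
from collections import defaultdict

# Hypothetical stand-in lexicon; the real analyzer's dictionary is much larger.
TOY_TAGS = {"dog": "NOUN", "duck": "NOUN", "barks": "VERB", "swims": "VERB"}


def toy_pattern(sentence):
    """Reduce a sentence to a short key: known content words with their tags."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return " ".join(f"{w}/{TOY_TAGS[w]}" for w in words if w in TOY_TAGS)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_index(documents):
    """Map each sentence pattern to the (doc_id, sentence_number) pairs using it."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for number, sentence in enumerate(split_sentences(text)):
            pattern = toy_pattern(sentence)
            if pattern:  # skip sentences with no known words
                index[pattern].append((doc_id, number))
    return index


def search(index, query_sentence):
    """Return every (doc_id, sentence_number) whose pattern equals the query's."""
    return index.get(toy_pattern(query_sentence), [])


documents = {
    "doc1": "The dog barks. The duck swims.",
    "doc2": "A dog barks loudly!",
}
index = build_index(documents)
hits = search(index, "the dog barks")
```

Here the query "the dog barks" matches sentence 0 of both documents even though the surface wording differs, because all three sentences reduce to the same toy pattern. Whether the analyzer's real patterns behave this usefully on real text is exactly the open question raised above.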
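
The dictionary points above (the `duck,NOUN;` line format, a small domain dictionary overriding the generic one, and trailing codes such as `duck:WBG+-`) can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's code: the function names `parse_dictionary`, `overlay` and `parse_codes` are hypothetical, the rule that a domain entry replaces all generic entries for the same word is an assumption, and the `INVOICE` dictionary is made up for the example.

```python
from collections import Counter


def parse_dictionary(text):
    """Parse 'word,TAG;' entries into {word: [tags in file order]}."""
    entries = {}
    for raw in text.split(";"):
        raw = raw.strip()
        if "," not in raw:
            continue  # skip blanks and malformed entries
        word, tag = (part.strip() for part in raw.split(",", 1))
        tags = entries.setdefault(word.lower(), [])
        if tag not in tags:
            tags.append(tag)  # the first tag listed is tried first
    return entries


def overlay(generic, domain):
    """Assumed override rule: a word present in the domain dictionary
    replaces all of its generic entries."""
    merged = {word: list(tags) for word, tags in generic.items()}
    merged.update({word: list(tags) for word, tags in domain.items()})
    return merged


def parse_codes(token):
    """Split 'duck:WBG+-' into the word and a count of each trailing code."""
    word, _, codes = token.partition(":")
    return word, Counter(codes)


GENERIC = "duck,NOUN; duck,VERB; bill,NOUN; bill,VERB;"
INVOICE = "bill,NOUN;"  # hypothetical domain dictionary for one documentType

lexicon = overlay(parse_dictionary(GENERIC), parse_dictionary(INVOICE))
word, codes = parse_codes("duck:WBG+-")
```

Keeping the tags as an ordered list preserves the "followed by, preferably not preceded by" ordering of the entries, and counting the codes with a `Counter` allows a code to be missing or to occur more than once.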

More view/try pages here.

More reading for those interested...
1. Things that are not obvious from the demo
2. Business products and possibilities
3. So what !! Universal grammar has been in use for decades now ...
4. The inevitable comparisons, to what already exists out there.
5. What is a sentence, to future application builders ?
6. The genesis and design principles story
7. Arbitrary listing of business usages
8. Extensions, additions, customizations possible in the toolkit
9. Important : Combining with the document extraction tool at text2data.net, benefits

Contact at : kinshuk_in @ yahoo dot. com