Sentence Analyzer - enabling business and enterprise applications to handle sentences and text

Things that are not obvious from the demo.

Additional compressed patterns : Besides the full XML, the sentence analyzer generates secondary patterns, each a very compressed description of the sentence. Such a short description is useful when dealing with a very large number of sentences (say, in a big document), where the intention is to parse the document fully only if it passes certain identification tests.

A Google-like indexing using sentences in documents ? : Here is how word-based indexing and search (Google, Lucene etc.) usually works: take a document or web page, collect all its distinct words, and index each word against the document, web page or web link (just like the index at the back of your textbook). Do this for many documents, and soon every word has a list of documents where it occurs. A user searches and fetches the entire set of documents or web pages containing the word(s). For a moment, imagine that instead of word tokens, the pithy sentence patterns (mentioned above) were used to index a document. If a user entered an entire sentence, could a declarative answer to an inquisitive question, or the exact matching sentence in the document, pop out ? Would that help in more relevant or focused document searches ? I do not know; this is untrodden ground, but it seems worth a try :-)
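The idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the analyzer's real compressed patterns are not shown on this page, so `sentence_pattern` below is a hypothetical stand-in that lowercases the sentence, drops a few stop words and sorts what remains. All function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the real analyzer does not work this way.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "do", "does", "did", "what", "which", "who"}

def sentence_pattern(sentence):
    """Stand-in for a compressed pattern: sorted content words, order ignored."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return tuple(sorted(set(w for w in words if w not in STOP_WORDS)))

def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def build_index(documents):
    """Map each sentence pattern to the (document id, sentence) pairs that produce it."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            index[sentence_pattern(sentence)].append((doc_id, sentence))
    return index

def search(index, query):
    """Return the sentences whose pattern equals the pattern of the query."""
    return index.get(sentence_pattern(query), [])

documents = {
    "doc1": "The duck is a water bird. Ducks swim well.",
    "doc2": "Revenue rose in 2010. The board approved the dividend.",
}
index = build_index(documents)

# The question and the declarative sentence reduce to the same pattern.
print(search(index, "Is the duck a water bird?"))
# A change of word form ("approve" vs "approved") already defeats this toy pattern.
print(search(index, "Did the board approve the dividend?"))
```

With a stand-in this crude, a question only finds its answer when both use exactly the same content words; whether the analyzer's real patterns do better is precisely the open question raised above.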

Finding the relevant zone inside a big document ? : Offering the user the section of the searched document that matches some benchmark sentences is a good idea. It helps the user zero in on exactly the desired place in a large document. This could help the many people whose task it is to read big documents.
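A rough Python sketch of zone finding, assuming the document has already been cut into titled sections. The `similarity` function is a plain word-overlap (Jaccard) score used here as a placeholder for the analyzer's own sentence comparison, which this page does not describe; the section titles and sample text are made up.

```python
import re

def words(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a, b):
    """Jaccard word overlap; a placeholder for a real sentence comparison."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def best_zone(sections, benchmarks):
    """Return (title, score) of the section holding the best-matching sentence."""
    best = (None, 0.0)
    for title, body in sections:
        sentences = [s for s in re.split(r"(?<=[.?!])\s+", body) if s.strip()]
        score = max((similarity(s, b) for s in sentences for b in benchmarks),
                    default=0.0)
        if score > best[1]:
            best = (title, score)
    return best

sections = [
    ("Chairman's letter",
     "We thank our shareholders. The year was challenging."),
    ("Notes to accounts",
     "Inventories are valued at cost. Depreciation is charged on a straight line basis."),
]
benchmarks = ["Inventory is valued at lower of cost"]

title, score = best_zone(sections, benchmarks)
print(title, round(score, 2))
```

Scoring a section by its single best sentence, rather than by an average, keeps a long section from being penalised for the many sentences that have nothing to do with the benchmark.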

How simple is the dictionary used ? : As simple as the line duck,NOUN; followed by (preferably not preceded by) duck,VERB;. There is a provision to switch between the two, up to, say, 100 such switches, but it slows down the process and has been avoided in this demo.
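Reading such a dictionary takes very little code. The Python sketch below assumes entries of the form word,TAG; in the order given above, and treats the first tag listed for a word as the preferred one. The function names are hypothetical, and the switching mechanism is not modelled, since it is not described here.

```python
def parse_dictionary(text):
    """Parse 'word,TAG;' entries into {word: [tags in file order]}."""
    entries = {}
    for raw in text.split(";"):
        word, _, tag = raw.strip().partition(",")
        word, tag = word.strip().lower(), tag.strip().upper()
        if not word or not tag:
            continue  # skip blank or malformed entries
        tags = entries.setdefault(word, [])
        if tag not in tags:
            tags.append(tag)
    return entries

def preferred_tag(entries, word):
    """First tag listed for the word, or None if the word is unknown."""
    tags = entries.get(word.lower())
    return tags[0] if tags else None

def alternative_tags(entries, word):
    """Remaining tags, i.e. the candidates a switch could fall back to."""
    return entries.get(word.lower(), [])[1:]

sample = "duck,NOUN;\nduck,VERB;\nwater,NOUN;"
entries = parse_dictionary(sample)
print(preferred_tag(entries, "duck"), alternative_tags(entries, "duck"))
```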

Can the dictionary be replaced with another ? : Yes, this is not only possible but also recommended, since the one I have used seems to be of 1910 vintage. It also makes a lot of sense to let a small domain-specific dictionary override the generic one. For example, each "documentType" demonstrated in the exe at text2data.net could have a small custom dictionary.
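One simple way to layer a small dictionary over a generic one is Python's standard `collections.ChainMap`, which looks a key up in the first mapping and falls back to the next. The sketch assumes dictionaries already loaded as word-to-tags mappings; the sample entries and the `layered_dictionary` helper are invented for illustration and are not part of the toolkit.

```python
from collections import ChainMap

# Assumed in-memory form: word -> tags, preferred tag first.
generic = {
    "duck": ["NOUN", "VERB"],
    "note": ["NOUN", "VERB"],
    "water": ["NOUN"],
}

# Hypothetical custom dictionary for one documentType, e.g. annual reports,
# where "note" should always be read as a noun.
annual_report = {"note": ["NOUN"]}

def layered_dictionary(domain, fallback):
    """Lookups hit the domain dictionary first, then the generic one."""
    return ChainMap(domain, fallback)

lookup = layered_dictionary(annual_report, generic)
print(lookup["note"])   # taken from the domain dictionary
print(lookup["duck"])   # falls through to the generic dictionary
```

Because the generic dictionary is left untouched, the same base can sit under a different small dictionary for every document type.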

How can sentiments and further rich flavors/colors get associated with certain words and phrases ? : There can be arbitrary characters after the word, say duck:WBG+-, where WBG stands for white/black/grey and +- stands for strong/weak (any or all can be present or missing, and can occur more than once). Peppering the sentence with such arbitrary codes supplies a different way of analysing its flavor. Not available in this demo.
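Since this feature is not in the demo, the following Python sketch is purely a guess at how such codes might be read: it splits each token at the colon and counts every code character, so missing or repeated codes need no special handling. The function names and the sample tokens are hypothetical.

```python
from collections import Counter

def parse_annotated(token):
    """Split 'duck:WBG+-' into the word and a count of each code character."""
    word, _, codes = token.partition(":")
    return word, Counter(codes)

def flavor_profile(tokens):
    """Add up the codes carried by all tokens of a sentence."""
    total = Counter()
    for token in tokens:
        _, codes = parse_annotated(token)
        total.update(codes)
    return total

# Hypothetical annotated sentence; unannotated words simply carry no codes.
tokens = ["duck:W+", "swim", "mud:B-", "pond:W++"]
profile = flavor_profile(tokens)
print(profile["W"], profile["B"], profile["G"], profile["+"], profile["-"])
```

A `Counter` returns zero for codes that never occur, which matches the "present or missing, once or many times" freedom described above.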

More view/try pages here.
Generic use cases
Back to home/basic analyzer
Comparing sentences, several modes
Find/search/sort/filter
Crunching a big text
Wildcard usages in pure structure mode
Business use cases
Handling Notes section in annual reports
Crunching of a Presidential speech
Executive profiles
Project statuses
USPTO events alerter
Customer reviews

More reading for those interested...

1. Things that are not obvious from the demo
2. Business products and possibilities
3. So what !! Universal grammar has been in use for decades now ...
4. The inevitable comparisons, to what already exists out there.
5. What is a sentence, to future application builders ?
6. The genesis and design principles story
7. Arbitrary listing of business usages
8. Extensions, additions, customizations possible in the toolkit
9. Important : Combining with the document extraction tool at text2data.net, benefits
Contact at : kinshuk_in @ yahoo dot. com