Sentence Analyzer - enabling business and enterprise applications to handle sentences and text

Benefits of combining with the document extraction tool at text2data.net

Information extraction from ANY document : The text2data.net tool (download the exe demo) is meant to capture every kind of data from a semi-structured document. SEC documents (10K, DEF-14), USPTO docs, LinkedIn etc. have been demonstrated, but it can handle almost any formal document with minor configuration effort. Its key idea is to exploit "structural boundaries" inside the text.

Content based capture : Certain parts of a document may not have an explicit structural boundary, and yet designated places may contain data. There can be implicit "content boundaries", which are not repeated identically from document instance to document instance, but nevertheless hold for the document as a type or class once viewed through the lens of the sentence analyzer.

Background/foreground problem : Once inside a structural container, there is sometimes a need for reconfirmation from the nature of the text itself: "is this the right container that has my desired data ?" This need arises especially when there are multiple, similar, structurally defined containers, each holding dissimilar text data.

Metadata extraction : Identifying an unknown document as a "type", or gaining extra metadata from it, is another area where the sentence analyzer can augment the text2data.net tool. For example, it can identify even the document instance, i.e. "this instance is an annual report of pqr type for xyz company for abc date". In massive document extraction architectures, quick identification is crucial.

Extraction confidence : Converting text into data with greater confidence requires doing the data extraction in multiple ways and arriving at the same data every time. So an additional extraction that uses content + structure, instead of structure alone, is additional comfort.

Combination power : These two demonstrations of two quite different methodologies, i.e. this sentence analyzer and the text2data.net tool, create a text-to-data extraction combination that can be hard to beat in terms of extraction yield and data confidence. Combining the two distinct methods, structural and content-based, makes a strong business case for information extraction.
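
The "extraction confidence" idea above can be illustrated with a small sketch. This is not the text2data.net tool or the sentence analyzer itself; the sample document, the "ITEM 7" container label and all function names below are hypothetical, chosen only to show one value being extracted in two independent ways (by structure and by content) and trusted more when both agree.

```python
import re
from typing import Optional, Tuple

# Hypothetical semi-structured document, invented for this illustration.
SAMPLE = """\
ITEM 1. BUSINESS
Acme Corp designs widgets.
ITEM 7. FINANCIAL SUMMARY
Total revenue for fiscal 2012 was $4,200 million.
ITEM 8. NOTES
Revenue is recognized on delivery.
"""

AMOUNT = r"\$[\d,]+(?: million)?"


def extract_by_structure(text: str) -> Optional[str]:
    """Structure-based: look only inside the 'ITEM 7' container."""
    section = re.search(r"ITEM 7\..*?\n(.*?)(?=\nITEM \d+\.|\Z)", text, re.S)
    if not section:
        return None
    amount = re.search(AMOUNT, section.group(1))
    return amount.group(0) if amount else None


def extract_by_content(text: str) -> Optional[str]:
    """Content-based: ignore containers, find a sentence that reads
    like a revenue statement (a crude stand-in for sentence analysis)."""
    for sentence in re.split(r"(?<=\.)\s+", text):
        match = re.search(r"revenue .*?was (" + AMOUNT + ")", sentence, re.I)
        if match:
            return match.group(1)
    return None


def extract_with_confidence(text: str) -> Tuple[Optional[str], str]:
    """Run both methods; agreement between them raises confidence."""
    by_structure = extract_by_structure(text)
    by_content = extract_by_content(text)
    if by_structure is not None and by_structure == by_content:
        return by_structure, "high"
    if by_structure is None and by_content is None:
        return None, "none"
    return by_structure or by_content, "low"


if __name__ == "__main__":
    print(extract_with_confidence(SAMPLE))
```

When the structural container is missing or mislabeled, only the content-based method finds the value and the result is flagged as low confidence, which is the case where a second look is worthwhile.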

More view/try pages here.

More reading for those interested...
1. Things that are not obvious from the demo
2. Business products and possibilities
3. So what !! Universal grammar has been in use for decades now ...
4. The inevitable comparisons, to what already exists out there.
5. What is a sentence, to future application builders ?
6. The genesis and design principles story
7. Arbitrary listing of business usages
8. Extensions, additions, customizations possible in the toolkit
9. Important : Combining with the document extraction tool at text2data.net, benefits

Contact at : kinshuk_in @ yahoo dot. com