Document Analysis with a Content- and Query-Based System
Abstract – Organizations generate large volumes of textual data. Such collections contain a significant amount of structured information that remains hidden in the unstructured text, which makes relevant information difficult to find. Existing algorithms that construct information from raw data are not cost effective and sometimes return imprecise result sets, especially when they operate without knowledge of the exact arrangement of the text data. We propose two new techniques that facilitate the generation of structured metadata by identifying documents that are likely to contain information of interest to users; this metadata can then be used to query the database and find the exact information or document. Users who upload documents are encouraged to assign metadata to them, which helps other users retrieve those documents. Our approach relies on the idea that humans are more likely to add the necessary metadata while creating a document if prompted by the interface, and that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. The major modules of the system discover structured attributes and interesting knowledge or features about the document by using the two techniques, which jointly utilize the
• content of the text, and the
• query value.
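As an illustration of how the two signals might be used jointly, the following is a minimal sketch only, not the system's actual scoring method. It assumes a hypothetical table of cue words per attribute (`attribute_cues`), a hypothetical log of attributes used in past queries (`query_log`), and that the two values are combined by a simple product; none of these details are specified above.

```python
def rank_attributes(document, attribute_cues, query_log):
    """Rank candidate attributes for a document (illustrative sketch).

    Assumes attribute_cues has non-empty cue lists and query_log is non-empty.
    """
    words = set(document.lower().split())
    total_queries = len(query_log)
    scores = {}
    for attribute, cues in attribute_cues.items():
        # Content value: share of the attribute's cue words found in the document.
        content_value = sum(cue in words for cue in cues) / len(cues)
        # Query value: share of past queries that used this attribute.
        query_value = query_log.count(attribute) / total_queries
        # Assumption: combine the two signals with a product.
        scores[attribute] = content_value * query_value
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


document = "hurricane reached the coast with wind speed of 120 mph"
attribute_cues = {  # hypothetical cue words per attribute
    "wind_speed": ["wind", "speed", "mph"],
    "location": ["coast", "city"],
    "author": ["written", "by"],
}
query_log = ["location", "wind_speed", "wind_speed", "author"]  # hypothetical

ranking = rank_attributes(document, attribute_cues, query_log)
print(ranking)
```

In this sketch an attribute is suggested first only when it is both supported by the document text and frequently used in queries; an attribute with no support in the text scores zero regardless of how often it is queried.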
Existing algorithms that extract knowledge from raw data consider words and their frequency counts, but not phrases or typical sequences of words. As our contribution, we introduce a phrase extraction technique, which extracts typical sequences of words to construct knowledge from raw data.
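The idea of extracting typical word sequences can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed technique itself: it treats a "phrase" as a word n-gram that recurs within sentence boundaries, and the function name `extract_phrases` and the parameters `n` and `min_count` are hypothetical.

```python
import re
from collections import Counter


def extract_phrases(text, n=2, min_count=2):
    """Return word n-grams that occur at least min_count times."""
    counts = Counter()
    # Split on sentence-ending punctuation so phrases do not span sentences.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z0-9]+", sentence)
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = ("The storm warning was issued at noon. "
          "A second storm warning followed. "
          "Residents ignored the warning.")
phrases = extract_phrases(sample, n=2, min_count=2)
print(phrases)
```

A word-frequency approach would report "warning" as the most frequent term in this sample, whereas the sequence-based count surfaces "storm warning" as a recurring unit.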
Keywords: CADS Technique, Information Extraction Algorithm, Attribute Suggestion.