Topic Detection Using Keyword Clustering

This project proposes a system that finds the prominent topics in a collection of documents. It builds on the idea of a topic model, a type of statistical model for discovering the topics that occur in a collection of documents. One would expect particular words to appear in a document more or less frequently depending on its topic: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear more often in documents about cats, and “the” and “is” will appear about equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words.

The proposed system captures this intuition in a mathematical framework. It extracts the keywords that occur often in the document set, groups them with a clustering algorithm, and treats the resulting keyword clusters as the detected topics. The clustering takes the co-occurrence of terms into account, which improves the results.

The system can be useful for web crawlers and for web users. When a user searches for a particular topic, the system extracts the keywords from the document set that match the topic name entered by the user, clusters them, and returns the topic-related information, so that users quickly find information on the topic they are searching for.
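The keyword extraction and co-occurrence clustering steps described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the synopsis does not name a specific clustering algorithm, so the sketch assumes a simple one (connected components of a keyword co-occurrence graph). The function names, the stopword list, the thresholds `min_docs` and `min_cooccur`, and the sample documents are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (words like "the" and "is"
# appear in every topic and carry no topical signal).
STOPWORDS = {"the", "is", "a", "and", "of", "in", "to", "at", "my"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(docs, min_docs=2):
    """Keep the words that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {w for w, n in doc_freq.items() if n >= min_docs}


def cluster_keywords(docs, keywords, min_cooccur=2):
    """Group keywords that co-occur in at least `min_cooccur` documents.

    Assumed clustering rule: two keywords are linked when they co-occur often
    enough, and each connected group of linked keywords is one topic.
    """
    pair_counts = Counter()
    for doc in docs:
        present = sorted(set(tokenize(doc)) & keywords)
        pair_counts.update(combinations(present, 2))

    # Union-find over keywords to collect connected components.
    parent = {w: w for w in keywords}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)

    clusters = {}
    for w in keywords:
        clusters.setdefault(find(w), set()).add(w)
    # A keyword that is linked to nothing does not form a topic on its own.
    return [c for c in clusters.values() if len(c) > 1]


# Hypothetical sample corpus echoing the dog/cat example.
SAMPLE_DOCS = [
    "The dog chewed a bone",
    "A dog buried the bone in the garden",
    "The cat said meow",
    "My cat will meow at the door",
]

keywords = extract_keywords(SAMPLE_DOCS)
for topic in cluster_keywords(SAMPLE_DOCS, keywords):
    print(sorted(topic))
```

On this sample, the frequent keywords split into a dog-related group and a cat-related group. A real system would likely use a larger stopword list and a weighted similarity measure rather than a fixed co-occurrence threshold.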


Advantages
  • The system takes the co-occurrence of terms into account, which improves the results.
  • The system helps web users easily search for information on a particular topic.
  • Web users quickly get information on the topic they are searching for.

Disadvantages
  • The system extracts single words rather than phrases. If the system extracted phrases, topic detection would be faster.
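To make the word-versus-phrase distinction concrete, the sketch below extracts frequent two-word phrases instead of single words. It is an illustrative assumption about what phrase extraction could look like, not part of the proposed system; the function name `extract_phrases` and the threshold `min_count` are hypothetical.

```python
import re
from collections import Counter


def extract_phrases(docs, min_count=2):
    """Return two-word phrases that occur at least `min_count` times overall."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        # Pair each word with its successor to form adjacent two-word phrases.
        counts.update(zip(words, words[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}


# Hypothetical sample documents.
sample = [
    "Machine learning finds topics",
    "We use machine learning daily",
]
print(extract_phrases(sample))
```

A phrase such as "machine learning" identifies a topic more precisely than the separate words "machine" and "learning" would.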
