Biomedical Data Mining For Web Page Relevance Checking

Data mining is a technique for extracting useful data and patterns from large data sets so that the results can be put to practical use. Web mining and data mining go hand in hand when building web mining systems. Web mining includes text mining methods, which read and classify unstructured data: text mining lets us detect patterns, keywords, and relevant information in unstructured text.

Web mining and data mining each have their own uses. Data mining algorithms work efficiently on organized (structured) data sets, while web mining algorithms are widely used to scan and mine the unorganized, unstructured web pages and text available on the internet. Websites built on different platforms have different data structures, which makes them difficult for a single algorithm to read. Since it is not feasible to build a separate algorithm for each web technology, we need efficient web mining algorithms to mine this huge amount of web data. Web pages are written in HTML (HyperText Markup Language) in varied layouts, with images, videos, and other media mixed into a single page.

We therefore propose using well-designed web mining algorithms to mine the textual information on web pages and detect their relevance to the biomedical sector. In this way, web pages can be judged and checked for relevance to the biomedical field. The system is useful in many biomedical sectors, and even for search engines that need to classify web pages under a biomedical category: a page's relevance to the field helps classify and sort it appropriately.
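
The synopsis does not specify which mining algorithm the system uses. As a minimal sketch only, assuming a simple keyword-matching approach (not necessarily the authors' method), the Python code below strips the HTML markup, skips the contents of script and style elements, and reports what percentage of the page's words appear in a small biomedical term list. The term list `BIOMEDICAL_TERMS` and the functions `extract_text` and `relevance_percentage` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser
import re

# Hypothetical keyword list: the synopsis does not say which terms the system uses.
BIOMEDICAL_TERMS = {
    "gene", "protein", "clinical", "disease", "patient", "drug",
    "therapy", "diagnosis", "cell", "tumor", "vaccine", "pathology",
}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script or style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images and videos carry no text nodes, so only readable text is kept.
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def relevance_percentage(html: str) -> float:
    """Hypothetical score: share of page words found in BIOMEDICAL_TERMS, as a percentage."""
    words = re.findall(r"[a-z]+", extract_text(html).lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in BIOMEDICAL_TERMS)
    return 100.0 * hits / len(words)


if __name__ == "__main__":
    page = (
        "<html><head><script>var gene = 1;</script></head>"
        "<body><h1>Gene therapy</h1><p>A new drug for patient care.</p></body></html>"
    )
    print(relevance_percentage(page))  # 50.0 (4 of 8 visible words match)
```

Because this sketch scores the share of all words on the page, a long page that covers biomedical topics alongside much other text can still receive a low percentage; a practical system would likely need stemming, a larger vocabulary, or a trained classifier.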


Advantages
  • The system helps users find relevant biomedical web pages quickly.
  • Users need less effort to search for web pages related to biomedicine.
  • Users quickly get the results they are searching for.
  • The system saves the user's time.
Disadvantages
  • If the internet connection fails, the system will not work.
  • The system may assign a web page a low relevance percentage even when the page is highly relevant to the biomedical field.
