Explain Web Mining Taxonomy.
- Web mining is the application of data mining techniques to discover patterns, structures, and knowledge from the Web. According to analysis targets, web mining can be organized into three main areas: web content mining, web structure mining, and web usage mining.
Web mining research can be classified into three categories:
1) Web Content Mining
a) Web Page Content Mining
b) Search Result Mining
2) Web Structure Mining
a) Using Links
b) Using Generalization
3) Web Usage Mining
a) General Access Pattern Tracking
b) Customized Usage Tracking
1) Web Content Mining
- Web content mining analyzes web content such as text, multimedia data, and structured data (within web pages or linked across web pages). This is done to understand the content of web pages, provide scalable and informative keyword-based page indexing, entity/concept resolution, web page relevance and ranking, web page content summaries, and other valuable information related to web search and analysis.
- Web content mining has been studied extensively by researchers, search engines, and other web service companies. Because it can build links across multiple web pages for an individual, it has the potential to inappropriately disclose personal information. This concern has motivated the development of techniques to protect personal privacy on the Web.
- Web content mining refers to the discovery of useful information from Web contents, including text, images, audio, video, etc.
a) Web Page Content Mining
- Web page summarization
•WebLog (Lakshmanan et al., 1996), WebOQL (Mendelzon et al., 1998) ...: Web structuring query languages; can identify information within given web pages
•Ahoy! (Etzioni et al., 1997): Uses heuristics to distinguish personal home pages from other web pages
•ShopBot (Etzioni et al., 1997): Looks for product prices within web pages
b) Search Result Mining
- Search engine result summarization
•Clustering Search Result (Leouski and Croft, 1996; Zamir and Etzioni, 1997): Categorizes documents using phrases in titles and snippets
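To make the idea of grouping search results by phrases concrete, here is a minimal Python sketch. It is not the algorithm from the cited papers: as a simplifying assumption it groups snippets that share a two-word phrase (a word bigram). The function names and the sample snippets are invented for illustration.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of two-word phrases in a snippet (lowercased)."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def cluster_by_phrase(snippets, min_size=2):
    """Group snippet indices by shared phrase; keep phrases seen in >= min_size snippets."""
    phrase_to_docs = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        for phrase in bigrams(snippet):
            phrase_to_docs[phrase].add(doc_id)
    return {
        phrase: sorted(docs)
        for phrase, docs in phrase_to_docs.items()
        if len(docs) >= min_size
    }


# Hypothetical search-result snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar car dealer prices",
    "used jaguar car reviews",
    "jaguar cat habitat facts",
    "big cat habitat in the amazon",
]
clusters = cluster_by_phrase(snippets)
for phrase, doc_ids in sorted(clusters.items()):
    print(phrase, doc_ids)
```

Each shared phrase acts as a cluster label, so the results about cars and the results about animals end up in separate groups.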
2) Web Structure Mining
- Web structure mining studies the model underlying the link structures of the Web. It has been used for search engine result ranking and other Web applications.
- Web structure mining is the process of using graph and network mining theory and methods to analyze the nodes and connection structures on the Web. It extracts patterns from hyperlinks, where a hyperlink is a structural component that connects a web page to another location. It can also mine the document structure within a page (e.g., analyze the tree-like structure of a page to describe HTML or XML tag usage). Both kinds of web structure mining help us understand web content and may also help transform web content into relatively structured data sets.
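Both kinds of structure mentioned above (hyperlinks and the tag tree within a page) can be illustrated with a small sketch built on Python's standard `html.parser` module. The class name `TagStructure`, the list of void tags, and the sample page are assumptions made for this example, and it expects reasonably well-formed HTML.

```python
from collections import Counter
from html.parser import HTMLParser


class TagStructure(HTMLParser):
    """Collects tag usage, outgoing links, and the nesting depth of a page."""

    # Tags that never have a closing tag, so they do not open a new tree level.
    VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.links = []
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # hyperlink structure
        if tag not in self.VOID_TAGS:
            self.depth += 1  # document (tree) structure
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><body>
<h1>Shop</h1>
<ul>
  <li><a href="/products.html">Products</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
<img src="logo.png">
</body></html>
"""

parser = TagStructure()
parser.feed(SAMPLE_PAGE)
print(parser.tag_counts, parser.links, parser.max_depth)
```

The extracted links could feed a link-based analysis, while the tag counts and depth describe the structure of the page itself.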
a) Using Links
•PageRank (Brin et al., 1998)
•CLEVER (Chakrabarti et al., 1998)
- Use interconnections between web pages to give weight to pages.
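As an illustration of weighting pages by their interconnections, the following is a minimal sketch of a PageRank-style power iteration in Python. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without out-links, and the toy graph are all assumptions for this example, not details taken from the cited papers.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web graph (hypothetical): C is linked from A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph, page C ends up with the highest weight because the most pages point to it, and page D ends up with the lowest because nothing links to it.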
b) Using Generalization
•MLDB (1994), VWV (1998)
- Uses a multi-level database representation of the Web. Counters (popularity) and link lists are used for capturing structure.
3) Web Usage Mining
Web usage mining focuses on using data mining techniques to analyze search logs to find interesting patterns. One of the main applications of Web usage mining is its use to learn user profiles. Web usage mining is the process of extracting useful information (e.g., user click streams) from server logs. It finds patterns related to general or particular groups of users; understands users' search patterns, trends, and associations; and predicts what users are looking for on the Internet. It helps improve search efficiency and effectiveness, as well as promotes products or related information to different groups of users at the right time. Web search companies routinely conduct web usage mining to improve their quality of service.
a) General Access Pattern Tracking
•Web Log Mining (Zaïane, Xin and Han, 1998)
- Uses KDD techniques to understand general access patterns and trends.
- Can shed light on better structure and grouping of resource providers.
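A rough sketch of general access pattern tracking: the Python code below parses server log lines, rebuilds each client's click stream, and counts page hits and page-to-page transitions. It assumes Common Log Format entries already in chronological order and treats each client address as one user; the regular expression, function names, and sample log are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)[^"]*" (\d{3}) \S+'
)

# Hypothetical log excerpt.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2048
10.0.0.3 - - [01/Jan/2024:10:00:12 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:10:00:20 +0000] "GET /products.html HTTP/1.1" 200 2048
10.0.0.3 - - [01/Jan/2024:10:00:25 +0000] "GET /missing.html HTTP/1.1" 404 128
10.0.0.1 - - [01/Jan/2024:10:00:30 +0000] "GET /cart.html HTTP/1.1" 200 1024
10.0.0.3 - - [01/Jan/2024:10:00:41 +0000] "GET /about.html HTTP/1.1" 200 768
"""


def click_streams(log_text):
    """Return {client: [pages in visit order]}, keeping successful requests only."""
    streams = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match and match.group(4) == "200":
            streams[match.group(1)].append(match.group(3))
    return streams


def access_patterns(streams):
    """Count page hits and consecutive page-to-page transitions across all clients."""
    hits, transitions = Counter(), Counter()
    for pages in streams.values():
        hits.update(pages)
        transitions.update(zip(pages, pages[1:]))
    return hits, transitions


hits, transitions = access_patterns(click_streams(SAMPLE_LOG))
print(hits.most_common(2))
print(transitions.most_common(1))
```

Frequent transitions such as index page to products page are the kind of general trend that can suggest how to restructure or group a site's pages.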
b) Customized Usage Tracking
•Adaptive Sites (Perkowitz and Etzioni, 1997)
- Analyzes the access patterns of one user at a time.
- Web site restructures itself automatically by learning from user access patterns.