Explain Web Mining Taxonomy.

  • Web mining is the application of data mining techniques to discover patterns, structures, and knowledge from the Web. According to analysis targets, web mining can be organized into three main areas: web content mining, web structure mining, and web usage mining.

 

Web Mining research can be classified into three categories:

1) Web Content Mining

a) Web Page Content Mining

b) Search Result Mining

2) Web Structure Mining

a) Using Links

b) Using Generalization

3) Web Usage Mining

a) General Access Pattern Tracking

b) Customized Usage Tracking

1) Web Content Mining

  • Web content mining analyzes web content such as text, multimedia data, and structured data (within web pages or linked across web pages). This is done to understand the content of web pages, provide scalable and informative keyword-based page indexing, entity/concept resolution, web page relevance and ranking, web page content summaries, and other valuable information related to web search and analysis. 
  • Web content mining has been studied extensively by researchers, search engines, and other web service companies. Web content mining can build links across multiple web pages for individuals; therefore, it has the potential to inappropriately disclose personal information. This concern has motivated the development of techniques to protect personal privacy on the Web.
  • Web content mining refers to the discovery of useful information from Web contents, including text, images, audio, video, etc.
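As an illustration of the keyword-based page indexing mentioned above, here is a minimal sketch of an inverted index in Python. The page URLs and texts are hypothetical, and the tokenizer is a deliberately simple assumption (lowercase alphanumeric words only); real search engines use far richer processing.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map each keyword to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def search(index, query):
    """Return the pages that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical pages, used only for illustration.
pages = {
    "example.org/a": "Web mining applies data mining to the Web",
    "example.org/b": "Data warehouses support OLAP operations",
    "example.org/c": "Web usage mining analyzes server logs",
}
index = build_inverted_index(pages)
print(search(index, "web mining"))
```

The index answers a keyword query by intersecting the page sets of the query words, so lookup cost depends on the number of matching pages rather than on the size of the whole collection.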

a) Web Page Content Mining

  • Web Page Summarization
  • WebLog (Lakshmanan et al., 1996), WebOQL (Mendelzon et al., 1998) ...: Web structuring query languages; can identify information within given web pages
  • Ahoy! (Etzioni et al., 1997): Uses heuristics to distinguish personal home pages from other web pages
  • ShopBot (Etzioni et al., 1997): Looks for product prices within web pages
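To give a feel for what "looking for product prices within web pages" can mean, the following is a rough sketch in Python. It is not ShopBot's actual method: the regular expression, the dollar-only price format, and the sample HTML are all assumptions made for illustration.

```python
import re

# Assumed price format: a dollar sign, optional space, digits with optional
# thousands separators, and an optional two-digit cents part.
PRICE_PATTERN = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")

def extract_prices(html):
    """Strip tags crudely, then return all dollar amounts found as floats."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [float(match.replace(",", "")) for match in PRICE_PATTERN.findall(text)]

# Hypothetical page fragment.
sample_html = (
    "<li><b>Laptop</b> <span>$1,299.00</span></li>"
    "<li><b>Mouse</b> <span>$ 19.99</span></li>"
)
prices = extract_prices(sample_html)
print(prices)
```

A pattern like this only works when pages follow the assumed format, which is why content mining systems generally combine several heuristics.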


b) Search Result Mining

  • Search Engine Result Summarization
  • Clustering Search Result (Leouski and Croft, 1996; Zamir and Etzioni, 1997): Categorizes documents using phrases in titles and snippets
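The idea of grouping search results by phrases in their titles and snippets can be sketched as follows. This is a simplified stand-in, not the algorithm of the cited papers: it assumes that a "phrase" is any two-word sequence and that a cluster is any phrase shared by at least two results. The result data is hypothetical.

```python
import re
from collections import defaultdict

def bigrams(text):
    """Return the set of two-word phrases in the text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}

def cluster_by_shared_phrase(results, min_size=2):
    """Group result ids under each phrase shared by at least min_size results."""
    phrase_to_ids = defaultdict(set)
    for result_id, (title, snippet) in results.items():
        for phrase in bigrams(title) | bigrams(snippet):
            phrase_to_ids[phrase].add(result_id)
    return {p: ids for p, ids in phrase_to_ids.items() if len(ids) >= min_size}

# Hypothetical search results: id -> (title, snippet).
results = {
    1: ("Jaguar cars official site", "Luxury jaguar cars and SUVs"),
    2: ("Jaguar animal facts", "The jaguar is a big cat"),
    3: ("Used jaguar cars for sale", "Find jaguar cars near you"),
    4: ("Big cat conservation", "Protecting the big cat habitat"),
}
clusters = cluster_by_shared_phrase(results)
print(clusters)
```

Here the shared phrases act as cluster labels, so an ambiguous query such as "jaguar" is split into the car sense and the animal sense.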


2) Web Structure Mining

  • Web structure mining studies the model underlying the link structures of the Web. It has been used for search engine result ranking and other Web applications.
  • Web structure mining is the process of using graph and network mining theory and methods to analyze the nodes and connection structures on the Web. It extracts patterns from hyperlinks, where a hyperlink is a structural component that connects a web page to another location. It can also mine the document structure within a page (e.g., analyze the tree-like structure of pages to describe HTML or XML tag usage). Both kinds of web structure mining help us understand web content and may also help transform web content into relatively structured data sets.
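Both kinds of structure described above (hyperlinks between pages and tag usage within a page) can be collected with Python's standard `html.parser` module. The sketch below is only an illustration: the class name `StructureParser` and the sample document are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

class StructureParser(HTMLParser):
    """Collect tag usage counts and outgoing hyperlinks from one page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # how often each tag is opened
        self.links = []              # href targets of <a> tags, in order

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical page.
html_doc = (
    "<html><body><h1>Title</h1>"
    '<p>See <a href="/about">about</a> and '
    '<a href="https://example.org/">example</a>.</p>'
    "<p>Second paragraph.</p></body></html>"
)
parser = StructureParser()
parser.feed(html_doc)
print(parser.tag_counts)
print(parser.links)
```

The collected links are the edges that link-based methods work on, while the tag counts give a first description of the page's internal structure.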

a) Using Links

• PageRank (Brin et al., 1998)

• CLEVER (Chakrabarti et al., 1998)

- Use interconnections between web pages to give weight to pages.
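A minimal sketch of how link structure can give weight to pages, in the style of PageRank, is shown below. It uses simple power iteration; the damping factor of 0.85, the fixed iteration count, the even spreading of rank from pages without outgoing links, and the four-page graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank-style scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base score (the "random jump" part).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph, page C is linked from three pages and ends up with the highest score, while page D, which nobody links to, keeps only the base score.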


b) Using Generalization

• MLDB (1994), VWV (1998)

- Uses a multi-level database representation of the Web. Counters (popularity) and link lists are used for capturing structure.


3) Web Usage Mining

Web usage mining focuses on using data mining techniques to analyze search logs and find interesting patterns. One of its main applications is learning user profiles. Web usage mining is the process of extracting useful information (e.g., user click streams) from server logs. It finds patterns related to general or particular groups of users; understands users' search patterns, trends, and associations; and predicts what users are looking for on the Internet. It helps improve search efficiency and effectiveness, and it helps promote products or related information to different groups of users at the right time. Web search companies routinely conduct web usage mining to improve their quality of service.
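Extracting click streams from server logs can be sketched as below. The example assumes the logs follow the Common Log Format and that each client host corresponds to one user, which is a simplification (real systems usually rely on sessions or cookies). The log lines are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed input: Common Log Format lines.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_log(lines):
    """Yield (host, path, status) for every line that matches the pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.group("host"), match.group("path"), int(match.group("status"))

def clickstreams(lines):
    """Group successfully served pages by host, in log order."""
    streams = defaultdict(list)
    for host, path, status in parse_log(lines):
        if status == 200:
            streams[host].append(path)
    return dict(streams)

# Hypothetical log entries.
sample_log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 1834',
    '10.0.0.2 - - [10/Oct/2000:13:56:12 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '10.0.0.1 - - [10/Oct/2000:13:57:20 -0700] "GET /cart.html HTTP/1.0" 200 912',
]
streams = clickstreams(sample_log)
page_hits = Counter(path for paths in streams.values() for path in paths)
print(streams)
print(page_hits.most_common(1))
```

The per-host click streams are the raw material for user profiles, and the hit counts give a first view of which pages are popular.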

a) General Access Pattern Tracking

• Web Log Mining (Zaïane, Xin and Han, 1998)

- Uses KDD techniques to understand general access patterns and trends.

- Can shed light on better structure and grouping of resource providers.
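One simple kind of general access pattern is the most common page-to-page transition across all visitors. The sketch below counts such transitions; it is not the method of the cited work, and the sessions are hypothetical.

```python
from collections import Counter

def transition_counts(sessions):
    """Count how often each (page, next_page) pair occurs across all sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts

# Hypothetical sessions: each list is the ordered pages of one visit.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
counts = transition_counts(sessions)
print(counts.most_common(2))
```

Frequent transitions such as these hint at which pages could be linked more directly or grouped together.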


b) Customized Usage Tracking

• Adaptive Sites (Perkowitz and Etzioni, 1997)

- Analyzes the access patterns of one user at a time.

- The web site restructures itself automatically by learning from user access patterns.
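A toy version of a site that restructures itself for one user might reorder its navigation menu by that user's visit history. The function name `personalized_menu`, the menu, and the visit data below are hypothetical; the cited work on adaptive sites is considerably more elaborate.

```python
from collections import Counter

def personalized_menu(default_menu, user_visits, top_n=2):
    """Move the user's most visited menu pages to the front of the menu."""
    visit_counts = Counter(page for page in user_visits if page in default_menu)
    favorites = [page for page, _ in visit_counts.most_common(top_n)]
    rest = [page for page in default_menu if page not in favorites]
    return favorites + rest

# Hypothetical menu and one user's visit history.
default_menu = ["/home", "/news", "/weather", "/sports", "/contact"]
user_visits = ["/news", "/sports", "/sports", "/weather", "/sports", "/news"]
menu = personalized_menu(default_menu, user_visits)
print(menu)
```

Pages the user never visits keep their default order, so the menu stays predictable while frequent destinations become easier to reach.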
