To get started, you first need to understand that data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of many analytical tools for analyzing data. It lets users analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
The amount of raw data stored in corporate databases is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. (One terabyte = one trillion bytes. A terabyte is equivalent to about 2 million books!) For example, every day Wal-Mart uploads 20 million point-of-sale transactions to an A&T massively parallel system with 483 processors running a centralized database. Raw data by itself, however, does not provide much information. In today's fiercely competitive business environment, companies need to quickly turn these terabytes of raw data into meaningful information about their customers and markets to guide their marketing, investment and management strategies.
Next, you should understand that association rule mining is an important data mining model. Its mining algorithms discover all item associations (or rules) in the data that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints. Minsup controls the minimum number of data cases that a rule must cover. Minconf controls the predictive strength of the rule. Since only one minsup is used for the entire database, the model implicitly assumes that all items in the data are of the same nature and/or have similar frequencies in the data. However, this is rarely the case in real-world applications. In many applications, some items appear very often in the data, while others rarely appear. If minsup is set too high, rules that involve rare items will not be found. To find rules that involve both frequent and rare items, minsup must be set very low. This can cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways. This dilemma is called the rare item problem. This article proposes a new method for solving this problem. The method allows the user to specify multiple minimum supports to reflect the nature of the items and their varied frequencies in the database. In rule mining, different rules may need to satisfy different minimum supports, depending on which items are in the rules.
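To make the rare item problem concrete, here is a minimal Python sketch on a made-up transaction database. The data, the `frequent_pairs` helper, the `item_minsup` table, and the threshold values are all hypothetical. In particular, using the lowest per-item threshold among the items of a pair is only one plausible reading of "different rules may need to satisfy different minimum supports"; it is an assumption, not necessarily the method described above.

```python
from itertools import combinations

# Hypothetical toy database: bread and milk are frequent,
# food_processor and cooking_pan are rare but always bought together.
transactions = (
    [{"bread", "milk"}] * 12
    + [{"bread"}] * 4
    + [{"milk"}] * 2
    + [{"food_processor", "cooking_pan"}] * 2
)

def frequent_pairs(transactions, threshold_for):
    """Return {pair: support} for item pairs whose support meets threshold_for(pair)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for pair in combinations(items, 2):
        # Count the transactions that contain both items of the pair.
        count = sum(1 for t in transactions if set(pair) <= t)
        if count > 0 and count / n >= threshold_for(pair):
            result[pair] = count / n
    return result

# One global minsup: the rare pair falls below the threshold and is lost.
single = frequent_pairs(transactions, lambda pair: 0.3)

# Per-item minimum supports (assumed values). The threshold for a pair is
# taken to be the lowest per-item threshold among its items (an assumption).
item_minsup = {
    "bread": 0.3,
    "milk": 0.3,
    "food_processor": 0.05,
    "cooking_pan": 0.05,
}
multiple = frequent_pairs(
    transactions, lambda pair: min(item_minsup[i] for i in pair)
)

print(sorted(single))
print(sorted(multiple))
```

With the single threshold only the bread/milk pair survives, while the per-item thresholds also keep the rare food_processor/cooking_pan pair without lowering the bar for the frequent items.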
Given a set of transactions T (the database), the problem of mining association rules is to discover all association rules whose support and confidence are at least the user-specified minimum support (called minsup) and minimum confidence (called minconf).
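As a rough illustration of this problem statement, the sketch below computes support and confidence on a tiny made-up database and keeps the rules that reach both thresholds. The data, the function names (`support`, `confidence`, `mine_rules`) and the threshold values are hypothetical, and the brute-force search is only suitable for toy inputs; real miners such as Apriori prune the search space instead of enumerating every itemset.

```python
from itertools import combinations

# Hypothetical toy transaction database T.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

def mine_rules(transactions, minsup, minconf):
    """Brute force: try every itemset and every split into antecedent -> consequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, transactions)
            if sup == 0 or sup < minsup:
                continue  # itemset is absent or too rare
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    conf = confidence(antecedent, consequent, transactions)
                    if conf >= minconf:
                        rules.append((antecedent, consequent, sup, conf))
    return rules

rules = mine_rules(transactions, minsup=0.4, minconf=0.7)
for antecedent, consequent, sup, conf in rules:
    print(antecedent, "->", consequent, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Raising `minsup` or `minconf` shrinks the rule list, while lowering them lets more (and weaker) rules through, which is the trade-off the two thresholds are meant to control.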
I hope that once you understand the basics of data mining, the answer to this question will become obvious.