Data mining creates few model to identify patterns on attributes which is inscribed on dataset. Some of those patterns will create “descriptive” results, whereas some pattern creates “Predictive” results.
Generally, data mining identifies four kind of main patterns:
- Associations : groups same or similar events, for example a purchase of same item at supermarket.
- Predictions : forecast future events based on historical record. For example: predict future world cup winner based on winning streak pattern.
- Clusters : groups objects based on known characters, for example create a customer differentiation based on purchase history.
- Sequential relationships : find a relation between events, for example : predict a customer whom purchased a car will purchase spare parts a year later.
These general patterns are extracted from data manually for hundred years, but increased data volume on modern times demands an automated approach. The automated (or semi – automated) approach which analyzes very large amount of dataset is called Data Mining.
Three main data mining categories are :
- Prediction : generally this is a method to forecast future events. The source of this forecasts is the information which provides a data from pasts. Then the data is analyzed thoroughly in order to aid the forecasting. classification on data mining is a method which analyze data on database in order to produce a model which is used to predict future behavior. Some of the classification tool which is commonly used are “Neural Networks” and “Decision Tree”. “Neural Networks” involves mathematical structure development which wields learning ability from past experience. “Decision tree” classifies data to a unlimited class amount based on values and input variables.
- Association : association is a popular data mining technique which finds a relation between variables on a large database. The example of the usage of “association rule” is a bar code scanner.
- Clustering : “clustering” divides and groups objects or events which is available on structured dataset to a segments. Clustering uses specific or dedicated algorithm to identify similar occurrence based on their characters.
No comments:
Post a Comment