Exploratory Data Analysis | Additional Applications of ARM
Association Rule Mining

Exploratory Data Analysis

We have already discussed how association rule mining algorithms such as Apriori and FP-growth are applied in market basket analysis. ARM can also address more specialized tasks, and what follows is a concise overview of them.

Association Rule Mining (ARM) can be used alongside classification and regression tasks to augment exploratory data analysis (EDA) by uncovering latent patterns or relationships among the features of a dataset.

EDA is the process of summarizing and visualizing data to understand its main characteristics, uncover patterns, detect anomalies, and formulate hypotheses for further investigation. It relies on techniques such as descriptive statistics, data visualization, and data mining to inform subsequent analysis.

By employing ARM, we can identify associations or "if-then" relationships among variables, which can be valuable for making predictions or deriving insights from the data.
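As a minimal sketch of what an "if-then" rule measures, the snippet below computes the support and confidence of one rule over a handful of made-up records. The records, the item names, and the helper functions `support` and `confidence` are illustrative assumptions, not part of any dataset or library discussed here.

```python
# Toy records: each record is the set of items (binary features) that are present.
# The data is made up purely for illustration.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    joint = support(set(antecedent) | set(consequent), records)
    return joint / support(antecedent, records)


# Rule: IF {a, b} THEN {c}
rule_support = support({"a", "b", "c"}, records)
rule_confidence = confidence({"a", "b"}, {"c"}, records)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# prints: support=0.40, confidence=0.67
```

Here the rule holds in two of the three records that contain both `a` and `b`, which is where the confidence of about 0.67 comes from.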

Example

Let's consider a Heart Disease Classification dataset, which contains medical measurements for a set of patients. We will apply ARM to detect hidden patterns in it.
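The mining step can be sketched in plain Python. Item names such as thal_3 and slope_2 look like one-hot encoded categorical columns, so the sketch assumes that encoding. It uses exhaustive enumeration rather than Apriori or FP-growth, and it runs on a few made-up patient rows, so its output will not match the figures quoted in this section. The column names, the rows, the thresholds, and the helpers `to_items`, `support`, and `mine_rules` are all assumptions made for illustration.

```python
from itertools import combinations

# Made-up patient rows (NOT taken from the real dataset). Column names are
# assumptions modeled on the item names that appear in the rules of this section.
patients = [
    {"sex": 1, "thal": 3, "slope": 1, "target": 0},
    {"sex": 1, "thal": 3, "slope": 2, "target": 0},
    {"sex": 0, "thal": 2, "slope": 2, "target": 1},
    {"sex": 1, "thal": 2, "slope": 2, "target": 1},
    {"sex": 0, "thal": 2, "slope": 2, "target": 1},
    {"sex": 1, "thal": 3, "slope": 1, "target": 1},
]

BINARY = {"sex", "target"}  # binary columns become a single item when the value is 1


def to_items(row):
    """One-hot style encoding: binary columns -> 'name', others -> 'name_value'."""
    items = set()
    for name, value in row.items():
        if name in BINARY:
            if value == 1:
                items.add(name)
        else:
            items.add(f"{name}_{value}")
    return frozenset(items)


transactions = [to_items(row) for row in patients]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(min_support=0.3, min_confidence=0.8, max_len=3):
    """Enumerate all itemsets up to `max_len` and emit single-consequent rules."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s < min_support:
                continue
            for consequent in combo:
                antecedent = itemset - {consequent}
                conf = s / support(antecedent)
                if conf >= min_confidence:
                    lift = conf / support(frozenset([consequent]))
                    rules.append((sorted(antecedent), consequent, s, conf, lift))
    return rules


for antecedent, consequent, s, conf, lift in mine_rules():
    print(f"{antecedent} -> {consequent}: "
          f"support={s:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

Exhaustive enumeration is only practical for a small number of items. On a real dataset, a dedicated Apriori or FP-growth implementation would prune the search instead.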

What conclusions can we draw?

  1. If a patient has thalassemia type 3 (thal_3), they are likely to be male (sex): the rule has a confidence of 87.07%, meaning that 87.07% of the patients with thal_3 are male. This suggests a strong association between thal_3 and being male;
  2. If a patient has both slope type 2 (slope_2) and restecg type 1 (restecg_1), they are likely to have heart disease (target), with a confidence of 80.36%. This indicates a strong association between the combination of slope_2 and restecg_1 and having heart disease;
  3. If a patient has both thalassemia type 2 (thal_2) and restecg type 1 (restecg_1), they are likely to have heart disease (target), with a confidence of 84.75%. This suggests a strong association between the combination of thal_2 and restecg_1 and having heart disease;
  4. If a patient has both slope type 2 (slope_2) and thalassemia type 2 (thal_2), they are likely to have heart disease (target), with a confidence of 85.45%. This indicates a strong association between the combination of slope_2 and thal_2 and having heart disease;
  5. All lift values are greater than 1 for the provided rules. This means that the antecedents and consequents occur together more often than they would if they were independent. In other words, observing the antecedent raises the estimated probability of the consequent, which points to a positive association between the variables (an association, not necessarily a causal link).
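The relationship between confidence and lift can be made concrete with a short sketch. Lift is the rule's confidence divided by the support of the consequent, so a rule has lift above 1 exactly when its confidence exceeds the baseline share of the consequent. The confidence below is the one quoted for rule 1. The baseline shares of male patients are hypothetical values chosen for illustration, not figures from the dataset, and the helper `lift` is likewise an assumption.

```python
def lift(confidence, consequent_support):
    """lift(A -> C) = confidence(A -> C) / support(C)."""
    if not 0 < consequent_support <= 1:
        raise ValueError("consequent_support must be in (0, 1]")
    return confidence / consequent_support


# Confidence of rule 1 (thal_3 -> sex), as quoted in the list above.
confidence_rule_1 = 0.8707

# Hypothetical baseline shares of male patients; NOT values from the dataset.
for male_share in (0.50, 0.70, 0.90):
    value = lift(confidence_rule_1, male_share)
    verdict = "positive" if value > 1 else "negative or none"
    print(f"support(sex)={male_share:.2f} -> lift={value:.2f} ({verdict} association)")
```

Since the text reports a lift above 1 for rule 1, the share of male patients in the dataset must be below 87.07%. A high confidence alone would not establish the association if most patients were male anyway.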

Rules 2-4 all have heart disease (target) as their consequent, so we can even use them for rule-based classification: if a patient has the feature values named in a rule's antecedent, we can predict heart disease without training a separate ML model.
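A rule-based classifier of this kind might look like the sketch below. It uses the listed rules whose consequent is target, with the antecedents and confidences quoted above. The `classify` function, the choice to report the highest matching confidence, the decision to abstain when no rule fires, and the example patients (including the item name restecg_0) are assumptions made for illustration.

```python
# Rules with `target` (heart disease) as the consequent, using the antecedents
# and confidences quoted in the list above.
RULES = [
    (frozenset({"slope_2", "restecg_1"}), 0.8036),
    (frozenset({"thal_2", "restecg_1"}), 0.8475),
    (frozenset({"slope_2", "thal_2"}), 0.8545),
]


def classify(patient_items):
    """Return (prediction, confidence) from the best matching rule, or (None, None).

    None means no rule fired, so this sketch abstains instead of predicting
    "no heart disease": the rules only describe when the disease is likely.
    """
    items = set(patient_items)
    matches = [conf for antecedent, conf in RULES if antecedent <= items]
    if not matches:
        return None, None
    return 1, max(matches)


# Hypothetical patients, described by their one-hot style items.
print(classify({"sex", "slope_2", "thal_2", "restecg_0"}))  # (1, 0.8545)
print(classify({"sex", "slope_1", "thal_3", "restecg_1"}))  # (None, None)
```

Because the rules cover only some feature combinations, many patients would receive no prediction, so a classifier like this is better suited to generating hypotheses during EDA than to replacing a trained model.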
