Data Privacy and Differential Privacy Fundamentals

Classical Anonymization Techniques

Understanding how to protect sensitive information in datasets is a cornerstone of data privacy. Before the rise of differential privacy, several classical anonymization techniques were developed to reduce the risk of re-identifying individuals in shared data. The three most influential are k-anonymity, l-diversity, and t-closeness. Each aims to mask identities by modifying or grouping data, but they differ in approach and strength.

K-anonymity ensures that every record in a released dataset is indistinguishable from at least k - 1 other records with respect to a set of attributes called quasi-identifiers. Quasi-identifiers, such as age and zip code, do not name a person on their own but can single someone out in combination. In other words, an attacker who knows a person's quasi-identifier values can narrow the match down to a group of no fewer than k records, not to one record.
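The definition above can be checked mechanically: group the records by their quasi-identifier values and look at the smallest group. The following is a minimal sketch using only the standard library; the function name `k_anonymity_level` and the sample records are illustrative assumptions, not part of the lesson.

```python
from collections import Counter


def k_anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    This is the size of the smallest group of records that share the
    same values on all quasi-identifiers.
    """
    group_sizes = Counter(
        tuple(record[qi] for qi in quasi_identifiers) for record in records
    )
    return min(group_sizes.values())


# Illustrative records (hypothetical, not from a real dataset).
records = [
    {"Age": 25, "Zipcode": 12345, "Disease": "Flu"},
    {"Age": 25, "Zipcode": 12345, "Disease": "Cold"},
    {"Age": 25, "Zipcode": 12345, "Disease": "Allergy"},
    {"Age": 30, "Zipcode": 54321, "Disease": "Cancer"},
    {"Age": 30, "Zipcode": 54321, "Disease": "Flu"},
]

# The (30, 54321) group has only two records, so the table is 2-anonymous.
print(k_anonymity_level(records, ["Age", "Zipcode"]))  # 2
```

A table meets a target k exactly when this value is at least k.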

L-diversity builds upon k-anonymity by requiring that each group of indistinguishable records contains at least l well-represented values of the sensitive attribute. This protects against situations where all records in a k-anonymous group have the same sensitive value: in that case an attacker learns the value for everyone in the group without re-identifying anyone.
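As a rough illustration, the simplest reading of "diverse enough" counts distinct sensitive values per group (often called distinct l-diversity; stricter variants exist and are not shown here). The helper name `distinct_l_diversity` and the second, homogeneous group below are assumptions added for the example.

```python
from collections import defaultdict


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values in any group."""
    values_per_group = defaultdict(set)
    for record in records:
        key = tuple(record[qi] for qi in quasi_identifiers)
        values_per_group[key].add(record[sensitive])
    return min(len(values) for values in values_per_group.values())


# Two groups of three records each, so the table is 3-anonymous.
records = [
    {"Age": 30, "Zipcode": 54321, "Disease": "Cancer"},
    {"Age": 30, "Zipcode": 54321, "Disease": "Flu"},
    {"Age": 30, "Zipcode": 54321, "Disease": "Cold"},
    # Hypothetical homogeneous group: everyone has the same disease.
    {"Age": 50, "Zipcode": 11111, "Disease": "Cancer"},
    {"Age": 50, "Zipcode": 11111, "Disease": "Cancer"},
    {"Age": 50, "Zipcode": 11111, "Disease": "Cancer"},
]

# The second group has a single distinct value, so the table is only 1-diverse.
print(distinct_l_diversity(records, ["Age", "Zipcode"], "Disease"))  # 1
```

Here k-anonymity holds with k = 3, yet anyone known to be 50 and living in zip code 11111 is revealed to have Cancer.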

T-closeness further strengthens privacy by requiring that the distribution of the sensitive attribute within each group is close to its distribution in the overall dataset, meaning that the distance between the two distributions is at most a threshold t. This prevents attackers from learning too much about sensitive values even within diverse groups.
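A hedged sketch of how such a check might look follows. It assumes variational distance (half the sum of absolute differences) as the distance measure, which is one reasonable choice for a categorical attribute; the lesson does not name a specific measure. The function names are hypothetical, and the records are the nine rows of the lesson's synthetic dataset.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Map each value to its relative frequency."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_group_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any group's distribution and the overall one."""
    overall = distribution([r[sensitive] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[qi] for qi in quasi_identifiers)].append(r[sensitive])
    return max(
        variational_distance(distribution(values), overall)
        for values in groups.values()
    )


ages = [25, 25, 25, 30, 30, 30, 40, 40, 40]
zipcodes = [12345, 12345, 12345, 54321, 54321, 54321, 67890, 67890, 67890]
diseases = ["Flu", "Cold", "Allergy", "Cancer", "Flu", "Cold",
            "Allergy", "Allergy", "Cold"]
records = [
    {"Age": a, "Zipcode": z, "Disease": d}
    for a, z, d in zip(ages, zipcodes, diseases)
]

worst = max_group_distance(records, ["Age", "Zipcode"], "Disease")
print(round(worst, 3))  # 0.333
```

Under this assumed measure, the table satisfies t-closeness only for thresholds t of about 0.333 or larger.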

The main differences between these techniques lie in the types of attacks they mitigate. K-anonymity prevents straightforward re-identification, l-diversity guards against attribute disclosure from homogeneous groups, and t-closeness limits information gain about sensitive attributes.

K-anonymity
| Age | Zipcode | Disease |
|-----|---------|---------|
| 25  | 12345   | Flu     |
| 25  | 12345   | Cold    |
| 25  | 12345   | Allergy |

Here, quasi-identifiers Age and Zipcode are the same for three records, achieving 3-anonymity.
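Records rarely share quasi-identifier values by chance; in practice the values are usually coarsened until they do. The sketch below shows one way this might be done, assuming 10-year age bands and zip codes truncated to three digits. The raw records, the banding choices, and the function name `generalize` are all hypothetical.

```python
def generalize(record):
    """Coarsen quasi-identifiers: 10-year age band, zip code cut to 3 digits."""
    decade = (record["Age"] // 10) * 10
    return {
        "Age": f"{decade}-{decade + 9}",
        "Zipcode": str(record["Zipcode"])[:3] + "**",
        "Disease": record["Disease"],  # the sensitive value is left unchanged
    }


# Hypothetical raw records, each with a unique (Age, Zipcode) pair.
raw = [
    {"Age": 23, "Zipcode": 12345, "Disease": "Flu"},
    {"Age": 25, "Zipcode": 12377, "Disease": "Cold"},
    {"Age": 28, "Zipcode": 12302, "Disease": "Allergy"},
]

generalized = [generalize(r) for r in raw]
for row in generalized:
    print(row["Age"], row["Zipcode"], row["Disease"])
# All three rows now share Age "20-29" and Zipcode "123**".
```

Coarser bands give larger groups and therefore a higher k, at the cost of less precise data.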

L-diversity
| Age | Zipcode | Disease |
|-----|---------|---------|
| 30  | 54321   | Cancer  |
| 30  | 54321   | Flu     |
| 30  | 54321   | Cold    |

This group is not only 3-anonymous but also 3-diverse: it contains three different diseases, so an attacker who places someone in the group still cannot tell which disease that person has.

T-closeness
| Age | Zipcode | Disease |
|-----|---------|---------|
| 40  | 67890   | Allergy |
| 40  | 67890   | Allergy |
| 40  | 67890   | Cold    |

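If Allergy makes up 2/3 of this group but only 10% of the overall dataset, the group's distribution of the sensitive attribute is far from the overall one, so the group fails t-closeness for any reasonably small threshold t. An attacker who places someone in this group learns that Allergy is much more likely for that person than for the population at large.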
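To put a number on "too different", suppose the distance is measured as variational distance between the group distribution P and the overall distribution Q. This measure is an assumption made for illustration; the lesson does not specify one. Whatever the shares of the other diseases, the Allergy gap alone bounds the distance from below:

```latex
D(P, Q) = \frac{1}{2} \sum_{i} \lvert p_i - q_i \rvert
        \;\ge\; \lvert p_{\text{Allergy}} - q_{\text{Allergy}} \rvert
        = \left\lvert \tfrac{2}{3} - \tfrac{1}{10} \right\rvert
        = \tfrac{17}{30} \approx 0.57
```

The inequality holds because the differences p_i - q_i sum to zero, so the gap on Allergy must be offset by gaps of equal total size on the other values. Under this assumption the group could satisfy t-closeness only for t of roughly 0.57 or more.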

Note

Classical anonymization techniques like k-anonymity, l-diversity, and t-closeness can be vulnerable to modern attacks, such as those exploiting background knowledge or linking with external datasets. These limitations motivate the development of stronger privacy models, including differential privacy.
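Linking with an external dataset can be sketched as a join on the quasi-identifiers. The example below is a simplified illustration with hypothetical data and a hypothetical function name, `link_records`: a released table with names removed is matched against an outside source that lists names alongside the same attributes.

```python
from collections import defaultdict


def link_records(released, external, quasi_identifiers):
    """Re-identify people whose quasi-identifiers match exactly one released row."""
    index = defaultdict(list)
    for row in released:
        index[tuple(row[qi] for qi in quasi_identifiers)].append(row)

    reidentified = []
    for person in external:
        matches = index.get(tuple(person[qi] for qi in quasi_identifiers), [])
        if len(matches) == 1:  # a unique match ties the name to the record
            reidentified.append({**person, **matches[0]})
    return reidentified


# Released table: names removed, quasi-identifiers kept (hypothetical).
released = [
    {"Age": 25, "Zipcode": 12345, "Disease": "Flu"},
    {"Age": 30, "Zipcode": 54321, "Disease": "Cancer"},
    {"Age": 40, "Zipcode": 67890, "Disease": "Allergy"},
]

# External source such as a public register (hypothetical).
external = [
    {"Name": "Alice", "Age": 30, "Zipcode": 54321},
    {"Name": "Bob", "Age": 40, "Zipcode": 67890},
]

for match in link_records(released, external, ["Age", "Zipcode"]):
    print(match["Name"], "->", match["Disease"])
# Alice -> Cancer
# Bob -> Allergy
```

Enforcing k-anonymity with k of 2 or more blocks this exact join, since no combination would be unique. The protection depends on having chosen the right quasi-identifiers, though: an attribute that was left out of that set, or background knowledge about a person, can narrow a group down again.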

```python
import pandas as pd

# Synthetic dataset
data = pd.DataFrame({
    "Age": [25, 25, 25, 30, 30, 30, 40, 40, 40],
    "Zipcode": [12345, 12345, 12345, 54321, 54321, 54321, 67890, 67890, 67890],
    "Disease": ["Flu", "Cold", "Allergy", "Cancer", "Flu", "Cold", "Allergy", "Allergy", "Cold"]
})

# Group data by quasi-identifiers for k-anonymity (k=3)
k_anonymous_groups = data.groupby(["Age", "Zipcode"])
for name, group in k_anonymous_groups:
    print(f"Group: Age={name[0]}, Zipcode={name[1]}")
    print(group, end="\n\n")
```

1. Which of the following best defines k-anonymity?

2. How does l-diversity differ from t-closeness?

3. Why might k-anonymity, l-diversity, and t-closeness fail to protect privacy in some cases?

Section 1. Chapter 2
