Re-Identification Risks and Privacy Attacks
Re-identification is a significant risk in the field of data privacy, especially when datasets that have been "anonymized" are released for research, analysis, or public use. Even after removing direct identifiers such as names or social security numbers, attackers can often use a combination of seemingly innocuous attributes — like ZIP code, birthdate, or gender — to uniquely identify individuals. These attributes, while not unique on their own, can become unique when combined, making re-identification possible.
Attackers exploit what is known as auxiliary information — data from external sources that can be cross-referenced with the anonymized dataset. For example, public voter records or social media profiles may contain attributes that, when matched with the anonymized data, enable the identification of individuals. This process is called a privacy attack, and it demonstrates that simply removing direct identifiers is not enough to guarantee privacy.
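As a minimal sketch of such a linkage attack, the snippet below joins an "anonymized" table with an auxiliary table on shared quasi-identifiers. The voters table, its column names, and every record are hypothetical and made up purely for illustration, not drawn from any real source.

import pandas as pd

# "Anonymized" records: direct identifiers removed, quasi-identifiers kept
anonymized = pd.DataFrame({
    "zip_code": ["02138", "02139", "02140"],
    "birthdate": ["1980-05-12", "1975-09-23", "1990-11-02"],
    "gender": ["F", "M", "F"],
    "diagnosis": ["asthma", "diabetes", "hypertension"]
})

# Hypothetical auxiliary data, e.g. scraped from public voter records
voters = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "zip_code": ["02138", "02139"],
    "birthdate": ["1980-05-12", "1975-09-23"],
    "gender": ["F", "M"]
})

# Cross-reference the two tables on the shared quasi-identifiers
linked = anonymized.merge(voters, on=["zip_code", "birthdate", "gender"])
print(linked[["name", "diagnosis"]])  # names are now attached to sensitive attributes

Even though the anonymized table contains no names, the join recovers them for every individual whose quasi-identifier combination also appears in the auxiliary data.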
Quasi-identifiers are sets of attributes in a dataset that, while not unique identifiers by themselves, can be combined with external information to re-identify individuals. Examples include combinations like ZIP code, birthdate, and gender.
import pandas as pd

# Create a synthetic dataset
data = {
    "zip_code": ["02138", "02139", "02138", "02140"],
    "birthdate": ["1980-05-12", "1975-09-23", "1990-11-02", "1980-05-12"],
    "gender": ["F", "M", "F", "M"]
}
df = pd.DataFrame(data)

# Check for unique combinations of quasi-identifiers
unique_rows = df.groupby(["zip_code", "birthdate"]).size().reset_index(name='count')
print("Combinations of ZIP code and birthdate:")
print(unique_rows)
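Building on the output above, one could also flag the combinations that occur exactly once, since any row with a unique quasi-identifier combination is a candidate for re-identification. The choice of columns and the threshold of one are illustrative, not a fixed rule.

# Flag combinations that appear only once; such rows are uniquely identifying
at_risk = unique_rows[unique_rows["count"] == 1]
print("Potentially re-identifiable combinations:")
print(at_risk)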
1. Which of the following best explains why re-identification is possible in anonymized datasets?
2. How does auxiliary information contribute to privacy attacks?