Election Survey - Data Anonymization and Privacy Analysis

Security and Privacy

Skills Gained

Data Anonymization & Privacy Techniques
Disclosure Risk Assessment (K-Anonymity, L-Diversity)
Statistical Testing (Chi-Square)
Sensitive Data Handling
Privacy–Utility Trade-off Analysis

2024
This project focused on protecting sensitive voter information through systematic anonymization while preserving the dataset's analytical value. We applied several privacy-preserving transformations, including removal of direct identifiers, age binning, and recoding of demographic variables, to reduce re-identification risk while keeping the data usable.

To evaluate privacy protection, we assessed disclosure risk with K-anonymity and L-diversity metrics. K-anonymity improved from 0 in the raw data to 2 in the anonymized dataset, and the proportion of groups meeting the 2-diversity threshold increased substantially, indicating stronger resistance to re-identification and attribute-inference attacks.

We then analyzed the privacy–utility trade-off using Chi-square tests, which showed that key associations between demographic attributes and voting behavior remained consistent after anonymization. Together, these results indicate that the applied techniques balanced confidentiality with analytical relevance.
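The three transformations described above (removing direct identifiers, binning age, recoding a demographic variable) can be sketched in plain Python over a list of dict records. This is a minimal illustration, not the project's code: the column names (`name`, `voter_id`, `phone`, `age`, `education`, `vote`), the 10-year bin width, and the education mapping are all hypothetical, since the actual schema and recoding rules are not given.

```python
from typing import Any

# Hypothetical direct identifiers; the project's real schema is not specified.
DIRECT_IDENTIFIERS = {"name", "voter_id", "phone"}

# Hypothetical recoding map that collapses fine-grained education levels.
EDUCATION_RECODE = {
    "no_schooling": "low",
    "primary": "low",
    "secondary": "medium",
    "vocational": "medium",
    "bachelor": "high",
    "master": "high",
    "doctorate": "high",
}


def bin_age(age: int, width: int = 10) -> str:
    """Map an exact age to a fixed-width band, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymize(record: dict[str, Any]) -> dict[str, Any]:
    """Drop direct identifiers, bin age, and recode education."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = bin_age(out["age"])
    if "education" in out:
        # Unknown levels fall into a catch-all category.
        out["education"] = EDUCATION_RECODE.get(out["education"], "other")
    return out


raw = [
    {"name": "A. Voter", "voter_id": 101, "age": 34, "education": "master", "vote": "party_x"},
    {"name": "B. Voter", "voter_id": 102, "age": 67, "education": "primary", "vote": "party_y"},
]
anonymized = [anonymize(r) for r in raw]
print(anonymized[0])  # {'age': '30-39', 'education': 'high', 'vote': 'party_x'}
```

Wider bins and coarser categories enlarge the groups an individual can hide in, at the cost of detail available for analysis; that is the trade-off the later metrics and tests quantify.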
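The disclosure-risk metrics mentioned above can be sketched as follows, assuming records are dicts and the quasi-identifiers have already been chosen. Here K-anonymity is the size of the smallest equivalence class (records sharing the same quasi-identifier values), and L-diversity uses the simplest "distinct" variant: a class is 2-diverse if it holds at least two distinct values of the sensitive attribute. The function names, the use of `age` and `sex` as quasi-identifiers, and `vote` as the sensitive attribute are illustrative assumptions.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest equivalence class (0 if there are no records)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min((len(g) for g in groups.values()), default=0)


def share_l_diverse(records, quasi_identifiers, sensitive, l_min=2):
    """Fraction of equivalence classes with at least l_min distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    if not groups:
        return 0.0
    diverse = sum(
        1 for g in groups.values() if len({r[sensitive] for r in g}) >= l_min
    )
    return diverse / len(groups)


# Toy, already-anonymized records (hypothetical values).
data = [
    {"age": "30-39", "sex": "F", "vote": "party_x"},
    {"age": "30-39", "sex": "F", "vote": "party_y"},
    {"age": "40-49", "sex": "M", "vote": "party_x"},
    {"age": "40-49", "sex": "M", "vote": "party_x"},
    {"age": "40-49", "sex": "M", "vote": "party_y"},
    {"age": "50-59", "sex": "F", "vote": "party_y"},
    {"age": "50-59", "sex": "F", "vote": "party_y"},
]
qi = ["age", "sex"]
print(k_anonymity(data, qi))                 # 2
print(share_l_diverse(data, qi, "vote", 2))  # 2 of 3 classes are 2-diverse
```

In this toy data every class has at least two records, yet the `50-59`/`F` class contains a single vote value, so its members' vote can be inferred despite k = 2. That homogeneity gap is what the L-diversity check exposes. Entropy and recursive (c, l)-diversity are stricter variants not implemented here.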
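One way to run the utility check described above is to test independence between a demographic attribute and voting behavior on both the raw and the anonymized data, then compare the conclusions. The sketch below uses only the standard library: it computes Pearson's chi-square statistic (without Yates' continuity correction) and a p-value from the closed-form chi-square survival function for integer degrees of freedom. The toy counts, column names, and two-level education recoding are hypothetical. `scipy.stats.chi2_contingency` offers the same test, though it applies Yates' correction to 2x2 tables by default.

```python
import math
from collections import Counter


def contingency(records, row_key, col_key):
    """Build an observed-count table for two categorical columns."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom (no Yates correction)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum_r (x/2)^(r - 1/2) / Gamma(r + 1/2)
    total = math.erfc(math.sqrt(half))
    for r in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (r - 0.5) / math.gamma(r + 0.5)
    return total


def expand(cell_counts):
    """Turn {(education, vote): count} into a list of record dicts (toy data)."""
    return [{"education": e, "vote": v}
            for (e, v), n in cell_counts.items() for _ in range(n)]


raw = expand({
    ("primary", "x"): 5, ("secondary", "x"): 10, ("bachelor", "x"): 20, ("master", "x"): 15,
    ("primary", "y"): 15, ("secondary", "y"): 20, ("bachelor", "y"): 10, ("master", "y"): 5,
})
# Hypothetical recoding: collapse four education levels into two.
RECODE = {"primary": "no_degree", "secondary": "no_degree",
          "bachelor": "degree", "master": "degree"}
anon = [{**r, "education": RECODE[r["education"]]} for r in raw]

ALPHA = 0.05
for label, rows in (("raw", raw), ("anonymized", anon)):
    stat, dof = chi_square(contingency(rows, "education", "vote"))
    p = chi2_sf(stat, dof)
    print(f"{label}: chi2={stat:.2f}, dof={dof}, p={p:.2g}, significant={p < ALPHA}")
```

With these counts both versions reject independence at alpha = 0.05, so collapsing education from four levels to two leaves the conclusion unchanged. A fuller check would repeat this for every attribute pair of interest.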
