2024
This project protected sensitive voter information through systematic anonymization while preserving the dataset’s analytical value. We applied several privacy-preserving transformations (recoding demographic variables, binning ages, and removing direct identifiers) to reduce re-identification risk without compromising usability. To evaluate privacy protection, we measured disclosure risk with k-anonymity and l-diversity. The dataset's k-anonymity improved from 0 in the raw data to 2 after anonymization, and the proportion of groups meeting the 2-diversity threshold increased substantially, indicating stronger resistance to inference attacks. We then examined the trade-off between privacy and utility with chi-square tests, which confirmed that key statistical relationships between demographic attributes and voting behavior remained consistent after anonymization. Overall, the applied techniques balanced confidentiality with analytical relevance.
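To make the two privacy metrics concrete, the following is a minimal sketch of how k-anonymity and the share of 2-diverse groups could be computed before and after age binning. It is illustrative only and is not the project's actual pipeline: the field names (`age`, `gender`, `voted`), the 10-year bin width, the helper functions, and the four toy records are all assumptions made up for this example.

```python
from collections import defaultdict

def age_bin(age, width=10):
    """Map an exact age to a coarse range label such as '30-39' (assumed bin width)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups

def k_anonymity(records, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(g) for g in groups.values())

def l_diverse_share(records, quasi_identifiers, sensitive, l=2):
    """Fraction of equivalence classes holding at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    passing = sum(
        1 for g in groups.values() if len({r[sensitive] for r in g}) >= l
    )
    return passing / len(groups)

# Hypothetical toy records, not taken from the voter dataset.
raw = [
    {"age": 23, "gender": "F", "voted": "yes"},
    {"age": 27, "gender": "F", "voted": "no"},
    {"age": 34, "gender": "M", "voted": "yes"},
    {"age": 38, "gender": "M", "voted": "yes"},
]

# Anonymization step: replace exact ages with 10-year bins.
anonymized = [{**r, "age": age_bin(r["age"])} for r in raw]

qi = ["age", "gender"]  # assumed quasi-identifiers
k_raw = k_anonymity(raw, qi)
k_anon = k_anonymity(anonymized, qi)
share_2_diverse = l_diverse_share(anonymized, qi, "voted", l=2)

print(k_raw, k_anon, share_2_diverse)
```

In this toy data, every raw record is unique on its quasi-identifiers, while binning merges the records into two groups of two. Only one of those groups contains both voting outcomes, which shows why k-anonymity alone does not guarantee diversity of the sensitive attribute. Under the textbook definition used here, the smallest possible k for a non-empty table is 1, so the raw-data value of 0 reported above presumably reflects a different convention.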