Factors Affecting Bird Extinction
2023
In this project, my classmates and I explored the environmental and biological factors that contribute to the extinction of bird species. By analyzing data on migratory patterns, population size, and habitat types, we built multiple linear regression models to predict extinction risk for various bird species. The project showed that migratory birds with smaller populations face a significantly higher risk of extinction than non-migratory species with larger populations.
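As a rough illustration of the multiple linear regression idea described above, here is a minimal Python sketch that fits an ordinary least squares model with two predictors. Everything in it is an assumption made for illustration: the predictor names (`is_migratory`, `log10_population`), the synthetic rows, and the coefficients used to generate the toy risk scores are hypothetical and are not the project's data or results.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, coef_1, ...] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Predicted response for one row of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical predictors: (is_migratory, log10_population). Synthetic values.
rows = [(1, 3.0), (0, 3.5), (1, 4.0), (0, 5.0), (1, 5.5), (0, 6.0)]

# Toy, noise-free risk scores generated from made-up coefficients,
# so the fit should recover them: risk = 0.9 + 0.2*migratory - 0.15*log10_pop.
true_coefs = [0.9, 0.2, -0.15]
risk = [predict(true_coefs, r) for r in rows]

coefs = fit_ols(rows, risk)
print("intercept, migratory, log10_population:", [round(c, 4) for c in coefs])
```

In this toy setup, a positive coefficient on `is_migratory` and a negative coefficient on `log10_population` would correspond to the pattern described above, with higher predicted risk for migratory species and for smaller populations.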
First Year of Bachelor's
2022/2023
Predicting a skin lesion’s state
2023
In this project, our team developed machine learning models to classify skin lesions, focusing on identifying types linked to skin cancer. Using the PAD-UFES-20 dataset, which includes clinical images and metadata for various skin lesions, we preprocessed the images with Multi-Otsu thresholding and snake (active contour) segmentation to isolate key features such as color, asymmetry, and texture. We tested K-Nearest Neighbors, Nearest Centroid, Logistic Regression, and Random Forest models. We tuned hyperparameters with grid search and random search, and evaluated robustness and reliability with accuracy, F1-score, and AUC-ROC. After tuning, Random Forest achieved the best accuracy, and the final model shows potential as a clinical diagnostic aid.
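To make the classification and evaluation steps more concrete, here is a minimal Python sketch of one of the simpler models mentioned above, a Nearest Centroid classifier, scored with accuracy and F1. It is an illustration under stated assumptions, not the project's code: the two features (an asymmetry score and a color variability score), their values, and the `benign`/`malignant` labels are made-up toy data, not taken from PAD-UFES-20.

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) for each class."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[label][i] += v
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict_label(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training data: [asymmetry, color_variability] (hypothetical values).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.8, 0.7], [0.9, 0.9], [0.7, 0.8]]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

# Toy held-out data, including one borderline benign case.
test_x = [[0.2, 0.2], [0.75, 0.85], [0.6, 0.4], [0.5, 0.55]]
test_y = ["benign", "malignant", "malignant", "benign"]

centroids = fit_centroids(train_x, train_y)
pred_y = [predict_label(centroids, x) for x in test_x]

acc = accuracy(test_y, pred_y)
f1 = f1_score(test_y, pred_y, positive="malignant")
print("predictions:", pred_y)
print("accuracy:", acc, "F1 (malignant):", round(f1, 3))
```

Reporting F1 alongside accuracy matters for a task like this because a false positive on the borderline case lowers precision for the malignant class even when every malignant case is caught, which accuracy alone would not reveal.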
Data Science in Research, Business and Society
Public Transport & CO2 Emissions
2022
This project explored how public transport systems in Denmark, Estonia, Romania, and Germany can be improved to boost usage and reduce CO2 emissions. Using interviews and online reviews, my classmates and I identified common issues such as delays, outdated infrastructure, and limited networks, along with positives like affordable fares. We proposed country-specific improvements, focusing on the need for electric transportation and better infrastructure to lower emissions and make public transport more appealing. This research demonstrates how effective public transport systems can help combat global warming by reducing CO2 output.
- Data Modeling and Statistical Analysis
- Regression Analysis and ANOVA
- Outlier Management
- Data Visualization with ggplot2 in R
- Data Preprocessing & Image Segmentation
- Machine Learning Modeling (KNN, Logistic Regression, Nearest Centroid, Random Forest)
- Feature Extraction (Asymmetry & Color)
- Model Evaluation & Hyperparameter Tuning (Cross-validation, Accuracy, F1-score, and AUC-ROC)
- Critical Analysis & Research Writing
- Interviewing and Data Collection
- Qualitative Data Analysis
- Critical Thinking
- Research Writing