From Hard to Soft Clustering in Machine Learning: Evaluating Deep Learning Feature Extractors for Ambiguous Multi-Label Classification

Authors

Rashanjot Kaur and Eugene Pinsky

Machine Learning Research

Evaluates the deep learning feature extractors ConvNeXt, ViT, and DINOv2 as representations for soft clustering in ambiguous multi-label classification. Tests 30 configurations across 190 artists and 7 art movements. Soft clustering methods (FCM, GMM) surface 10–19% multi-movement assignments that hard clustering suppresses.

Deep Learning · Clustering · Multi-Label Classification · ConvNeXt · ViT · DINOv2 · FCM · GMM · K-Means++

The crisis

Hard classification forces binary decisions on inherently ambiguous data, a mismatch that propagates error through downstream systems
In cultural heritage, medical imaging, and multi-label content moderation, hard labels erase valid ambiguity
Machine learning pipelines built on hard clustering pass that loss of ambiguity downstream silently: systems appear confident where uncertainty is real
Soft clustering recovers the ambiguity signal: 10–19% of artworks belong meaningfully to multiple movements, and hard clustering marks all of them as errors

About this research

Most classification pipelines assign a single hard label to each input, even when genuine ambiguity exists. For multi-label domains (art history, medical imaging, content classification), this is a category error. This paper evaluates deep learning feature extractors as representations for soft clustering, asking whether richer feature spaces surface ambiguity that simpler representations and hard clustering suppress. The evaluation covers 30 configurations across three feature extractors (ConvNeXt, ViT, DINOv2) and three clustering methods (FCM, GMM, K-Means++) on a corpus of 190 artists and 7 art movements. Soft clustering methods recover 10–19% multi-movement assignments: cases where an artist's style spans multiple movements and a single hard label would be wrong.
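The difference between a hard label and a multi-movement assignment can be sketched in a few lines. This is an illustration only: the membership rows are hypothetical, the function names are invented for this sketch, and the 0.3 threshold is an assumption, since the summary does not state the criterion used to count an assignment as multi-movement.

```python
def hard_labels(memberships):
    """Index of the largest membership in each row (the single hard label)."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def soft_labels(memberships, threshold=0.3):
    """All cluster indices whose membership meets the threshold, per row."""
    return [[j for j, v in enumerate(row) if v >= threshold] for row in memberships]


def multi_assignment_rate(memberships, threshold=0.3):
    """Fraction of rows assigned to more than one cluster."""
    labels = soft_labels(memberships, threshold)
    return sum(len(ls) > 1 for ls in labels) / len(labels)


# Hypothetical membership rows (each sums to 1); not data from the paper.
toy_memberships = [
    [0.90, 0.05, 0.05],  # clearly one cluster
    [0.50, 0.45, 0.05],  # split between two clusters
    [0.10, 0.10, 0.80],  # clearly one cluster
    [0.05, 0.55, 0.40],  # split between two clusters
]

print(hard_labels(toy_memberships))
print(soft_labels(toy_memberships))
print(multi_assignment_rate(toy_memberships))
```

The hard labels keep only the top cluster for the two split rows, while the thresholded view keeps both. With many clusters a fixed threshold can leave a row with no label at all, so in practice the cutoff would need to be chosen with the number of clusters in mind.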

Research question

Do deep learning feature extractors (ConvNeXt, ViT, DINOv2) surface recoverable ambiguity in multi-label classification that hard clustering suppresses, and does feature extractor choice determine the degree of ambiguity recovery?

Methodology

30-configuration evaluation grid across 3 feature extractors × 3 clustering methods; corpus of 190 artists across 7 art movements (10–19% ground-truth multi-movement); feature extraction via ConvNeXt, ViT, and DINOv2 pretrained models; soft clustering via FCM and GMM; hard baseline via K-Means++; evaluation metrics for ambiguity recovery rate, cluster separation, and multi-label assignment accuracy; ablation by extractor and clustering method.
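As a rough illustration of how FCM produces graded memberships rather than a single label, the standard fuzzy c-means updates can be sketched in pure Python. Everything specific here is an assumption and not the paper's configuration: the fuzzifier m = 2, the toy 2-D points standing in for extracted feature vectors, the 0.3 threshold, and the function name `fuzzy_c_means`.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length float tuples."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: split membership among those centres.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([z / sum(zeros) for z in zeros])
                continue
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy stand-in for feature vectors: two tight groups plus one point between them.
toy_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (4.0, 4.0), (4.1, 4.0), (4.0, 4.1),
    (2.0, 2.0),
]
centers, memberships = fuzzy_c_means(toy_points, n_clusters=2)

# Count points whose membership meets the (assumed) threshold in more than one cluster.
n_ambiguous = sum(sum(v >= 0.3 for v in row) > 1 for row in memberships)
print(n_ambiguous)
```

A hard method such as K-Means++ would place the in-between point in one of the two groups. The FCM memberships keep it split between both, which is the kind of signal the ambiguity recovery metric is meant to capture.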

Key findings

(Under Review)

References

Zadeh (1965) Fuzzy sets, Information and Control
Caron et al. (2021) Emerging properties in self-supervised vision transformers (DINO), ICCV
Liu et al. (2022) A ConvNet for the 2020s (ConvNeXt), CVPR
Dosovitskiy et al. (2020) An image is worth 16×16 words (ViT), ICLR

UNDER REVIEW

Submitted to: MAKE · MDPI · Q1 · IF 6.0 · WoS & Scopus

Suggested citation

Kaur, R. & Pinsky, E. (2026)

Roles & contributors

Team

Lead Researcher / First Author

Filled

Rashanjot Kaur

Designed evaluation pipeline, configuration grid, feature extraction experiments, soft clustering analysis, and paper. First author.

Skills: Deep Learning, Clustering, ML Evaluation, Feature Extraction, Research Design

Faculty Advisor / Co-author

Filled

Prof. Eugene Pinsky

Academic advisor and co-author. Supervised research methodology and paper submission.

Skills: Machine Learning, Research Methodology, Academic Mentorship

Faculty advisor

Prof. Eugene Pinsky
