Imbalanced Data

Datasets where some categories or outcomes are much more common than others. Think of fraud detection where 99.9% of transactions are legitimate and only 0.1% are fraudulent. This imbalance can make models biased toward predicting the common case, so data scientists use special techniques like oversampling, undersampling, or weighted algorithms to address the imbalance.

← Back to Glossary