Master Pandas value_counts(): Count Unique Values in Python

Want to analyze how often each value occurs in your Pandas data? The value_counts() method is built for exactly that. This article shows how to use it, with practical examples: counting unique values, normalizing results, and handling NaN values.

What is Pandas `value_counts()`?

Pandas value_counts() is a powerful tool for determining the frequency of unique values within a Pandas Series or Index. It's like a built-in histogram, giving you a quick overview of your data's distribution.

Counts the occurrences of each unique value.
Returns a Series with unique values as the index and their frequencies as values.
Excludes NA values by default (dropna=True), so missing entries are not counted unless you ask for them.
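To make the three points above concrete, here is a minimal sketch that calls value_counts() on a DataFrame column, which is a Series. The DataFrame and its 'city' column are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data: one missing entry (None) among six rows
df = pd.DataFrame({'city': ['Paris', 'Lima', 'Paris', None, 'Oslo', 'Paris']})

# Selecting a column gives a Series; value_counts() returns a new Series
# whose index holds the unique values and whose values hold the counts
counts = df['city'].value_counts()

print(type(counts))
print(counts)

# The missing entry is excluded by default, so the counts add up to 5, not 6
print(counts.sum())
```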

Syntax of `value_counts()`

The basic syntax is straightforward:

Index.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Let's break down the parameters:

normalize: If True, returns relative frequencies (proportions) instead of raw counts.
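sort: If True (the default), sorts the results by frequency.
ascending: If True, sorts from least to most frequent. The default, False, puts the most frequent values first.
bins: Instead of counting individual values, groups them into half-open bins (a convenience for pd.cut). Works only with numeric data.
dropna: If True (the default), excludes NaN values from the counts.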
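The sort and ascending parameters are easiest to understand side by side. The sketch below reuses the student names from Example 1; the variable names are only illustrative.

```python
import pandas as pd

idx = pd.Index(['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther'], name='Student')

# ascending=True lists the least frequent names first
ascending_counts = idx.value_counts(ascending=True)
print(ascending_counts)

# sort=False skips the sort by frequency; the resulting order
# can differ between pandas versions, so do not rely on it
unsorted_counts = idx.value_counts(sort=False)
print(unsorted_counts)
```

The counts themselves are identical in both results; only the row order changes.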

Example 1: Counting Student Names

Let's start with a simple example using student names.

import pandas as pd

# Create an index of student names
idx = pd.Index(['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther'], name='Student')
print(idx)

Now, let's use value_counts() to see how many times each name appears.

# Count unique values in the index
counts = idx.value_counts()
print(counts)

Harry and Arther both appear twice.
Nick and Mike each appear once.
The result is sorted in descending order of frequency.

Example 2: Counting Numerical Data

value_counts() also works with numerical data. Consider an index of integer values:

import pandas as pd

# Create an index of numbers
idx = pd.Index([21, 10, 30, 40, 50, 10, 50])
print(idx)

Now, let's see the distribution of these numbers.

# Count unique values
counts = idx.value_counts()
print(counts)

10 and 50 each appear twice.
21, 30, and 40 each appear once.

Beyond the Basics

value_counts() offers more than just basic counting. Here's how to leverage its advanced features:

Normalization: Get proportions instead of raw counts with normalize=True. Useful for comparing distributions across different datasets.
Handling NaN Values: Control whether to include NaN values with dropna=False. Important for understanding missing data patterns.
Binning: Group numerical data into bins using the bins parameter. Simplifies the analysis of continuous data.
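The three options above can be sketched as follows. The numbers come from Example 2; the grades Series with a missing value is hypothetical and is there only to show dropna=False.

```python
import numpy as np
import pandas as pd

numbers = pd.Index([21, 10, 30, 40, 50, 10, 50])

# Normalization: proportions instead of raw counts (they sum to 1)
proportions = numbers.value_counts(normalize=True)
print(proportions)  # 10 and 50 each account for 2/7 of the values

# Handling NaN: hypothetical grades with one missing entry
grades = pd.Series(['A', 'B', np.nan, 'A'])
with_nan = grades.value_counts(dropna=False)
print(with_nan)  # NaN appears as its own row with a count of 1

# Binning: split the numeric range into 3 equal-width intervals
binned = numbers.value_counts(bins=3)
print(binned)  # the index holds intervals rather than single values
```

With bins=3 the range 10 to 50 is cut into three intervals, so 10, 10, and 21 fall into the lowest bin, 30 into the middle one, and 40, 50, and 50 into the highest.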

Why is `value_counts()` Important?

Understanding data distributions is crucial for:

Data Cleaning: Spotting rare, unexpected, or misspelled values that may be outliers or anomalies.
Feature Engineering: Creating new features based on value frequencies.
Statistical Analysis: Gaining insights into data patterns and trends.
Decision Making: Making informed choices based on data frequency.
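As one possible illustration of the data cleaning and feature engineering points, the sketch below attaches each value's frequency to its row and lists values that occur only once. The DataFrame, the 'student_count' column, and the threshold of 2 are assumptions made for this example, not a prescribed workflow.

```python
import pandas as pd

# Hypothetical DataFrame built from the names used in Example 1
df = pd.DataFrame({'student': ['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther']})

# Frequency of each name
freq = df['student'].value_counts()

# Feature engineering: map each row's name to how often that name occurs
df['student_count'] = df['student'].map(freq)
print(df)

# Data cleaning: names seen fewer than 2 times may deserve a closer look
rare = freq[freq < 2].index.tolist()
print(rare)
```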