Can only compare identically-labeled Series objects

I’ve been working with pandas to process a dataset and run some operations and analysis on it.

For one task, I had to compare two datasets, specifically some particular fields.

While comparing two fields in my workflow, I ran into the following error:

ValueError: Can only compare identically-labeled DataFrame objects

For example, my comparison code looked like this:

import pandas as pd
dataframe01 = pd.DataFrame(....)
dataframe02 = pd.DataFrame(....)

# Trying to compare and print
print(dataframe01 == dataframe02)

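Running this code raised the error above.

From the error message, all I could understand was that the labels didn’t match. The == operator compares two DataFrames element by element, so pandas requires both to have exactly the same row index and column labels, in the same order.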
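To see that == is strict about label order as well as label names, here is a small sketch with hypothetical data (df_a, df_b and the score column are made up for illustration). Both DataFrames hold the same labels, just listed in a different order, so == raises; sorting both indexes first makes them comparable.

```python
import pandas as pd

# Hypothetical data: the same row labels, listed in a different order
df_a = pd.DataFrame({"score": [1, 2, 3]}, index=["x", "y", "z"])
df_b = pd.DataFrame({"score": [3, 1, 2]}, index=["z", "x", "y"])

try:
    print(df_a == df_b)
except ValueError as err:
    # == needs identical index and column labels in the same order
    print(f"ValueError: {err}")

# After sort_index(), both DataFrames list their labels in the same order,
# so the element-wise comparison runs and every cell is True
print(df_a.sort_index() == df_b.sort_index())
```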

Solution

After going through the docs and some online resources, I tried the following approach to compare the data:

print(dataframe01.equals(dataframe02))

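This checks whether the two DataFrames match exactly and returns a single True or False instead of raising an error. Note that equals() still takes the row and column labels into account, so DataFrames with different labels compare as False even when their values are identical.

If you only care about the values, there are options that ignore the index labels, such as resetting both indexes with reset_index(drop=True) before comparing. Use whichever fits your needs.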
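As a sketch of two such options, assuming the rows of both DataFrames line up by position (the column name and data below are hypothetical): you can give the second DataFrame the first one’s row labels with set_axis(), or compare a single field as plain NumPy arrays via to_numpy(), which carry no labels at all.

```python
import pandas as pd

# Hypothetical data: different row labels, one differing value
dataframe01 = pd.DataFrame({"amount": [10, 20, 30]}, index=["a", "b", "c"])
dataframe02 = pd.DataFrame({"amount": [10, 20, 35]}, index=["p", "q", "r"])

# Option 1: relabel dataframe02's rows with dataframe01's index, then use ==
relabelled = dataframe02.set_axis(dataframe01.index)
print(dataframe01 == relabelled)

# Option 2: compare one field as NumPy arrays, which ignore labels entirely
same_amount = dataframe01["amount"].to_numpy() == dataframe02["amount"].to_numpy()
print(same_amount)
```

Option 1 keeps the result as a labelled DataFrame of booleans; option 2 is handy when you only need to check particular fields.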

Thanks for reading!

If you try to compare DataFrames with different indexes using the equality comparison operator ==, pandas raises ValueError: Can only compare identically-labeled DataFrame objects. You can avoid this error by using equals() instead of ==.

For example, df1.equals(df2) returns a single boolean instead of raising. Note that equals() does not ignore the indexes: it returns False whenever the labels differ, even if the values are identical.

To compare only the values, use reset_index to reset both indexes to the default 0, 1, 2, ... before comparing. For example, df1.reset_index(drop=True).equals(df2.reset_index(drop=True)).

This tutorial will go through the error in detail and how to solve it with code examples.


ValueError: Can only compare identically-labeled DataFrame objects

In Python, a value is a piece of information stored within a particular object. We encounter a ValueError in Python when a built-in operation or function receives an argument that has the right type but an inappropriate value. The data we want to compare is the correct type, DataFrame, but the DataFrames have indexes that do not match, so pandas cannot compare them element by element.

Example

Let’s look at an example of two DataFrames that we want to compare. Each DataFrame contains the bodyweight and maximum bench press, in kilograms, for six lifters. The indexes of the two DataFrames are different.

import pandas as pd

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(df1)

print(df2)

Let’s run this part of the program to see the DataFrames:

          Bodyweight (kg)  Bench press (kg)
lifter_1               76               135
lifter_2               84               150
lifter_3               93               170
lifter_4              106               140
lifter_5              120               180
lifter_6               56               155
          Bodyweight (kg)  Bench press (kg)
lifter_A               76               145
lifter_B               84               120
lifter_C               93               180
lifter_D              106               220
lifter_E              120               175
lifter_F               56               110

Let’s compare the DataFrames using the equality operator:

print(df1 == df2)

Let’s run the code to see the result:

ValueError: Can only compare identically-labeled DataFrame objects

The ValueError occurs because the first DataFrame has the index labels ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'] and the second DataFrame has the index labels ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F']. The == operator only works when both DataFrames share the same row and column labels.
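Before comparing, you can check which labels are responsible. The helper below is a hypothetical sketch (label_mismatch is not part of pandas, and the three-lifter data is trimmed from the example above). It uses Index.equals to test the row index and the column labels separately, since those are the two things == requires to be identical.

```python
import pandas as pd


def label_mismatch(left: pd.DataFrame, right: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list the axes whose labels differ, i.e. why == would raise."""
    mismatched = []
    if not left.index.equals(right.index):
        mismatched.append("index")
    if not left.columns.equals(right.columns):
        mismatched.append("columns")
    return mismatched


df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93],
                    'Bench press (kg)': [135, 150, 170]},
                   index=['lifter_1', 'lifter_2', 'lifter_3'])
df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93],
                    'Bench press (kg)': [145, 120, 180]},
                   index=['lifter_A', 'lifter_B', 'lifter_C'])

print(label_mismatch(df1, df2))  # ['index']
```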

Solution #1: Use DataFrame.equals

To avoid this error, we can use the DataFrame.equals method. The equals method compares two Series or DataFrames to see if they have the same shape and elements, and it returns a single boolean instead of raising a ValueError. Let’s look at the revised code:

print(df1.equals(df2))

Let’s run the code to see the result:

False
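The False above has two causes: the bench press values differ, and equals() also compares the index labels. The sketch below uses hypothetical three-lifter DataFrames with identical values but different labels, showing that equals() still returns False while the raw values, compared without labels, all match.

```python
import pandas as pd

values = {'Bodyweight (kg)': [76, 84, 93], 'Bench press (kg)': [145, 120, 180]}
df_num = pd.DataFrame(values, index=['lifter_1', 'lifter_2', 'lifter_3'])
df_abc = pd.DataFrame(values, index=['lifter_A', 'lifter_B', 'lifter_C'])

# Same values, different index labels: equals() reports False
print(df_num.equals(df_abc))  # False

# Stripping the labels shows that the values themselves match
print((df_num.to_numpy() == df_abc.to_numpy()).all())  # True
```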

Solution #2: Use DataFrame.equals with DataFrame.reset_index()

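We can replace the indexes of both DataFrames with the default integer index using the reset_index() method, and then compare the DataFrames. Setting the parameter drop=True discards the old index instead of inserting it as a new column. In this example, df1 uses the same bench press values as df2, so the two DataFrames differ only in their index labels. Let’s look at the revised code: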

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print(df1)
print(df2)

Let’s look at the DataFrames with their indexes dropped:

   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110

There are two ways we can compare the DataFrames:

  • The whole DataFrame
  • Row-by-row comparison

Entire DataFrame Comparison

We can use the equals() method to see if all elements are the same in both DataFrame objects. Let’s look at the code:

print(df1.equals(df2))

Let’s run the code to see the result:

True
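equals() is also stricter than == about data types: the corresponding columns must have the same dtype, so integer and float versions of the same numbers are not considered equal. A small sketch with hypothetical data:

```python
import pandas as pd

df_int = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93]})          # int64 column
df_float = pd.DataFrame({'Bodyweight (kg)': [76.0, 84.0, 93.0]})  # float64 column

print(df_int.equals(df_float))           # False: the dtypes differ
print((df_int == df_float).all().all())  # True: the values compare equal
```

If a False from equals() is surprising, checking df1.dtypes against df2.dtypes is a quick next step.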

Row-by-Row DataFrame Comparison

Once the DataFrames’ indexes are reset, we can use the equality operator to compare them row by row. Let’s look at the code:

print(df1 == df2)

Let’s run the code to see the result:

   Bodyweight (kg)  Bench press (kg)
0             True              True
1             True              True
2             True              True
3             True              True
4             True              True
5             True              True

Note that the comparison is done element by element: each value in df1 is compared with the value at the same row and column label in df2, giving one boolean per cell.
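To get a genuine one-answer-per-row result, you can collapse the element-wise output with all(axis=1), which is True only for rows where every column matches. The sketch below uses hypothetical data in which only the second lifter’s bench press differs.

```python
import pandas as pd

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93],
                    'Bench press (kg)': [135, 150, 170]})
df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93],
                    'Bench press (kg)': [135, 120, 170]})

# One boolean per row: True only if every column in that row matches
rows_equal = (df1 == df2).all(axis=1)
print(rows_equal)

# Boolean indexing with ~ selects the rows that differ
print(df1[~rows_equal])
```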

Solution #3: Use numpy.array_equal

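We can also use numpy.array_equal to check if two arrays have the same shape and elements. We can extract the underlying arrays from the DataFrames using .values (pandas also provides .to_numpy() for this). Because NumPy arrays carry no labels, this comparison ignores the indexes entirely. Let’s look at the revised code: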

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(np.array_equal(df1.values, df2.values))

Let’s run the code to see the result:

False
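One caveat with array_equal: because .values drops all labels, columns are matched by position rather than by name. In the hypothetical sketch below, both DataFrames hold the same data but list their columns in opposite orders, so the raw comparison returns False until df2’s columns are reordered to match df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84], 'Bench press (kg)': [135, 150]},
                   index=['lifter_1', 'lifter_2'])
# Hypothetical: same data and different row labels, columns in the opposite order
df2 = pd.DataFrame({'Bench press (kg)': [135, 150], 'Bodyweight (kg)': [76, 84]},
                   index=['lifter_A', 'lifter_B'])

# Arrays are compared position by position, so swapped columns do not match
print(np.array_equal(df1.values, df2.values))  # False

# Selecting df2's columns in df1's order lines them up by name first
print(np.array_equal(df1.values, df2[df1.columns].values))  # True
```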

We can use array_equal to compare individual columns. Let’s look at the revised code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

# Get individual columns of DataFrames using iloc
df1_bodyweight = df1.iloc[:,0]
df1_bench = df1.iloc[:,1]

df2_bodyweight = df2.iloc[:,0]
df2_bench = df2.iloc[:,1]

# Compare bodyweight and bench columns separately 

print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values))
print(np.array_equal(df1_bench.values, df2_bench.values))

Let’s run the code to see the result:

True
False

The above result tells us that the first column contains the same elements in both DataFrames, while the second column contains different elements.
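Rather than pulling out each column with iloc, you can loop over the column names and compare every pair in one pass. A minimal sketch, assuming both DataFrames share the same column names (column_matches is just an illustrative variable name):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
                   index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])
df2 = pd.DataFrame({'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
                    'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
                   index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

# Compare each column by name; .values drops the mismatched index labels
column_matches = {col: np.array_equal(df1[col].values, df2[col].values)
                  for col in df1.columns}
print(column_matches)  # {'Bodyweight (kg)': True, 'Bench press (kg)': False}
```

Selecting columns by name rather than by position also protects against the two DataFrames storing their columns in a different order.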

Summary

Congratulations on reading to the end of this tutorial! The ValueError: Can only compare identically-labeled DataFrame objects occurs when you use == to compare two DataFrames with different indexes. You can use equals(), which returns a single boolean instead of raising, but note that it still returns False if the labels differ. To compare only the values, reset the indexes with reset_index(drop=True) first. You can also use the NumPy function array_equal to compare the two DataFrames or their individual columns.

For further reading on errors involving Pandas, go to the articles:

  • How to Solve Pandas TypeError: empty ‘dataframe’ no numeric data to plot
  • How to Solve Python ValueError: You are trying to merge on object and int64 columns

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching

Suf

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!

Can only compare identically-labeled Series objects

ValueError: Can only compare identically-labeled Series objects is the Series counterpart of the DataFrame error. It is raised when you use == to compare two Series (pandas’ 1-D data structure) whose index labels differ. Comparing two DataFrames (pandas’ 2-D data structure) with different labels raises the DataFrame version of the error instead.
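A minimal sketch of the Series version, using hypothetical bodyweight data: == raises because the index labels differ, and resetting both indexes lets the element-wise comparison run.

```python
import pandas as pd

s1 = pd.Series([76, 84, 93], index=['lifter_1', 'lifter_2', 'lifter_3'])
s2 = pd.Series([76, 84, 93], index=['lifter_A', 'lifter_B', 'lifter_C'])

try:
    print(s1 == s2)
except ValueError as err:
    print(f"ValueError: {err}")  # Can only compare identically-labeled Series objects

# After reset_index(drop=True) both Series share the index 0, 1, 2
print(s1.reset_index(drop=True) == s2.reset_index(drop=True))
```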

How do I compare objects in Pandas?

The pandas equals() function tests whether two objects contain the same elements. It allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The row and column labels must also be equal, although their types may differ.
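The point about NaNs is where equals() and == differ most visibly: element-wise, NaN is never equal to NaN. A quick sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0])
s2 = pd.Series([1.0, np.nan, 3.0])

print(s1.equals(s2))        # True: NaNs in the same position count as equal
print((s1 == s2).tolist())  # [True, False, True]: NaN == NaN is False
```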

How do you compare two Pandas series?

Use == to compare two identically-labeled Series: it returns a Series where each element is True if the corresponding elements are equal and False otherwise. In the same way, == on two identically-labeled DataFrames returns a DataFrame of booleans, one per element. If the labels differ, == raises the ValueError described above, so reset the indexes first if you only care about the values, or use equals() for a single True/False answer.