Can only compare identically labeled series objects

I’ve been working with pandas to handle a data set and run some operations and analysis on it.

For one task, I had to compare two datasets, specifically particular fields in each.

I ran into the following error while comparing two fields in my flow.

Can only compare identically-labeled DataFrame objects

For example, my comparison code looked like this:

    import pandas as pd

    dataframe01 = pd.DataFrame(....)
    dataframe02 = pd.DataFrame(....)

    # Trying to compare and print
    print(dataframe01 == dataframe02)

This approach raised the error above.

From the error message, all I could understand was that the labels of the two data frames did not match.
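The original data frames are not shown, so here is a minimal sketch that reproduces the error with made-up data. The column name, values, and index labels are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: identical values, different row labels.
dataframe01 = pd.DataFrame({"score": [1, 2, 3]}, index=["a", "b", "c"])
dataframe02 = pd.DataFrame({"score": [1, 2, 3]}, index=["x", "y", "z"])

try:
    print(dataframe01 == dataframe02)
    raised = False
except ValueError as err:
    # The values match, but == refuses to compare differently labeled frames.
    raised = True
    print(f"ValueError: {err}")
```

The exact wording of the message can vary between pandas versions, which is why the sketch prints whatever message is raised rather than hard-coding it.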

Solution

After referring to some docs and online content, I tried the following approach to compare the data.

print(dataframe01.equals(dataframe02))

This checks whether the two data frames match exactly, including their index labels. It returns True or False instead of raising the error.

There are other options that ignore index labels as well; you can use them based on your needs.
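One such option, sketched here with made-up data, is to drop the index labels with reset_index(drop=True) before calling equals. The variable names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: identical values, different row labels.
left = pd.DataFrame({"score": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"score": [1, 2, 3]}, index=["x", "y", "z"])

# equals() compares the index labels as well as the values.
strict = left.equals(right)

# Resetting both indexes to 0, 1, 2, ... leaves only the values to compare.
ignoring_index = left.reset_index(drop=True).equals(right.reset_index(drop=True))

print(strict, ignoring_index)
```

This only makes sense when the rows of both data frames are already in the same order, because the comparison becomes purely positional.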

Thanks for reading!


If you try to compare DataFrames with different indexes using the equality comparison operator ==, you will raise the ValueError: Can only compare identically-labeled DataFrame objects. You can avoid this error by using equals instead of ==.

For example, df1.equals(df2) does not raise. Note that equals compares the labels as well as the values, so it returns False when the indexes differ.

Alternatively, to compare the values only, you can use reset_index to reset the indexes back to the default 0, 1, 2, ... For example, df1.reset_index(drop=True).equals(df2.reset_index(drop=True)).

This tutorial will go through the error in detail and how to solve it with code examples.

ValueError: Can only compare identically-labeled DataFrame objects

In Python, a value is a piece of information stored within a particular object. We will encounter a ValueError in Python when using a built-in operation or function that receives an argument that is the right type but an inappropriate value. The data we want to compare is the correct type, DataFrame, but the DataFrames have mismatched indexes, which makes them inappropriate for comparison with ==.
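The distinction between a wrong value and a wrong type can be seen with the built-in int(). This is a small sketch; classify is a hypothetical helper written for this illustration, not part of Python or pandas.

```python
def classify(value):
    """Return the name of the exception int() raises for value, or None."""
    try:
        int(value)
    except ValueError:
        return "ValueError"
    except TypeError:
        return "TypeError"
    return None


print(classify("42"))     # a string int() can parse
print(classify("forty"))  # right type (str), inappropriate value
print(classify([4, 2]))   # wrong type (list)
```

The DataFrame case is analogous to the second call: the type is accepted, but the particular value (here, the labels) is not.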

Example

Let’s look at an example of two DataFrames that we want to compare. Each DataFrame contains the bodyweight and maximum bench presses in kilograms for six lifters. The indexes for the two DataFrames are different.

    import pandas as pd

    df1 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
        index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

    df2 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
        index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

    print(df1)
    print(df2)

Let’s run this part of the program to see the DataFrames:

              Bodyweight (kg)  Bench press (kg)
    lifter_1               76               135
    lifter_2               84               150
    lifter_3               93               170
    lifter_4              106               140
    lifter_5              120               180
    lifter_6               56               155
              Bodyweight (kg)  Bench press (kg)
    lifter_A               76               145
    lifter_B               84               120
    lifter_C               93               180
    lifter_D              106               220
    lifter_E              120               175
    lifter_F               56               110

Let’s compare the DataFrames using the equality operator:

print(df1 == df2)

Let’s run the code to see the result:

ValueError: Can only compare identically-labeled DataFrame objects

The ValueError occurs because the first DataFrame has indexes: ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'] and the second DataFrame has indexes: ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'].
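Before comparing, it can help to check which axis is mismatched. The sketch below uses Index.equals on a shortened, two-row version of the lifter data; the shortened data is an assumption made to keep the example small.

```python
import pandas as pd

# Shortened version of the lifter data, for illustration only.
df1 = pd.DataFrame({"Bench press (kg)": [135, 150]}, index=["lifter_1", "lifter_2"])
df2 = pd.DataFrame({"Bench press (kg)": [145, 120]}, index=["lifter_A", "lifter_B"])

same_rows = df1.index.equals(df2.index)        # row labels differ
same_columns = df1.columns.equals(df2.columns)  # column labels match

print(same_rows, same_columns)
```

Here only the row labels differ, so resetting or replacing the index is enough to make the == comparison possible.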

Solution #1: Use DataFrame.equals

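To avoid this error, we can use the DataFrame.equals function. The equals function allows us to compare two Series or DataFrames to see if they have the same shape and elements. It also compares the labels, so for DataFrames with different indexes it returns False rather than raising a ValueError. Let’s look at the revised code: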

print(df1.equals(df2))

Let’s run the code to see the result:

False

Solution #2: Use DataFrame.equals with DataFrame.reset_index()

We can drop the indexes of the DataFrames using the reset_index() method, then we can compare the DataFrames. To drop the indexes, we need to set the parameter drop = True. Let’s look at the revised code:

    df1 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
        index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

    df2 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
        index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

    # Replace the custom labels with the default 0, 1, 2, ... index
    df1 = df1.reset_index(drop=True)
    df2 = df2.reset_index(drop=True)

    print(df1)
    print(df2)

Let’s look at the DataFrames with their indexes dropped:

       Bodyweight (kg)  Bench press (kg)
    0               76               145
    1               84               120
    2               93               180
    3              106               220
    4              120               175
    5               56               110
       Bodyweight (kg)  Bench press (kg)
    0               76               145
    1               84               120
    2               93               180
    3              106               220
    4              120               175
    5               56               110

There are two ways we can compare the DataFrames:

  • The whole DataFrame
  • Row-by-row comparison

Entire DataFrame Comparison

We can use the equals() method to see if all elements are the same in both DataFrame objects. Let’s look at the code:

print(df1.equals(df2))

Let’s run the code to see the result:

True

Row-by-Row DataFrame Comparison

We can check which values in each row are equal using the equality operator once the DataFrames’ indexes are reset. Let’s look at the code:

print(df1 == df2)

Let’s run the code to see the result:

       Bodyweight (kg)  Bench press (kg)
    0             True              True
    1             True              True
    2             True              True
    3             True              True
    4             True              True
    5             True              True

Note that the comparison is element-wise: each value is compared with the value at the same row and column in the other DataFrame.
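The element-wise result can be reduced to a per-column summary, or used to pick out the rows that differ. The sketch below uses a shortened three-row version of the data with one deliberately changed value, which is an assumption for illustration.

```python
import pandas as pd

# Shortened data with default indexes; one bench press value differs.
df1 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93],
                    "Bench press (kg)": [135, 150, 170]})
df2 = pd.DataFrame({"Bodyweight (kg)": [76, 84, 93],
                    "Bench press (kg)": [135, 120, 170]})

matches = df1 == df2                         # element-wise boolean DataFrame
per_column = matches.all()                   # True where a whole column matches
mismatched_rows = df1[~matches.all(axis=1)]  # rows with at least one difference

print(per_column)
print(mismatched_rows)
```

Calling all() once reduces over rows and gives one boolean per column; all(axis=1) reduces over columns and gives one boolean per row.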

Solution #3: Use numpy.array_equal

We can also use numpy.array_equal to check if two arrays have the same shape and elements. We can extract arrays from the DataFrame using .values. Let’s look at the revised code:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
        index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

    df2 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
        index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

    print(np.array_equal(df1.values, df2.values))

Let’s run the code to see the result:

False

We can use array_equal to compare individual columns. Let’s look at the revised code:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [135, 150, 170, 140, 180, 155]},
        index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

    df2 = pd.DataFrame(
        {'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
         'Bench press (kg)': [145, 120, 180, 220, 175, 110]},
        index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

    # Get individual columns of DataFrames using iloc
    df1_bodyweight = df1.iloc[:, 0]
    df1_bench = df1.iloc[:, 1]
    df2_bodyweight = df2.iloc[:, 0]
    df2_bench = df2.iloc[:, 1]

    # Compare bodyweight and bench columns separately
    print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values))
    print(np.array_equal(df1_bench.values, df2_bench.values))

Let’s run the code to see the result:

    True
    False

The above result tells us that the first column contains the same elements in both DataFrames, while the second column contains different elements.
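With more than a couple of columns, the same check can be written as a loop over column names instead of one variable per column. This is a sketch that assumes both DataFrames share the same column names; column_matches is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"Bodyweight (kg)": [76, 84, 93, 106, 120, 56],
     "Bench press (kg)": [135, 150, 170, 140, 180, 155]},
    index=["lifter_1", "lifter_2", "lifter_3", "lifter_4", "lifter_5", "lifter_6"])

df2 = pd.DataFrame(
    {"Bodyweight (kg)": [76, 84, 93, 106, 120, 56],
     "Bench press (kg)": [145, 120, 180, 220, 175, 110]},
    index=["lifter_A", "lifter_B", "lifter_C", "lifter_D", "lifter_E", "lifter_F"])

# Compare each column's underlying array; index labels play no part.
column_matches = {
    col: np.array_equal(df1[col].to_numpy(), df2[col].to_numpy())
    for col in df1.columns
}

print(column_matches)
```

The sketch uses to_numpy() to extract the arrays; .values, as used above, works the same way for this data.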

Summary

Congratulations on reading to the end of this tutorial! The ValueError: Can only compare identically-labeled DataFrame objects occurs when trying to compare two DataFrames with different indexes using ==. You can reset the indexes using reset_index() and then compare, or use the equals() function, which returns False for mismatched labels instead of raising an error. You can also use the NumPy function array_equal to compare the values of the two DataFrames or of their individual columns.

For further reading on errors involving Pandas, go to the articles:

  • How to Solve Pandas TypeError: empty ‘dataframe’ no numeric data to plot
  • How to Solve Python ValueError: You are trying to merge on object and int64 columns

For further reading on Pandas, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.

Have fun and happy researching

Suf

Suf is a research scientist at Moogsoft, specializing in Natural Language Processing and Complex Networks. Previously he was a Postdoctoral Research Fellow in Data Science working on adaptations of cutting-edge physics analysis techniques to data-intensive problems in industry. In another life, he was an experimental particle physicist working on the ATLAS Experiment of the Large Hadron Collider. His passion is to share his experience as an academic moving into industry while continuing to pursue research. Find out more about the creator of the Research Scientist Pod here and sign up to the mailing list here!

How do you fix can only compare identically

If you try to compare DataFrames with different indexes using the equality comparison operator ==, you will raise the ValueError: Can only compare identically-labeled DataFrame objects. You can avoid this error by using equals instead of ==. For example, df1.equals(df2) returns False when the indexes differ instead of raising an error.

Can only compare identically

Can only compare identically-labeled Series objects is a ValueError. It occurs when we compare two pandas objects, such as two DataFrames (the pandas 2-D data structure), that have different labels or indexes.

How do I compare objects in Pandas?

The pandas equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
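The NaN behaviour is the main practical difference from the == operator. The sketch below uses two small made-up frames to show it; the names a and b are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Two frames with the same labels and a NaN in the same position.
a = pd.DataFrame({"x": [1.0, np.nan]})
b = pd.DataFrame({"x": [1.0, np.nan]})

# equals(): NaNs in the same location count as equal.
strict_equal = a.equals(b)

# ==: NaN never equals NaN, so the second row compares as False.
elementwise_equal = bool((a == b).all().all())

print(strict_equal, elementwise_equal)
```

So a frame containing NaNs can be equal to itself under equals() while an element-wise == check reports a mismatch.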

How do you compare two Pandas series?

Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise. Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
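For Series, the == operator raises the same kind of ValueError when the labels differ, while the eq() method aligns the two Series on their labels first. The sketch below uses made-up data to show both behaviours.

```python
import pandas as pd

# Hypothetical Series that share only the labels "b" and "c".
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([2, 3, 4], index=["b", "c", "d"])

try:
    s1 == s2
    raised = False
except ValueError:
    # The operator form requires identical labels.
    raised = True

# eq() aligns on the union of labels; labels missing on either side give False.
aligned = s1.eq(s2)

print(raised)
print(aligned)
```

Labels present in only one Series ("a" and "d" here) appear in the result as False, because a missing value never compares equal.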
