Jumpstart your EDA with Pandas-Profiling

Bring your data to life with just one line of code

Panda profile.jpg from Wikimedia Commons

So what does it do?

As the creators describe it:

All of it happens with just one line of code.

Before we get to that, I will run through a few of the usual EDA steps for comparison:

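# import our good friends pandas and seaborn
import pandas as pd
import seaborn as sns

# read in the heart.csv (assumed to be the same file profiled later in this article)
df = pd.read_csv('heart.csv')

# plot pairwise relationships between the columns, dropping missing values
sns.pairplot(df, dropna=True)

Well, that does show us a lot of plots… but what can we see?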
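
For comparison, the usual manual first pass often looks something like the sketch below. It uses a tiny made-up DataFrame so that it runs on its own. The column names age, chol and target are illustrative assumptions, not necessarily the columns of heart.csv. In practice you would call the same methods on the DataFrame you loaded.

```python
import pandas as pd

# Hypothetical stand-in data so the sketch is self-contained;
# swap in pd.read_csv('heart.csv') to run it on the real file.
sample_df = pd.DataFrame({
    "age": [63, 37, 41, 56, None],
    "chol": [233, 250, 204, 236, 354],
    "target": [1, 1, 1, 0, 0],
})

# Peek at the first few rows
print(sample_df.head())

# Column dtypes and non-null counts
sample_df.info()

# Summary statistics for the numeric columns
summary = sample_df.describe()
print(summary)

# Missing values per column
missing_per_column = sample_df.isnull().sum()
print(missing_per_column)
```

Each call answers one question at a time, which is exactly the repetition a profiling report is meant to save.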

Now let’s try this with Pandas Profiling.

There are several installation methods documented here. I will use pip install:

pip install pandas_profiling

import pandas as pd
import pandas_profiling

# read in the heart.csv
heart_df = pd.read_csv('heart.csv')

# the one line: build the report straight from the DataFrame
heart_df.profile_report()

# alternatively, create the report object explicitly
from pandas_profiling import ProfileReport

profile = ProfileReport(heart_df, title='Heart Data Profiling Report', explorative=True, minimal=False)

# to notebook
profile.to_notebook_iframe()

# output to file
profile.to_file(output_file="heart_output.html")

Let’s take a closer look at the report sections

‘Overview’ provides a statistical breakdown of the DataFrame contents.
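
To get a feel for what that breakdown contains, here is a rough pandas-only sketch of the kind of dataset-level figures the Overview tab reports: variable and observation counts, missing cells, duplicate rows and memory use. This is not how Pandas Profiling computes them internally. The helper name overview_stats and the sample data are assumptions made purely for illustration.

```python
import pandas as pd


def overview_stats(df: pd.DataFrame) -> dict:
    """Hypothetical helper: dataset-level summary similar in spirit to the Overview tab."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing_cells = int(df.isnull().sum().sum())
    duplicate_rows = int(df.duplicated().sum())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Made-up sample data: 5 rows, one missing value, one duplicated row
sample_df = pd.DataFrame({
    "age": [63, 37, 37, 56, None],
    "chol": [233, 250, 250, 236, 354],
})

stats = overview_stats(sample_df)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The report gathers these figures in one place, alongside per-variable details, without you having to write any of this yourself.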

In conclusion

Does Pandas Profiling replace all of the other methods of EDA?

Data Scientist, Musician, Cocktail Maker. Let’s connect: linkedin.com/in/dannmorr/