
Seaborn Cheatsheet

This cheatsheet is a quick reference to Seaborn's fundamental features. Seaborn is a data visualization library built on top of Matplotlib. Working through the sections below will show you the main ways to plot data with Seaborn.

Basic Overview of Seaborn
Basic Plots
Distribution and Relationship Plots
Categorical Data Visualization
Matrix and Heatmap Visualizations
Customization and Styling
Advanced Concepts

1. Basic Overview of Seaborn

Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for creating statistical graphics. It lets you visualize complex datasets with very little code.

i. Installing Seaborn

To install the Seaborn library of Python, use the following command −

pip install seaborn

ii. Importing Seaborn and Matplotlib

Because Seaborn is built on top of Matplotlib, most scripts import both libraries, using the conventional aliases sns and plt −

import seaborn as sns
import matplotlib.pyplot as plt

iii. Seaborn vs Matplotlib

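When it comes to data visualization in Python, two popular libraries are Seaborn and Matplotlib. While both can create plots and charts, they serve different purposes. Matplotlib is a lower-level library that gives fine-grained control over every element of a figure, while Seaborn builds on it to provide high-level functions for statistical plots. Seaborn also works directly with pandas DataFrames: you pass column names, and it handles grouping, colors, axis labels, and legends for you.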
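To make the difference concrete, here is a minimal sketch that draws the same grouped scatter plot with both libraries. The DataFrame and its columns x, y, and group are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A tiny, made-up DataFrame (hypothetical columns "x", "y", "group")
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 4.1, 5.9, 3.0, 3.9, 5.2],
    "group": ["a", "a", "a", "b", "b", "b"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the data by group yourself, then label and add the legend by hand
for name, sub in df.groupby("group"):
    ax_mpl.scatter(sub["x"], sub["y"], label=name)
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")
ax_mpl.legend(title="group")

# Seaborn: pass column names; colors, axis labels, and legend come from the DataFrame
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax_sns)

plt.show()
```

With Matplotlib the grouping loop, axis labels, and legend are written explicitly; Seaborn derives them from the column names.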

iv. Seaborn Built-in Datasets

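Seaborn provides a set of example datasets that can be loaded by name with sns.load_dataset(). The datasets are not bundled with the library: they are downloaded from the online seaborn-data repository the first time they are used (so an internet connection is needed) and then cached locally. sns.get_dataset_names() lists the names of all available datasets. Some commonly used ones are −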

Dataset Name	Description
tips	Data on restaurant tips, useful for categorical and numerical analysis.
iris	The classic Iris dataset with flower species and measurements.
penguins	Data on penguin species, similar to Iris but for birds.
titanic	Titanic passenger data with survival details.
flights	Monthly airline passenger counts over time.
diamonds	Data on diamond prices and attributes.

The following code loads one of these datasets into a pandas DataFrame and prints its first rows −

import seaborn as sns

# Load any of the datasets listed above by name
df = sns.load_dataset('iris')

# Display the first 5 rows
print(df.head())

2. Basic Plots

The basic plots of Seaborn summarize and explore the data, making it easier to identify patterns, trends, and correlations.

i. Scatter Plot

In Seaborn, a scatter plot visualizes the relationship between two numerical variables. Each observation is drawn as a dot, which makes patterns, clusters, and correlations easy to spot.

sns.scatterplot(data=df, x="x_col", y="y_col")
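As a sketch of a more typical call, the example below maps extra columns to visual properties: hue sets the color and style sets the marker shape. The data is randomly generated, and the column names height, weight, group, and sex are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up measurements: two numeric columns plus two categorical ones
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, size=40),
    "group": np.repeat(["control", "treated"], 20),
    "sex": np.tile(["F", "M"], 20),
})
df["weight"] = 0.9 * df["height"] - 90 + rng.normal(0, 5, size=40)

# hue -> point color, style -> marker shape; each row becomes one point
ax = sns.scatterplot(data=df, x="height", y="weight", hue="group", style="sex")
ax.set_title("Weight vs. height (synthetic data)")
plt.show()
```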

ii. Line Plot

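In Seaborn, line plots show how a numerical variable changes over time or along another continuous variable, which makes them especially useful for time series analysis. When several rows share the same x value, sns.lineplot() aggregates them (using the mean by default) and draws a confidence interval around the line.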

sns.lineplot(data=df, x="x_col", y="y_col")

iii. Bar Plot

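In Seaborn, a bar plot compares a numerical variable across categories. The height of each bar is an estimate of central tendency (the mean by default), and the error bar shows the uncertainty around that estimate.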

sns.barplot(data=df, x="x_col", y="y_col")
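The sketch below, using hypothetical day and sales columns, shows that each bar is an aggregate rather than a raw value: with the default estimator the bars show the mean per day, and passing estimator=np.sum shows the total instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up sales figures, two rows per day
df = pd.DataFrame({"day": ["Mon", "Mon", "Tue", "Tue"], "sales": [10, 20, 30, 50]})

fig, (ax_mean, ax_sum) = plt.subplots(1, 2, figsize=(8, 3))
sns.barplot(data=df, x="day", y="sales", ax=ax_mean)                   # mean per day
sns.barplot(data=df, x="day", y="sales", estimator=np.sum, ax=ax_sum)  # total per day
plt.show()

print([float(bar.get_height()) for bar in ax_mean.patches])  # [15.0, 40.0]
print([float(bar.get_height()) for bar in ax_sum.patches])   # [30.0, 80.0]
```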

iv. Histogram

A histogram shows the distribution of a numerical variable. It divides the range of values into bins and counts how many observations fall into each bin.

sns.histplot(data=df, x="col_name")

v. KDE Plot

In Seaborn, KDE stands for "Kernel Density Estimate". A KDE plot is a smoothed version of a histogram: it estimates the probability density function of a continuous variable, which helps in understanding the shape of the distribution.

sns.kdeplot(data=df, x="col_name")

vi. ECDF Plot

ECDF stands for "Empirical Cumulative Distribution Function". For each value on the x-axis, the ECDF plot shows the proportion of data points that are less than or equal to that value.

sns.ecdfplot(data=df, x="col_name")

3. Distribution and Relationship Plots

Distribution plots help users understand how data is distributed, for example whether it is normal, skewed, or has outliers. Relationship plots help users identify patterns, correlations, or trends between variables.

i. Pair Plot

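A pair plot, also known as a scatter plot matrix, is a grid of plots that shows how the numerical variables in a dataset relate to each other: the off-diagonal cells are scatter plots of each pair of variables, and the diagonal cells show the distribution of each variable on its own.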

sns.pairplot(data=df)

ii. Joint Plot

A joint plot draws a bivariate plot of two variables (a scatter plot by default) together with the univariate distribution of each variable along the margins.

sns.jointplot(data=df, x="x_col", y="y_col")

iii. Rug Plot

In Seaborn, a rug plot shows the distribution of data by drawing a short tick mark along an axis for each individual observation.

sns.rugplot(data=df, x="col_name")

iv. Regression Plot

The regression plot draws a scatter plot of two variables and fits a linear regression line through it, with a shaded confidence interval for the fit. This helps visualize the relationship between the variables and understand trends, patterns, and correlations in the data.

sns.regplot(data=df, x="x_col", y="y_col")
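To check what the fitted line represents, the sketch below uses made-up data that lies exactly on y = 2x + 1, so the regression line should reproduce it. ci=None turns off the confidence band.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data lying exactly on y = 2x + 1
x = np.arange(10, dtype=float)
df = pd.DataFrame({"x": x, "y": 2 * x + 1})

ax = sns.regplot(data=df, x="x", y="y", ci=None)
plt.show()

# The line on the axes is the fitted regression line
fit_line = ax.lines[0]
xs = np.asarray(fit_line.get_xdata(), dtype=float)
ys = np.asarray(fit_line.get_ydata(), dtype=float)
fits = np.allclose(ys, 2 * xs + 1)
print(fits)  # True
```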

v. Residual Plot

In Seaborn, a residual plot shows the residuals of a linear regression of y on x, that is, the differences between the actual and predicted values. If a linear model fits the data well, the residuals scatter randomly around zero.

sns.residplot(data=df, x="x_col", y="y_col")
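The sketch below generates hypothetical data around the line y = 2x + 1 with random noise. Because an ordinary least-squares fit with an intercept has residuals that average to zero, the plotted residuals should be centered on the reference line at zero.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
df = pd.DataFrame({"x": x, "y": 2 * x + 1 + rng.normal(scale=1.0, size=50)})

ax = sns.residplot(data=df, x="x", y="y")
plt.show()

# The scattered points are the residuals; their mean should be about zero
resid = np.asarray(ax.collections[0].get_offsets())[:, 1]
resid_mean = float(resid.mean())
print(resid_mean)
```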

4. Categorical Data Visualization

Categorical data visualization in Seaborn means creating plots to compare different groups or categories using graphs like bar plots, count plots, box plots, and violin plots. These help in understanding patterns, distributions, and differences between categories easily.

i. Box Plot

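In Seaborn, a box plot summarizes the distribution of a numerical variable for each category. The box spans the first to the third quartile, the line inside it marks the median, the whiskers extend to the most extreme points within 1.5 times the interquartile range, and points beyond the whiskers are drawn individually as outliers.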

sns.boxplot(data=df, x="x_col", y="y_col")
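The difference between the median and the mean shows up clearly with an outlier. In the sketch below (made-up data), group a contains the value 100: its box and median line stay at the quartiles 2, 3, and 4, and 100 is drawn as an outlier, even though the group's mean is 22. The quartiles are computed with pandas for comparison.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1, 2, 3, 4, 100, 5, 6, 7, 8, 9],
})

ax = sns.boxplot(data=df, x="group", y="value")
plt.show()

# Box edges = 25% and 75% quantiles, middle line = median (50%)
summary = df.groupby("group")["value"].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)
print(df.groupby("group")["value"].mean())  # the mean of group a is pulled up by the outlier
```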

ii. Violin Plot

The violin plot of Seaborn combines a box plot with a KDE to show data distribution.

sns.violinplot(data=df, x="x_col", y="y_col")
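As a sketch, the example below draws violins for two randomly generated groups, one normal and one skewed. By default the density curve extends beyond the observed values; cut=0 stops each violin at the smallest and largest data points, which the last lines check by reading the coordinates of the violin bodies.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PolyCollection

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": np.repeat(["normal", "skewed"], 100),
    "value": np.concatenate([rng.normal(0, 1, 100), rng.exponential(1.0, 100)]),
})

# cut=0: each violin ends at the smallest and largest observed value
ax = sns.violinplot(data=df, x="group", y="value", cut=0)
plt.show()

# The violin bodies are filled polygons; collect their y coordinates
body_y = np.concatenate([
    c.get_paths()[0].vertices[:, 1]
    for c in ax.collections if isinstance(c, PolyCollection)
])
inside_range = (body_y.min() >= df["value"].min() - 1e-9
                and body_y.max() <= df["value"].max() + 1e-9)
print(inside_range)
```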

iii. Strip Plot

A strip plot in Seaborn draws each individual observation as a point along the category axis, adding a small random jitter so that points do not sit exactly on top of each other.

sns.stripplot(data=df, x="x_col", y="y_col")
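A minimal sketch on random data (hypothetical group and value columns): jitter sets how far points are spread along the category axis, and every row of the DataFrame is drawn as exactly one marker.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 15),
    "value": rng.normal(size=45),
})

ax = sns.stripplot(data=df, x="group", y="value", jitter=0.2)
plt.show()

# Count the markers across all point collections
n_points = sum(len(c.get_offsets()) for c in ax.collections)
print(n_points)  # 45: one marker per row
```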

iv. Swarm Plot

A swarm plot is similar to a strip plot, but instead of random jitter it adjusts the positions of the points along the category axis so that they do not overlap.

sns.swarmplot(data=df, x="x_col", y="y_col")
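Swarm plots are commonly overlaid on a box plot so that both the summary and the individual observations are visible. In this sketch with random data, the box plot's own outlier markers are turned off (showfliers=False) so that no point is drawn twice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PathCollection

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 12),
    "value": rng.normal(size=24),
})

# Box plot first (no outlier markers), then every observation on top
ax = sns.boxplot(data=df, x="group", y="value", showfliers=False, color="white")
sns.swarmplot(data=df, x="group", y="value", color="black", size=4, ax=ax)
plt.show()

# The swarm points are scatter (PathCollection) artists
n_points = sum(len(c.get_offsets()) for c in ax.collections if isinstance(c, PathCollection))
print(n_points)  # 24
```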

v. Count Plot

The count plot of Seaborn displays the count of each category.

sns.countplot(data=df, x="col_name")
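A count plot needs only one categorical column because it counts the rows itself. In this sketch with a hypothetical pet column, the bar heights match the number of times each value occurs.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"pet": ["cat", "dog", "dog", "fish", "fish", "fish"]})

ax = sns.countplot(data=df, x="pet")
plt.show()

# One bar per category, in order of first appearance
heights = [float(bar.get_height()) for bar in ax.patches]
print(heights)  # [1.0, 2.0, 3.0]
```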

vi. Point Plot

A point plot represents an estimate of the central tendency for a numerical variable and uses error bars to show the degree of uncertainty in the estimate.

sns.pointplot(data=df, x="x_col", y="y_col")
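In this sketch with made-up dose and response columns, each point is the mean response for one dose (2 for low, 5 for high), and the points are joined by a line. It assumes seaborn 0.12 or later for errorbar=None, which hides the error bars.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "dose": ["low", "low", "high", "high"],
    "response": [1.0, 3.0, 4.0, 6.0],
})

# Each point is the mean response per dose; errorbar=None hides the error bars
ax = sns.pointplot(data=df, x="dose", y="response", errorbar=None)
plt.show()

# The same means computed with pandas, in order of appearance
print(df.groupby("dose", sort=False)["response"].mean())
```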

vii. Cat Plot

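A cat plot (sns.catplot()) is a figure-level interface to all of Seaborn's categorical plots. The kind parameter selects the plot type ("strip" by default, or "swarm", "box", "violin", "boxen", "point", "bar", or "count"), and the row and col parameters split the data into a grid of subplots, which helps reveal patterns and trends across categories.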

sns.catplot(data=df, x="x_col", y="y_col", kind="box")
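The sketch below uses random, hypothetical day, time, and bill columns. kind="violin" selects the plot type and col="time" creates one subplot per value of time, so the returned FacetGrid holds one row of two axes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "day": np.tile(["Sat", "Sun"], 20),
    "time": np.repeat(["Lunch", "Dinner"], 20),
    "bill": rng.gamma(shape=4, scale=5, size=40),
})

# kind picks the categorical plot; col facets the data by "time"
g = sns.catplot(data=df, x="day", y="bill", col="time", kind="violin", height=3)
plt.show()

print(g.axes.shape)  # (1, 2)
```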

5. Matrix and Heatmap Visualizations

Matrix and heatmap visualization are data representation techniques used to visualize tabular data, such as matrices, correlation matrices, or data frames.

i. Correlation Heatmap

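A correlation heatmap displays the correlation between several numerical variables as a color-coded matrix. Non-numeric columns (such as the species column of iris) must be left out when computing the correlation; with pandas 2.0 and later, df.corr() raises an error if they are included.

sns.heatmap(data=df.corr(numeric_only=True), annot=True)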
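A fuller sketch on random data: the text column label is skipped by numeric_only=True (available in pandas 1.5 and later), annot=True writes each coefficient in its cell, and vmin/vmax pin the color scale to the full range of a correlation, -1 to 1.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(9)
df = pd.DataFrame({"a": rng.normal(size=100), "label": ["x", "y"] * 50})
df["b"] = 0.8 * df["a"] + rng.normal(scale=0.6, size=100)  # correlated with a
df["c"] = rng.normal(size=100)                                # unrelated noise

# numeric_only=True skips the text column "label"
corr = df.corr(numeric_only=True)

ax = sns.heatmap(data=corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
plt.show()
```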

ii. Cluster Map

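In a cluster map, rows and columns are reordered by hierarchical clustering so that similar ones are placed next to each other. This function requires SciPy to be installed and works on numerical data only.

sns.clustermap(data=df.select_dtypes("number"))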

6. Customization and Styling

Seaborn allows users to modify the look of the plots, which makes them more visually appealing and easy to understand.

i. Customizing Colors

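Change the color palette of your plots to enhance readability and aesthetics. sns.color_palette() returns a list of colors, and sns.set_palette() makes a palette the default for subsequent plots.

palette = sns.color_palette("pastel")
sns.set_palette(palette)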
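The sketch below shows that color_palette() only returns colors, while set_palette() changes the colors used by later plots. The "husl" palette with four colors is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# color_palette() just returns a list of RGB tuples; nothing changes yet
palette = sns.color_palette("husl", 4)
hex_codes = palette.as_hex()
print(hex_codes)

# set_palette() makes these colors Matplotlib's default color cycle
sns.set_palette(palette)
cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(len(cycle))  # 4
```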

ii. Setting Style

Setting a style adjusts the background and grid of your plots to match your preferred theme. The five built-in styles are darkgrid, whitegrid, dark, white, and ticks.

sns.set_style("darkgrid")
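set_style() changes the style of every plot created afterwards. To style just one figure, axes_style() can be used as a context manager; this sketch checks Matplotlib's axes.grid setting inside and outside the block.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")  # global style for all later plots (no grid)

with sns.axes_style("whitegrid"):  # temporary style, only inside this block
    fig1, ax1 = plt.subplots()
    ax1.plot([1, 2, 3], [1, 4, 9])
    grid_inside = plt.rcParams["axes.grid"]

fig2, ax2 = plt.subplots()  # back to the "white" style
ax2.plot([1, 2, 3], [1, 4, 9])
grid_outside = plt.rcParams["axes.grid"]

print(grid_inside, grid_outside)  # True False
plt.show()
```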

iii. Changing Figure Size

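The size of a plot is controlled through Matplotlib, for example with plt.figure(figsize=(width, height)) before calling an axes-level Seaborn function. Figure-level functions such as sns.catplot() take height and aspect parameters instead. The related sns.set_context() function does not change the figure size; it scales fonts, lines, and markers for a given setting ("paper", "notebook", "talk", or "poster").

plt.figure(figsize=(10, 6))
sns.set_context("talk")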
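The sketch below sets the size both ways with a tiny made-up DataFrame: figsize for an axes-level plot, and height (inches per facet) with aspect (width divided by height) for a figure-level plot, which gives a 4.5 by 3 inch figure here.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})

# Axes-level function: size the Matplotlib figure yourself (width, height in inches)
fig, ax = plt.subplots(figsize=(8, 4))
sns.boxplot(data=df, x="group", y="value", ax=ax)

# Figure-level function: height is per facet, aspect = width / height
g = sns.catplot(data=df, x="group", y="value", kind="box", height=3, aspect=1.5)

print(fig.get_size_inches(), g.figure.get_size_inches())
plt.show()
```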

iv. Adding Titles and Labels

By adding titles and axis labels, we can make the plots more informative.

plt.title("My Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")

7. Advanced Concepts

This section covers two advanced Seaborn topics −

i. Working with Subplots

Working with subplots means arranging multiple plots in a single figure for better visualization and comparison. The following code creates a figure with a single set of axes and gives the whole figure a title −

import matplotlib.pyplot as plt

# Create a figure and a single set of axes
figure, axes = plt.subplots()

# Set the title for the entire figure
figure.suptitle('Tutorialspoint - one axes with no data')

# Display the plot (even with no data)
plt.show()

The above code produces a figure with a single, empty set of axes and the title "Tutorialspoint - one axes with no data" above it.

ii. Handling Missing Data in Seaborn

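The titanic dataset contains missing values, for example in the age and deck columns. The program below drops the rows where age is missing before plotting the mean age of each passenger class. Passing subset=['age'] to dropna() matters here: deck is missing for most passengers, so calling dropna() on the whole DataFrame would discard most of the data −

import seaborn as sns
import matplotlib.pyplot as plt

# Load a dataset that contains missing values
df = sns.load_dataset('titanic')

# Drop only the rows where the plotted column (age) is missing
df_cleaned = df.dropna(subset=['age'])

# Bar plot of the mean age per passenger class
sns.barplot(x='class', y='age', data=df_cleaned)
plt.show()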

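The above code produces a bar plot of the mean passenger age for each class (First, Second, and Third), with error bars showing the uncertainty of each mean.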
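The effect of dropna() with and without subset can be checked on a tiny made-up DataFrame that imitates the titanic columns class, age, and deck.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "class": ["First", "Second", "Third", "Third", "First"],
    "age": [38.0, np.nan, 22.0, 26.0, 35.0],
    "deck": ["C", np.nan, np.nan, np.nan, np.nan],
})

print(df.isna().sum())  # number of missing values per column

all_dropped = df.dropna()                  # drops a row if ANY column is missing
age_dropped = df.dropna(subset=["age"])    # drops a row only if age is missing
print(len(all_dropped), len(age_dropped))  # 1 4

ax = sns.barplot(data=age_dropped, x="class", y="age")
plt.show()
```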
