SciPy - leaves_list() Function



leaves_list(Z) is a function in the scipy.cluster.hierarchy module. It returns the "leaf nodes" of a hierarchical clustering, that is, the original data points, in the order in which they appear in the resulting tree.

Hierarchical clustering is the process of grouping similar data points based on their features, for example grouping products by type or cities by similar temperature.

To see how the data points were grouped after clustering, we use a dendrogram, a tree-like diagram that visualizes hierarchical clustering. The leaves at the bottom are the individual data points, and each higher level shows groups that have been merged, which helps to understand relationships in the data.

The leaves_list(Z) function gives the order of the leaf nodes (i.e., the original data points) in a dendrogram after hierarchical clustering. Knowing this order, we can rearrange the data to follow the hierarchical clustering order, which makes patterns and relationships clearer. To reorder, simply index your dataset with the array returned by leaves_list(Z). The concept is better understood with the help of the following examples.

Syntax

Following is the syntax of the SciPy leaves_list() function −

scipy.cluster.hierarchy.leaves_list(Z)

Parameters

This function accepts a single parameter −

  • Z − The linkage matrix that encodes the hierarchical clustering, as returned by functions such as linkage() or ward().
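
As a minimal sketch of what such a linkage matrix looks like, the snippet below builds Z for four sample points (the data is made up for illustration) and checks its shape. For n observations, Z has n - 1 rows and 4 columns: the two clusters being merged, the distance between them, and the number of original points in the new cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage

# Illustrative data: four 2-D points (assumed values, not from a real dataset)
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
Z = linkage(points, method='average')

n = len(points)
print("Shape of Z:", Z.shape)                 # n - 1 rows, 4 columns
print("Valid linkage:", is_valid_linkage(Z))  # sanity check on the structure
print("Points in final cluster:", int(Z[-1, 3]))
```

The last row always describes the final merge, so its fourth column equals the total number of data points.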

Return Value

This function returns a 1D array of integers giving the order of the data points (leaves) as they appear from left to right in the dendrogram. Each integer is the index of an original data point.
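
The properties described above can be checked with a short sketch. The sample data is an assumption for illustration, and dendrogram(Z, no_plot=True) is used only to obtain the leaf order without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, dendrogram

# Illustrative data (assumed values)
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])
Z = linkage(points, method='average')
order = leaves_list(Z)

# The result is a 1D integer array containing each index exactly once
is_1d_int = order.ndim == 1 and np.issubdtype(order.dtype, np.integer)
is_permutation = sorted(order.tolist()) == list(range(len(points)))
print("1D integer array:", is_1d_int)
print("Each index appears once:", is_permutation)

# With default settings, dendrogram() reports the same left-to-right order
dn = dendrogram(Z, no_plot=True)
same_as_dendrogram = dn['leaves'] == order.tolist()
print("Matches dendrogram leaves:", same_as_dendrogram)
```

Note that the comparison with dendrogram() holds for its default ordering; options such as count_sort or distance_sort can change the order in which the dendrogram draws its leaves.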

Example 1

Following is a basic example of the SciPy leaves_list() function that shows the order of the leaf nodes in a dendrogram after hierarchical clustering.

import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
Z = linkage(data, method='single')
leaf_order = leaves_list(Z)

print("Linkage Matrix:\n", Z)
print("Leaf Order:", leaf_order)

Following is the output of the above code −

Linkage Matrix:
 [[0.         1.         2.82842712 2.        ]
 [2.         4.         2.82842712 3.        ]
 [3.         5.         2.82842712 4.        ]]
Leaf Order: [3 2 0 1]
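
The leaf order can be traced by hand from the linkage matrix. In Z, indices below n refer to original data points, and index n + i refers to the cluster created in row i. Starting from the last row (the root) and always visiting the first child before the second gives the left-to-right order. The helper manual_leaf_order below is a hypothetical name used for illustration, not part of SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def manual_leaf_order(Z):
    """Walk the tree encoded in Z, first child before second child."""
    n = Z.shape[0] + 1          # number of original data points
    order = []

    def visit(node):
        if node < n:            # original data point: a leaf
            order.append(node)
        else:                   # cluster formed in row (node - n) of Z
            left, right = int(Z[node - n, 0]), int(Z[node - n, 1])
            visit(left)
            visit(right)

    visit(2 * n - 2)            # the last row of Z is the root
    return order

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
Z = linkage(data, method='single')
print("Manual order:  ", manual_leaf_order(Z))
print("leaves_list(Z):", leaves_list(Z).tolist())
```

For the linkage matrix printed above, the root (last row) joins leaf 3 with cluster 5, cluster 5 (second row) joins leaf 2 with cluster 4, and cluster 4 (first row) joins leaves 0 and 1, which gives [3 2 0 1].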

Example 2

This example demonstrates how to reorder a dataset using the leaf order returned by leaves_list(Z). Patterns and relationships within the dataset become clearer when the rearranged data follows the hierarchical clustering structure.

import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
Z = linkage(data, method='single')
leaf_order = leaves_list(Z)
reordered_data = data[leaf_order]

print("Leaf Order:", leaf_order)
print("Reordered Data:\n", reordered_data)

Following is the output of the above code −

Leaf Order: [3 2 0 1]
Reordered Data:
 [[7 8]
 [5 6]
 [1 2]
 [3 4]]
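
The same index array can reorder other things that are aligned with the data, such as a pairwise distance matrix. The sketch below uses made-up points and np.ix_ to apply the leaf order to both rows and columns, so that points in the same cluster end up in adjacent rows and columns.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist, squareform

# Illustrative data: two interleaved groups (assumed values)
points = np.array([[0, 0], [10, 10], [0, 1], [10, 11]])
condensed = pdist(points)
Z = linkage(condensed, method='average')
order = leaves_list(Z)

D = squareform(condensed)                 # full square distance matrix
D_reordered = D[np.ix_(order, order)]     # reorder rows and columns together

print("Leaf order:", order)
print(np.round(D_reordered, 2))
```

After reordering, the small distances sit in blocks along the diagonal, which is the layout typically used when drawing a clustered heatmap.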

Example 3

Following is an example that uses leaves_list(Z) together with a dendrogram to visualize how cities with different temperatures are grouped together.

In the output, indices 9, 4 and 5, that is [-16, -15], [-13, -12] and [-11, -11], are grouped together first, suggesting that they are more similar to one another. The dendrogram shows how the smaller clusters merge to form larger clusters, helping us understand the temperature grouping patterns.

from scipy.cluster.hierarchy import ward, dendrogram, leaves_list
from scipy.spatial.distance import pdist
from matplotlib import pyplot as plt

# Temperature readings (in Celsius), one row per city
temp_in_celsius = [[37, 40], [55, 33], [21, 35],
                   [-1, -5], [-13, -12], [-11, -11],
                   [43, 23], [21, 20], [19, 38],
                   [-16, -15], [-6, -10], [-9, -8]]

# Ward linkage on the condensed pairwise distance matrix
Z = ward(pdist(temp_in_celsius))
leaf_order = leaves_list(Z)
print("Order of leaves:", leaf_order)

# Plot the dendrogram; its leaves follow leaf_order from left to right
plt.figure(figsize=(15, 8))
dendrogram(Z)
plt.title("Order of temperature by cities")
plt.show()