NumPy percentile() Function



The NumPy percentile() function computes the q-th percentile of the input array along a specified axis. A percentile is a value below which a given percentage of the observations fall. It is commonly used in statistics to describe the distribution of data.

The median is a special case of a percentile: it is the 50th percentile, the middle value of the sorted data. The percentile() function generalizes this to any percentile between 0 and 100. Repeated values need no special handling, because the function works on the sorted data and each occurrence counts as a separate observation.
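As a quick check of the relationship between the median and percentiles, the following sketch compares np.median() with np.percentile() on a small array with repeated values. The sample data is made up for illustration:

```python
import numpy as np

# Hypothetical sample data containing repeated values
data = np.array([2, 2, 3, 5, 5, 5, 9])

# The median and the 50th percentile are the same quantity
median_value = np.median(data)
p50 = np.percentile(data, 50)

# Any other percentile can be requested the same way
p25, p75 = np.percentile(data, [25, 75])

print("Median:", median_value)
print("50th percentile:", p50)
print("25th and 75th percentiles:", p25, p75)
```

Here the 75th percentile falls between two equal values (5 and 5), so the repeated values simply yield 5.0.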

The percentile() function performs interpolation when the desired percentile lies between two data points in the array. By default, it uses linear interpolation to estimate the result.
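To make the default linear interpolation concrete, the sketch below reproduces it by hand for one percentile and compares the result with np.percentile(). The sample data and the helper variable names (pos, lo, hi, frac) are assumptions chosen for illustration, not part of the NumPy API:

```python
import numpy as np

# Hypothetical sample data; sorting is required for the manual calculation
data = np.sort(np.array([10, 20, 30, 40]))
q = 40

# Fractional position of the q-th percentile within the sorted data
pos = q / 100 * (len(data) - 1)

# Indices of the two neighbouring data points and the fraction between them
lo = int(np.floor(pos))
hi = min(lo + 1, len(data) - 1)
frac = pos - lo

# Linear interpolation between the two neighbours
manual = data[lo] + frac * (data[hi] - data[lo])

print("Manual result:", manual)
print("np.percentile:", np.percentile(data, q))
```

Both values should agree up to floating-point rounding: the 40th percentile lies at position 1.2, one fifth of the way from 20 to 30.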

Syntax

Following is the syntax of the NumPy percentile() function −

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, weights=None, interpolation=None)

Parameters

Following are the parameters of the NumPy percentile() function −

  • a: Input array. It can be a NumPy array, list, or scalar value.
  • q: The percentile value or array of percentiles to compute. It should be between 0 and 100.
  • axis (optional): Axis or axes along which the percentiles are computed. If None, the percentile is computed over the entire flattened array.
  • out (optional): Alternate output array to store the result. It must have the same shape as the expected output.
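  • overwrite_input (optional): If True, the input array a may be modified by intermediate calculations to save memory. The contents of a are then undefined after the call. Default is False.
  • weights (optional): An array of weights associated with the values in a. If weights=None, all data in a are assumed to have a weight equal to one. Only method='inverted_cdf' supports weights.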
  • keepdims (optional): If True, the reduced dimensions are retained as dimensions of size one in the output. Default is False.
  • interpolation (optional): Deprecated name for the method keyword argument.
  • method (optional): Specifies the method used to estimate the percentile. Options include:
      • linear (default): Linear interpolation between two data points.
      • lower: Use the lower value when the percentile lies between two values.
      • higher: Use the higher value when the percentile lies between two values.
      • midpoint: Use the midpoint of the two values when the percentile lies between them.
      • nearest: Use whichever of the two values is nearest.
      • Other supported methods: inverted_cdf, averaged_inverted_cdf, closest_observation, interpolated_inverted_cdf, hazen, weibull, median_unbiased, and normal_unbiased.
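
The difference between these options is easiest to see by evaluating one percentile that falls between two data points. The following is a minimal sketch, assuming a NumPy version that accepts the method keyword; the sample data and the results dictionary are chosen for illustration:

```python
import numpy as np

# Hypothetical sample data
data = np.array([1, 3, 5, 7])

# The 40th percentile sits at position 0.4 * 3 = 1.2, between 3 and 5
methods = ["linear", "lower", "higher", "midpoint", "nearest"]
results = {m: float(np.percentile(data, 40, method=m)) for m in methods}

for name, value in results.items():
    print(f"{name:>8}: {value}")
```

With this data, 'lower' and 'nearest' both select 3, 'higher' selects 5, 'midpoint' gives 4, and 'linear' gives a value of about 3.4.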

Return Values

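This function returns the computed percentile(s). If q is a single percentile and axis=None, the result is a scalar. If q is an array of percentiles, the first axis of the result corresponds to the percentiles, and the remaining axes are those left after reducing a along the specified axis. The values depend on the chosen method.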
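
As a rough illustration of how the shape of the result depends on q, axis, and keepdims, consider the following sketch. The 3x4 input array and the variable names are assumptions made for this example:

```python
import numpy as np

# Hypothetical 3x4 input array holding the values 0..11
data = np.arange(12).reshape(3, 4)

# Single percentile over the flattened array: a scalar
scalar_result = np.percentile(data, 50)

# Several percentiles over the flattened array: one value per percentile
multi_q = np.percentile(data, [25, 50, 75])

# Single percentile per row: one value per row
per_row = np.percentile(data, 50, axis=1)

# Several percentiles per row: percentiles come first, then rows
per_row_multi = np.percentile(data, [25, 75], axis=1)

# keepdims=True retains the reduced axis with size one
kept = np.percentile(data, 50, axis=1, keepdims=True)

print(np.ndim(scalar_result), multi_q.shape, per_row.shape,
      per_row_multi.shape, kept.shape)
```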

Example

Following is a basic example to compute the 50th percentile (median) of an array using the NumPy percentile() function −

import numpy as np
# input array
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# calculating 50th percentile (median)
percentile_50 = np.percentile(data, 50)
print("50th Percentile:", percentile_50)

Output

Following is the output of the above code −

50th Percentile: 5.5

Example: Percentile Along an Axis

The percentile() function can compute percentiles along a specified axis in multi-dimensional arrays. In the following example, we calculate the 90th percentile along the rows (axis=1) of a 2D array −

import numpy as np
# 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# 90th percentile along rows (axis=1)
percentile_90_rows = np.percentile(data, 90, axis=1)
print("90th Percentile Along Rows:", percentile_90_rows)

Output

Following is the output of the above code −

90th Percentile Along Rows: [2.8 5.8 8.8]

Example: Usage of 'method' Parameter

In the following example, we compute the 25th percentile of an array using the 'midpoint' method. The 25th percentile lies between the values 1 and 3, so the result is their midpoint −

import numpy as np
# input array
data = np.array([1, 3, 5, 7])
# 25th percentile using 'midpoint' method
percentile_25_midpoint = np.percentile(data, 25, method='midpoint')
print("25th Percentile (Midpoint Method):", percentile_25_midpoint)

Output

Following is the output of the above code −

25th Percentile (Midpoint Method): 2.0

Example: MultiDimensional Arrays with 'percentile()'

With axis=0, the percentile() function reduces a 2D array down its rows and returns one value per column. In the following example, we calculate the 75th percentile of each column (axis=0) of a 2D array −

import numpy as np
# 2D array
data = np.array([[1, 3, 5], [2, 4, 6], [3, 5, 7]])
# 75th percentile along columns (axis=0)
percentile_75_columns = np.percentile(data, 75, axis=0)
print("75th Percentile Along Columns:", percentile_75_columns)

Output

Following is the output of the above code −

75th Percentile Along Columns: [2.5 4.5 6.5]

Example: Graphical Representation of 'percentile()'

In the following example, we have plotted percentiles for different interpolation methods applied to a given dataset. The dataset consists of the values [0, 1, 2, 3], and we have computed the percentiles for the range 0 to 100.

Using NumPy, we have calculated the percentiles for each specified method, and matplotlib is used to visualize the results. The plot demonstrates how different interpolation methods affect the percentile estimates, with each method represented by a distinct line style and color −

import numpy as np
import matplotlib.pyplot as plt

# Define the input data and percentiles
a = np.arange(4)  # Data: [0, 1, 2, 3]
p = np.linspace(0, 100, 6001)  # 6001 evenly spaced percentile values from 0 to 100

# Create a figure and axis for plotting
fig, ax = plt.subplots(figsize=(10, 6))

# Define the interpolation methods and their styles
lines = [
    ('linear', '-', 'C0'),
    ('inverted_cdf', ':', 'C1'),
    ('averaged_inverted_cdf', '-.', 'C1'),
    ('closest_observation', ':', 'C2'),
    ('interpolated_inverted_cdf', '--', 'C1'),
    ('hazen', '--', 'C3'),
    ('weibull', '-.', 'C4'),
    ('median_unbiased', '--', 'C5'),
    ('normal_unbiased', '-.', 'C6'),
]

# Plot percentiles for each method
for method, style, color in lines:
    ax.plot(
        p, np.percentile(a, p, method=method),
        label=method, linestyle=style, color=color
    )

# Configure the plot
ax.set(
    title=f'Percentiles for Different Methods and Data: {a}',
    xlabel='Percentile',
    ylabel='Estimated Percentile Value',
    yticks=a
)
ax.legend(bbox_to_anchor=(1.03, 1), loc='upper left')
plt.tight_layout()  # Adjust layout to fit legend
plt.show()

Output

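The plot shows how the estimated value changes with the percentile for each method. The 'linear' method rises continuously from 0 to 3, while discontinuous methods such as 'inverted_cdf' and 'closest_observation' jump between the data values in steps −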

Percentile Function Visualization