NumPy nanquantile() Function



The NumPy nanquantile() function computes the q-th quantile of the data along a specified axis, ignoring NaN (Not a Number) values. A quantile is the fractional form of a percentile: q = 0.25 corresponds to the 25th percentile. If multiple probability levels are given, the first axis of the result corresponds to the quantiles.

The other axes are the axes that remain after the reduction of input data. If the input data contains integers or floats smaller than float64, the output data-type is float64. Otherwise, the output data-type is the same as that of the input. If out is specified, that array is returned instead.
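The shape and data-type rules described above can be checked with a small sketch. The array values and variable names here are illustrative assumptions, not part of the NumPy API −

```python
import numpy as np

# 2 rows, 3 columns, with one NaN in the first row
a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0]])

# Three probability levels, reduced along axis 1 (one result per row)
res = np.nanquantile(a, [0.25, 0.5, 0.75], axis=1)
print(res.shape)  # first axis = quantiles, second axis = remaining rows

# Integer input produces a float64 result
median_of_ints = np.nanquantile(np.array([1, 2, 3, 4]), 0.5)
print(median_of_ints.dtype)
```

Here the result has one row per requested quantile and one column per input row, because axis 1 was reduced away.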

In NumPy, both the quantile() and nanquantile() functions calculate the q-th quantile. The only difference is how they treat NaN values: nanquantile() excludes them from the computation, whereas quantile() propagates them, so its result is NaN whenever the data being reduced contains a NaN.
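A minimal sketch of this difference, using an illustrative array (the variable names are assumptions chosen for this example) −

```python
import numpy as np

data = np.array([2.0, np.nan, 4.0, 6.0])

# quantile() propagates the NaN into the result
with_nan = np.quantile(data, 0.5)

# nanquantile() drops the NaN and works on [2, 4, 6]
ignoring_nan = np.nanquantile(data, 0.5)

# Equivalent manual approach: filter NaN values first, then call quantile()
filtered = np.quantile(data[~np.isnan(data)], 0.5)

print(with_nan, ignoring_nan, filtered)
```

The last two values agree, which shows that nanquantile() behaves like quantile() applied to the NaN-free data.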

Syntax

Following is the syntax of the NumPy nanquantile() function −

numpy.nanquantile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False)

Parameters

Following are the parameters of the NumPy nanquantile() function −

  • a: Input array, which can be a NumPy array, list, or scalar value. The input data may contain NaN values.
  • q: The probability level(s) of the quantile(s) to compute, given as a single number or a list/array of numbers. Each value must lie in the range [0, 1] (for example, 0.25 for the 25th percentile, 0.5 for the median).
  • axis (optional): The axis along which to compute the quantile. Default is None, which means that the quantile is computed over the entire array.
  • out (optional): A location into which the result is stored. If provided, it must have the same shape as the expected output.
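  • overwrite_input (optional): If True, the function is allowed to use the input array a as scratch space for intermediate calculations to save memory. The contents of a are then undefined after the call. Default is False.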
  • keepdims (optional): If True, the reduced dimensions are retained as dimensions of size one in the output. Default is False.
  • method (optional): The method used to estimate the quantile when it lies between two data points. Commonly used options include the following (NumPy also accepts further estimation methods that are not covered here) −
      • linear (default): Performs linear interpolation between the two closest data points.
      • lower: Returns the lower of the two data points.
      • higher: Returns the higher of the two data points.
      • midpoint: Returns the midpoint of the two data points.
      • nearest: Returns the nearest data point.
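The parameters above can be combined as in the following sketch. It assumes a NumPy version that supports the method keyword (1.22 or later); the array and variable names are illustrative −

```python
import numpy as np

a = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0]])

# axis=None (default): quantile over all non-NaN values [1, 2, 4, 6]
flat = np.nanquantile(a, 0.5)

# axis=1: one quantile per row
rows = np.nanquantile(a, 0.5, axis=1)

# keepdims=True keeps the reduced axis with size one
rows_kept = np.nanquantile(a, 0.5, axis=1, keepdims=True)

# method='lower' picks the lower neighbour instead of interpolating
rows_lower = np.nanquantile(a, 0.5, axis=1, method='lower')

# out: store the result in a preallocated array of the expected shape
out_buf = np.empty(2)
np.nanquantile(a, 0.5, axis=1, out=out_buf)

print(flat, rows, rows_kept.shape, rows_lower, out_buf)
```

With keepdims=True the result keeps two dimensions, so it broadcasts against the original array, for example when subtracting the row medians from a.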

Return Values

This function returns the q-th quantile(s) of the input array along the specified axis, ignoring NaN values. The result is a scalar or array depending on the input and the value of the q parameter.

Example

Following is a basic example to compute the 50th percentile (median) of an array using the NumPy nanquantile() function −

import numpy as np
# input array with NaN values
x = np.array([1, 2, 3, 4, np.nan, 6, 7, 8, 9])
# applying nanquantile
result = np.nanquantile(x, 0.5)  # 50th percentile (median)
print("NanQuantile Result (50th percentile):", result)

Output

Following is the output of the above code −

NanQuantile Result (50th percentile): 5.0

Example: Multi-dimensional Array

The nanquantile() function operates on multi-dimensional arrays while ignoring NaN values. In the following example, we have created a 2D NumPy array, and computed the 50th percentile (median) along a specific axis −

import numpy as np
# 2D array with NaN values
x = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])
# applying nanquantile along axis 0 (columns)
result = np.nanquantile(x, 0.5, axis=0)
print("NanQuantile Result along axis 0 (median):", result)

Output

Following is the output of the above code −

NanQuantile Result along axis 0 (median): [4. 5. 6.]
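As a further sketch, the same call can be made along axis 1 (rows). The array below is an illustrative assumption with one column that contains only NaN values; for such a slice NumPy returns NaN and issues a RuntimeWarning, which is silenced here −

```python
import warnings
import numpy as np

x = np.array([[1.0, np.nan, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, np.nan, 9.0]])

# Along axis 1 every row still has two valid values
row_medians = np.nanquantile(x, 0.5, axis=1)

# Along axis 0 the middle column is all NaN, so its result is NaN
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    col_medians = np.nanquantile(x, 0.5, axis=0)

print("Row medians:", row_medians)
print("Column medians:", col_medians)
```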

Example: Using Different Methods for Interpolation

In the following example, we have used different interpolation methods to compute the 50th percentile of an array, ignoring NaN values. We have demonstrated the linear, lower, higher, and midpoint methods −

import numpy as np
# input array with NaN values
x = np.array([1, 3, 5, np.nan, 9])
# applying nanquantile with different methods
result_linear = np.nanquantile(x, 0.5, method='linear')
result_lower = np.nanquantile(x, 0.5, method='lower')
result_higher = np.nanquantile(x, 0.5, method='higher')
result_midpoint = np.nanquantile(x, 0.5, method='midpoint')
print("Linear Method:", result_linear)
print("Lower Method:", result_lower)
print("Higher Method:", result_higher)
print("Midpoint Method:", result_midpoint)

Output

Following is the output of the above code −

Linear Method: 4.0
Lower Method: 3.0
Higher Method: 5.0
Midpoint Method: 4.0
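To see where these values come from, the four methods can be reproduced by hand. This is a sketch of the standard calculation on the sorted non-NaN values, not NumPy's internal implementation; the helper variable names are assumptions −

```python
import numpy as np

x = np.array([1, 3, 5, np.nan, 9])
q = 0.5

# Drop NaN values and sort: [1, 3, 5, 9]
valid = np.sort(x[~np.isnan(x)])

# Fractional position of the quantile within the sorted data
pos = q * (valid.size - 1)          # 0.5 * 3 = 1.5
lo, hi = int(np.floor(pos)), int(np.ceil(pos))
frac = pos - lo

manual_lower = valid[lo]                                     # left neighbour
manual_higher = valid[hi]                                    # right neighbour
manual_linear = valid[lo] + frac * (valid[hi] - valid[lo])   # interpolate
manual_midpoint = (valid[lo] + valid[hi]) / 2                # average of neighbours

print(manual_linear, manual_lower, manual_higher, manual_midpoint)
```

The quantile position 1.5 falls between the sorted values 3 and 5, so each method differs only in how it chooses a value between those two neighbours.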

Example: Plotting 'nanquantile()' Function

In the following example, we plot the 25th, 50th, and 75th percentiles computed by nanquantile() over a range of input values. Note that this input contains no NaN values, so nanquantile() returns the same results as quantile() here −

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1, 100)  # input range
y_25 = np.nanquantile(x, 0.25)
y_50 = np.nanquantile(x, 0.5)
y_75 = np.nanquantile(x, 0.75)
plt.plot(x, np.full_like(x, y_25), label="25th percentile")
plt.plot(x, np.full_like(x, y_50), label="50th percentile")
plt.plot(x, np.full_like(x, y_75), label="75th percentile")
plt.title("NanQuantile Function")
plt.xlabel("Input")
plt.ylabel("Quantile Value")
plt.legend()
plt.grid()
plt.show()

Output

The plot shows each quantile (25th, 50th, 75th) as a horizontal line, because a quantile is a single value computed from the whole input and does not vary along the x-axis −

NanQuantile Function Visualization