NumPy nanvar() Function



The NumPy nanvar() function computes the variance of array elements, a measure of how spread out the values are, while ignoring NaN values. By default, the variance is computed over the flattened array, but it can also be calculated along one or more specified axes.

In statistics, the variance is a measure of the spread of a data set. The formula is var = sum((x_i - mean)^2) / N, where x_i is each data point, mean is the mean of the data, and N is the number of data points. For the nanvar() function, NaN values are dropped first, so both the mean and N are taken over the non-NaN elements only.
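The formula can be checked by hand. The following is a minimal sketch, not NumPy's internal implementation: it filters out the NaN values, applies the formula directly, and compares the result with nanvar(). The variable names (valid, manual_var) are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Keep only the non-NaN entries, then apply the formula directly.
valid = x[~np.isnan(x)]
n = valid.size                        # N counts the non-NaN elements only
mean = valid.sum() / n
manual_var = ((valid - mean) ** 2).sum() / n

print("Manual variance:", manual_var)
print("np.nanvar      :", np.nanvar(x))
```

Both lines should print the same value, since the NaN entry contributes neither to the mean nor to N.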

For a one-dimensional array, the variance is computed over all elements excluding NaN. For multi-dimensional arrays, the variance is computed along the specified axis while ignoring NaN values.

Syntax

Following is the syntax of the NumPy nanvar() function −

numpy.nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>, mean=<no value>, correction=<no value>)

Parameters

Following are the parameters of the NumPy nanvar() function −

  • a: Input array or object that can be converted to an array. It can be a NumPy array, list, or a scalar value.
  • axis (optional): Axis or axes along which the variance is computed. Default is None, which means the variance is computed over the entire array.
  • dtype (optional): Data type to use in computing the variance. If None, float64 is used for integer input; for floating-point input, it is the same as the input array's type.
  • out (optional): A location into which the result is stored. If provided, it must have the same shape as the expected output.
  • ddof (optional): Delta Degrees of Freedom. The divisor used in the calculation is N - ddof, where N is the number of elements (excluding NaN). Default is 0.
  • keepdims (optional): If True, the reduced dimensions are retained as dimensions of size one in the output. Default is False.
  • where (optional): A boolean array specifying the elements to include in the calculation.
  • mean (optional): A precomputed mean, supplied to avoid recalculating it. It should have the shape it would have if computed with the same axis and keepdims=True.
  • correction (optional): An Array API compatible name for the ddof parameter. Only one of ddof and correction can be provided at the same time.
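As a rough illustration of two of these parameters, the sketch below uses keepdims and where on a small 2D array. The mask is a made-up example that excludes the last row, and the where argument of nanvar() is assumed to be available (it was added in NumPy 1.22).

```python
import numpy as np

x = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])

# keepdims=True keeps the reduced axis as a dimension of size one.
row_var = np.nanvar(x, axis=1, keepdims=True)
print("Shape with keepdims=True:", row_var.shape)  # (3, 1)

# where: an illustrative mask that leaves out the last row entirely.
mask = np.array([[True, True, True],
                 [True, True, True],
                 [False, False, False]])
masked_var = np.nanvar(x, axis=0, where=mask)
print("Column variance without the last row:", masked_var)
```

With the last row masked out, each column's variance is taken only over the non-NaN values in the first two rows.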

Return Values

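This function returns the variance of the input array, ignoring NaN values. With the default axis=None, the result is a single scalar value regardless of the number of input dimensions. When an axis is specified, the result is an array with that axis removed (or kept with size one if keepdims=True).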
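A quick way to see what comes back is to inspect the result directly. The sketch below uses an illustrative 2D array and checks the number of dimensions and the shape of the result for two different axis settings.

```python
import numpy as np

x = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])

total = np.nanvar(x)            # axis=None: one value for the whole array
per_row = np.nanvar(x, axis=1)  # axis=1: one value per row

print("Dimensions of full-array result:", np.ndim(total))  # 0
print("Shape of per-row result:", per_row.shape)           # (3,)
```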

Example

Following is a basic example to compute the variance of an array using the NumPy nanvar() function −

import numpy as np
# input array with NaN values
x = np.array([1, 2, np.nan, 4, 5])
# applying nanvar
result = np.nanvar(x)
print("Variance Result (ignoring NaN):", result)

Output

Following is the output of the above code −

Variance Result (ignoring NaN): 2.5

Example: Specifying an Axis

The nanvar() function can compute the variance along a specific axis of a multi-dimensional array while ignoring NaN values. In the following example, we have computed the variance along axis 0 (columns) and axis 1 (rows) of a 2D array −

import numpy as np
# 2D array with NaN values
x = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
# applying nanvar along axis 0 (columns)
result_axis0 = np.nanvar(x, axis=0)
# applying nanvar along axis 1 (rows)
result_axis1 = np.nanvar(x, axis=1)
print("Variance along axis 0 (ignoring NaN):", result_axis0)
print("Variance along axis 1 (ignoring NaN):", result_axis1)

Output

Following is the output of the above code −

Variance along axis 0 (ignoring NaN): [6.   9.   2.25]
Variance along axis 1 (ignoring NaN): [0.25       1.         0.66666667]

Example: Usage of 'ddof' Parameter

The ddof (Delta Degrees of Freedom) parameter adjusts the divisor used in the variance calculation, which is N - ddof, where N is the number of non-NaN elements. By default, ddof=0, which gives the population variance; setting ddof=1 divides by N - 1 and gives the sample variance. In the following example, we have computed the variance with ddof=1 −

import numpy as np
# input array with NaN values
x = np.array([1, 2, np.nan, 4, 5])
# applying nanvar with ddof=1
result = np.nanvar(x, ddof=1)
print("Variance with ddof=1 (ignoring NaN):", result)

Output

Following is the output of the above code −

Variance with ddof=1 (ignoring NaN): 3.3333333333333335
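To make the effect of the divisor concrete, the following sketch compares the ddof=0 and ddof=1 results for the same array. It only assumes that N counts the non-NaN elements, as described for the ddof parameter; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, 5])
n = np.count_nonzero(~np.isnan(x))   # number of non-NaN elements

population_var = np.nanvar(x)        # divisor is n
sample_var = np.nanvar(x, ddof=1)    # divisor is n - 1

# The two results differ only by the factor n / (n - 1).
print(np.isclose(sample_var, population_var * n / (n - 1)))
```

The check should print True, because changing ddof changes only the divisor and not the sum of squared deviations.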

Example: Plotting 'nanvar()' Function

In the following example, we visualize the result of the nanvar() function. Every tenth element of the input is set to NaN, the variance of the remaining values is computed once, and that single value is drawn as a horizontal line over the input range −

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
x[::10] = np.nan  # introduce NaN values
# compute variance ignoring NaN
y = np.nanvar(x)
plt.plot(x, np.full_like(x, y, dtype=np.float64), label="Variance (ignoring NaN)")
plt.title("Variance Function (ignoring NaN)")
plt.xlabel("Input")
plt.ylabel("Variance Value")
plt.legend()
plt.grid()
plt.show()

Output

The plot shows the variance as a horizontal line over the input range, with gaps where the input is NaN −

[Figure: Variance Visualization (ignoring NaN)]