An Overview of Indexing Methods for Numpy Arrays

Author

zenggyu

Published

2018-11-22

Abstract

Explains various indexing methods for numpy arrays.

Introduction

Being familiar with the rules of indexing is essential to working fluently with numpy arrays. Although the official documentation defines the rules in detail, it can be overwhelming for beginners, because many of the details are relevant only to exotic use cases or backward compatibility. This post therefore leaves out those details and summarizes the most essential rules of numpy array indexing. More specifically, it concentrates on two types of indexing: basic indexing and advanced indexing (also known as fancy indexing).

Before diving in, it is useful to note that:

numpy arrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj is the selection object;
which type of indexing occurs depends on the type of selection object;
in Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN], and the latter is just syntactic sugar for the former¹;
since numpy array indexing bears some resemblance to standard Python list indexing, I will focus on the differences and omit the similarities where possible.

¹ I find this to be particularly useful to simplify the mental model for indexing. This is because to some extent it makes indexing a numpy array similar to indexing a regular list.
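The equivalence between the two spellings can be checked directly. The following is a minimal sketch; the array `x` mirrors the demonstration array constructed in the next section, and the names `a`, `b`, `c`, `d`, and `idx` are only illustrative.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

# A comma-separated index is packed into a tuple before numpy sees it,
# so both spellings select the same element.
a = x[1, 2, 3]
b = x[(1, 2, 3)]
print(a == b)  # True

# Because the index is an ordinary tuple, it can be built ahead of time.
idx = (slice(None), 0, slice(1, 3))  # same as writing [:, 0, 1:3]
c = x[idx]
d = x[:, 0, 1:3]
print(np.array_equal(c, d))  # True
```

Building the selection tuple explicitly is handy when the index has to be computed at run time.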

Example for demonstration

Here I construct an array which will be used for demonstration in later sections.

import numpy as np

x = np.arange(60).reshape((3, 4, 5))

x
# array([[[ 0,  1,  2,  3,  4],
#         [ 5,  6,  7,  8,  9],
#         [10, 11, 12, 13, 14],
#         [15, 16, 17, 18, 19]],
#        
#        [[20, 21, 22, 23, 24],
#         [25, 26, 27, 28, 29],
#         [30, 31, 32, 33, 34],
#         [35, 36, 37, 38, 39]],
#        
#        [[40, 41, 42, 43, 44],
#         [45, 46, 47, 48, 49],
#         [50, 51, 52, 53, 54],
#         [55, 56, 57, 58, 59]]])

Basic indexing

Basic indexing returns a view of the data (or a single scalar when every dimension is indexed with an integer). This type of indexing occurs when obj is either:

an integer;
a slice object (i.e., start:stop:step);
an ellipsis object (i.e., ...)²;
a newaxis object (i.e., numpy.newaxis)³;
a tuple of the above⁴.

² ... expands to the number of : objects needed to make a selection tuple of the same length as the number of dimensions of the array to be indexed. There may only be a single ellipsis present.

³ Each numpy.newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension. The added dimension is the position of the newaxis object in the selection tuple. Another way to expand the dimension of an array would be to use the function numpy.expand_dims().

⁴ Note that a tuple that contains at least one sequence object (tuple(), list(), range(), etc.) or numpy array triggers advanced indexing.
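The three notes above can be illustrated with a short sketch. It rebuilds the demonstration array `x`; the names `y`, `basic`, and `advanced` are hypothetical and used only for this example.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

# `...` stands in for as many `:` as are needed to reach x.ndim entries.
print(np.array_equal(x[..., 0], x[:, :, 0]))  # True
print(np.array_equal(x[0, ...], x[0, :, :]))  # True

# `np.newaxis` inserts a unit-length dimension at its position in the tuple.
y = x[:, np.newaxis, :, :]
print(y.shape)                                       # (3, 1, 4, 5)
print(np.array_equal(y, np.expand_dims(x, axis=1)))  # True

# A list inside the selection tuple switches to advanced indexing:
# the values match the slice version, but the result is a copy.
basic = x[:, 0:1, :]
advanced = x[:, [0], :]
print(np.array_equal(basic, advanced))  # True
print(np.shares_memory(basic, x))       # True
print(np.shares_memory(advanced, x))    # False
```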

Here are some essential rules to remember:

If the number of objects in the selection tuple is less than the number of dimensions of the array to be indexed, it is assumed that all elements of the remaining dimensions are selected. E.g.:

np.array_equal(x[:, :], x[:, :, :])
# True

If the selection tuple has all entries : except the \(p\)-th entry which is an integer i, then the returned array has 1 fewer dimension than the original array. E.g.:

x[:, 0, :].shape
# (3, 5)

x[:, 0, :]
# array([[ 0,  1,  2,  3,  4],
#        [20, 21, 22, 23, 24],
#        [40, 41, 42, 43, 44]])

If the selection tuple has all entries : except the \(p\)-th entry which is a slice object i:j:k, then the returned array has the same number of dimensions as the original array, formed by concatenating the sub-arrays returned by integer indexing of elements i, i + k, ..., i + (m - 1) k < j. E.g.:

x[:, 0:1, :].shape
# (3, 1, 5)

x[:, 0:1, :]
# array([[[ 0,  1,  2,  3,  4]],
# 
#        [[20, 21, 22, 23, 24]],
# 
#        [[40, 41, 42, 43, 44]]])
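The slice rule is easier to see with a step larger than one. The sketch below assumes the same demonstration array; `s` and `stacked` are illustrative names. It rebuilds the sliced result from integer-indexed sub-arrays, as the rule describes.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

# 0:4:2 along axis 1 selects indices 0 and 2 (i = 0, j = 4, k = 2).
s = x[:, 0:4:2, :]
print(s.shape)  # (3, 2, 5)

# The same result, built by joining the sub-arrays obtained with
# integer indexing at positions 0 and 2 along a new axis 1.
stacked = np.stack([x[:, 0, :], x[:, 2, :]], axis=1)
print(np.array_equal(s, stacked))  # True
```

Note that `s` is still a view of `x`, whereas `stacked` is a new array.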

Advanced indexing

Advanced indexing always returns a copy of the data. This type of indexing occurs when obj is either:

a numpy array of integer type;
a numpy array of boolean type;
a tuple containing at least one of the above.

Here are some rules to remember when the indexing array is of integer type:

The indexing arrays in the selection tuple must have the same shape or can be broadcast into the same shape, which also defines the shape of the resultant array. E.g.:

x[np.array([[0, 1], [1, 2]]), np.array([[1, 2], [2, 3]]), np.array([3, 4])]
# array([[ 8, 34],
#        [33, 59]])

# Note that `np.array([3, 4])` is first broadcast to `np.array([[3, 4], [3, 4]])`. Therefore, `x[np.array([[0, 1], [1, 2]]), np.array([[1, 2], [2, 3]]), np.array([3, 4])]` is equivalent to `x[np.array([[0, 1], [1, 2]]), np.array([[1, 2], [2, 3]]), np.array([[3, 4], [3, 4]])]`.
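The broadcasting step can be made explicit with `np.broadcast_arrays`, which applies the same broadcasting rules to a group of arrays. This is only a sketch of the idea; the names `i`, `j`, `k`, `bi`, `bj`, and `bk` are illustrative.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

i = np.array([[0, 1], [1, 2]])
j = np.array([[1, 2], [2, 3]])
k = np.array([3, 4])

# Broadcast the three index arrays to a common shape by hand.
bi, bj, bk = np.broadcast_arrays(i, j, k)
print(bk)
# [[3 4]
#  [3 4]]

# The result has the broadcast shape, and indexing with the
# pre-broadcast arrays gives the same values.
print(x[i, j, k].shape == bi.shape)               # True
print(np.array_equal(x[i, j, k], x[bi, bj, bk]))  # True
```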

Each set of elements from the same position of the indexing arrays forms a tuple that indexes an element of the indexed array, which then becomes the element of the resultant array in the same position. E.g.:

# In the last example, the four elements of the resultant array are taken from positions `(0, 1, 3)`, `(1, 2, 4)`, `(1, 2, 3)`, `(2, 3, 4)` of `x`

x[0, 1, 3]
# 8

x[1, 2, 4]
# 34

x[1, 2, 3]
# 33

x[2, 3, 4]
# 59

If the number of objects in the selection tuple is less than the number of dimensions of the array to be indexed, it is assumed that all elements of the remaining dimensions are selected.
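As a small illustration of this last rule, the sketch below indexes only the leading dimensions of the demonstration array with integer arrays; `one` and `two` are hypothetical names.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

# One index array for axis 0; axes 1 and 2 are taken whole.
one = x[np.array([2, 0])]
print(one.shape)                      # (2, 4, 5)
print(np.array_equal(one[0], x[2]))   # True

# Two index arrays for axes 0 and 1; axis 2 is taken whole, so each
# (row, column) pair selects a full length-5 vector.
two = x[np.array([0, 2]), np.array([1, 3])]
print(two)
# [[ 5  6  7  8  9]
#  [55 56 57 58 59]]
```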

Here are some rules to remember when the indexing array is of boolean type:

If the number of objects in the selection tuple is less than the number of dimensions of the array to be indexed, it is assumed that all elements of the remaining dimensions are selected.
Boolean arrays must be of the same shape as the initial dimensions of the array being indexed.

x[np.array([[True, True, True, False], [True, True, False, False], [False, False, False, True]])]
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [25, 26, 27, 28, 29],
#        [55, 56, 57, 58, 59]])

In particular, if the indexing boolean array has the same shape as the indexed array, then the resultant array will be a one-dimensional array filled with the elements of the indexed array whose corresponding elements in the indexing array are True.

x > 20
# array([[[False, False, False, False, False],
#         [False, False, False, False, False],
#         [False, False, False, False, False],
#         [False, False, False, False, False]],
#        
#        [[False,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True]],
#        
#        [[ True,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True],
#         [ True,  True,  True,  True,  True]]])

x[x > 20]
# array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
#        38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
#        55, 56, 57, 58, 59])
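One way to relate boolean indexing to integer indexing is through `np.nonzero`, which converts a mask into one integer index array per dimension. The sketch below assumes the same demonstration array; `mask` and `idx` are illustrative names.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))
mask = x > 20

# np.nonzero returns a tuple with one integer index array per dimension
# of the mask, listing the positions where the mask is True.
idx = np.nonzero(mask)
print(len(idx), idx[0].shape)  # 3 (39,)

# Indexing with the mask matches indexing with those integer arrays.
print(np.array_equal(x[mask], x[idx]))  # True
```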

Additional notes

Basic indexing only permits access to individual elements and regular slices of a numpy array. Advanced indexing, on the other hand, permits repeated access to arbitrary elements of an array, which makes it possible to select elements that follow no regular pattern. An important practical difference is that basic indexing returns a view of the data, while advanced indexing returns a copy. A view requires less memory, but it is also more likely to cause confusing bugs when shared data are modified unintentionally.
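The difference between a view and a copy can be checked with `np.shares_memory`. The following sketch selects the same elements in both ways and then modifies each result; `view` and `copy` are illustrative names.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))

view = x[:, 0, :]          # basic indexing
copy = x[[0, 1, 2], 0, :]  # advanced indexing, same elements

print(np.array_equal(view, copy))                            # True
print(np.shares_memory(view, x), np.shares_memory(copy, x))  # True False

# Writing through the view changes x; writing to the copy does not.
view[0, 0] = -1
copy[0, 1] = -2
print(x[0, 0, 0], x[0, 0, 1])  # -1 1
```

When an independent array is needed from basic indexing, calling `.copy()` on the result avoids modifying the original by accident.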