Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals and for diagnosing diseased states.[1] SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison of the template with itself. This guarantees that the probabilities $C_{i}'^{m}(r)$ are never zero, so it is always possible to take their logarithm. Because these self-matches lower ApEn values, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. However, since SampEn makes direct use of the correlation integrals, it is not a real measure of information but an approximation. A tutorial covering the foundations, the differences from ApEn, and a step-by-step application is available.[2]

There is a multiscale version of SampEn as well, suggested by Costa and others.[3] SampEn can be used in biomedical and biomechanical research, for example to evaluate postural control.[4][5]


Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity,[1] but unlike ApEn it does not count self-matches. For a given embedding dimension $m$, tolerance $r$ and number of data points $N$, SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length $m$ have distance $<r$, then two sets of simultaneous data points of length $m+1$ also have distance $<r$. It is denoted $SampEn(m,r,N)$, or $SampEn(m,r,\tau ,N)$ when the sampling time $\tau$ is included.

Now assume we have a time-series data set of length $N$, $\{x_{1},x_{2},x_{3},\ldots ,x_{N}\}$, with a constant time interval $\tau$. We define a template vector of length $m$, such that $X_{m}(i)=\{x_{i},x_{i+1},x_{i+2},\ldots ,x_{i+m-1}\}$, and take the distance function $d[X_{m}(i),X_{m}(j)]$ ($i\neq j$) to be the Chebyshev distance (although any distance function could be used, including the Euclidean distance). We define the sample entropy to be

$$SampEn=-\ln {A \over B}$$

where

$A$ = number of template vector pairs having $d[X_{m+1}(i),X_{m+1}(j)]<r$

$B$ = number of template vector pairs having $d[X_{m}(i),X_{m}(j)]<r$

It is clear from the definition that $A$ will always have a value smaller than or equal to $B$. Therefore, $SampEn(m,r,\tau )$ will always be either zero or positive. A smaller value of $SampEn$ also indicates more self-similarity in the data set or less noise.
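
The counting behind $A$ and $B$ can be illustrated with a minimal Python sketch on a toy series. The helper names `chebyshev` and `count_pairs` and the sample values are hypothetical, chosen only for illustration. The sketch also assumes the common convention of using the first $N-m$ templates for both lengths, so that $A$ and $B$ are counted over the same indices; the text above does not fix this detail.

```python
from itertools import combinations
from math import log

def chebyshev(u, v):
    """Largest absolute coordinate-wise difference between two templates."""
    return max(abs(a - b) for a, b in zip(u, v))

def count_pairs(series, length, n_templates, r):
    """Count unordered template pairs (i < j) with Chebyshev distance below r."""
    templates = [series[i:i + length] for i in range(n_templates)]
    return sum(1 for u, v in combinations(templates, 2) if chebyshev(u, v) < r)

series = [1, 2, 1, 2, 1, 2, 1, 2]  # strictly periodic toy signal (illustrative)
m, r = 2, 0.5
n_templates = len(series) - m      # assumption: same template count for m and m + 1

B = count_pairs(series, m, n_templates, r)      # matches at length m
A = count_pairs(series, m + 1, n_templates, r)  # matches at length m + 1
sampen = -log(A / B)
```

For this strictly periodic series every template pair that matches at length $m$ still matches at length $m+1$, so $A=B$ and the sample entropy is zero, the lowest possible value.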

Generally we take the value of $m$ to be $2$ and the value of $r$ to be $0.2\times std$, where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an $r$ value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to $0.2\times std$ for a very large population.
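
As a rough illustration, the tolerance can be derived from a standard deviation as follows. The function name `tolerance` and the toy values are hypothetical, and the sketch computes the standard deviation from the supplied sample itself, whereas the text recommends a value obtained from a very large dataset.

```python
from statistics import pstdev

def tolerance(reference_data, factor=0.2):
    """Return r = factor * std, with std the population standard deviation."""
    return factor * pstdev(reference_data)

# Toy values for illustration only; in practice the reference data
# should be a very large dataset.
reference = [1, 2, 3, 4, 5]
r = tolerance(reference)
```

The resulting `r` would then be passed, together with $m=2$, to whichever sample entropy routine is used.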

Multiscale SampEn

The definition mentioned above is a special case of multiscale SampEn with $\delta =1$, where $\delta$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of $\delta$. The modified template vector is defined as $X_{m,\delta }(i)=\{x_{i},x_{i+\delta },x_{i+2\times \delta },\ldots ,x_{i+(m-1)\times \delta }\}$ and SampEn can be written as $SampEn\left(m,r,\delta \right)=-\ln {A_{\delta } \over B_{\delta }}$, where $A_{\delta }$ and $B_{\delta }$ are calculated as before.
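
The template construction with a skipping parameter can be sketched in Python as follows. The function names are hypothetical, and the choice of $N-m\times \delta$ templates for both lengths is an assumption made so that every index is valid for templates of length $m+1$; the text above does not specify it.

```python
from itertools import combinations
from math import log

def delayed_templates(series, m, delta, count):
    """Return `count` templates x_i, x_{i+delta}, ..., x_{i+(m-1)*delta}."""
    return [[series[i + k * delta] for k in range(m)] for i in range(count)]

def count_matches(templates, r):
    """Count unordered template pairs whose Chebyshev distance is below r."""
    return sum(
        1
        for u, v in combinations(templates, 2)
        if max(abs(a - b) for a, b in zip(u, v)) < r
    )

def sample_entropy_delta(series, m, r, delta=1):
    # Assumption: use only start indices valid for templates of length m + 1.
    count = len(series) - m * delta
    B = count_matches(delayed_templates(series, m, delta, count), r)
    A = count_matches(delayed_templates(series, m + 1, delta, count), r)
    return -log(A / B)  # raises an error if A or B is zero

toy = [1, 2, 1, 2, 1, 2, 1, 2]  # illustrative periodic series
value_delta_1 = sample_entropy_delta(toy, m=2, r=0.5, delta=1)
value_delta_2 = sample_entropy_delta(toy, m=2, r=0.5, delta=2)
```

With `delta=1` the function reduces to the ordinary definition under the same template-count assumption.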


Sample entropy can be implemented easily in many different programming languages. Below is an example written in Python.

from itertools import combinations
from math import log

def construct_templates(timeseries_data: list, m: int = 2):
    # All contiguous windows of length m.
    num_windows = len(timeseries_data) - m + 1
    return [timeseries_data[x : x + m] for x in range(0, num_windows)]

def get_matches(templates: list, r: float):
    # Count unordered pairs of distinct templates that match; self-matches are excluded.
    return len(
        list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates, 2)))
    )

def is_match(template_1: list, template_2: list, r: float):
    # Chebyshev distance below r: every coordinate differs by less than r.
    return all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])

def sample_entropy(timeseries_data: list, window_size: int, r: float):
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    # Undefined (raises an error) when A or B is zero.
    return -log(A / B)

An equivalent example using NumPy:

import numpy

def construct_templates(timeseries_data, m):
    # All contiguous windows of length m, stacked as rows of a 2-D array.
    num_windows = len(timeseries_data) - m + 1
    return numpy.array([timeseries_data[x : x + m] for x in range(0, num_windows)])

def get_matches(templates, r):
    # Count unordered pairs of distinct templates that match.
    return len(
        list(filter(lambda x: is_match(x[0], x[1], r), combinations(templates)))
    )

def combinations(x):
    # Index pairs (i, j) with i < j, taken from the strict upper triangle.
    idx = numpy.stack(numpy.triu_indices(len(x), k=1), axis=-1)
    return x[idx]

def is_match(template_1, template_2, r):
    # Chebyshev distance below r.
    return numpy.all([abs(x - y) < r for (x, y) in zip(template_1, template_2)])

def sample_entropy(timeseries_data, window_size, r):
    B = get_matches(construct_templates(timeseries_data, window_size), r)
    A = get_matches(construct_templates(timeseries_data, window_size + 1), r)
    return -numpy.log(A / B)
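
The NumPy listing above still loops over template pairs in Python. A possible alternative, given here only as a sketch, computes all pairwise Chebyshev distances at once with broadcasting. The function name `sample_entropy_vectorized` is hypothetical, and two choices are assumptions rather than part of the definition above: the first $N-m$ templates are used for both lengths, and NaN is returned when $A$ or $B$ is zero.

```python
import numpy as np

def sample_entropy_vectorized(series, m, r):
    x = np.asarray(series, dtype=float)
    n = len(x) - m  # assumption: same number of templates for m and m + 1
    if n < 2:
        raise ValueError("series too short for the chosen m")

    def count(length):
        # Rows are the templates x[i : i + length] for i = 0 .. n - 1.
        templates = np.array([x[i:i + length] for i in range(n)])
        # Pairwise Chebyshev distances via broadcasting, shape (n, n).
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        # Keep only pairs with i < j, which excludes self-matches.
        upper = np.triu_indices(n, k=1)
        return int(np.count_nonzero(dist[upper] < r))

    B = count(m)
    A = count(m + 1)
    if A == 0 or B == 0:
        return float("nan")  # assumption: report undefined values as NaN
    return -np.log(A / B)

periodic = sample_entropy_vectorized([1, 2, 1, 2, 1, 2, 1, 2], m=2, r=0.5)
no_match = sample_entropy_vectorized([1, 2, 3, 4, 5, 6, 7, 8], m=2, r=0.5)
```

The distance matrix needs memory proportional to the square of the number of templates, so this form suits short series; for long recordings a loop or chunked computation may be preferable.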

Examples written in other languages are also available:

  • Matlab
  • R
  • Rust

See also

  • Kolmogorov complexity
  • Approximate entropy


  1. Richman, JS; Moorman, JR (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology. 278 (6): H2039–49. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.
  2. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. Bibcode:2019Entrp..21..541D. doi:10.3390/e21060541. PMC 7515030. PMID 33267255.
  3. Costa, Madalena; Goldberger, Ary; Peng, C.-K. (2005). "Multiscale entropy analysis of biological signals". Physical Review E. 71 (2): 021906. Bibcode:2005PhRvE..71b1906C. doi:10.1103/PhysRevE.71.021906. PMID 15783351.
  4. Błażkiewicz, Michalina; Kędziorek, Justyna; Hadamus, Anna (March 2021). "The Impact of Visual Input and Support Area Manipulation on Postural Control in Subjects after Osteoporotic Vertebral Fracture". Entropy. 23 (3): 375. Bibcode:2021Entrp..23..375B. doi:10.3390/e23030375. PMC 8004071. PMID 33804770.
  5. Hadamus, Anna; Białoszewski, Dariusz; Błażkiewicz, Michalina; Kowalska, Aleksandra J.; Urbaniak, Edyta; Wydra, Kamil T.; Wiaderna, Karolina; Boratyński, Rafał; Kobza, Agnieszka; Marczyński, Wojciech (February 2021). "Assessment of the Effectiveness of Rehabilitation after Total Knee Replacement Surgery Using Sample Entropy and Classical Measures of Body Balance". Entropy. 23 (2): 164. Bibcode:2021Entrp..23..164H. doi:10.3390/e23020164. PMC 7911395. PMID 33573057.