Calculating kinships

In population genetics, the term “kinship” typically refers to the level of genetic relatedness between a pair of individuals, and is represented as a number between 0 and 1. More on calculating kinships can be found here.

The Pedigree and PloidPedigree classes provide several methods for kinship calculation.

Calculating all-pairwise kinships

You can calculate the kinship coefficients between all pairs of individuals in your pedigree by calling the calculate_kinship method. It returns a dictionary that maps each vertex to its index in the matrix, together with the kinship coefficients as a NumPy matrix:

import itertools

# Parse the input file given by the specified file path
# (small_pedigree_path is the path to your pedigree file)
pedigree = Pedigree.get_pedigree_graph_from_file(filepath=small_pedigree_path)
# Calculate the kinship coefficients
(vertex_to_index, direct_kinship_matrix) = pedigree.calculate_kinship()
# Look up the kinship coefficient for every pair of individuals
for vertex_1, vertex_2 in itertools.combinations(pedigree, 2):
    kinship = direct_kinship_matrix[vertex_to_index[vertex_1], vertex_to_index[vertex_2]]

Every kinship value is stored as a 16-bit float value.
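To get a feel for what 16-bit storage means for precision, here is a minimal sketch that round-trips a few values through IEEE 754 half precision using only the standard library. The helper name to_half_precision is hypothetical and not part of the library, and the sketch assumes the matrix uses the standard half-precision format.

```python
import struct


def to_half_precision(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code of the struct module is a 16-bit float.
    return struct.unpack("e", struct.pack("e", value))[0]


# Powers of two such as 0.25 survive unchanged; other values are rounded.
for value in (0.25, 0.1, 2 ** -10):
    print(value, "->", to_half_precision(value))
```

Values that are sums of a few powers of two are stored exactly, while other values keep roughly three significant decimal digits.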

Note that the memory required by this function grows quadratically with the size of the pedigree, because the matrix holds one value for every pair of individuals. If your pedigree is large, you may run out of memory.
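As a rough guide, the matrix size can be estimated before running the calculation. The sketch below assumes a dense square matrix with one 16-bit (2-byte) value per pair and ignores any working memory the function needs on top of that; kinship_matrix_bytes is a hypothetical helper, not a library function.

```python
def kinship_matrix_bytes(num_individuals, bytes_per_value=2):
    """Size in bytes of a dense num_individuals x num_individuals matrix."""
    return num_individuals * num_individuals * bytes_per_value


# Print the estimated matrix size for a few pedigree sizes.
for n in (1_000, 10_000, 100_000):
    gib = kinship_matrix_bytes(n) / 2 ** 30
    print(f"{n:>7} individuals: {gib:.3f} GiB")
```

Under these assumptions, a tenfold increase in pedigree size means a hundredfold increase in matrix size.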

Proband kinship calculation

In some cases, we only need the kinship coefficients among a given list of individuals, not the entire pedigree. If the pedigree is small, we can still use the previous method. When the pedigree is large, however, calculating all pairwise kinships may not be feasible. In that case, we can use the calculate_probands_kinship function, which is a much more efficient version of the previous algorithm when we are only interested in a few kinship coefficient values.

API

The calculate_probands_kinship function takes two additional parameters. The first parameter, probands, is the collection of individuals for which the kinship coefficients are calculated. There are no restrictions on these individuals; they only need to be valid vertices in the pedigree. If probands is not specified, the sink vertices are used as probands.

The second parameter, mode, specifies the running mode of the algorithm. The default option, SPEED, offers the best running time and should be used when memory is not an issue. When memory is tightly constrained, the MEMORY option can be used to reduce memory usage at the cost of increased running time. On average, the MEMORY option can save up to 25% of memory and takes around 40-45% longer to run.

# Calculating proband kinship coefficients
kinship_matrix_default = pedigree.calculate_probands_kinship()
kinship_matrix_custom = pedigree.calculate_probands_kinship(probands={1, 3}, mode=KinshipMode.MEMORY)

Algorithm description

The main idea of the algorithm is described in section 4.2 of the paper mentioned above. In outline, it works as follows:

Form a queue of vertices for which the kinship coefficients should be calculated.
While the queue is not empty, take a vertex from the queue and do the following:
1. Calculate the kinship coefficient between this vertex and every other vertex in the matrix.
2. Check whether there are parent vertices of this vertex such that all their children are in the matrix. If there are such non-proband parents, remove them from the kinship matrix.
3. If there are any children of this vertex such that all their parents are in the matrix, add them to the queue.

This description is simplified, but it conveys the general idea of how the function works.
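The steps above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: it assumes a diploid pedigree given as a dictionary that maps every vertex to a tuple of its parents (empty for founders, two parents otherwise), and it uses the standard kinship recursion in which a founder's self-kinship is 0.5. The function name and the input format are hypothetical.

```python
from collections import deque


def probands_kinship_sketch(parents, probands):
    """Return {(u, v): kinship} for every ordered pair of probands."""
    probands = set(probands)
    children = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    kinship = {}  # kinship[u][v] for the vertices currently kept
    processed = set()
    # Founders have no parents, so they can be processed first.
    queue = deque(v for v, ps in parents.items() if not ps)

    while queue:
        v = queue.popleft()
        ps = parents[v]
        # Step 1: kinship between v and every vertex already in the matrix.
        row = {u: sum(kinship[p][u] for p in ps) / 2 for u in kinship}
        row[v] = 0.5 * (1 + kinship[ps[0]][ps[1]]) if ps else 0.5
        for u, value in row.items():
            if u != v:
                kinship[u][v] = value
        kinship[v] = row
        processed.add(v)
        # Step 2: drop non-proband parents whose children are all processed.
        for p in set(ps):
            if p not in probands and children[p] <= processed:
                del kinship[p]
                for other_row in kinship.values():
                    del other_row[p]
        # Step 3: enqueue children whose parents have all been processed.
        for c in children[v]:
            if all(p in processed for p in parents[c]):
                queue.append(c)

    return {(u, v): kinship[u][v] for u in probands for v in probands}


# Two founders (1, 2), their children (3, 4), and a child of 3 and 4 (5).
example = {1: (), 2: (), 3: (1, 2), 4: (1, 2), 5: (3, 4)}
result = probands_kinship_sketch(example, probands={3, 4, 5})
print(result[(3, 4)], result[(5, 5)])
```

In this example, the founders are removed from the matrix as soon as both of their children have been processed, which is what keeps memory usage bounded by the current "frontier" of the pedigree rather than by its total size.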
