cardsort.analysis

Functions

get_distance_matrix(→ numpy.ndarray)

Return condensed distance matrix from kardsort data.

create_dendrogram(→ None)

Plot hierarchical clustering of kardsort data as dendrogram.

get_cluster_labels(→ Union[pandas.DataFrame, None])

Return labels users created for clusters including a given list of cards.

Module Contents

cardsort.analysis.get_distance_matrix(df: pandas.DataFrame) → numpy.ndarray

Return condensed distance matrix from kardsort data.

Parameters:

df (pandas.DataFrame) –

Columns:

Name: card_id, dtype: int64
Name: card_label, dtype: object
Name: category_id, dtype: int64
Name: category_label, dtype: object
Name: user_id, dtype: int64

These columns correspond to the ‘Casolysis Data (.csv) - Recommended’ export from kardsort.com.

Returns:

out – A condensed distance matrix (a flat array containing the upper triangle of the full distance matrix) representing the pairwise distances between all cards.

Return type:

ndarray
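
To make the return value concrete, here is a minimal sketch in plain Python. It is not the library's implementation. It assumes the distance between two cards is 1 minus the fraction of users who placed both cards in the same category. The function name `condensed_distances` and the sample rows are hypothetical. The entries follow the upper triangle of the square matrix, read row by row.

```python
from itertools import combinations


def condensed_distances(rows):
    """Return (card_labels, distances) for card sort rows.

    rows: iterable of (card_label, category_id, user_id) tuples.
    Assumption: distance = 1 - (users who grouped the pair) / (all users).
    """
    cards = sorted({card for card, _, _ in rows})
    users = sorted({user for _, _, user in rows})

    # Map each (user, card) to the category the user placed the card in.
    placement = {(user, card): category for card, category, user in rows}

    distances = []
    # combinations() yields pairs in upper-triangle, row-by-row order.
    for card_a, card_b in combinations(cards, 2):
        together = sum(
            1
            for user in users
            if (user, card_a) in placement
            and placement[(user, card_a)] == placement.get((user, card_b))
        )
        distances.append(1 - together / len(users))
    return cards, distances


# Hypothetical data: two users sorting three cards.
rows = [
    ("Apple", 1, 1), ("Banana", 1, 1), ("Carrot", 2, 1),
    ("Apple", 1, 2), ("Banana", 2, 2), ("Carrot", 2, 2),
]
cards, distances = condensed_distances(rows)
# Pair order: (Apple, Banana), (Apple, Carrot), (Banana, Carrot)
```

For n cards the condensed form holds n(n-1)/2 entries. If the full square matrix is needed, `scipy.spatial.distance.squareform` converts between the two layouts.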

cardsort.analysis.create_dendrogram(df, distance_matrix=None, count='fraction', linkage='average', color_threshold=None) → None

Plot hierarchical clustering of kardsort data as dendrogram.

Parameters:
  • df (pandas.DataFrame) –

    Columns:

    Name: card_id, dtype: int64
    Name: card_label, dtype: object
    Name: category_id, dtype: int64
    Name: category_label, dtype: object
    Name: user_id, dtype: int64

    These columns correspond to the ‘Casolysis Data (.csv) - Recommended’ export from kardsort.com. The dataframe is used to extract leaf labels and, if no distance_matrix is provided, to calculate the distance matrix.

  • distance_matrix (ndarray, optional) – A condensed distance matrix: a flat array containing the upper triangle of the distance matrix. Passing a pre-calculated matrix saves time when generating the dendrogram. If not specified, a new distance matrix is calculated from df.

  • count (str, optional) –

    How similarity is displayed.

    'fraction': Similarity is displayed as a fraction between 0 and 1.

    'absolute': Similarity is displayed as absolute counts from 0 to n = number of users.

  • linkage (str, optional) –

    Linkage method used to compute the distance between two clusters.

    'average': Unweighted average distance between all elements in the two clusters (UPGMA).

    'complete': Distance between the two elements, one from each cluster, that are farthest from each other.

    'single': Distance between the two elements, one from each cluster, that are closest to each other.

  • color_threshold (double, optional) – Level at which the dendrogram is cut for coloring; branches below this level are colored by cluster. Can be given as a fraction (0 to 1) or as an absolute value (at most n = number of users). The default cut is at 75%.
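
The three linkage options differ only in how they reduce the pairwise distances between two clusters to a single number. The following is a minimal sketch in plain Python that illustrates the definitions above. It does not show how cardsort computes linkage internally. The helper `cluster_distance` and the sample distances are hypothetical.

```python
from itertools import product


def cluster_distance(cluster_a, cluster_b, dist, method="average"):
    """Distance between two clusters under a given linkage method.

    dist: dict mapping frozenset({card_x, card_y}) to a pairwise distance.
    """
    # All distances between one element of each cluster.
    pairwise = [dist[frozenset((a, b))] for a, b in product(cluster_a, cluster_b)]
    if method == "average":
        return sum(pairwise) / len(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "single":
        return min(pairwise)
    raise ValueError(f"unknown linkage method: {method!r}")


# Hypothetical pairwise distances between four cards.
dist = {
    frozenset(("Apple", "Carrot")): 0.75,
    frozenset(("Apple", "Potato")): 1.0,
    frozenset(("Banana", "Carrot")): 0.5,
    frozenset(("Banana", "Potato")): 0.75,
}
fruit = ["Apple", "Banana"]
vegetables = ["Carrot", "Potato"]

average = cluster_distance(fruit, vegetables, dist, "average")    # 0.75
complete = cluster_distance(fruit, vegetables, dist, "complete")  # 1.0
single = cluster_distance(fruit, vegetables, dist, "single")      # 0.5
```

Single linkage tends to chain loosely related cards together, while complete linkage favors compact clusters. Average linkage sits between the two.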

cardsort.analysis.get_cluster_labels(df: pandas.DataFrame, cluster_cards: List[str], print_results: bool = True, return_df_results: bool = True) → pandas.DataFrame | None

Return labels users created for clusters including a given list of cards.

Parameters:
  • df (pandas.DataFrame) –

    Columns:

    Name: card_id, dtype: int64
    Name: card_label, dtype: object
    Name: category_id, dtype: int64
    Name: category_label, dtype: object
    Name: user_id, dtype: int64

    These columns correspond to the ‘Casolysis Data (.csv) - Recommended’ export from kardsort.com.

  • cluster_cards (list of str) – List of card labels for which to retrieve the user-generated cluster labels.

  • print_results (bool, optional) – If True, prints which users grouped the cards together and under which label.

  • return_df_results (bool, optional) – If True, returns a dataframe with the results.

Returns:

  • out (pandas.DataFrame (default)) –

    Columns:

    Name: user_id, int
    Name: cluster_label, str
    Name: cards, list of str

    Dataframe with one row for each user who clustered the given cards together, including category label and the full list of cards in that category.

  • OR

  • out (None) – Returned if return_df_results is False.
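
The lookup described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function `cluster_labels`, the tuple-based input, and the sample rows are hypothetical. The result keys mirror the documented output columns.

```python
from collections import defaultdict


def cluster_labels(rows, cluster_cards):
    """Return one result per user category that contains all cluster_cards.

    rows: iterable of (card_label, category_label, user_id) tuples.
    """
    # Collect the cards each user placed under each category label.
    categories = defaultdict(list)
    for card, category, user in rows:
        categories[(user, category)].append(card)

    wanted = set(cluster_cards)
    results = []
    for (user, category), cards in sorted(categories.items()):
        # Keep the category only if it holds every requested card.
        if wanted <= set(cards):
            results.append(
                {"user_id": user, "cluster_label": category, "cards": cards}
            )
    return results


# Hypothetical data: three users sorting three cards.
rows = [
    ("Apple", "Fruit", 1), ("Banana", "Fruit", 1), ("Carrot", "Veg", 1),
    ("Apple", "Food", 2), ("Banana", "Food", 2), ("Carrot", "Food", 2),
    ("Apple", "Snacks", 3), ("Banana", "Yellow", 3), ("Carrot", "Yellow", 3),
]
results = cluster_labels(rows, ["Apple", "Banana"])
```

Here users 1 and 2 grouped both cards together, under "Fruit" and "Food" respectively. User 3 split them, so no row is produced for that user.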