cardsort
========

.. py:module:: cardsort


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/cardsort/analysis/index


Attributes
----------

.. autoapisummary::

   cardsort.__version__


Functions
---------

.. autoapisummary::

   cardsort.create_dendrogram
   cardsort.get_distance_matrix
   cardsort.get_cluster_labels


Package Contents
----------------

.. py:function:: create_dendrogram(df, distance_matrix=None, count='fraction', linkage='average', color_threshold=None) -> None

   Plot hierarchical clustering of kardsort data as a dendrogram.

   :param df: Columns:

      * Name: card_id, dtype: int64
      * Name: card_label, dtype: object
      * Name: category_id, dtype: int64
      * Name: category_label, dtype: object
      * Name: user_id, dtype: int64

      These columns correspond to the 'Casolysis Data (.csv) - Recommended'
      export from kardsort.com. The dataframe is used to extract the leaf
      labels and, if no distance_matrix is provided, to calculate the
      distance matrix.
   :type df: pandas.DataFrame
   :param distance_matrix: A condensed distance matrix: a flat array containing
      the upper triangle of the distance matrix. A pre-calculated condensed
      distance matrix can be provided to save time when generating the
      dendrogram. If not specified, a new distance matrix is calculated
      from df.
   :type distance_matrix: ndarray, optional
   :param count: How similarity is displayed.

      'fraction'
          Similarity is displayed as a fraction between 0 and 1.
      'absolute'
          Similarity is displayed as absolute counts from 0 to n, where n is
          the number of users.
   :type count: str, optional
   :param linkage: Linkage method used to compute the distance between two
      clusters.

      'average'
          Unweighted average distance between all elements in the two
          clusters (UPGMA).
      'complete'
          Distance between the two elements, one from each cluster, that are
          farthest from each other.
      'single'
          Distance between the two elements, one from each cluster, that are
          closest to each other.
   :type linkage: str, optional
   :param color_threshold: Level at which to cut the dendrogram when coloring
      its branches. Can be a fraction (0 - 1) or an absolute value (<= n,
      where n is the number of users). The default cut is at 75%.
   :type color_threshold: double, optional


.. py:function:: get_distance_matrix(df: pandas.DataFrame) -> numpy.ndarray

   Return a condensed distance matrix from kardsort data.

   :param df: Columns:

      * Name: card_id, dtype: int64
      * Name: card_label, dtype: object
      * Name: category_id, dtype: int64
      * Name: category_label, dtype: object
      * Name: user_id, dtype: int64

      These columns correspond to the 'Casolysis Data (.csv) - Recommended'
      export from kardsort.com.
   :type df: pandas.DataFrame

   :returns: **out** -- A condensed distance matrix (a flat array containing
      the upper triangle of a distance matrix) representing the pairwise
      similarity of all cards.
   :rtype: ndarray


.. py:function:: get_cluster_labels(df: pandas.DataFrame, cluster_cards: List[str], print_results: bool = True, return_df_results: bool = True) -> Union[pandas.DataFrame, None]

   Return the labels users created for clusters that include a given list of cards.

   :param df: Columns:

      * Name: card_id, dtype: int64
      * Name: card_label, dtype: object
      * Name: category_id, dtype: int64
      * Name: category_label, dtype: object
      * Name: user_id, dtype: int64

      These columns correspond to the 'Casolysis Data (.csv) - Recommended'
      export from kardsort.com.
   :type df: pandas.DataFrame
   :param cluster_cards: List of card labels for which you would like to get
      the user-generated cluster labels.
   :type cluster_cards: list of str
   :param print_results: If true, prints which users grouped the cards
      together and under which label.
   :type print_results: bool, optional
   :param return_df_results: If true, returns a dataframe with the results.
   :type return_df_results: bool, optional

   :returns: * **out** (*pandas.DataFrame (default)*) -- Columns:

        * Name: user_id, int
        * Name: cluster_label, str
        * Name: cards, list of str

        Dataframe with one row for each user who clustered the given cards
        together, including the category label and the full list of cards in
        that category.

      * *OR*

      * **out** (*None*) -- If return_df_results = False


.. py:data:: __version__
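
The condensed distance matrix described above can be hard to picture, so here is a minimal, standard-library-only sketch of the idea. It is not the package's implementation. It assumes that the distance between two cards is 1 minus the fraction of users who placed both cards in the same category, which is one plausible reading of the 'fraction' option. The sample rows and the helper name ``condensed_distances`` are hypothetical.

```python
from itertools import combinations

# Hypothetical rows mirroring the documented columns:
# (card_id, card_label, category_id, category_label, user_id)
rows = [
    (1, "Dog", 1, "Pets", 1),
    (2, "Cat", 1, "Pets", 1),
    (3, "Tiger", 2, "Wild", 1),
    (1, "Dog", 1, "Animals", 2),
    (2, "Cat", 1, "Animals", 2),
    (3, "Tiger", 1, "Animals", 2),
]


def condensed_distances(rows):
    """Return (card_ids, distances) with distances in condensed order.

    Assumption (not taken from the package): distance is 1 minus the
    fraction of users who put both cards into the same category.
    """
    card_ids = sorted({r[0] for r in rows})
    user_ids = sorted({r[4] for r in rows})
    # Map (user_id, card_id) -> category_id chosen by that user.
    placement = {(r[4], r[0]): r[2] for r in rows}

    distances = []
    # combinations() walks the upper triangle row by row: (1,2), (1,3), (2,3).
    for a, b in combinations(card_ids, 2):
        together = sum(
            1
            for u in user_ids
            if (u, a) in placement and placement[(u, a)] == placement.get((u, b))
        )
        distances.append(1 - together / len(user_ids))
    return card_ids, distances


card_ids, distances = condensed_distances(rows)
print(card_ids)   # [1, 2, 3]
print(distances)  # [0.0, 0.5, 0.5]
```

For n cards the flat list holds n * (n - 1) / 2 entries, one per unordered pair, listed row by row along the upper triangle. This is the same ordering that SciPy uses for condensed distance matrices.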