cardsort
========

.. py:module:: cardsort


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/cardsort/analysis/index


Attributes
----------

.. autoapisummary::

   cardsort.__version__


Functions
---------

.. autoapisummary::

   cardsort.create_dendrogram
   cardsort.get_distance_matrix
   cardsort.get_cluster_labels


Package Contents
----------------

.. py:function:: create_dendrogram(df, distance_matrix=None, count='fraction', linkage='average', color_threshold=None) -> None

   Plot the hierarchical clustering of kardsort data as a dendrogram.

   :param df:
              Columns:
                  Name: card_id, dtype: int64
                  Name: card_label, dtype: object
                  Name: category_id, dtype: int64
                  Name: category_label, dtype: object
                  Name: user_id, dtype: int64
              These columns correspond to the 'Casolysis Data (.csv) - Recommended' export from kardsort.com.
              The dataframe is used to extract leaf labels and, if no ``distance_matrix`` is provided, to calculate the distance matrix.
   :type df: pandas.DataFrame
   :param distance_matrix: A condensed distance matrix, i.e. a flat array containing the upper triangle of the distance matrix.
                           Providing a pre-calculated matrix saves time when generating the dendrogram.
                           If not specified, the distance matrix is calculated from ``df``.
   :type distance_matrix: ndarray, optional
   :param count: How similarity is displayed.

                 'fraction'
                 Similarity is displayed as a fraction between 0 and 1.

                 'absolute'
                 Similarity is displayed as an absolute count from 0 to n, where n is the number of users.
   :type count: str, optional
   :param linkage: Linkage method used to compute the distance between two clusters.

                   'average'
                   Unweighted average distance between all elements in the clusters (UPGMA).

                   'complete'
                   Distance between the elements that are the farthest away from each other in the two clusters.

                   'single'
                   Distance between the elements that are closest to each other in the two clusters.
   :type linkage: str, optional
   :param color_threshold: Level at which the dendrogram is cut for coloring: branches below this level are colored by cluster.
                           Can be given as a fraction (0 to 1) or as an absolute value (at most n, the number of users).
                           The default cut is at 75%.
   :type color_threshold: float, optional
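
   The three ``linkage`` options differ only in how the pairwise distances between the members of two clusters
   are reduced to a single cluster-to-cluster distance. The plain-Python sketch below illustrates that rule; it is
   not the package's implementation, and the helper ``cluster_distance`` and its ``dist`` argument are
   hypothetical names used only for illustration.

   .. code-block:: python

      from itertools import product
      from statistics import mean


      def cluster_distance(a, b, dist, method="average"):
          """Distance between clusters a and b (sequences of item indices).

          dist(i, j) returns the pairwise distance between items i and j.
          Hypothetical helper, for illustration only.
          """
          # Distances between every member of a and every member of b
          pair_distances = [dist(i, j) for i, j in product(a, b)]
          if method == "average":  # UPGMA: unweighted mean of all cross-cluster pairs
              return mean(pair_distances)
          if method == "complete":  # the farthest pair
              return max(pair_distances)
          if method == "single":  # the closest pair
              return min(pair_distances)
          raise ValueError(f"unknown linkage method: {method!r}")


      # Four items on a line at positions 0, 1, 5 and 7
      positions = [0, 1, 5, 7]


      def dist(i, j):
          return abs(positions[i] - positions[j])


      print(cluster_distance([0, 1], [2, 3], dist, "average"))   # 5.5
      print(cluster_distance([0, 1], [2, 3], dist, "complete"))  # 7
      print(cluster_distance([0, 1], [2, 3], dist, "single"))    # 4

   Because single linkage looks only at the closest pair, it can merge long chains of loosely related cards,
   whereas complete linkage favors compact groups; average linkage sits between the two.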


.. py:function:: get_distance_matrix(df: pandas.DataFrame) -> numpy.ndarray

   Return the condensed distance matrix computed from kardsort data.

   :param df:
              Columns:
                  Name: card_id, dtype: int64
                  Name: card_label, dtype: object
                  Name: category_id, dtype: int64
                  Name: category_label, dtype: object
                  Name: user_id, dtype: int64
              These columns correspond to the 'Casolysis Data (.csv) - Recommended' export from kardsort.com.
   :type df: pandas.DataFrame

   :returns: **out** -- A condensed distance matrix (a flat array containing the upper triangle of a distance matrix)
             representing the pairwise similarity of all cards.
   :rtype: ndarray
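
   The sketch below shows one plausible way to build such a matrix with the standard library. It assumes, without
   confirmation from this documentation, that the distance between two cards is the number of users who did *not*
   place them in the same category, and it takes plain ``(card_id, category_id, user_id)`` tuples instead of a
   DataFrame. ``condensed_distance`` and ``condensed_index`` are hypothetical helpers; the second shows where a
   card pair lands in the flat array, which follows the row-major upper-triangle order SciPy uses for condensed
   matrices.

   .. code-block:: python

      from itertools import combinations


      def condensed_distance(rows):
          """Condensed distance matrix from (card_id, category_id, user_id) rows.

          ASSUMPTION: distance = number of users who did not put both cards
          in the same category. Hypothetical helper, for illustration only.
          """
          rows = list(rows)
          cards = sorted({card for card, _, _ in rows})
          users = sorted({user for _, _, user in rows})
          # (user_id, card_id) -> category_id chosen by that user
          placement = {(user, card): cat for card, cat, user in rows}
          out = []
          # combinations() yields pairs (i, j) with i < j in row-major order,
          # i.e. the upper triangle read row by row
          for a, b in combinations(cards, 2):
              together = sum(
                  1
                  for u in users
                  if (u, a) in placement and placement[(u, a)] == placement.get((u, b))
              )
              out.append(len(users) - together)
          return out


      def condensed_index(i, j, n):
          """Position of the pair (i, j), i != j, in a condensed matrix over n items."""
          if i > j:
              i, j = j, i
          return n * i - i * (i + 1) // 2 + (j - i - 1)


      # Two users sorting three cards: (card_id, category_id, user_id)
      rows = [
          (1, 1, 1), (2, 1, 1), (3, 2, 1),  # user 1: {1, 2}, {3}
          (1, 1, 2), (2, 2, 2), (3, 2, 2),  # user 2: {1}, {2, 3}
      ]
      d = condensed_distance(rows)
      print(d)                            # [1, 2, 1]
      print(d[condensed_index(0, 2, 3)])  # 2: cards 1 and 3 were never grouped together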


.. py:function:: get_cluster_labels(df: pandas.DataFrame, cluster_cards: List[str], print_results: bool = True, return_df_results: bool = True) -> Union[pandas.DataFrame, None]

   Return the labels users gave to clusters (categories) that contain a given list of cards.

   :param df:
              Columns:
                  Name: card_id, dtype: int64
                  Name: card_label, dtype: object
                  Name: category_id, dtype: int64
                  Name: category_label, dtype: object
                  Name: user_id, dtype: int64
              These columns correspond to the 'Casolysis Data (.csv) - Recommended' export from kardsort.com.
   :type df: pandas.DataFrame
   :param cluster_cards: List of card labels for which to retrieve the user-generated cluster labels.
   :type cluster_cards: list of str
   :param print_results: If True, print which users grouped the cards together and under which label.
   :type print_results: bool, optional
   :param return_df_results: If True, return a DataFrame with the results.
   :type return_df_results: bool, optional

   :returns: **out** -- DataFrame with one row for each user who clustered the given cards together, giving the
             label of that user's category and the full list of cards in that category.

             Columns:
                 Name: user_id, int
                 Name: cluster_label, str
                 Name: cards, list of str

             Returns ``None`` instead if ``return_df_results`` is False.
   :rtype: pandas.DataFrame or None
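
   The filtering step can be sketched in plain Python as follows. The sketch assumes that a user "clustered the
   given cards together" when all of them ended up in one of that user's categories; it takes
   ``(card_label, category_id, category_label, user_id)`` tuples instead of a DataFrame and returns
   ``(user_id, cluster_label, cards)`` tuples in place of DataFrame rows. ``cluster_labels`` is a hypothetical
   helper, not part of the package.

   .. code-block:: python

      def cluster_labels(rows, cluster_cards):
          """Categories that contain every card in cluster_cards.

          rows: (card_label, category_id, category_label, user_id) tuples.
          ASSUMPTION: a user grouped the cards together if all of them sit in
          one of that user's categories. Hypothetical helper, for illustration only.
          """
          # (user_id, category_id) -> (category_label, [card labels])
          categories = {}
          for card, cat_id, cat_label, user in rows:
              categories.setdefault((user, cat_id), (cat_label, []))[1].append(card)
          wanted = set(cluster_cards)
          results = [
              (user, label, sorted(cards))
              for (user, _), (label, cards) in categories.items()
              if wanted <= set(cards)  # every requested card is in this category
          ]
          return sorted(results)


      rows = [
          ("Apples", 1, "Fruit", 1), ("Pears", 1, "Fruit", 1), ("Carrots", 2, "Veg", 1),
          ("Apples", 3, "Food", 2), ("Pears", 3, "Food", 2), ("Carrots", 3, "Food", 2),
          ("Apples", 4, "Red", 3), ("Pears", 5, "Green", 3), ("Carrots", 5, "Green", 3),
      ]
      for user_id, label, cards in cluster_labels(rows, ["Apples", "Pears"]):
          print(user_id, label, cards)
      # 1 Fruit ['Apples', 'Pears']
      # 2 Food ['Apples', 'Carrots', 'Pears']

   Grouping by ``(user_id, category_id)`` rather than by label keeps two categories apart even if a user happened
   to give them the same name.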


.. py:data:: __version__