categorical_categorical_summary

categorical_categorical_summary(data: DataFrame, column1: str, column2: str, fig_height: int = 1000, fig_width: int = 1200, order1: Union[str, List] = 'auto', order2: Union[str, List] = 'auto', barmode: str = 'stack', max_levels: int = 30, include_missing: bool = False, display_figure: bool = False) -> Figure

Generates an exploratory data analysis (EDA) summary of two categorical variables.

Parameters
  • data – pandas DataFrame with data to be plotted

  • column1 – First categorical column in the data, plotted as the independent variable

  • column2 – Second categorical column in the data, plotted as the dependent variable

  • fig_width – Figure width in pixels

  • fig_height – Figure height in pixels

  • order1

    Order in which to sort the levels of the first variable:

    • 'auto': sorts ordinal variables by provided ordering, nominal variables by descending frequency, and numeric variables in sorted order.

    • 'descending': sorts in descending frequency.

    • 'ascending': sorts in ascending frequency.

    • 'sorted': sorts according to sorted order of the levels themselves.

    • 'random': produces a random order. Useful if there are too many levels for one plot.

    Alternatively, pass a list of level names directly to specify a custom order.

  • order2 – Same as order1 but for the second variable

  • barmode – Type of bar plot aggregation. One of ['stack', 'group', 'overlay', 'relative']

  • max_levels – Maximum number of levels to attempt to plot on a single plot. If exceeded, only max_levels - 1 levels are plotted and the remaining levels are grouped into an 'Other' category.

  • include_missing – Whether to include missing values as an additional level in the data to be plotted

  • display_figure – Whether to display the figure in addition to returning it
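
The ordering options and the max_levels behaviour described above can be illustrated with a small standalone sketch. This is not the library's implementation: order_levels and lump_levels are hypothetical helper names, the sketch works on plain Python sequences rather than a DataFrame, and it assumes that the levels kept under max_levels are the most frequent ones, which the description above does not state. The 'auto' option is left out because it depends on column type information that a plain sequence does not carry.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Union


def order_levels(
    values: Sequence[Hashable],
    order: Union[str, List] = "descending",
    seed: int = 0,
) -> List:
    """Return the distinct levels of `values` in the requested order (hypothetical helper)."""
    counts = Counter(values)
    if isinstance(order, list):
        # Custom order: keep only the listed levels that actually occur in the data.
        return [level for level in order if level in counts]
    if order == "descending":
        # Most frequent level first.
        return [level for level, _ in counts.most_common()]
    if order == "ascending":
        # Least frequent level first.
        return sorted(counts, key=counts.get)
    if order == "sorted":
        # Sort by the level values themselves, ignoring frequency.
        return sorted(counts)
    if order == "random":
        # Seeded shuffle so the sketch is reproducible.
        levels = list(counts)
        random.Random(seed).shuffle(levels)
        return levels
    raise ValueError(f"Unsupported order in this sketch: {order!r}")


def lump_levels(
    values: Sequence[Hashable],
    max_levels: int,
    other_label: str = "Other",
) -> List:
    """Cap the number of levels at `max_levels` by lumping the rest (hypothetical helper).

    Assumption: the max_levels - 1 most frequent levels are the ones kept.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [value if value in keep else other_label for value in values]


colors = ["red", "blue", "red", "green", "blue", "red", "teal"]
print(order_levels(colors, "descending"))  # ['red', 'blue', 'green', 'teal']
print(order_levels(colors, "sorted"))      # ['blue', 'green', 'red', 'teal']
print(lump_levels(colors, max_levels=3))   # ['red', 'blue', 'red', 'Other', 'blue', 'red', 'Other']
```

With four distinct levels and max_levels=3, the sketch keeps the two most frequent levels and relabels the rest as 'Other', giving three levels in total, which matches the "max_levels - 1 levels plus 'Other'" rule in the parameter description.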