categorical_numeric_summary

categorical_numeric_summary(data: DataFrame, column1: str, column2: str, fig_height: int = 1000, fig_width: int = 1200, order: Union[str, List] = 'auto', max_levels: int = 10, include_missing: bool = False, lower_quantile: float = 0, upper_quantile: float = 1, hist_bins: Optional[int] = None, dist_type: str = 'kde_only', transform: str = 'identity', display_figure: bool = False) → Figure

Generates an EDA summary of the relationship between a categorical variable as the independent variable and a numeric variable as the dependent variable.

Parameters
  • data – pandas DataFrame with data to be plotted

  • column1 – Categorical column in the data to be used as independent variable

  • column2 – Numeric column in the data to be used as dependent variable

  • fig_height – Height of the figure in pixels

  • fig_width – Width of the figure in pixels

  • order

    Order in which to sort the levels of the categorical variable:

    • 'auto': sorts ordinal variables by their provided ordering, nominal variables by descending frequency, and numeric variables in sorted order.

    • 'descending': sorts by descending frequency.

    • 'ascending': sorts by ascending frequency.

    • 'sorted': sorts by the sorted order of the levels themselves.

    • 'random': produces a random order. Useful if there are too many levels for one plot.

    Or you can pass a list of level names in directly for your own custom order.
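
The frequency-based and value-based orderings above can be sketched in plain Python. This is only an illustration of the described behavior, not the library's implementation: the helper name order_levels, the seed argument, and the alphabetical tie-breaking are assumptions, and 'auto' is omitted because it depends on the column's dtype.

```python
import random
from collections import Counter


def order_levels(values, order="descending", seed=0):
    """Return the distinct levels of `values` in the requested order.

    Hypothetical helper: mirrors the documented `order` options only.
    """
    counts = Counter(values)
    if isinstance(order, list):
        # Custom order: use the caller's list of level names as given.
        return list(order)
    if order == "descending":
        # Most frequent level first; ties broken alphabetically (assumption).
        return sorted(counts, key=lambda lvl: (-counts[lvl], str(lvl)))
    if order == "ascending":
        # Least frequent level first; ties broken alphabetically (assumption).
        return sorted(counts, key=lambda lvl: (counts[lvl], str(lvl)))
    if order == "sorted":
        # Order by the level values themselves, ignoring frequency.
        return sorted(counts)
    if order == "random":
        levels = sorted(counts)
        random.Random(seed).shuffle(levels)
        return levels
    raise ValueError(f"unsupported order: {order!r}")


sample = ["b", "a", "b", "c", "b", "a"]
by_desc = order_levels(sample, "descending")  # ['b', 'a', 'c']
by_asc = order_levels(sample, "ascending")    # ['c', 'a', 'b']
by_value = order_levels(sample, "sorted")     # ['a', 'b', 'c']
```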

  • max_levels – Maximum number of levels to plot on a single plot. If the variable has more levels than this, only max_levels - 1 levels are plotted and the remainder are grouped into an ‘Other’ category.
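
A minimal sketch of the grouping rule described for max_levels, under the assumption that the levels kept are the first max_levels - 1 in the chosen order. The helper name lump_levels is hypothetical and is not part of the documented API.

```python
def lump_levels(ordered_levels, max_levels=10, other_label="Other"):
    """Map each level to itself or to `other_label`.

    Hypothetical helper: assumes the kept levels are the first
    max_levels - 1 entries of `ordered_levels`.
    """
    if len(ordered_levels) <= max_levels:
        # Limit not exceeded: every level is plotted as-is.
        return {lvl: lvl for lvl in ordered_levels}
    keep = ordered_levels[: max_levels - 1]
    mapping = {lvl: lvl for lvl in keep}
    # Everything past the cutoff collapses into a single extra level.
    for lvl in ordered_levels[max_levels - 1:]:
        mapping[lvl] = other_label
    return mapping


levels = ["a", "b", "c", "d", "e"]
lumped = lump_levels(levels, max_levels=3)
# {'a': 'a', 'b': 'b', 'c': 'Other', 'd': 'Other', 'e': 'Other'}
```

With max_levels=3 the plot would show three groups in total: two original levels plus ‘Other’.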

  • include_missing – Whether to include missing values as an additional level in the data to be plotted

  • lower_quantile – Lower quantile of the numeric column; values below it are filtered out

  • upper_quantile – Upper quantile of the numeric column; values above it are filtered out
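
The effect of lower_quantile and upper_quantile can be illustrated with a small stdlib-only sketch. The helper names, the linear-interpolation quantile, and the inclusive bounds are all assumptions made for the example; the library may compute quantiles differently.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def filter_quantiles(values, lower_quantile=0.0, upper_quantile=1.0):
    """Keep values between the two quantiles (bounds inclusive, by assumption)."""
    ordered = sorted(values)
    lo = quantile(ordered, lower_quantile)
    hi = quantile(ordered, upper_quantile)
    return [v for v in values if lo <= v <= hi]


# One extreme value dominates the range; trimming the top quarter removes it.
skewed = [1, 2, 3, 4, 1000]
trimmed = filter_quantiles(skewed, upper_quantile=0.75)  # [1, 2, 3, 4]
```

The defaults of 0 and 1 keep every value, which matches the defaults in the signature.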

  • hist_bins – Number of bins to use for the histogram. If None, the plotly default is used

  • dist_type

    Type of distribution to plot:

    • 'norm_hist+kde': Plots histograms with an overlaid KDE, both normalized to be a probability density

    • 'norm_hist_only': Plots just histograms normalized to be a probability density

    • 'unnorm_hist_only': Plots just unnormalized histograms with counts

    • 'kde_only': Plots just KDEs normalized to be a probability density
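
To make the difference between the normalized and unnormalized histogram options concrete, the sketch below bins a small sample with equal-width bins and optionally rescales the counts so the bar areas sum to 1. It uses the general definition of a density histogram; the helper name and the binning rule are assumptions and do not reproduce plotly's binning.

```python
def histogram(values, bins, normalize):
    """Equal-width histogram; returns (heights, bin_width).

    Hypothetical helper: assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    if not normalize:
        return counts, width
    n = len(values)
    # Density: count / (n * bin_width), so that sum(height * width) == 1.
    return [c / (n * width) for c in counts], width


data = [0, 1, 1, 2, 2, 2, 3, 4]
raw_counts, bin_width = histogram(data, bins=4, normalize=False)  # [1, 2, 3, 2]
density, _ = histogram(data, bins=4, normalize=True)
```

Density heights are comparable across levels with different sample sizes, whereas raw counts are not.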

  • transform

    Transformation to apply to the numeric column for plotting:

    • 'identity': no transformation

    • 'log': apply a logarithmic transformation (zero and negative values will be filtered out)

    • 'sqrt': apply a square root transformation
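
The three transform options can be sketched as follows. The helper name apply_transform is hypothetical, and the natural logarithm is an assumption, since the description above does not state the base.

```python
import math


def apply_transform(values, transform="identity"):
    """Apply one of the documented transforms to a list of numbers."""
    if transform == "identity":
        return list(values)
    if transform == "log":
        # Zero and negative values have no real logarithm, so they are dropped.
        return [math.log(v) for v in values if v > 0]
    if transform == "sqrt":
        # math.sqrt raises ValueError for negative inputs; how the library
        # treats them is not stated in the description above.
        return [math.sqrt(v) for v in values]
    raise ValueError(f"unsupported transform: {transform!r}")


logged = apply_transform([-1, 0, 1, 100], "log")  # two values survive the filter
rooted = apply_transform([0, 4, 9], "sqrt")       # [0.0, 2.0, 3.0]
```

Because 'log' filters rows, the number of plotted points can be smaller than the number of rows in the input.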

  • display_figure – Whether to display the figure in addition to returning it