numeric_categorical_summary
- numeric_categorical_summary(data: DataFrame, column1: str, column2: str, fig_height: int = 600, fig_width: int = 1200, order: Union[str, List] = 'auto', num_intervals: int = 4, interval_type: str = 'quantile', max_levels: int = 30, include_missing: bool = False, display_figure: bool = False) → Figure
Generates an EDA summary of the relationship between a numeric variable and a categorical variable.
- Parameters
data – pandas DataFrame with data to be plotted
column1 – Numeric column in the data to be plotted as the independent variable
column2 – Categorical column in the data to be plotted as the dependent variable
fig_height – Height of the figure in pixels
fig_width – Width of the figure in pixels
order –
Order in which to sort the levels of the categorical variable:
'auto': sorts ordinal variables by provided ordering, nominal variables by descending frequency, and numeric variables in sorted order.
'descending': sorts in descending frequency.
'ascending': sorts in ascending frequency.
'sorted': sorts according to sorted order of the levels themselves.
'random': produces a random order. Useful if there are too many levels for one plot.
Alternatively, pass a list of level names directly to specify a custom order.
num_intervals – Number of intervals to bin column1 into
interval_type – Type of intervals to bin column1 into: 'quantile' or 'equal width'
max_levels – Maximum number of levels to attempt to plot on a single plot. If exceeded, only max_levels - 1 levels will be plotted and the remainder will be grouped into an 'Other' category.
include_missing – Whether to include missing values as an additional level in the data to be plotted
display_figure – Whether to display the figure in addition to returning it
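The interval_type and max_levels parameters above can be illustrated with a small standalone sketch. This is not the library's implementation: bin_edges and collapse_levels are hypothetical helpers written with the Python standard library only, and keeping the most frequent levels when max_levels is exceeded is an assumption, since the description does not say which levels are retained.

```python
from collections import Counter
from statistics import quantiles


def bin_edges(values, num_intervals=4, interval_type="quantile"):
    """Return num_intervals + 1 bin edges for a list of numbers (illustrative only)."""
    lo, hi = min(values), max(values)
    if interval_type == "quantile":
        # Interior cut points chosen so each bin holds roughly the same number of rows.
        inner = quantiles(values, n=num_intervals, method="inclusive")
        return [lo, *inner, hi]
    if interval_type == "equal width":
        # Edges spaced evenly across the observed range.
        step = (hi - lo) / num_intervals
        return [lo + i * step for i in range(num_intervals + 1)]
    raise ValueError(f"unknown interval_type: {interval_type!r}")


def collapse_levels(labels, max_levels=30, other="Other"):
    """Relabel all but max_levels - 1 levels as `other` when there are too many levels.

    Assumption: the levels kept are the most frequent ones.
    """
    counts = Counter(labels)
    if len(counts) <= max_levels:
        return list(labels)
    keep = {label for label, _ in counts.most_common(max_levels - 1)}
    return [label if label in keep else other for label in labels]


values = [1, 2, 3, 4, 100]
quantile_edges = bin_edges(values, num_intervals=4, interval_type="quantile")
width_edges = bin_edges(values, num_intervals=4, interval_type="equal width")
collapsed = collapse_levels(["a", "a", "a", "b", "b", "c", "d"], max_levels=3)
```

With a skewed column such as the one in this sketch, quantile edges follow the data while equal-width edges are dominated by the outlier, which is one reason to prefer the default 'quantile' setting for long-tailed variables.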