categorical_univariate_summary
- categorical_univariate_summary(data: DataFrame, column: str, fig_height: int = 5, fig_width: int = 10, fontsize: int = 15, color_palette: Optional[str] = None, order: Union[str, List] = 'auto', max_levels: int = 30, label_rotation: Optional[int] = None, label_fontsize: Optional[float] = None, flip_axis: Optional[bool] = None, percent_axis: bool = True, label_counts: bool = True, include_missing: bool = False, interactive: bool = False) → Tuple[DataFrame, Figure]
Creates a univariate EDA summary for a provided categorical data column in a pandas DataFrame.
Summary consists of a count plot with twin axes for counts and percentages for each level of the variable and a small summary table.
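As a rough illustration of the per-level counts and percentages that the plot displays, the sketch below computes them with plain pandas. It is not this function's implementation: the example data and the column name `color` are made up, and treating percentages as shares of the non-missing rows is an assumption.

```python
import pandas as pd

# Hypothetical example data; the column name "color" is illustrative only.
data = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

# Count of rows per level, most frequent first (missing values are dropped).
counts = data["color"].value_counts()

# Percentage of non-missing rows per level (an assumption about the denominator).
percents = 100 * counts / counts.sum()

# Small per-level table pairing each count with its percentage.
level_table = pd.DataFrame({"count": counts, "percent": percents.round(1)})
print(level_table)
```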
- Parameters
data – pandas DataFrame with data to be plotted
column – column in the dataframe to plot
fig_width – figure width in inches
fig_height – figure height in inches
fontsize – Font size of axis and tick labels
color_palette – Seaborn color palette to use
order –
Order in which to sort the levels of the variable for plotting:
'auto': sorts ordinal variables by provided ordering, nominal variables by descending frequency, and numeric variables in sorted order.
'descending': sorts in descending frequency.
'ascending': sorts in ascending frequency.
'sorted': sorts according to sorted order of the levels themselves.
'random': produces a random order. Useful if there are too many levels for one plot.
Alternatively, pass a list of level names to specify a custom order.
max_levels – Maximum number of levels to attempt to plot on a single plot. If exceeded, only max_levels - 1 levels will be plotted and the remainder will be grouped into an 'Other' category.
percent_axis – Whether to add a twin y axis with percentages
label_counts – Whether to add exact counts and percentages as text annotations on each bar in the plot.
label_fontsize – Size of the annotations text. Default tries to infer a reasonable size based on the figure size and number of levels.
flip_axis – Whether to flip the plot so that the level labels are on the y axis. Useful for long level names or lots of levels. Default tries to infer based on the number of levels and the label_rotation value.
label_rotation – Amount to rotate level labels. Useful for long level names or lots of levels.
include_missing – Whether to include missing values as an additional level in the data
interactive – Whether to display the plot and table for interactive use in a Jupyter notebook
- Returns
Summary table and matplotlib figure with countplot
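
The `order` and `max_levels` behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the function's actual code: `order_levels` and `collapse_levels` are hypothetical helper names, the 'auto' option is omitted because it depends on the column's dtype, and keeping the most frequent levels when `max_levels` is exceeded is an assumption (the description only says that `max_levels - 1` levels are plotted).

```python
from collections import Counter
import random


def order_levels(values, order="descending", seed=None):
    """Hypothetical helper: return the distinct levels in the requested order."""
    counts = Counter(values)
    if isinstance(order, list):
        # Custom order: keep only the listed levels that actually occur.
        return [level for level in order if level in counts]
    if order == "descending":
        return [level for level, _ in counts.most_common()]
    if order == "ascending":
        return [level for level, _ in reversed(counts.most_common())]
    if order == "sorted":
        return sorted(counts)
    if order == "random":
        levels = sorted(counts)
        random.Random(seed).shuffle(levels)
        return levels
    raise ValueError(f"unsupported order: {order!r}")


def collapse_levels(values, max_levels):
    """Hypothetical helper: group excess levels into 'Other'.

    Assumes the most frequent levels are the ones kept.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return dict(counts)
    kept = counts.most_common(max_levels - 1)
    collapsed = dict(kept)
    collapsed["Other"] = sum(counts.values()) - sum(c for _, c in kept)
    return collapsed


# Made-up example values with four levels and distinct frequencies.
values = ["b"] * 4 + ["a"] * 3 + ["d"] * 2 + ["c"]
print(order_levels(values, "descending"))
print(order_levels(values, "sorted"))
print(collapse_levels(values, max_levels=3))
```

With many levels, lowering `max_levels` keeps the plot readable at the cost of hiding detail in the 'Other' bar, while `order='random'` changes only the arrangement of the levels.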