collection_univariate_summary

collection_univariate_summary(data: DataFrame, column: str, fig_height: int = 6, fig_width: int = 12, fontsize: int = 15, color_palette: Optional[str] = None, top_entries: int = 10, sort_collections: bool = False, remove_duplicates: bool = False, interactive: bool = False) → Tuple[DataFrame, Figure]

Creates a univariate EDA summary for a provided collections column in a pandas DataFrame.

The provided column should have object dtype, with each value being a list, tuple, or set.

Parameters
  • data – Dataset to perform EDA on

  • column – A string matching a column in the data

  • fig_height – Height of the plot in inches

  • fig_width – Width of the plot in inches

  • fontsize – Font size of axis and tick labels

  • color_palette – Seaborn color palette to use

  • top_entries – Max number of entries to show for countplots

  • sort_collections – Whether to sort collections and ignore original order

  • remove_duplicates – Whether to remove duplicate entries from collections

  • interactive – Whether to display figures and tables in a Jupyter notebook for interactive use
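
The effect of sort_collections, remove_duplicates, and top_entries can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the helper names normalize and top_collections are hypothetical, and the order of operations (deduplicate first, then sort) is an assumption made for illustration only.

```python
from collections import Counter


def normalize(collection, sort_collections=False, remove_duplicates=False):
    """Return a hashable tuple version of one collection (hypothetical helper)."""
    # Note: iterating over a set yields no guaranteed order, so sets are
    # only comparable across rows once sort_collections is enabled.
    items = list(collection)
    if remove_duplicates:
        # dict.fromkeys keeps the first occurrence of each entry, in order
        items = list(dict.fromkeys(items))
    if sort_collections:
        # Sorting makes ["b", "a"] and ["a", "b"] count as the same collection
        items = sorted(items)
    return tuple(items)


def top_collections(rows, top_entries=10, **options):
    """Count normalized collections and keep the most frequent ones."""
    counts = Counter(normalize(row, **options) for row in rows)
    return counts.most_common(top_entries)


rows = [["b", "a", "a"], ("a", "b"), ["a", "b", "a"]]

# Default: original order and duplicates are preserved, so all rows differ
print(top_collections(rows))

# Sorting alone merges the first and third rows
print(top_collections(rows, sort_collections=True))

# Sorting plus duplicate removal merges all three rows
print(top_collections(rows, sort_collections=True, remove_duplicates=True))
```

Under these assumptions, the two flags control how aggressively rows that differ only in ordering or repetition are grouped before counting, and top_entries caps how many groups are kept for display.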

Returns

Tuple containing the summary stats DataFrame and the matplotlib Figure drawn, in the order given by the signature