collection_summary

collection_summary(data: DataFrame, column: str, fig_height: int = 1000, fig_width: int = 1000, top_entries: int = 10, sort_collections: bool = False, remove_duplicates: bool = False, display_figure: bool = False) → Figure

Creates a univariate EDA summary for a collections column in a pandas DataFrame.

The provided column should have object dtype, with each value being a list, tuple, or set.

Parameters
  • data – Dataset to perform EDA on

  • column – Name of the column in data to summarize

  • fig_height – Height of the plot in inches

  • fig_width – Width of the plot in inches

  • top_entries – Maximum number of entries to show in the count plots

  • sort_collections – Whether to sort the items within each collection, ignoring their original order

  • remove_duplicates – Whether to remove duplicate entries from collections

  • display_figure – Whether to display the figure in addition to returning it
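
Example

The sketch below illustrates, in plain pandas, what the sort_collections, remove_duplicates, and top_entries options are described as doing. It is not the library's implementation: normalize_collections and the "tags" column are hypothetical names introduced for illustration, and reading sort_collections as sorting the items inside each collection is an assumption.

```python
from collections import Counter

import pandas as pd


def normalize_collections(series, sort_collections=False, remove_duplicates=False):
    """Hypothetical helper: convert each collection to a hashable tuple."""

    def normalize(collection):
        items = list(collection)
        if remove_duplicates:
            # dict.fromkeys keeps the first occurrence of each item, in order
            items = list(dict.fromkeys(items))
        if sort_collections:
            # Assumed meaning: order within a collection no longer matters
            items = sorted(items)
        return tuple(items)

    return series.map(normalize)


# An object-dtype column whose values are lists (hypothetical data)
df = pd.DataFrame({"tags": [["b", "a"], ["a", "b"], ["a", "a", "b"], ["c"]]})

# With both flags off, all four collections are distinct
as_is = normalize_collections(df["tags"])

# With both flags on, the first three rows collapse to ("a", "b")
normalized = normalize_collections(
    df["tags"], sort_collections=True, remove_duplicates=True
)

# Keep only the most frequent collections, as top_entries would
top_entries = 2
top_counts = Counter(normalized).most_common(top_entries)
```

With the documented signature, the equivalent call would pass the flags directly, for example collection_summary(df, "tags", sort_collections=True, remove_duplicates=True, top_entries=2).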