numeric_univariate_summary

numeric_univariate_summary(data: DataFrame, column: str, fig_height: int = 4, fig_width: int = 8, fontsize: int = 15, color_palette: Optional[str] = None, bins: Optional[int] = None, transform: str = 'identity', clip: float = 0, kde: bool = False, lower_quantile: float = 0, upper_quantile: float = 1, interactive: bool = False) → Tuple[DataFrame, Figure]

Creates a univariate EDA summary for a high-cardinality numeric column in a pandas DataFrame.

Summary consists of a histogram, boxplot, and small table of summary statistics.

Parameters
  • data – pandas DataFrame to perform EDA on

  • column – A string matching a column in the data to visualize

  • fig_height – Height of the plot in inches

  • fig_width – Width of the plot in inches

  • fontsize – Font size of axis and tick labels

  • color_palette – Seaborn color palette to use

  • bins – Number of bins to use for the histogram. By default, the number of bins is determined from the data

  • transform

    Transformation to apply to the data for plotting:

    • 'identity': no transformation

    • 'log': apply a logarithmic transformation, with a small constant added in case of zero values

    • 'log_exclude0': apply a logarithmic transformation, with zero values removed

    • 'sqrt': apply a square root transformation

  • kde – Whether to overlay a KDE plot on the histogram

  • lower_quantile – Lower quantile of the column; data below it is excluded

  • upper_quantile – Upper quantile of the column; data above it is excluded

  • interactive – Whether to adapt the function for use with interactive from ipywidgets
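To make the transform and quantile options concrete, the sketch below mimics them on a plain Python list using only the standard library. It is an illustration, not the function's actual implementation: the helper names (apply_transform, quantile, filter_quantiles), the constant eps, the base-10 logarithm, the inclusive quantile bounds, and the choice to filter before transforming are all assumptions.

```python
import math


def apply_transform(values, transform="identity", eps=1e-6):
    """Return a transformed copy of values. eps is an assumed small constant."""
    if transform == "identity":
        return list(values)
    if transform == "log":
        # Add a small constant so zero values do not produce log(0).
        return [math.log10(v + eps) for v in values]
    if transform == "log_exclude0":
        # Drop zero values first, then take the log of what remains.
        return [math.log10(v) for v in values if v != 0]
    if transform == "sqrt":
        return [math.sqrt(v) for v in values]
    raise ValueError(f"unknown transform: {transform!r}")


def quantile(values, q):
    """Linearly interpolated quantile of values for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_quantiles(values, lower_quantile=0.0, upper_quantile=1.0):
    """Keep values between the two quantiles (inclusive bounds assumed)."""
    low = quantile(values, lower_quantile)
    high = quantile(values, upper_quantile)
    return [v for v in values if low <= v <= high]


sample = [0, 1, 4, 9, 100]

# Trim the extreme value, then apply a square root transformation.
trimmed = filter_quantiles(sample, upper_quantile=0.75)
roots = apply_transform(trimmed, "sqrt")

# 'log_exclude0' drops the zero before taking logs.
nonzero_logs = apply_transform(sample, "log_exclude0")
```

In this sample, setting upper_quantile below 1 removes the outlier at 100 before plotting, which is the usual reason to combine quantile filtering with a transform on skewed data.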