numeric_univariate_summary

numeric_univariate_summary(data: DataFrame, column: str, fig_height: int = 4, fig_width: int = 8, fontsize: int = 15, color_palette: Optional[str] = None, bins: Optional[int] = None, transform: str = 'identity', clip: float = 0, kde: bool = False, lower_quantile: float = 0, upper_quantile: float = 1, interactive: bool = False) → Tuple[DataFrame, Figure]

Creates a univariate EDA summary for a high-cardinality numeric column in a pandas DataFrame.

Summary consists of a histogram, boxplot, and small table of summary statistics.
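The three parts of the summary can be approximated with plain pandas and NumPy. The following is a minimal sketch, not this function's implementation: the DataFrame, the column name "price", and the use of describe() as the statistics table are all assumptions made for illustration, since the exact statistics reported are not listed here.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column name "price" is made up for illustration.
data = pd.DataFrame(
    {"price": [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 13.0, 21.0, 100.0]}
)
column = "price"

# Stand-in for the summary table: count, mean, std, min, quartiles, max.
summary_table = data[column].describe().to_frame().T

# Stand-in for the histogram: bin counts using NumPy's automatic bin selection
# (an assumption; the function's own default binning rule may differ).
counts, bin_edges = np.histogram(data[column], bins="auto")

# Stand-in for the boxplot: the quartiles and interquartile range it would draw.
q1, median, q3 = data[column].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

print(summary_table)
print(counts, bin_edges)
print(q1, median, q3, iqr)
```

The function itself returns a (DataFrame, Figure) tuple, so the table and the plot are available separately to the caller.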

Parameters

data – pandas DataFrame to perform EDA on
column – A string matching a column in the data to visualize
fig_height – Height of the plot in inches
fig_width – Width of the plot in inches
fontsize – Font size of axis and tick labels
color_palette – Seaborn color palette to use
bins – Number of bins to use for the histogram. By default, the number of bins is determined from the data
transform –
Transformation to apply to the data for plotting:
- 'identity': no transformation
- 'log': apply a logarithmic transformation, with a small constant added to handle zero values
- 'log_exclude0': apply a logarithmic transformation with zero values removed
- 'sqrt': apply a square root transformation
kde – Whether to overlay a KDE plot on the histogram
lower_quantile – Lower quantile cutoff; only data above this quantile is kept
upper_quantile – Upper quantile cutoff; only data below this quantile is kept
interactive – Whether to adapt the output for use with ipywidgets' interactive
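To make the transform, lower_quantile, and upper_quantile options concrete, here is a rough sketch of how such preprocessing could look. It is not the library's code: the helper name filter_and_transform, the log_offset argument, the natural logarithm, the inclusive quantile bounds, and filtering before transforming are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def filter_and_transform(
    values: pd.Series,
    transform: str = "identity",
    lower_quantile: float = 0.0,
    upper_quantile: float = 1.0,
    log_offset: float = 1e-6,  # hypothetical "small constant" for the 'log' option
) -> pd.Series:
    """Illustrative helper: quantile-filter a numeric Series, then transform it."""
    # Keep only values between the two quantile cutoffs (inclusive by assumption).
    lo = values.quantile(lower_quantile)
    hi = values.quantile(upper_quantile)
    kept = values[(values >= lo) & (values <= hi)]

    if transform == "identity":
        return kept
    if transform == "log":
        # Add a small constant so zero values do not map to -inf.
        return np.log(kept + log_offset)
    if transform == "log_exclude0":
        # Drop zeros entirely instead of offsetting them.
        return np.log(kept[kept != 0])
    if transform == "sqrt":
        return np.sqrt(kept)
    raise ValueError(f"Unknown transform: {transform!r}")


# Example with made-up values, including a zero and a large outlier.
values = pd.Series([0.0, 1.0, 4.0, 9.0, 100.0])
print(filter_and_transform(values, transform="sqrt", upper_quantile=0.75))
print(filter_and_transform(values, transform="log_exclude0"))
```

Trimming with upper_quantile is useful when a few extreme values would otherwise compress the histogram, while the two log options differ only in how zeros are treated.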