datetime_univariate_summary
- datetime_univariate_summary(data: DataFrame, column: str, fig_height: int = 4, fig_width: int = 8, fontsize: int = 15, color_palette: Optional[str] = None, ts_freq: str = 'auto', delta_units: str = 'auto', ts_type: str = 'line', trend_line: str = 'auto', date_labels: Optional[str] = None, date_breaks: Optional[str] = None, lower_quantile: float = 0, upper_quantile: float = 1, interactive: bool = False) → Tuple[DataFrame, Figure]
Creates a univariate EDA summary for a provided datetime data column in a pandas DataFrame.
Produces the following summary plots:
a time series plot of counts aggregated at the temporal resolution provided by ts_freq
a time series plot of time deltas between successive observations in units defined by delta_units
countplots for the following metadata from the datetime object:
day of week
day of month
month
year
hour
minute
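As a rough illustration of the quantities behind these plots, the sketch below computes time deltas between successive observations and per-attribute counts using only the standard library. The sample timestamps are hypothetical, and this is not the library's implementation, only an outline of what each plot summarizes.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample timestamps, used only for illustration.
timestamps = [
    datetime(2021, 1, 4, 9, 30),
    datetime(2021, 1, 4, 17, 45),
    datetime(2021, 1, 6, 9, 30),
    datetime(2021, 2, 1, 12, 0),
]

# Time deltas between successive observations, expressed here in hours.
ordered = sorted(timestamps)
deltas_hours = [
    (later - earlier).total_seconds() / 3600
    for earlier, later in zip(ordered, ordered[1:])
]

# Counts per datetime attribute: the values a countplot would display.
day_of_week_counts = Counter(ts.weekday() for ts in ordered)  # Monday is 0
month_counts = Counter(ts.month for ts in ordered)
hour_counts = Counter(ts.hour for ts in ordered)
```

Here the unit of the deltas (hours) is fixed by hand; in the function it is controlled by delta_units.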
- Parameters
data – pandas DataFrame to perform EDA on
column – A string matching a column in the data
fig_height – Height of the plot in inches
fig_width – Width of the plot in inches
fontsize – Font size of axis and tick labels
color_palette – Seaborn color palette to use
ts_freq –
String describing the frequency at which to aggregate data, in one of two formats:
A pandas offset alias string
A human readable string in the same format passed to date_breaks (e.g. “4 months”)
The default, ‘auto’, attempts to determine a good aggregation frequency automatically.
delta_units –
String describing the units in which to compute time deltas between successive observations, in one of two formats:
A pandas offset alias string
A human readable string in the same format passed to date_breaks (e.g. “4 months”)
The default, ‘auto’, attempts to determine a good unit automatically.
ts_type – ‘line’ plots a line graph while ‘point’ plots points for observations
trend_line – Trend line to plot over data. “None” produces no trend line. Other options are passed to geom_smooth.
date_labels – strftime date formatting string that will be used to set the format of the x axis tick labels
date_breaks – Date breaks string in the form ‘{interval} {period}’, where interval is an integer and period is a time period ranging from seconds to years (e.g. ‘1 year’, ‘3 minutes’)
lower_quantile – Lower quantile bound; only data above this quantile is kept
upper_quantile – Upper quantile bound; only data below this quantile is kept
interactive – Whether to display figures and tables in a Jupyter notebook for interactive use
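To make the date_breaks and date_labels formats concrete, here is a minimal sketch of how an ‘{interval} {period}’ string could be validated and how a strftime pattern formats a tick label. The helper parse_date_breaks and the PERIODS tuple are hypothetical names introduced for illustration; the library's own parsing may differ, and the inclusion of "week" is an assumption.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumed set of periods, based on "seconds to years" in the description.
PERIODS = ("second", "minute", "hour", "day", "week", "month", "year")


def parse_date_breaks(date_breaks: str) -> Tuple[int, str]:
    """Split '{interval} {period}' into (interval, singular period)."""
    match = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+?)s?\s*", date_breaks)
    if match is None:
        raise ValueError(f"Expected '{{interval}} {{period}}', got {date_breaks!r}")
    interval, period = int(match.group(1)), match.group(2).lower()
    if period not in PERIODS:
        raise ValueError(f"Unsupported period: {period!r}")
    return interval, period


# A strftime pattern such as "%Y-%m" renders a tick label as year-month.
label = datetime(2021, 3, 5).strftime("%Y-%m")
```

For example, parse_date_breaks("3 minutes") yields an interval of 3 and the period "minute", while a string without a leading integer raises ValueError.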
- Returns
Tuple containing the summary stats DataFrame and the matplotlib Figure drawn
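The effect of lower_quantile and upper_quantile can be pictured with the sketch below, which keeps only the datetimes lying between two quantile cut points. The function filter_by_quantiles is a hypothetical helper, and the simple rank-based rule it uses is an assumption; the library may compute quantiles with a different interpolation method.

```python
from datetime import datetime
from typing import List


def filter_by_quantiles(
    values: List[datetime],
    lower_quantile: float = 0.0,
    upper_quantile: float = 1.0,
) -> List[datetime]:
    """Keep values between the lower and upper quantile cut points."""
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    # Simple rank-based cut points (an assumption, not the library's rule).
    low = ordered[round(lower_quantile * last)]
    high = ordered[round(upper_quantile * last)]
    return [v for v in values if low <= v <= high]


# Hypothetical data: one observation per day for 11 days.
days = [datetime(2021, 1, d) for d in range(1, 12)]
trimmed = filter_by_quantiles(days, lower_quantile=0.1, upper_quantile=0.9)
```

With the defaults of 0 and 1, every observation is kept, which matches the defaults in the signature above.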