datetime_univariate_summary

datetime_univariate_summary(data: DataFrame, column: str, fig_height: int = 4, fig_width: int = 8, fontsize: int = 15, color_palette: Optional[str] = None, ts_freq: str = 'auto', delta_units: str = 'auto', ts_type: str = 'line', trend_line: str = 'auto', date_labels: Optional[str] = None, date_breaks: Optional[str] = None, lower_quantile: float = 0, upper_quantile: float = 1, interactive: bool = False) → Tuple[DataFrame, Figure]

Creates a univariate EDA summary for a datetime column of a pandas DataFrame.

Produces the following summary plots:

  • a time series plot of counts aggregated at the temporal resolution provided by ts_freq

  • a time series plot of time deltas between successive observations in units defined by delta_units

  • count plots of the following components extracted from the datetime values:

    • day of week

    • day of month

    • month

    • year

    • hour

    • minute

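The quantities behind these plots map onto standard pandas operations. The following is a minimal sketch of one way to compute them, not necessarily how datetime_univariate_summary does it internally; the sample timestamps, the column name event_time, the daily frequency "D" and the hour unit are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: five event timestamps spread over three days.
df = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2021-03-01 08:00", "2021-03-01 09:30", "2021-03-02 08:15",
        "2021-03-03 17:45", "2021-03-03 18:00",
    ])
})
ts = df["event_time"].sort_values()

# Counts per calendar day (pandas offset string "D"); empty days would count 0.
daily_counts = pd.Series(1, index=pd.DatetimeIndex(ts)).resample("D").size()

# Time deltas between successive observations, expressed in hours.
deltas_hours = ts.diff().dropna() / pd.Timedelta(hours=1)

# Datetime components that the count plots summarize.
components = pd.DataFrame({
    "day_of_week": ts.dt.day_name(),
    "day_of_month": ts.dt.day,
    "month": ts.dt.month,
    "year": ts.dt.year,
    "hour": ts.dt.hour,
    "minute": ts.dt.minute,
})
hour_counts = components["hour"].value_counts().sort_index()
```

Sorting before calling diff() matters: without it, deltas between unsorted rows could be negative.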
Parameters
  • data – pandas DataFrame to perform EDA on

  • column – A string matching a column in the data

  • fig_height – Height of the plot in inches

  • fig_width – Width of the plot in inches

  • fontsize – Font size of axis and tick labels

  • color_palette – Seaborn color palette to use

  • ts_freq

    String describing the frequency at which to aggregate data in one of two formats:

    • A pandas offset string.

    • A human-readable string in the same format as date_breaks (e.g. “4 months”)

    The default, ‘auto’, attempts to choose a suitable aggregation frequency automatically.

  • delta_units

    String describing the units in which to compute time deltas between successive observations in one of two formats:

    • A pandas offset string.

    • A human-readable string in the same format as date_breaks (e.g. “4 months”)

    The default, ‘auto’, attempts to choose suitable units automatically.

  • ts_type – ‘line’ draws a line graph, while ‘point’ draws a point for each observation

  • trend_line – Trend line to plot over the data. “None” produces no trend line; any other value is passed to geom_smooth.

  • date_labels – strftime date formatting string used to format the x-axis tick labels

  • date_breaks – Date breaks string in the form ‘{interval} {period}’. The interval must be an integer and the period must be a time period ranging from seconds to years (e.g. ‘1 year’, ‘3 minutes’).

  • lower_quantile – Lower quantile for filtering; only data at or above this quantile is kept

  • upper_quantile – Upper quantile for filtering; only data at or below this quantile is kept

  • interactive – Whether to display the figures and tables in a Jupyter notebook for interactive use
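The human-readable strings accepted by ts_freq, delta_units and date_breaks follow the ‘{interval} {period}’ pattern. The sketch below shows one hypothetical way to validate such a string and translate it into a pandas offset alias, plus a strftime pattern of the kind date_labels expects. The helper human_readable_to_offset and its alias table (pandas 2.2 spellings such as “h” and “MS”) are assumptions for illustration, not part of this function's API.

```python
import re
from datetime import datetime

# Assumed pandas (>= 2.2) offset aliases per period name; older pandas
# releases spell some of these differently (e.g. "H" for hours).
_PERIOD_TO_ALIAS = {
    "second": "s",
    "minute": "min",
    "hour": "h",
    "day": "D",
    "week": "W",
    "month": "MS",  # month start
    "year": "YS",   # year start
}


def human_readable_to_offset(text: str) -> str:
    """Convert e.g. '4 months' to '4MS'. Hypothetical helper for illustration."""
    match = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"expected '{{interval}} {{period}}', got {text!r}")
    interval = int(match.group(1))  # the interval must be an integer
    period = match.group(2).lower()
    if period.endswith("s"):
        period = period[:-1]  # accept plural forms such as 'minutes'
    if period not in _PERIOD_TO_ALIAS:
        raise ValueError(f"unsupported period {period!r}")
    return f"{interval}{_PERIOD_TO_ALIAS[period]}"


# A strftime pattern such as date_labels might receive.
label = datetime(2021, 3, 1).strftime("%Y-%m")
```

For example, human_readable_to_offset("3 minutes") returns "3min", while "three minutes" raises ValueError because the interval is not an integer.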
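lower_quantile and upper_quantile trim the datetime values to a central range before plotting. Below is a minimal sketch of that filtering, assuming inclusive bounds and pandas' default linear interpolation in Series.quantile; filter_by_quantiles is a hypothetical helper, not part of this function's API.

```python
import pandas as pd


def filter_by_quantiles(
    ts: pd.Series, lower_quantile: float = 0, upper_quantile: float = 1
) -> pd.Series:
    """Keep values between the two quantiles, inclusive. Hypothetical helper."""
    low = ts.quantile(lower_quantile)
    high = ts.quantile(upper_quantile)
    return ts[(ts >= low) & (ts <= high)]


# Eleven consecutive days; the 0.1 and 0.9 quantiles fall on Jan 2 and Jan 10.
ts = pd.Series(pd.date_range("2021-01-01", periods=11, freq="D"))
trimmed = filter_by_quantiles(ts, lower_quantile=0.1, upper_quantile=0.9)
```

With the defaults (0 and 1), the bounds are the minimum and maximum, so no data is removed.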

Returns

Tuple containing the summary stats DataFrame and the matplotlib Figure drawn