text_summary

text_summary(data: DataFrame, column: str, fig_height: int = 1000, fig_width: int = 1200, top_ngrams: int = 10, remove_punct: bool = True, remove_stop: bool = True, lower_case: bool = True, display_figure: bool = False) → Figure

Creates a univariate EDA summary for a text variable column in a pandas DataFrame. Currently only supports English.

Parameters
  • data – Dataset to perform EDA on

  • column – Name of the text column in the data to summarize

  • fig_height – Height of the plot in pixels

  • fig_width – Width of the plot in pixels

  • top_ngrams – Maximum number of most frequent n-grams to plot for each n-gram size (unigrams, bigrams, and trigrams)

  • remove_punct – Whether to remove punctuation during tokenization

  • remove_stop – Whether to remove stop words during tokenization

  • lower_case – Whether to convert text to lowercase before tokenization

  • display_figure – Whether to display the figure in addition to returning it
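
To make the tokenization flags and top_ngrams concrete, here is a minimal standard-library sketch of how such options could affect n-gram counts. It is an illustration only, not this function's actual implementation: the helper names tokenize and top_ngram_counts, the regular expression, and the tiny STOP_WORDS set are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to", "in", "on"}


def tokenize(text, remove_punct=True, remove_stop=True, lower_case=True):
    """Hypothetical tokenizer mirroring the three boolean flags above."""
    if lower_case:
        text = text.lower()
    # Match words (allowing internal apostrophes) or single punctuation marks.
    tokens = re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)
    if remove_punct:
        # Keep only tokens that start with a word character.
        tokens = [t for t in tokens if re.match(r"\w", t)]
    if remove_stop:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def top_ngram_counts(texts, n, top_ngrams=10, **tokenize_kwargs):
    """Count n-grams over all texts and return the top_ngrams most frequent."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text, **tokenize_kwargs)
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(top_ngrams)


docs = ["The cat sat on the mat.", "The cat is on the mat!"]
for n in (1, 2, 3):
    print(n, top_ngram_counts(docs, n, top_ngrams=3))
```

With the default flags, the stop words and punctuation are dropped before n-grams are formed, so bigrams such as ("cat", "sat") can span words that were not adjacent in the raw text. Setting remove_stop=False keeps n-grams faithful to the original word order, at the cost of the top ranks being dominated by function words.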