text_summary

text_summary(data: DataFrame, column: str, fig_height: int = 1000, fig_width: int = 1200, top_ngrams: int = 10, remove_punct: bool = True, remove_stop: bool = True, lower_case: bool = True, display_figure: bool = False) → Figure

Creates a univariate exploratory data analysis (EDA) summary for a text column in a pandas DataFrame. Currently only English text is supported.

Parameters

data – Dataset to perform EDA on
column – Name of the text column in data to summarize
fig_height – Height of the plot in pixels
fig_width – Width of the plot in pixels
top_ngrams – Maximum number of n-grams to plot in each of the most-frequent unigram, bigram, and trigram charts
remove_punct – Whether to remove punctuation during tokenization
remove_stop – Whether to remove stop words during tokenization
lower_case – Whether to lowercase text before tokenization
display_figure – Whether to display the figure in addition to returning it
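This entry does not specify which tokenizer or stop-word list text_summary uses internally. The following is a minimal, standard-library-only sketch of how the remove_punct, remove_stop, and lower_case flags could shape the tokens, and how a top_ngrams-style limit caps each unigram-to-trigram frequency list. It assumes whitespace tokenization and a small hypothetical STOP_WORDS set. The helpers tokenize and most_frequent_ngrams are hypothetical illustrations and are not part of the library.

```python
import string
from collections import Counter

# Hypothetical, tiny English stop-word list for illustration only;
# the real function presumably relies on a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}


def tokenize(text: str, remove_punct: bool = True, remove_stop: bool = True,
             lower_case: bool = True) -> list[str]:
    """Hypothetical tokenizer mirroring the three tokenization flags."""
    if lower_case:
        text = text.lower()
    if remove_punct:
        # Replace every ASCII punctuation character with a space.
        table = str.maketrans(string.punctuation, " " * len(string.punctuation))
        text = text.translate(table)
    tokens = text.split()
    if remove_stop:
        # Compare case-insensitively so stop words are dropped even when lower_case=False.
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def most_frequent_ngrams(texts, n: int, k: int = 10, **tokenize_kwargs):
    """Return the k most frequent n-grams (as tuples) across all documents.

    k plays the role of the top_ngrams parameter; n=1, 2, 3 gives
    unigrams, bigrams, and trigrams respectively.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text, **tokenize_kwargs)
        # Slide a window of length n over the tokens of this document only,
        # so n-grams never span two documents.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


texts = ["The cat sat on the mat.", "A cat sat, and the dog barked!"]
print(most_frequent_ngrams(texts, n=2, k=1))
```

With the default flags, both documents reduce to token lists that start with "cat sat", so the single most frequent bigram is ('cat', 'sat') with a count of 2. Building n-grams per document avoids spurious n-grams that would join the end of one row to the start of the next.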