Milestone2 MADS

Explore the Assigned Topics and Original Text

We used a Wikipedia training dataset, processed the data, and assigned topics to the original texts. To explore the features and create your own visualizations, visit our Heroku app.

We used Datasette so that others can explore the data and so that we can share it.

Predict Topic Text

We ran the Gibbs Sampling Dirichlet Mixture Model (GSDMM), a topic model related to LDA that is designed for shorter texts, on the original texts in a Wikipedia training dataset. This produced 20 topic clusters, which we assigned back to the original texts. We then built a pipeline from a TfidfVectorizer and a naive Bayes MultinomialNB classifier to predict the probability of each topic for new text. Try it out for yourself below!
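For readers who want to see the shape of such a pipeline, here is a minimal sketch using scikit-learn. It is not our production code: the training texts, the two topic labels, and the helper name predict_topic_probabilities are hypothetical stand-ins for the Wikipedia texts and the 20 GSDMM-assigned topics, and default hyperparameters are assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: in the real project the labels would be the
# topic clusters that GSDMM assigned to each original text.
train_texts = [
    "the football team won the league match",
    "the striker scored a goal in the final match",
    "the senate passed the new tax law",
    "the president signed the election bill",
]
train_topics = [0, 0, 1, 1]

# TF-IDF features feed a multinomial naive Bayes classifier.
topic_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
topic_pipeline.fit(train_texts, train_topics)


def predict_topic_probabilities(text):
    """Return a {topic_label: probability} dict for one new text."""
    probabilities = topic_pipeline.predict_proba([text])[0]
    labels = topic_pipeline.named_steps["nb"].classes_
    return {int(label): float(p) for label, p in zip(labels, probabilities)}


if __name__ == "__main__":
    print(predict_topic_probabilities("football match goal"))
```

The classifier returns one probability per topic, and the probabilities sum to 1, so the output can be shown as a ranked list of likely topics rather than a single hard label.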

Interactive Visualizations for Topic Modeling Clusters

LDA TSNE

LDA MMDS

LDA PCOA

GSDMM Topics

Base TSNE
Base MMDS
Base PCOA
Bigram TSNE
Bigram MMDS
Bigram PCOA
Topics
