Milestone II MADS

Text Difficulty Prediction

Please enter a sentence and click submit. We will tell you whether or not the text should be simplified.

Explore the Cleaned Data Set and Features

We took a Wikipedia training dataset, processed the data, and created new features to use for training a model. If you would like to explore the features and create some visualizations, you can visit our Heroku app.

We used Datasette to make our data easy to explore and share with others.


Text Difficulty Prediction: Try for yourself

In our original project, we trained a RandomForest model with n=1000 on the features in the dataset above. The size of that model and the resources available prevented us from using it in this web app, so the form below uses a RandomForest model with n=10 and otherwise similar parameters. This reduces accuracy, but n=10 is enough to demonstrate the model's purpose and function.

RandomForest with n=10 had a score of 0.97 on the training data and 0.73 on the test data.
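
For readers curious about the size trade-off, here is a minimal sketch of how the number of trees affects a forest's serialized size and how training and test scores can be compared. It rests on assumptions that are not confirmed above: that n refers to scikit-learn's n_estimators parameter, that the model is a RandomForestClassifier, and that synthetic data from make_classification can stand in for our real features. The helper fit_and_measure is hypothetical, and the larger forest uses 100 trees rather than 1000 only to keep the example fast.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real app uses features
# derived from Wikipedia text, which are not reproduced here.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_measure(n_trees):
    """Fit a forest with n_trees trees; return (model, pickled size in bytes)."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    return model, len(pickle.dumps(model))


small_model, small_bytes = fit_and_measure(10)
large_model, large_bytes = fit_and_measure(100)

# Report serialized size alongside training and test accuracy for each forest.
for name, model, size in [
    ("n=10", small_model, small_bytes),
    ("n=100", large_model, large_bytes),
]:
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    print(f"{name}: {size} bytes, train={train_score:.2f}, test={test_score:.2f}")
```

The scores printed here come from the synthetic data and say nothing about our actual model. The point is only that the pickled size grows with the number of trees, which is the kind of constraint that can make a large forest impractical on a small hosting plan.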