Explore the Cleaned Data Set and Features
We used the Wikipedia training dataset, processed the data, and created new features for training a model. If you would like to explore the features and create some visualizations, you can visit our Heroku app.
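The engineered features are not listed here, so the sketch below only illustrates the kind of per-sentence features a text-difficulty model might use. The feature names (word count, character count, average word length, share of long words) and the 7-letter "long word" cutoff are hypothetical assumptions, not the project's actual feature set.

```python
import re


def text_features(sentence: str) -> dict:
    """Compute a few hypothetical readability-style features for one sentence."""
    # Treat runs of letters (and apostrophes) as words; punctuation is ignored.
    words = re.findall(r"[A-Za-z']+", sentence)
    n_words = len(words)
    # Assumed cutoff: words with 7+ letters count as "long".
    long_words = sum(1 for w in words if len(w) >= 7)
    return {
        "n_words": n_words,
        "n_chars": len(sentence),
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "long_word_ratio": long_words / n_words if n_words else 0.0,
    }
```

Applied to every row, this yields one feature dict per sentence, which can then be turned into a table for training or for exploration.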
We used Datasette to make the data explorable and easy to share with others.
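Datasette serves SQLite databases, so one way to publish a feature table is to write it to SQLite first. A minimal sketch using only Python's standard library, assuming the features are held as a list of dicts with identical keys; the function name and the table name `features` are hypothetical.

```python
import sqlite3


def write_features_table(conn, rows, table="features"):
    """Replace `table` with the given rows (dicts that all share the same keys)."""
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    # SQLite allows untyped columns; each value is stored with the type it was inserted with.
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
```

For example, calling it with `sqlite3.connect("features.db")` produces a file that can be browsed locally with `datasette features.db` or deployed with `datasette publish heroku features.db` (the latter requires the Heroku CLI).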
Text Difficulty Prediction: Try It Yourself
In our original project we trained a RandomForest model on the features in the dataset above with n=1000 (the number of trees). Because of that model's size and the limited resources available, we could not deploy it in this web app. The form below therefore uses a RandomForest with n=10 and otherwise similar parameters. This reduces accuracy, but n=10 is enough to demonstrate how the model works.
With n=10, the RandomForest scored 0.97 on the training data and 0.73 on the test data.
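To make the effect of n concrete, here is a small pure-Python sketch of a random forest classifier: each of the `n_estimators` trees is grown on a bootstrap sample of the training rows, tries only a random subset of features at each split, and the forest predicts by majority vote. It assumes a classification task with numeric features; all function names are hypothetical, and this is not the project's implementation (if the project used scikit-learn, n would correspond to `RandomForestClassifier(n_estimators=...)`). Because the forest stores every tree, its size grows roughly linearly with `n_estimators`, which is why n=1000 is so much heavier than n=10.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, feature_ids):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples,
    leaves are class labels (assumed not to be tuples)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random-forest step: only a random subset of features is tried at this node.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, n_sub, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_estimators=10, max_depth=5, seed=0):
    """Train n_estimators trees, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(#features), a common default
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def count_nodes(node):
    """Number of stored nodes in one tree; the forest stores one tree per estimator."""
    if not isinstance(node, tuple):
        return 1
    return 1 + count_nodes(node[2]) + count_nodes(node[3])
```

For example, with `X = [(i, i) for i in range(20)]` and `y = [int(i >= 10) for i in range(20)]`, `fit_forest(X, y, n_estimators=10)` returns a list of 10 trees, and `sum(count_nodes(t) for t in forest)` reports how many nodes the whole forest has to keep in memory.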