Feature Analysis for Predicate Argument Identification using Random Forests
A variety of features are used to identify ARG1s in predicate-argument relationships, including N-gram, predicate, path, and embedding features. It is unclear which of these matter most when a machine learning model identifies ARG1s.
This paper uses feature importance and permutation importance in a binary-classification random forest to assess these features and determine their impact. The most important features in the random forest were the distance from the word to the predicate, the word-to-predicate embedding distance, and the word itself.
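The two importance measures named above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: the data is synthetic, the feature names `distance_to_predicate` and `noise` are hypothetical, and the helper `rank_features` is introduced here only for the example. The built-in `feature_importances_` attribute is impurity-based; whether the paper used that exact measure is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_features(X, y, feature_names, seed=0):
    """Fit a binary random forest and return both importance scores per feature."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)

    # Impurity-based importance, computed from the training data during fitting.
    impurity = forest.feature_importances_

    # Permutation importance: mean drop in held-out accuracy when one
    # feature column is shuffled, averaged over n_repeats shuffles.
    perm = permutation_importance(
        forest, X_test, y_test, n_repeats=10, random_state=seed
    )

    return {
        name: (float(imp), float(pm))
        for name, imp, pm in zip(feature_names, impurity, perm.importances_mean)
    }


# Synthetic stand-in data (assumption): a word is labeled ARG1 (1) when it
# lies within 3 tokens of the predicate; the second column is pure noise.
rng = np.random.default_rng(0)
n = 600
distance = rng.integers(1, 20, size=n)
noise = rng.normal(size=n)
y = (distance <= 3).astype(int)
X = np.column_stack([distance, noise])

scores = rank_features(X, y, ["distance_to_predicate", "noise"])
for name, (imp, pm) in scores.items():
    print(f"{name}: impurity={imp:.3f}, permutation={pm:.3f}")
```

Reporting both scores side by side is useful because impurity-based importance can favor high-cardinality features, while permutation importance is measured on held-out data.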
Final paper for the NYU Natural Language Processing course taught by Adam Meyers.