Text Classification of Road Data (Ongoing)
A research project that is part of my PhD studies. Still ongoing, so there isn’t a lot here atm!
The gist is that highways authorities have collected a vast amount of data on roads, much of which is in textual form. These data are difficult to analyse systematically since they are unstructured and vary in quality. However, the information in these data sources can offer valuable insights for infrastructure management. This is the main motivation behind this project.
As part of my MRes course, I performed initial studies on certain aspects of this project, focusing on how to minimize the impact of a heavily class-imbalanced dataset. The dataset used for this part of the project is from New York City Open Data, specifically data from the Department of Transportation.
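To give a flavour of what dealing with class imbalance can look like, here's a minimal sketch of one common approach: weighting each class inversely to its frequency, so that rare classes count for more during training. This is only an illustration and not necessarily the method used in the project. The function name `balanced_class_weights` and the labels (`pothole`, `signage`, `drainage`) are made up for the example and are not taken from the actual dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class frequency.

    Uses the heuristic n_samples / (n_classes * class_count), so every class
    contributes the same total weight regardless of how many samples it has.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, heavily imbalanced label set (not real project data).
labels = ["pothole"] * 90 + ["signage"] * 8 + ["drainage"] * 2
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.3f}")
```

The rarest class ends up with the largest weight. Weights like these can then be passed to a classifier that supports per-class weighting, or used to guide resampling.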