In engineering education ecosystems, community members produce text through myriad activities inside and outside the classroom, in both teaching and research settings. In many of these cases, educators and researchers have access to an abundance of text that could provide insight into phenomena of interest within the system: student conceptual understanding, student experiences outside the classroom, how instructors can improve their teaching, or even shifts in collective conversations. Unfortunately, while these bodies of text have the potential to provide novel insights, traditional analysis techniques do not scale well. For example, analyzing a large body of text can take a single grader or researcher significantly more time than analyzing a small set of text responses. A larger body of text also makes intrarater reliability harder to maintain. Likewise, expanding the grading or research team can create interrater reliability challenges and introduce the possibility of bias.
To pursue this opportunity, we have created a natural language processing system that augments human analysis, facilitating and enhancing the work of one person or team. Specifically, we take minimally pre-processed texts, embed them using a pre-trained transformer (a specific kind of neural network architecture trained to encode inputs and decode outputs), and apply a sequence of dimension reduction techniques capped with a final clustering step. Such a system can reduce the time needed to analyze the text by effectively running a first pass that groups similar responses together. The human user can then use these groupings for further analysis, fine-tuning them and identifying meanings in ways that only a human could. The system can also improve consistency by analyzing the entire collection of texts simultaneously and grouping similar items together. This contrasts with a single person or team that would have to work in series, analyzing responses sequentially and thereby creating the potential for inconsistencies over time.
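The embed, reduce, and cluster sequence described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's implementation: the abstract does not name the transformer, the dimension reduction techniques, or the clustering algorithm, so the sketch substitutes a toy bag-of-words embedding for the pre-trained transformer, a single Gaussian random projection for the reduction steps, and plain k-means for the final clustering. All function names (`embed`, `reduce_dims`, `kmeans`, `cluster_responses`) and the sample responses are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def embed(texts):
    """Toy stand-in (assumption) for a pre-trained transformer encoder:
    L2-normalized bag-of-words count vectors over a shared vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    words = sorted({w for doc in tokenized for w in doc})
    vocab = {w: i for i, w in enumerate(words)}
    vectors = []
    for doc in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(doc).items():
            vec[vocab[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def reduce_dims(vectors, n_components, seed=0):
    """Gaussian random projection to n_components dimensions
    (a stand-in for the unspecified dimension reduction steps)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(n_components)
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(n_components)]
    return [[sum(p * x for p, x in zip(row, v)) for row in proj]
            for v in vectors]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def cluster_responses(texts, k, n_components=8, seed=0):
    """Embed, reduce, cluster; return {cluster label: [texts]}."""
    reduced = reduce_dims(embed(texts), n_components, seed)
    labels = kmeans(reduced, k, seed=seed)
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[label].append(text)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical placeholder responses, not taken from the survey data.
    responses = [
        "online labs were hard to follow",
        "labs were hard to do online",
        "recorded lectures helped me review",
        "i liked the recorded lectures",
    ]
    for label, members in sorted(cluster_responses(responses, k=2).items()):
        print(label, members)
```

The returned groups are the "first pass" handed to the human coder, who then reviews, merges, or splits them. In practice the toy `embed` would be replaced by a real pre-trained encoder, and the number of clusters `k` would be a judgment call for the analyst.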
In this paper, we describe the system’s architecture and data processing steps. We demonstrate the utility of this approach by applying the method to a pair of questions from an end-of-semester feedback survey in a large, required introductory engineering course. The data were collected in spring 2020. The questions were part of a general feedback survey and asked students about their experiences in the transition to online learning following the SARS-CoV-2 outbreak. Given the variation in students’ experiences, this presented a challenging dataset.
Our results suggest that pre-analysis text clustering can improve the speed and accuracy of coding when compared with unassisted human coding. As natural language processing techniques continue to develop, the engineering education research community should keep exploring potential applications to improve understanding and sensemaking from large volumes of underutilized text data from both within and outside of classroom settings.