Few-Shot Prompting for Flexible Classification of Operator Texts in Multilingual Aircraft Manufacturing ✈️

Published: July 01, 2025

Unstructured textual data from manufacturing environments presents significant challenges for automated processing, particularly due to domain-specific jargon and abbreviations used by operators. We investigate the use of zero and few-shot prompting with open-source Large Language Models (LLMs) to classify operator-generated texts in aircraft manufacturing under data-scarce, high-variation conditions. Our experiments focus on English and Chinese operator texts, simulating three real-world technical shorthand scenarios, and evaluate four models: Mistral, Llama, Gemma, and Qwen. We show that Gemma and Mistral consistently delivers the highest few-shot accuracy for English and that providing additional examples beyond a small set yields diminishing returns. Moreover, few-shot prompts not only accelerate response times compared to zero-shot but also maintain accuracy, whereas Llama and Qwen tend to hallucinate or deviate from exact expected outputs when example counts exceed a critical threshold. When handling Chinese text, Mistral struggles to accurately recover short abbreviations. These results identify a practical “sweet spot” of 3–12 examples for deploying lightweight, locally deployable NLP tools in industrial settings, balancing accuracy, speed, and reliability.

Pierrick Bougault, Thomas Adler

Share on

Twitter Facebook LinkedIn

Thomas Adler 乐冬马

Share on