
Long-Text Classification: A Practical Alternative to LLMs

Cybersecurity workflows often require processing long documents such as incident reports, threat intelligence, and compliance texts, where retaining full context is essential. Most NLP models are either resource-intensive (LLMs) or limited to 512-token inputs (typical Hugging Face transformers), which impairs document-level understanding. We present a lightweight, prompt-free approach using SetFit extended with a Longformer backbone to process inputs of up to 4,096 tokens. Originally developed for automated essay scoring, the method transfers effectively to security applications while requiring far less GPU power, making it suitable for low-data or privacy-sensitive environments. Released on Hugging Face and downloaded more than 6,000 times in its first month, our model demonstrates how small, specialized models offer scalable, cost-effective solutions for cybersecurity tasks including incident classification, compliance audits, and log analysis.

About the speaker

Dr. Elena Nazarenko


Dr. Elena Nazarenko is a lecturer at HSLU and co-head of the LLMs and AI Agents bootcamp. Her work focuses on applying AI responsibly, with a special interest in bias mitigation and small language models. Her industry experience includes serving as Head of Data and AI at Witty Works, where she built the core algorithm for their inclusive writing assistant (a Hugging Face startup accelerator participant and Microsoft Entrepreneurship for Positive Impact Cup 2024 finalist). She and her students recently earned the Best Paper Award at the Swiss Data Science Conference and the Best Poster Award at SwissText.
Copyright © 2025 Swiss Cyber Storm