Long-Text Classification: A Practical Alternative to LLMs
Cybersecurity workflows often require processing long documents such as incident reports, threat intelligence, and compliance texts, where retaining full context is essential. Most NLP models are either resource-intensive (LLMs) or constrained by the 512-token limit of typical Hugging Face transformers, which impairs document-level understanding. We present a lightweight, prompt-free approach that extends SetFit with the Longformer architecture to process up to 4096 tokens. Originally developed for automated essay scoring, the method transfers effectively to security applications while requiring far less GPU power, making it suitable for low-data or privacy-sensitive environments. Released on Hugging Face with 6,000+ downloads in the first month, our model demonstrates how small, specialized models offer scalable, cost-effective solutions for cybersecurity tasks including incident classification, compliance audits, and log analysis.
About the speaker
Dr. Elena Nazarenko