Proof of Concept Complete

Text Classification

Benchmarking quantum-enhanced attention against classical transformers on the TREC-6 question classification dataset, demonstrating advantages in semantic understanding tasks.

The Task

Text classification requires models to understand semantic content and assign appropriate categories to input text. The TREC-6 dataset focuses on question classification, where the model must determine what type of answer a question is seeking based on its wording and semantic structure.

This task tests a model's ability to capture nuanced linguistic patterns and semantic relationships—capabilities where we hypothesized quantum attention mechanisms would demonstrate measurable advantages over classical approaches.

Why This Matters

Question classification is fundamental to information retrieval and question-answering systems. Accurately understanding what type of information a user is seeking enables more precise and relevant responses. Superior performance on this task indicates potential for practical applications in search engines, virtual assistants, and customer support systems.

TREC-6 Dataset

The TREC Question Classification dataset consists of questions labeled with one of six coarse-grained categories based on the expected answer type. The task requires understanding both syntactic structure and semantic meaning to correctly classify questions.

Location
"Where is the Eiffel Tower located?"
Numeric
"How many planets are in our solar system?"
Human
"Who invented the telephone?"
Entity
"What is the capital of France?"
Description
"What does photosynthesis mean?"
Abbreviation
"What does NASA stand for?"

Experimental Setup

We compared our Quantum-Enhanced Transformer Multi-Head Attention (QETMHA) architecture against a classical transformer baseline with identical parameter counts across two different configurations. Both models were trained on 5,452 training examples and evaluated on 500 validation examples.

Results: Two Experimental Configurations

We conducted experiments with two different architectural configurations to assess how quantum advantages vary with model capacity and hyperparameter settings. Both runs demonstrate consistent quantum advantages, with the magnitude varying based on configuration.

Configuration 1: Smaller Model (4 Heads, Lower Learning Rate)

Hyperparameters: d_model=64, n_heads=4, lr=5e-5

63.6%
QETMHA Final Test Accuracy
56.6%
Classical Final Test Accuracy
+7.0pp
Test Accuracy Advantage
+6.5pp
F1 Score Advantage

Configuration 2: Larger Model (2 Heads, Higher Learning Rate)

Hyperparameters: d_model=128, n_heads=2, lr=2e-4

85.2%
QETMHA Final Test Accuracy
84.2%
Classical Final Test Accuracy
+1.0pp
Test Accuracy Advantage
+0.99pp
F1 Score Advantage

Configuration 1

Validation Accuracy

QETMHA
Classical
65% 55% 45% 35% 20% 1 2 3 4 5 6 7 8 10 Epoch

Configuration 2

Validation Accuracy

QETMHA
Classical
90% 75% 60% 45% 1 2 3 4 5 6 7 8 10 Epoch

Side-by-Side Comparison

Final Accuracy Across Both Configurations

QETMHA
Classical
63.6%
56.6%
Config 1
(4 heads)
85.2%
84.2%
Config 2
(2 heads)

Quantum Advantage Across Configurations

+7.0pp
Configuration 1
(4 heads, lower LR)
+1.0pp
Configuration 2
(2 heads, higher LR)

Key Findings

Implications

These results demonstrate that quantum attention mechanisms provide consistent, measurable advantages on semantic classification tasks across different model configurations. The larger quantum advantage in Configuration 1 (+7.0pp) suggests that quantum mechanisms may be particularly beneficial in resource-constrained scenarios or earlier training stages. Meanwhile, Configuration 2 shows that quantum advantages persist even at higher absolute performance levels (85.2%), indicating the approach remains competitive as classical baselines improve.

The consistency of quantum advantages across different hyperparameter settings—from learning rates to attention head configurations—suggests these benefits arise from fundamental properties of quantum attention rather than specific architectural choices, strengthening the case for quantum-enhanced approaches in production NLP systems.

Technical Details: Configuration Comparison

Configuration 1

d_model: 64
Attention heads: 4
Learning rate: 5e-5
Layers: 1
Batch size: 32
Quantum qubits: 4
63.6%
QETMHA Accuracy
56.6%
Classical Accuracy

Configuration 2

d_model: 128
Attention heads: 2
Learning rate: 2e-4
Layers: 1
Batch size: 32
Quantum qubits: 4
85.2%
QETMHA Accuracy
84.2%
Classical Accuracy
← Back to Quantum-Enhanced Transformers