Text Classification PoC | Query Machines

The Task

Text classification requires models to understand semantic content and assign appropriate categories to input text. The TREC-6 dataset focuses on question classification, where the model must determine what type of answer a question is seeking based on its wording and semantic structure.

This task tests a model's ability to capture nuanced linguistic patterns and semantic relationships—capabilities where we hypothesized quantum attention mechanisms would demonstrate measurable advantages over classical approaches.

Why This Matters

Question classification is fundamental to information retrieval and question-answering systems. Accurately understanding what type of information a user is seeking enables more precise and relevant responses. Superior performance on this task indicates potential for practical applications in search engines, virtual assistants, and customer support systems.

TREC-6 Dataset

The TREC Question Classification dataset consists of questions labeled with one of six coarse-grained categories based on the expected answer type. The task requires understanding both syntactic structure and semantic meaning to correctly classify questions.

Location

"Where is the Eiffel Tower located?"

Numeric

"How many planets are in our solar system?"

Human

"Who invented the telephone?"

Entity

"What is the capital of France?"

Description

"What does photosynthesis mean?"

Abbreviation

"What does NASA stand for?"

Experimental Setup

We compared our Quantum-Enhanced Transformer Multi-Head Attention (QETMHA) architecture against a classical transformer baseline with identical parameter counts across two different configurations. Both models were trained on 5,452 training examples and evaluated on 500 validation examples.

Results: Two Experimental Configurations

We conducted experiments with two different architectural configurations to assess how quantum advantages vary with model capacity and hyperparameter settings. Both runs demonstrate consistent quantum advantages, with the magnitude varying based on configuration.

Configuration 1: Smaller Model (4 Heads, Lower Learning Rate)

Hyperparameters: d_model=64, n_heads=4, lr=5e-5

63.6%

QETMHA Final Test Accuracy

56.6%

Classical Final Test Accuracy

+7.0pp

Test Accuracy Advantage

+6.5pp

F1 Score Advantage

Configuration 2: Larger Model (2 Heads, Higher Learning Rate)

Hyperparameters: d_model=128, n_heads=2, lr=2e-4

85.2%

QETMHA Final Test Accuracy

84.2%

Classical Final Test Accuracy

+1.0pp

Test Accuracy Advantage

+0.99pp

F1 Score Advantage

Configuration 1

Validation Accuracy

QETMHA

Classical

Configuration 2

Validation Accuracy

QETMHA

Classical

Side-by-Side Comparison

Final Accuracy Across Both Configurations

QETMHA

Classical

63.6%

56.6%

Config 1
(4 heads)

85.2%

84.2%

Config 2
(2 heads)

Key Findings

Consistent quantum advantage across configurations: QETMHA outperformed classical baselines in both experimental setups, with advantages ranging from +1.0pp to +7.0pp depending on hyperparameters.
Configuration-dependent advantage magnitude: The smaller model (Configuration 1) showed larger absolute advantages (+7.0pp), while the larger model (Configuration 2) achieved higher overall accuracy (85.2%) with a smaller but still measurable advantage (+1.0pp).
Faster convergence in both configurations: The quantum architecture reached higher validation accuracy earlier in training across both setups, suggesting more efficient extraction of semantic patterns regardless of model size.
Stable performance at scale: Configuration 2 demonstrates that quantum advantages persist even at higher absolute performance levels, indicating the approach scales effectively as classical baselines improve.
Robust across hyperparameter settings: The quantum advantage manifests across different learning rates, attention head counts, and model dimensions, suggesting the benefits are fundamental rather than configuration-dependent.

Implications

These results demonstrate that quantum attention mechanisms provide consistent, measurable advantages on semantic classification tasks across different model configurations. The larger quantum advantage in Configuration 1 (+7.0pp) suggests that quantum mechanisms may be particularly beneficial in resource-constrained scenarios or earlier training stages. Meanwhile, Configuration 2 shows that quantum advantages persist even at higher absolute performance levels (85.2%), indicating the approach remains competitive as classical baselines improve.

The consistency of quantum advantages across different hyperparameter settings—from learning rates to attention head configurations—suggests these benefits arise from fundamental properties of quantum attention rather than specific architectural choices, strengthening the case for quantum-enhanced approaches in production NLP systems.

Technical Details: Configuration Comparison

Configuration 1

d_model: 64

Attention heads: 4

Learning rate: 5e-5

Layers: 1

Batch size: 32

Quantum qubits: 4

63.6%

QETMHA Accuracy

56.6%

Classical Accuracy

Configuration 2

d_model: 128

Attention heads: 2

Learning rate: 2e-4

Layers: 1

Batch size: 32

Quantum qubits: 4

85.2%

QETMHA Accuracy

84.2%

Classical Accuracy

← Back to Quantum-Enhanced Transformers

Text Classification

The Task

Why This Matters

TREC-6 Dataset

Experimental Setup

Results: Two Experimental Configurations

Configuration 1: Smaller Model (4 Heads, Lower Learning Rate)

Configuration 2: Larger Model (2 Heads, Higher Learning Rate)

Configuration 1

Validation Accuracy

Configuration 2

Validation Accuracy

Side-by-Side Comparison

Final Accuracy Across Both Configurations

Quantum Advantage Across Configurations

Key Findings

Implications

Technical Details: Configuration Comparison

Configuration 1

Configuration 2