ENSAM Deep Learning & NLP — Study Guide
Course Notes — Academic Year 2025/2026

Deep Learning & NLP

7 modules of complete course notes rebuilt from the professor's lecture PDFs. From ANN and CNN to Transformers and LLMs. ENSAM Casablanca · Hassan II University.

Modules: 7 complete modules
Source: Professor lecture PDFs
Math: KaTeX — LaTeX formulas
Exam: 18 May 2026 · Written exam
Exam Preparation
Q&A: 210 Exam Questions — All Modules
30 questions per module · Answers revealed on click · Covers every formula, benchmark, and concept from all 7 modules. Start here the night before the exam.
210 Questions · Click to reveal · All Modules
Course Modules
MOD 01 · Deep Learning Essentials & ANN
What is DL (compositionality, end-to-end, distributed representations), ML vs DL, deep architecture types, manifold hypothesis, ANN forward pass, all activation functions, loss functions, backpropagation, gradient descent, optimizers, regularization.
DL Fundamentals · ANN · Backprop
MOD 02 · Convolutional Neural Networks
Why CNNs (parameter explosion, no spatial awareness), convolution operation (filter as flashlight), padding & stride formula, pooling (max/avg/global), CNN architectures (LeNet → EfficientNet, ResNet skip connections), transfer learning, data augmentation, evaluation metrics.
CNN · ResNet · Transfer Learning
MOD 03 · RNN / LSTM / GRU
Sequential data, why FFN fails on sequences, RNN hidden state equation, BPTT, RNN types (many-to-one etc.), LSTM (all 4 gate equations with cell state highway), GRU (update & reset gates), BiRNN, comparison table, when to use each.
RNN · LSTM · GRU · Seq2Seq
MOD 04 · NLP Introduction & Text Processing
Language layers (phonology → pragmatics), NLP definition, NLU vs NLG, full NLP pipeline, cleaning (regex, 6 steps), tokenization (word/sentence/BPE), stop words (danger: "not"), stemming vs lemmatization, POS tagging, NER, dependency parsing, CFG.
NLP · Pipeline · spaCy · NLTK
MOD 05 · Classical Text Representation
Bag of Words (counting vector, 99% sparse), N-grams (captures negation via bigrams), TF-IDF (TF×IDF formula, real IDF values from 50k IMDb). Results: BoW 86.2% → TF-IDF 1+2gram 90.1%. When to stop at classical methods.
BoW · TF-IDF · N-grams · IMDb 50k
MOD 06 · Word Embeddings — Static & Contextual
Word2Vec (Skip-gram/CBOW, king−man+woman≈queen), GloVe (global co-occurrence), FastText (subwords, OOV), polysemy problem, BERT (MLM+NSP, 110M params, 93.9%). Exact IMDb benchmark: GloVe 76.8%, W2V 85.7%, FastText 86.0%, BERT 93.9%.
Word2Vec · GloVe · BERT · Embeddings
MOD 07 · Transformers & Large Language Models
Why Transformers (sequential bottleneck → parallel), attention formula Attention(Q,K,V) = softmax(QKᵀ/√d_k)V, multi-head attention (h=8 heads), full encoder-decoder architecture, Add&Norm, positional encoding, LLMs (scaling, emergent abilities), 3-stage training (pre-training → SFT → RLHF), prompt engineering (zero-shot, few-shot, CoT), RAG, PEFT/LoRA, model selection framework.
Transformers · LLMs · Attention · RAG · LoRA
Must-Know Formulas
Convolution Output Size
W_out = floor((W_in - K + 2P) / S) + 1
K = kernel, P = padding, S = stride
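A quick way to sanity-check the formula is to compute it directly; a minimal Python sketch (the 224 → 112 example assumes a 7×7 kernel with padding 3 and stride 2, as in a typical ResNet stem):

```python
def conv_output_size(w_in: int, k: int, p: int, s: int) -> int:
    # W_out = floor((W_in - K + 2P) / S) + 1
    return (w_in - k + 2 * p) // s + 1

print(conv_output_size(224, 7, 3, 2))  # 224x224 input, 7x7 kernel, P=3, S=2 -> 112
print(conv_output_size(32, 3, 1, 1))   # 3x3 kernel, padding 1, stride 1 keeps 32
```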
Gradient Descent Update
w ← w − η · ∂L/∂w
η = learning rate, L = loss function
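A minimal sketch of the update rule on a toy one-dimensional loss (the loss, learning rate, and starting point are illustrative, not from the course):

```python
# Plain gradient descent on L(w) = (w - 3)^2, whose minimum is at w = 3.
eta = 0.1              # learning rate η
w = 0.0                # initial weight
for _ in range(50):
    grad = 2 * (w - 3)     # ∂L/∂w for this toy loss
    w = w - eta * grad     # w ← w − η · ∂L/∂w
print(round(w, 4))         # ≈ 3.0: each step walks down the gradient
```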
LSTM Cell Update
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
f = forget, i = input, C̃ = candidate
LSTM Output
h_t = o_t ⊙ tanh(C_t)
o_t = output gate
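A single LSTM time step written out in NumPy, combining the two formulas above; the stacked gate matrices and toy dimensions are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Pre-activations for the 4 gates (forget, input, output, candidate) in one matmul.
    z = W @ x_t + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c_t = f * c_prev + i * np.tanh(g)   # C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
    h_t = o * np.tanh(c_t)              # h_t = o_t ⊙ tanh(C_t)
    return h_t, c_t

# Toy sizes: input dim 3, hidden dim 2, so the stacked matrices are (4*2, 3) and (4*2, 2).
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(8, 3)), rng.normal(size=(8, 2)), np.zeros(8)
h, c = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), W, U, b)
```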
TF-IDF
TF-IDF(t,d) = TF(t,d) × (log(N/df(t)) + 1)
N = total docs, df(t) = docs with term t
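A hand computation of this exact formula on a toy three-document corpus (the corpus is made up for illustration; the course benchmark uses the 50k IMDb reviews):

```python
import math
from collections import Counter

docs = [["good", "movie"], ["not", "good"], ["bad", "movie"]]   # toy corpus
N = len(docs)
df = Counter(t for d in docs for t in set(d))   # df(t) = number of docs containing t

def tfidf(term, doc):
    tf = doc.count(term) / len(doc)             # relative term frequency TF(t,d)
    idf = math.log(N / df[term]) + 1            # rarer terms get a larger IDF
    return tf * idf

print(tfidf("good", docs[0]))   # "good" appears in 2/3 docs -> modest weight
print(tfidf("not", docs[1]))    # "not" appears in 1/3 docs  -> higher weight
```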
Scaled Dot-Product Attention
Attention(Q,K,V) = softmax(QKᵀ/√d_k)V
Q/K/V = query, key, value matrices
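The same formula as a few lines of NumPy; shapes and random inputs are illustrative (a real Transformer projects the tokens into separate Q, K, V matrices and runs h = 8 such heads in parallel):

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # QKᵀ / √d_k
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))       # 4 tokens, d_k = 8
out = attention(X, X, X)          # self-attention: Q, K, V from the same tokens
print(out.shape)                  # (4, 8)
```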
RNN Hidden State
h_t = f(W_h·h_{t-1} + U·x_t + b)
Shared weights at every time step
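A vanilla RNN forward pass over a toy sequence, with tanh standing in for the generic activation f; all dimensions are illustrative:

```python
import numpy as np

def rnn_forward(xs, W_h, U, b):
    h = np.zeros(W_h.shape[0])            # h_0 = 0
    outputs = []
    for x_t in xs:                        # same W_h, U, b reused at every time step
        h = np.tanh(W_h @ h + U @ x_t + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))              # sequence of 5 steps, input dim 3
hs = rnn_forward(xs, 0.1 * rng.normal(size=(4, 4)),
                 0.1 * rng.normal(size=(4, 3)), np.zeros(4))
print(hs.shape)                           # (5, 4): one hidden state per time step
```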
LoRA Weight Update
W' = W + (α/r)·AB
r = rank (tiny), A∈R^{d×r}, B∈R^{r×k}
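A NumPy sketch of the LoRA idea using the card's notation; the 768-dimensional shapes and rank are illustrative defaults, not values prescribed by the course:

```python
import numpy as np

d, k, r, alpha = 768, 768, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))            # frozen pre-trained weight (never updated)
A = 0.01 * rng.normal(size=(d, r))     # trainable low-rank factor, d×r
B = np.zeros((r, k))                   # trainable low-rank factor, r×k (zero init → W' = W at start)

W_prime = W + (alpha / r) * (A @ B)    # W' = W + (α/r)·AB

print(W.size)                          # 589,824 frozen parameters
print(A.size + B.size)                 # 12,288 trainable parameters (~2% of the frozen ones)
```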
IMDb 50k Benchmark — Full Results
Method | Type | Accuracy | F1 | Latency | GPU? | Handles OOV? | Key Strength
Bag of Words | Classical | 86.2% | 0.86 | ~1 ms | No | No | Simple, interpretable, no training
N-gram (1+2) | Classical | ~89% | ~0.89 | ~3 ms | No | No | Captures "not good" bigrams
TF-IDF 1+2gram | Classical | 90.1% | 0.90 | ~3 ms | No | No | Best classical — rare word weighting
Word2Vec (mean pool) | Static | 85.7% | 0.86 | ~12 ms | No | No | Semantic similarity, vector analogies
GloVe (mean pool) | Static | 76.8% | 0.77 | ~10 ms | No | No | Analogy champion (WordSim-353)
FastText (mean pool) | Static | 86.0% | 0.86 | ~15 ms | No | Yes | OOV, noisy text, morphological languages
BERT fine-tuned | Contextual | 93.9% | 0.94 | ~370 ms (CPU) | Yes | Yes | Polysemy, negation, bidirectional context
Decision ladder (Golden Rule):
Start with TF-IDF + bigrams (fast baseline, no GPU, 90.1%). → If OOV/noisy text: use FastText. → If polysemy, negation, or nuance is critical and GPU available: BERT fine-tuned (93.9%).
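A sketch of step one of this ladder using scikit-learn (not from the lecture PDFs; the two-review corpus and labels are placeholders for the real IMDb data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["a really great movie", "not good at all"]   # placeholder for IMDb reviews
labels = [1, 0]                                       # 1 = positive, 0 = negative

# TF-IDF with unigrams + bigrams; stop words are kept so bigrams like "not good" survive.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)
print(baseline.predict(["not a great movie"]))
```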
Architecture Comparison — Four Families
Family | Typical Data | Processing | Strengths | Limitations
Feed-Forward (MLP) | Tabular, vectors | Dense layers stacked | Simplicity, universality | No spatial/temporal structure; parameter explosion on images
CNN | Images, 2D/3D signals | Local convolutions + pooling | Spatial invariance, feature hierarchy, transfer learning | No native temporal handling; needs large datasets
RNN / LSTM / GRU | Sequences (text, time-series, speech) | Temporal loop with hidden state + gates | Long-term memory (LSTM); streaming O(1) inference | Sequential training (slow); no native attention
Transformer | Any sequence (text, images, audio) | Parallel self-attention on all tokens | Parallelism; scales to internet-size data; emergent abilities | O(n²) memory in attention; needs massive compute to shine