1. Explain the Transformer Architecture

The Transformer architecture is the foundation of modern LLMs such as GPT, BERT, and LLaMA.

Introduced in the 2017 paper “Attention Is All You Need”, it largely replaced recurrent architectures such as RNNs and LSTMs for sequence modeling.

Key Components:

a) Encoder–Decoder Structure

  • Encoder: reads the input and builds a contextual representation of it

  • Decoder: generates the output text token by token

(Some models, such as GPT, use only the decoder; BERT uses only the encoder.)

b) Self-Attention Mechanism

Allows the model to understand relationships between words regardless of distance.

Example:

“The student who studied hard passed the exam.”

The model links student with passed, even though they are far apart.

c) Feedforward Neural Network

Processes each token’s attention output through position-wise dense layers.

d) Positional Encoding

Adds word-order information, since Transformers process all tokens in parallel rather than one at a time.
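As a minimal sketch (assuming the sinusoidal scheme from the original paper; learned positional embeddings are another common choice), the encoding matrix can be built like this:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Each position gets a unique pattern of sines and cosines at different frequencies.
    positions = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                                 # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                            # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                            # odd dimensions: cosine
    return pe

# Each row is added to the token embedding at that position.
print(sinusoidal_positional_encoding(seq_len=4, d_model=8).shape)   # (4, 8)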

Why Transformers Work So Well:

  • Parallel processing

  • Faster training

  • Better long-context understanding

  • Highly scalable


2. How Do You Fine-Tune an Open-Source Model on Custom Data?

Fine-tuning adapts a pre-trained model to a specific domain.

Step-by-Step Process:

Step 1: Dataset Preparation

  • Clean data

  • Remove noise

  • Format as instruction-response pairs

Example:

{"prompt": "Explain RAG", "response": "RAG is..."}

Step 2: Model Selection

Popular models:

  • LLaMA

  • Mistral

  • Falcon

  • Bloom

Step 3: Choose Fine-Tuning Method

Full Fine-Tuning

Updates all parameters (expensive)

Parameter-Efficient Fine-Tuning (PEFT)

  • LoRA

  • QLoRA

  • Adapters

Most companies use PEFT.
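As a rough sketch of the PEFT route using the Hugging Face peft library (the base-model name and every hyperparameter below are illustrative assumptions, not a recommendation):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any causal LM from the Hub can serve as the base; this name is just an example.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA freezes the base weights and trains small low-rank update matrices instead.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank matrices
    lora_alpha=16,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of all parameters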

Step 4: Training

Using:

  • Hugging Face Trainer

  • DeepSpeed

  • PyTorch Lightning
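A minimal Hugging Face Trainer setup might look like the sketch below; it assumes model, train_dataset, and eval_dataset already exist from the earlier steps, and every value is an illustrative starting point rather than a tuned setting.

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,    # simulates a larger effective batch size
    learning_rate=2e-4,
    logging_steps=50,
    save_total_limit=2,               # keep only the two most recent checkpoints
)

trainer = Trainer(
    model=model,                      # e.g. the LoRA-wrapped model from Step 3
    args=training_args,
    train_dataset=train_dataset,      # tokenized instruction-response pairs
    eval_dataset=eval_dataset,
)
trainer.train()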

Step 5: Evaluation & Deployment

Evaluate → Optimize → Deploy via API


3. What Is Self-Attention and Why Does It Work?

Self-attention enables each word to focus on other relevant words.

It uses:

  • Query (Q)

  • Key (K)

  • Value (V)

Formula:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
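A bare-bones implementation of this formula (a sketch in plain PyTorch, ignoring masking and multiple heads):

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # similarity of every query to every key
    weights = F.softmax(scores, dim=-1)                 # attention weights sum to 1 per query
    return weights @ V                                  # weighted sum of the value vectors

# Toy example: 5 tokens, 16-dimensional representations
Q = K = V = torch.randn(5, 16)
print(scaled_dot_product_attention(Q, K, V).shape)      # torch.Size([5, 16])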

Why It Works:

  • Captures long-term dependencies

  • Enables contextual understanding

  • Eliminates recurrence

This is why LLMs understand meaning, not just keywords.


4. What Are Embeddings and How Are They Created?

Embeddings are numerical representations of text.

They map text into vectors so that semantically similar texts end up close together in vector space.

Example:

"AI is powerful" → [0.12, -0.45, 0.89, ...]

Creation Process:

  • Tokenization of the input text

  • Neural encoding (passing the tokens through a trained model)

  • Pooling/projection into a fixed-size, lower-dimensional vector

Popular Embedding Models:

  • OpenAI Embeddings

  • Sentence-BERT

  • Instructor-XL

  • Azure OpenAI

Use Cases:

  • Semantic search

  • Recommendation

  • RAG systems

  • Clustering
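For example, a quick semantic-similarity check with Sentence-BERT might look like this (the model name is one common small checkpoint, chosen only for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["AI is powerful",
             "Artificial intelligence is very capable",
             "I like pizza"]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the first sentence and the other two
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)   # the paraphrase scores far higher than the unrelated sentence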


5. How Do You Reduce a 70B Model to 8B Parameters?

This process is called Model Compression.

Main Techniques:

a) Knowledge Distillation

Train a small “student” model to imitate the outputs of a large “teacher” model.
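A common formulation of the distillation objective blends the teacher’s softened predictions with the usual hard-label loss; a rough PyTorch sketch (temperature and mixing weight are illustrative):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's distribution, softened by temperature T
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy example: batch of 4 examples, vocabulary of 10 tokens
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))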

b) Pruning

Remove unimportant weights.

c) Quantization

Reduce precision (FP32 → INT8 / INT4).

d) Parameter Sharing

Reuse parameters across layers.

Result:

Smaller size, faster inference, lower cost.


6. What Metrics Are Used to Evaluate Fine-Tuned Models?

Evaluation depends on task type.

For Text Generation:

  • Perplexity

  • BLEU

  • ROUGE

  • METEOR

For Classification:

  • Accuracy

  • Precision

  • Recall

  • F1-score
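The classification metrics, for instance, come straight from scikit-learn (the labels below are made-up toy data):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  F1: {f1:.2f}")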

For LLM Quality:

  • Human Evaluation

  • Win Rate

  • Helpfulness Score

  • Hallucination Rate

Production Metrics:

  • Latency

  • Throughput

  • Cost per Query


7. Explain Quantization Techniques

Quantization reduces numerical precision.

Types:

  • FP32 (32-bit): training

  • FP16 (16-bit): mixed-precision training

  • INT8 (8-bit): inference

  • INT4 (4-bit): edge devices

Methods:

  • Post-Training Quantization

  • Quantization-Aware Training
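As one concrete example of post-training quantization, PyTorch’s dynamic quantization converts linear-layer weights to INT8 (the toy model below stands in for a trained network):

import torch
import torch.nn as nn

# Toy FP32 model standing in for a trained network
model_fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# Post-training dynamic quantization: weights stored as INT8, activations quantized on the fly
model_int8 = torch.quantization.quantize_dynamic(model_fp32, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(model_int8(x).shape)   # same output shape, but a smaller and faster model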

Benefits:

  • Faster inference

  • Lower memory

  • Lower cost


8. What Is Multi-Head and Cross Attention?

Multi-Head Attention

Multiple attention layers run in parallel.

Each head focuses on different linguistic patterns.

Example:

  • One head → Grammar

  • One head → Meaning

  • One head → Context
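PyTorch ships a ready-made module for this; a tiny sketch with arbitrary dimensions:

import torch
import torch.nn as nn

# 8 heads, each attending over a 64-dimensional slice of the 512-dimensional representation
mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 512)                        # (batch, sequence length, embedding dim)
out, attn_weights = mha(query=x, key=x, value=x)   # self-attention: Q, K, V from the same input
print(out.shape, attn_weights.shape)               # (2, 10, 512) and (2, 10, 10)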

Cross-Attention

Used in encoder-decoder models.

It connects the two halves: the decoder’s queries attend over the encoder’s output (the keys and values).

Used in:

  • Translation

  • Summarization

  • Multimodal AI


9. Challenges in Fine-Tuning and How to Overcome Them

Common Challenges:

  • Overfitting: regularization, more data

  • Hallucination: RAG, output filtering

  • Bias: data balancing

  • High cost: LoRA, QLoRA

  • Catastrophic forgetting: continual learning

Best Practices:

  • Use domain-specific data

  • Apply early stopping

  • Monitor validation loss

  • Combine with RAG


10. Regularization Techniques in LLM Training

Regularization prevents overfitting.

a) Dropout

Randomly disables neurons.

b) Label Smoothing

Reduces overconfidence.

c) Weight Decay

Penalizes large weights.

d) Early Stopping

Stops training when validation performance stops improving.

These techniques improve generalization.
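A sketch of how dropout, weight decay, and early stopping typically show up in a PyTorch training setup (all values are illustrative, and the validation pass is a placeholder):

import torch
import torch.nn as nn

# Dropout lives inside the model and is only active in training mode
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Dropout(p=0.1),        # randomly zeroes 10% of activations during training
    nn.Linear(512, 512),
)

# Weight decay (an L2 penalty on the weights) is passed to the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# Early stopping: keep the best validation loss and stop after a few epochs without improvement
best_val_loss, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    val_loss = torch.rand(1).item()   # placeholder for a real validation pass
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                     # validation has stopped improving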


11. Loss Functions in NLP Model Training

Loss functions guide learning.

Common Losses:

Cross-Entropy Loss (Most Common)

Used in language modeling.

Masked Language Loss

Used in BERT.

Contrastive Loss

Used in embeddings.

Reinforcement Learning Loss

Used in RLHF.

Perplexity

Not a loss function itself, but a metric derived from cross-entropy (perplexity = exp(cross-entropy)); it measures how uncertain the model is about the next token.
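A small sketch of that relationship for next-token prediction (toy tensors, not a real model):

import torch
import torch.nn.functional as F

vocab_size, seq_len = 1000, 6

# Toy next-token logits and target token ids
logits = torch.randn(seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (seq_len,))

# Cross-entropy: negative log-likelihood of the correct next token, averaged over positions
ce_loss = F.cross_entropy(logits, targets)

# Perplexity is the exponential of the cross-entropy
perplexity = torch.exp(ce_loss)
print(f"Cross-entropy: {ce_loss.item():.3f}, Perplexity: {perplexity.item():.1f}")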


Final Thoughts

Mastering these concepts prepares you for:

  • AI Engineer
  • LLM Engineer
  • Data Scientist
  • ML Researcher
  • GenAI Consultant

Modern interviews focus not just on theory, but practical system design and optimization.


About WiFi Learning

At WiFi Learning, we provide industry-focused training in:

  • Generative AI

  • Data Science

  • LLM Engineering

  • Cloud AI Systems

All programs include hands-on projects and expert mentoring.

Visit: wifilearning.com
