Rhetorical Questions in LLM Representations: A Linear Probing Study

Investigate how large language models (LLMs) internally represent rhetorical questions. Train linear probes on LLM embeddings of social media text to test whether the models encode this kind of nuanced, persuasive language.

Intermediate · 3-6 hours · 5 steps

The play

Understand Linear Probing
Linear probing is a technique for evaluating what information is encoded in LLM embeddings. A simple linear classifier is trained on frozen embeddings; if it predicts a specific property well (e.g., whether a sentence is a rhetorical question), that property is linearly separable in the LLM's representations.
Prepare Annotated Data
Acquire or create a dataset of text (e.g., social media posts). Manually annotate each sentence with a label indicating whether it is a 'rhetorical_question' or a 'non_rhetorical_question'. This is the most time-consuming step and requires clear guidelines. Example annotation schema:
```json
[
  {"text": "Are you serious?", "label": "rhetorical_question"},
  {"text": "What is the capital of France?", "label": "non_rhetorical_question"}
]
```
Extract LLM Embeddings
Use a pre-trained LLM (e.g., BERT) and the `transformers` library to generate contextualized embeddings for each sentence in your annotated dataset. For sentence-level tasks, a common choice is the `[CLS]` token's embedding; mean pooling over all token embeddings is a frequently used alternative.
Train Linear Classifier
Split your extracted embeddings (X) and corresponding labels (y) into training and testing sets. Train a simple linear classifier (e.g., Logistic Regression from `scikit-learn`) on the training data.
Evaluate Performance
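Evaluate the trained linear classifier on the test set using metrics such as accuracy, precision, recall, and F1-score. Performance well above a trivial baseline (for example, always predicting the majority class) suggests that information about rhetorical questions is linearly recoverable from the LLM's embeddings; it does not by itself show that the model uses that information.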
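
The last two steps can be sketched as follows. This is a minimal illustration, not a definitive implementation: the embeddings are synthetic stand-ins (random vectors with a signal planted in one dimension), and the dimensionality, sample count, and `DummyClassifier` baseline are assumptions chosen for the example. In a real run, `X` would hold the extracted LLM embeddings and `y` the annotated labels.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic data standing in for real embeddings and labels.
# BERT-base embeddings would have 768 dimensions; 32 keeps the demo small.
rng = np.random.default_rng(0)
n_samples, dim = 400, 32
y = rng.integers(0, 2, size=n_samples)  # 1 = rhetorical_question, 0 = non_rhetorical_question
X = rng.normal(size=(n_samples, dim))
X[:, 0] += 3.0 * y  # plant a linearly separable signal in one dimension

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The probe: a plain logistic regression on fixed features.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
y_pred = probe.predict(X_test)

# Trivial baseline that always predicts the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

probe_acc = accuracy_score(y_test, y_pred)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)

print(f"Probe accuracy:    {probe_acc:.3f}")
print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")
```

Comparing the probe against the majority-class baseline matters most when the labels are imbalanced, since accuracy alone can look high even if the probe has learned nothing.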

Starter code

```python
import torch
from transformers import AutoTokenizer, AutoModel

# 1. Choose an LLM and load its tokenizer and model
model_name = "bert-base-uncased" # Example: BERT base uncased
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# 2. Define a sentence to get its embedding
sentence = "Are you really going to do that?"

# 3. Tokenize the sentence
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True, max_length=512)

# 4. Get the LLM output (embeddings)
with torch.no_grad():
    outputs = model(**inputs)

# 5. Extract the [CLS] token embedding (sentence representation)
# This is typically the first token's hidden state for sentence-level tasks
sentence_embedding = outputs.last_hidden_state[:, 0, :].squeeze()

print(f"Sentence: '{sentence}'")
print(f"Embedding shape: {sentence_embedding.shape}")
print(f"First 10 dimensions of embedding: {sentence_embedding[:10].numpy()}")
```
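
To go from a single sentence to a dataset the probe can train on, each annotated record needs an embedding row in `X` and an integer label in `y`. The sketch below shows one way this might look. The names `build_probe_dataset`, `LABEL_TO_ID`, and `fake_embed` are hypothetical helpers introduced here for illustration, and `fake_embed` returns pseudo-random vectors so the example runs without downloading a model.

```python
import numpy as np

# Hypothetical mapping from the annotation schema's string labels to integers.
LABEL_TO_ID = {"non_rhetorical_question": 0, "rhetorical_question": 1}


def build_probe_dataset(records, embed_fn):
    """Stack one embedding per record into X and encode labels as integers in y."""
    features, targets = [], []
    for record in records:
        label = record["label"]
        if label not in LABEL_TO_ID:
            raise ValueError(f"Unknown label: {label!r}")
        features.append(np.asarray(embed_fn(record["text"]), dtype=np.float32))
        targets.append(LABEL_TO_ID[label])
    return np.stack(features), np.array(targets, dtype=np.int64)


def fake_embed(text, dim=8):
    # ASSUMPTION: stand-in for a real embedding function. It returns a
    # deterministic pseudo-random vector seeded by the text length.
    rng = np.random.default_rng(len(text))
    return rng.normal(size=dim)


records = [
    {"text": "Are you serious?", "label": "rhetorical_question"},
    {"text": "What is the capital of France?", "label": "non_rhetorical_question"},
]

X, y = build_probe_dataset(records, fake_embed)
print(X.shape, y)
```

With the starter code above, `embed_fn` could instead wrap the tokenizer and model calls and return `sentence_embedding.numpy()` for each text.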