
Tags: few-shot-learning, domain-adaptation, nlp, text-classification, transfer-learning, low-resource-nlp, transformers, embeddings

Adapt Models to New Domains with Few-Shot Adaptation

Adapt a pre-trained model to a new, specific domain using only a few labeled examples. This matters when labeled data is scarce: you can specialize a general-purpose model for a niche task without collecting a large dataset.

Intermediate | 30 min | 4 steps

The play

Prepare Your Few-Shot Data
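Define a small labeled training set (the "few shots") and a separate test set, both drawn from your target domain. A handful of examples is enough because the base model already encodes general language understanding; your labeled examples only need to show how that knowledge maps onto your domain's labels. Keep the classes roughly balanced so the classifier does not simply favor whichever class has more examples.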
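If your labeled examples come from a larger pool, the split can be made balanced and reproducible. The sketch below is a hypothetical helper (`sample_k_shot` is not part of the starter code or of any library) that draws exactly `k` examples per class for training and keeps the rest for testing, assuming every class has more than `k` examples.

```python
import random
from collections import defaultdict

def sample_k_shot(texts, labels, k, seed=0):
    """Hypothetical helper: split a labeled pool into a balanced k-per-class
    training set and a held-out test set made of everything else."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    train, test = [], []
    for label in sorted(by_label):
        pool = list(by_label[label])
        # Require at least one leftover example per class for the test set.
        if len(pool) <= k:
            raise ValueError(f"class {label!r} has {len(pool)} examples; need more than {k}")
        rng.shuffle(pool)
        train.extend((text, label) for text in pool[:k])
        test.extend((text, label) for text in pool[k:])
    return train, test

# Usage with a toy pool of 4 positive and 4 negative texts.
pool_texts = ["p1", "p2", "p3", "p4", "n1", "n2", "n3", "n4"]
pool_labels = [1, 1, 1, 1, 0, 0, 0, 0]
train, test = sample_k_shot(pool_texts, pool_labels, k=3)
print(len(train), len(test))  # 6 2
```

Sampling per class, rather than from the whole pool at once, guarantees the balance that a purely random draw of six examples would not.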
Generate Embeddings
Use a pre-trained sentence-transformer model to convert each text into a fixed-length numerical vector (an embedding). The embedding captures the text's semantic meaning, which the model learned during pre-training on large general-purpose datasets, so texts with similar meanings end up close together.
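Embeddings that capture semantic meaning are typically compared with cosine similarity: texts with related meanings should score closer to 1. This minimal numpy sketch uses hand-written 3-dimensional toy vectors as stand-ins for real model outputs (all-MiniLM-L6-v2 produces 384-dimensional vectors); the function name `cosine_similarity_matrix` is illustrative. Because `model.encode` returns a NumPy array by default, the same function could be applied to `train_embeddings` from the starter code; sentence-transformers also provides `util.cos_sim` for this computation.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n, d) embedding matrix."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid dividing by zero for all-zero rows
    return unit @ unit.T

# Illustrative toy vectors, NOT real model outputs.
toy = np.array([
    [0.9, 0.1, 0.0],   # stand-in for "The company announced record profits."
    [0.8, 0.2, 0.1],   # stand-in for "Earnings per share exceeded all expectations."
    [-0.7, 0.1, 0.6],  # stand-in for "The stock plummeted after the announcement."
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 2))
# The two positive-sentiment stand-ins score about 0.98 with each other,
# while each scores negatively against the negative-sentiment stand-in.
```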
Train an Adaptation Classifier
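Train a simple, lightweight classifier (such as logistic regression) on the embeddings of your few-shot examples. This is the core of few-shot domain adaptation: the pre-trained encoder stays frozen, and you train only a new classification "head" that learns how to interpret the general-purpose embeddings for your specific task. Because only this small head is fit, a handful of examples can be enough.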
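Logistic regression is not the only lightweight head that fits this step. As a swapped-in alternative (not what the starter code uses), a nearest-centroid or "prototype" classifier averages the normalized embeddings of each class and assigns a new text to the most similar class average. It has no hyperparameters, which can help when there are too few examples to tune anything. `NearestCentroidHead` is a hypothetical name; the sketch follows scikit-learn's `fit`/`predict` convention so it could stand in for `LogisticRegression()` in the starter code. scikit-learn ships a related `sklearn.neighbors.NearestCentroid`, which uses Euclidean rather than cosine distance by default.

```python
import numpy as np

class NearestCentroidHead:
    """Hypothetical prototype-style head: each class is the mean of its normalized
    few-shot embeddings; a query gets the label of the most similar centroid."""

    def fit(self, X, y) -> "NearestCentroidHead":
        X = self._normalize(np.asarray(X, dtype=float))
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid per class, re-normalized so a dot product equals cosine similarity.
        self.centroids_ = self._normalize(
            np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        )
        return self

    def predict(self, X) -> np.ndarray:
        X = self._normalize(np.asarray(X, dtype=float))
        scores = X @ self.centroids_.T  # (n_queries, n_classes) cosine similarities
        return self.classes_[np.argmax(scores, axis=1)]

    @staticmethod
    def _normalize(M: np.ndarray) -> np.ndarray:
        return M / np.clip(np.linalg.norm(M, axis=1, keepdims=True), 1e-12, None)

# Usage on toy 2-d "embeddings" (illustrative values only):
X_train = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
y_train = [1, 1, 0, 0]
head = NearestCentroidHead().fit(X_train, y_train)
print(head.predict(np.array([[0.8, 0.2], [0.2, 0.7]])))  # [1 0]
```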
Evaluate on the Target Domain
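Use your adapted classifier to predict labels for the unseen test set from your target domain. This checks whether the classifier generalizes beyond the few-shot examples to the new data distribution; with a test set this small, treat the resulting accuracy as a rough signal rather than a reliable measurement.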
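A single accuracy number hides a lot of uncertainty on a tiny test set: with the starter code's four test examples, each misclassification moves accuracy by 0.25. One standard way to quantify this (an addition, not part of the original recipe) is a Wilson score confidence interval for the proportion of correct predictions. A minimal sketch, assuming you count the correct predictions yourself; `wilson_interval` is an illustrative name:

```python
import math

def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Even a perfect score on a 4-example test set leaves a wide interval.
low, high = wilson_interval(correct=4, n=4)
print(f"4/4 correct -> 95% interval [{low:.2f}, {high:.2f}]")
# prints: 4/4 correct -> 95% interval [0.51, 1.00]
```

The interval narrows as the test set grows: with 400 correct out of 400, the lower bound rises above 0.99.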

Starter code

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

def run_few_shot_adaptation():
    """Demonstrates Few-Shot Domain Adaptation for sentiment analysis."""
    # 1. Prepare Data: Simulate a low-resource target domain (e.g., financial news)
    # We have only 3 labeled examples per class for training.
    few_shot_train_data = {
        "texts": [
            "The new fiscal policy is boosting the market.",
            "Earnings per share exceeded all expectations.",
            "The company announced record profits.",
            "Regulatory hurdles are causing a downturn.",
            "The stock plummeted after the announcement.",
            "Unexpected losses led to a credit downgrade."
        ],
        "labels": [1, 1, 1, 0, 0, 0]  # 1 for positive, 0 for negative
    }

    # Unseen data from the same target domain for testing
    target_test_data = {
        "texts": [
            "The acquisition is expected to drive significant growth.",
            "The firm missed its quarterly revenue target badly.",
            "Market sentiment is overwhelmingly positive after the merger.",
            "A new lawsuit threatens the company's future."
        ],
        "labels": [1, 0, 1, 0]
    }

    print("Step 1: Data prepared.")

    # 2. Generate Embeddings: Use a powerful pre-trained model
    print("Step 2: Loading model and generating embeddings... (This may take a moment)")
    # The encoder stays frozen: encode() only runs inference, so its weights never change.
    model = SentenceTransformer('all-MiniLM-L6-v2')
    train_embeddings = model.encode(few_shot_train_data["texts"])
    test_embeddings = model.encode(target_test_data["texts"])
    print(f"Generated {train_embeddings.shape[0]} training embeddings.")

    # 3. Train Adaptation Classifier: The core of Few-Shot Domain Adaptation
    # Only this lightweight head is fit, using the 6 few-shot embedding vectors.
    print("Step 3: Training a simple classifier on the few-shot examples...")
    classifier = LogisticRegression()
    classifier.fit(train_embeddings, few_shot_train_data["labels"])
    print("Classifier trained.")

    # 4. Evaluate on Target Domain
    print("Step 4: Evaluating the adapted model...")
    predictions = classifier.predict(test_embeddings)
    accuracy = np.mean(predictions == np.array(target_test_data["labels"]))

    print("\n--- Results ---")
    for i, txt in enumerate(target_test_data["texts"]):
        pred_label = 'Positive' if predictions[i] == 1 else 'Negative'
        print(f'Text: "{txt}" -> Prediction: {pred_label}')

    print(f"\nAccuracy on target domain test set: {accuracy:.2f}")

if __name__ == '__main__':
    # To run this code, first install the required libraries:
    # pip install numpy scikit-learn sentence-transformers torch
    run_few_shot_adaptation()
```