Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction

Enable LLMs to translate low-resource languages by providing grammar rules, vocabulary, and example translations directly in the prompt. Instead of requiring large parallel training corpora, this method relies on the LLM's in-context learning ability.

Intermediate · 1 hour · 4 steps

The play

Step 1: Understand the Low-Resource Translation Challenge
Machine translation systems, including LLMs, translate well mainly for languages that are well represented in their training data, especially parallel text. For low-resource languages, this data is scarce. The opportunity lies in LLMs' ability to apply explicit linguistic rules supplied in context.
Step 2: Compile Linguistic Descriptions
Gather or create structured linguistic descriptions of your target low-resource language. They should include grammar rules, vocabulary, and example translations, formatted consistently (for example, as labeled sections and bullet lists) so the model can locate each rule.
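Storing the description as structured data, rather than as a hand-edited string, makes it easier to keep the formatting consistent and to reuse it across prompts. The sketch below uses a hypothetical `LinguisticDescription` dataclass whose `to_prompt_text` method renders Markdown sections similar to those in the starter code. The class, its fields, and the sample entries are illustrative assumptions and do not come from any library.

```python
from dataclasses import dataclass, field


@dataclass
class LinguisticDescription:
    """Hypothetical container for the grammar, vocabulary, and examples of Step 2."""
    language: str
    grammar_rules: list[str] = field(default_factory=list)
    vocabulary: dict[str, str] = field(default_factory=dict)  # target word -> English gloss
    examples: list[tuple[str, str]] = field(default_factory=list)  # (English, target)

    def to_prompt_text(self) -> str:
        """Render the description as Markdown sections for inclusion in a prompt."""
        lines = [f"# Language: {self.language}", "", "## Grammar Rules:", ""]
        lines += [f"{i}. {rule}" for i, rule in enumerate(self.grammar_rules, start=1)]
        lines += ["", "## Vocabulary:", ""]
        # Sorted so the rendered prompt is identical across runs.
        lines += [f"* `{word}`: {gloss}" for word, gloss in sorted(self.vocabulary.items())]
        lines += ["", "## Translation Examples:", ""]
        for english, target in self.examples:
            lines.append(f'* English: "{english}"')
            lines.append(f'  {self.language}: "{target}"')
        return "\n".join(lines)


desc = LinguisticDescription(
    language="K'iche'",
    grammar_rules=["Possession is marked by a prefix on the noun: `nu-ja` (my house)."],
    vocabulary={"ja": "house", "ati't": "grandmother"},
    examples=[("The man walked to the house.", "X-bin ri achi pa ri ja.")],
)
print(desc.to_prompt_text())
```

Because the vocabulary is sorted before rendering, the prompt text stays deterministic. That makes it easier to attribute differences between runs to the model rather than to the prompt.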
Step 3: Craft the In-Context Translation Prompt
Construct an LLM prompt that embeds the linguistic descriptions from Step 2, clear translation instructions, and the source text to translate. Emphasize strict adherence to the provided rules, and ask for the translation alone so the output can be evaluated directly.
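The starter code sends everything in a single user prompt. An alternative is to put the rules in the system message and present each worked example as a completed user/assistant exchange, so the model sees the expected output format before it receives the real sentence. The hypothetical `build_messages` helper below sketches this layout. Whether it works better than the single-prompt version for a given model is an assumption you would need to test.

```python
def build_messages(description_text, examples, source_text, target_language):
    """Return a chat `messages` list: rules in the system turn, examples as prior turns."""
    system = (
        f"You translate English into {target_language}. "
        "Follow the grammar rules and vocabulary below exactly. "
        "Reply with the translation only.\n\n" + description_text
    )
    messages = [{"role": "system", "content": system}]
    for english, target in examples:
        # Each worked example becomes a completed user/assistant exchange.
        messages.append({"role": "user", "content": english})
        messages.append({"role": "assistant", "content": target})
    # The sentence to translate comes last, as an open user turn.
    messages.append({"role": "user", "content": source_text})
    return messages


msgs = build_messages(
    description_text="## Vocabulary:\n* `ja`: house",
    examples=[("My grandmother is good.", "Utzi' ri nu-ati't.")],
    source_text="My grandmother slept in the house.",
    target_language="K'iche'",
)
for m in msgs:
    print(m["role"], "->", m["content"][:40])
```

You can pass the returned list as the `messages` argument of `client.chat.completions.create`, in place of the two-message list used in the starter code.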
Step 4: Execute the LLM Call and Evaluate
Send the prompt to an LLM (e.g., via the OpenAI API) and critically evaluate the generated translation: its accuracy, its adherence to the provided rules, and its overall fluency in the target language.
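The title refers to synchronous context-free grammar (SCFG) transduction. In an SCFG, each rule rewrites a nonterminal into an English right-hand side and a target right-hand side at the same time, and the nonterminals on the two sides are linked. Every derivation therefore yields a sentence pair that is a correct translation by construction, which gives exact references for measuring accuracy.

The sketch below rests on stated assumptions. The grammar, its rule format, and the helpers `transduce_all` and `exact_match_accuracy` are hypothetical. The target side is a toy built from the starter code's example vocabulary and should not be read as real K'iche' grammar.

```python
from itertools import product

# Hypothetical toy SCFG built from the starter code's example vocabulary.
# Each rule is (linked nonterminals, English RHS, target RHS). An integer k in a
# RHS stands for the expansion of nonterminals[k]; any other item is a terminal.
GRAMMAR = {
    "S": [
        (["NP", "V"], [0, 1], [1, 0]),              # English S-V -> target V-S
        (["NP", "V", "PP"], [0, 1, 2], [1, 0, 2]),  # English S-V-PP -> target V-S-PP
    ],
    "NP": [
        ([], ["the", "man"], ["ri", "achi"]),
        ([], ["my", "grandmother"], ["ri", "nu-ati't"]),
    ],
    "V": [
        ([], ["walked"], ["x-bin"]),
        ([], ["slept"], ["x-war"]),
    ],
    "PP": [
        ([], ["to", "the", "house"], ["pa", "ri", "ja"]),
        ([], ["in", "the", "house"], ["pa", "ri", "ja"]),
    ],
}


def realize(rhs, children, side):
    """Fill one side (0 = English, 1 = target) of a rule with its children's tokens."""
    tokens = []
    for item in rhs:
        if isinstance(item, int):
            tokens.extend(children[item][side])  # linked nonterminal
        else:
            tokens.append(item)  # terminal
    return tokens


def transduce_all(symbol):
    """Yield every aligned (English tokens, target tokens) pair derivable from `symbol`."""
    for nonterminals, src_rhs, tgt_rhs in GRAMMAR[symbol]:
        # Each linked nonterminal is expanded once and shared by both sides,
        # which keeps the two strings translations of each other.
        options = [list(transduce_all(nt)) for nt in nonterminals]
        for children in product(*options):
            yield realize(src_rhs, children, 0), realize(tgt_rhs, children, 1)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions matching their reference, ignoring case, periods, spacing."""
    if not references or len(predictions) != len(references):
        raise ValueError("need equal-length, non-empty prediction and reference lists")

    def norm(text):
        return " ".join(text.lower().replace(".", " ").split())

    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)


gold = {" ".join(src): " ".join(tgt) for src, tgt in transduce_all("S")}
print(len(gold))                                  # 12
print(gold["my grandmother slept in the house"])  # x-war ri nu-ati't pa ri ja
```

To run an evaluation in this setup, you would put a rendering of the grammar in the prompt and translate each English key of `gold` with the model. You would then pass the outputs, together with the matching `gold` values, to `exact_match_accuracy`. Because the toy target language is fully defined by the grammar, a mismatch is an unambiguous error rather than a matter of judgment.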

Starter code

```python
import os

from openai import OpenAI

# The OpenAI client reads the API key from the OPENAI_API_KEY environment variable.
# Export it in your shell instead of hardcoding it here; assigning a placeholder
# to os.environ would also overwrite any key that is already set.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running this script.")

# Initialize OpenAI client
client = OpenAI()

# --- Linguistic Descriptions for K'iche' (Example) ---
# A simplified, illustrative description for demonstration only; it is not a
# complete or verified reference grammar of K'iche'.
linguistic_description_text = """
# Language: K'iche' (Example)

## Grammar Rules:

1.  **Sentence Structure:** K'iche' is verb-initial: clauses typically follow Verb-Object-Subject (VOS) or Verb-Subject-Object (VSO) order, with VSO common for transitive verbs.
    *   Pattern: `(Verb) (Object) (Subject)` or `(Verb) (Subject) (Object)`

2.  **Possession:** Formed by prefixing a possessive marker to the noun.
    *   Example: `nu-ja` (my-house), `a-ja` (your-house)

3.  **Pluralization:** Nouns are often pluralized by context or by plural markers such as `e` before the noun or `taq` after it.
    *   Example: `e achi` (the men), `ixöq taq` (women)

## Vocabulary:

*   `ati't`: grandmother
*   `ja`: house
*   `utzi'`: good, well
*   `waram`: sleep
*   `binäq`: walked
*   `x-in-war`: I slept (past tense, 1st person singular)
*   `x-a-bin`: you walked (past tense, 2nd person singular)

## Translation Examples (for LLM to learn from):

*   English: "The man walked to the house."
    K'iche': "X-bin ri achi pa ri ja."
*   English: "My grandmother is good."
    K'iche': "Utzi' ri nu-ati't."
"""


def translate_with_in_context_rules(llm_client, source_text, language_rules, target_language="K'iche'"):
    """Translate English into `target_language` using only the rules supplied in the prompt."""
    prompt = f"""
You are an expert linguist specializing in translation. Below are grammar rules, vocabulary, and example translations for {target_language}, a low-resource language. Translate the provided English text into {target_language}, strictly adhering to the given rules and vocabulary. Output only the translation, with no explanation.

{language_rules}

---

Translate the following English text into {target_language}:

English: "{source_text}"
{target_language}:
"""

    print(f"\n--- PROMPT ---\n{prompt}\n---\n")

    # API errors (authentication, rate limits, network) propagate to the caller
    # instead of being returned as a string that could pass for a translation.
    response = llm_client.chat.completions.create(
        model="gpt-4o",  # Or another suitable chat model
        messages=[
            {"role": "system", "content": "You are a helpful and precise translator."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,  # Low temperature for near-deterministic output
    )
    # message.content may be None (e.g. when the model refuses), so default to "".
    return (response.choices[0].message.content or "").strip()


# --- Example Usage ---
source_english_text = "My grandmother slept in the house."
translated_text = translate_with_in_context_rules(client, source_english_text, linguistic_description_text)

print(f"Original English: {source_english_text}")
print(f"Translated K'iche': {translated_text}")

# Expected (approximate) K'iche' output based on rules: "X-war ri nu-ati't pa ri ja."
```
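Judging output by eye does not scale beyond a handful of sentences. chrF is a common automatic metric for low-resource translation. It is a character n-gram F-score, so it gives partial credit for near-miss word forms, which matters for morphologically rich languages.

The function below is a simplified, self-contained sentence-level sketch. It removes whitespace, uses character n-grams of orders 1 to 6, averages precision and recall over those orders, and combines them with beta = 2. For reported results, a maintained implementation such as sacreBLEU's `CHRF` metric is the safer choice.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing whitespace, as chrF does."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale (recall weighted by beta)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf("X-war ri nu-ati't pa ri ja.", "X-war ri nu-ati't pa ri ja."))  # 100.0
print(chrf("X-war ri nu-ati't pa ri ja.", "X-war ri achi pa ri ja."))
```

After running the starter code, `chrf(translated_text, "X-war ri nu-ati't pa ri ja.")` scores the model's output against the approximate expected translation. That reference is itself derived from the toy rules, so a high score shows rule adherence rather than fluency as judged by native speakers.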