Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction

Leverage Large Language Models (LLMs) for low-resource language translation by supplying linguistic descriptions (grammar rules, dictionaries, example sentences) in context instead of relying on large parallel training corpora. Evaluate the approach in a controlled setting, for example with Synchronous Context-Free Grammar (SCFG) transduction, where reference translations come from the grammar itself rather than from parallel data.

Advanced · Several hours · 6 steps

The play

Identify Low-Resource Translation Challenge
Recognize the limits of conventional LLM-based machine translation for languages without large parallel corpora: translation quality depends heavily on how much training data is available.
Explore In-Context Linguistic Descriptions
Investigate how to provide LLMs with structured linguistic information (e.g., grammar rules, dictionary definitions, translation examples) directly in the prompt context, rather than through fine-tuning.
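One way to organize this material is as a small data structure that can be filtered per sentence, so the prompt carries only the dictionary entries that are actually needed. The sketch below is a hypothetical layout: the `LinguisticDescription` class and its `relevant_lexicon` method are illustrative names, not taken from the paper. Word matching is a naive lowercase, punctuation-stripped lookup that ignores inflection.

```python
from dataclasses import dataclass, field


@dataclass
class LinguisticDescription:
    """Hypothetical container for the material placed in the prompt."""
    grammar_rules: list[str]
    lexicon: dict[str, str]  # source word -> target word
    examples: list[tuple[str, str]] = field(default_factory=list)  # (source, target) pairs

    def relevant_lexicon(self, source_text):
        """Return only the entries whose source word appears in source_text."""
        # Naive tokenization: split on whitespace, strip edge punctuation, lowercase.
        words = {w.strip(".,!?;:\"'").lower() for w in source_text.split()}
        return {src: tgt for src, tgt in self.lexicon.items() if src in words}


desc = LinguisticDescription(
    grammar_rules=["Word order is Subject-Object-Verb (SOV)."],
    lexicon={"hello": "salama", "world": "tany", "big": "lehibe", "house": "trano"},
)
print(desc.relevant_lexicon("Hello big world."))
# {'hello': 'salama', 'world': 'tany', 'big': 'lehibe'}
```

Filtering keeps prompts short for long dictionaries; whether it helps or hurts accuracy compared with giving the full lexicon is something to test, not assume.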
Design Context-Rich Prompts
Formulate prompts that embed grammar rules, vocabulary, and specific translation patterns for the target low-resource language, enabling the LLM to 'learn' them on the fly. Keep the material clearly organized and consistently formatted so the LLM can interpret it reliably.
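Translation examples can be rendered in the same labelled format the model is then asked to complete, ending with an empty target slot. This is a minimal sketch with a hypothetical `build_few_shot_block` helper and the placeholder language name `MyLanguage`; the example pair follows a toy rule that adjectives come after the noun.

```python
def build_few_shot_block(examples, source_text, target_language="MyLanguage"):
    """Render (source, target) example pairs, then the new sentence with an empty target slot."""
    lines = []
    for i, (src, tgt) in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"English: {src}")
        lines.append(f"{target_language}: {tgt}")
        lines.append("")  # blank line between examples
    # The final, unfilled label invites the model to continue with the translation.
    lines.append(f"English: {source_text}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)


block = build_few_shot_block([("big house", "trano lehibe")], "big world")
print(block)
```

Reusing identical labels for the examples and the open slot is a design choice meant to make the expected output format unambiguous.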
Implement Translation Task
Apply your prompts to a translation task for a selected low-resource language, using an LLM with strong in-context learning ability and a context window large enough to hold the full linguistic description.
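The translation loop can stay independent of any particular LLM provider by taking the model as a plain function from prompt to completion. In this sketch, `translate_all`, `extract_translation`, and the `call_llm` parameter are hypothetical names, and the "first non-empty line" rule is an assumption about how the model formats its reply that may need adjusting for a real model.

```python
from typing import Callable


def extract_translation(completion):
    """Take the first non-empty line of the model's reply as the translation."""
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""


def translate_all(sentences, build_prompt: Callable[[str], str],
                  call_llm: Callable[[str], str]):
    """Send one prompt per sentence; return (source, hypothesis) pairs."""
    results = []
    for sentence in sentences:
        completion = call_llm(build_prompt(sentence))
        results.append((sentence, extract_translation(completion)))
    return results


def fake_llm(prompt):
    # Stand-in for a real model client; always returns the same reply.
    return "\n  trano lehibe\n(explanation the parser ignores)"


pairs = translate_all(["big house"], lambda s: f"Translate: {s}", fake_llm)
print(pairs)  # [('big house', 'trano lehibe')]
```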
Evaluate with SCFG Transduction (or similar)
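Assess the quality of the LLM's in-context translations. Where parallel data is scarce, a controlled alternative to BLEU-style comparison against human references is Synchronous Context-Free Grammar (SCFG) transduction: an SCFG pairs each source-side production with a target-side production, so every sentence it generates comes with a reference translation fixed by its derivation. Giving the model the grammar in context and checking its output against these references measures whether it applies the described rules correctly.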
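To make SCFG transduction concrete, the sketch below uses a toy grammar in which each rule pairs a source-side right-hand side with a target-side one. `generate` expands both sides with one shared derivation to produce an aligned sentence pair, and `transduce` parses an English sentence with the source side and reads the translation off the target side. The grammar is invented for illustration (target words reuse `tany`, `trano`, and `lehibe` from the starter code; `vilo` and `mita` are made up). It is not recursive, and it assumes each nonterminal occurs at most once per rule side so the two sides can be linked by name. The exhaustive top-down parser would not terminate on left-recursive grammars.

```python
import random

# Hypothetical toy SCFG: nonterminal -> list of (source RHS, target RHS) pairs.
# Dictionary keys are nonterminals; every other symbol is a terminal word.
# Assumption: each nonterminal appears at most once per RHS, so source and
# target occurrences are linked by name.
GRAMMAR = {
    "S": [(["NP", "VP"], ["NP", "VP"])],
    "VP": [(["V", "NP"], ["NP", "V"])],             # English SVO -> target SOV
    "NP": [(["N"], ["N"]),
           (["ADJ", "N"], ["N", "ADJ"])],           # adjective follows the noun
    "N": [(["house"], ["trano"]), (["world"], ["tany"]), (["dog"], ["vilo"])],
    "ADJ": [(["big"], ["lehibe"])],
    "V": [(["sees"], ["mita"])],
}


def generate(symbol, rng):
    """One synchronous derivation from `symbol`; returns (source_tokens, target_tokens)."""
    src_rhs, tgt_rhs = rng.choice(GRAMMAR[symbol])
    # Each linked nonterminal is expanded once and the result is shared by both sides.
    sub = {sym: generate(sym, rng) for sym in src_rhs if sym in GRAMMAR}
    source = [w for sym in src_rhs for w in (sub[sym][0] if sym in GRAMMAR else [sym])]
    target = [w for sym in tgt_rhs for w in (sub[sym][1] if sym in GRAMMAR else [sym])]
    return source, target


def _match(rhs, k, tokens, i, bindings):
    """Yield (end_index, bindings) for each way rhs[k:] matches tokens from position i."""
    if k == len(rhs):
        yield i, bindings
        return
    sym = rhs[k]
    if sym in GRAMMAR:
        for j, target in _transduce_from(sym, tokens, i):
            yield from _match(rhs, k + 1, tokens, j, {**bindings, sym: target})
    elif i < len(tokens) and tokens[i] == sym:
        yield from _match(rhs, k + 1, tokens, i + 1, bindings)


def _transduce_from(symbol, tokens, i):
    """Yield (end_index, target_tokens) for every parse of `symbol` starting at tokens[i]."""
    for src_rhs, tgt_rhs in GRAMMAR[symbol]:
        for j, bindings in _match(src_rhs, 0, tokens, i, {}):
            # Rebuild the target side, substituting each nonterminal's translation.
            yield j, [w for sym in tgt_rhs
                      for w in (bindings[sym] if sym in GRAMMAR else [sym])]


def transduce(sentence, start="S"):
    """Return every target string the SCFG assigns to a full parse of `sentence`."""
    tokens = sentence.split()
    return [" ".join(t) for j, t in _transduce_from(start, tokens, 0) if j == len(tokens)]


rng = random.Random(0)
for _ in range(3):
    src, tgt = generate("S", rng)
    print(" ".join(src), "->", " ".join(tgt))

print(transduce("big dog sees house"))  # ['vilo lehibe trano mita']
```

Because each generated pair carries its grammar-derived target, pairs from `generate` could serve as references when the grammar (rendered as text) is placed in the prompt and the model is asked to translate the source side.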
Analyze and Iterate
Review the evaluation results to see where the LLM succeeds and where it fails when working from in-context linguistic descriptions. Iterate on prompt design and on the quality of the linguistic data you provide to improve translation accuracy.
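A first pass at analysis can compare each model output with its reference token by token. This sketch computes exact-match accuracy and a token-level Levenshtein distance, and tags each failure with a crude heuristic (the same words in a different order count as a word-order error). The function names and the heuristic are illustrative assumptions, not a metric defined in the paper.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute, free when tokens match
        prev = cur
    return prev[-1]


def classify_error(hyp, ref):
    """Crude heuristic: identical tokens in a different order count as a word-order error."""
    h, r = hyp.split(), ref.split()
    if h == r:
        return "correct"
    if sorted(h) == sorted(r):
        return "word order"
    return "lexical/other"


def summarize(hypotheses, references):
    """Return exact-match accuracy, mean token edit distance, and labelled failures."""
    if len(hypotheses) != len(references) or not references:
        raise ValueError("need equal-length, non-empty hypothesis and reference lists")
    failures, total = [], 0
    for hyp, ref in zip(hypotheses, references):
        dist = token_edit_distance(hyp.split(), ref.split())
        total += dist
        if dist:
            failures.append((hyp, ref, dist, classify_error(hyp, ref)))
    n = len(references)
    return {
        "exact_match": (n - len(failures)) / n,
        "mean_edit_distance": total / n,
        "failures": failures,
    }


refs = ["tany lehibe", "trano lehibe"]
hyps = ["tany lehibe", "lehibe trano"]
print(summarize(hyps, refs))
# {'exact_match': 0.5, 'mean_edit_distance': 1.0, 'failures': [('lehibe trano', 'trano lehibe', 2, 'word order')]}
```

Grouping failures by category points to which part of the description to revise, for instance restating the word-order rule when word-order errors dominate.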

Starter code

```python
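# Example of a structured prompt for in-context translation.
# "MyLanguage" is a placeholder name; the rules and vocabulary below are toy data.
def generate_translation_prompt(source_text, language_rules, vocabulary_map,
                                target_language="MyLanguage"):
    """Build a prompt that embeds grammar rules and a bilingual word list."""
    # One bullet per grammar rule.
    rules_str = "\n".join(f"- {rule}" for rule in language_rules)
    # One "English: target" pair per line.
    vocab_str = "\n".join(f"{word_en}: {word_target}"
                          for word_en, word_target in vocabulary_map.items())

    # The prompt ends with an open "Translation:" header for the model to complete.
    prompt = f"""
Translate the following English text into {target_language}, adhering to the provided linguistic rules and vocabulary.

### {target_language} Grammar Rules:
{rules_str}

### {target_language} Vocabulary:
{vocab_str}

### English Text to Translate:
{source_text}

### {target_language} Translation:
"""
    return prompt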

# Example Usage:
language_rules = [
    "Word order is Subject-Object-Verb (SOV).",
    "Adjectives follow the noun they modify.",
    "Plurals are formed by adding '-s' to the end of the noun."
]
vocabulary_map = {
    "hello": "salama",
    "world": "tany",
    "big": "lehibe",
    "house": "trano"
}
source_text = "Hello big world. This is a big house."

print(generate_translation_prompt(source_text, language_rules, vocabulary_map))
```

Source

Paper: arxiv.org