Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Generate realistic synthetic doctor-patient conversations and render them as long-form audio datasets. Synthetic data helps where real recordings are scarce, which is common in sensitive domains like healthcare, and gives you material for training and evaluating AI models, particularly summarization models.

intermediate | 2 hours | 5 steps

The play

Define Conversation Scenarios & Personas
Create a JSON object detailing patient and doctor personas, and the consultation scenario. Specify a desired duration range and key topics for the conversation.
Generate Dialogue with Large Language Models (LLMs)
Use an LLM (e.g., OpenAI GPT-4o) to generate a multi-turn conversation based on your defined scenario. Craft a detailed system prompt to guide the LLM to produce a realistic, long-form dialogue.
Simulate Audio with Text-to-Speech (TTS)
Convert the generated dialogue into audio files using a Text-to-Speech (TTS) model. Assign distinct voices to the doctor and the patient so the result is realistic multi-speaker audio, paired with its transcript, suitable for model training.
Generate Ground-Truth Summaries
Create concise, accurate, and structured summaries of the generated conversations. These summaries will serve as the ground-truth data for training and evaluating your long-form audio summarization models.
Evaluate and Iterate for Quality
Review the synthetic conversations, audio quality, and generated summaries for realism, accuracy, and adherence to the defined scenario. Adjust personas, LLM prompts, and TTS settings as needed to continuously improve the quality and diversity of your synthetic dataset.
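
For step 4, one option is to reuse the chat-completions pattern from the starter code and ask the model for a summary with a fixed set of sections. The sketch below is a minimal illustration under assumptions: the section names in SUMMARY_SECTIONS, the helper names, and the lower temperature are hypothetical choices, not part of the play itself. The prompt construction and validation helpers are plain Python; only request_summary needs network access and a valid OPENAI_API_KEY.

```python
import json

# Hypothetical section layout for the ground-truth summary; adjust to your schema.
SUMMARY_SECTIONS = ["chief_complaint", "history", "assessment", "plan"]


def dialogue_to_transcript(dialogue):
    """Flatten the starter code's {'dialogue': [...]} output into 'Speaker: text' lines."""
    return "\n".join(f"{t['speaker']}: {t['utterance']}" for t in dialogue["dialogue"])


def build_summary_messages(dialogue):
    """Build chat messages that ask for a JSON summary with fixed sections."""
    system_prompt = (
        "You summarize doctor-patient conversations. "
        "Use only information stated in the transcript. "
        "Return a JSON object with exactly these keys: "
        + ", ".join(SUMMARY_SECTIONS) + "."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": dialogue_to_transcript(dialogue)},
    ]


def missing_sections(summary):
    """Return the expected sections that are absent or empty in a parsed summary."""
    return [key for key in SUMMARY_SECTIONS if not summary.get(key)]


def request_summary(dialogue, model="gpt-4o"):
    """Call the OpenAI API (network access required) and return the parsed summary."""
    import openai  # imported here so the helpers above work without the package

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=build_summary_messages(dialogue),
        temperature=0.2,  # assumption: lower temperature for more consistent summaries
        response_format={"type": "json_object"},
    )
    summary = json.loads(response.choices[0].message.content)
    gaps = missing_sections(summary)
    if gaps:
        raise ValueError(f"Summary is missing sections: {gaps}")
    return summary
```

Because the summary is produced by a model, it is worth spot-checking a sample against the transcripts before treating it as ground truth.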

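Part of the review in step 5 can be automated before anyone listens to the audio. The sketch below assumes a speaking rate of about 150 words per minute (an approximation; real consultations vary) and the 'speaker' and 'utterance' keys requested in the starter code's system prompt. The function names and issue messages are hypothetical.

```python
WORDS_PER_MINUTE = 150  # assumption: rough conversational speaking rate


def estimate_minutes(dialogue, wpm=WORDS_PER_MINUTE):
    """Estimate spoken duration from the total word count of all utterances."""
    words = sum(len(turn.get("utterance", "").split()) for turn in dialogue["dialogue"])
    return words / wpm


def check_dialogue(dialogue, scenario):
    """Return a list of human-readable issues; an empty list means none were found."""
    turns = dialogue.get("dialogue", [])
    if not turns:
        return ["dialogue is empty"]

    issues = []

    # Every turn needs non-empty 'speaker' and 'utterance' fields.
    for i, turn in enumerate(turns):
        if not turn.get("speaker") or not turn.get("utterance"):
            issues.append(f"turn {i} is missing 'speaker' or 'utterance'")

    # A consultation should involve exactly two distinct speakers.
    speakers = {turn.get("speaker") for turn in turns if turn.get("speaker")}
    if len(speakers) != 2:
        issues.append(f"expected 2 speakers, found {len(speakers)}")

    # Compare the estimated duration with the scenario's target range.
    low, high = scenario["dialogue_constraints"]["duration_minutes"]
    minutes = estimate_minutes({"dialogue": turns})
    if not low <= minutes <= high:
        issues.append(f"estimated {minutes:.1f} min is outside the {low}-{high} min target")

    return issues
```

At 150 words per minute, a 15 to 25 minute target corresponds to roughly 2,250 to 3,750 words, which is a useful figure to keep in mind when setting the output token limit for generation. Checks like these only catch structural problems; realism and clinical plausibility still need human review.
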
Starter code

import openai
import json
import os

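# Read the OpenAI API key from the environment instead of hard-coding it in source.
# Set it in your shell before running, e.g. `export OPENAI_API_KEY=<your key>`.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running.")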

scenario_data = {
  "scenario_id": "DP-001",
  "patient": {
    "name": "Alice Smith",
    "age": 45,
    "gender": "female",
    "chief_complaint": "persistent headache for 2 weeks",
    "medical_history": ["migraines (controlled)", "seasonal allergies"],
    "emotional_state": "concerned but cooperative",
    "communication_style": "detailed, asks questions"
  },
  "doctor": {
    "name": "Dr. Ben Carter",
    "specialty": "Neurologist",
    "communication_style": "thorough, empathetic",
    "consultation_goal": "diagnose headache cause, propose treatment plan"
  },
  "dialogue_constraints": {
    "duration_minutes": [15, 25],
    "key_topics": ["headache characteristics", "triggers", "medication history", "neurological exam", "next steps"]
  }
}

def generate_dialogue(scenario_data):
    system_prompt = f"""You are an AI assistant specialized in generating realistic doctor-patient conversations.
    Generate a detailed dialogue between a doctor and a patient based on the provided scenario.
    Ensure the conversation flows naturally, covers the key topics, and reflects the personas' communication styles.
    The dialogue should be long-form, aiming for a length that would correspond to {scenario_data['dialogue_constraints']['duration_minutes'][0]}-{scenario_data['dialogue_constraints']['duration_minutes'][1]} minutes of speech.
    Format the output as a JSON object with a single key 'dialogue', whose value is a list of dictionaries, each with 'speaker' and 'utterance' keys.
    """

    user_prompt = f"""
    Scenario ID: {scenario_data['scenario_id']}
    Patient: {json.dumps(scenario_data['patient'], indent=2)}
    Doctor: {json.dumps(scenario_data['doctor'], indent=2)}
    Dialogue Constraints: {json.dumps(scenario_data['dialogue_constraints'], indent=2)}

    Generate the conversation now.
    """

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o", # or gpt-3.5-turbo
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.7,
        max_tokens=4096, # 2000 tokens is only about 10 minutes of speech; raise further if your model allows it
        response_format={"type": "json_object"}
    )
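    # A completion cut off at max_tokens usually yields invalid JSON, so flag it explicitly.
    if response.choices[0].finish_reason == "length":
        print("Warning: output hit the max_tokens limit and is likely truncated.")
    try:
        generated_content = json.loads(response.choices[0].message.content)
        return generated_content
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e}")
        print(f"Raw LLM response: {response.choices[0].message.content}")
        return None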

dialogue = generate_dialogue(scenario_data)
if dialogue:
    print(json.dumps(dialogue, indent=2))
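
For step 3, the dialogue returned by generate_dialogue can be rendered to audio one turn at a time. The sketch below assumes OpenAI's speech endpoint (client.audio.speech.create) with the tts-1 model and the onyx and nova voices; the voice assignment, file naming, and helper names are illustrative assumptions, and any other TTS engine could be swapped in. Voices are assigned by order of first appearance, so the code does not depend on how the model labels the speakers.

```python
from pathlib import Path

# Assumption: two of the built-in voices offered by OpenAI's TTS models.
DEFAULT_VOICES = ("onyx", "nova")


def plan_segments(dialogue, voices=DEFAULT_VOICES):
    """Assign one voice per speaker (by order of first appearance) and
    return a list of (filename, voice, text) tuples, one per turn."""
    voice_for = {}
    segments = []
    for i, turn in enumerate(dialogue["dialogue"]):
        speaker = turn["speaker"]
        if speaker not in voice_for:
            voice_for[speaker] = voices[len(voice_for) % len(voices)]
        segments.append((f"turn_{i:04d}.mp3", voice_for[speaker], turn["utterance"]))
    return segments


def synthesize(segments, out_dir="audio", model="tts-1"):
    """Render each segment to an audio file (network access and API key required)."""
    import openai  # imported here so plan_segments works without the package

    client = openai.OpenAI()
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for filename, voice, text in segments:
        response = client.audio.speech.create(model=model, voice=voice, input=text)
        # The response wraps the binary audio; MP3 is the default output format.
        (out_path / filename).write_bytes(response.content)


# Usage, continuing from the starter code:
# if dialogue:
#     synthesize(plan_segments(dialogue))
```

This produces one file per turn, named so that lexical order matches conversation order. Joining the files into a single long recording is left to an audio tool of your choice. The speech endpoint also limits input length per request, so unusually long turns may need to be split before synthesis.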