Paper·arxiv.org
machine-learning · research · ai-agents · infrastructure · evaluation
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Improve video world models by implementing 'Hybrid Memory' to track dynamic subjects that move out of sight and re-emerge. This prevents simulation errors like freezing or vanishing, leading to more consistent and realistic world simulations for AI agents.
advanced · 2 hours · 5 steps
The play
- Analyze Current Model Limitations: Evaluate your existing video world models for failure modes in dynamic environments, specifically when subjects temporarily leave the frame or become occluded. Identify instances of freezing, vanishing, or distortion upon re-emergence.
- Define Hybrid Memory Requirements: Outline the functional specifications for a memory system that can robustly track objects even when they are not directly observable. Consider requirements for short-term (visible) and long-term (occluded) state management, and how to associate re-emerging objects with their past states.
- Design Memory Architecture: Propose an architecture for a 'Hybrid Memory' component. This might involve a combination of explicit object tracking for visible entities and a more persistent, abstract representation for objects that have gone out of sight, along with mechanisms for re-identification.
- Integrate with World Model: Determine how the designed Hybrid Memory will interact with your world model's perception, prediction, and latent state components. Map out the data flow for storing and retrieving object information to maintain world consistency across occlusions.
- Develop Dynamic Evaluation Metrics: Establish specific metrics to assess the improved model's performance in dynamic scenarios. Focus on evaluating object persistence, re-identification accuracy, and simulation consistency when objects re-emerge after being out of sight.
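The last step can be made concrete with a small sketch. It is only an illustration: `ReemergenceEvent`, `reid_accuracy`, and `persistence_rate` are hypothetical names that do not come from the paper, and the sketch assumes you have ground-truth identities for objects that re-enter the frame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReemergenceEvent:
    """One object re-entering the frame (hypothetical record, not from the paper)."""
    true_id: str                 # ground-truth identity of the re-emerging object
    predicted_id: Optional[str]  # identity the model assigned; None if the object vanished


def reid_accuracy(events: List[ReemergenceEvent]) -> float:
    """Fraction of re-emergence events where the model recovered the correct identity."""
    if not events:
        return 0.0  # no events to score
    correct = sum(1 for e in events if e.predicted_id == e.true_id)
    return correct / len(events)


def persistence_rate(events: List[ReemergenceEvent]) -> float:
    """Fraction of re-emergence events where the object came back at all."""
    if not events:
        return 0.0
    persisted = sum(1 for e in events if e.predicted_id is not None)
    return persisted / len(events)


events = [
    ReemergenceEvent("obj_0", "obj_0"),  # correctly re-identified
    ReemergenceEvent("obj_1", "obj_2"),  # came back, but with the wrong identity
    ReemergenceEvent("obj_2", None),     # vanished after occlusion
    ReemergenceEvent("obj_3", "obj_3"),  # correctly re-identified
]
print(reid_accuracy(events), persistence_rate(events))
```

Keeping the two numbers separate helps distinguish a model that loses objects entirely from one that keeps them but confuses their identities.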
Starter code
import torch


class HybridMemory:
    def __init__(self, memory_capacity=100):
        self.short_term_memory = {}  # states of currently visible objects
        self.long_term_memory = {}   # last known states, kept across occlusions
        self.memory_capacity = memory_capacity  # note: not enforced in this sketch
        self.next_id = 0

    def store_visible_object(self, obj_id, current_state):
        """Stores or updates the state of a currently visible object."""
        self.short_term_memory[obj_id] = current_state
        if obj_id not in self.long_term_memory:
            self.long_term_memory[obj_id] = current_state  # Initialize long-term if new

    def object_occluded(self, obj_id):
        """Moves object from short-term to long-term only, marking it as occluded."""
        if obj_id in self.short_term_memory:
            # Update long-term with the last known state before occlusion
            self.long_term_memory[obj_id] = self.short_term_memory.pop(obj_id)

    def retrieve_object_state(self, obj_id):
        """Retrieves the most current known state of an object, visible or occluded."""
        if obj_id in self.short_term_memory:
            return self.short_term_memory[obj_id]
        elif obj_id in self.long_term_memory:
            return self.long_term_memory[obj_id]
        return None  # Object not found

    def re_identify_object(self, new_detection_state, potential_ids):
        """Conceptual method: matches a new detection to a known occluded object."""
        # This would involve feature matching, trajectory prediction, etc.
        for obj_id in potential_ids:
            if obj_id in self.long_term_memory:  # Check if this ID is in long-term memory
                # Placeholder for actual re-identification logic
                # e.g., compare new_detection_state with self.long_term_memory[obj_id]
                # For now, just return the first potential match
                print(f"Re-identified object {obj_id} from long-term memory.")
                return obj_id
        # No candidate matched: treat the detection as a new object
        return self.create_new_object(new_detection_state)

    def create_new_object(self, initial_state):
        """Creates a new entry for a newly appearing object."""
        new_obj_id = f"obj_{self.next_id}"
        self.next_id += 1
        self.short_term_memory[new_obj_id] = initial_state
        self.long_term_memory[new_obj_id] = initial_state
        return new_obj_id


# Example Usage (conceptual)
memory = HybridMemory()

# Frame 1: Object appears
obj_a_id = memory.create_new_object(torch.randn(1, 64))  # Latent state
print(f"Created new object: {obj_a_id}")

# Frame 2: Object still visible
memory.store_visible_object(obj_a_id, torch.randn(1, 64) * 1.1)  # Updated state

# Frame 3: Object goes out of sight
memory.object_occluded(obj_a_id)
# States have shape (1, 64), so index [0, :2] to show the first two values
print(f"Object {obj_a_id} occluded. Current state: {memory.retrieve_object_state(obj_a_id)[0, :2]}...")

# Frame N: Object re-emerges, needs re-identification
re_emerging_state = torch.randn(1, 64) * 0.9  # New detection
re_identified_id = memory.re_identify_object(re_emerging_state, [obj_a_id])
if re_identified_id == obj_a_id:
    memory.store_visible_object(re_identified_id, re_emerging_state)
    print(f"Successfully re-integrated object {re_identified_id} with new state: {memory.retrieve_object_state(re_identified_id)[0, :2]}...")
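The starter code leaves the re-identification logic as a placeholder. One possible way to fill it in is nearest-neighbour matching by cosine similarity, sketched below. This is an assumption, not the paper's method: `cosine_similarity`, `match_occluded`, and the `threshold` value of 0.8 are hypothetical, and plain Python lists stand in for latent tensors so the example runs without PyTorch.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_occluded(
    new_state: Sequence[float],
    long_term_memory: Dict[str, Sequence[float]],
    candidate_ids: Sequence[str],
    threshold: float = 0.8,  # hypothetical cut-off; tune on your own data
) -> Optional[str]:
    """Return the stored ID most similar to new_state, or None if none reaches threshold."""
    best_id, best_score = None, threshold
    for obj_id in candidate_ids:
        stored = long_term_memory.get(obj_id)
        if stored is None:
            continue  # candidate is not in long-term memory
        score = cosine_similarity(new_state, stored)
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id


stored_states = {"obj_0": [1.0, 0.0], "obj_1": [0.0, 1.0]}
print(match_occluded([0.9, 0.1], stored_states, ["obj_0", "obj_1"]))  # close to obj_0
print(match_occluded([1.0, 1.0], stored_states, ["obj_0", "obj_1"]))  # ambiguous detection
```

Returning `None` below the threshold lets the caller fall back to `create_new_object` instead of forcing a match, which avoids merging a genuinely new object into an old identity.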