FrozenLake

Experiment Quick Start Guide

This guide helps you quickly set up and run FrozenLake experiments with ReMe integration. The FrozenLake experiment demonstrates how task memory can improve an agent's performance on a grid-world navigation task.

Environment Setup

1. Clone the Repository

git clone https://github.com/agentscope-ai/ReMe.git
cd ReMe/cookbook/frozenlake

2. FrozenLake Environment Setup

Install the Python packages the FrozenLake experiment depends on:

pip install gymnasium ray openai

This installs:

gymnasium - for the FrozenLake environment
ray - for parallel execution
openai - for LLM API access
their own dependencies, which pip resolves automatically
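Before moving on, you can confirm that these packages are importable. The helper below is a convenience sketch, not part of the cookbook (the name `missing_packages` is hypothetical), and it uses only the standard library:

```python
import importlib.util

# Third-party packages the FrozenLake cookbook relies on (from the list above).
REQUIRED_PACKAGES = ("gymnasium", "ray", "openai")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```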

3. Start ReMe Service

If you haven’t installed ReMe yet, follow these steps:

# Go back to the project root
cd ../..

# Create a virtual environment (optional)
conda create -p ./reme-env python=3.10
conda activate ./reme-env

# Install ReMe
pip install .

Launch the ReMe service, which hosts the task memory library that the agent reads from and writes to:

reme \
  backend=http \
  http.port=8002 \
  llm.default.model_name=qwen-max-2025-01-25 \
  embedding_model.default.model_name=text-embedding-v4 \
  vector_store.default.backend=local
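To check that the service came up, you can test whether anything is accepting connections on the port given by `http.port`. This minimal sketch assumes the service runs on the same machine; it only opens a TCP connection and does not call any ReMe endpoint (`service_is_listening` is a hypothetical helper):

```python
import socket


def service_is_listening(host="127.0.0.1", port=8002, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout: nothing is serving there.
        return False


if __name__ == "__main__":
    print("ReMe port open:", service_is_listening())
```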

Set the API key and base URL that the agent uses to call the LLM:

export OPENAI_API_KEY="xxx"
export OPENAI_BASE_URL="xxx"
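A small pre-flight check can catch a missing variable before you launch a long run. The helper below is hypothetical and assumes the agent needs exactly these two variables:

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_env(names=REQUIRED_VARS, environ=None):
    """Return the variables in `names` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Passing a dict as `environ` lets you test the check without touching the real process environment.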

Run Experiments

1. Quick Test: Performance Evaluation Only (Default)

Run the main experiment script to test agent performance using existing memory:

cd cookbook/frozenlake
python run_frozenlake.py

What this does:

Tests the agent on randomly generated FrozenLake maps
Uses the default memory library (frozenlake_no_slippery)
Evaluates performance over multiple runs to reduce run-to-run variance
Results are automatically saved to the ./exp_result/ directory

2. Advanced: Training + Testing (Memory Generation)

To generate new memories through training and then evaluate performance, edit the experiment parameters directly in run_frozenlake.py. The main parameters are defined in its main() function:

def main():
    experiment_name = "frozenlake_no_slippery"  # Name of the experiment
    max_workers = 4                             # Number of parallel workers
    training_runs = 4                           # Runs per training map
    num_training_maps = 50                      # Number of maps for training
    test_runs = 1                               # Runs per test configuration
    num_test_maps = 100                         # Number of test maps
    is_slippery = False                         # Set to True to enable slippery mode
    ...                                         # Rest of main() omitted
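As a rough guide to run time, the defaults above imply 4 × 50 = 200 training episodes and 1 × 100 = 100 episodes per test condition (the output files listed later suggest two test conditions, with and without memory). The dataclass below is a hypothetical helper, not part of run_frozenlake.py, that makes this arithmetic explicit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Illustrative mirror of the main() parameters in run_frozenlake.py."""

    experiment_name: str = "frozenlake_no_slippery"
    max_workers: int = 4
    training_runs: int = 4
    num_training_maps: int = 50
    test_runs: int = 1
    num_test_maps: int = 100
    is_slippery: bool = False

    @property
    def training_episodes(self) -> int:
        # Every training map is played `training_runs` times.
        return self.training_runs * self.num_training_maps

    @property
    def test_episodes_per_condition(self) -> int:
        # Every test map is played `test_runs` times in each test condition.
        return self.test_runs * self.num_test_maps


cfg = ExperimentConfig()
print(cfg.training_episodes)            # 200
print(cfg.test_episodes_per_condition)  # 100
```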

Key parameters to consider:

experiment_name: Used as the workspace ID for task memory
is_slippery: When True, agent movement becomes stochastic (harder)
max_workers: Increase for faster execution on multi-core systems
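To make the `is_slippery` distinction concrete: assuming the cookbook uses Gymnasium's standard `FrozenLake-v1` rules, a slippery move goes in the intended direction with probability 1/3 and slides to each of the two perpendicular directions with probability 1/3. The function below is a simplified, hypothetical re-implementation of a single step for illustration, not Gymnasium's code; actions follow Gymnasium's encoding (0 = left, 1 = down, 2 = right, 3 = up).

```python
import random

# Gymnasium's FrozenLake action encoding and the (row, col) offset of each action.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
OFFSETS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def step(desc, pos, action, is_slippery, rng=random):
    """Take one action on a map given as a list of row strings.

    Returns (new_pos, reward, terminated): reward is 1.0 only on the goal 'G',
    and the episode ends on a hole 'H' or the goal.
    """
    if is_slippery:
        # Intended direction or one of its two perpendicular neighbours, 1/3 each.
        action = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    dr, dc = OFFSETS[action]
    # Moves off the grid are clamped, so the agent stays put against the edge.
    row = min(max(pos[0] + dr, 0), len(desc) - 1)
    col = min(max(pos[1] + dc, 0), len(desc[0]) - 1)
    tile = desc[row][col]
    return (row, col), float(tile == "G"), tile in ("H", "G")
```

With `is_slippery=False` the same action always produces the same move, so a remembered action sequence can be replayed exactly; with slipping, the agent has to react to where it actually lands.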

3. View Experiment Results

After the experiments finish, compute summary statistics from the results:

python run_exp_statistic.py

What this script does:

Processes all result files in ./exp_result/
Calculates success rates and performance metrics
Generates a summary table showing performance comparisons
Analyzes the effect of task memory on performance
Saves results to frozenlake_summary.csv
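The internals of run_exp_statistic.py are not shown here. As an illustration of the kind of comparison it reports, the sketch below computes a success rate per result file and the with-memory minus no-memory difference. It assumes each JSONL line is one episode with a boolean `success` field; the real field names in the cookbook's output may differ, and all function names are hypothetical.

```python
import json
from pathlib import Path


def success_rate(path):
    """Fraction of episodes in a JSONL file whose (assumed) 'success' field is true."""
    lines = Path(path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return 0.0
    return sum(bool(r.get("success")) for r in records) / len(records)


def summarize(result_dir="./exp_result"):
    """Map every *.jsonl file in `result_dir` to its success rate."""
    return {p.name: success_rate(p) for p in sorted(Path(result_dir).glob("*.jsonl"))}


def memory_gain(summary):
    """Success rate with memory minus without, keyed by the shared file prefix."""
    with_suffix, without_suffix = "_test_with_memory.jsonl", "_test_no_memory.jsonl"
    gains = {}
    for name, rate in summary.items():
        if name.endswith(with_suffix):
            prefix = name[: -len(with_suffix)]
            baseline = summary.get(prefix + without_suffix)
            if baseline is not None:
                gains[prefix] = rate - baseline
    return gains
```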

Understanding the Implementation

Key Components

FrozenLakeReactAgent (frozenlake_react_agent.py)
- Implements a ReAct agent that interacts with the FrozenLake environment
- Handles task memory retrieval and storage
- Uses an LLM (via the OpenAI API) for decision-making
Experiment Runner (run_frozenlake.py)
- Manages the overall experiment flow
- Handles training and testing phases
- Uses Ray for parallel execution
Map Manager (map_manager.py)
- Generates and manages test maps
- Ensures consistent evaluation across experiments
Statistics Analyzer (run_exp_statistic.py)
- Processes experiment results
- Calculates performance metrics
- Generates comparative analysis
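The contents of map_manager.py are not reproduced here. A common way to get reproducible random maps, and the approach taken by Gymnasium's own `generate_random_map` helper, is to sample holes at random and reject any map with no path from start to goal. The sketch below follows that idea with one seed per map, so repeated experiments evaluate on identical maps; the function names, the 4×4 default size, and the seeding scheme are assumptions.

```python
import random
from collections import deque


def is_solvable(desc):
    """Breadth-first search from S (top-left) to G over tiles that are not holes."""
    n = len(desc)
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if desc[r][c] == "G":
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen and desc[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def generate_map(size=4, p_frozen=0.8, seed=0):
    """Sample size x size maps until one is solvable; the same seed gives the same map."""
    rng = random.Random(seed)
    while True:
        grid = [["F" if rng.random() < p_frozen else "H" for _ in range(size)] for _ in range(size)]
        grid[0][0], grid[-1][-1] = "S", "G"
        desc = ["".join(row) for row in grid]
        if is_solvable(desc):
            return desc


def generate_test_maps(num_maps, size=4, base_seed=0):
    """One map per seed, so every experiment sees the same test set."""
    return [generate_map(size, seed=base_seed + i) for i in range(num_maps)]
```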

Output Files

./exp_result/*_training.jsonl: Results from training phase
./exp_result/*_test_no_memory.jsonl: Test results without task memory
./exp_result/*_test_with_memory.jsonl: Test results with task memory
./exp_result/frozenlake_summary.csv: Statistical summary

Task Memory Mechanism

The task memory system works as follows:

Memory Creation: During training, successful trajectories are sent to the ReMe service
Memory Retrieval: During testing, the agent queries relevant memories based on the current map
Memory Application: The agent uses retrieved memories to guide its decision-making
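The three steps can be illustrated with a toy, in-process stand-in for the ReMe service. Everything here is hypothetical: `InMemoryTaskMemory` is not ReMe's API, maps are encoded as space-separated rows, and retrieval uses word overlap (Jaccard similarity) where the real service is configured with an embedding model and a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTaskMemory:
    """Toy stand-in for the ReMe service: stores (query, lesson) pairs in a list."""

    memories: list = field(default_factory=list)

    def add(self, query: str, lesson: str) -> None:
        self.memories.append((query, lesson))

    def retrieve(self, query: str, top_k: int = 1) -> list:
        """Return the lessons whose stored query shares the most words with `query`."""
        words = set(query.split())

        def jaccard(item):
            other = set(item[0].split())
            union = words | other
            return len(words & other) / len(union) if union else 0.0

        ranked = sorted(self.memories, key=jaccard, reverse=True)
        return [lesson for _, lesson in ranked[:top_k]]


def record_training_episode(store, map_text, actions, success):
    """Memory creation: keep only trajectories that reached the goal."""
    if success:
        store.add(map_text, f"On a map like this, the actions {actions} reached the goal.")


def build_prompt(store, map_text):
    """Memory retrieval and application: put retrieved lessons in front of the map."""
    hints = "\n".join(store.retrieve(map_text))
    return f"Relevant experience:\n{hints}\n\nCurrent map:\n{map_text}"
```

In the actual experiment, memories live in the ReMe service under the workspace named by `experiment_name`.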

The experiment is designed to measure how much task memory improves performance, with the benefit expected to matter most in harder settings such as slippery FrozenLake.