CRAB (Cross-environment Agent Benchmark) by CAMEL-AI is an open-source framework for benchmarking multimodal AI agents that operate across multiple devices and environments at the same time. Most existing agent benchmarks confine agents to a single device or platform; CRAB lets agents coordinate and carry out complex tasks that span several systems, such as Ubuntu computers and Android smartphones. Its modular design includes a graph evaluator that monitors task progress at a fine-grained level and a task synthesis system that generates diverse, realistic benchmarking tasks. CRAB aims to become a standard for assessing real-world, multi-agent AI workflows while simplifying environment creation and benchmarking.
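To make the idea of graph-based progress monitoring concrete, here is a minimal sketch that is not taken from CRAB's code base. It assumes a task can be broken into checkpoints arranged in a dependency graph, and that a checkpoint counts only once all of its prerequisites have also been reached. The checkpoint names and the functions completed_checkpoints and completion_ratio are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical checkpoints for a cross-device task. Each key maps to the
# set of checkpoints that must be completed before it.
CHECKPOINTS = {
    "photo_taken_on_phone": set(),
    "photo_synced_to_desktop": {"photo_taken_on_phone"},
    "editor_opened_on_desktop": set(),
    "photo_edited": {"photo_synced_to_desktop", "editor_opened_on_desktop"},
}


def completed_checkpoints(graph, observed):
    """Return checkpoints that were observed and whose prerequisites are all completed."""
    done = set()
    # static_order() yields every node after all of its prerequisites.
    for node in TopologicalSorter(graph).static_order():
        if observed.get(node, False) and graph[node] <= done:
            done.add(node)
    return done


def completion_ratio(graph, observed):
    """Fraction of checkpoints completed, a finer signal than pass/fail."""
    return len(completed_checkpoints(graph, observed)) / len(graph)


# The agent skipped opening the editor, so "photo_edited" does not count
# even though it was observed.
observed = {
    "photo_taken_on_phone": True,
    "photo_synced_to_desktop": True,
    "photo_edited": True,
}
ratio = completion_ratio(CHECKPOINTS, observed)
```

Scoring partial progress this way separates an agent that completed half the task from one that did nothing, which a single success flag cannot do.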
Key Features
Cross-environment support: agents can control multiple devices at once through a unified Python interface.
Modular configuration: actions and environments are defined with Python decorators.
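The two features above can be illustrated with a small, self-contained sketch. It does not use CRAB's actual API: the Environment class, its action decorator, and the dispatch function are hypothetical stand-ins showing how decorator-registered actions and a single entry point for several devices might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Environment:
    """Hypothetical stand-in for one device the agent can control."""
    name: str
    actions: dict[str, Callable[..., str]] = field(default_factory=dict)

    def action(self, func):
        """Decorator that registers func as an action of this environment."""
        self.actions[func.__name__] = func
        return func

    def step(self, action_name, **kwargs):
        """Run a registered action by name."""
        return self.actions[action_name](**kwargs)


ubuntu = Environment("ubuntu")
android = Environment("android")


@ubuntu.action
def open_file(path: str) -> str:
    # A real action would drive the desktop; this one only reports.
    return f"[ubuntu] opened {path}"


@android.action
def tap(x: int, y: int) -> str:
    # A real action would send a touch event to the phone.
    return f"[android] tapped ({x}, {y})"


ENVIRONMENTS = {env.name: env for env in (ubuntu, android)}


def dispatch(env_name, action_name, **kwargs):
    """Single entry point an agent could use for every environment."""
    return ENVIRONMENTS[env_name].step(action_name, **kwargs)


desktop_result = dispatch("ubuntu", "open_file", path="/tmp/photo.png")
phone_result = dispatch("android", "tap", x=120, y=480)
```

Keeping registration next to each function definition means adding an action needs no change to the dispatch code, which is the kind of modularity the decorator approach is meant to provide.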
Use Cases
Benchmarking multimodal AI agents that interact with graphical user interfaces across computers, phones, and other devices.
Evaluating and improving AI agent coordination in multi-agent systems with complex workflows spanning multiple environments.
Developing robust AI assistants capable of managing interconnected devices for tasks like cross-device photo editing or multi-app automation.
Technical Specifications
Python-centric framework that requires Python 3.10 or later; its packages are installable with pip.
Environments can be deployed in memory, in Docker containers, in virtual machines, or on multiple physical machines, as long as they are accessible from Python.
Includes an interaction protocol and its implementation for communication between agents and environments. The source code and datasets are open source and available on GitHub.
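As a rough illustration of why a shared protocol matters when environments run in containers, virtual machines, or separate machines, the sketch below passes an action request as a JSON string and returns a JSON response. The message fields and the functions encode_request and handle_request are assumptions made for this example; they do not describe CRAB's actual wire format.

```python
import json


def encode_request(environment, action, parameters):
    """Agent side: serialize an action call so it can cross a process or network boundary."""
    return json.dumps(
        {"environment": environment, "action": action, "parameters": parameters}
    )


def handle_request(raw, handlers):
    """Environment side: decode a request, run the matching handler, encode the reply."""
    request = json.loads(raw)
    handler = handlers.get((request["environment"], request["action"]))
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown action"})
    result = handler(**request["parameters"])
    return json.dumps({"ok": True, "result": result})


def set_clipboard(text):
    # Hypothetical Android action; a real one would talk to the device.
    return f"clipboard set to {text!r}"


HANDLERS = {("android", "set_clipboard"): set_clipboard}

raw_request = encode_request("android", "set_clipboard", {"text": "hello"})
reply = json.loads(handle_request(raw_request, HANDLERS))
unknown = json.loads(handle_request(encode_request("ubuntu", "reboot", {}), HANDLERS))
```

Because both sides exchange plain strings, the same agent code works whether the handler runs in the same process or behind a network transport.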