Async Agent Evaluation Runner
Instructions:
- This app uses an AsyncCodeAgentManager that enforces a per-question timeout and switches to a fallback model when the primary model times out.
- Primary Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- Fallback Model (used on timeout): Qwen/Qwen2.5-Coder-32B-Instruct
- Log in to your Hugging Face account using the button below.
- Click 'Run Evaluation & Submit All Answers' to answer every question and submit the results.
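The timeout-and-fallback behaviour described above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions, not the app's AsyncCodeAgentManager: the function `answer_with_fallback`, the `call_model(model_id, question)` interface, and the `fake_model` stand-in are all hypothetical, and whether the real app also time-limits the fallback call is not stated, so the sketch leaves it unbounded.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed interface (hypothetical): an async function that sends `question`
# to the model named `model_id` and returns the model's answer.
ModelCall = Callable[[str, str], Awaitable[str]]

PRIMARY_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
FALLBACK_MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"
TIMEOUT_SECONDS = 180.0  # 3 minutes per question


async def answer_with_fallback(
    call_model: ModelCall,
    question: str,
    timeout: float = TIMEOUT_SECONDS,
) -> tuple[str, str]:
    """Return (model_id_used, answer), falling back once if the primary times out."""
    try:
        # wait_for cancels the primary call if it exceeds `timeout`.
        answer = await asyncio.wait_for(call_model(PRIMARY_MODEL, question), timeout)
        return PRIMARY_MODEL, answer
    except asyncio.TimeoutError:
        # Assumption: the fallback call itself is not time-limited here.
        answer = await call_model(FALLBACK_MODEL, question)
        return FALLBACK_MODEL, answer


async def fake_model(model_id: str, question: str) -> str:
    """Hypothetical stand-in for a real model: the primary is slow, the fallback is instant."""
    await asyncio.sleep(0.2 if model_id == PRIMARY_MODEL else 0.0)
    return f"{model_id.split('/')[0]} answered: {question}"


if __name__ == "__main__":
    # A 0.05 s timeout forces the fallback path with the fake model.
    print(asyncio.run(answer_with_fallback(fake_model, "2+2?", timeout=0.05)))
```

With the real 180-second limit, the fallback model only runs when the primary model is genuinely slow on a question; returning the model ID alongside the answer makes it easy to log which model produced each result.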
Features:
- Asynchronous processing, so several questions can be handled concurrently
- Automatic switch to the fallback model on timeout
- Batch processing of multiple questions
- Error handling and logging
- Timeout: 3 minutes per question
Note: The async agent processes questions concurrently and recovers from timeouts by retrying the question with the fallback model.
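Concurrent batch processing with per-question error recovery might look like the sketch below. It is an assumption-laden illustration, not the app's code: `run_batch`, `max_concurrency`, the result dictionary layout, and the `demo_answer` function are hypothetical. A semaphore caps how many questions run at once, and an exception on one question is recorded rather than aborting the batch.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical per-question coroutine, e.g. one applying a timeout/fallback policy.
AnswerFn = Callable[[str], Awaitable[str]]


async def run_batch(
    answer_fn: AnswerFn,
    questions: list[str],
    max_concurrency: int = 4,
) -> list[dict]:
    """Answer all questions concurrently, at most `max_concurrency` at a time.

    A failure on one question is recorded instead of aborting the whole batch.
    Results keep the input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(question: str) -> dict:
        async with semaphore:
            try:
                answer = await answer_fn(question)
                return {"question": question, "answer": answer, "error": None}
            except Exception as exc:  # record the failure, keep processing the rest
                return {"question": question, "answer": None, "error": repr(exc)}

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(one(q) for q in questions))


async def demo_answer(question: str) -> str:
    """Hypothetical agent: fails on "bad", otherwise echoes the question in upper case."""
    if question == "bad":
        raise ValueError("agent failed")
    await asyncio.sleep(0.01)
    return question.upper()


if __name__ == "__main__":
    for row in asyncio.run(run_batch(demo_answer, ["a", "bad", "c"], max_concurrency=2)):
        print(row)
```

Catching `Exception` (not `BaseException`) lets task cancellation still propagate, while ordinary agent errors end up in the results table next to the answers that succeeded.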
Agent Configuration:
- Primary Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- Fallback Model: Qwen/Qwen2.5-Coder-32B-Instruct
- Timeout: 180 seconds per question
- Max Steps: 15
Questions and Agent Answers