Async Agent Evaluation Runner

Instructions:

  1. This app runs an AsyncCodeAgentManager that enforces a per-question timeout and switches to a fallback model when the primary model times out.
  2. Primary Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  3. Fallback Model: Qwen/Qwen2.5-Coder-32B-Instruct (used on timeout)
  4. Log in to your Hugging Face account using the button below.
  5. Click 'Run Evaluation & Submit All Answers' to process all questions efficiently.

Features:

  • Async processing for better performance
  • Automatic fallback model on timeout
  • Batch processing of multiple questions
  • Enhanced error handling and logging
  • Timeout: 3 minutes (180 seconds) per question

Note: Questions are processed asynchronously, and a question that times out on the primary model is handed to the fallback model.

Agent Configuration:

  • Primary Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • Fallback Model: Qwen/Qwen2.5-Coder-32B-Instruct
  • Timeout: 180 seconds per question
  • Max Steps: 15
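
For illustration, the settings above could be grouped into a single immutable object. This is a sketch only: the AgentConfig class and its field names are hypothetical and are not taken from the app's source. Only the four values come from the configuration listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the settings listed above."""

    primary_model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    fallback_model: str = "Qwen/Qwen2.5-Coder-32B-Instruct"
    timeout_seconds: int = 180  # per question
    max_steps: int = 15         # upper bound on agent steps per question

    @property
    def timeout_minutes(self) -> float:
        # 180 seconds corresponds to the 3-minute timeout in the feature list.
        return self.timeout_seconds / 60
```

Freezing the dataclass keeps the configuration from being changed while a batch is running, which is a design choice of this sketch rather than something the app is known to do.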

Questions and Agent Answers
