Related tags
agent, open-source, playground, ai, monitoring, evaluation, openai, observability, agentops, coze

Here are 287 public repositories matching this topic...

Catch your AI's mistakes and blind spots before your customers or regulators do. iFixAi runs 45 inspections (32 graded core inspections plus 13 extended inspections for frontier risks such as sabotage, sandbagging, and oversight evasion) and returns a letter grade in under 5 minutes. It is industry- and model-agnostic.

  • Updated Jun 13, 2026
  • Python
ai-agents-reality-check

Benchmarking the gap between AI agent hype and architecture: three agent archetypes, a 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistical validation.

  • Updated Apr 2, 2026
  • Python

🤖 A curated list of resources for testing AI agents - frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems

  • Updated May 28, 2025