Other Stats

Average R-squared

20.96%

The mean R-squared across all runs. It measures how well a linear regression on the AI's output aligns with human votes: 100% means the AI perfectly predicts a human vote, and 0% means it does not predict it at all.
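As a rough illustration of the metric, here is a minimal sketch of R-squared for a single run. It assumes each run provides paired numeric values (an AI score and a human vote per item) and that the fit is a simple one-variable least-squares line, in which case R-squared equals the squared Pearson correlation. The function name `r_squared` and the sample data are hypothetical, not taken from the dashboard's implementation.

```python
def r_squared(ai_scores, human_votes):
    """R-squared of a least-squares line predicting human_votes from ai_scores.

    Assumption: one predictor, so R-squared is the squared Pearson correlation.
    """
    n = len(ai_scores)
    if n != len(human_votes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(ai_scores) / n
    mean_y = sum(human_votes) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in ai_scores)
    syy = sum((y - mean_y) ** 2 for y in human_votes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ai_scores, human_votes))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance on one side: the line explains nothing
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: the AI score tracks the human vote exactly.
perfect = r_squared([1, 2, 3], [2, 4, 6])
# Hypothetical data: no linear relationship at all.
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
```

Multiplying the result by 100 gives the percentage form shown on this page.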

Combined R-squared

17.14%

The R-squared obtained when a single model is fit to all pairs from all runs together.
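The average and combined figures can differ because pooling pairs from different runs is not the same as averaging per-run fits. The sketch below shows this with two hypothetical runs, assuming a simple one-variable least-squares fit. The helper `r_squared` and all data are illustrative assumptions, not the dashboard's actual code.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R-squared of a one-variable linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Two hypothetical runs as (ai_scores, human_votes).
# Each is perfectly linear on its own, but the slopes point in opposite directions.
runs = [
    ([1, 2, 3], [1, 2, 3]),
    ([1, 2, 3], [3, 2, 1]),
]

# "Average R-squared": fit each run separately, then take the mean.
average_r2 = sum(r_squared(xs, ys) for xs, ys in runs) / len(runs)

# "Combined R-squared": pool every pair from every run, then fit once.
all_x = [x for xs, _ in runs for x in xs]
all_y = [y for _, ys in runs for y in ys]
combined_r2 = r_squared(all_x, all_y)
```

In this contrived case the average is 1.0 while the combined value is 0.0, since a single line cannot fit both runs at once. Real runs will usually differ far less.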

Eval Runs

29

Total number of evaluation runs.

Total Cost of All Evals

$186.26

The sum of costs for all evaluation runs.

Average Cost per Test Run

$6.42

The average cost of a single evaluation run.

Total Duration of All Evals

2664m 13s

The sum of durations for all evaluation runs.

Average Duration per Test Run

91m 52s

The average duration of a single evaluation run.
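The per-run averages follow directly from the totals and the run count reported above. Here is a small sketch of that arithmetic, assuming the averages are plain means rounded to the nearest cent and the nearest second. The rounding rule and the helper `format_duration` are assumptions.

```python
def format_duration(total_seconds):
    """Format a number of seconds as '<minutes>m <seconds>s' (nearest second)."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}m {seconds}s"


eval_runs = 29                      # Eval Runs
total_cost = 186.26                 # Total Cost of All Evals, in dollars
total_seconds = 2664 * 60 + 13      # Total Duration of All Evals (2664m 13s)

avg_cost = f"${total_cost / eval_runs:.2f}"
avg_duration = format_duration(total_seconds / eval_runs)

print(avg_cost)      # $6.42
print(avg_duration)  # 91m 52s
```

Both results match the averages displayed on this page.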

All Generations

1050

The total number of generations across all evaluation runs.