Other Stats

Average R-squared

16.16%

The mean R-squared across all evaluation runs. R-squared measures how closely AI and human votes align under a linear regression: 100% means the AI perfectly predicts the human vote, and 0% means it does not predict it at all.

Combined R-squared

17.14%

The R-squared of a single linear model fit to all pairs pooled from all runs.
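The difference between the two R-squared figures can be illustrated with a minimal sketch. It assumes each run is a pair of equal-length lists (AI scores, human votes) and that the model is a simple one-variable linear regression, in which case R-squared equals the squared Pearson correlation. The function names and the data layout are hypothetical, not taken from the dashboard's implementation.

```python
from statistics import mean


def r_squared(ai_scores, human_votes):
    """R-squared of a simple linear regression of human_votes on ai_scores.

    For one predictor this equals the squared Pearson correlation.
    """
    mx, my = mean(ai_scores), mean(human_votes)
    sxx = sum((x - mx) ** 2 for x in ai_scores)
    syy = sum((y - my) ** 2 for y in human_votes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ai_scores, human_votes))
    if sxx == 0 or syy == 0:
        # No variance on one side: the regression explains nothing.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def average_r_squared(runs):
    """Mean of the per-run R-squared values (assumed meaning of 'Average')."""
    return mean(r_squared(ai, human) for ai, human in runs)


def combined_r_squared(runs):
    """R-squared of one model fit to all pairs pooled from all runs."""
    all_ai = [x for ai, _ in runs for x in ai]
    all_human = [y for _, human in runs for y in human]
    return r_squared(all_ai, all_human)


# Hypothetical data: two runs that are each perfectly linear,
# but with opposite slopes, so pooling them destroys the fit.
runs = [
    ([1, 2, 3], [2, 4, 6]),
    ([1, 2, 3], [6, 4, 2]),
]
print(average_r_squared(runs))   # 1.0
print(combined_r_squared(runs))  # 0.0
```

The toy data is deliberately extreme to show that the two statistics answer different questions: the average treats each run as its own model, while the combined value asks whether one model fits every pair at once.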

Eval Runs

42

Total number of evaluation runs.

Total Cost of All Evals

$302.14

The sum of costs for all evaluation runs.

Average Cost per Test Run

$7.19

The average cost of a single evaluation run.

Total Duration of All Evals

3471m 41s

The sum of durations for all evaluation runs.

Average Duration per Test Run

82m 40s

The average duration of a single evaluation run.
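As a quick check, the per-run averages can be reproduced from the totals shown on this page. The sketch below assumes the averages are plain totals divided by the number of evaluation runs (42) and that durations are rounded to the nearest second; the helper names are hypothetical.

```python
def parse_duration(text):
    """Convert a string like '3471m 41s' into a number of seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def format_duration(total_seconds):
    """Format seconds as 'Xm Ys', rounded to the nearest second."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}m {seconds}s"


eval_runs = 42
total_cost = 302.14             # dollars, from "Total Cost of All Evals"
total_duration = "3471m 41s"    # from "Total Duration of All Evals"

avg_cost = round(total_cost / eval_runs, 2)
avg_duration = format_duration(parse_duration(total_duration) / eval_runs)

print(f"${avg_cost:.2f}")  # $7.19
print(avg_duration)        # 82m 40s
```

Both results match the averages reported above.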

All Generations

1603

The total number of generations across all evaluation runs.