
Other Stats

Average R-squared

17.21%

The mean of the R-squared values from all runs. R-squared measures how well a linear regression on the AI's output predicts the human vote: 100% means the AI perfectly predicts a human vote, and 0% means it does not predict it at all.

Combined R-squared

17.14%

The R-squared obtained when a single model is fit to all pairs pooled from all runs.
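The difference between the average and the combined figure can be illustrated with a small sketch. It assumes each run is a list of hypothetical `(ai_score, human_vote)` pairs and that the fit is a simple least-squares line with an intercept, for which R-squared equals the squared Pearson correlation. The function names and the sample data are illustrative only; the dashboard's actual computation is not shown on this page.

```python
from statistics import mean

def r_squared(pairs):
    """R-squared of a least-squares line fit to (ai_score, human_vote) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance on one axis: treat the fit as explaining nothing
    return (sxy ** 2) / (sxx * syy)

def average_r_squared(runs):
    """Fit one line per run, then average the per-run R-squared values."""
    return mean(r_squared(run) for run in runs)

def combined_r_squared(runs):
    """Pool the pairs from every run and fit a single line to all of them."""
    pooled = [pair for run in runs for pair in run]
    return r_squared(pooled)

# Hypothetical data: two runs whose trends point in opposite directions.
demo_runs = [
    [(0, 0), (1, 1), (2, 2)],
    [(0, 2), (1, 1), (2, 0)],
]
demo_average = average_r_squared(demo_runs)
demo_combined = combined_r_squared(demo_runs)
```

In this contrived case each run is perfectly linear on its own, so the average is 1.0, while the pooled data shows no linear trend and the combined value is 0.0. The two statistics can therefore diverge, even though they are close on this page.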

Eval Runs

43

Total number of evaluation runs.

Total Cost of All Evals

$306.62

The sum of costs for all evaluation runs.

Average Cost per Test Run

$7.13

The average cost of a single evaluation run.

Total Duration of All Evals

3503m 56s

The sum of durations for all evaluation runs.

Average Duration per Test Run

81m 29s

The average duration of a single evaluation run.
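The per-run averages follow from dividing the totals by the number of runs. A minimal sketch of that arithmetic, using the values shown on this page; the helper names are hypothetical, and rounding to the nearest second is an assumption:

```python
def parse_duration(text):
    """Convert a string such as '3503m 56s' into total seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))

def format_duration(total_seconds):
    """Format seconds as 'Xm Ys', rounded to the nearest second (assumed)."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}m {seconds}s"

eval_runs = 43
total_cost = 306.62
total_seconds = parse_duration("3503m 56s")

avg_cost = f"${total_cost / eval_runs:.2f}"
avg_duration = format_duration(total_seconds / eval_runs)

print(avg_cost)      # $7.13
print(avg_duration)  # 81m 29s
```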

All Generations

1624

The total number of generations across all evaluation runs.