Evaluation suites

Test cases, expected criteria, judge-based automated scoring across six dimensions, and rubric design.
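As a rough illustration of how these pieces could fit together, here is a minimal sketch. It is not the suite's actual API: `EvalCase`, `score_case`, `keyword_judge`, and the weighting scheme are all hypothetical names introduced for illustration. The six dimensions are not named here, so the sketch uses placeholder names (`dimension_1` through `dimension_6`), and a simple keyword check stands in for a real judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Placeholder names only: six dimensions are mentioned, but they are not named.
DIMENSIONS: List[str] = [f"dimension_{i}" for i in range(1, 7)]


@dataclass
class EvalCase:
    """One test case: an input plus the criteria a good output should meet."""
    case_id: str
    prompt: str
    expected_criteria: List[str]


# A judge maps (case, output) to a score in [0, 1] for each dimension.
Judge = Callable[[EvalCase, str], Dict[str, float]]


def score_case(case: EvalCase, output: str, judge: Judge,
               weights: Dict[str, float]) -> float:
    """Combine per-dimension judge scores into one weighted rubric score."""
    scores = judge(case, output)
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"judge returned no score for: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


def keyword_judge(case: EvalCase, output: str) -> Dict[str, float]:
    """Stand-in judge (assumption): fraction of criteria mentioned in the output.

    A real judge would score each dimension separately; this one assigns the
    same value to every dimension to keep the sketch short.
    """
    if not case.expected_criteria:
        return {d: 0.0 for d in DIMENSIONS}
    text = output.lower()
    hits = sum(1 for c in case.expected_criteria if c.lower() in text)
    fraction = hits / len(case.expected_criteria)
    return {d: fraction for d in DIMENSIONS}
```

With equal weights, `score_case` reduces to the mean of the six dimension scores. Unequal weights are one way a rubric could emphasize some dimensions over others, though the actual rubric design may differ.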

This section is being written. Check back soon.