Benchmark Education accelerates grading and boosts student feedback with generative AI on AWS

For K12 teachers, grading open-ended assignments is often one of their most time-consuming tasks. The more hours a teacher spends meticulously reviewing an assignment, the less time they spend engaging with students directly—through one-on-one support, tailored discussion, and other meaningful interactions. Still, assignment feedback is important, and delays can mean missed opportunities for students to improve their understanding and skills.

Benchmark Education Company, a leading provider of literacy and language programs with an award-winning education technology (EdTech) platform, wanted to change this. Working closely with Amazon Web Services (AWS), Benchmark Education built a grading tool powered by generative artificial intelligence (AI) to help teachers dramatically reduce the time they spend grading open-ended assessments—while maintaining accuracy, privacy, and trust. Now, teachers using the tool can reinvest their time into engaging with students. Meanwhile, students receive actionable feedback and more opportunities for personalized instruction.

Assessing student achievement is valuable—but costs teachers time

Benchmark Education Company is a leading publisher of core, supplemental, and intervention literacy and language resources in English and Spanish, with valid and reliable digital assessments that inform instruction. Benchmark Advance, a knowledge-based literacy solution, is one of Benchmark Education’s core offerings. The award-winning program frequently assesses students’ progress through the material to measure their mastery and growth. Many assessment questions call for open-ended responses on language arts topics, which engage learners thoroughly in the material and give teachers valuable insight into each student’s understanding.

But grading open-ended responses thoroughly and consistently takes time—an extremely finite resource in a teacher’s busy schedule, which also includes lesson planning, lecturing, family engagement, and more. “Over the years, we observed that tests with open-ended questions would not always get graded,” says Christian Carey, Senior Vice President of Software Engineering and Architecture at Benchmark Education Company. “We were seeing 10-12 percent of our tests were not being graded and then not represented in the reporting suite because they were not graded.” This lack of feedback data makes it difficult for teachers to accurately track student performance or identify opportunities for focused reteaching and review. Plus, it means students don’t get constructive feedback that can help them improve their understanding of the material.

The Benchmark Education team saw this as an opportunity. They knew applying generative AI to this problem could save teachers time while accurately and consistently measuring student progress—so they turned to AWS.

Working backward to build a generative AI grading solution

Once Benchmark Education decided to build a generative AI tool, the team collaborated with AWS on a “Working Backwards” session: an Amazonian approach to vetting ideas and creating new products. These sessions build a comprehensive picture of the success a team wants to bring to its customers, then work backward from each core component of that vision to build a roadmap for development.

“We went through a press release writing exercise to understand what we wanted to build,” says Jolene Newton, Vice President of Platform Product Management at Benchmark Education. “That was an incredible experience that helped us determine what we wanted to accomplish.” The Working Backwards session explored the questions and risks associated with developing the new product while further refining the value it could bring to educators.

Shortly afterward, the AWS Product Acceleration team collaborated with Benchmark Education to create an architectural proposal for the solution and built a proof-of-concept (POC) during a “build and demo week”. Working together for nearly a week, AWS helped Benchmark Education define its minimum viable product (MVP) and upskill Benchmark’s engineers so they had the confidence to bring it to life.

Training AI data for accuracy and confidence to support student success

With AWS’s support, Benchmark Education built the generative AI grading tool using Amazon Bedrock. This fully managed service lets AWS customers choose their own secure foundation model. The solution stores assessment responses, feedback, and grades in Amazon Relational Database Service (Amazon RDS). The feature uses AWS Step Functions and AWS Lambda for serverless orchestration. The Benchmark Education team also uses Amazon SageMaker to accelerate experimentation by analyzing data and quickly iterating prompts or other components built into the AI model.
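As a rough illustration of how one grading step in such a serverless workflow might look, the sketch below builds a rubric prompt, calls a model, and validates the reply before handing it to a teacher. It is not Benchmark Education’s implementation: the function names, event fields, JSON reply format, and status value are all hypothetical, and the model call is passed in as a plain callable (`invoke_model`) that stands in for whatever Amazon Bedrock request the real Lambda function makes.

```python
import json
from typing import Callable


def build_grading_prompt(question: str, rubric: dict[int, str], response: str) -> str:
    """Assemble a prompt asking the model for a rubric score and a justification."""
    rubric_lines = "\n".join(f"{score}: {desc}" for score, desc in sorted(rubric.items()))
    return (
        "Grade the student response against the rubric.\n"
        f"Question: {question}\n"
        f"Rubric:\n{rubric_lines}\n"
        f"Student response: {response}\n"
        'Reply with JSON only: {"score": <int>, "justification": "<text>"}'
    )


def parse_suggestion(raw: str, rubric: dict[int, str]) -> dict:
    """Validate the model reply and reject scores the rubric does not define."""
    data = json.loads(raw)
    score = int(data["score"])
    if score not in rubric:
        raise ValueError(f"score {score} is not a rubric level")
    return {"score": score, "justification": str(data["justification"])}


def grade_step(event: dict, invoke_model: Callable[[str], str]) -> dict:
    """One workflow step: prompt the model and return a suggestion for teacher review.

    `event` is a hypothetical payload; rubric keys arrive as strings because
    JSON object keys are always strings.
    """
    rubric = {int(level): desc for level, desc in event["rubric"].items()}
    prompt = build_grading_prompt(event["question"], rubric, event["response"])
    suggestion = parse_suggestion(invoke_model(prompt), rubric)
    # The result is only a suggestion; nothing is final until a teacher reviews it.
    return {
        "response_id": event["response_id"],
        "status": "PENDING_TEACHER_REVIEW",
        **suggestion,
    }
```

Injecting the model call keeps the step easy to exercise with a stub during prompt experiments, and validating the score against the rubric means a malformed reply fails loudly instead of being stored as a grade.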

AI model training and data management were the most extensive parts of the development process. Benchmark Education brought in a data scientist and a team of K12 teachers, who graded hundreds of open-ended assignments against strict, consistent rubric criteria that helped inform the AI model’s assessment and feedback patterns. These graders also manually reviewed hundreds more assignments and compared them against the output of the generative AI tool to assess its effectiveness and accuracy. This independent panel helped reduce bias and make sure grades were assigned according to a specific rubric rather than teacher preference, historical performance, or the need to quickly grade assignments.

“This became our ground-truth data, our baseline that we evaluated against while experimenting with different prompts, models, and approaches,” says Rahul Kulkarni, a solutions architect at AWS who supported the project.
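One simple way to evaluate a prompt or model against a ground-truth set like this is to measure how often the model’s score matches the panel’s score. The sketch below is an assumption about what such a comparison could look like, not the team’s actual evaluation code; the function name and the choice of metrics (exact agreement, agreement within one rubric point, and average signed difference) are illustrative.

```python
from collections import Counter


def agreement_report(ground_truth: list[int], model_scores: list[int]) -> dict:
    """Compare model scores with panel scores for the same responses, in order."""
    if not ground_truth or len(ground_truth) != len(model_scores):
        raise ValueError("score lists must be non-empty and the same length")
    # Positive difference: the model scored higher than the teacher panel.
    diffs = [model - truth for truth, model in zip(ground_truth, model_scores)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_one": sum(abs(d) <= 1 for d in diffs) / n,
        "mean_signed_diff": sum(diffs) / n,
        "diff_counts": dict(Counter(diffs)),
    }


# Hypothetical scores for four responses, graded by the panel and by one prompt variant.
panel = [2, 3, 1, 4]
variant_a = [2, 2, 1, 4]
report = agreement_report(panel, variant_a)
```

Running the same report for each prompt or model variant against the fixed panel scores gives a like-for-like baseline, and the signed difference shows whether a variant tends to grade more harshly or more leniently than the teachers.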

“We spent a lot of time on justification, providing reasons for why a score was given to build trust with the model’s output,” explains Carey. “We took the time to analyze the data to make sure the accuracy was high. We wanted to deliver a great user experience, and we also needed teachers to trust the results.”

Keeping teachers in control

Some teachers may have concerns about using generative AI in the classroom, but in Benchmark Education’s solution, the teacher is always in control. When a student submits an open-ended response, the generative AI tool grades it against a set of established rubric criteria. The model then suggests an appropriate grade and provides feedback on the response to justify the grade.

When the teacher reviews the grade and feedback, they decide whether to accept the suggested grade or override it. If they decide to override the grade, the application prompts them to briefly explain why to help inform their feedback process. The teacher can always provide their own grade and feedback if they choose. This feedback is then incorporated back into the model to increase grading accuracy.
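The accept-or-override flow described above can be pictured as a small data model. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, and requiring a non-empty reason on override is a design choice made for this example (the article says only that the application prompts the teacher to explain).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    """Grade and feedback proposed by the model for one student response."""
    response_id: str
    score: int
    feedback: str


@dataclass(frozen=True)
class FinalGrade:
    """The grade of record, always produced by a teacher decision."""
    response_id: str
    score: int
    feedback: str
    accepted_suggestion: bool
    override_reason: Optional[str] = None


def accept(suggestion: Suggestion) -> FinalGrade:
    """Teacher accepts the suggested grade and feedback as-is."""
    return FinalGrade(suggestion.response_id, suggestion.score, suggestion.feedback, True)


def override(suggestion: Suggestion, score: int, reason: str,
             feedback: Optional[str] = None) -> FinalGrade:
    """Teacher replaces the suggested grade and records a brief reason."""
    if not reason.strip():
        raise ValueError("an override needs a brief explanation")
    return FinalGrade(
        suggestion.response_id,
        score,
        feedback if feedback is not None else suggestion.feedback,
        False,
        reason.strip(),
    )
```

Because a `FinalGrade` can only be created through `accept` or `override`, no model suggestion becomes a grade without a teacher decision, and every override carries the context that can later inform accuracy improvements.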

“A key part of how we’re approaching all of our AI features is to always have a human involved,” says Carey. “The AI’s role is to advise the user, and the user can either accept what the AI is telling them, or they can change it and provide their own context.”

And this commitment to model accuracy and teacher trust is paying off. In just a few months, teachers have built trust with the model and want more ways to deliver the model’s feedback to students quickly. “The type of request I’m receiving now is, ‘Can’t you just give me the easy button?’” explains Newton. One of the most common requests in user feedback is an “Accept All” function that lets a teacher who has validated and come to trust the model’s suggestions approve them in one step instead of confirming each one. This speaks to the model’s ability to match teacher reasoning and provide an accurate grade.

Saving teachers’ time to reinvest in students

Benchmark Education originally planned to limit the tool’s release to a single school grade in its first year. The material and reading levels across grades are so varied that the team figured they would need to build a separate model for each grade level. Working with AWS, the Benchmark Education team was amazed to discover that the model they built was flexible enough to support multiple grade levels. This gave them the confidence to release the assessment tool to grades 1-6 in Benchmark Advance. Launched at the start of the 2024 school year, the generative AI grading tool is already yielding promising results.

“We are seeing a reduction in ungraded responses,” says Carey. “We are at nine percent, down from the 10-12 percent [of ungraded responses] we saw historically. And about 73.5 percent of the grades provided by the LLM are being accepted by the teachers.” Notably, when teachers do modify the grade generated by the model, it is to raise it by one point. Many cite individual student accommodations, which fall outside the model’s rubric training, as their reason.
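Figures like these (share of ungraded responses, share of suggestions accepted, and the size and direction of overrides) can be derived from per-response records. The sketch below shows one possible calculation; the record shape is hypothetical, and treating a final score equal to the suggested score as an acceptance is an assumption of this example rather than a description of Benchmark Education’s reporting.

```python
from collections import Counter
from typing import Optional


def summarize(records: list[tuple[int, Optional[int]]]) -> dict:
    """Summarize (suggested_score, final_score) pairs; final_score None means ungraded."""
    if not records:
        raise ValueError("no records to summarize")
    graded = [(suggested, final) for suggested, final in records if final is not None]
    accepted = sum(suggested == final for suggested, final in graded)
    # Positive delta: the teacher raised the model's suggested grade.
    deltas = Counter(final - suggested for suggested, final in graded if final != suggested)
    return {
        "ungraded_rate": (len(records) - len(graded)) / len(records),
        "acceptance_rate": accepted / len(graded) if graded else 0.0,
        "override_deltas": dict(deltas),
    }


# Hypothetical data: two acceptances, one override raising the grade by one, one ungraded.
stats = summarize([(2, 2), (3, 3), (1, 2), (2, None)])
```

Tracking the override deltas alongside the acceptance rate makes patterns visible, such as overrides clustering at one point above the suggestion.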

Teachers have accepted over 280K of the generative AI tool’s suggested grades in just a few months. When assessment grading moves faster, students get important feedback quickly. They can improve their understanding of the material when it’s still fresh in their minds. Plus, educators benefit from an aggregated view of where students are struggling—providing an opportunity for timely reteaching or more personalized instruction for struggling students.

“The biggest benefit is that with Benchmark Education’s technology, learning and teaching is more efficient along with a degree of personalization,” says Lydia Neher, senior product acceleration manager at AWS. “Teachers can swiftly identify where students need help, provide feedback quicker, and reclaim time to focus on what matters most – student learning and growth.”

Moving forward, Benchmark Education Company is working to expand its generative AI model to its Spanish-language material and more detailed criteria by rubric domain, while developing more ways for teachers to share the model’s feedback with students faster. Benchmark Education’s continued innovation reflects its commitment to its mission: to help every student find lifelong inspiration, growth, and success through literacy, language, and knowledge development.

For more information about building AI tools for the classroom, contact the EdTech team or AWS in Higher Education.

AWS Public Sector Blog
