AI Has Changed What Student Work Can Prove—So What Are We Really Assessing?

Androy Bruney
11 hours ago

I recently read about Princeton University’s decision to begin proctoring all in-person examinations, ending its long tradition of students sitting exams without professors or invigilators present.

At first, it sounded like another story about AI-assisted cheating and a university trying to regain control.

But that was not the part that stayed with me.

What stayed with me was the possibility that AI had not created the weakness in the system. It had simply exposed it.

Students were cheating long before generative AI became widely available. AI has made misconduct quieter, less visible, and considerably more difficult for another student—or even a teacher—to detect.

That distinction matters because it shifts the conversation away from:

How do we stop students from using AI?

And toward a more uncomfortable question:

Why are we still assessing students in ways that become unreliable the moment they gain access to powerful tools?

From Measuring Answers to Evaluating Student Thinking

Princeton’s return to proctored examinations reflects a growing concern across education: AI is changing what a completed piece of student work can actually prove.

Schools can respond by adding monitoring software, restricting devices, requiring handwritten work, or supervising students more closely. Some of those measures may be necessary when students need to demonstrate independent knowledge and fluency.

But stronger supervision addresses only one question:

Did the student complete this work without unauthorized assistance?

It does not answer the more important one:

What does this assessment reveal about how the student thinks?

Many traditional assessments were built around a familiar process:

Give students a question. Ask them to produce an answer. Treat the answer as evidence of learning.

For a long time, that seemed reasonable because producing a strong answer usually required students to do much of the intellectual work themselves. They had to find information, understand it, select what was relevant, organize their ideas, and communicate a response.

AI has disrupted that connection.

It can now retrieve information, structure an argument, explain a concept, solve a familiar problem, and present the result in polished academic language. The student may still make choices along the way, but the finished product no longer shows us clearly which parts of the thinking belong to the student and which parts were supplied by the tool.

The problem, then, is not simply unauthorized AI use.

The deeper problem is that many assessments were designed to evaluate answer production, and answer production is no longer dependable evidence of independent thought.

I have seen signs of this tension in my own teaching.

In one semester, several students earned 90% or higher on coursework completed throughout the term. Yet after the final examination grades were entered, some of those same students dropped to a C average.

That discrepancy does not prove misconduct. Students may perform differently because of:

test anxiety,
time pressure,
exam design,
or the difference between supported and independent work.

But the size of the gap raised a question I could not ignore:

What was the coursework actually measuring?

Perhaps the coursework showed that students could produce strong responses when they had time, resources, feedback, and outside support.

The examination, by contrast, measured what they could retrieve and apply independently under controlled conditions.

Neither form of assessment necessarily gave a complete picture.

The issue was not simply that one result was trustworthy and the other was not. It was that the finished coursework had appeared to demonstrate a level of understanding that some students could not later access, explain, or apply independently.

A polished answer may show that a student—or a student working with a tool—can produce a successful product. It does not necessarily reveal whether the student can:

explain why the answer is reasonable
identify the evidence that supports it
recognize its limitations
adapt the reasoning when conditions change
defend the choices made
detect when the answer is wrong

This does not make final products worthless. Essays, reports, projects, and lab conclusions still matter.

But the product can no longer carry the full burden of proving learning.

If we want reliable evidence of understanding, assessment must reveal more than what students can produce. It must reveal what they can notice, question, decide, justify, and revise.

The central question is no longer simply:

Can the student produce a correct answer?

It is:

What can the student do intellectually that the answer alone cannot show us?

Why Recall and Background Knowledge Still Matter in the Age of AI

Traditional assessment has often treated recall as one of the clearest forms of evidence that learning has occurred.

More recently, however, I have heard recall spoken about almost as though it were an outdated or inferior skill—something we should move beyond now that students can search for information or ask AI.

I think that goes too far.

There is an important difference between asking students to memorize disconnected facts for the purpose of reproducing them on a test and helping them build knowledge they can retrieve fluently when they need it.

Recall is not the opposite of deeper thinking. In many cases, it is what makes deeper thinking possible.

Research on retrieval practice has repeatedly shown that actively recalling information strengthens long-term learning more effectively than simply rereading or reviewing it. Retrieval does more than help students remember facts for an examination. It can also help them retain knowledge and access it more flexibly in later situations.

That matters because students cannot evaluate every new idea from first principles.

A chemistry student deciding whether an explanation of reaction rate is plausible needs to know that increasing temperature affects particle motion and the proportion of collisions with sufficient energy. A student evaluating a calculation needs enough familiarity with units, significant figures, and expected values to notice that something is wrong.

They cannot pause at every step to look up every foundational idea. Some knowledge must already be available to think with.

The same is true beyond science.

A student cannot meaningfully evaluate a historical claim without enough knowledge of the period to recognize missing context. A student cannot judge whether a graph is misleading without understanding scale, variables, and proportional relationships. A student cannot assess the credibility of an AI-generated response if they know too little about the subject to detect an error.

This is one of the dangers of treating access to information as equivalent to possessing knowledge.

A student may be able to find a definition, formula, date, or explanation in seconds. But locating information is not the same as having enough knowledge to recognize what is relevant, connect it to other ideas, or determine whether it deserves to be trusted.

Background knowledge also gives students something against which new information can be tested.

When AI produces an answer that is confidently wrong, incomplete, or misleading, the student with stronger knowledge is more likely to notice. The student without that foundation may have no reason to question it.

Why Recall Alone Is Not Enough to Demonstrate Understanding

So the argument is not that recall should disappear from assessment.

Students still need opportunities to demonstrate that essential knowledge is secure, retrievable, and independently available to them. There are concepts, vocabulary, procedures, and relationships students should not have to outsource every time they encounter a problem.

But recall should not be mistaken for the whole of understanding.

A student may remember a definition without recognizing when it applies. They may reproduce a formula without knowing whether the result is reasonable. They may recall a scientific model without understanding its limitations.

Recall is therefore necessary, but it is not sufficient.

In a world of abundant information, students also need judgment.

They need to know what to ask, where to look, which source to trust, whether a response is plausible, what evidence is missing, and when a confident answer should be challenged.

They must be able to distinguish between an answer that is merely polished and one that is genuinely well reasoned.

Knowledge still matters.

But knowledge should increasingly become the material students think with, rather than the only product we assess.

The most valuable assessments now ask students to make judgments that cannot be reduced to copying, searching, or generating a polished response.

Students need to decide:

which evidence matters
how strongly the evidence supports a conclusion
where uncertainty remains
which assumptions are operating
what additional information is needed
when an explanation should be revised

The goal is not merely to make assignments harder for AI to complete.

The goal is to make student thinking indispensable.

Before designing an assessment, perhaps we need to ask not only:

What should students know?

But also:

What should they be able to decide, interpret, question, or defend using that knowledge?

If producing a polished answer no longer gives us sufficient evidence of learning, then our assessments must reveal something AI cannot demonstrate on a student’s behalf: how that student evaluates evidence, makes decisions, responds to uncertainty, and revises an idea.

In the next post, I will explore seven practical ways teachers can design assessments that make this thinking visible.
