Student modeling is crucial for an intelligent learning environment to be able to adapt to the needs and knowledge of individual students [5,20]. There are many techniques for generating student models. Most of them are computationally complex and expensive, for example, Bayesian networks [20,21,29], the Dempster-Shafer theory of evidence [4], and the fuzzy logic approach [14]. Other techniques, such as the model tracing approach [1], are computationally cheap but can only record what a student knows, not the student's behavior and characteristics.
The difficulties in applying Bayesian modeling are the high cost of knowledge acquisition and the time needed to update the student model. Inference in Bayesian belief networks is NP-hard, and the model requires prior probabilities. In developing practical and efficient Bayesian methods, we trade depth of modeling and complexity of knowledge representation for linear-time belief updating and a small number of model parameters.
In IDEAL, a student model is inferred from the performance data using a Bayesian belief network. The measure of how well a skill is learned is represented as a probability distribution over skill levels, such as novice, beginning, intermediate, advanced, and expert. Under the assumption that the performance on questions is independently distributed, to model one skill with $m$ skill levels and $n$ questions for each in a Bayesian network, we need $n \cdot m$ probabilities plus the $m$ prior probabilities of the skill levels to calculate the probability distribution of skill levels given all the question scores. To model $k$ skills with the same skill levels for each, we need $k(n \cdot m + m)$ probabilities, which is too large for non-trivial real-world applications.
To reduce the number of probabilities required and improve the efficiency of the algorithm in IDEAL, questions of similar difficulty are grouped into categories, each associated with the conditional probabilities of answering questions in that category correctly given the possible skill levels. Now, only $k(c \cdot m + m)$ probabilities are required, where $c$ is the number of categories.
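As a concrete illustration of these counts, the following sketch plugs in hypothetical numbers; the values (and the symbols $m$, $n$, $k$, $c$) are chosen for illustration and are not taken from IDEAL itself:

```python
# Illustrative parameter counts for the Bayesian student model;
# all numbers here are assumptions, not values from IDEAL.
m = 5    # skill levels: novice ... expert
n = 100  # questions per skill
k = 10   # skills

# Naive model: one conditional probability per (question, skill level)
# pair, plus the priors over skill levels, for every skill.
naive = k * (n * m + m)      # 10 * (100*5 + 5) = 5050

# After grouping questions into c difficulty categories, one conditional
# probability per (category, skill level) pair suffices.
c = 4
grouped = k * (c * m + m)    # 10 * (4*5 + 5) = 250

print(naive, grouped)
```

Even for this modest example, grouping questions into categories cuts the number of required probabilities by a factor of twenty.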
The number of probabilities is further reduced by matching the question categories to the skill levels. For $m$ skill levels, only $m - 1$ question categories are required. If a student has reached a certain skill level, then he should be able to answer all questions at that skill level and all easier questions. Considering that students sometimes miss questions that they should know or may guess the right answer, a probability of a slip $s$, e.g., 0.1, and a probability of a lucky guess $g$, e.g., 0.2, are used in the conditional probabilities for correct answers to questions of increasing difficulty. By using these two probabilities, a simple way to set the conditional probabilities for 5 skill levels is as follows:
| Question Categories | Novice | Beginning | Intermediate | Advanced | Expert |
|---|---|---|---|---|---|
| Beginning | $g$ | $1-s$ | $1-s$ | $1-s$ | $1-s$ |
| Intermediate | $g$ | $g$ | $1-s$ | $1-s$ | $1-s$ |
| Advanced | $g$ | $g$ | $g$ | $1-s$ | $1-s$ |
| Expert | $g$ | $g$ | $g$ | $g$ | $1-s$ |
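The table above can be written down as a small lookup structure. The sketch below assumes a slip probability of 0.1 and a lucky-guess probability of 0.2, the example values given in the text; the variable names are my own:

```python
# Sketch of the conditional probability table for correct answers,
# assuming slip s = 0.1 and lucky guess g = 0.2 as in the text.
s, g = 0.1, 0.2
levels = ["novice", "beginning", "intermediate", "advanced", "expert"]
categories = levels[1:]  # one question category per non-novice level

# P(correct | category, skill level): 1 - s once the student's level has
# reached the category's difficulty, g (a lucky guess) below it.
p_correct = {
    cat: {lvl: (1 - s) if levels.index(lvl) >= levels.index(cat) else g
          for lvl in levels}
    for cat in categories
}

print(p_correct["intermediate"]["beginning"])  # 0.2, guessing above one's level
print(p_correct["intermediate"]["advanced"])   # 0.9, a slip is still possible
```

Because every entry is either $g$ or $1-s$, the whole table is determined by two parameters, which is what keeps the model cheap to specify and maintain.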
Based on this model, the probability distribution of the skill levels given performance data can be determined in linear time. By Bayes' theorem and the assumption that the performance data are independent, the conditional probability of the skill levels is as follows:
$$
P(L_i \mid Q_1, \ldots, Q_n)
  = \frac{P(Q_1, \ldots, Q_n \mid L_i)\, P(L_i)}{P(Q_1, \ldots, Q_n)}
  = \frac{P(L_i) \prod_{j=1}^{n} P(Q_j \mid L_i)}
         {\sum_{l=1}^{m} P(L_l) \prod_{j=1}^{n} P(Q_j \mid L_l)}
\tag{1}
$$
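Equation (1) admits a direct linear-time implementation: multiply the prior by each observed question's likelihood vector, then normalize once at the end. The function below is a sketch with assumed names and illustrative numbers, not code from IDEAL:

```python
# Linear-time belief update over skill levels, following Eq. (1):
# the posterior is proportional to the prior times the product of
# per-question likelihoods, normalized at the end.
def update_skill_belief(prior, likelihoods):
    """prior: P(L_i) for each skill level; likelihoods: one vector
    of P(Q_j | L_i) per observed question score."""
    post = list(prior)
    for lk in likelihoods:           # one pass per data item: linear time
        post = [p * l for p, l in zip(post, lk)]
    z = sum(post)                    # normalizing constant P(Q_1, ..., Q_n)
    return [p / z for p in post]

# Uniform prior over 5 levels; two correct answers to intermediate-level
# questions (likelihoods taken from the intermediate row of the table,
# with slip 0.1 and guess 0.2).
prior = [0.2] * 5
q_ok = [0.2, 0.2, 0.9, 0.9, 0.9]
belief = update_skill_belief(prior, [q_ok, q_ok])
print(belief)
```

After only two correct intermediate-level answers, the posterior mass shifts sharply toward the intermediate-and-above levels, and each additional question score costs one more constant-time pass.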
The advantages of this model are (1) questions can be added, dropped, or moved between categories with minimal overhead; (2) the model incorporates uncertainty and allows for both slips and guesses in student performance; (3) the time complexity is linear in the number of data items, whereas updating belief networks is in general NP-hard; and (4) only a small number of parameters are required. The restrictions of the model are that only binary-valued evidence is modeled and only one skill can be modeled at a time.