Next: Curriculum Sequencing Up: An Intelligent Distributed Environment Previous: Community Interaction

Student Modeling

Student modeling is crucial for an intelligent learning environment to adapt to the needs and knowledge of individual students [5,20]. Many techniques exist for generating student models. Most are computationally complex and expensive, for example, Bayesian networks [20,21,29], the Dempster-Shafer theory of evidence [4], and the fuzzy logic approach [14]. Other techniques, such as the model tracing approach [1], are computationally cheap but can only record what a student knows, not the student's behavior and characteristics.

The difficulties in applying Bayesian modeling are the high cost of knowledge acquisition and of updating the student model: inference in Bayesian belief networks is NP-hard, and the model requires prior probabilities. To develop a practical and efficient Bayesian method, we trade complexity of knowledge representation and depth of modeling for linear-time belief updating and a small number of model parameters.

In IDEAL, a student model is inferred from the performance data using a Bayesian belief network. The measure of how well a skill is learned is represented as a probability distribution over skill levels, such as novice, beginning, intermediate, advanced, and expert. Under the assumption that the performance on questions is independently distributed, to model one skill with $n$ skill levels and $q$ questions for each in a Bayesian network, we need $nq$ probabilities plus the $n$ prior probabilities of the skill levels to calculate the probability distribution of skill levels given all the question scores. To model $k$ skills with the same skill levels for each, we need $knq$ probabilities, which is too large for non-trivial real-world applications.

To reduce the number of probabilities required and improve the efficiency of the algorithm in IDEAL, questions of similar difficulty are grouped into categories associated with the conditional probabilities of answering each set of questions correctly to the possible skill levels. Now, only $knc$ probabilities are required, where $c$ is the number of categories.
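To make the parameter reduction concrete, here is a small worked example with hypothetical sizes (the values of $k$, $n$, and $q$ are illustrative, not taken from IDEAL):

```python
# Hypothetical sizes: k skills, n skill levels, q questions per level,
# and c question categories after grouping by difficulty.
k, n, q, c = 10, 5, 20, 5

naive = k * n * q    # one conditional probability per (skill, level, question)
grouped = k * n * c  # after grouping questions into c difficulty categories

print(naive, grouped)
```

With these sizes, grouping cuts the number of conditional probabilities from 1000 to 250, before the further reduction described below.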

The number of probabilities is further reduced by matching the question categories to the skill levels: for $n+1$ skill levels, only $n$ question categories are required. If a student has reached a certain skill level, then the student should be able to answer all questions at that level and all easier ones. Because students sometimes slip on questions they should know, or guess the right answer to questions they do not, a slip probability $s$, e.g., 0.1, and a lucky-guess probability $g$, e.g., 0.2, are used in the conditional probabilities for correct answers to questions of increasing difficulty. Using these two probabilities, a simple way to set the conditional probabilities for 5 skill levels is as follows:

Question                         Skill Level
Category      Novice   Beginning   Intermediate   Advanced   Expert
Beginning      $g$      $1-s$       $1-s$          $1-s$      $1-s$
Intermediate   $g$      $g$         $1-s$          $1-s$      $1-s$
Advanced       $g$      $g$         $g$            $1-s$      $1-s$
Expert         $g$      $g$         $g$            $g$        $1-s$
Now, the total number of probabilities required is reduced to the prior probabilities for the skill levels plus the probabilities $s$ and $g$.
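The table above can be generated mechanically from $s$ and $g$ alone. The sketch below is illustrative (the function name and representation are assumptions, not the IDEAL implementation); skill level 0 is novice, and levels $1 \ldots n$ correspond to the $n$ question categories of increasing difficulty.

```python
def build_cpt(n_categories, s=0.1, g=0.2):
    """P(correct answer | question category, skill level).

    cpt[c-1][j] is the probability that a student at skill level j
    (0 = novice, ..., n = expert) answers a category-c question
    correctly: 1-s once the student has reached that category's
    level (a slip is still possible), g (a lucky guess) below it.
    """
    return [[(1 - s) if j >= c else g
             for j in range(n_categories + 1)]    # skill levels 0..n
            for c in range(1, n_categories + 1)]  # categories 1..n
```

For 4 categories and 5 skill levels, `build_cpt(4)` reproduces the table: e.g., a novice answers a beginning question correctly with probability $g$, while an expert answers an expert question correctly with probability $1-s$.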

Based on this model, the probability distribution of the skill levels given the performance data can be determined in linear time. By Bayes' theorem and the assumption that the performance data are independent, the conditional probability of the skill levels is as follows:

\begin{eqnarray*}
p(X = x_j \vert \vec{e})
  &=& \frac{1}{p(\vec{e})} \; p(X = x_j) \; p(\vec{e} \vert X = x_j) \\
  &=& \frac{1}{p(\vec{e})} \; p(X = x_j) \prod_{i=1}^{n} p(e_i \vert X = x_j) \\
  &=& \frac{1}{p(\vec{e})} \; p(X = x_j) \;
      (1-s)^{\sum_{i=1}^{j} e_{i+}} \; s^{\sum_{i=1}^{j} e_{i-}} \;
      g^{\sum_{i=j+1}^{n} e_{i+}} \; (1-g)^{\sum_{i=j+1}^{n} e_{i-}} \qquad (1)
\end{eqnarray*}

where $X$ is the random variable representing the student's skill level, and $\vec{e}$ is the evidence vector of $n$ elements, in which each element $e_i$ consists of two counts, $e_{i+}$ and $e_{i-}$, the numbers of correct and incorrect answers, respectively, to questions at difficulty level $i$.
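Equation (1) can be evaluated in a single pass over the evidence. The sketch below is a minimal illustration of this computation (function and parameter names are assumptions, not IDEAL's code); the normalization by the sum of the unnormalized posteriors plays the role of the $1/p(\vec{e})$ factor.

```python
def skill_posterior(prior, evidence, s=0.1, g=0.2):
    """Posterior over n+1 skill levels per Eq. (1).

    prior    : list of n+1 prior probabilities p(X = x_j), j = 0..n
    evidence : list of n (correct, incorrect) count pairs, one per
               question category i = 1..n
    Returns the normalized posterior distribution; runs in time
    linear in the amount of evidence.
    """
    post = []
    for j in range(len(prior)):                   # skill level x_j
        p = prior[j]
        for i, (right, wrong) in enumerate(evidence, start=1):
            if i <= j:   # categories at or below the student's level
                p *= (1 - s) ** right * s ** wrong
            else:        # harder categories: lucky guesses
                p *= g ** right * (1 - g) ** wrong
        post.append(p)
    z = sum(post)                                 # stands in for p(e)
    return [p / z for p in post]
```

For example, with a uniform prior over 5 levels, a student who answers beginning questions perfectly, misses one intermediate question, and fails all advanced and expert questions is most probably at the intermediate level.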

The advantages of this model are (1) questions can be added, dropped, or moved between categories with minimal overhead; (2) the model incorporates uncertainty and allows for both slips and guesses in student performance; (3) the time complexity is linear in the number of data items, whereas updating belief networks is in general NP-hard; and (4) only a small number of parameters are required. The restrictions of the model are that only binary-valued evidence is modeled and only one skill can be modeled at a time.


2001-02-13