EdX discussion data is stored as collections of JSON documents in a MongoDB database. MongoDB is a document-oriented NoSQL database system. Documentation is available on the MongoDB website.
In the data package, discussion data is delivered in a .mongo file, identified
by organization and course, in the format
{org}-{course}-{run}-{site}.mongo.
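As an illustration of the naming pattern, the following Python sketch splits a file name into its four parts. It assumes that none of the parts contains a hyphen, which the source does not guarantee. The function name `parse_mongo_filename`, the `MongoFileName` type, and the example file name (including the `prod` site value) are hypothetical.

```python
from typing import NamedTuple


class MongoFileName(NamedTuple):
    """The four parts of a {org}-{course}-{run}-{site}.mongo file name."""
    org: str
    course: str
    run: str
    site: str


def parse_mongo_filename(filename: str) -> MongoFileName:
    """Split a discussion data file name into org, course, run, and site.

    Assumption: no part contains a hyphen. If one does, the split is
    ambiguous and this sketch raises ValueError instead of guessing.
    """
    suffix = ".mongo"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .mongo file: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated parts: {filename!r}")
    return MongoFileName(*parts)


# Hypothetical file name, used only to exercise the function.
parsed = parse_mongo_filename("edX-DemoX-Demo_Course-prod.mongo")
print(parsed)
```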
The primary collection that holds all of the discussion posts written by users is “contents”. Two different types of objects are stored, representing the three levels of interactions that users can have in a discussion.
- CommentThread represents the first level of interaction: a post that
  opens a new thread, often a student question of some sort.
- Comment represents both the second and third levels of interaction: a
  response made directly to the conversation started by a CommentThread is
  a Comment. Any further contributions made to a specific response are also
  in Comment objects.

A sample of the field/value pairs that are in the .mongo file, and descriptions of the attributes that these two types of objects share and that are specific to each type, follow.
In addition to these collections, events are also emitted to track specific user activities. For more information, see Discussion Forum Events.
Two sample rows, or JSON documents, from a .mongo file of discussion data
follow.
The JSON documents that include discussion data are delivered in a compact, machine-readable format that can be difficult to read at a glance.
{ "_id" : { "$oid" : "50f1dd4ae05f6d2600000001" }, "_type" : "CommentThread",
"anonymous" :false, "anonymous_to_peers" : false, "at_position_list" : [],
"author_id" : "NNNNNNN","author_username" : "AAAAAAAAAA", "body" : "Welcome to
the edX101 forum!\n\nThis forum will be regularly monitored by edX. Please post
your questions and comments here. When asking a question, don't forget to
search the forum to check whether your question has already
been answered.\n\n", "closed" : false, "comment_count" : 0, "commentable_id" :
"i4x-edX-edX101-course-How_to_Create_an_edX_Course", "course_id" :
"edX/edX101/How_to_Create_an_edX_Course","created_at" : { "$date" :
1358028106904 }, "last_activity_at" : { "$date" : 1358134464424 },"tags_array"
: [], "thread_type": "discussion", "title" : "Welcome to the edX101 forum!",
"updated_at" : { "$date" :1358134453862 }, "votes" : { "count" : 1, "down" :
[], "down_count" : 0, "point" : 1, "up" :[ "48" ], "up_count" : 1 } }
If you use a JSON formatter to “pretty print” this document, a version that is more readable is produced.
{
"_id": {
"$oid": "50f1dd4ae05f6d2600000001"
},
"_type": "CommentThread",
"anonymous": false,
"anonymous_to_peers": false,
"at_position_list": [
],
"author_id": "NNNNNNN",
"author_username": "AAAAAAAAAA",
"body": "Welcome to the edX101 forum!\n\nThis forum will be regularly
monitored by edX. Please post your questions and comments here. When
asking a question, don't forget to search the forum to check whether
your question has already been answered.\n\n",
"closed": false,
"comment_count": 0,
"commentable_id": "i4x-edX-edX101-course-How_to_Create_an_edX_Course",
"course_id": "edX\/edX101\/How_to_Create_an_edX_Course",
"created_at": {
"$date": 1358028106904
},
"last_activity_at": {
"$date": 1358134464424
},
"tags_array": [
],
"thread_type": "discussion",
"title": "Welcome to the edX101 forum!",
"updated_at": {
"$date": 1358134453862
},
"votes": {
"count": 1,
"down": [
],
"down_count": 0,
"point": 1,
"up": [
"48"
],
"up_count": 1
}
}
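One way to produce a readable version like the one above is Python's standard `json` module. This is a minimal sketch, not the tool used to generate the sample: the `compact` string is a shortened stand-in for a real document, `pretty_print` is a hypothetical helper name, and details of the output (for example, empty arrays printed as `[]` and `/` left unescaped) differ slightly from the sample.

```python
import json

# Shortened stand-in for one compact document from a .mongo file.
compact = (
    '{ "_id" : { "$oid" : "50f1dd4ae05f6d2600000001" }, '
    '"_type" : "CommentThread", "closed" : false, "comment_count" : 0, '
    '"tags_array" : [], "title" : "Welcome to the edX101 forum!" }'
)


def pretty_print(document_text: str) -> str:
    """Parse one JSON document and re-serialize it with 2-space indentation."""
    return json.dumps(json.loads(document_text), indent=2, ensure_ascii=False)


pretty = pretty_print(compact)
print(pretty)
```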
{ "_id" : { "$oid" : "52e54fdd801eb74c33000070" }, "votes" : { "up" : [],
"down" : [], "up_count" : 0, "down_count" : 0, "count" : 0, "point" : 0 },
"visible" : true, "abuse_flaggers" : [], "historical_abuse_flaggers" : [],
"parent_ids" : [], "at_position_list" : [], "body" : "I'm hoping this
Demonstration course will help me figure out how to take the course I enrolled
in. I am just auditing the course, but I want to benefit from it as much as
possible, as I am extremely interested in it.\n", "course_id" :
"edX/DemoX/Demo_Course", "_type" : "Comment", "endorsed" : true, "endorsement"
: { "user_id" : "9", "time" : ISODate("2014-08-29T15:11:49.442Z") },
"anonymous" : false, "anonymous_to_peers" : false, "author_id" : "NNNNNNN",
"comment_thread_id" : { "$oid" : "52e4e880c0df1fa59600004d" },
"author_username" : "AAAAAAAAAA", "sk" : "52e54fdd801eb74c33000070",
"updated_at" : { "$date" : 1390759901966 }, "created_at" : { "$date" :
1390759901966 } }
When pretty printed, this comment has the following format.
{
"_id": {
"$oid": "52e54fdd801eb74c33000070"
},
"votes": {
"up": [
],
"down": [
],
"up_count": 0,
"down_count": 0,
"count": 0,
"point": 0
},
"visible": true,
"abuse_flaggers": [
],
"historical_abuse_flaggers": [
],
"parent_ids": [
],
"at_position_list": [
],
"body": "I'm hoping this Demonstration course will help me figure out how
to take the course I enrolled in. I am just auditing the course, but I
want to benefit from it as much as possible, as I am extremely interested
in it.\n",
"course_id": "edX\/DemoX\/Demo_Course",
"_type": "Comment",
"endorsed": true,
"endorsement": {
"user_id": "9",
"time": {
"$date": 1390759911966
}
},
"anonymous": false,
"anonymous_to_peers": false,
"author_id": "NNNNNNN",
"comment_thread_id": {
"$oid": "52e4e880c0df1fa59600004d"
},
"author_username": "AAAAAAAAAA",
"sk": "52e54fdd801eb74c33000070",
"updated_at": {
"$date": 1390759901966
},
"created_at": {
"$date": 1390759901966
}
}
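The two samples show that a single file mixes both object types, distinguished by the `_type` field. The sketch below groups documents by that field. It assumes one JSON document per line, which is not stated in the source. It also assumes strict JSON: a value written in shell style, such as the `ISODate(...)` value in the compact comment sample, does not parse with a strict JSON parser, so such lines are recorded and skipped rather than repaired. The helper name `load_documents` is hypothetical.

```python
import io
import json
from collections import defaultdict


def load_documents(lines):
    """Group parsed documents by their _type value.

    Returns (documents_by_type, skipped_line_numbers). Lines that are not
    strict JSON are skipped and their 1-based line numbers are reported.
    """
    by_type = defaultdict(list)
    skipped = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            document = json.loads(line)
        except json.JSONDecodeError:
            skipped.append(line_number)
            continue
        by_type[document.get("_type", "unknown")].append(document)
    return dict(by_type), skipped


# In-memory stand-in for a .mongo file; the third line is not strict JSON.
sample_file = io.StringIO(
    '{"_id": {"$oid": "50f1dd4ae05f6d2600000001"}, "_type": "CommentThread"}\n'
    '{"_id": {"$oid": "52e54fdd801eb74c33000070"}, "_type": "Comment"}\n'
    '{"_type": "Comment", "time": ISODate("2014-08-29T15:11:49.442Z")}\n'
)
by_type, skipped = load_documents(sample_file)
print({name: len(docs) for name, docs in by_type.items()}, skipped)
```

With a real file, an open file object can be passed in place of `sample_file`.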
The following fields are specific to CommentThread objects. Each thread in
the discussion forums is represented by one CommentThread.
If true, this thread was closed by a discussion forum moderator or admin.
The number of comment replies in this thread. This includes all responses and replies, but does not include the original post that started the thread. In this example, the comment_count for the initial CommentThread is 4.
- CommentThread: “What’s a good breakfast?”
  - Comment: “Just eat cereal!”
  - Comment: “Try a Loco Moco, it’s amazing!”
    - Comment: “A Loco Moco? Only if you want a heart attack!”
    - Comment: “But it’s worth it! Just get a spam musubi on the side.”
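The count in this example can be reproduced by tallying Comment documents per thread. The sketch below models the breakfast thread with simplified documents: the string identifiers (`t1`, `c1`, and so on) are invented for illustration, whereas real documents store identifiers as `{"$oid": ...}` objects. It assumes every Comment, at either level, carries the `comment_thread_id` of its thread.

```python
from collections import Counter

# Simplified stand-ins for the example thread; identifiers are hypothetical.
comments = [
    {"_id": "c1", "_type": "Comment", "comment_thread_id": "t1",
     "parent_ids": [], "body": "Just eat cereal!"},
    {"_id": "c2", "_type": "Comment", "comment_thread_id": "t1",
     "parent_ids": [], "body": "Try a Loco Moco, it's amazing!"},
    {"_id": "c3", "_type": "Comment", "comment_thread_id": "t1",
     "parent_ids": ["c2"], "body": "A Loco Moco? Only if you want a heart attack!"},
    {"_id": "c4", "_type": "Comment", "comment_thread_id": "t1",
     "parent_ids": ["c2"], "body": "But it's worth it! Just get a spam musubi on the side."},
]


def count_comments(comment_documents):
    """Count Comment documents per thread; the original post is not counted."""
    return Counter(doc["comment_thread_id"] for doc in comment_documents)


counts = count_comments(comments)
print(counts["t1"])
```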
A course team can attach a discussion to any piece of content in the course, or to top level categories like “General” and “Troubleshooting”. When the discussion is a top level category it is specified in the course’s policy file, and the commentable_id uses the format i4x-{org}-{course}-{run}-{name}. When the discussion is a specific component in the course, the commentable_id identifies that component; for example, “d9f970a42067413cbb633f81cfb12604”.
Timestamp in UTC indicating the last time there was activity in the thread (new posts, edits, and so on). Closing the thread does not affect the value in this field.
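In the sample documents, timestamps such as last_activity_at appear as `{"$date": <integer>}`. The sketch below converts such a value to a timezone-aware datetime, assuming the integer is milliseconds since the Unix epoch (the usual MongoDB extended JSON convention, not stated explicitly in this document). The helper name `mongo_date_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def mongo_date_to_utc(value: dict) -> datetime:
    """Convert {"$date": milliseconds_since_epoch} to a UTC datetime.

    Whole seconds and leftover milliseconds are handled separately so that
    no floating-point rounding is involved.
    """
    seconds, milliseconds = divmod(value["$date"], 1000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole + timedelta(milliseconds=milliseconds)


# Value taken from the last_activity_at field of the sample CommentThread.
last_activity = mongo_date_to_utc({"$date": 1358134464424})
print(last_activity.isoformat())
```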
No longer used.
History: Intended to be a list of user-definable tags.
Title of the thread. UTF-8 string.
Identifies the type of post as a “question” or “discussion”.
History: Added 4 Sep 2014.
The following fields are specific to Comment objects. A Comment is
either a response to a CommentThread (such as an answer to the question),
or a reply to another Comment (a comment about somebody’s answer).
History: In earlier versions of the edX platform, Comment replies could
nest much more deeply. However, edX later restricted participation to three
levels (post, response, comment), similar to the practice on Stack Overflow.
Not used.
Records the user ID of each user who selects the Report Misuse flag for a Comment in the user interface. Stores an array of user IDs if more than one user flags the Comment. This is empty if no users flag the Comment.
If a discussion moderator removes the Report Misuse flag from a Comment, all user IDs are removed from the abuse_flaggers field and then written to this field.
Boolean value. True if a forum moderator has marked this response to a CommentThread with a thread_type of “discussion” as a valuable contribution, or if a forum moderator or the originator of a CommentThread with a thread_type of “question” has marked this response as the correct answer.
The endorsed field is present for comments that are made as replies to responses, but in these cases the value is always false: the user interface does not offer a way to endorse comments.
Contains time and user_id fields for the date and time that this response to a post was endorsed and the numeric user ID (from auth_user.id) of the person who endorsed it.
History: Added 4 Sep 2014.
Identifies the CommentThread that the Comment is a part of.
Applies only to comments made to a response. In the example given for comment_count above, “A Loco Moco? Only if you want a heart attack!” is a comment that was made to the response, “Try a Loco Moco, it’s amazing!”
The parent_id is the _id of the response-level Comment that this Comment is a reply to. Note that this field is only present in a Comment that is a reply to another Comment; it does not appear in a Comment that is a reply to a CommentThread.
The parent_ids field appears in all Comment objects, and contains the _id of all ancestor comments. Because the UI now prevents comments from being nested more than one layer deep, it contains at most one element. If a Comment has no parent, it is an empty list.
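Because every Comment carries this list, it offers a simple way to tell the second and third levels of interaction apart. The following sketch is one possible reading of the rules above, using the level names post, response, and comment. The function name `interaction_level` and the shortened documents are hypothetical, and the identifier in the third document is invented for illustration.

```python
def interaction_level(document: dict) -> str:
    """Classify a discussion document as a post, a response, or a comment.

    - CommentThread                      -> "post"     (first level)
    - Comment with an empty parent_ids   -> "response" (second level)
    - Comment with a non-empty list      -> "comment"  (third level)
    """
    doc_type = document.get("_type")
    if doc_type == "CommentThread":
        return "post"
    if doc_type != "Comment":
        raise ValueError(f"unexpected _type: {doc_type!r}")
    return "comment" if document.get("parent_ids") else "response"


# Shortened stand-ins for real documents.
thread = {"_type": "CommentThread", "title": "Welcome to the edX101 forum!"}
response = {"_type": "Comment", "parent_ids": []}
reply = {"_type": "Comment", "parent_ids": [{"$oid": "hypothetical-parent-id"}]}

levels = [interaction_level(doc) for doc in (thread, response, reply)]
print(levels)
```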
A randomly generated number that drives a sorted index to improve online performance.