-
I have collected some points about the currently available data that will need to be considered alongside schema requirements.
Regarding point 3, I have already identified two special cases the scraper needs to handle: sections that do not award an integer number of credits, and sections that have multiple discrete meeting schedules rather than a single consistent one. My current plans for these two cases are as follows:
Non-integer credit count:
Multiple meeting schedules:
Any suggestions on any of these points are welcome.
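As an illustration of the non-integer credit case, here is a minimal sketch of one possible approach. It assumes credit counts arrive as plain decimal text such as "1.5"; the function name `parse_credit_hours` is hypothetical and this is not necessarily the plan described in this comment.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_credit_hours(raw: str) -> Optional[Decimal]:
    """Parse a scraped credit-hours string into an exact decimal value.

    Returns None when the text is not a plain non-negative number, so the
    caller can decide how to store it (for example, as the raw string).
    """
    text = raw.strip()
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Decimal also accepts "NaN" and "Infinity"; reject those and negatives.
    if not value.is_finite() or value < 0:
        return None
    return value


whole = parse_credit_hours("3")
fractional = parse_credit_hours(" 1.5 ")
unparsed = parse_credit_hours("1-3")  # hypothetical range-style value
```

Using `Decimal` rather than `float` keeps a value like 1.5 exact, and returning `None` for anything unexpected avoids silently guessing at formats the scraper has not seen yet.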
-
It would be useful to know how data will be stored in the database given a schema. I am including a proposed Course schema and a proposed Section schema here. Note that this is the MongoDB representation (hence, for example, the ObjectId type). I will follow up with more detailed documentation once we have agreed on or modified these proposals. I have also included potential schemas for all component data types of the Course and Section schemas. I wrote this up just now, without much collaboration or oversight, so please critique and give feedback on my proposals.
Course = {
"_id": ObjectId,
"course_number": string,
"subject_prefix": string,
"title": string,
"description": string,
"school": string,
"credit_hours": string,
"class_level": string,
"activity_type": string,
"grading": string,
"internal_course_number": string,
"prerequisites": Collection,
"corequisites": Collection,
"lecture_contact_hours": string,
"laboratory_contact_hours": string,
"offering_frequency": string,
"attributes": Object,
}
Section = {
"_id": ObjectId,
"section_number": string,
"course_reference": ObjectId,
"section_corequisites": Collection,
"academic_session": AcademicSession,
"professors": Array<ObjectId>,
"teaching_assistants": Array<Assistant>,
"internal_class_number": string,
"instruction_mode": string,
"meetings": Array<Meeting>,
"syllabus_uri": string,
"grade_distribution": Array<number>,
"attributes": Object,
}
AcademicSession = {
"_id": ObjectId,
"name": string,
"start_date": Date,
"end_date": Date,
}
Assistant = {
"_id": ObjectId,
"first_name": string,
"last_name": string,
"role": string,
"email": string,
}
Meeting = {
"_id": ObjectId,
"start_date": Date,
"end_date": Date,
"meeting_days": Array<string>,
"start_time": Time,
"end_time": Time,
"modality": string,
"location": Location,
}
Professor = {
"_id": ObjectId,
"first_name": string,
"last_name": string,
"title": string,
"email": string,
"phone_number": string,
"office": Location,
"profile_uri": string,
"office_hours": Array<Meeting>,
}
Location = {
"_id": ObjectId,
"building": string,
"room": string,
"map_uri": string,
}
Date = {
"day": string,
"month": string,
"year": string,
}
Time = { // Not sure if UTC or CST
"_id": ObjectId,
"hour": number,
"minute": number,
"second": number,
}
Collection = {
"_id": ObjectId,
"name": string,
"abbreviation": string,
"type": string,
"requisite_type": string,
"required": number,
"total": number,
"options": Array<ObjectId | Collection>,
}
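To make the nested Collection type easier to discuss, here is a rough sketch of how a requisite tree might be evaluated. It assumes `required` means "at least this many of `options` must be satisfied", which is my reading of the proposal rather than something stated in it. The class is a simplified stand-in that uses plain strings in place of ObjectId references, and `is_satisfied` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Collection:
    """Simplified stand-in for the proposed Collection type."""
    name: str
    required: int  # assumed meaning: minimum number of options to satisfy
    options: List[Union[str, "Collection"]] = field(default_factory=list)


def is_satisfied(node: Union[str, Collection], completed: Set[str]) -> bool:
    """Return True if `completed` course ids satisfy this requisite node."""
    if isinstance(node, str):
        # Leaf: a single course reference (a string here, ObjectId in MongoDB).
        return node in completed
    # Nested collection: count how many options are themselves satisfied.
    met = sum(1 for option in node.options if is_satisfied(option, completed))
    return met >= node.required


# Made-up ids: course "A" AND (course "B" OR course "C").
prereqs = Collection(
    name="all of",
    required=2,
    options=["A", Collection(name="one of", required=1, options=["B", "C"])],
)
```

With this reading, "all of" is `required == len(options)` and "one of" is `required == 1`, so a separate AND/OR field may not be needed. If `total` is always `len(options)`, it could be derived rather than stored.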
-
Overall, I'm pleased with the data the scraper collects so far and believe it can support whatever schema we settle on. However, I have noticed a significant issue with the current Coursebook parsing: instructors and assistants are sometimes incorrectly listed multiple times. Here is a real example of such an error:
{
"Name": "Lin Jia",
"Role": "Primary Instructor (50%)",
"EMail": "[email protected]"
},
{
"Name": "Stefanie Boyd",
"Role": "Primary Instructor (50%)",
"EMail": "[email protected]"
},
{
"Name": "Stefanie Boyd",
"Role": "Primary Instructor (50%)",
}
I have already identified the cause of this error and plan to fix it shortly. I encourage everyone to skim the sample data linked at the top of this discussion and raise any comparable issues they notice so they can be resolved.
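Until the parser itself is fixed, a post-processing pass could collapse such duplicates. The sketch below is only a possible stopgap, not the actual fix mentioned above. It assumes two entries are the same person when their "Name" and "Role" match; `dedupe_people` is a hypothetical helper and the email addresses are placeholders.

```python
from typing import Dict, List, Tuple


def dedupe_people(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge entries that share the same Name and Role, keeping first-seen order."""
    merged: Dict[Tuple[str, str], Dict[str, str]] = {}
    for entry in entries:
        key = (entry.get("Name", ""), entry.get("Role", ""))
        if key in merged:
            # Keep the first occurrence, filling in fields it was missing.
            for field_name, value in entry.items():
                merged[key].setdefault(field_name, value)
        else:
            merged[key] = dict(entry)
    return list(merged.values())


sample = [
    {"Name": "Lin Jia", "Role": "Primary Instructor (50%)", "EMail": "a@example.edu"},
    {"Name": "Stefanie Boyd", "Role": "Primary Instructor (50%)", "EMail": "b@example.edu"},
    {"Name": "Stefanie Boyd", "Role": "Primary Instructor (50%)"},
]
cleaned = dedupe_people(sample)
```

Keying on both name and role means a person who legitimately holds two roles in one section would still appear twice, which seems preferable to dropping data.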
-
The purpose of this discussion is to promote conversation about API data types, based on the sample data collected by the Coursebook scraper so far. The exact format of this sample data is entirely subject to change based on what we decide here, so treat it simply as a sample of the kind of data the scraper makes available.
Sample data from the scraper is available here.