This page contains information about the course groups as well as information on the mandatory assignments and the exam project.
Place and time
The course runs from August 1 to August 23, 2022. See the timeplan for the teaching schedule.
- All lectures and lab/exercise classes on campus will take place in CSS 35-01-05.
Groups
There are two ways of being assigned to a group. The first is that you report a group to us in which all members consent. The second is that, if we have not heard from you, we will assign you to a group. We will communicate the groups to you on the first day. Details about the two options are found below.
- To pre-register a group, only one member should write an email to Jonas Skjold Raaschou-Pedersen stating this. The deadline for sending the email is Friday, July 29 at 12.00. The email only needs to say that all members have consented. To be valid, all group members must be recipients of the email, using their KU email addresses. Groups of only two people have the option of joining another group of two new people, as this course has a broad group of students. The groups will be used for handing in Assignments 1 and 2 and the final exam project, see below. If there are too few people in your group, we will reassign you to another group.
- If you do not have a pre-registered group, or if you are two in a pre-registered group and explicitly note in the email that you are looking for more members, we will randomly assign you to other students who performed similarly in Assignment 0, see below.
Mandatory assignments
There are three mandatory assignments. Two of the three must be passed in order to be eligible for the exam. The first must be handed in individually. The other two must be handed in as groups, where each group hands in a single shared copy.
- Assignment 0: basic Python, packages and data processing
- Assignment 1: collecting data and structuring data.
- Assignment 2: machine learning and text data.
The assignments will be available in Absalon and on the assignment page.
Exam project
At the end of the course your group must hand in an independent exam project.
The content of the exam project is something that you choose. You and your group must find a subject and data, choose methods, etc.
Grading
The grade for this course is determined exclusively by the project handed in. The project will be judged on a number of dimensions, including:
- how the data was obtained (setting up new data collection);
- how the data was processed;
- how machine learning methods are applied and which methods are used;
- how results are explained (writing, figures, tables with model output etc.);
- the research question and its originality as well as how it is answered.
Some advice about the grading: it is essential that you spend time on motivating your project and conveying your results. In addition, it is important that you spend time on calibrating and validating the models you work with rather than using as many models as possible. We emphasize that using machine learning is NOT necessary to make a great project; many of the best projects gain insights from the data without modelling.
Requirements for project
The exam projects have a number of requirements that must be met. These are:
- Research question (you should discuss with TAs)
- Groups with two to four members
- Project formalia
- The project must consist of a report (.pdf file) and documentation as a Jupyter Notebook (.ipynb file).
- The report should be written like a brief research article (short literature review, references to methods, results etc.). The report is limited to the following maximum number of pages (normalsider).
- 1 member, 9 pages;
- 2 members, 12 pages;
- 3 members, 15 pages;
- 4 members, 18 pages.
- Note that 1 page (normalside) corresponds to 2,400 characters and that the count does not include figures, abstract, list of references, front page, or appendix.
- The report should contain your names. The names MUST show who contributed by writing which parts of the report. At most 20 pct. of the report can be written jointly. If you fail to provide this, your project submission may be rejected!
- Grading will be based on the report, but the process, data collection, computations etc. should be well documented in the supporting Jupyter Notebook.
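The page limits listed above can be turned into rough character budgets. The following is a minimal sketch, not an official tool: it assumes that one normalside is 2,400 characters, and the names `MAX_PAGES`, `CHARS_PER_PAGE`, `max_characters` and `pages_used` are hypothetical helpers made up for illustration. Whether spaces count towards the limit is not stated on this page, so ask the teachers if in doubt.

```python
# Page limits per group size, as listed above (in normalsider).
MAX_PAGES = {1: 9, 2: 12, 3: 15, 4: 18}

# Assumption: one normalside corresponds to 2,400 characters.
CHARS_PER_PAGE = 2400


def max_characters(group_size: int) -> int:
    """Return the character budget for a group of the given size."""
    if group_size not in MAX_PAGES:
        raise ValueError(f"No page limit listed for a group of {group_size}")
    return MAX_PAGES[group_size] * CHARS_PER_PAGE


def pages_used(body_text: str) -> float:
    """Convert the length of the counted body text into normalsider.

    Pass only the text that counts towards the limit, i.e. without
    abstract, reference list, front page and appendix.
    """
    return len(body_text) / CHARS_PER_PAGE


if __name__ == "__main__":
    for size in sorted(MAX_PAGES):
        print(f"{size} member(s): {max_characters(size):,} characters")
```

For example, a group of three has a budget of 15 × 2,400 = 36,000 characters under this assumption.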
Possible data sources
Students in previous years of Social Data Science have used a large variety of data sources including:
- news on DR (Danish Broadcasting Company) and the Danish newspaper Information
- prices of cars for sale on bilbasen
- analyzing linguistic content on Twitter
- Airbnb pricing in Copenhagen
- predicting bitcoin prices from Reddit data.
If you are interested in working with one or more of these datasets, or would like to see the projects by the students who made them, please contact us and we will put you in touch.