Monday 17 – Friday 21 February 2019, 14:00 – 17:30 (finishing slightly earlier on Friday)
15 hours over five days
This course provides a foundation in Python programming, with a particular focus on its application to online social media data.
Python is one of the most popular and versatile scripting languages, and has a large user community. It has become increasingly popular among social scientists because of its attractive features: it is easy to learn, flexible in handling massive datasets, and allows fast calculation.
A large set of libraries helps users solve complex problems – making Python particularly attractive for those who need to handle massive amounts of diversely structured data collected online.
The course covers:
The course involves hands-on exercises in collecting, managing, and analysing data using Python.
Tasks for ECTS Credits
2 credits (pass/fail grade): Attend at least 90% of the course, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded): As above, plus complete small daily programming tasks. These must be submitted before class starts the following morning.
4 credits (to be graded): As above, plus one of the tasks below, due within a week of the end of the course:
A: conduct a small independent project, collecting and analysing social media data, and summarising the results in a short paper.
B: solve a programming task.
Taehee Kim is a postdoctoral researcher at Carl von Ossietzky University in Oldenburg.
Her research interests include political behaviour, computational social science methods, network analysis, and Japanese politics.
Nowadays, very large and diverse kinds of data are becoming available to researchers. Online social media data in particular has great potential to provide a new approach to social science questions.
However, these kinds of data have diverse structures, which often differ from those of traditional social science data. Some are provided in a structured form through application programming interfaces (APIs), e.g. Twitter and Facebook; others are semi-structured, such as web pages. Moreover, these data can come in different formats, and the volume available is much larger than before.
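To make the contrast concrete, here is a minimal sketch using only the standard library. The JSON string imitates the kind of record an API returns; its field names (`id`, `text`, `user`) are illustrative assumptions, not the schema of any particular platform. The HTML snippet stands in for a web page, where the same information has to be located by tag and class rather than by key.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: structured, so fields are accessed directly by key.
api_response = '{"id": 1, "text": "Hello #python", "user": {"name": "alice"}}'


def parse_api_record(raw):
    """Return (user name, text) from a JSON record with an assumed schema."""
    record = json.loads(raw)
    return record["user"]["name"], record["text"]


# Hypothetical web page: semi-structured, so content must be found by tag/class.
page = '<div class="post"><span class="author">alice</span><p>Hello #python</p></div>'


class PostParser(HTMLParser):
    """Collect the author name and paragraph text from markup shaped like `page`."""

    def __init__(self):
        super().__init__()
        self.author = None
        self.text = None
        self._current = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "author") in attrs:
            self._current = "author"
        elif tag == "p":
            self._current = "text"

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current == "author":
            self.author = data
        elif self._current == "text":
            self.text = data


def parse_page(html):
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.author, parser.text


print(parse_api_record(api_response))  # ('alice', 'Hello #python')
print(parse_page(page))                # ('alice', 'Hello #python')
```

Both functions return the same information, but the HTML version depends on the page layout and breaks if the markup changes, which is one reason API data is usually easier to manage.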
Although widely used software packages such as R, STATA, and Matlab are practical for statistical analysis, they are of only limited use for gathering, transforming, managing, and analysing new, massive and diversely structured types of data.
As an alternative to these packages, many scholars have begun to use Python, a popular, versatile scripting language with a large user community. Python has become popular because it is open source, easy to learn (even for beginners), and allows researchers to handle massive datasets quickly.
A large, rapidly expanding set of libraries helps users solve complex problems with ease. These include TensorFlow and Keras, recently developed libraries for deep learning.
Familiarity with the Python language opens up new possibilities for conducting your research more efficiently.
The course covers:
The course will include concrete examples of how to collect, manage and analyse social media data, especially Twitter data.
Course structure
First, I introduce the basic concepts of programming. You will learn about data types, operators, conditions, loops, functions, data structures, and object-oriented programming.
Then you will learn how to implement these concepts in Python. I will set programming tasks for you to solve, to teach you how to program efficiently.
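As an illustration of how these concepts look in Python, here is a short sketch that combines a class, a function, a list and a dictionary, and a loop with a condition. The `Post` class and its fields are hypothetical, chosen only to echo the social media theme of the course.

```python
class Post:
    """A minimal object holding one social media post (hypothetical structure)."""

    def __init__(self, author, text, likes=0):
        self.author = author
        self.text = text
        self.likes = likes

    def is_popular(self, threshold=10):
        # A method: behaviour attached to the object's own data.
        return self.likes >= threshold


def count_by_author(posts):
    """Return a dictionary mapping each author to their number of posts."""
    counts = {}
    for post in posts:                  # loop over a list
        if post.author in counts:       # condition
            counts[post.author] += 1
        else:
            counts[post.author] = 1
    return counts


posts = [
    Post("alice", "First post", likes=12),
    Post("bob", "Hello world", likes=3),
    Post("alice", "Another one", likes=25),
]

print(count_by_author(posts))                     # {'alice': 2, 'bob': 1}
print([p.text for p in posts if p.is_popular()])  # ['First post', 'Another one']
```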
After the basics of programming, I will introduce a couple of methods for obtaining social media data, such as scraping web pages and using APIs, in particular Twitter’s. I will also introduce useful Python libraries for data collection, such as urllib and Beautiful Soup.
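A minimal sketch of the scraping workflow: `urllib.request` downloads a page and a parser pulls out the links. The course itself uses Beautiful Soup for the parsing step; the standard library's `html.parser` is swapped in here only so the example runs without extra installs. The `User-Agent` string is a hypothetical placeholder.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and decode it as text (falls back to UTF-8 if no charset is sent)."""
    request = Request(url, headers={"User-Agent": "course-example/0.1"})  # placeholder agent
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Usage would look like `extract_links(fetch("https://example.com"))`. With Beautiful Soup, the parsing step shrinks to roughly `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`.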
I will demonstrate several analytical methods for text data:
You will learn basic regular expressions for handling text data, and Python libraries for analysis such as NumPy, pandas, NLTK, and scikit-learn.
I will ask you to submit a small programming assignment every day. Each task will relate directly to that day’s content and should take no more than one or two hours.
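As a flavour of the regular-expression work, the sketch below pulls hashtags and user mentions out of tweet-like strings and counts word frequencies with `collections.Counter`. The patterns are simplified assumptions rather than Twitter's exact rules, and the example tweets are invented.

```python
import re
from collections import Counter

# Simplified patterns (assumptions, not Twitter's official entity rules).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[a-z']+")  # lowercase ASCII letters and apostrophes only


def extract_entities(tweet):
    """Return the hashtags and mentions in a tweet, without the # and @ signs."""
    return {"hashtags": HASHTAG.findall(tweet), "mentions": MENTION.findall(tweet)}


def word_counts(tweets, stopwords=frozenset()):
    """Count words across tweets after removing URLs, hashtags and mentions."""
    counts = Counter()
    for tweet in tweets:
        cleaned = URL.sub(" ", tweet.lower())  # drop URLs first so '#' in links is ignored
        cleaned = HASHTAG.sub(" ", cleaned)
        cleaned = MENTION.sub(" ", cleaned)
        counts.update(w for w in WORD.findall(cleaned) if w not in stopwords)
    return counts


tweets = [
    "Learning #Python for data analysis with @alice https://example.org",
    "#Python makes data analysis fun",
]
print(extract_entities(tweets[0]))  # {'hashtags': ['Python'], 'mentions': ['alice']}
print(word_counts(tweets, stopwords={"for", "with"}).most_common(2))
```

Libraries such as NLTK provide more robust tokenisers and stop-word lists; the hand-written patterns here only show the underlying idea.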
Required literature
Lubanovic, Bill. 2014. Introducing Python: Modern Computing in Simple Packages. O’Reilly Media.
Mitchell, Ryan. 2015. Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media.
Installation and setup
Please install and configure Python and PyCharm on your laptop before the course starts, using these step-by-step instructions.
Experience in other languages
You should have some experience with basic programming/data analysis in other languages, e.g. R, Matlab, STATA. In other words, you should be able to write basic code in one of these languages: e.g. assigning a value to a variable, writing a for loop, or writing an if condition.
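For calibration, this is roughly the level meant, written here in Python: if you can write the equivalent of the following in R, Matlab or STATA, you meet the requirement. The data values are invented.

```python
# Assign values to variables
ages = [23, 35, 41, 19, 52]
threshold = 30

# Loop over the values and use a condition to count those above the threshold
count_above = 0
for age in ages:
    if age > threshold:
        count_above = count_above + 1

print(count_above)  # 3
```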
If you do not fulfil the above requirements
If you have problems following the installation instructions, or do not have experience in other languages, take the course WA108, Basics of Programming in Python.
Day | Topic |
---|---|
Monday | Introduction to Python and Basic Principles of Programming |
Tuesday | Programming in Python |
Wednesday | Collecting Online Data: Utilising APIs and Web Scraping |
Thursday | Analysing Data: Basic Statistics, Visualisation |
Friday | Analysing Data: Text Analysis and Machine Learning |
Day | Readings |
---|---|
Monday | Materials distributed during the course; Lubanovic (2014), ch. 1–2 |
Tuesday | Materials distributed during the course; Lubanovic (2014), ch. 3–7 |
Wednesday | Materials distributed during the course; Mitchell (2015), ch. 1–4 |
Thursday | Materials distributed during the course |
Friday | Materials distributed during the course |
Please prepare the following free, open-source environments on your laptop using these step-by-step instructions.
Python 3: version > 3.5.
Among several possibilities, I recommend using Anaconda to install Python.
We will use PyCharm (Community version) as a Python editor.
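Once everything is installed, a quick way to confirm the setup is to run a short script like the one below in PyCharm. It is a sketch, not part of the official instructions: it checks that the interpreter is newer than 3.5, as required above, and reports whether some of the libraries named in this description can be found. The lookup uses the libraries' usual import names (`numpy`, `pandas`, `nltk`, `sklearn` for scikit-learn, `bs4` for Beautiful Soup).

```python
import importlib.util
import sys

# Import names of libraries mentioned in the course description.
LIBRARIES = ["numpy", "pandas", "nltk", "sklearn", "bs4"]


def python_ok(version_info=sys.version_info):
    """True if the interpreter version is newer than Python 3.5."""
    return tuple(version_info[:2]) > (3, 5)


def library_status(names=LIBRARIES):
    """Map each import name to True if the package can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # str.format rather than f-strings, so the script still runs on an older interpreter
    print("Python {} {}".format(sys.version.split()[0], "OK" if python_ok() else "too old"))
    for name, found in library_status().items():
        print("{:10} {}".format(name, "found" if found else "missing"))
```

Anaconda ships most of these libraries by default, so a "missing" entry usually means PyCharm is pointing at a different interpreter.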
Please also apply for a Twitter developer account. Once you are approved, you can create Twitter apps, which you need in order to access the Twitter API. The review process can take anything from a couple of days to several weeks. If you are not approved by the time the course starts, I can give you access during the course. You will be given further instructions after you register.
Please bring your own laptop with Python and PyCharm installed, as described in the software section.
Recommended literature
Swaroop, C. H. 2013. A Byte of Python.
Raschka, Sebastian and Vahid Mirjalili. 2017. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. 2nd ed. Packt Publishing.
Jürgens, Pascal and Andreas Jungherr. 2016. A Tutorial for Using Twitter Data in the Social Sciences: Data Collection, Preparation, and Analysis.
Russell, Matthew A. 2013. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. 2nd ed. Sebastopol, CA: O’Reilly Media.
Guttag, John V. 2013. Introduction to Computation and Programming Using Python: Revised and Expanded Edition. The MIT Press.