Intensive Social Media Weeks

One Week Intensive Course: May 25-29


room: 190

Monday 25.05

8:45 - tea, coffee
9:00-9:15     Opening
9:15-10:30   Lecture "Social Media Analysis: Introduction"
10:35-11:55 Lecture "Social Network Analysis: Introduction"
11:55-12:55 lunch
12:55-14:30 Lecture "Working with Community Question Answering Data: Tasks and Methods"
14:30-14:40 coffee break
14:40-16:00 Lecture "Cohesive Sub-communities and Their Identification" and "Measuring Centrality: Revealing the Relative Value of Node Position within a Network Structure"
16:00-16:15 coffee break
16:15-18:20 Workshop "Analysis and Interpretation of Online Social Ego-Networks"

Tuesday 26.05

8:45 - tea, coffee
9:00-10:20    Lecture "Graph-theoretic Network Formation Models: from Random Graphs to Power Laws"
10:25-11:45 Lecture "Introduction to natural language processing and DataMining" /nlp_itmo_may2015.pdf
11:45-12:45 lunch
12:45-14:05 Workshop "Introduction to natural language processing and DataMining" /ISMW-Workshop-NLPDM.pdf
14:05-14:20 coffee break
14:20-16:00 Group task
16:00-16:20 coffee break

16:30 cultural program

Wednesday 27.05

8:45 - tea, coffee
9:00-10:20   Lecture "Cascading Behavior in Social Networks"
10:25-11:45 Lecture "Social Influence and Influence Maximization Models"
11:45-12:45  lunch
12:45-14:05 Lecture "Social media data gathering" 
14:05-14:20 coffee break
14:20-15:40 Workshop "Social media data gathering" 
15:40-16:00 coffee break
16:00-18:00 Group task

Thursday 28.05

8:45 - tea, coffee
9:00-10:20  Lecture "Semantic technologies and Representation of the models in semantic databases"
10:25-11:45 Workshop  "Semantic technologies and Representation of the models in semantic databases"
11:45-12:45 lunch
12:45-14:05  Lecture "Social media analysis: An end-user’s perspective" 
14:05-14:20 coffee break
14:20-16:00 Group task

19:00 cultural program


Friday 29.05

8:45 - tea, coffee
9:00-10:20 Lecture "Social Network Analysis of Russian Protest Meetings in Twitter: Methodological Aspects and Descriptive Results"
10:30-11:20 Lecture "Business and Social Media"
11:45-12:45 lunch
12:45-14:05 Group task: presentations
14:05-14:20 coffee break
14:20-16:00 Group task: presentations
16:00-16:20 Closing

19:00 Dinner


Jari Veijalainen

"Social Media Analysis: Introduction"

The lecture introduces social media analysis: the background of the research, definitions of the terms used, directions in social media research, goals, course structure, schedule, and literature. In addition, the lecture gives an overview of how computerized social networks have been modeled and which aspects have been investigated. The latter include statistical properties of social networks, community discovery, evolution of social networks, social influence analysis, privacy issues, and data mining.


Alexander Nikolaev

"Social Network Analysis: Introduction"
A short excursion into the history of the science of social network analysis: the key names, case studies, and examples of real-world social networks.

"Cohesive Sub-communities and Their Identification"
An overview of the concepts and metrics for defining and measuring network cohesiveness and network clustering algorithms.

"Measuring Centrality: Revealing the Relative Value of Node Position within a Network Structure"
A recap of the most widely used centrality measures, highlighting the relationship between their utility and the network processes; "key players" problems.

"Analysis and Interpretation of Online Social Ego-Networks"
A guided tour of using NodeXL to analyze your own social networks.

"Graph-theoretic Network Formation Models: from Random Graphs to Power Laws"
A survey of modeling contributions leading to the understanding of the mechanisms underlying social community formation, motivated by the properties observed in large real-world graphs.

"Cascading Behavior in Social Networks"
Illustrations and examples of how information- and direct benefit-based cascades can be triggered in locally connected subpopulations, and the implications of these phenomena.

"Social Influence and Influence Maximization Models"
The best-known models of influence propagation and influence maximization, and the pitfalls of social influence research.


Olessia Koltsova

"Social media analysis: An end-user’s perspective"

In this talk, I will address the gap between existing approaches to social media analysis and the interests of its end users. The latter are usually social science researchers or practical media analysts who want the results to be reliable, easily interpretable, and reflective of the “real” market or societal situation. Computer scientists who develop methods of social media analysis are often interested in advancing mathematical algorithms or finding software solutions for specific sub-tasks. In this lecture, I shall:

·        review the typical analytical and research goals of end users (public opinion and customer research, network structure as social structure research, SNS-based prediction of real-world phenomena, etc.),
·        look at typical problems of data collection and the resulting data quality (sample bias, data sparsity, data noise, technical and legal restrictions on data collection),
·        address the core methods of analysis (text and graph mining) and their limitations, given the end goals.


Pavel Braslavski

"Working with Community Question Answering Data: Tasks and Methods"

Community Question Answering (CQA) sites allow users to post questions on virtually any subject to other community members, answer questions, rate and comment on answers, and gain points and badges. Yahoo! Answers, Quora, and Stack Overflow are examples of popular CQA platforms. CQA is a good complement to web search: it allows for a more detailed description of the information need, delivers a more social and personalized search experience, suits users with low search engine proficiency, etc. The vast amount of data collected by CQA sites allows for re-using the “wisdom of crowds”.

In the lecture we will give an overview of different CQA sites, with particular emphasis on the possibilities for obtaining their data for analysis. Then we will consider different problems in the context of CQA data: content quality evaluation, expert finding, question categorization, answer retrieval, and the use of CQA data in web search. To conclude the lecture, we will look at health-related CQA as an example of narrow-domain analysis.


Alexander Semenov
"Social Network Analysis of Russian Protest Meetings in Twitter: Methodological Aspects and Descriptive Results"

Twitter is one of the most popular research subjects among online social network sites because of the open nature of its communication: users post and exchange short public text statuses, often containing hyperlinks and pictures. Because tweets are public, the data can be accessed easily. Besides its basic features, Twitter provides several additional functions: replying to users, mentioning them in one's own statuses, and sharing others’ tweets with one's own followers (a “re-tweet”). We used these features to construct several types of networks (“reply,” “retweet” and “mention”) and to study their topological properties, the positions and characteristics of key users, information propagation, and discussion.

We collected the data before, during and after the political protest meetings in Moscow on December 24, 2011 on Prospect Sakharova and Poklonnaya Gora. Based on this data we analyzed and visualized discussion and diffusion networks, built using the Twitter functions "reply" and "retweet" respectively. In the presentation we will emphasize the main steps of data acquisition, extraction and transformation, and discuss the caveats of each step and the biases that errors at each step can introduce into social network analysis metrics and their interpretation.
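
As a rough illustration of the network construction described in this abstract, the sketch below builds weighted "reply", "retweet" and "mention" edge lists from tweet records. It is not the authors' pipeline: the record layout (fields `user`, `in_reply_to`, `retweet_of`, `mentions`), the user names, and the helper functions are all assumptions made for this example and do not follow the real Twitter API schema.

```python
from collections import Counter

# Hypothetical, simplified tweet records; the field names are assumptions
# for illustration and do not follow the actual Twitter API schema.
tweets = [
    {"user": "anna", "in_reply_to": "boris", "retweet_of": None, "mentions": ["boris"]},
    {"user": "boris", "in_reply_to": None, "retweet_of": "anna", "mentions": []},
    {"user": "clara", "in_reply_to": None, "retweet_of": "anna", "mentions": ["dmitry"]},
    {"user": "clara", "in_reply_to": "boris", "retweet_of": None, "mentions": ["boris", "anna"]},
]


def build_networks(records):
    """Return weighted directed edges (source, target) -> count per interaction type."""
    networks = {"reply": Counter(), "retweet": Counter(), "mention": Counter()}
    for t in records:
        src = t["user"]
        if t.get("in_reply_to"):
            networks["reply"][(src, t["in_reply_to"])] += 1
        if t.get("retweet_of"):
            networks["retweet"][(src, t["retweet_of"])] += 1
        for target in t.get("mentions", []):
            networks["mention"][(src, target)] += 1
    return networks


def in_degree(edges):
    """Weighted in-degree per user: how often others replied to, retweeted, or mentioned them."""
    degree = Counter()
    for (_, dst), weight in edges.items():
        degree[dst] += weight
    return degree


networks = build_networks(tweets)
retweet_in_degree = in_degree(networks["retweet"])
```

Under these assumptions, a high weighted in-degree in the retweet network marks users whose statuses were shared most often, which is one simple way to look for key users in a diffusion network.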


Fedor Kozlov 

"Semantic technologies and Representation of the models in semantic databases"

Recent years have seen the automatic construction of very large knowledge bases, such as DBpedia, Freebase, Wikidata, and Yago, as well as industrial knowledge graphs at Google, Microsoft, Facebook, Walmart, and others. Some of these knowledge bases contain many millions of entities, organized into thousands of fine-grained semantic classes, and billions of facts that capture entity attributes or relationships between entities. Linked Data is a set of best practices for publishing structured data on the Web; it focuses on setting hyperlinks between entities provided by different knowledge bases and web servers. These hyperlinks connect the data from all servers into a single global data graph – the Web of Linked Data. Software developers can use the Web of Linked Data to aggregate, enrich and harmonise data from different domains and sources in the context of a single application.

Participants will obtain the basic knowledge (standards, formats, technologies and libraries) for working with Semantic and Linked Data.
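
To make the idea of hyperlinks between knowledge bases more concrete, here is a minimal sketch that represents facts as (subject, predicate, object) triples and follows `owl:sameAs` links to aggregate facts about one entity from two sources. The two datasets, every `example.org` identifier, and the `describe` helper are made up for illustration; a real application would normally rely on an RDF library and a triple store rather than plain tuples.

```python
# Standard OWL property stating that two identifiers denote the same entity.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical datasets; all example.org identifiers and values are invented.
dataset_a = [
    ("http://a.example.org/city/1", "http://a.example.org/prop/name", "Saint Petersburg"),
    ("http://a.example.org/city/1", SAME_AS, "http://b.example.org/place/spb"),
]
dataset_b = [
    ("http://b.example.org/place/spb", "http://b.example.org/prop/country", "Russia"),
]


def describe(entity, triples):
    """Collect (predicate, object) facts about an entity across sameAs-linked identifiers."""
    aliases = {entity}
    changed = True
    # Repeat until no new alias is found, so chains of sameAs links are followed.
    while changed:
        changed = False
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            if s in aliases and o not in aliases:
                aliases.add(o)
                changed = True
            elif o in aliases and s not in aliases:
                aliases.add(s)
                changed = True
    return {(p, o) for s, p, o in triples if s in aliases and p != SAME_AS}


facts = describe("http://a.example.org/city/1", dataset_a + dataset_b)
```

In this toy setting, `facts` combines the name from the first dataset with the country from the second, which is the kind of aggregation across sources that the abstract refers to.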


Alexander Semenov

"Social media data gathering"

An overview of the possibilities for collecting data from the Internet: an introduction to social media APIs and web crawling.
The lecture will show how to get data from social media sites such as Twitter and Facebook, and explain the possibilities and limitations of data collection. Processing of the collected data will also be discussed.