This Monday all of the speakers who presented at PASS Summit received their session evaluations. This is a very special time for a conference speaker, because the feedback inside the evaluations gives us an idea of how we performed, what we can improve, and whether or not the audience got some kind of value from the session.
So, while we are discussing feedback, let’s analyse the feedback I got in depth using some sentiment analysis!
I like to be open about the evaluations I get, so I’m going to share everything with you here, including the numbers and some of the written feedback. Before we get started though, a big thanks to everyone who took the time to fill in the evaluation form! Like I mentioned earlier, feedback is extremely important for us speakers so thanks for providing it!
- 223 Attendees;
- 51 attendees filled in the feedback form, which translates to roughly 23%;
- 20 written feedback items on the room and event logistics;
- 26 written feedback items on the session and the speaker.
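As a quick sanity check on the numbers above, the response rate is simple arithmetic (a trivial sketch using only the figures listed in the bullets):

```r
# Response rate of the evaluation form, based on the numbers above
attendees   <- 223
evaluations <- 51

round(evaluations / attendees * 100)  # works out to roughly 23 percent
```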
| Evaluation question | Avg. rating |
| --- | --- |
| Rate the value of the session content. | 4.57 |
| How useful and relevant is the session content to your job/career? | 4.41 |
| How well did the session’s track, audience, title, abstract, and level align with what was presented? | 4.63 |
| Rate the speaker’s knowledge of the subject matter. | 4.76 |
| Rate the overall presentation and delivery of the session content. | 4.61 |
| Rate the balance of educational content versus that of sales, marketing, and promotional subject matter. | 4.67 |
If we look at a graph of the average rating (across all the questions above) per attendee who filled in the evaluation form, we can conclude that most attendees are pretty happy with the session:
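The chart itself is just a per-respondent mean of the six rating questions. A minimal sketch of that calculation, using made-up ratings since the real export format differs:

```r
library(ggplot2)

# Made-up example data: each row is one respondent, each column one rating question
ratings <- data.frame(
  Q1 = c(5, 4, 3), Q2 = c(5, 5, 4), Q3 = c(4, 4, 4),
  Q4 = c(5, 5, 5), Q5 = c(5, 4, 3), Q6 = c(5, 5, 4)
)

# Average the six ratings per respondent
avg_per_attendee <- data.frame(
  Respondent = seq_len(nrow(ratings)),
  AvgRating  = rowMeans(ratings)
)

# Plot one bar per respondent
ggplot(avg_per_attendee, aes(Respondent, AvgRating)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(x = "Respondent", y = "Average rating")
```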
Personally, I am very happy with these results. The number of attendees who filled in the evaluation is relatively low, but that is not uncommon for technical events. Many attendees who did fill in the evaluation also took the time to write down feedback, which is more valuable to me than the numbers: written feedback simply tells far more than a single rating.
With that in mind, let’s analyse the written feedback using sentiment analysis to see if we can extract some emotions from the written feedback that was supplied!
Feedback Sentiment Analysis
I performed the entire sentiment analysis in R using a library called tidytext. I only focused on the written feedback about the session/speaker and not the feedback on the room or event logistics. In general, the room/event feedback mentioned that the room was too small (standing room only, since all the seats were filled), that there were some AV issues, and apparently two attendees wrote down that the room had a weird smell. About that last one… it wasn’t me…
To kick off the sentiment analysis, I downloaded the .csv file of my session evaluations. If you are a speaker who spoke at PASS, you can repeat the steps in this article to perform the same sentiment analysis on your own evaluation results!
Now, first off, the .csv file is quite a mess. It’s not formatted very logically and is mostly a dump of the visual representation as it is presented in the PASS Speaker Portal. The first step is to clean up the data from the .csv and extract what we need for further processing. The code below loads the required libraries in R, imports the .csv file and processes the information inside it.
# Import all the libraries we are going to need
library(dplyr)
library(tidyr)
library(tidytext)
library(stringr)
library(ggplot2)
library(plotly)

# Import the evaluation results
raw_feedback_file <- read.csv("~/Downloads/SessionEval.csv", stringsAsFactors = FALSE, na.strings = c("", "NA"))

# Only extract the questions that allowed free text to be entered
free_text_comments <- raw_feedback_file %>%
  select(Textbox271, Comment1)

# Remove all the empty / NA rows
free_text_comments_cleaned <- free_text_comments %>%
  na.omit()

# Give the columns useful names
names(free_text_comments_cleaned) <- c('Question', 'Answer')

# Fill the Type column depending on the question: either Room or Session comments
free_text_comments_cleaned$Type <- ifelse(grepl("Event", free_text_comments_cleaned$Question), "Room", "Session")

# Add a feedback number to each row
free_text_comments_cleaned$ID <- seq.int(nrow(free_text_comments_cleaned))

# Create two new data frames to separate the Room and Session comments
comments_room <- filter(free_text_comments_cleaned, Type == 'Room')
comments_session <- filter(free_text_comments_cleaned, Type == 'Session')
After running the code above, we are left with two data frames: comments_room and comments_session. As mentioned earlier, we are going to focus on the comments_session data frame, which holds the answers to the “Session or speaker comments” question together with a unique ID for each answer so we can identify them later. The image below shows a number of the rows inside the data frame:
Now that we have our dataset all nice and clean, we can actually start with the sentiment analysis.
The first thing we need to take care of is “tokenization” of the sentences: this process splits the sentences into individual words. After that we have to remove the so-called “stop” words. These are words like the, is, which, etc. They do not represent any kind of sentiment, so they are useless for the analysis, which is why we remove them from our feedback.
# Let's look at the sentiment of the session feedback
# Tokenize and remove stop words
session_sentiment <- comments_session %>%
  unnest_tokens(word, Answer)

session_sentiment <- session_sentiment %>%
  anti_join(stop_words, by = c("word" = "word"))
When this process is done we are left with a data frame in which the sentences are split into individual words, with the stop words removed, while the rest of the information about the sentence (like the ID of the feedback item) is maintained. The image below shows a small sample of the data frame:
Now that we have everything split up we are truly ready for some sentiment analysis. In the R code below I join our data frame of words to the “AFINN” lexicon. A lexicon is basically a list of words and, in this case, a sentiment rating given to each word. For sentiment analysis we usually use one, or more, of the following three lexicons:
- The AFINN lexicon gives a sentiment rating to a word between -5 for a very negative word and +5 for a very positive word.
- The BING lexicon is a bit simpler than the AFINN lexicon. Instead of giving a sentiment rating on a certain scale it uses a binary rating: -1 is a word with a negative sentiment, +1 a word with a positive sentiment.
- The NRC lexicon does not give sentiment ratings to a word but rather places words into categories like disgust, fear, joy, surprise, etc. We will have an example of using the NRC lexicon later in this article.
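To get a feel for these lexicons you can inspect them directly with tidytext’s get_sentiments() function. A small sketch (note: depending on your tidytext/textdata version the AFINN rating column may be called score or value, and you may be prompted to download a lexicon the first time):

```r
library(tidytext)
library(dplyr)

# Look up a single word in each of the three lexicons
get_sentiments("afinn") %>% filter(word == "fun")   # a numeric rating
get_sentiments("bing")  %>% filter(word == "fun")   # positive / negative
get_sentiments("nrc")   %>% filter(word == "fun")   # one row per category
```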
For now, let’s see which words inside the feedback have the largest contribution to a certain sentiment rating. The code below will generate a plot for each of the six sentiment ratings a word can receive using the AFINN lexicon:
session_sentiment_words <- session_sentiment %>%
  inner_join(get_sentiments("afinn")) %>%
  count(word, score, sort = TRUE) %>%
  ungroup()

session_sentiment_words %>%
  group_by(score) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = score)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~score, scales = "free_y") +
  labs(y = "Contribution to sentiment", x = NULL) +
  coord_flip()
As you can see from the graph above, words like “weird” and “risks” have a relatively negative sentiment, while a word like “fun” has a very positive sentiment, almost reaching the maximum value of +5.
To get an idea of the sentiment of each individual feedback item, we calculate the sentiment of every word that makes up the sentence and then average those ratings per sentence. That way we end up with a sentiment rating for each individual feedback item. The R code below does just that and displays the results in a graph:
# Let's check the average sentiment of each feedback item and plot it
# To do this we calculate the sentiment of each word per feedback item
# and take the average of all the words used in the feedback
session_sentiment_com <- session_sentiment %>%
  group_by(ID) %>%
  inner_join(get_sentiments("afinn"))

sentiment_avg_per_feedback <- aggregate(session_sentiment_com$score, list(session_sentiment_com$ID), mean)
names(sentiment_avg_per_feedback) <- c('ID', 'AvgSentiment')

ggplot(sentiment_avg_per_feedback, aes(ID, AvgSentiment, fill = AvgSentiment, label = ID)) +
  geom_col(show.legend = TRUE) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_text()
As you can immediately see in the graph above, not all IDs received an average sentiment rating. This is because those attendees expressed their feelings with words that are not recorded in the lexicon: not every word is present in the lexicons, which can lead to missing sentiment ratings.
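Building on the data frames created above, an anti_join against the lexicon shows exactly which words fell through the cracks (a small sketch; these words contribute nothing to the averages):

```r
# Words from the session feedback that have no entry in the AFINN lexicon
missing_words <- session_sentiment %>%
  anti_join(get_sentiments("afinn"), by = "word") %>%
  count(word, sort = TRUE)

missing_words
```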
I also notice that most feedback items are relatively positive, save for two (ID 21 and 30) that seem very negative. Let’s look at the original feedback of item 21:
# Let's take a look at feedback nr 21
filter(comments_session, ID == 21)
It seems this feedback item is actually pretty positive; I got an awesome compliment that I was doing great. Then why was the sentiment of this feedback item so negative? Well, the word “great” is not present in the AFINN lexicon, while the word “disruptive” has a very negative sentiment rating inside it. This explains the negative sentiment for this specific sentence.
Let’s also look at a positively scored feedback item:
# Let's look at a positively ranked feedback item
filter(comments_session, ID == 43)
We can see how this sentence got the sentiment rating by looking at the sentiment score of the words:
# How does that translate to the shown sentiment?
filter(session_sentiment_com, ID == 43)
Both the words “lively” and, especially, “fun” have a pretty high positive sentiment rating, which of course results in the sentence being rated very positively.
So far we have looked at positive and negative sentiment, but as mentioned earlier, the NRC lexicon allows us to place words into categories instead of scores. Simply rewriting some code and selecting the NRC lexicon yields these results:
# So far we used the 'AFINN' lexicon
# We can also switch to a different one like 'NRC'
# which also places words into various categories
session_sentiment_com_nrc <- session_sentiment %>%
  group_by(ID) %>%
  inner_join(get_sentiments("nrc"))

# Return a count for each NRC sentiment category
nrc_category_score <- session_sentiment_com_nrc %>%
  group_by(sentiment) %>%
  summarise(sentiment_count = n_distinct(ID))

nrc_category_score
From this information we can see that many words ended up in positive categories like “joy”, “positive” and “trust”. From this I can conclude that the majority of the attendees had a good time during the session! That is very important to me, since I decided to switch the dataset I worked with just hours before the session: instead of predicting (boring) car prices using In-Database Analytics, we tried to predict whether or not you would die if you ended up in the Halloween horror movie series. Even though I was a bit nervous about using this specific dataset (remember, boring = safe), I think it paid off in the feedback comments! I do have to apologize to the attendees who felt a bit of fear during the session; I know horror movies are scary and I’ll make sure to warn attendees the next time I get to do this session :-).
To finish this article I plotted the results above into a radar graph using this bit of R code:
plot_ly(
  type = 'scatterpolar',
  r = nrc_category_score$sentiment_count,
  theta = nrc_category_score$sentiment,
  fill = 'toself'
) %>%
  layout(
    polar = list(
      radialaxis = list(
        visible = T,
        range = c(0, 20)
      )
    ),
    showlegend = F
  )
As a closing note: Thanks to everyone who took the time to visit my session on In-Database Analytics using SQL Server! After writing this article I have the feeling the vast majority of the attendees had a great time, and I know I did! Special thanks to those attendees who filled in the feedback; I learned a lot from it!