The new Star Wars movie, The Force Awakens, premieres tonight in Australia. Three data geeks from KPMG, Praveen Thirukonda, Chetan Ganjihal and Kenrick Lim, show what’s possible when you have the scripts from the movies, a deep understanding of data analytics and five days of spare time.
Movie scripts and conversation transcripts are a huge source of unstructured text. The challenge is finding the underlying associations, structures, patterns and, most importantly, meaning.
This takes significant expertise, but it can lead to insights into customer sentiment that are not immediately apparent. Take, for example, social media reaction to the COP21 conference. The data is readily available, but only data analysis reveals the overall public reaction to a company’s or government’s policy.
But back to Star Wars.
For the inner nerds among you, this is how the data engineers spent their five days to come up with the Star Wars analysis. Be warned: plot and emotion spoilers follow. If you just want to know the results, skip this bit and go straight to ‘what we found’.
The key steps undertaken in the analysis included:
Text analytics: Text mining of unstructured data from the scripts, using natural language processing to identify sentiments and keywords:
For each episode, the script is parsed, and scenes and lines of dialogue are associated with individual characters. Natural language processing is used to calculate sentiments and keywords for each line of dialogue and each character.
Sentiment identification is based on lexical analysis and n-gram models.
Association mining: Identifies the strength of the relationships between characters. Our model is based on the screen space the characters share. The results are used to build the social graph.
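For the curious, the steps above can be sketched in a few lines of Python. This is only a toy illustration, not the model the team built. It assumes a simplified script format (scenes start with "SCENE", dialogue is written as "NAME: words"), uses a tiny invented sentiment lexicon with a one-word negation check standing in for the n-gram models, and treats "appearing in the same scene" as a stand-in for shared screen space. The sample lines, the lexicon and all function names are made up for the example.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lexicon; a real analysis would use a far larger one.
LEXICON = {"hope": 1, "good": 1, "love": 1, "bad": -1, "doomed": -1, "afraid": -1}
NEGATIONS = {"not", "no", "never"}

# Invented placeholder dialogue, not taken from the actual scripts.
SAMPLE = """SCENE 1
LUKE: This looks bad.
C-3PO: We are doomed.
SCENE 2
LUKE: I am not afraid. There is hope.
LEIA: I love that plan.
C-3PO: This is not good.
"""

def parse_script(text):
    """Return a list of scenes; each scene is a list of (character, speech)."""
    scenes = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SCENE"):
            scenes.append([])
        elif ":" in line and scenes:
            name, speech = line.split(":", 1)
            scenes[-1].append((name.strip(), speech.strip()))
    return scenes

def sentiment(speech):
    """Sum lexicon scores, flipping a word's sign if a negation precedes it."""
    tokens = re.findall(r"[a-z']+", speech.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:  # crude bigram check
            value = -value
        score += value
    return score

def character_sentiment(scenes):
    """Total sentiment score per character across all scenes."""
    totals = defaultdict(int)
    for scene in scenes:
        for name, speech in scene:
            totals[name] += sentiment(speech)
    return dict(totals)

def cooccurrence(scenes):
    """Count how many scenes each pair of characters shares (graph edge weights)."""
    edges = Counter()
    for scene in scenes:
        cast = sorted({name for name, _ in scene})
        for pair in combinations(cast, 2):
            edges[pair] += 1
    return edges

if __name__ == "__main__":
    scenes = parse_script(SAMPLE)
    print(character_sentiment(scenes))
    print(cooccurrence(scenes).most_common())
```

The pair counts returned by cooccurrence can be used as edge weights in a social graph, with the characters as nodes. A real analysis would need a proper screenplay parser and a richer sentiment model.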
What we found
Some unexpected insights, which you can explore visually by clicking here, include:
Definite proof that Luke is happier after he meets Princess Leia.
Even C-3PO is happy at some points in the movies.
Common themes emerge by character across each movie and can be tracked in the visual simulation.
You can even compare your own thoughts about upward feedback to the characters and read our Episode 7 predictions, although these are definitely statistically unsound.
Praveen Thirukonda, Chetan Ganjihal and Kenrick Lim made this fantastic Star Wars data analysis and visual representation happen. If they did this in their spare time, what could an analysis of your business data achieve in business time?
May the Force be with you!
Explore more here