by Megan Risdal
Many of us know that data collection, cleaning, and processing are time-consuming and sometimes arduous tasks that require patience along with elbow grease. It’s usually the end product, insights from an analysis that can feed action, that motivates us to munge. In this interview, Khuram Zaman of Fifth Tribe explains how a desire to develop effective counter-messaging measures against violent extremists was the impetus behind creating and sharing his carefully curated dataset, How ISIS uses Twitter, on Kaggle.
The dataset, which consists of over 17,000 tweets from more than 100 pro-ISIS “fanboys”, is available to Kaggle users to analyze and participate in “crowdsourcing the fight against terrorism.” Khuram uploaded the tangle of extremist chatter in May 2016 as Kaggle began piloting a new feature allowing users to share the fruits of their labor in the form of public datasets. So far users and organizations have uploaded a wide variety of datasets for Kagglers to explore, analyze and visualize.
As he shares his story, Khuram tells us how witnessing the effects of al-Qaeda shifting its means of radicalization from mainstream media to modern, polished digital marketing to spread terrorist propaganda to new, younger audiences motivated him to make a positive impact on the world. His longtime interest in social media coupled with experience in digital strategy prepared him to challenge their terroristic messages. Since making the data publicly available, Kagglers have so far revealed that a small number of individuals command most of the influence in this pro-ISIS network. Read on to learn about the fascinating story behind Khuram’s data and how it can be used to understand and combat the influence of extremist messages on social media.
Deep in the data
What motivated you to share this data with the Kaggle community?
I shared this dataset because I really believe that the open source community is a powerful resource in the fight against terrorism. I see how ISIS and groups like it are essentially crowdsourcing terrorism and would like to develop responses in kind. Many government officials tend to try to engage the large, established companies like Facebook, Google, and Twitter. Those companies have to balance privacy and security. Other than banning users, it’s not really clear how they can combat violent messages from extremists. However, the open source and crowdsourcing communities don’t have such limitations.
I think platforms like GitHub, Kaggle, Algorithmia, Meetups, and coding competitions can connect designers, developers, and digital strategists to create a strong network effect in ways that other organizations can’t. I’d like to see greater coordination among moderate Muslim leadership, government, and the open source community to deal with the challenge of violent extremism, which is a problem we face both at home and abroad.
Can you tell us about how you collected and cleaned the data?
Collecting and cleaning the data was very challenging. Firstly, it’s a lot harder to find pro-ISIS fanboys (or “daeshbags” as Anonymous likes to call them) than you’d think. By studying tweets over an initial three-month period, I realized that I needed clear rules for separating people who were pro-ISIS from people who were anti-Western. Indicators of someone being pro-ISIS included keywords in the user’s name, description, or tweets, such as “Dawla” (which refers to the State), “Baqiyyah” (which denotes being part of the supposedly ‘ever-expanding’ ISIS state), “Amaq” (the agency used by ISIS to issue official proclamations), and “Wilayat” (used by ISIS to divide up the world into its provinces). I also looked at imagery, such as whether a user displayed the ISIS flag or images of radical leaders like al-Baghdadi or Anwar Awlaki, and at who they were following and who was following them back. Together, these indicators gave me the criteria for deciding which accounts to collect data from.
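The keyword part of these rules can be sketched in a few lines of Python. This is only an illustration, not the script used for the dataset: the function names, the `min_hits` threshold, and the choice to flag accounts for manual review are all assumptions, and the imagery and follower checks described above are left out.

```python
import re

# Indicator terms named in the interview; the list used in practice was likely longer.
INDICATOR_KEYWORDS = ("dawla", "baqiyyah", "amaq", "wilayat")

# One case-insensitive pattern that matches any of the indicator terms.
_PATTERN = re.compile(
    "|".join(re.escape(word) for word in INDICATOR_KEYWORDS),
    re.IGNORECASE,
)


def keyword_hits(name, description, tweets):
    """Return the sorted, de-duplicated indicator keywords found for one user."""
    text = " ".join([name, description, *tweets])
    return sorted({match.group(0).lower() for match in _PATTERN.finditer(text)})


def needs_manual_review(name, description, tweets, min_hits=2):
    """Flag a user for human review (hypothetical threshold, not a classifier)."""
    return len(keyword_hits(name, description, tweets)) >= min_hits
```

A match here is only a first-pass signal. Keywords alone cannot separate supporters from journalists or researchers who use the same terms, which is why the sketch flags accounts for review rather than labeling them.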
Secondly, once I was able to find these guys on Twitter, it was hard to keep track of them because Anonymous had initiated their campaign #OpISIS and was actively reporting the Twitter accounts of violent extremists and their sympathizers. ISIS responded by simply acquiring hacked Twitter accounts on the dark web and respawning very quickly. Some of the users I first tracked were on their 4th or 5th account, but within a few months they were on their 90th or 100th account. The solution was to create multiple scripts. One script built a library of all the users and checked whether they were still active or had been suspended. Another script used Tweepy to download the data and store it in a SQLite database.
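The storage side of such a two-script setup might look roughly like the sketch below, which uses only Python’s built-in `sqlite3` module. The table layout, column names, and function names are assumptions made for illustration. The Tweepy download step is omitted because it needs API credentials, so the sketch simply accepts tweet fields as arguments.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database with a hypothetical two-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "username TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id INTEGER PRIMARY KEY, username TEXT NOT NULL, "
        "created_at TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def record_status(conn, username, status):
    """Insert an account or overwrite its last known status."""
    conn.execute(
        "INSERT OR REPLACE INTO accounts (username, status) VALUES (?, ?)",
        (username, status),
    )
    conn.commit()


def save_tweet(conn, tweet_id, username, created_at, text):
    """Store a tweet; re-downloading the same tweet_id is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (tweet_id, username, created_at, text) "
        "VALUES (?, ?, ?, ?)",
        (tweet_id, username, created_at, text),
    )
    conn.commit()


def active_usernames(conn):
    """Return the accounts that were still active at the last check."""
    rows = conn.execute(
        "SELECT username FROM accounts WHERE status = 'active' ORDER BY username"
    )
    return [row[0] for row in rows]
```

Keeping account status in its own table lets the checking script and the download script run independently: the downloader only needs to ask for the accounts that are still active.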
How did you become interested in learning from social media to combat terrorism?
My interests in combating terrorism and social media were two distinct interests that eventually converged.
My interest in combating terrorism originated when an individual I knew named Samir Khan was killed in a U.S. drone attack in Yemen. Samir and I met online through various forums and came to know one another. However, as time progressed, he began expressing more and more radical views, such as supporting the Taliban, al-Qaeda, and eventually a former Imam named Anwar Awlaki. At first, I didn’t think that much about his views as I thought it was just a phase and he would outgrow it. I was admitted to law school, attended, and lost touch with him. When I graduated and came back home, I learned that he had moved from North Carolina to join Anwar Awlaki in Yemen.
Up until this point, groups like al-Qaeda had relied on the mainstream media to propagate their message of terror. Anwar Awlaki was a very popular preacher who spoke both English and Arabic and ended up reaching wider audiences than other radical preachers. Unfortunately, he became effective at radicalizing young Western Muslims to go overseas and take up arms for various causes. Samir was one of those youth. However, he ended up being very influential in the evolution of terrorist propaganda. Samir was one of the first extremists to introduce modern design, technology, and marketing to promote terrorism. His digital magazine “Inspire” ended up becoming the model from which later groups would draw inspiration. ISIS created its own magazine “Dabiq”, which is very clearly modeled on Inspire.
When Samir Khan was killed in the drone attack, it really affected me deeply. As an average person, you read about terrorism in the news and it sounds very far away. You think, “Oh, this is something that the government should handle. What can I possibly do in my individual capacity?” My focus when I graduated from law school was just finding a job and settling down; I wasn’t thinking about geopolitics. Samir’s death was a wake-up call to me that the issue of terrorism strikes closer to home than we’re aware of. My father served in the U.S. Army Medical Corps when he was my age, and when I asked him why he signed up, he told me he felt that it was his duty to serve his country. I began to feel that I had an obligation to my countrymen, my community, and the broader world to try to do something, but I didn’t know exactly what.
My interest in social media developed pretty early on. I was in college when Facebook came out and Twitter came out when I graduated from law school. I used these social media platforms quite effectively to do community organizing and fundraising for various social causes. I ended up raising about half a million dollars online. Given that success, I ended up working with various organizations including a non-profit and a real-estate investment firm, and ended up joining my current company where I worked my way up to both partner and CEO. Our company’s name is Fifth Tribe and we’re a digital agency based out of the Washington DC Metro area. We came up with this name based on David Logan’s “Tribal Leadership” which talks about how there are five types of corporate culture and each type defines the level of success a company has. A stage five company is one whose tribe is aligned to seek success not only for itself and its clients but also to impact the world in a positive way. We gave ourselves this name in order to remind ourselves to do good things. At Fifth Tribe, we did product development, branding, web/mobile application development, and digital marketing. It gave me a lot of the skills to see digital strategy as a powerful tool to do more than just sell widgets for organizations. I began to feel that digital strategy could help solve some of the world’s greatest challenges.
My interest in combating terrorism and my interest in digital strategy converged through complete chance. My friend Shahed Amanullah, who runs Affinis Labs, was organizing a hackathon at the 2014 Hedaya CVE Expo in Abu Dhabi. He invited Fifth Tribe to participate in the competition. I ended up inviting my co-worker, who had survived a suicide bombing in Pakistan while he was working at the United Nations a few years earlier. There, we were first exposed in detail to some of the propaganda tactics used by groups like ISIS. We saw how they were issuing 200,000+ tweets a day, using cutting-edge design and videos to recruit unsuspecting youth from all over the world, and even modding popular games like Grand Theft Auto. We realized that as a digital agency, we were perfectly poised to go head to head with their propaganda. We presented in front of the entire conference and competed against 4 other teams and, much to our surprise, ended up winning first place.
After we came back home, I realized that there was a lot more that I could be doing. I began teaching myself Python through courses on Udemy and Coursera. During one of our company hackathons, I decided to try playing around with Twitter’s API and used an open source library called Tweepy to begin analyzing ISIS propaganda in real time. After the 2015 Paris Attacks, I increased the scale of the project, and instead of analyzing a handful of users and hashtags, I began looking into integrating multiple languages. I presented my Twitter app publicly for the first time at the DC Hack and Tell in December 2015. Right before the hackathon, I was able to figure out how to implement Unicode encoding, which allowed me to download tweets in multiple languages including Arabic, Turkish, and Urdu. After that, I began downloading tweets at a rate of 2,500 or so per week, culminating in a dataset of 17,000 tweets over the past 6 months.
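The interview doesn’t say how the encoding was handled, but in current Python the usual approach is to state UTF-8 explicitly whenever text is written or read. The sketch below shows that idea only; the helper names are hypothetical, and each item is assumed to contain no line breaks.

```python
def save_lines(path, lines):
    """Write one item per line, encoded as UTF-8 regardless of platform default."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def load_lines(path):
    """Read the items back, decoding with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

Naming the encoding on both the write and the read side is what keeps Arabic, Turkish, and Urdu text intact when the file moves between machines with different default encodings.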
Tell us about your favorite script (so far!) made using the data…
This is a very hard question to answer. I think all of them are pretty cool. I think it’s a three-way tie between Georgi Gospodinov, who analyzed the network of users; HuwFulcher, who color-coded message senders, message receivers, and accounts that were both; and Megan Risdal, who analyzed the most popular languages used in the tweets.
What’s the most interesting or insightful thing you’ve learned about the data?
The most interesting insight I learned was how a handful of users are essentially thought leaders and influence much of the network. I thought it would be a bit more evenly spread across the network, but it’s clear that 3-5 people are generating the lion’s share of content and serve as connectors between content producers and content receivers.
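One simple way to check this kind of concentration is to measure what share of all tweets comes from the most active accounts. The sketch below assumes the data has been reduced to a list with one username per tweet, and the function name is hypothetical. It measures volume only; the “connector” role would need a graph analysis of mentions and retweets, which is not shown here.

```python
from collections import Counter


def top_share(usernames, k):
    """Return the fraction of tweets written by the k most active users."""
    counts = Counter(usernames)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(k))
    return top_total / total
```

Calling `top_share(usernames, 5)` on the tweet authors would give the fraction of content produced by the five most prolific accounts, which can then be compared against the even spread one might have expected.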
How do you plan to use the analyses created by Kagglers?
I can’t go into details about this, but the general idea is to present this data to mainstream clergy and influencers and develop counter-messaging. I’m putting together a team of people from all over the world to take this project to the next level. If anyone is interested in helping, please let me know.
Opening up about open data
In what ways do you see easy access to open data changing the world?
Whether we like it or not, open data is here to stay. Instead of avoiding it, we should embrace it. I think the greatest impact will be on governance and business. Open data forces transparency and accountability and there are a lot of inefficiencies in terms of how we govern and conduct commerce. Open data exposes these inefficiencies and allows us to correct them.
If you could make any other data freely available for analysis, what would it be?
I’d like to upload a dataset of Islamophobes as well. This should be a lot easier given that they don’t go through as many suspensions and also typically speak in one language as opposed to many. I’d then like to compare the two datasets to see how news events affect each of them. For example, Republican Presidential candidate Donald Trump’s call to ban Muslims from the U.S. following the San Bernardino attacks was used in digital propaganda by al-Shabaab. My hypothesis is that these extreme fringes of society are feeding off of one another and I want to see if the data shows that.
Khuram Zaman is the CEO of Fifth Tribe, a leading digital agency based in Washington D.C. He has provided digital marketing services to clients as diverse as the U.S. Air Force, Aetna Innovation Health, Kaiser Permanente, Silatech, Oxfam, and the Hult Prize. His writing has been featured in Entrepreneur.com, Business2Community, and LinkedIn Publisher. He’s interested in studying social media data for a variety of different sectors.
Megan Risdal is a Content Marketing Intern at Kaggle. She is a data explorer and linguist with interests in scientific literacy. She received a Master’s in Theoretical Linguistics from the University of California, Los Angeles and a Master’s in English Sociolinguistics from North Carolina State University. She can be followed on Twitter: @
This article was first published on 3 June, 2016 on the official Kaggle blog. Re-published here with permission.