Our Cyber Threats Research Centre colleagues couldn’t host an in-person TASM Conference this year, but instead organised a week of virtual events from 21 to 25 June 2021. This post is the first in a three-part series based on overviews of three of the virtual TASM panels [Ed.]
By Adam Whitter-Jones
Many Internet and social media companies now use computer code, rather than people, to decide what we see; this code is what we call algorithms. What we see online has the potential to influence our thoughts and behaviour. The past few years have seen a shift in the way that algorithms present content, because the established deterministic reverse chronological approach (newest stuff first) does not scale to the sheer volume of user generated content.
To address this, more complex algorithms have been employed. These utilise developments such as deep learning neural network models (loosely modelled on the human brain) that draw on metadata (likes, interests, searches), content analysis (what is this video about?), and computer vision techniques (what is this picture of?) in order to predict what types of content are likely to interest users at a specific point in time. This helps keep users engaged.
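The difference between the two approaches can be illustrated with a minimal sketch. Everything here is hypothetical: the `Post` type, the topic labels, and the `predicted_interest` function are simple stand-ins for the learned models described above, not any platform's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    timestamp: int        # larger means newer
    topics: frozenset     # hypothetical labels from content analysis


def reverse_chronological(posts):
    """Deterministic feed: newest first, identical for every user."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)


def predicted_interest(post, user_interests):
    """Toy stand-in for a learned model (an assumption, not a real one):
    the share of a post's topics that the user has engaged with before."""
    if not post.topics:
        return 0.0
    return len(post.topics & user_interests) / len(post.topics)


def personalised(posts, user_interests):
    """Rank by predicted interest first, breaking ties by recency."""
    return sorted(
        posts,
        key=lambda p: (predicted_interest(p, user_interests), p.timestamp),
        reverse=True,
    )


posts = [
    Post("a", 100, frozenset({"cooking"})),
    Post("b", 200, frozenset({"politics"})),
    Post("c", 300, frozenset({"sport", "politics"})),
]
user_interests = frozenset({"cooking"})

chrono_ids = [p.post_id for p in reverse_chronological(posts)]
personal_ids = [p.post_id for p in personalised(posts, user_interests)]
```

In this toy example the oldest post moves to the top of the personalised feed because it matches the user's recorded interests, whereas the chronological feed ignores the user entirely.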
On Wednesday 23 June 2021, a panel of experts discussed the increasing prevalence and role that algorithms play in our daily lives. They highlighted key issues that we face, potential resolutions to such issues, and thoughts for the future.
Algorithms are almost inseparable from the internet that we use today. Content is personalised; this can either be through web searches which collect and sort public information and recommend the most relevant information to you, or through content amplification to drive engagement on platforms such as YouTube and Facebook.
The issue is that the most extreme types of content drive the highest engagement. This creates a cycle of promoting more content of a similar extreme nature. There is a growing body of evidence to show that one can start at a mainstream political view but slowly be pushed to more extreme views and content, demonstrating a potential path to extremism.
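The cycle described above can be sketched as a simple feedback loop. This is an illustration under invented assumptions, not a model of any real platform: the engagement rates are made up, and the rule that exposure is reallocated in proportion to engagement is a deliberate simplification.

```python
def amplify(shares, engagement_rate, rounds):
    """Reallocate a fixed exposure budget in proportion to the engagement
    each item earned in the previous round (a simplifying assumption)."""
    shares = list(shares)
    for _ in range(rounds):
        engagement = [s * r for s, r in zip(shares, engagement_rate)]
        total = sum(engagement)
        shares = [e / total for e in engagement]
    return shares


# Hypothetical per-view engagement rates for mild, moderate and extreme content.
engagement_rate = [0.02, 0.03, 0.05]
start = [1 / 3, 1 / 3, 1 / 3]   # equal exposure to begin with

after = amplify(start, engagement_rate, rounds=5)
```

Even with a modest difference in engagement rates, the item with the highest rate ends up with most of the exposure after a handful of rounds, because each round's advantage compounds on the last.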
Content moderation can help to solve this problem
Until recently few people were talking about what content moderation is, and the conversation has evolved over time. Previously one might have said “moderation is a decision a company makes about taking down something or leaving it up”, as under the DMCA notice and take down regime.
This area is easy enough to comprehend: requests to remove content on grounds of legality are a relatively black and white decision. The most divisive and difficult issues arise where content falls between those two poles, sometimes referred to as content that is ‘lawful but awful’.
Content moderation as a tool is much broader and more diverse than simple take down requests. Any number of actions can be taken to moderate a piece of content, such as de-monetizing, geo-blocking, or downranking/shadow banning, which results in less distribution. If one thinks more critically, the use of content amplification itself (or rather a potential lack of amplification) is in a broader sense a kind of content moderation.
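To make the range of actions concrete, here is a minimal sketch of how they might act on a ranked feed. The `Item` type, the `Action` names, the region codes, and the downranking factor are all hypothetical; real platforms do not publish their implementations.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    REMOVE = "remove"
    GEO_BLOCK = "geo_block"
    DOWNRANK = "downrank"
    DEMONETIZE = "demonetize"


@dataclass
class Item:
    item_id: str
    score: float                      # hypothetical ranking score
    monetized: bool = True
    removed: bool = False
    blocked_regions: set = field(default_factory=set)


def moderate(item, action, regions=(), factor=0.1):
    """Apply one moderation action to an item (illustrative only)."""
    if action is Action.REMOVE:
        item.removed = True
    elif action is Action.GEO_BLOCK:
        item.blocked_regions.update(regions)
    elif action is Action.DOWNRANK:
        item.score *= factor          # still available, but shown far less
    elif action is Action.DEMONETIZE:
        item.monetized = False        # visibility unchanged, revenue removed
    return item


def feed_for(items, region):
    """Return the ids a user in `region` would see, highest score first."""
    visible = [
        i for i in items
        if not i.removed and region not in i.blocked_regions
    ]
    return [i.item_id for i in sorted(visible, key=lambda i: i.score, reverse=True)]


items = [Item("w", 0.9), Item("x", 0.8), Item("y", 0.7), Item("z", 0.6)]
moderate(items[0], Action.REMOVE)
moderate(items[1], Action.DOWNRANK)
moderate(items[2], Action.GEO_BLOCK, regions={"DE"})
moderate(items[3], Action.DEMONETIZE)

feed_de = feed_for(items, "DE")
feed_uk = feed_for(items, "UK")
```

Only the removed item disappears everywhere. The downranked item stays in both feeds but drops to the bottom, which is why a lack of amplification can be read as a form of moderation.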
The issue with this? Transparency
We do not know how content moderation algorithms work, and it is not always clear what content has been removed, when, or how much. Even if these algorithms were made public, the end-user would be unlikely to understand them. Moreover, trying to comprehend it all as one parcel is likely to be fruitless. So, it is necessary to unpack each of the different elements (machine learning, neural networks, computer vision techniques, natural language processing) to understand the bigger picture.
The real blurring of the lines lies in the fact that services and platforms have, for a long time, been able to determine their own content policies, under which they can take their own actions and make sometimes arbitrary decisions on what content to allow or disallow. Now many countries, as well as the EU, are debating whether Internet companies should have these freedoms, even though governments have become accustomed to companies having control over speech.
Proposals to bring control of content moderation algorithms under the state through the guise of ‘democratic supervision’ present the risk of governments attacking and potentially censoring speech that is lawful. As discussions on this nuanced area evolve, we are beginning to see how human rights can be implicated. This is especially so now that the Internet has become far more centralised, with a small number of companies having a huge influence on the content that is circulated on platforms that we are almost dependent on.
The crux of the issue remains: we don’t really know or understand how platforms’ algorithms work.
Whilst some platforms such as Facebook and YouTube have released a few company statements and research papers, researchers still have to rely on inferences or reverse engineering through APIs to piece together hypotheses.
There are two potential approaches:
- Opening the algorithms for audit,
- Using an Application Programming Interface (API) as a route into the decision-making algorithms to measure, input, and sample data.
Option 1 has a precedent in the financial services industry; however, it carries a risk of misappropriation for nefarious uses by rogue states; we do not really want governments having a deep view into private messages and infrastructure. Option 2 was the approach that was preferred by the panel; it could provide richer datasets to work with, potentially including ethnographic studies. Making these data available for research would better inform public policy decisions.
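As a rough illustration of the second approach, the sketch below tallies what a recommendation endpoint returns for a sample of starting points. No real platform API is used: `fake_fetch`, the `FAKE_API` data, and the labelling rule are hypothetical stand-ins for whatever a research API and a content classifier would provide.

```python
from collections import Counter


def audit_recommendations(fetch, seeds, label):
    """For each seed, fetch its recommendations, label each one, and
    return every label's share of all recommendations seen."""
    counts = Counter()
    for seed in seeds:
        for item in fetch(seed):
            counts[label(item)] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical stand-in for a research API: seed topic -> recommended item ids.
FAKE_API = {
    "news": ["news-1", "news-2", "fringe-1"],
    "sport": ["sport-1", "news-3"],
}


def fake_fetch(seed):
    return FAKE_API.get(seed, [])


def label_by_prefix(item_id):
    """Toy labelling rule: the category is the part before the hyphen."""
    return item_id.split("-")[0]


shares = audit_recommendations(fake_fetch, ["news", "sport"], label_by_prefix)
```

Because the audit function only needs something to fetch from and something to label with, the same measurement could in principle be repeated across platforms or over time, which is the kind of comparative dataset the panel had in mind.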
Finally, what is really needed is an understanding of the baseline operation of algorithms that manage content amplification, i.e. how should they work for the target user? Similarly, an understanding of how algorithms work in practice when sharing or viewing extremist content is required. Once this understanding is available, researchers can conduct comparative research which will hopefully result in a more balanced and informed discussion helping to shape policy in this nuanced area.
The CYTREC/Legal Innovation event ‘Algorithmic Transparency and Content Amplification’ was chaired by Seán Looney of Swansea University and featured speakers Dr Chris Meserole from the Brookings Institution and Emma Llanso from the Center for Democracy & Technology.
For more on this topic please check out the Center for Democracy & Technology’s Do You See What I See? Capabilities and Limitations of Automated Multimedia Content Analysis, the work of Mnemonic on takedowns of human rights documentation on social media platforms, and the upcoming GIFCT Content-Sharing Algorithms, Processes, and Positive Intervention Working Group Report. See also the recently released article on Recommender Systems and the Amplification of Extremist Content available open access at the Internet Policy Review.