How do we moderate constantly evolving communities?

Social media platforms are not legally considered publishers of content, and hence are not required to moderate the content hosted on their websites. Although this lack of legal burden allows platforms to forgo moderation entirely, the toxic and dangerous content submitted by a few anti-social parties can damage a website's reputation, resulting in fewer opportunities for monetization through investment or advertising. Websites with minimal moderation do exist, such as 4chan; however, they rarely have a stable monetization strategy and do not appeal to a larger audience. Reddit finds itself in a special position: while its inception was a rebellion against the bigger mainstream media sites, its evolution has led it to adopt moderation strategies to maintain a welcoming platform. Its primary strategy was, and still is, letting communities moderate themselves. Cases where communities themselves are anti-social, however, have presented Reddit with a problem, for which it employs administrators who decide on behalf of Reddit as a whole when to intervene in a community. Due to Reddit's scale, and, as our results show, the constant evolution of communities, continuous human moderation has become a necessity. This prohibitively expensive requirement led us to test automated, proactive solutions for flagging subreddits that are evolving into undesirable and dangerous communities. Our analysis shows such tools can effectively and timely assist moderators in making these decisions, based on a few selected features extracted from the subreddit. Among these features, our model relies most heavily on migration patterns and moderator participation: in simple terms, whether a community is seeing a large influx of users migrating from a recently banned or quarantined community, or whether moderators are becoming highly active in removing or posting content on the subreddit.
Other features, such as the language used and external reports from media outlets, also reveal the regression of a subreddit.
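To make the idea concrete, here is a minimal sketch of how the two dominant signals could be combined into a flagging rule. The field names, thresholds, and the rule itself are our own illustrative assumptions for this sketch; the model discussed in the text is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class SubredditMonth:
    """One month of signals for a subreddit (field names are illustrative)."""
    new_users: int                 # users who joined this month
    new_users_from_banned: int     # joiners previously active in banned/quarantined subs
    mod_actions: int               # moderator removals/posts this month
    total_actions: int             # all posts/comments this month

def risk_features(m: SubredditMonth) -> dict:
    """Derive the two signals the text says matter most:
    migration from banned communities and moderator participation."""
    migration = m.new_users_from_banned / max(m.new_users, 1)
    mod_participation = m.mod_actions / max(m.total_actions, 1)
    return {"migration": migration, "mod_participation": mod_participation}

def flag_for_review(m: SubredditMonth,
                    migration_thresh: float = 0.25,
                    mod_thresh: float = 0.10) -> bool:
    """Flag a subreddit for human review when either signal crosses
    an (arbitrarily chosen) threshold."""
    f = risk_features(m)
    return f["migration"] > migration_thresh or f["mod_participation"] > mod_thresh
```

For instance, a month in which 150 of 400 new members arrive from a just-banned community would be flagged under these example thresholds.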

Reddit is a content aggregation platform where users submit and vote on content in user-created, topic-specific communities. Initially considered a bastion of free speech rebelling against the gatekeepers of information, i.e. the big media, Reddit has over the years found itself recovering ad hoc from the underbelly of free speech and anonymity. With users governing user-created communities, Reddit's governance has mainly been left to the communities themselves. Moderators are volunteers appointed from within the community, and their role is mostly restricted to ensuring that content is relevant to the community and follows the community-specific guidelines. This often means a moderator's decisions are tunnel-visioned on upholding the subreddit's own rules, which results in users such as u/Violentacrez moderating r/jailbait, a subreddit for sharing suggestive pictures featuring minors. That subreddit was closed only after the administration was criticized on CNN for hosting such content. It is the administrators of Reddit who ensure each community does not violate Reddit's content policy.

The problem with governing Reddit.

Being a content aggregation website, Reddit cannot be legally considered a publisher. This means, as per Section 230, Reddit is not obliged to vet and moderate the content it hosts, but can optionally choose to do so. Fighting the Big Media, Reddit started off with few rules and regulations; however, cases such as that of r/jailbait compelled Reddit to reevaluate and extend its content policies. Currently, after more than 12 amendments, Reddit's content policy lists 8 rules for every user and community to abide by. This was primarily done to ensure a welcoming platform: with advertisers and investors as the primary source of revenue, it pays to have a non-toxic platform. While the members of, and the content in, communities are moderated by volunteers from within those communities, the communities themselves are moderated by the administration. Made up of Reddit employees, these administrators monitor communities for violations of the content policy, and it is the administration that decides to close or quarantine a community. Walking a tight line between its own interests and the interests of users seeking an open, non-interfering platform, moderation and administration have presented themselves as a life-and-death problem for Reddit. While it might seem that evaluating a community upon its creation could reveal its intention and user base, it is unknown whether communities evolve in terms of their members or their topic of discussion. For example, does a football-related community remain a football community throughout its lifetime, or does it, let's say, evolve into a community of anti-social users advocating fringe ideologies? Our analysis of the 3,000 most popular subreddits shows communities do not converge to a stable user base or discussion topic.

The influx of community members.

A community is essentially driven by its members. Regardless of the topic, the types of users in the community determine what the community talks about and how its interactions play out. A community dedicated to a non-political topic that the administration has evaluated to be in no violation of the content policy can still be overtaken by users from toxic subreddits who change the course of the subreddit. ____ is an example of such a case. One way to categorize a user is to observe their submissions and the subreddits they were previously active in: a group of users highly active in video-game subreddits might shift the direction of a political subreddit upon joining. In our analysis, we create an embedding representation for each user in a latent space. This embedding represents the user's posting history, and placing it in a latent space allows us to perform measurements on these embeddings. In the figure below we outline how we measure the evolution of a subreddit with respect to users joining and exiting the community.

user-embeddings.svg
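A minimal sketch of this idea, using a user's normalized posting distribution over subreddits as a toy stand-in for the learned latent embedding (our actual embeddings are not simple count vectors):

```python
import math
from collections import Counter

def user_embedding(posting_history: list[str]) -> dict[str, float]:
    """Represent a user by the normalized distribution of subreddits
    they have posted in -- a toy stand-in for a learned latent embedding."""
    counts = Counter(posting_history)
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()}

def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse embeddings (1.0 = identical)."""
    dot = sum(w * v.get(sub, 0.0) for sub, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Two users whose histories are dominated by the same subreddits end up close in this space, so an influx of users who sit near a banned community's regulars becomes directly measurable.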

The topic drift.

In addition to its members, the topic of a subreddit is subject to change as well. Accounts of radicalized users detail how their engagement in benign gaming-related subreddits radicalized their beliefs as political topics and ideologies crept into the discussion. Measuring the evolution and instability of the topic of discussion inside a subreddit reveals whether it is feasible for administrators to evaluate a community once, based on its intended topic at creation, or whether constant moderation is required.

topic-embeddings.svg
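The same kind of measurement can be sketched for topics. Here a month's "topic embedding" is approximated by a normalized word distribution, and drift between months by Jensen-Shannon divergence; both are illustrative stand-ins for the learned embeddings used in the analysis.

```python
import math
from collections import Counter

def topic_embedding(posts: list[str]) -> dict[str, float]:
    """A month's topic representation: the normalized word distribution
    of everything posted that month (a stand-in for a learned embedding)."""
    words = Counter(w for post in posts for w in post.lower().split())
    total = sum(words.values())
    return {w: n / total for w, n in words.items()}

def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence between two word distributions:
    0.0 for identical topics, log(2) for completely disjoint ones."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a: dict[str, float]) -> float:
        return sum(w * math.log(w / m[k]) for k, w in a.items() if w > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)
```

A subreddit whose monthly divergence stays near zero is topically stable; a sustained rise signals the kind of drift described above.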

Measuring monthly evolution.

For each of our subreddits, we create two community embeddings each month: one representing the topics discussed in the subreddit that month, and the other representing its user base for that month. These community embeddings place the subreddit in a latent space, which we exploit to compute distances and similarities between any two embeddings. Measuring the differences in a subreddit's community embeddings over time shows that both topics and user bases diverge for the majority of subreddits. Our results highlight that, more often than not, subreddits change their topic of discussion and user base as the community grows.
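Under the simplifying assumption that each month's community embedding is a plain dense vector (the embedding model itself is out of scope here), the divergence we measure can be sketched as a series of month-to-month cosine distances:

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def monthly_drift(embeddings: list[list[float]]) -> list[float]:
    """Cosine distance between consecutive months' community embeddings;
    a stable community yields drift near 0, a changing one near 1."""
    return [1.0 - cosine_similarity(a, b)
            for a, b in zip(embeddings, embeddings[1:])]
```

The same function applies to both the topic-embedding and the user-base-embedding series of a subreddit.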

Exploring the results, we observe another important pattern in the evolution. Subreddits that end up being banned or quarantined have evolved significantly differently from normal and control subreddits since their inception. This hints at the possibility that evolution into a dangerous subreddit is predictable thanks to its distinct evolutionary pattern.

Figure 3.svg