Twitter’s latest Robo-Nag flags “harmful” language before you post


Before you tweet, you might be asked if you really want to be that rude.

Want to know exactly what Twitter's fleet of text-combing, dictionary-parsing bots defines as "mean"? You now have instant access to that data every day – at least whenever a strict auto-moderator decides you're not tweeting politely.

On Wednesday, members of Twitter’s product design team confirmed that a new automatic prompt will roll out to all Twitter users, regardless of platform or device, triggered whenever the language in a post exceeds Twitter’s threshold for “potentially harmful or offensive language.” This follows a series of limited-user tests of the prompts that began last May. Soon, any robo-moderated tweet will be interrupted with the message, “Would you like to check this before tweeting?”

A screenshot example of what the prompt will look like in action. The feature seems specifically geared toward replies on the site.

Previous tests of this feature, unsurprisingly, had their share of problems. “The algorithms powering the [warning] prompts struggled to capture the nuance in many conversations and often didn’t differentiate between potentially offensive language, sarcasm, and friendly banter,” Twitter’s announcement said. The announcement clarified that Twitter’s systems now account for, among other things, how often two accounts interact with each other – meaning I would likely get flagged for sending curse words and insults to a celebrity I’ve never spoken to on Twitter, but I probably wouldn’t if I sent the same phrases to friends or Ars colleagues.

Additionally, Twitter admits that its systems previously needed updates to “accommodate situations where language can be reclaimed by underrepresented communities and used in non-harmful ways.” We hope the data points used for these determinations don’t go as far as examining a Twitter account’s profile photo, especially since troll accounts typically use fake or stolen images. (Twitter has yet to clarify how it makes provisions for these “situations.”)


At press time, Twitter does not offer a handy dictionary of flagged terms that users could consult – or use to skillfully misspell their favorite insults and curses to mask them from Twitter’s automatic moderation tools.

So two-thirds kept it real?

To sell users on these nagging messages, Twitter pats itself on the back with data, but it’s not entirely convincing.

During the friendliness-prompt testing period, Twitter reported that one-third of users either rephrased their flagged posts or deleted them, while everyone who was flagged went on to post 11 percent fewer “offensive” posts and replies, on average. (That is, some users may have gotten friendlier, while others may have become more determined in their salty language.) Either way, it sounds like a vast majority of users remain steadfast in their personal quest to tell it like it is.

The strangest data point from Twitter is that anyone who received a flag was “less likely to receive offensive and harmful replies back.” It’s unclear what point Twitter is trying to make with this data: Why should any burden of courtesy fall on those who receive nasty tweets?

This follows another Twitter nagging initiative, launched in late 2020, that encourages users to “read” an article linked by another Twitter user before retweeting it. In other words: see a juicy headline and smash the RT button, and you may inadvertently endorse something you don’t actually agree with. But this change seems like too small a bandage for a bigger Twitter problem: how the service incentivizes rampant, rapid-fire use in pursuit of likes and interactions, with honesty and courtesy be damned.

And no amount of nagging is likely to fix Twitter’s problems with how bad actors and trolls keep gaming the system and poisoning the site’s discourse. The biggest example remains what happens when you click into heavily “liked” and replied-to posts, usually from high-profile or “verified” accounts. On Twitter, drive-by replies – often from accounts with suspicious activity and a lack of organic interactions – frequently land at the top of those threads.

Perhaps Twitter could take the lessons of this nagging rollout to heart, particularly about how to weight interactions based on a confirmed back-and-forth relationship between accounts. Or the company could get rid of algorithmic weighting of posts altogether, especially the kind that pushes unfollowed content into a user’s feed, and revert to the better days of purely chronological content – so we can more easily shrug off the BS.


Steven Gregory