The secret life of tweets

On Twitter you can say anything you want as long as it fits in 140 characters. The length limit is one of those frozen accidents of history, like QWERTY and the genetic code. In olden days (2006), tweets had to fit into cell-phone text messages, which imposed a limit of 160 characters. (Twitter reserves 20 characters for the sender’s @handle.) Back then, resources were so scarce the company had to squeeze the vowels out of its name: “twttr,” they called it. Now, we have bandwidth to burn. On the other hand, human attention is still a constraint.

[Screenshot: a tweet being composed. The text reads: 'To my prolific tweeps: I dote upon every precious character you send my way, which is why I am sometimes grateful you can send me no more than 140.' This message exceeds the 140-character limit by 5 characters.]

Last year, a proposal to raise the limit to 10,000 characters was shouted down in a storm of very terse but intense tweets.

The 140-character limit is enforced by the Twitter software. When you compose a tweet, a counter starts at 140 and is decremented with each character you type; if the number goes negative, the Tweet button is disabled (as in the screen capture above). Based on this observation, I had long believed that every tweet was indeed a little snippet of pure text composed of no more than 140 characters. Was I naïve, or what?

My belated enlightenment began earlier this week, when I started having trouble with links embedded in tweets. Clicking on a link opened a new browser tab, but the requested page failed to load. The process got stuck waiting to connect to a URL such as https://t.co/E0R99xtQng. The “t.co” domain gave me a clue to the source of the problem. A long URL (http://bit-player.org/2016/bertrand-russell-donald-trump-and-archimedes, for example) can use up your 140-character quota in a hurry, and so twitterers long ago turned to URL-shortening services such as bit.ly and TinyURL, which allow you to substitute an abbreviated URL for the original web address. The shortening services work by redirection. When your browser issues the request “GET http://bit.ly/xyz123”, what comes back is not the web page you’re seeking but a message such as “REDIRECT http://ultimate.destination.page.com”. The browser then automatically issues a second GET request to the provided destination address.
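The redirection dance can be mimicked in a few lines of Python. This is only a toy model, with no network traffic: the lookup table stands in for the shortener's database, and the function names `fetch` and `follow` are my own inventions. One detail is real, though: in actual HTTP the "REDIRECT" message takes the form of a 3xx status code (such as 301) accompanied by a `Location` header naming the destination.

```python
# Toy model of a URL-shortening service. The table stands in for the
# shortener's database; the single entry is the example from the text.
REDIRECTS = {
    "http://bit.ly/xyz123": "http://ultimate.destination.page.com",
}


def fetch(url):
    """Simulate one GET request and return (status, payload)."""
    if url in REDIRECTS:
        # A real service answers with a 3xx status and a Location header.
        return 301, REDIRECTS[url]
    return 200, f"<html>content of {url}</html>"


def follow(url, max_hops=5):
    """Follow redirects as a browser would.

    Returns the final URL and the number of requests it took to get there.
    """
    requests_made = 0
    while requests_made <= max_hops:
        status, payload = fetch(url)
        requests_made += 1
        if status != 301:
            return url, requests_made
        url = payload  # the Location target becomes the next request
    raise RuntimeError("too many redirects")
```

Calling `follow("http://bit.ly/xyz123")` takes two simulated requests to reach the destination page, where a direct link would take one.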

In 2011 Twitter introduced its own shortening service, t.co. Use of this service is automatic and inescapable. That is, any link included in a tweet will be converted into a 23-character t.co URL, whether you want it to be or not, and even if it’s already shorter than 23 characters. The displayed link may appear to refer to the original URL, but when you click on it, the browser will go first to a t.co address and only afterwards to the true target. Embedded images also have t.co URLs.
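Here is a rough sketch of what the 23-character rule means for the character count. The URL pattern is deliberately crude and is my own assumption; Twitter's real link detection is surely more elaborate, and the function name `tweet_length` is hypothetical.

```python
import re

TCO_LENGTH = 23  # every link counts as 23 characters, whatever its real length

# Crude URL pattern, assumed for illustration only; it is not
# Twitter's actual link-detection rule.
URL_PATTERN = re.compile(r"https?://\S+")


def tweet_length(text):
    """Count characters with each URL replaced by a fixed-length t.co link."""
    urls = URL_PATTERN.findall(text)
    text_without_urls = URL_PATTERN.sub("", text)
    return len(text_without_urls) + TCO_LENGTH * len(urls)
```

Under this model, the word "Reading " followed by the long bit-player.org URL mentioned earlier counts as 8 + 23 = 31 characters, and a link pre-shrunk by another service costs the same 23.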

A drawback of all redirection services is that they become a bottleneck and a potential point of failure for the sites that depend on them. If t.co goes down, every link posted on Twitter becomes unreachable, and every image disappears. Is that what happened earlier this week when I was having trouble following Twitter links? Probably not; a disruption of that scale would have been widely noted. Indeed, I soon discovered that the problem was quite localized: It plagued all browsers on my computer, but other machines in the household were unaffected.

When I did a web search for “t.co broken links,” I quickly discovered a long discussion of the issue in the Twitter developer forum, with 87 messages going back to 2012. Grouchy complaints are interspersed with a welter of conflicting diagnoses and inconsistent remedies. Much attention focused on Apple hardware and software (which I use). A number of contributors argued that the problem is not in the browser but somewhere upstream—in the operating system, the router, the cable interface, or even the internet service provider.

After a day or two, my problem with Twitter links went away, and I never learned the exact cause. I hate it when that happens, although I hate it more when the problem doesn’t go away. However, that’s not why I’m writing this. What I want to talk about is something I stumbled upon in the course of my troubleshooting. I found a plugin for the Google Chrome browser, Goodbye t.co, that promised to bypass t.co and thereby fix the problem. How could it do that? If t.co is not responding, or if the response is not getting through to the browser, how can code running in the browser make any difference? It seems like tinkering with your television set when the broadcaster is off the air.

The source code for Goodbye t.co is on GitHub, so I took a look. The program is just a couple dozen lines of JavaScript. What I saw there sent me running back to my Twitter feed, to examine the web page using the browser’s developer tools.

Here’s a tweet I posted a few days ago, as it is displayed by the Twitter web site. Note the link to an arXiv paper:

[Image: Beckett tweet]

And here’s the HTML that encodes that tweet in the web page:

[Image: Beckett tweet HTML]

The text of the tweet (“A problem in coding theory that comes from a Samuel Beckett play: ”) amounts to 66 characters, plus 25 more for the link (“arxiv.org/abs/1608.06001 ”). But that’s not all that Twitter is sending out to my followers. Far from it. The block of HTML shown above is 751 characters, and the complete markup for this one tweet comes to just under 7,000 characters, or 50 times the nominal limit.

Take a closer look at the anchor tag in that HTML block:

The href attribute of the anchor tag is a t.co URL; that’s where the browser will go when you click the link. But, reading on, we come to a data-expanded-url that gives the final destination link in full. And then that same final destination URL appears again in the title attribute. This explains immediately how Goodbye t.co can “bypass” the t.co service. It simply retrieves the data-expanded-url and sends the browser there, without making the detour through t.co.
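The same trick can be sketched in Python. To be clear, this is not the plugin's code (that is JavaScript working on the live page), and the sample markup is a hypothetical reconstruction from the description above: the t.co path is a placeholder, and the https scheme on the arXiv address is assumed.

```python
from html.parser import HTMLParser

# Hypothetical anchor tag modeled on the description in the text.
# The t.co path is a placeholder, not a real shortened link.
SAMPLE = (
    '<a href="https://t.co/XXXXXXXXXX" '
    'data-expanded-url="https://arxiv.org/abs/1608.06001" '
    'title="https://arxiv.org/abs/1608.06001">arxiv.org/abs/1608.06001</a>'
)


class ExpandedLinkFinder(HTMLParser):
    """Collect (wrapped, direct) URL pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        wrapped = attributes.get("href")
        # Fall back to the href when no expanded URL is present.
        direct = attributes.get("data-expanded-url", wrapped)
        self.links.append((wrapped, direct))


def direct_links(html):
    """Return a list of (wrapped, direct) pairs for every anchor in html."""
    finder = ExpandedLinkFinder()
    finder.feed(html)
    return finder.links
```

Feeding `SAMPLE` to `direct_links` yields one pair: the t.co address and the arXiv address it conceals. A bypass tool needs only to swap the second for the first.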

I have two questions. First, if you’re going to use a shortened, redirected URL, why also include the full-length URL in the page markup? The apparent answer is: So that the web browser can show the user the true destination. This is clearly the point of the title attribute. When you hover on a link, the content of the title attribute is displayed in a “tooltip.” I’m not so sure about the purpose of the data-expanded-url attribute. It’s surely not there to help the author of Goodbye t.co. Twitter presumably has some JavaScript of its own that accesses that field.

The second question is the inverse of the first: If you’re going to include the full-length URL, why bother with the shortening-and-redirecting rigmarole? Twitter could shut down the t.co servers and doubtless save a pile of money. Those servers have to deal with all the links and images in some 200 billion tweets per year. The use of redirection doubles the number of requests and responses—that’s a lot of internet bandwidth—and introduces delays of a few hundred milliseconds (even when the service works correctly). Note that Twitter could still display a shortened URL within the text of the tweet, without requiring redirection.

Twitter’s own developer documents offer an answer to the second question:

Tens of millions of links are tweeted on Twitter each day. Wrapping these shared links helps Twitter protect users from malicious content while offering useful insights on engagement.

The promise to “protect users from malicious content” presumably means that if I link to a sufficiently sleazy site, Twitter will refuse to redirect readers there, or perhaps will just warn them of the danger. (I don’t know which because I’ve never encountered this behavior.) As for “offering useful insights on engagement,” I believe that phrase could be translated as “helping us target advertising and collect data with potential market value.” In other words, t.co is not just a cost center but also a revenue center. Every time you click on a link within a tweet, Twitter knows exactly where you’re going and can add that information to your profile.

A few months ago, Twitter announced a slight change to the 140-character rule. @handles included in the text will no longer count toward the character total, and neither will images or other media attachments. Some press reports suggested that links would also be excluded from the count, but the official announcement made no mention of links. And t.co redirection is clearly here to stay.

I can suggest two takeaway messages from this little episode in my life as an internaut.

If you want to limit the “insights on engagement” that Twitter accumulates about your activities, you might consider installing a plugin to bypass t.co redirection. There’s an ongoing argument about the wisdom and morality of such actions, focused in particular on ad-blocking software. I have my own views on this issue, but I’m not going to air them here and now.

The other small lesson I’ve learned is that using alternative URL-shortening services with Twitter is worse than pointless. Pre-shrinking the URL has no effect on the character count. It also obscures the true destination from the reader (since the title attribute is “bit.ly/whatever”). Most important, it interposes two layers of redirection, with two delays, two potential points of failure, and two opportunities to collect saleable data. Yet I still see lots of bit.ly and goo.gl links in tweets. Am I missing or misunderstanding something?

4 Responses to The secret life of tweets

Kartik Agaram says:

27 August 2016 at 6:49 pm

“Pre-shrinking the URL has no effect on the character count. It also obscures the true destination from the reader, interposes two layers of redirection, with two delays, two potential points of failure, and two opportunities to collect saleable data. Yet I still see lots of bit.ly and goo.gl links in tweets. Am I missing or misunderstanding something?”

I fear it’s just the last ‘drawback’ you mentioned. People posting on Twitter often have a marketing agenda, and want to collect analytics on their readers to be able to show how ‘effective’ they are at tweeting. Not for advertisers this time but employers.

AP says:

29 August 2016 at 1:21 pm

I am probably just expanding on what Mr. Hayes and Mr. Agaram have already said, but I think URL shorteners represent more than a single point of failure: they represent a centralization of links, hence a centralization of the internet. Much as the dominance of Google search has done, whereby an “upgrade” to the algorithm can put companies out of business, imagine what could happen to the internet if every link were shortened using one or a few services: the centralization of linking. Not only is it a single point of failure, but it’s also a vantage point to snoop on people’s browsing (what if that URL were personalized for you, as they are in marketing email?) and a point of control (slow competitors down 10% of the time). If your browser called home to report every click, you would change browsers, wouldn’t you? Shortened links “call home” before redirecting. Which brings me to the next point: why use alternate, redundant shorteners? By negating the exclusiveness of the analytics information that Twitter is gathering under the pretense of shortening (shortening was necessary only because of the equally arbitrary 140-character limit, but links don’t count towards it any longer), users of, say, bit.ly or goo.gl are in part boycotting this unhealthy business model. It’s a bit like ad-blocking: some users are just trying to clean up their screens and save bandwidth; others want to kill a business model predicated on creating user addiction and then selling out their attention, that is, their system of beliefs, to the highest bidder.

- Brian Hayes says:
  
  30 August 2016 at 6:59 am
  
  Agreed about the iniquity of centralization.
  
  As for using bit.ly or the like to disrupt Twitter’s business model, it may well be a motive for some users. But is it effective? Through t.co Twitter still gets its own call home, and they can learn the ultimate destination simply by issuing a GET or HEAD request to the embedded bit.ly address. Could bit.ly block bulk requests from Twitter? I suppose they could, but would they risk alienating a service that’s probably responsible for a big chunk of their traffic?
  
  Also, as far as I can tell, links do still count as 23 characters. The Twitter announcement of the change does not mention them at all, one way or the other. A simple experiment shows the character count decremented by 23 for every link pasted into a tweet.
  
  It’s an interesting question why Twitter chose to keep the count penalty on links while dropping it for images, although the answer probably has nothing to do with the issue under discussion here. They may just want to discourage the use of Twitter as a link farm.

John Cowan says:

29 August 2016 at 1:58 pm

The connection between the 140-character tweet limit and SMS is more symbolic than real. SMS uses a 7-bit character set and doesn’t waste the high-order bit, so it packs up to 160 characters into 140 bytes. The lower 32 characters, corresponding to the ASCII controls, represent a small selection of additional letters, enough to support French, German, Italian, Danish, Swedish, Norwegian, Finnish, Spanish (without accent marks, or with the wrong ones) and Greek (capitals only). For languages other than these, SMS sends a maximum of 70 Unicode characters represented with 16 bits each (only Plane 0).

Twitter, on the other hand, has always supported up to 140 Unicode characters, so you can get twice as much Chinese text (say) in a tweet as you can in an SMS.