The secret life of tweets

On Twitter you can say anything you want as long as it fits in 140 characters. The length limit is one of those frozen accidents of history, like QWERTY and the genetic code. In olden days (2006), tweets had to fit into cell-phone text messages, which imposed a limit of 160 characters. (Twitter reserves 20 characters for the sender’s @handle.) Back then, resources were so scarce the company had to squeeze the vowels out of its name: “twttr,” they called it. Now, we have bandwidth to burn. On the other hand, human attention is still a constraint.

Tweet being composed. The text reads: 'To my prolific tweeps: I dote upon every precious character you send my way, which is why I am sometimes grateful you can send me no more than 140.' This message exceeds the 140-character limit by 5 characters.

Last year, a proposal to raise the limit to 10,000 characters was shouted down in a storm of very terse but intense tweets.

The 140-character limit is enforced by the Twitter software. When you compose a tweet, a counter starts at 140 and is decremented with each character you type; if the number goes negative, the Tweet button is disabled (as in the screen capture above). Based on this observation, I had long believed that every tweet was indeed a little snippet of pure text composed of no more than 140 characters. Was I naïve, or what?


My belated enlightenment began earlier this week, when I began having trouble with links embedded in tweets. Clicking on a link opened a new browser tab, but the requested page failed to load. The process got stuck waiting to connect to a URL such as https://t.co/E0R99xtQng. The “t.co” domain gave me a clue to the source of the problem. A long URL (http://bit-player.org/2016/bertrand-russell-donald-trump-and-archimedes, for example) can use up your 140-character quota in a hurry, and so twitterers long ago turned to URL-shortening services such as bit.ly and TinyURL, which allow you to substitute an abbreviated URL for the original web address. The shortening services work by redirection. When your browser issues the request “GET http://bit.ly/xyz123″, what comes back is not the web page you’re seeking but a message such as “REDIRECT http://ultimate.destination.page.com”. The browser then automatically issues a second GET request to the provided destination address.

In 2011 Twitter introduced its own shortening service, t.co. Use of this service is automatic and inescapable. That is, any link included in a tweet will be converted into a 23-character t.co URL, whether you want it to be or not, and even if it’s already shorter than 23 characters. The displayed link may appear to refer to the original URL, but when you click on it, the browser will go first to a t.co address and only afterwards to the true target. Embedded images also have t.co URLs.

A drawback of all redirection services is that they become a bottleneck and a potential point of failure for the sites that depend on them. If t.co goes down, every link posted on Twitter becomes unreachable, and every image disappears. Is that what happened earlier this week when I was having trouble following Twitter links? Probably not; a disruption of that scale would have been widely noted. Indeed, I soon discovered that the problem was quite localized: It plagued all browsers on my computer, but other machines in the household were unaffected.

When I did a web search for “t.co broken links,” I quickly discovered a long discussion of the issue in the Twitter developer forum, with 87 messages going back to 2012. Grouchy complaints are interspersed with a welter of conflicting diagnoses and inconsistent remedies. Much attention focused on Apple hardware and software (which I use). A number of contributors argued that the problem is not in the browser but somewhere upstream—in the operating system, the router, the cable interface, or even the internet service provider.

After a day or two, my problem with Twitter links went away, and I never learned the exact cause. I hate it when that happens, although I hate it more when the problem doesn’t go away. However, that’s not why I’m writing this. What I want to talk about is something I stumbled upon in the course of my troubleshooting. I found a plugin for the Google Chrome browser, Goodbye t.co, that promised to bypass t.co and thereby fix the problem. How could it do that? If t.co is not responding, or if the response is not getting through to the browser, how can code running in the browser make any difference? It seems like tinkering with your television set when the broadcaster is off the air.

The source code for Goodbye t.co is on GitHub, so I took a look. The program is just a couple dozen lines of JavaScript. What I saw there sent me running back to my Twitter feed, to examine the web page using the browser’s developer tools.

Here’s a tweet I posted a few days ago, as it is displayed by the Twitter web site. Note the link to an arXiv paper:

Beckett tweet

And here’s the HTML that encodes that tweet in the web page:

Beckett tweet HTML

The text of the tweet (“A problem in coding theory that comes from a Samuel Beckett play: ”) amounts to 66 characters, plus 25 more for the link (“arxiv.org/abs/1608.06001 “). But that’s not all that Twitter is sending out to my followers. Far from it. The block of HTML shown above is 751 characters, and the complete markup for this one tweet comes to just under 7,000 characters, or 50 times the nominal limit.

Take a closer look at the anchor tag in that HTML block:

Beckett tweet HTML anchor tag

The href attribute of the anchor tag is a t.co URL; that’s where the browser will go when you click the link. But, reading on, we come to a data-expanded-url that gives the final destination link in full. And then that same final destination URL appears again in the title attribute. This explains immediately how Goodbye t.co can “bypass” the t.co service. It simply retrieves the data-expanded-url and sends the browser there, without making the detour through t.co.

I have two questions. First, if you’re going to use a shortened, redirected URL, why also include the full-length URL in the page markup? The apparent answer is: So that the web browser can show the user the true destination. This is clearly the point of the title attribute. When you hover on a link, the content of the title attribute is displayed in a “tooltip.” I’m not so sure about the purpose of the data-expanded-url attribute. It’s surely not there to help the author of Goodbye t.co. Twitter presumably has some JavaScript of its own that accesses that field.

The second question is the inverse of the first: If you’re going to include the full-length URL, why bother with the shortening-and-redirecting rigmarole? Twitter could shut down the t.co servers and doubtless save a pile of money. Those servers have to deal with all the links and images in some 200 billion tweets per year. The use of redirection doubles the number of requests and responses—that’s a lot of internet bandwidth—and introduces delays of a few hundred milliseconds (even when the service works correctly). Note that Twitter could still display a shortened URL within the text of the tweet, without requiring redirection.

Twitter’s own developer documents offer an answer to the second question:

Tens of millions of links are tweeted on Twitter each day. Wrapping these shared links helps Twitter protect users from malicious content while offering useful insights on engagement.

The promise to “protect users from malicious content” presumably means that if I link to a sufficiently sleazy site, Twitter will refuse to redirect readers there, or perhaps will just warn them of the danger. (I don’t know which because I’ve never encountered this behavior.) As for “offering useful insights on engagement,” I believe that phrase could be translated as “helping us target advertising and collect data with potential market value.” In other words, t.co is not just a cost center but also a revenue center. Every time you click on a link within a tweet, Twitter knows exactly where you’re going and can add that information to your profile.


A few months ago, Twitter announced a slight change to the 140-character rule. @handles included in the text will no longer count toward the character total, and neither will images or other media attachments. Some press reports suggested that links would also be excluded from the count, but the official announcement made no mention of links. And t.co redirection is clearly here to stay.

I can suggest two takeaway messages from this little episode in my life as an internaut.

If you want to limit the “insights on engagement” that Twitter accumulates about your activities, you might consider installing a plugin to bypass t.co redirection. There’s an ongoing argument about the wisdom and morality of such actions, focused in particular on ad-blocking software. I have my own views on this issue, but I’m not going to air them here and now.

The other small lesson I’ve learned is that using alternative URL-shortening services with Twitter is worse than pointless. Pre-shrinking the URL has no effect on the character count. It also obscures the true destination from the reader (since the title attribute is “bit.ly/whatever”). Most important, it interposes two layers of redirection, with two delays, two potential points of failure, and two opportunities to collect saleable data. Yet I still see lots of bit.ly and goo.gl links in tweets. Am I missing or misunderstanding something?

Posted in computing | 4 Comments