Geotargeted by the NY Times

Reading the Times this morning, I got to the middle of a story about the cost of medical care when I was stopped by this passage:

Consider Boston, our best guess for where you might be reading this article. It’s very expensive for spending on the average Medicare patient. But, when it comes to private health insurance, it’s about average.

Their best guess about my whereabouts was pretty good. I am indeed in the Boston area. I’m not surprised that they know that, but I was nonplussed to discover that they had altered the story according to my location. Here’s the markup for that paragraph:

<div class="g-insert">
<p class="g-body">
Consider <span class="g-custom-place g-selected-hrr-name">
Boston</span>
<span class="g-geotarget-success">, our best 
guess for where you might be reading this article</span>. 
<span class="g-new-york-city-addition g-custom-insert g-hidden"> 
(Here, the New York City region includes all boroughs but the 
Bronx, which is listed separately.)</span>

It’s <span class="g-medicare-adjective g-custom-place">
very expensive</span> 
for spending on the average Medicare patient. 
<span class="g-local-insert g-very-different">
But, w</span><span class="g-close g-same g-hidden g-local-insert">
W</span>hen it comes to private health insurance, it’s 
<span class="g-hidden g-same g-local-insert">also</span> 
<span class="g-private-adjective g-custom-place">
about average</span>. 

<span class="g-close g-hidden g-local-insert">
The study finds that the levels of spending for the two programs 
are unrelated. That means that, for about half of communities, 
spending is somewhat similar, like it is in 
<span class="g-custom-place g-selected-hrr-name g-hrr-only-no-state">
Boston</span>
</span>

<span class="g-same g-hidden g-local-insert g-same-sentence">
<span class="g-custom-place g-hrr-only-no-state">Boston</span> 
is one of the few places where spending for both programs 
is very similar – in most, there is some degree of mismatch.
</span>

<span class="g-new-york-addition g-local-insert g-hidden">
Several parts of the New York metropolitan area are outliers 
in the data – among the most expensive for both health 
insurance systems.</span>

<!-- <span class="g-atlanta-addition g-local-insert g-hidden">
(Atlanta is one of the few places in the country where spending 
for both programs is very similar. In most, there is some 
degree of mismatch.)</p> -->
</p>
</div>

It seems I live in a g-custom-place (“Boston”), which is associated with a g-medicare-adjective (“very expensive”) and a g-private-adjective (“about average”). Because the Times has been able to track me down, I get a g-geotarget-success message. But for the same reason I don’t get to see certain other text, such as a remark about New York as an outlier; those text spans are g-hidden.

Presumably, a program running on the server has located me by checking my IP address against a geographic database, then added various class names to the span tags. Some of the class names are processed by a CSS stylesheet; for example, g-hidden triggers the style directive display: none. The other class names are apparently processed by a JavaScript program that inserts or removes text, such as those custom adjectives. Generating text in this way looks like a pretty tedious and precarious business, a little like writing poetry with refrigerator magnets. For example, an extra set of class names is needed to make sure that if a place is first mentioned as a city and state (e.g., “Springfield, Massachusetts”), the state won’t be repeated on subsequent references.
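To make the refrigerator-magnet logic concrete, here is a minimal sketch of how the visible sentence might be assembled. It is written in Python rather than the JavaScript the Times presumably uses, and it is not their code. The Region record, its field names, and the render_paragraph function are all hypothetical; the show/hide rules are only inferred from the class names in the markup above (g-very-different, g-close, g-same).

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical per-region record; the field names are assumptions."""
    name: str
    medicare_adjective: str   # fills the g-medicare-adjective span
    private_adjective: str    # fills the g-private-adjective span
    relation: str             # "very-different", "close", or "same"


def render_paragraph(region: Region, geotargeted: bool = True) -> str:
    """Assemble the visible text of the paragraph for one region."""
    if region.relation not in ("very-different", "close", "same"):
        raise ValueError(f"unknown relation: {region.relation}")

    # g-geotarget-success: shown only when the location lookup succeeded.
    guess = (", our best guess for where you might be reading this article"
             if geotargeted else "")

    # "But, w" carries g-very-different; the plain "W" carries g-close and g-same.
    opener = "But, when" if region.relation == "very-different" else "When"

    # "also" carries g-same only.
    also = "also " if region.relation == "same" else ""

    return (
        f"Consider {region.name}{guess}. "
        f"It's {region.medicare_adjective} for spending on the average "
        f"Medicare patient. {opener} it comes to private health insurance, "
        f"it's {also}{region.private_adjective}."
    )


boston = Region("Boston", "very expensive", "about average", "very-different")
print(render_paragraph(boston))
```

With the Boston values taken from the markup, this reproduces the paragraph I was served. Changing relation to "same" swaps the opener to “When” and adds “also,” as the g-same spans suggest it should.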

I suppose there’s no great harm in this bit of localizing embellishment. After all, they’re not tailoring the article based on whether I’m black or white, male or female, Democrat or Republican, rich or poor. It’s just a geographic split. But it makes me queasy all the same. When I cite an article here on bit-player, I want to think that everyone who follows the link will see the same article. It looks like I can’t count on that.

Update: Turns out this is not the Times’s first adventure in geotargeting. A story last May on “paths out of poverty” used the same technique, as reported by Nieman Lab. (Thanks to Andrew Silver (@asilver360) for the tip via Twitter.)

Update 2015-12-17: Margaret Sullivan, the Public Editor of the Times, writes today on the mixed response to the geotargeted story, concluding:

The Times could have quite easily provided readers with an opt-in: “Want to see results for your area? Click here.”

As the paper continues down this path, it’s important to do so with awareness and caution. For one thing, some readers won’t like any personalization and will regard it as intrusive. For another, personalization could deprive readers of a shared, and expertly curated, news experience, which is what many come to The Times for. Losing that would be a big mistake.


3 Responses to Geotargeted by the NY Times

kktkkr says:

15 December 2015 at 11:44 pm

The paragraph mentioned is customized so heavily that it becomes nonsense if the script is blocked entirely:

Consider . Spending on Medicare patients is in this area. hen it comes to private health insurance, spending is .

This also reveals more geotargeting further on in the article.

As an example of the havoc dynamic page generation can cause, consider one of Randall Munroe’s April Fools jokes: Umwelt (discussion), which serves up a different comic out of ~100 available depending on location/ISP, browser version, referrer header and window size. It’s impossible to link to one specific version without taking a screenshot and/or reuploading the image.

Geotargeting is particularly annoying because it works even on a fresh browser without any prior browsing, is really difficult to manipulate, and can be almost invisible in the case of this article. If the wording had been different for Canada versus the US, for instance, it would become a geopolitical split rather than just a geographic one.

Wolfgang says:

18 December 2015 at 6:52 pm

I share your ambivalent feelings. On the one hand, it’s totally cool to think of a highly personalized storytelling feature in the news, maybe even some day connecting to your health monitor and skipping news that would embarrass you, raise your pulse, etc. On the other hand, it is really creepy and, indeed, intrusive, because it makes the judgements for you, taking away your personal liberties. Maybe you belong to a certain classification (“white, male, leftist, over 40,” etc.) but you yourself do not subscribe to it. Nevertheless you get “personalized” information, and maybe none at all of the information you would be interested in, because that interest cannot be known to the algorithm, which always wants the “best” for you. Then again, maybe this state is already present when it comes to classical propaganda, which is intrusive and selective in exactly the same way?

Brian Petersen says:

24 December 2015 at 12:03 pm

It’s a troubling trend. One solution is dynamically redirecting the reader from the article to a URL with the tailored content.

In this example, a GET to /my-article would result in a 3XX response to a geo-appropriate URL like /my-article?location=… that your browser could then follow. This would at least allow you to share a link to the resource/content you consumed. Unfortunately, most of the 3XX status codes aren’t entirely appropriate for this use case, either wreaking havoc on caching mechanisms (which is a big deal for large content providers like NYT) or resulting in undesired permanent behavior (e.g., instructing your browser to always follow the same redirect, independent of your current location).
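A rough sketch of that redirect decision, in Python, might look like the following. Everything here is an assumption made for illustration: the function name geo_redirect, the region value (which would come from whatever IP lookup the server uses), and the choice of 307 Temporary Redirect with Cache-Control: no-store as one way to avoid the permanent-redirect behavior described above. As noted, that choice gives up caching of the redirect itself.

```python
from urllib.parse import urlencode


def geo_redirect(path, query, region):
    """Decide how to answer a GET for an article.

    path   -- request path, e.g. "/my-article"
    query  -- dict of query parameters already on the request
    region -- hypothetical region slug from a server-side IP lookup

    Returns a (status, headers) pair.
    """
    if "location" in query:
        # The URL already names a location: serve that version as-is,
        # so a shared link shows every reader the same content.
        return 200, {}

    # No location yet: send the browser to a location-specific URL.
    # 307 is temporary, and no-store asks caches not to keep the redirect,
    # so a reader who moves is looked up again next time.
    target = path + "?" + urlencode({"location": region})
    return 307, {"Location": target, "Cache-Control": "no-store"}


print(geo_redirect("/my-article", {}, "boston"))
print(geo_redirect("/my-article", {"location": "boston"}, "chicago"))
```

The second call shows the point of the scheme: once the location is in the URL, the reader's own whereabouts no longer affect what is served.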

This is a much bigger problem for complex dynamic web applications where the state maintained on the client is much more comprehensive and cannot be easily serialized as part of the URL. When I’ve run into this problem in the past, I’ve insisted that the client is responsible for sharing the content directly with interested parties (e.g., you’d host the HTML that was rendered by NYT on your blog), but then of course there’s an issue with data integrity and authenticity, which we “solved” by inventing an X-Authored-By HTTP header whereby the content author could digitally sign the content. I’m not holding my breath for this hack to catch on more generally, but it’s worked for us in the past. :)
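The details of how X-Authored-By was computed aren't given here, so the following is only an illustration of the general idea of attaching a verifiable tag to rendered HTML. It uses HMAC-SHA256 from the Python standard library as a stand-in; a real scheme would presumably use a public-key signature so that readers could verify the content without sharing a secret with the author. The function names and the key are hypothetical.

```python
import hashlib
import hmac


def sign_content(html: str, key: bytes) -> str:
    """Return a hex tag for the rendered HTML (stand-in for a signature)."""
    return hmac.new(key, html.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_content(html: str, key: bytes, tag: str) -> bool:
    """Check that the HTML still matches the tag it was published with."""
    expected = sign_content(html, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


key = b"hypothetical-shared-secret"
page = "<p>Consider Boston ...</p>"
tag = sign_content(page, key)          # value that would go in X-Authored-By

print(verify_content(page, key, tag))                       # unmodified copy
print(verify_content(page.replace("Boston", "Atlanta"), key, tag))  # altered copy
```

Any change to the rehosted HTML, even a single swapped place name, makes verification fail, which is the integrity property the header was meant to provide.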