I just read Top 10 Lies Newspaper Execs are Telling Themselves, and it seems to start off as “What is the opposite of what the Boston Globe is doing?” The first few lies are: “We can manage this disruption from within an integrated organization,” while more of Boston.com reports to Boston Globe management than three years ago. “Print advertising reps can sell online ads too,” and the Boston.com sales teams were merged with the print sales teams about four years ago. “We can re-create scarcity by putting up pay walls”; we’ll see how that goes.
Judy Sims is an online media professional, so she may have a particular point of view. I guess over time we’ll see which direction works.
Boston.com announced a redesign of their homepage. They describe it as a “cleaner, leaner homepage.” Since I had copies of their earlier homepages from my previous experiment (fetched from the Wayback Machine), I figured I’d take a look at how their homepage size has changed over time. The size of just the index.html file (not counting embedded images, ads, etc.) looks like this:
and the homepage as I write this is 103,478 bytes. I’m not sure whether the big jump in 2008 was a short-term thing (there were only four days of data for 2008, so it may not be representative), and I don’t have any more recent data, but the most you can say is that they brought the homepage back to 2001-2006 levels.
I’m sure that keeping the homepage under control is a daunting task, politically. According to the boston.com media kit, the homepage gets about a third of the site’s total pageviews. I’m sure there is strong motivation for all the editorial departments and the advertising department to add just one more thing to the page.
It seems that in the past month or so, newspapers have been making some noise about starting to charge for at least some of their content. One of the reports I heard about boston.com was that people were considering charging for the content found in the Boston Globe and leaving the rest available to all readers.
That got me thinking about how much of what gets read on boston.com is Globe content, and also about how that may have changed over the years. I started thinking about how traffic gets to the article pages anyway, and to a great extent the biggest traffic driver to an article page seems to be the homepage. This was something I could measure: how much of the boston.com homepage is allocated to links to Boston Globe articles, how much of it goes to other sources, and whether that has changed over time. (My assumption was that Globe content would increase around 2006, when the boston.com and Boston Globe editorial departments became more closely integrated and Globe reporters started writing more content directly for online. Before that, the sense I got was that Globe content was frequently highlighted for its unique view, but when the news of the day changed from what had been published the night before, the site started putting up more up-to-the-minute wire content from AP or Reuters.)
So I grabbed every copy of the boston.com homepage that existed on the Internet Archive Wayback Machine. I wrote a small script that would read each file and judge each link to be either a Globe link, a non-Globe link, or one that didn’t count (I omitted things like links to other section fronts: the news, sports, etc. pages). I called a link a Globe link if it either was within a section of Globe content (/dailyglobe2/world/…) or the tease associated with the link had an attribution of “(Boston Globe)” or “(Today’s Globe)”. I ran my script over all the homepages, put the results into a spreadsheet to graph them and found….
Practically no difference in the ratio of Globe content vs. other content from about 2001 through 2008 (the last dates for which the Wayback Machine has data). Now I’m wondering what to do next. There may be something that I’m missing (I’m counting every link equally, without giving any link greater relevancy than another, even though links “above the fold” tend to get clicked on more than links near the bottom). It could be that the script I wrote to sort the homepage links into Globe/non-Globe isn’t counting things accurately. Maybe the data is right, but I should start playing around with it in R just to learn it.
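For anyone who wants to try the same measurement, here is a minimal sketch of the kind of classification described above. It is not the original script. The “/dailyglobe2/” path marker and the two attribution strings come from the description; everything else is an assumption made for illustration: that the tease is whatever text runs from a link to the next link, that a section front is any link whose path ends in “/”, and the names LinkCollector, classify and globe_share.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Taken from the description above.
GLOBE_PATH_MARKER = "/dailyglobe2/"
# Both straight and curly apostrophes, since archived pages may use either.
GLOBE_ATTRIBUTIONS = ("(Boston Globe)", "(Today's Globe)", "(Today\u2019s Globe)")


class LinkCollector(HTMLParser):
    """Collect each link's href plus the text from its <a> tag up to the next link."""

    def __init__(self):
        super().__init__()
        self.links = []  # each entry is [href, tease_text]

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append([href, ""])

    def handle_data(self, data):
        # Assumption: text that follows a link belongs to that link's tease.
        if self.links:
            self.links[-1][1] += data


def classify(href, tease):
    """Return 'globe', 'other', or 'skip' for one link."""
    path = urlparse(href).path
    if GLOBE_PATH_MARKER in path or any(a in tease for a in GLOBE_ATTRIBUTIONS):
        return "globe"
    # Assumption: section fronts look like /news/ or /sports/ (path ends in "/").
    if path == "" or path.endswith("/"):
        return "skip"
    return "other"


def globe_share(html):
    """Count link types on one homepage and return (counts, Globe share)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    counts = {"globe": 0, "other": 0, "skip": 0}
    for href, tease in parser.links:
        counts[classify(href, tease)] += 1
    counted = counts["globe"] + counts["other"]
    share = counts["globe"] / counted if counted else None
    return counts, share
```

Running globe_share over each archived homepage and graphing the share by date would give the same sort of ratio. Every link counts equally here, so this sketch has the same blind spot about links above the fold.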
Oh well, I was looking forward to publishing results that supported my hypothesis. Saying that I can’t support it isn’t nearly as much fun, but I figure it’s as much of a story to tell as the other.