A failed experiment

September 1st, 2009

It seems that in the past month or so, newspapers have been making some noise about starting to charge for at least some of their content. One of the reports I heard about boston.com was that people were considering charging for the content found in the Boston Globe and leaving the rest available to all readers.

That got me thinking about how much of what gets read on boston.com is Globe content, and about how that may have changed over the years. I started thinking about how traffic gets to the article pages anyway, and to a great extent the biggest traffic driver to an article page seems to be the homepage. This is something I could measure: how much of the boston.com homepage is allocated to links to Boston Globe articles, how much of it goes to other sources, and whether that has changed over time. (My assumption was that Globe content would increase around 2006, when the boston.com and Boston Globe editorial departments became more closely integrated and Globe reporters started writing more direct-to-online content. Before that, the sense I got was that Globe content was frequently highlighted for its unique view, but when the news of the day changed from what was published the night before, the site put up more up-to-the-minute wire content from AP or Reuters.)

So I grabbed every copy of the boston.com homepage that existed on the Internet Archive Wayback Machine. I wrote a small script that would read each file and judge each link to be either a Globe link, a non-Globe link, or one that didn’t count (I omitted things like links to other section fronts: the news, sports, etc. pages). I called a link a Globe link if it either was within a section of Globe content (/dailyglobe2/world/…) or the tease associated with the link had an attribution of “(Boston Globe)” or “(Today’s Globe)”. I ran my script over all the homepages, put the results into a spreadsheet to graph them and found….

Practically no difference in the ratio of Globe content vs. other content from about 2001 through 2008 (the last dates that the Wayback Machine has data). Now I’m wondering what to do next. There may be something that I’m missing (I’m not giving any link greater weight than another, even though links “above the fold” tend to get clicked on more than links near the bottom). It could be that the script I wrote to parse the homepage into globe/noglobe links isn’t counting things accurately. Maybe the data is right, but I should start playing around with it in R just to learn it.
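For anyone curious what the classification rule looks like in code, here is a minimal sketch in Python. It is not the original script. It assumes the links have already been pulled out of each archived homepage as (href, tease) pairs, the section-front list is a made-up placeholder, and the optional position weighting is just one guess at how “above the fold” links might be counted more heavily.

```python
from urllib.parse import urlparse

# Hypothetical placeholder: the real list of section fronts is not given in the post.
SECTION_FRONTS = {"/", "/news/", "/sports/"}
# Path marker and attributions as described in the post.
GLOBE_PATH_MARKER = "/dailyglobe2/"
GLOBE_ATTRIBUTIONS = ("(Boston Globe)", "(Today's Globe)")


def classify_link(href, tease):
    """Label one homepage link as 'globe', 'other', or 'skip'."""
    path = urlparse(href).path
    if path in SECTION_FRONTS:
        return "skip"  # section fronts don't count either way
    if GLOBE_PATH_MARKER in path:
        return "globe"
    # Treat curly and straight apostrophes the same when matching attributions.
    normalized = tease.replace("\u2019", "'")
    if any(tag in normalized for tag in GLOBE_ATTRIBUTIONS):
        return "globe"
    return "other"


def globe_share(links, weight=lambda position: 1.0):
    """Fraction of counted links that are Globe links.

    `links` is a list of (href, tease) pairs in page order. `weight` maps a
    link's position on the page to its weight; the default counts every
    link equally, which is what the post describes.
    """
    globe = total = 0.0
    for position, (href, tease) in enumerate(links):
        label = classify_link(href, tease)
        if label == "skip":
            continue
        w = weight(position)
        total += w
        if label == "globe":
            globe += w
    return globe / total if total else 0.0
```

Passing something like `weight=lambda position: 1 / (position + 1)` would make links near the top of the page count for more. That particular weighting is arbitrary, so the resulting numbers would only be useful for comparing one homepage snapshot against another.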

Oh well, I was looking forward to publishing results that supported my hypothesis. Saying that I can’t support it isn’t nearly as much fun, but I figure it’s as much of a story to tell as the other.

Screenscraping is tough

August 9th, 2009

I just read about a site called The Book Seer which does book recommendations. So I went back through the last three books that I finished (which were all very different from each other, so I wanted to see how it would respond):

Fun Home, oddly enough, couldn’t get any book recommendations from Amazon. Some of the recommendations it did find seemed to be geared more toward its graphic novel-ness and less toward its content (can someone see a connection between Frank Miller’s Sin City and Alison Bechdel’s Fun Home besides that they both have lots of pictures?)

Angels & Demons suggested most of Dan Brown’s other works, which isn’t very surprising. (My brother lent it to me well before the movie came out, and it just reached the priority level in the pile. We were discussing Bruce Schneier’s post about Hacking a Papal Election and the connection between security and tradition. I also meant to read it before I saw the film, but now it’s too late, at least for the theaters.)

Sacred Attunement was the oddest of all though. I got a whole bunch of recommendations labeled “attribute error at line …”.

The next six years will be fun.

July 16th, 2009

Huffington Post had a story earlier today, Franken’s First Time Around: Comedian-Turned-Senator Questioned “Clarence Thomas” In 1991 SNL Sketch, and it occurred to me: every time Franken does something, somebody is going to drag out an old SNL sketch about the subject. If he does something on health care, people will drag out the “Brain Tumor Comedian” bit. If he does something about aging, people will pull out some appropriate Stuart Smalley piece.

I’m not all that familiar with the current cast. When SNL spoofs a Senate committee hearing, who are they going to pick to play Franken?

It’s those darn bloggers, they ruin everything

July 15th, 2009

Last night (July 13, 2009) on The Late Late Show with Craig Ferguson, Connie Schultz was a guest. She talked a bit about journalism:

I’ve been writing about this lately. I think there might be somewhat of a solution. In fact as I was riding in today there was a, a judge ruled … there was a court ruling in February. Basically what I’m saying is these bloggers; (to Ferguson) You know bloggers, you and I have so much fun with bloggers. They tend to take our work for free and they interfere with our advertising rates online and stuff. I’m worried about copyright law. We’re working to change it. I feel like I’m rambling and here is what I’m trying to say. Today there was a court case for the AP, actually the court case was in February, a judge said ‘yeah, you know what blog aggregators, you can’t just take their work for free’ and today they settled out of court for a large amount of money, which is really good news.

Saying that most blogs are copyright infringement seems out of line. Although it does exist, it’s a small minority of some rarely read blogs. It’s actually not all that hard to find them, either. Here are all the people who copied Connie Schultz articles. (There are a few false positives in there, but if you searched once a day, week, or month and kept your eye out for new articles that weren’t there before, you could wind up with a reasonably small set of cease and desist letters to send out each time period.) I can see how the argument could apply to news aggregators (Google News, Drudge Report, etc.) but are bloggers really the problem?
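The “search periodically and watch for new hits” idea boils down to comparing the latest result list against everything seen so far. A rough sketch in Python, assuming the search results are already available as lists of URLs (how they get fetched is left out, and the function name is made up for illustration):

```python
def new_matches(previous_urls, current_urls):
    """Return URLs from the latest search that were not seen in earlier runs.

    Order follows the current results, and duplicates are reported once.
    """
    seen = set(previous_urls)
    fresh = []
    for url in current_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Each run’s results would then be added to the stored list, so only the genuinely new pages (false positives included) need a human look.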

My second problem is with the “Hot News Doctrine”: she’s talking about changing copyright law (but giving no details on how) while at the same time showing how existing law and legal decisions seem to be succeeding at what she wants.

And to show how dedicated she is to copyright law itself (and not just how it can be expanded to help make her employer more money), just before the speech quoted above she pointed out this Huffington Post page: Craig Ferguson’s Best Musical Numbers: Which Is Your Favorite? (VIDEO) (POLL), which has a bunch of YouTube clips of the show which are obviously copyright infringement. (CBS doesn’t upload their content to YouTube; they have their own video playback and embedding service on cbs.com.) This may be only slightly mitigated by the fact that Craig said during this segment that he didn’t care what got posted from the show after it aired. (“They can have mine. I don’t care. I do it, it’s done.” Although I’m not sure he has all of the rights to the content to tell anyone that.) What a way to practice what you preach.

So I went to her web site at cleveland.com and found her recent articles. In them she does indeed talk about news aggregator services and not blogs themselves. (She must have a good editor.) In Tighter copyright law could save newspapers she points to lawyers who want minor changes to copyright law to make the hot news doctrine stronger. She has a follow-up article, Idea that would help save newspapers makes bloggers howl, where she picks a couple of the most outlandish rebuttals and takes them down as straw men. (Most articles I found referencing the article were much more measured than the two she decided to use as examples.)

Unfortunately, although I want to see journalism find its way into the future, I’m really annoyed that yet another group of content publishers wants to further restrict copyright and take away uses of the work that were once legal. The movie studios wanted the DMCA and got it. The music publishing industry wanted the Sonny Bono Copyright Term Extension Act and got it. Now Ms. Schultz wants the newspaper industry to push for further restrictions.

Are the newspapers going to hold themselves to the same standard? When bloggers break a story will web sites run by newspapers wait for the original author to derive a competitive benefit from it?

Update:

I guess I somehow missed the party as it was going on. Tonight I read about the “pipsqueaks” comment from the Cleveland Plain Dealer’s reader representative, etc.

I don’t know much about Twitter

July 6th, 2009

… but why would someone post a link to “about:blank“?

obsolete skills

July 3rd, 2009

Did anyone else see that article from a few days ago where BBC magazine had a kid give up his MP3 player for a Sony Walkman? After reading that article, and after talking about it with my older daughter, we wound up in a similar situation. Yesterday she started being tutored for her Torah portion for her Bat Mitzvah next June.

Her tutor sends home tapes for people to practice with. She doesn’t have the capability to make CDs yet. My first idea was to send her with a laptop and Audacity. The record/stop button interface of Audacity should be familiar to the tutor. My backup plan was to create MP3s off of the tape. I forgot about the laptop until my daughter texted me about needing a tape recorder, so on to the backup plan.

Until I can make the MP3s, she’s using an old portable tape recorder (some sort of late ’80s Walkman-style portable tape player), and it was amusing to see all of the things that I’d take for granted that she had trouble figuring out:

The first issue was to figure out which side of the tape it was recorded on. This involved playing the tape on each side, rewinding, fast-forwarding, playing with the volume control, and unfortunately switching from “tape” to “fm” while the volume was at full. Eventually she figured out which side the recording was on and where on the tape. Then she labeled it and told me that it was an unfortunate design (at which point I pointed out to her that the cassette did have sides labeled “A” and “B”, which she hadn’t noticed).

The next issue was how to fit that cardboard track listing sleeve into the plastic cassette holder. She had it folded L-shaped and not J-shaped. It didn’t fit in right and was too tall. Once I pointed out that there were two creases in the cardboard, she figured it out.

It’s been a while since I’ve needed a tape player, but I learned these skills a long time ago. To my daughter, this is all brand new and there is a learning curve.

The connection between the artist and tools

June 29th, 2009

Can you picture an artist who claims that they can’t sketch/draw/paint/sculpt or otherwise use tools to manipulate their medium? Or an author who can’t write or type? Who, without using these tools, can still create world famous works?

The recent news of Michael Jackson passing away keeps bringing me to this question. The news keeps reminding me of Jackson, and when I think of Jackson, I think about the plagiarism lawsuit over The Girl Is Mine. In it Jackson claims that although he doesn’t know how to play an instrument, he composes complete renditions of his songs fully formed out of his head (melody, harmony, counterpoint, rhythm) through singing and scatting into a tape recorder. No rough drafts, rewrites, editing etc. The tapes then go to Quincy Jones, who arranges the piece and writes out parts for the studio musicians.

I find it hard to believe that someone can have so great a musical mind that they can conceive of these great pieces, and yet not be able to work out that hitting this or that key on a keyboard matches the pitch going on inside their head. I would think that eventually tying the notes of those melodies to their names would be necessary, if only as a temporary holding space to work out the rest of it. (Or more accurately, since I know of people with physical disabilities creating works under extraordinary difficulty, such as Christy Brown painting with his left foot or Richard Stallman dictating code to a transcriber: the effort to create without learning a tool seems far more burdensome than actually learning it.)

Although I think the story that was given seems implausible, I’m not sure what the true story is. It could be that the work was transformed by musicians piece by piece so that Jackson didn’t realize that it was thoroughly transformed. I’ve known some very clever graphic designers who could do that. You start by giving your idea of what you want. They already know how they want it to end up, and all the development iterations involve convincing you that their changes are a refinement of your idea. It could be that his private deals with unknown songwriters require them to give up composer credit for their work. Whatever the real story is, once I decided he was lying about how his music was composed, I decided that the potential that other statements were lies was high.

Maybe it’s just me though. There have been too many times I’ve been in disagreements with people who feel that the tools they need to use are “computer stuff” and so they can remain oblivious to it: Salesmen asking me for help setting up an LCD projector and PowerPoint. Web content producers not understanding or caring that a (table embedded in a table embedded in a table)^n embedded in a JavaScript document.write() will not render quickly no matter how many servers are added. Or that when the web server you are using treats URLs case-sensitively, you shouldn’t put the wrong URL on a billboard facing the Southeast Expressway. Where is the line though? At what point does something become a system needing a technologist to extend or maintain, and at what point does a content producer need to accept responsibility to understand the tools of their art?

The inverted pyramid of news reporting

June 25th, 2009

One time Penn Jillette was recounting an anecdote on his radio show. He was on an airplane, and boarding after him was a notable stand-up comic. After greeting each other, the comic started asking questions about his career. “How many times did you do the Tonight Show?” “Oh, about three or four times so far.” “How many appearances on Letterman? Your New York show, is that on Broadway or an off-Broadway show?” After answering his questions the comic muttered, “Just great. If this plane goes down, the article will read ‘Penn Jillette, comedian and magician noted for his Broadway show and his late night talk show appearances, died in a plane crash. Also on board was …’”
A slightly different collection of on-screen credits and Jillette would have been relegated to the “also on board were magician Penn Jillette and 175 other passengers.”

A slightly different collection of credits, and Farrah Fawcett’s passing would be all over the news tonight. Ed McMahon and David Carradine have dropped off of everyone’s notice entirely.

clichéd analogies

June 24th, 2009

I got told one of those “all of the other kids…” lines and I responded with “if all the other kids jumped off a bridge, would you?”
The reply I got was “if all the other parents used the same lame analogy, would you?”

Engineering put into perspective

June 4th, 2009

I’ve had this link hanging around for a while, wanting to put it into a story. http://briandavidphillips.typepad.com/brian/2008/10/how-to-fold-a-t.html I still haven’t written anything about it, so I might as well post it as is. Sometimes the best engineered solutions aren’t the best solutions.