Analysis: Updating Our Video Tracking System
We've been publishing our video sales charts for a little over five years now. In that time, I've made various tweaks to the algorithms and introduced Blu-ray sales tracking. We're starting 2013 with an overhaul of the tracking system, designed to increase the number of films we track on a weekly basis as well as to enhance the accuracy of the charts. In this article, I'll dig a little into how our tracking works, and what's new with the overhaul.
My goal in designing our video tracking back in 2006 was to have a way of measuring weekly sales for each title in the market. This was a huge task in comparison to our tracking of film box office numbers, which are reported to us each day by the major studios, and twice a week by independent distributors. Maintaining our box office charts involves the relatively simple task of transcribing (either manually or automatically) data on films into the database. In a busy week, we might be tracking about 50 films on a daily basis and 150 films for weekend and weekly numbers.
For video sales, however, there are thousands of titles available at any one time, many in multiple packagings (DVD, Blu-ray, special editions, boxed sets and so on), and no one sends out official sales reports, other than the occasional press release when a studio has a particularly good week, or a distributor or production company reports something in an SEC filing. (We also get some numbers through the grapevine... please don't hold back from sending me numbers if you have any!)
So, what to do?
Well, it was obvious from the start that the most realistic way to construct a chart would be to take a weekly survey of titles and work out their individual market share for the week, and then to multiply the market share for a title by total sales for the week for all titles to get an estimate of how many units that title had sold during the week. For example, if we estimate that Brave had a 20% DVD market share the week of November 18, 2012, and a total of 10 million DVDs were sold that week, then that tells us that the film sold a total of 2 million units. (In all of the following, I am using simplified examples and numbers, by the way. The gory details of the tracking system have to take into account factors not covered here.)
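As a rough illustration of the calculation just described, here is a minimal Python sketch using the simplified numbers from the Brave example. The function name estimate_units and its parameters are made up for this illustration and are not part of the actual tracking system, which takes more factors into account.

```python
def estimate_units(market_share: float, total_units_sold: float) -> float:
    """Estimate a title's weekly unit sales as market share times total market sales.

    market_share is a fraction between 0 and 1 (hypothetical helper, simplified).
    """
    if not 0.0 <= market_share <= 1.0:
        raise ValueError("market_share must be a fraction between 0 and 1")
    return market_share * total_units_sold


# Simplified example from the text: Brave, week of November 18, 2012.
brave_units = estimate_units(market_share=0.20, total_units_sold=10_000_000)
print(f"Estimated units sold: {brave_units:,.0f}")
```

With a 20% share of a 10 million unit week, the estimate comes out at 2 million units, matching the example above.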
As far as calculating the market share for a title goes, we essentially have a "poll of polls": we gather all data we can find about sales for each title and aggregate it into a single market share figure. This is similar to the polls of polls run by news organizations during the presidential elections. More on that shortly.
For the estimate of total sales for all titles, we use three measures: a market size estimate derived from numbers published by the Digital Entertainment Group and Home Media Magazine (let's call that the total market prediction); an estimate based on predicted first week sales for new titles (let's call that the first week prediction); and an estimate based on the sales strength of existing titles (let's call that the returning titles prediction). By combining these measurements, we produce a market size estimate that reflects all the information we have on the expected market size.
While this system worked very well in aggregate (my most recent analysis suggests it was accurate to within 3% over time), it turned out to have some problems for individual weeks. This was because the last two parts of the system -- the "first week" prediction and the "returning titles" prediction -- were not independent.
To understand why, imagine that we start in week one with a high estimate for a title in its opening week. Let's say the model predicts that Brave will sell 2.3 million units on DVD. When we average that number with the other two market size estimates, which both predict 2 million units for Brave, we get an average estimate of 2.1 million units for Brave. If the actual sales were 2 million, that's 5% too high. In week two, our estimate of second week sales for Brave would also be 5% too high (the details are a bit more subtle than that, but it will be roughly 5%). That will increase the "returning titles" estimate for that week a bit -- less than 5%, because we're taking into account lots of titles for that calculation, but let's assume for the sake of argument that we get an estimate 1% too high from the "returning titles" factor.
That's not too terrible in itself, but the problem came with the way the high estimate for first week sales for Brave also fed back into the analysis for first week sales for future titles. Again for the sake of argument, let's assume that it increased the projected sales for The Expendables 2 by 1%. We now have two factors, the "first week" prediction and the "returning titles" prediction, that are skewed too high. Over several weeks, this could mean that our overall chart would drift higher or lower, until a correction was made because the "total market" estimate would eventually pull everything back into line.
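The week-one arithmetic in this example can be checked with a short Python sketch. It assumes the three estimates are combined with a plain unweighted average, which is what the simplified numbers here imply; the real system may combine them differently. The helper names are hypothetical.

```python
from statistics import mean


def combine_estimates(estimates: list[float]) -> float:
    """Combine several estimates with an unweighted average (an assumption)."""
    return mean(estimates)


def percent_error(estimate: float, actual: float) -> float:
    """Signed error of an estimate relative to the actual value, in percent."""
    return (estimate - actual) / actual * 100


# Simplified numbers from the text, in millions of DVD units for Brave:
# a high "first week" prediction plus two other estimates of 2 million.
estimates_millions = [2.3, 2.0, 2.0]
actual_millions = 2.0

combined = combine_estimates(estimates_millions)
error = percent_error(combined, actual_millions)
print(f"Combined estimate: {combined:.1f} million ({error:.0f}% too high)")
```

A single high input of 2.3 million is diluted to 2.1 million by the averaging, but the remaining 5% overshoot is what then carries forward into later weeks.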
You can think of this as having a boat with three oars, two of which are on the same side. These two oars would pull the estimates high or low for a few weeks, and then the other oar (the total market estimate) would give a big tug and bring everything back into line again.
In the new revision to the system, we correct for this imbalance, so the total market estimate has enough strength to keep everything in line.
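One way to picture this kind of rebalancing is as a weighted average in which the total market estimate gets a larger share of the weight. The Python sketch below is purely illustrative: the weights, the function name, and the 10 million unit market are assumptions made for the example, not the actual correction used in the tracking system.

```python
def weighted_market_size(
    total_market: float,
    first_week: float,
    returning: float,
    total_market_weight: float,
) -> float:
    """Blend three market size estimates (hypothetical weighting scheme).

    The total market estimate gets total_market_weight; the remaining weight
    is split evenly between the two correlated predictions.
    """
    if not 0.0 <= total_market_weight <= 1.0:
        raise ValueError("total_market_weight must be between 0 and 1")
    other_weight = (1.0 - total_market_weight) / 2
    return (
        total_market_weight * total_market
        + other_weight * first_week
        + other_weight * returning
    )


# Illustrative week, in millions of units: the total market estimate is on
# target, while both correlated predictions are 1% too high.
actual = 10.0
total_market, first_week, returning = 10.0, 10.1, 10.1

equal = weighted_market_size(total_market, first_week, returning, 1 / 3)
heavier = weighted_market_size(total_market, first_week, returning, 0.6)

print(f"Equal weights:         {equal - actual:+.3f} million off")
print(f"Heavier total market:  {heavier - actual:+.3f} million off")
```

In this toy setup, giving the total market estimate more weight shrinks the overshoot caused by the two predictions that err in the same direction.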
The other innovation in the 2013 iteration of our video tracking is in the way that we conduct the "poll of polls". The main challenge for this is that WalMart does not make its sales numbers available, which means that one needs to estimate their sales in order to produce a complete picture of the market. To return to the election analogy, what we do is similar to what Nate Silver did on his FiveThirtyEight blog, where instead of making predictions based solely on the averages of each candidate's support in polls, he brought in other factors to help improve the accuracy of predictions for particular states.
Our old system already did this based on the categorization of each film: family titles might be adjusted higher because they tend to sell well at WalMart, TV shows adjusted lower because they are less popular at the store, and so on. This was based on analysis of total actual sales versus sales measured by our poll of polls. While this improved accuracy overall, it was, to be honest, a bit of a pain to maintain, and somewhat unsatisfactory in that it was a "one size fits all" solution that didn't reflect performance of individual titles.
For the 2013 video tracking system, we now calculate two "polls of polls," one for online sales and the other for bricks and mortar sales. We then combine those figures to produce an overall market share for each title. For example, if Brave had a 15% market share for online sales and a 25% market share for bricks and mortar sales in its opening week, and online sales account for 10% of the DVD market, we get an overall market share for Brave of (15% x 10%) + (25% x 90%) = 1.5% + 22.5% = 24%. This should further improve the accuracy of the estimates, on the assumption that WalMart behaves similarly to other bricks and mortar retailers like Target or Best Buy.
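The blending of the two polls of polls can be sketched in a few lines of Python, using the simplified Brave numbers above. The function and parameter names are hypothetical, and the real system has to handle details not covered here.

```python
def blended_market_share(
    online_share: float,
    store_share: float,
    online_fraction: float,
) -> float:
    """Combine online and bricks and mortar market shares into one figure.

    online_fraction is the fraction of the total market sold online; the
    rest is assumed to be sold through bricks and mortar retailers.
    """
    if not 0.0 <= online_fraction <= 1.0:
        raise ValueError("online_fraction must be between 0 and 1")
    return online_share * online_fraction + store_share * (1.0 - online_fraction)


# Brave in its opening week: 15% online share, 25% bricks and mortar share,
# with online sales making up 10% of the DVD market.
brave_share = blended_market_share(
    online_share=0.15, store_share=0.25, online_fraction=0.10
)
print(f"Overall market share: {brave_share:.0%}")
```

Because bricks and mortar stores carry 90% of the weight in this example, the blended figure of 24% sits much closer to the 25% store share than to the 15% online share.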
So, to summarize, we have two adjustments to the tracking system: first, a revised total market estimate, and second, a revised "poll of polls."
As you might imagine, revamping the system has taken some time. With something over 50 million measurements in our video tracking system, rerunning analysis has been quite the entertaining data management task. But we're now catching up with the backlog. The Blu-ray chart has already switched fully over to the new tracking system and the DVD chart will follow shortly. We'll be publishing two charts a week of each during January to catch up, and then further extending our charts and analysis during 2013.
One final plea. If you work for a retailer, distributor, production company, or any other organization that has raw data on video sales, rentals, or streaming information, I'd love to talk about how we can incorporate your information into our tracking. I'll be happy to provide access to some of our internal numbers in return, or find some other quid pro quo service we can provide. Every piece of data helps, so I'll be happy to hear from small retailers and producers, as well as large ones. Oh, and WalMart, if you're out there, give me a call...
Bruce Nash firstname.lastname@example.org
Date posted: 2013-01-19