Project Page
This semester's project is on modeling movie gross income: is it
exponential decay? Details to come.
Data from the Internet Movie Data Base.
Here is the data on the top 10 movies of 1999 and 2003. Besides the raw
data, a green line was constructed using the publicly available program
gnuplot; it is a non-linear fit to exp(a*x+b). You can see that in many
(most?) cases the decaying exponential is not a bad fit to the data. Some
movies open only in NY and LA, and a few others build an audience by word
of mouth. By and large, though, blockbusters have a huge advertising
budget to make the opening week the biggest-grossing week, and the
following weeks fall off at a more-or-less constant percentage rate.
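gnuplot's fit command does a non-linear least-squares fit (its documentation describes a Marquardt-Levenberg algorithm). As a rough, self-contained sketch of the same idea, the function below fits exp(a*x+b) by plain Gauss-Newton iteration, a simpler relative of Marquardt-Levenberg, starting from a straight-line fit to the logs. The names fit_exp and sse are ours, and the weekly numbers are synthetic, not IMDb data.

```python
import math


def fit_exp(xs, ys, iterations=50):
    """Least-squares fit of y = exp(a*x + b) by Gauss-Newton iteration.

    The starting guess is a straight-line fit to (x, ln y); the iteration
    then adjusts (a, b) to minimize the squared error in y itself.
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mx, ml = sum(xs) / n, sum(logs) / n
    a = (sum((x - mx) * (ly - ml) for x, ly in zip(xs, logs))
         / sum((x - mx) ** 2 for x in xs))
    b = ml - a * mx
    for _ in range(iterations):
        f = [math.exp(a * x + b) for x in xs]          # model values
        r = [y - fi for y, fi in zip(ys, f)]           # residuals
        # Jacobian columns: df/da = x*f, df/db = f. Build J^T J and J^T r.
        jaa = sum((x * fi) ** 2 for x, fi in zip(xs, f))
        jab = sum(x * fi * fi for x, fi in zip(xs, f))
        jbb = sum(fi * fi for fi in f)
        ga = sum(x * fi * ri for x, fi, ri in zip(xs, f, r))
        gb = sum(fi * ri for fi, ri in zip(f, r))
        det = jaa * jbb - jab * jab
        da = (jbb * ga - jab * gb) / det               # solve the 2x2 system
        db = (jaa * gb - jab * ga) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors of exp(a*x + b) against the data."""
    return sum((y - math.exp(a * x + b)) ** 2 for x, y in zip(xs, ys))


# Synthetic weekly grosses (made up): exp(-0.3*x + 2) with a +/-5% wobble.
weeks = list(range(1, 11))
grosses = [math.exp(-0.3 * x + 2) * (1 + 0.05 * (-1) ** x) for x in weeks]
a, b = fit_exp(weeks, grosses)
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse(weeks, grosses, a, b):.5f}")
```

Because it minimizes error in the grosses themselves, this fit is dominated by the big early weeks; the straight-line fit to the logs treats every week's relative error equally.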
- **This needs to be reworded, perhaps made more vague?**
Assume that a movie that opens nationwide has a gross that can be
modeled by A*exp(-k*t), where t is the time in weeks since the movie
opened and k > 0 is a constant, and that the movie stays open forever
after. Find expressions for the gross after the first week and for the
total gross, and make a table of the ratio of total gross to
opening-week gross for typical values of k.
- **Sophomore slump.** Unusually large drop-offs after the first week.
Can this be explained by a sum of two decay functions, one faster than
the other?
- More to come?
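For the first question, one way to check a table numerically: under the model above the opening week's gross is the integral of A*exp(-k*t) from 0 to 1, A*(1 - exp(-k))/k, and the total gross is the integral from 0 to infinity, A/k, so their ratio is 1/(1 - exp(-k)) and does not depend on A. The sketch below tabulates it; the function name and the k values are illustrative choices, not fits to any movie.

```python
import math


def total_to_opening_ratio(k):
    """Ratio of total gross to opening-week gross for gross A*exp(-k*t).

    Opening week:  integral_0^1 A*exp(-k*t) dt   = A*(1 - exp(-k))/k
    Total:         integral_0^inf A*exp(-k*t) dt = A/k
    The factors A and 1/k cancel, leaving 1/(1 - exp(-k)).
    """
    return 1.0 / (1.0 - math.exp(-k))


# Illustrative decay constants (per week), not fitted to any particular movie.
for k in (0.2, 0.3, 0.4, 0.5, 0.7, 1.0):
    drop = 1.0 - math.exp(-k)  # fractional fall of each week's gross vs. the previous week
    print(f"k = {k:4.2f}   weekly drop {drop:5.1%}   "
          f"total/opening = {total_to_opening_ratio(k):5.2f}")
```

The same number falls out of summing the weekly grosses as a geometric series with ratio exp(-k), which is a handy cross-check.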
Why would exponential decay be a good model of box office income?
The best-known example of exponential decay is radioactivity. One
supposes a large number of atoms. Over a short time interval each
atom has a certain probability of decaying. Each atom is an independent
agent and decides to decay or not by the flipping of some nano-coin.
Thus the rate of decrease of y, the number of atoms, is proportional
to the number of atoms: y' = -ky. This ODE can be solved, and all
solutions have the form y = A*exp(-k*t).
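The coin-flipping picture can be simulated directly. This is a minimal sketch, assuming discrete time steps and a fixed per-step decay probability p (both names are ours); since each survivor lasts a step with probability 1 - p = exp(-k), the counts should track n0*exp(-k*t).

```python
import math
import random


def simulate_decay(n_atoms, p, steps, seed=0):
    """Each surviving atom independently decays with probability p per step.

    Returns the number of survivors after each step (index 0 is the start).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = [n_atoms]
    alive = n_atoms
    for _ in range(steps):
        # One "nano-coin" flip per surviving atom.
        alive = sum(1 for _ in range(alive) if rng.random() >= p)
        counts.append(alive)
    return counts


n0, p = 100_000, 0.1
counts = simulate_decay(n0, p, steps=20)
k = -math.log(1 - p)  # survival per step is 1 - p = exp(-k)
for t in (0, 5, 10, 20):
    print(t, counts[t], round(n0 * math.exp(-k * t)))
```

With this many atoms the random counts stay within a percent or two of the exponential curve, which is the point: many independent coin flips produce smooth exponential decay.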
By analogy, we can think of the collection of people who will see
the movie as independent agents. Over short time periods, each person
independently decides whether to see the movie by the flipping
of some coin. The people who flip the coin are exactly the population
that will see the film, not all people. The size of this population
is what determines the eventual gross. The coin-flipping probability
just determines the "slope" of the curve, which gives k, and then
A can be computed from k and the total gross.
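Spelled out, with G_total as our name for the total gross and p as an assumed per-week probability that a member of the remaining audience buys a ticket: the total gross is the integral of the gross curve, which gives A from k, and if the coins are flipped once a week, the fraction of the audience still to come after t weeks, (1-p)^t, must match exp(-k*t).

```latex
G_{\text{total}} = \int_0^{\infty} A\,e^{-kt}\,dt = \frac{A}{k}
\qquad\Longrightarrow\qquad A = k\,G_{\text{total}},
\qquad\qquad
(1-p)^{t} = e^{-kt} \;\Longrightarrow\; k = -\ln(1-p).
```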
How to see that the data is exponential
Plot the ln (or log_10) of the gross versus the time. Exponential decay
shows up as data that mostly lie along a straight line. You can
buy semi-log graph paper, which automatically does the log function
for you. Since ln y = ln(A*exp(-k*t)) = -k*t + ln A, the slope of this
line is -k and its intercept is ln A. (With log_10 the slope is
-k*log_10(e) instead.) This link gives directions
for finding these constants using a TI-89.
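Without a TI-89, the same constants come from an ordinary least-squares line through (t, ln y). A minimal sketch; fit_log_linear is our name, and the grosses are generated from made-up values A = 5e7, k = 0.35 so the answer is known in advance.

```python
import math


def fit_log_linear(ts, ys):
    """Fit ln y = ln A - k*t by ordinary least squares.

    Returns (k, A): minus the slope, and exp of the intercept.
    """
    n = len(ts)
    logs = [math.log(y) for y in ys]     # semi-log: take ln of each gross
    mt, ml = sum(ts) / n, sum(logs) / n
    slope = (sum((t - mt) * (ly - ml) for t, ly in zip(ts, logs))
             / sum((t - mt) ** 2 for t in ts))
    intercept = ml - slope * mt
    return -slope, math.exp(intercept)


weeks = [1, 2, 3, 4, 5, 6]
# Synthetic grosses generated from A = 5e7, k = 0.35 (made-up values).
grosses = [5e7 * math.exp(-0.35 * t) for t in weeks]
k, A = fit_log_linear(weeks, grosses)
print(k, A)
```

On Python 3.10 or later, statistics.linear_regression(weeks, logs) returns the same slope and intercept.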
Sometimes the double population model is a better fit
Going back to the radioactive example, suppose there are two radioactive
isotopes with different half-lives. On a semi-log plot, the observed decay
curve will then have two roughly straight segments: an initial, steeper one
dominated by the shorter-half-life isotope, and, once essentially only the
longer-half-life isotope remains, a second segment with a less steep slope.
One can think of the moviegoing public as divided into two groups,
one of which is much more likely to see the movie sooner. Again we
look only at the people who will purchase tickets, but we have
subdivided this population.
The curve would be of the form A_0*exp(-k_0*t) + A_1*exp(-k_1*t). For
example, here is one attempt. (Perhaps a three-audience model would be
better for this example?)
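To see the two segments, one can evaluate the slope of ln(gross) for the two-population curve at a few times. A sketch with made-up parameters (a large audience decaying at k_0 = 1.2 per week and a smaller one at k_1 = 0.2); the helper names are ours. Early on the slope is steep and mixes both rates; later it settles near -k_1.

```python
import math


def two_population(t, a0, k0, a1, k1):
    """Gross for two audiences decaying at rates k0 (fast) and k1 (slow)."""
    return a0 * math.exp(-k0 * t) + a1 * math.exp(-k1 * t)


def log_slope(func, t, h=1e-4):
    """Slope of ln(func) at t by central difference: what a semi-log plot shows."""
    return (math.log(func(t + h)) - math.log(func(t - h))) / (2 * h)


# Made-up parameters: a large fast-moving audience and a smaller slow one.
params = dict(a0=4e7, k0=1.2, a1=5e6, k1=0.2)


def f(t):
    return two_population(t, **params)


for t in (0.5, 1, 2, 4, 8, 12):
    print(f"t = {t:4}   slope of ln(gross) = {log_slope(f, t):6.3f}")
```

A sum of decaying exponentials always has a slope on the log scale that increases (flattens) with time, so a semi-log plot that bends upward is one sign that a single k is not enough.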
Word of mouth and increasing grosses
Some movies grow an audience. First assume the best k > 0 has been
found using the data after things have settled into the usual exponential
decay. We can estimate the size of the audience a_n at the beginning
of the n-th week by setting g_n = a_n * (integral of k*exp(-k*t) dt from
0 to 1) = a_n*(1 - exp(-k)), where g_n is the gross for week n.
With a fixed audience A, a_1 = A, a_2 = A - g_1, a_3 = A - g_1 - g_2, and
so on. But if the audience is growing, a_2 could be much bigger than
A - g_1, so the estimate of the total audience at the n-th week,
g_1 + ... + g_{n-1} + a_n, would show some growth.
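A sketch of this bookkeeping, assuming k has already been fitted: recover a_n from each week's gross, then add back what was already taken in. Audience is measured in dollars of ticket sales, as in the text. The function name and both gross series are hypothetical; the first is generated from a fixed audience (so every estimate should equal it), the second doubles weeks 2 onward to mimic word of mouth.

```python
import math


def audience_estimates(grosses, k):
    """Estimate the total audience as of each week from weekly grosses.

    Week n's gross is g_n = a_n * (1 - exp(-k)), where a_n is the audience
    still to come at the start of week n. Adding back everyone who has
    already bought a ticket gives g_1 + ... + g_{n-1} + a_n.
    """
    frac = 1.0 - math.exp(-k)  # integral of k*exp(-k*t) over one week
    estimates = []
    seen = 0.0                 # gross already taken in before week n
    for g in grosses:
        a_n = g / frac
        estimates.append(seen + a_n)
        seen += g
    return estimates


A, k = 1e8, 0.3  # made-up audience and decay constant
fixed = [A * (1 - math.exp(-k)) * math.exp(-k * (n - 1)) for n in range(1, 7)]
growing = fixed[:1] + [2 * g for g in fixed[1:]]  # hypothetical word of mouth
print([round(e) for e in audience_estimates(fixed, k)])
print([round(e) for e in audience_estimates(growing, k)])
```

For pure exponential decay the estimates stay flat; a jump in them is the signature of a growing audience.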
For example, the following data truncates the first week and some
trailing weeks in order to get a good fit. The fit gives k = 0.19768,
and the audience is exp(17.656)/0.19768 = 46,548,252/0.19768
= 235,472,742. But the first week's take was only 26,681,262, which
would imply an audience A with A*(1-exp(-0.19768)) = 26,681,262, or
A = 148,751,859. So the audience grew by about
235,472,742 - 148,751,859 = 86,720,883, roughly 58%, in one week!
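Recomputing the example's quantities straight from the stated formulas is a cheap check on the arithmetic. This sketch uses only the numbers quoted in the paragraph (intercept 17.656, k = 0.19768, first week 26,681,262); the variable names are ours.

```python
import math

# Values quoted in the text: fitted intercept and decay constant (from the
# fit that drops the first week) and the actual first-week gross.
b, k = 17.656, 0.19768
first_week = 26_681_262

audience_from_fit = math.exp(b) / k                # total gross A/k implied by the fit
audience_week1 = first_week / (1 - math.exp(-k))   # solve A*(1 - exp(-k)) = first week
growth = audience_from_fit - audience_week1

print(f"audience from fit:       {audience_from_fit:,.0f}")
print(f"audience from week one:  {audience_week1:,.0f}")
print(f"growth: {growth:,.0f} ({growth / audience_week1:.0%})")
```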
Raw Data
Warning: the data from imdb.com has shortcomings, which are obvious
if you look hard enough. The data below has been slightly modified,
and it was not carefully re-checked. The raw data files are in
directory/folder