We investigated the performance of the collective intelligence of NFL fans
predicting the outcome of games as realized through the Vegas betting lines.
Using data from 2560 games (all post-expansion, regular- and post-season games
from 2002-2011), we investigated the opening and closing lines, and the margin
of victory. We found that the line difference (the difference between the
opening and closing line) could be used to retroactively predict divisional
winners with at least 75% accuracy (i.e., "straight up"
predictions). We also found that although home teams only beat the spread 47%
of the time, a strategy of betting on home-team underdogs (from 2002-2011)
would have produced a cumulative winning percentage of 53.5%, above the
threshold of 52.38% needed to break even.
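The 52.38% break-even figure can be reproduced with a short calculation. The sketch below assumes the conventional point-spread pricing in which a bettor risks 110 units to win 100; that pricing is an assumption here, not something stated in the abstract, and the function names are illustrative.

```python
def break_even_win_rate(risk: float = 110.0, win: float = 100.0) -> float:
    """Win rate at which expected profit is zero when risking `risk` to win `win`.

    Assumption: standard "risk 110 to win 100" spread pricing.
    """
    return risk / (risk + win)


def expected_profit_per_bet(win_rate: float, risk: float = 110.0, win: float = 100.0) -> float:
    """Average profit per bet at a given win rate (pushes are ignored)."""
    return win_rate * win - (1.0 - win_rate) * risk


threshold = break_even_win_rate()
print(f"break-even win rate: {threshold:.2%}")
print(f"profit per bet at 53.5%: {expected_profit_per_bet(0.535):.2f} units")
```

Under this assumption, a 53.5% win rate leaves a small positive expected profit per bet, which is why it sits above the break-even threshold.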
In this article we show the relationship between the Pareto distribution and
the gamma distribution. This shows that the second one, appropriately extended,
explains some anomalies that arise in the practical use of extreme value
theory. The results are useful for phenomena that are fitted by the Pareto
distribution but that deviate from this law for very large values. Two examples
of data analysis with the new model are provided. The first concerns the
influence of climate variability on the occurrence of tropical cyclones. The
second concerns the analysis of aggregate loss distributions associated with
operational risk management.
We develop a theory for solving continuous time optimal stopping problems for
non-linear expectations. Our motivation is to consider problems in which the
stopper uses risk measures to evaluate future rewards.
In this work we study the behavior of classical two-person, two-strategy
evolutionary games on networks embedded in a Euclidean two-dimensional space
with different kinds of degree distributions and topologies going from regular
to random, and to scale-free ones. Using several imitative microscopic
dynamics, we study the evolution of global cooperation on the above network
classes and find that specific topologies having a hierarchical structure and
an inhomogeneous degree distribution, such as Apollonian and grid-based
networks, are very conducive to cooperation. Spatial scale-free networks are
still good for cooperation but to a lesser degree. Both classes of networks
enhance average cooperation in all games with respect to standard random
geometric graphs and regular grids by shifting the boundaries between
cooperative and defective regions. These findings might be useful in the design
of interaction structures that maintain cooperation when the agents are
constrained to live in physical two-dimensional space.
Diffusion of information, spread of rumors and infectious diseases are all
instances of stochastic processes that occur over the edges of an underlying
network. Often the networks over which contagions spread are unobserved, and
such networks are often dynamic and change over time. In this paper, we
investigate the problem of inferring dynamic networks based on information
diffusion data. We assume there is an unobserved dynamic network that changes
over time, while we observe the results of a dynamic process spreading over the
edges of the network. The task then is to infer the edges and the dynamics of
the underlying network.
We develop an on-line algorithm that relies on stochastic convex optimization
to efficiently solve the dynamic network inference problem. We apply our
algorithm to information diffusion among 3.3 million mainstream media and blog
sites and experiment with more than 179 million different pieces of information
spreading over the network in a one year period. We study the evolution of
information pathways in the online media space and find interesting insights.
Information pathways for general recurrent topics are more stable across time
than for on-going news events. Clusters of news media sites and blogs often
emerge and vanish in a matter of days for on-going news events. Major social
movements and events involving the civilian population, such as the Libyan
civil war or the Syrian uprising, lead to an increased number of information
pathways among blogs as well as an overall increase in the network centrality
of blogs and social media sites.
Individual happiness is a fundamental societal metric. Normally measured
through self-report, happiness has often been indirectly characterized and
overshadowed by more readily quantifiable economic indicators such as gross
domestic product. Here, we examine expressions made on the online, global
microblog and social networking service Twitter, uncovering and explaining
temporal variations in happiness and information levels over timescales ranging
from hours to years. Our data set comprises over 46 billion words contained in
nearly 4.6 billion expressions posted over a 33 month span by over 63 million
unique users. In measuring happiness, we use a real-time, remote-sensing,
non-invasive, text-based approach---a kind of hedonometer. In building our
metric, made available with this paper, we conducted a survey to obtain
happiness evaluations of over 10,000 individual words, representing a tenfold
size improvement over similar existing word sets. Rather than being ad hoc, our
word list is chosen solely by frequency of usage and we show how a highly
robust metric can be constructed and defended.
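As a rough illustration of a text-based happiness measure, the sketch below averages per-word happiness scores weighted by how often each word occurs. Both the weighted-average form and the scores in `WORD_HAPPINESS` are assumptions made for this example; they are not taken from the word list released with the paper.

```python
from collections import Counter

# Hypothetical scores for illustration only (not the paper's survey values).
WORD_HAPPINESS = {"love": 8.4, "happy": 8.3, "rain": 5.0, "war": 1.8}


def average_happiness(text, scores=WORD_HAPPINESS):
    """Frequency-weighted mean happiness of the scored words in `text`.

    Words without a score are ignored; returns None if no word is scored.
    """
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


print(average_happiness("love love war"))
```

Because the score depends only on word counts, the same function could in principle be applied to any window of text, from an hour of posts to several years.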
On-line portfolio selection is a fundamental problem in computational
finance, which has been extensively studied across several research
communities, including finance, statistics, artificial intelligence, machine
learning, and data mining. This article aims to provide a comprehensive
survey and a structural understanding of existing on-line portfolio selection
techniques in the literature. From an on-line machine learning perspective, we
first formulate on-line portfolio selection as an on-line sequential decision
problem, and then survey a variety of state-of-the-art approaches in the
literature, which are grouped into several major categories, including
benchmarks, "Follow-the-Winner" approaches, "Follow-the-Loser" approaches,
"Pattern-Matching" based approaches, and meta-learning algorithms. In addition
to the problem formulation and related algorithms, we also discuss the
relationship of these algorithms with the Capital Growth theory in order to
better understand the commonalities and differences of their underlying trading
ideas. This article aims to provide a timely and comprehensive survey for both
machine learning and data mining researchers in academia and quantitative
portfolio managers in the financial industry to help them understand the state
of the art and facilitate their research or practical applications. We also
discuss some open issues and evaluate some emerging new trends for future
research directions.
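The sequential decision framing can be sketched in a few lines: in each period the strategy picks portfolio weights before seeing that period's price relatives, and wealth is multiplied by the weighted return. The function names, the toy data, and the choice of a uniform constant rebalanced portfolio as the baseline are illustrative assumptions, not details taken from the survey.

```python
def run_portfolio(price_relatives, strategy):
    """Cumulative wealth, starting from 1.0, of an on-line strategy.

    price_relatives: list of per-period lists, each entry being
        (closing price / previous closing price) for one asset.
    strategy: function mapping the history of past price relatives
        to a list of non-negative weights that sum to 1.
    """
    wealth = 1.0
    history = []
    for x in price_relatives:
        b = strategy(history)  # decided before x is revealed
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
        history.append(x)
    return wealth


def uniform_crp(n_assets):
    """Baseline: rebalance to equal weights every period."""
    def strategy(history):
        return [1.0 / n_assets] * n_assets
    return strategy


# Toy market: asset 0 is flat, asset 1 alternately doubles and halves.
relatives = [[1.0, 2.0], [1.0, 0.5], [1.0, 2.0], [1.0, 0.5]]
print(run_portfolio(relatives, uniform_crp(2)))
```

In this toy market, holding either asset alone ends with wealth 1.0, while rebalancing to equal weights each period ends above 1.0, which is the kind of effect such algorithms try to exploit.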
This paper outlines a framework for the study of innovation that treats
discoveries as additions to evolving networks. As inventions enter they expand
or limit the reach of the ideas they build on by influencing how successive
discoveries use those ideas. The approach is grounded in novel measures of the
extent to which an innovation amplifies or disrupts the status quo. Those
measures index the effects inventions have on subsequent uses of prior
discoveries. In so doing, they characterize a theoretically important but
elusive feature of innovation. We validate our approach by showing that it (1)
discriminates among innovations of similar impact in analyses of U.S. patents;
(2) identifies discoveries that amplify and disrupt technology streams in
select case studies; (3) implies disruptive patents decrease the use of their
predecessors by 60% in difference-in-differences estimation; and (4) yields
novel findings in analyses of patenting at 110 U.S. universities.
Information-communication technology promotes collaborative environments like
Wikipedia where, however, controversy and conflicts can appear. To
describe the rise, persistence, and resolution of such conflicts we devise an
extended opinion dynamics model where agents with different opinions perform a
single task to make a consensual product. As a function of the convergence
parameter describing the influence of the product on the agents, the model
shows spontaneous symmetry breaking of the final consensus opinion represented
by the medium. In the case when agents are replaced with new ones at a certain
rate, a transition from mainly consensus to a perpetual conflict occurs, which
is in qualitative agreement with the scenarios observed in Wikipedia.
Recent academic work has developed a method to determine, in real time, if a
given stock is exhibiting a price bubble. Currently there is speculation in the
financial press concerning the existence of a price bubble in the aftermath of
the recent IPO of LinkedIn. We analyze stock price tick data from the short
lifetime of this stock through May 24, 2011, and we find that LinkedIn has a
price bubble.