.. _`models`: ============================== Probabilistic models in symfer ============================== Mathematically spoken, a *probabilistic model* consists of: - a set of named *variables*, usually :math:`X, Y, \ldots` - for each variable, a finite *domain*, often :math:`\{x_0, x_1, \ldots\}` - a *probability distribution* :math:`\prob{X,Y,\ldots}`: a function that maps every assignment :math:`X{=}x_0, Y{=}y_2, \ldots` to a probability between 0 and 1, for example :math:`\prob{X{=}x_0, Y{=}y_2, \ldots}=0.42`. The sum of all these probabilities should be 1. .. note:: As of now, *continuous* models are not supported. However, support for mixtures of Gaussians is under consideration for a future version. In |symfer|, variables and domain values can be represented by arbitrary Python values. However, as a convention we use strings for variable names; for domain values we use strings, ints or booleans. A very simple model can be constructed as follows: >>> import symfer as s >>> weather = {'Weather':['sunny','rainy']} >>> sprinkler = {'Sprinkler':['off','on']} >>> joint = s.Multinom([weather, sprinkler], [0.48, 0.32, 0.198, 0.02]) The list of probabilities is ordered such that the first assignment changes fastest, i.e. .. math:: \prob{\text{Weather}{=}\text{sunny},\text{Sprinkler}{=}\text{off}} &= 0.48\\ \prob{\text{Weather}{=}\text{rainy},\text{Sprinkler}{=}\text{off}} &= 0.32\\ \prob{\text{Weather}{=}\text{sunny},\text{Sprinkler}{=}\text{on}} &= 0.198\\ \prob{\text{Weather}{=}\text{rainy},\text{Sprinkler}{=}\text{on}} &= 0.02 The above approach may work fine for very small models, but becomes unwieldy very quickly when they get larger: for a model of thirty binary-domain variables, we would need a list of :math:`2^{30} \approx 10^9` probabilities. Therefore, probabilistic models are almost always defined by a *factored* probability distribution, like: .. math:: \prob{X{=}x,Y{=}y,Z{=}z} &= \prob{X{=}x} \cdot \cprob{Y{=}y}{X{=}x} \cdot \cprob{Z{=}z}{Y{=}y}\\ & \qquad\text{for all $x,y,z$} .. rst-class:: floater .. seealso:: More about :ref:`Bayesian networks `. This is an example of a *Bayesian network*, the most common form of a factored probability distribution, in which there is one factor for each variable (and this factor defines the *conditional probabilty distribution* over this variable given each assignment to its *parents*). We could construct this model as follows (given some parameters for the conditional probability distributions): >>> x = {'X':['x0','x1']} >>> y = {'Y':['y0','y1']} >>> z = {'Z':['z0','z1']} >>> factors = {} # an empty dictionary which we'll fill with factors >>> factors['X'] = s.Multinom([x],[0.7,0.3]) # P(X) >>> factors['Y'] = s.Multinom([y,x],[0.67,0.33,0.5,0.5]) # P(Y|X) >>> factors['Z'] = s.Multinom([z,y],[0.1,0.9,0.75,0.25]) # P(Z|Y) >>> joint = factors['X'].product(factors['Y'], factors['Z']) # P(X,Y,Z) The last line is equivalent to the mathematical definition above. It defines ``joint`` to be an array of 8 numbers, matched to 8 possible assignments for :math:`X`, :math:`Y` and :math:`Z`. Each number is a product of three elements: one from each factor array. For example, :math:`\prob{X{=}x_0,Y{=}y_1,Z{=}z_1} = 0.7 \cdot 0.33 \cdot 0.25`. |symfer| knows which element to pick from each array, because the variable names and domains are stored alongside them. In fact, this is such a convenient construction that it is mathematically formalized in :ref:`factor algebra` (the subject of the next section). To get an idea, in factor algebra the above definition looks like: .. math:: \textit{joint} \isdef &f_X \prodjoin f_Y \prodjoin f_Z\\ &\text{\small where $f_Y(X{=}x_0,Y{=}y_0)=0.67$, etc.} But let's return to handling models. Above, we might as well have introduced a new Python identifier for each factor >>> p_X = s.Multinom([x],...) >>> p_Y_given_X = s.Multinom([y,x],...) >>> p_Z_given_Y = s.Multinom([z,y],...) but we chose to put them in a dictionary ``factors`` instead. This is also the format that |symfer| uses when loading a model from an external file: >>> model = s.loadhugin('extended-student.net') >>> model.keys() ['C', 'D', 'G', 'I', 'H', 'J', 'L', 'S'] >>> model['L'] Multinom(dom=[{'L': ['l0', 'l1']}, {'G': ['g1', 'g2', 'g3']}],par=[0.1, 0.9, 0.4, 0.6, 0.99, 0.01]) .. seealso:: The :mod:`io ` module contains more functions for loading and saving files. The ``model`` loaded here doesn't contain an explicit definition of its probability distribution; all inference algorithms implicitly assume that it is the product of the factors. If you would need the joint probability distribution, it is easy enough to define it yourself: >>> joint = s.I().product(*model.values()) >>> s.evaluate(joint.sumto([])) Multinom(dom=[],par=[1.0]) In the second line, we check whether it sums to 1 (and according to the output, it does). This is already an example of probabilistic inference, albeit in a very crude way: here, |symfer| constructs the whole joint distribution, then sums all values. For the reason discussed above, this only works for very small models. Normally you should use an :ref:`inference algorithm `. To understand how inference algorithms work in |symfer|, read about :ref:`factor algebra`.