
Category Archives: statistics
Starting Probabilistic Document Retrieval
I want to work through some papers on probabilistic document retrieval, mainly to find out how deeply generative models have penetrated this area. Note that literature refers to … Continue reading
Posted in modeling, statistics
Tagged documents, modeling, retrieval, statistics, topic
Reservoir Sampling
If you want to uniformly sample a handful of elements from a very large stream of data, you probably don’t want to read it all into memory first. It would be ideal if you could sample while streaming the data. … Continue reading
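As a rough illustration of sampling while streaming, here is a minimal sketch of the classic single-pass approach (often called Algorithm R). The function name `reservoir_sample` and its parameters are hypothetical, chosen for this example only, and the excerpt above does not confirm that the full post uses this exact variant.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from stream.

    Only k items are held in memory at any time, so the stream
    never has to be loaded in full.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with
            # probability k / (i + 1) and evicts a uniform victim.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a generator without materializing it.
    print(reservoir_sample((x * x for x in range(1_000_000)), 5))
```

If the stream holds fewer than `k` items, the sketch simply returns all of them in their original order.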
Regression-guided Generative Models
A generative model is pretty pointless on its own unless the generative structure itself holds intrinsic interest. Hence, papers justify their generative models either by comparing their predictive performance against another model or by extending the model to accommodate for … Continue reading
Topic Coherence
Evaluating unsupervised topic models is tricky business. If the resulting model is not employed in retrieval, classification, or regression, there really is no way of convincing someone of the model’s worth. You may, rightly, say that there is no use … Continue reading
Starting Part-of-Speech Tagging
This is by no means the latest on the subject of probabilistic part-of-speech tagging of documents but nevertheless provides a good starting point to look at the basic model along with training and testing data. This paper [1] takes a … Continue reading
Adding (more Relaxed) Constraints during Model Inference
In the previous post on posterior regularization we saw how to specify constraints during the step of expectation maximization that would otherwise be difficult to incorporate into the model itself. The constraints took the following form where we specified our … Continue reading
Adding Constraints during Model Inference
Coming up with a probabilistic model and its inference procedure is only half the work because it’s well known that just a single run of the inference procedure is hardly likely to give you a satisfactory answer. Out of the … Continue reading