The primary reference documentation, with detailed information on the functions and algorithms in MADlib, along with background theory and references to the literature.
Information on the initial installation and deployment of MADlib into a database instance. Includes guides for the different installation paths on both Postgres and Pivotal platforms.
Introduction to themes and concepts in MADlib. The guide walks the user through an initial data load, training a model, inspecting a model, and scoring a model.
For developers who are interested in contributing to MADlib.
Additional material for individuals looking to contribute to the project is available on our community portal.
Videos from community events, meetups and conferences. Also includes step-by-step guides for commonly used algorithms.
Linear regression can be used to model a linear relationship between a scalar dependent variable and one or more explanatory independent variables.
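As a rough illustration of the idea, the sketch below fits a one-variable linear regression by ordinary least squares in plain Python. It is not MADlib's in-database implementation; the function name `fit_simple_linear` and the sample data are assumptions made for this example only.

```python
def fit_simple_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; not part of MADlib.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on the line y = 2x + 1.
slope, intercept = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

With several explanatory variables the same least-squares idea applies, but the fit is usually expressed with matrix operations rather than the two sums used here.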
Latent Dirichlet Allocation is a topic modeling function used to identify recurring themes in a large document corpus.
The summary function provides summary statistics for any data table, including the number of distinct values, number of missing values, mean, variance, min, max, most frequent values, and quantiles.
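To make the listed statistics concrete, here is a minimal sketch that computes them for a single column using only the Python standard library. The function name `summarize`, the use of `None` for missing values, and the sample column are assumptions for illustration; MADlib's own summary function runs inside the database and may define some statistics differently.

```python
from collections import Counter
import statistics


def summarize(values):
    """Summarize one column; None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "mean": statistics.mean(present),
        "variance": statistics.variance(present),  # sample variance (n - 1)
        "min": min(present),
        "max": max(present),
        "most_frequent": counts.most_common(1)[0][0],
        "quartiles": statistics.quantiles(present, n=4),
    }


# Made-up column with one missing entry.
column = [4, 8, 8, None, 10, 2]
for name, value in summarize(column).items():
    print(name, value)
```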
Logistic regression can be used to predict a binary outcome of a dependent variable from one or more explanatory independent variables.
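The following sketch shows one simple way to fit a single-variable logistic regression: batch gradient descent on the average log loss. The function names, learning rate, iteration count, and data are all assumptions chosen for this example; this illustrates the technique only and says nothing about which optimizer MADlib uses.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction error per row: predicted probability minus label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def predict_probability(x, w, b):
    return sigmoid(w * x + b)


# Made-up data: the outcome switches from 0 to 1 as x grows.
w, b = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(predict_probability(0, w, b), predict_probability(5, w, b))
```

The predicted probability can be turned into a binary outcome by comparing it with a threshold such as 0.5.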
Elastic Net regularization can be applied to either linear or logistic regression to help build a more robust model when there are large numbers of explanatory independent variables.
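Elastic net combines an L1 (lasso) and an L2 (ridge) penalty on the model coefficients. The sketch below computes that penalty term using one common parameterization, lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2); the function name, the parameter names `lam` and `alpha`, and the choice of this particular convention are assumptions for illustration.

```python
def elastic_net_penalty(weights, lam, alpha):
    """Penalty added to the regression loss (one common convention).

    alpha = 1 gives a pure L1 (lasso) penalty,
    alpha = 0 gives a pure L2 (ridge) penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Made-up coefficients; an even mix of the two penalties.
print(elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=0.5))
```

The L1 part tends to push some coefficients to exactly zero, which is why the technique helps when many explanatory variables are present.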
Principal Component Analysis is a dimensionality reduction technique that can be used to transform a high-dimensional space into a lower-dimensional space.
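As a small illustration of the reduction step, the sketch below finds the first principal component of a data set by power iteration on its covariance matrix, then projects each row onto that direction. The function names, the start vector, and the data are assumptions for this example; production implementations typically use an eigenvalue or singular value decomposition instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Approximate the top eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary start vector; power iteration fails if it happens to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v, means


def project(rows, component, means):
    """Map each d-dimensional row to a single coordinate."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r)))
            for r in rows]


# Made-up 2D points lying on the line y = x.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
component, means = first_principal_component(data)
print(component, project(data, component, means))
```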
Apriori is a technique for finding frequent item-sets, which allows analysis of which events tend to occur together, for instance which items customers frequently purchase in a single transaction.
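The level-wise search behind Apriori can be sketched in a few lines of Python. The function name `frequent_itemsets`, the `min_support` threshold expressed as a fraction of transactions, and the shopping-basket data are assumptions for illustration; this is not MADlib's in-database implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every individual item.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        frequent = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                frequent[candidate] = s
        result.update(frequent)
        k += 1
        # Join step: merge frequent sets into candidates of size k.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune step: every (k - 1)-subset must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent
                          for s in combinations(c, k - 1))}
    return result


# Made-up purchase transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for itemset, s in frequent_itemsets(baskets, min_support=0.5).items():
    print(sorted(itemset), s)
```

The prune step is what gives the method its name: an item-set can only be frequent if all of its subsets are already known to be frequent, so most large candidates are discarded without scanning the data.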