Documentation

Latest User Guide

The primary reference documentation, with detailed information on the functions and algorithms in MADlib, background theory, and references to the literature.

Installation Guide

Information on initial installation and deployment of MADlib into a database instance. Includes guides for different installation paths against both Postgres and Pivotal platforms.

Quick Start Guide for Users

Introduction to themes and concepts in MADlib. The guide walks the user through an initial data load, training a model, inspecting a model, and scoring a model.

Quick Start Guide for Developers

For developers who are interested in contributing to MADlib.

Community Portal

Additional material for individuals looking to contribute to the project is available on our community portal.

MADlib YouTube Channel

Videos from community events, meetups and conferences. Also includes step-by-step guides for commonly used algorithms.

Example Use Cases

Linear Regression

Linear regression models a linear relationship between a scalar dependent variable and one or more explanatory independent variables.
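As a rough illustration of the idea, the sketch below fits a line to a single explanatory variable using the closed-form least-squares solution. It is plain Python for explanation only and is not MADlib's interface; the function name `fit_simple_linear_regression` and the sample data are assumptions made for this example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one explanatory variable (illustrative only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```

With more than one explanatory variable the same least-squares principle applies, but the fit is usually computed with matrix operations rather than these scalar formulas.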

Latent Dirichlet Allocation

Latent Dirichlet Allocation is a topic modeling function used to identify recurring themes in a large document corpus.

Summary

The summary function provides summary statistics for any data table, including the number of distinct values, number of missing values, mean, variance, min, max, most frequent values, and quantiles.
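To make the list of statistics concrete, here is a small sketch that computes several of them for one column using only the Python standard library. The `summarize` helper, its output keys, and the sample column are hypothetical and do not reflect the output format of MADlib's summary function; `None` stands in for a missing value.

```python
from collections import Counter
from statistics import mean, median, variance


def summarize(values):
    """Summary statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "mean": mean(present),
        "variance": variance(present),  # sample variance (n - 1 denominator)
        "min": min(present),
        "max": max(present),
        "median": median(present),  # the 0.5 quantile
        "most_frequent": Counter(present).most_common(1)[0][0],
    }


# Hypothetical column with two missing entries.
column = [4, 8, None, 8, 15, 16, None, 23]
stats = summarize(column)
print(stats)
```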

Logistic Regression

Logistic regression can be used to predict a binary outcome of a dependent variable from one or more explanatory independent variables.
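The following sketch shows one simple way such a model can be fitted: batch gradient descent on the log-loss for a single explanatory variable. It is an illustration in plain Python, not MADlib's implementation or interface; the function names, learning rate, iteration count, and data are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(intercept + weight * x) by gradient descent."""
    intercept, weight = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_intercept, grad_weight = 0.0, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss is (prediction - label) times the input.
            error = sigmoid(intercept + weight * x) - y
            grad_intercept += error
            grad_weight += error * x
        intercept -= learning_rate * grad_intercept / n
        weight -= learning_rate * grad_weight / n
    return intercept, weight


# Hypothetical data: larger x makes the outcome 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
intercept, weight = fit_logistic_regression(xs, ys)
prob_low = sigmoid(intercept + weight * 0.5)
prob_high = sigmoid(intercept + weight * 4.0)
print(prob_low, prob_high)
```

The classes deliberately overlap in this data; with perfectly separable classes the unregularized coefficients would keep growing as iterations increase.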

Elastic Net Regularization

Elastic Net regularization is a technique that can be applied to either linear or logistic regression to help build a more robust model when there are large numbers of explanatory independent variables.
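For intuition, the sketch below computes the penalty term that elastic net adds to a regression loss, using one common parameterization that mixes an L1 and an L2 term. The function name and the parameter names `lam` and `alpha` are assumptions for this example; consult the User Guide for the exact form and parameter names MADlib uses.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """lam * (alpha * L1 + (1 - alpha) / 2 * L2 squared), a common elastic net form."""
    l1 = sum(abs(c) for c in coefficients)
    l2_squared = sum(c * c for c in coefficients)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


coefficients = [1.0, -2.0]  # hypothetical model coefficients
lasso_like = elastic_net_penalty(coefficients, lam=0.5, alpha=1.0)  # pure L1
ridge_like = elastic_net_penalty(coefficients, lam=0.5, alpha=0.0)  # pure L2
print(lasso_like, ridge_like)
```

In this parameterization `alpha` controls the mix between the two penalties, while `lam` controls the overall strength of the regularization.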

Principal Component Analysis

Principal Component Analysis is a dimensionality reduction technique that can be used to transform a high-dimensional space into a lower-dimensional space.
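As a minimal sketch of the reduction, the code below finds the first principal component of a small dataset by power iteration on the covariance matrix, then projects each row onto it, turning two dimensions into one. Power iteration is used here only because it is short to write in plain Python; it is not a statement about how MADlib computes components. The function name and the data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the leading principal direction and the 1-D projection of each row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication converges to the top eigenvector,
    # provided the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical 2-D points lying close to the line y = x.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
component, scores = first_principal_component(rows)
print(component, scores)
```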

Apriori

Apriori is a technique for evaluating frequent item-sets, which allows analysis of what events tend to occur together, for instance which items customers frequently purchase in a single transaction.
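The level-wise search behind this technique can be sketched in a few lines: count the support of single items, keep those above a threshold, and build larger candidates only from item-sets that are already frequent. The code is an illustrative plain-Python version, not MADlib's interface; the `apriori` function, the `min_support` parameter, and the shopping baskets are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {item-set: support} for every item-set meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates of size k that are frequent enough.
        level = {}
        for candidate in current:
            value = support(candidate)
            if value >= min_support:
                level[candidate] = value
        frequent.update(level)
        k += 1
        # Join frequent item-sets into candidates of size k, then prune any
        # candidate that has an infrequent subset of size k - 1.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_support=0.6)
print(frequent)
```

Here the only frequent pair is bread and milk, which appear together in three of the five baskets.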

k-Means

k-Means is a clustering method used to identify regions of similarity within a dataset. It can be used for many types of analysis including customer segmentation analysis.
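A minimal sketch of the standard iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) is shown below. It is plain Python for illustration and not MADlib's interface; the function name, the naive choice of the first `k` points as starting centroids, and the sample points are assumptions. Practical implementations typically use more careful seeding.

```python
import math


def k_means(points, k, iterations=20):
    """Lloyd's algorithm with a naive initialization (the first k points)."""
    centroids = [points[i] for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.5, 8.5)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

In a customer segmentation setting, each point would be a customer described by numeric attributes, and the resulting labels would identify the segment each customer falls into.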