2.1.0
User Documentation for Apache MADlib
bayes.sql_in File Reference

SQL functions for naive Bayes.

Functions

args_and_value_double argmax_transition (args_and_value_double oldmax, integer newkey, float8 newvalue)
 
args_and_value_double argmax_combine (args_and_value_double max1, args_and_value_double max2)
 
integer [] argmax_final (args_and_value_double finalstate)
 
aggregate integer [] argmax (integer key, float8 value)
 
void create_nb_prepared_data_tables (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, integer numAttrs, varchar featureProbsDestName, varchar classPriorsDestName)
 Precompute all class priors and feature probabilities.
 
void create_nb_prepared_data_tables (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, varchar numericAttrsColumnIndices, integer numAttrs, varchar featureProbsDestName, varchar numericFeatureStatsDestName, varchar classPriorsDestName)
 
void create_nb_classify_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 Create a view with columns (key, nb_classification).
 
void create_nb_classify_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar numericFeatureStatsSource, varchar destName)
 
void create_nb_classify_view (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 
void create_nb_classify_view (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, varchar numericAttrsColumnIndices, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 
void create_nb_probs_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 Create a view with columns (key, class, nb_prob).
 
void create_nb_probs_view (varchar featureProbsSource, varchar classPriorsSource, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar numericFeatureStatsSource, varchar destName)
 
void create_nb_probs_view (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 
void create_nb_probs_view (varchar trainingSource, varchar trainingClassColumn, varchar trainingAttrColumn, varchar numericAttrsColumnIndices, varchar classifySource, varchar classifyKeyColumn, varchar classifyAttrColumn, integer numAttrs, varchar destName)
 

Detailed Description

Date
January 2011
See also
For a brief introduction to Naive Bayes Classification, see the module description Naive Bayes Classification.

Function Documentation

◆ argmax()

aggregate integer [] argmax ( integer  key,
float8  value 
)
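
As a rough usage sketch, the aggregate can be applied to any set of (key, value) rows. The `madlib` schema prefix and the inline sample rows are assumptions made for illustration. By analogy with the nb_classification column described for create_nb_classify_view(), the result is presumably the array of keys that share the largest value.

```sql
-- Assumes MADlib is installed in a schema named "madlib" (hypothetical setup).
-- Keys 2 and 3 tie for the largest value in this sample.
SELECT madlib.argmax(t.k, t.v::FLOAT8) AS best_keys
FROM (VALUES (1, 0.2), (2, 0.7), (3, 0.7)) AS t(k, v);
```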

◆ argmax_combine()

args_and_value_double argmax_combine ( args_and_value_double  max1,
args_and_value_double  max2 
)

◆ argmax_final()

integer [] argmax_final ( args_and_value_double  finalstate)

◆ argmax_transition()

args_and_value_double argmax_transition ( args_and_value_double  oldmax,
integer  newkey,
float8  newvalue 
)
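
The transition, combine, and final functions follow the usual shape of a PostgreSQL user-defined aggregate. The sketch below is not MADlib's source. It uses hypothetical demo_* names and an assumed state layout (an array of keys plus the current maximum) to show how such pieces typically fit together.

```sql
-- Hypothetical state type: all keys seen so far that share the maximum value.
CREATE TYPE demo_args_and_value AS (args INTEGER[], value FLOAT8);

-- Transition: fold one (key, value) row into the running state.
CREATE FUNCTION demo_argmax_transition(demo_args_and_value, INTEGER, FLOAT8)
RETURNS demo_args_and_value AS $$
    SELECT CASE
        WHEN $3 IS NULL THEN $1                               -- ignore NULL values
        WHEN ($1).value IS NULL OR $3 > ($1).value
            THEN ROW(ARRAY[$2], $3)::demo_args_and_value      -- new maximum
        WHEN $3 = ($1).value
            THEN ROW(($1).args || $2, $3)::demo_args_and_value -- tie: keep both keys
        ELSE $1
    END
$$ LANGUAGE sql IMMUTABLE;

-- Combine: merge two partial states (used by parallel or segmented execution).
CREATE FUNCTION demo_argmax_combine(demo_args_and_value, demo_args_and_value)
RETURNS demo_args_and_value AS $$
    SELECT CASE
        WHEN ($1).value IS NULL THEN $2
        WHEN ($2).value IS NULL THEN $1
        WHEN ($1).value > ($2).value THEN $1
        WHEN ($1).value < ($2).value THEN $2
        ELSE ROW(($1).args || ($2).args, ($1).value)::demo_args_and_value
    END
$$ LANGUAGE sql IMMUTABLE;

-- Final: return only the keys.
CREATE FUNCTION demo_argmax_final(demo_args_and_value)
RETURNS INTEGER[] AS $$
    SELECT ($1).args
$$ LANGUAGE sql IMMUTABLE;

-- Wiring the combine function is platform-specific, so it is left out here.
CREATE AGGREGATE demo_argmax(INTEGER, FLOAT8) (
    SFUNC = demo_argmax_transition,
    STYPE = demo_args_and_value,
    FINALFUNC = demo_argmax_final
);
```

Keeping every tied key in the state, rather than a single winner, is what lets the final function return an array.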

◆ create_nb_classify_view() [1/4]

void create_nb_classify_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

The created relation will be

{TABLE|VIEW} destName (key, nb_classification)

where nb_classification is an array containing the most likely class(es) of the record in classifySource identified by key.

Parameters
featureProbsSource  Name of table with precomputed feature probabilities, as created with create_nb_prepared_data_tables()
classPriorsSource   Name of table with precomputed class priors, as created with create_nb_prepared_data_tables()
classifySource      Name of the relation that contains data to be classified
classifyKeyColumn   Name of column in classifySource that can serve as unique identifier (the key of the source relation)
classifyAttrColumn  Name of attributes-array column in classifySource
numAttrs            Number of attributes to use for classification
destName            Name of the view to create
Note
create_nb_classify_view can also be called in an ad-hoc fashion, directly on the training data rather than on precomputed tables. See Naive Bayes Classification for instructions.
Usage
  1. Create Naive Bayes classifications view:
    SELECT create_nb_classify_view(
        'featureProbsName', 'classPriorsName',
        'classifySource', 'classifyKeyColumn', 'classifyAttrColumn',
        numAttrs, 'destName'
    );
  2. Show Naive Bayes classifications:
    SELECT * FROM destName;
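
For a concrete picture, the call might look as follows. All relation and column names here (nb_feature_probs, nb_class_priors, nb_to_classify, id, attributes, nb_classify_view) and the `madlib` schema are hypothetical. The two source tables are assumed to have been created earlier by create_nb_prepared_data_tables().

```sql
-- Hypothetical data to classify: one row per record, attributes as an integer array.
CREATE TABLE nb_to_classify (id INTEGER, attributes INTEGER[]);
INSERT INTO nb_to_classify VALUES (1, '{0,2,1}'), (2, '{1,2,3}');

-- Build the view from the precomputed tables, using the first 3 attributes.
SELECT madlib.create_nb_classify_view(
    'nb_feature_probs', 'nb_class_priors',
    'nb_to_classify', 'id', 'attributes',
    3, 'nb_classify_view'
);

-- nb_classification is an array, since several classes can be equally likely.
SELECT key, nb_classification
FROM nb_classify_view
ORDER BY key;
```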

◆ create_nb_classify_view() [2/4]

void create_nb_classify_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  numericFeatureStatsSource,
varchar  destName 
)
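
This overload carries no description in the reference. Judging only from its signature, it adds numericFeatureStatsSource, presumably the table written as numericFeatureStatsDestName by the second form of create_nb_prepared_data_tables(). A hedged sketch with hypothetical relation names and an assumed `madlib` schema:

```sql
-- All three source tables are assumed to exist already (hypothetical names).
SELECT madlib.create_nb_classify_view(
    'nb_feature_probs', 'nb_class_priors',
    'nb_to_classify', 'id', 'attributes',
    3,
    'nb_numeric_stats',   -- assumed output of create_nb_prepared_data_tables() [2/2]
    'nb_classify_view'
);
```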

◆ create_nb_classify_view() [3/4]

void create_nb_classify_view ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)
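
Judging from its signature, this overload takes the training relation directly instead of precomputed tables, so no call to create_nb_prepared_data_tables() is needed first. The sketch below assumes a `madlib` schema and two hypothetical relations: nb_training(class INTEGER, attributes INTEGER[]) and nb_to_classify(id INTEGER, attributes INTEGER[]).

```sql
-- Ad-hoc classification: training data and data to classify are passed together.
SELECT madlib.create_nb_classify_view(
    'nb_training', 'class', 'attributes',        -- training relation and its columns
    'nb_to_classify', 'id', 'attributes',        -- relation to classify and its columns
    3,                                           -- number of attributes to use
    'nb_classify_view_adhoc'                     -- hypothetical destination name
);

SELECT * FROM nb_classify_view_adhoc;
```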

◆ create_nb_classify_view() [4/4]

void create_nb_classify_view ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
varchar  numericAttrsColumnIndices,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

◆ create_nb_prepared_data_tables() [1/2]

void create_nb_prepared_data_tables ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
integer  numAttrs,
varchar  featureProbsDestName,
varchar  classPriorsDestName 
)

Feature probabilities are stored in a table of the following format:

TABLE featureProbsDestName (
    class INTEGER,
    attr INTEGER,
    value INTEGER,
    cnt INTEGER,
    attr_cnt INTEGER
)

Class priors are stored in a table of the following format:

TABLE classPriorsDestName (
    class INTEGER,
    class_cnt INTEGER,
    all_cnt INTEGER
)
Parameters
trainingSource        Name of relation containing the training data
trainingClassColumn   Name of class column in training data
trainingAttrColumn    Name of attributes-array column in training data
numAttrs              Number of attributes to use for classification
featureProbsDestName  Name of feature-probabilities table to create
classPriorsDestName   Name of class-priors table to create
Usage
Precompute feature probabilities and class priors:
SELECT create_nb_prepared_data_tables(
    'trainingSource', 'trainingClassColumn', 'trainingAttrColumn',
    numAttrs, 'featureProbsName', 'classPriorsName'
);
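
A small end-to-end sketch of this step follows. The training table, its contents, the destination names, and the `madlib` schema are all hypothetical. The prior calculation assumes that class_cnt and all_cnt are the per-class and total row counts their names suggest.

```sql
-- Hypothetical training data: integer class labels and integer attribute arrays.
CREATE TABLE nb_training (class INTEGER, attributes INTEGER[]);
INSERT INTO nb_training VALUES
    (1, '{1,2,3}'), (1, '{1,2,1}'), (2, '{0,2,2}'), (2, '{1,1,3}');

-- Precompute both tables from the first 3 attributes.
SELECT madlib.create_nb_prepared_data_tables(
    'nb_training', 'class', 'attributes',
    3, 'nb_feature_probs', 'nb_class_priors'
);

-- Class priors as fractions (assumed meaning: class_cnt / all_cnt).
SELECT class, class_cnt::FLOAT8 / all_cnt AS prior
FROM nb_class_priors
ORDER BY class;

-- Raw counts stored for each (class, attribute, value) combination.
SELECT class, attr, value, cnt, attr_cnt
FROM nb_feature_probs
ORDER BY class, attr, value;
```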

◆ create_nb_prepared_data_tables() [2/2]

void create_nb_prepared_data_tables ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
varchar  numericAttrsColumnIndices,
integer  numAttrs,
varchar  featureProbsDestName,
varchar  numericFeatureStatsDestName,
varchar  classPriorsDestName 
)
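
This overload is listed without a description. From the signature alone, it accepts numericAttrsColumnIndices to mark some attributes as numeric and writes an extra table named by numericFeatureStatsDestName. The sketch below assumes that the indices are passed as a string holding an array expression such as 'ARRAY[1,2]'; that format, the relation names, and the `madlib` schema are assumptions, not confirmed by this reference.

```sql
-- nb_training_mixed is a hypothetical relation whose attributes array
-- holds numeric values at positions 1 and 2.
SELECT madlib.create_nb_prepared_data_tables(
    'nb_training_mixed', 'class', 'attributes',
    'ARRAY[1,2]',          -- assumed format for the numeric attribute indices
    3,
    'nb_feature_probs', 'nb_numeric_stats', 'nb_class_priors'
);
```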

◆ create_nb_probs_view() [1/4]

void create_nb_probs_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

The created view will be of the following form:

VIEW destName (
    key ANYTYPE,
    class INTEGER,
    nb_prob FLOAT8
)

where nb_prob is the Naive-Bayes probability that class is the true class of the record in classifySource identified by key.

Parameters
featureProbsSource  Name of table with precomputed feature probabilities, as created with create_nb_prepared_data_tables()
classPriorsSource   Name of table with precomputed class priors, as created with create_nb_prepared_data_tables()
classifySource      Name of the relation that contains data to be classified
classifyKeyColumn   Name of column in classifySource that can serve as unique identifier (the key of the source relation)
classifyAttrColumn  Name of attributes-array column in classifySource
numAttrs            Number of attributes to use for classification
destName            Name of the view to create
Note
create_nb_probs_view can also be called in an ad-hoc fashion, directly on the training data rather than on precomputed tables. See Naive Bayes Classification for instructions.
Usage
  1. Create Naive Bayes probabilities view:
    SELECT create_nb_probs_view(
        'featureProbsName', 'classPriorsName',
        'classifySource', 'classifyKeyColumn', 'classifyAttrColumn',
        numAttrs, 'destName'
    );
  2. Show Naive Bayes probabilities:
    SELECT * FROM destName;
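
With concrete names, the two steps might look like this. The relation names and the `madlib` schema are hypothetical, and the source tables are assumed to exist already. The last query is one possible way to reduce the view to a single most probable class per record; it relies on PostgreSQL's DISTINCT ON.

```sql
-- Build the probabilities view from precomputed tables (hypothetical names).
SELECT madlib.create_nb_probs_view(
    'nb_feature_probs', 'nb_class_priors',
    'nb_to_classify', 'id', 'attributes',
    3, 'nb_probs_view'
);

-- One row per (record, class) pair.
SELECT key, class, nb_prob
FROM nb_probs_view
ORDER BY key, class;

-- Most probable class per record; ties are broken by the smaller class id.
SELECT DISTINCT ON (key) key, class, nb_prob
FROM nb_probs_view
ORDER BY key, nb_prob DESC, class;
```

Unlike create_nb_classify_view(), which returns every tied class in an array, the last query keeps exactly one row per key.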

◆ create_nb_probs_view() [2/4]

void create_nb_probs_view ( varchar  featureProbsSource,
varchar  classPriorsSource,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  numericFeatureStatsSource,
varchar  destName 
)

◆ create_nb_probs_view() [3/4]

void create_nb_probs_view ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)

◆ create_nb_probs_view() [4/4]

void create_nb_probs_view ( varchar  trainingSource,
varchar  trainingClassColumn,
varchar  trainingAttrColumn,
varchar  numericAttrsColumnIndices,
varchar  classifySource,
varchar  classifyKeyColumn,
varchar  classifyAttrColumn,
integer  numAttrs,
varchar  destName 
)