Software specification


Linear Regression

Class Diagram

classDiagram LinearModel <|-- LinearRegression class LinearModel { +isFit : bool = false } class LinearRegression { +m : float = 0 +b : float = 0 +fit(xTrain : float[], yTrain : float[]) +predict(xTest : float[]) float[] +mserror(yTrain : float[], yPredict : float[]) float }
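A minimal usage sketch of the class above, assuming a no-argument constructor (the constructor is not shown in the diagram) and purely hypothetical sample data:

    const lr = new LinearRegression();
    // Hypothetical training data for a roughly linear relation y ≈ 2x + 1
    const xTrain = [1, 2, 3, 4, 5];
    const yTrain = [3.1, 4.9, 7.2, 9.1, 10.8];
    lr.fit(xTrain, yTrain);                      // estimates the slope m and intercept b
    const yPredict = lr.predict(xTrain);         // predictions for the training inputs
    console.log(lr.mserror(yTrain, yPredict));   // mean squared error of the fit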

Polynomial Regression

Class diagram

classDiagram PolynomialModel <|-- PolynomialRegression class PolynomialModel { +isFit : bool = false } class PolynomialRegression { +solutions :[] +error :float +fit(xArray: float[], yArray: float[], degree: int) +predict(xArray: float[]) +calculateR2(xArray: float[], yArray: float[]) +getError() }

Methods and attributes description

# Method or Attribute Scope Description
1 isFit PolynomialModel Class Attribute that indicates whether the model has already been trained
2 solutions PolynomialRegression Class Array of floats that stores the coefficients of the polynomial model
3 fit PolynomialRegression Class Method that computes the solutions for the polynomial model based on the logic described below
4 xArray Method fit - PolynomialRegression Class Array parameter of x values used to train the model
5 yArray Method fit - PolynomialRegression Class Array parameter of y values used to train the model
6 degree Method fit - PolynomialRegression Class Parameter with the desired degree for the model
7 equationSize Method fit - PolynomialRegression Class Variable that defines the number of equations used in the model based on its degree
8 nElements Method fit - PolynomialRegression Class Variable that defines the number of columns used in the matrix for the model
9 equations Method fit - PolynomialRegression Class Matrix that stores the equations and solutions for the polynomial model
10 predict PolynomialRegression Class Function that returns the predicted values based on the training done previously by the model
11 xArray Method predict - PolynomialRegression Class Parameter with the x array values to be predicted
12 yArray Method predict - PolynomialRegression Class Array with the y values predicted by the model
13 calculateR2 PolynomialRegression Class Method that computes the r^2 value of the trained model
14 getError PolynomialRegression Class Method that returns the r^2 value of the trained model

Logic Used

To train the model and obtain an accurate prediction for the given input degree, several mathematical concepts were combined to build a precise function for the model. This library uses the following concepts to develop the solution:

  • Least Squares
  • Gauss-Jordan

Least Squares

This method is used to build a coefficient matrix for the solution of the equation system. The matrix is symmetric and is created based on an input degree that we call "m". The logic of least squares is simple: it consists of creating a matrix that contains a group of equations whose degree increases as new equations are added to the matrix. This matrix is later solved to obtain the coefficients of the solution of the regression model.

n        Σx         Σx^2       ...  Σx^m       | Σy
Σx       Σx^2       Σx^3       ...  Σx^(m+1)   | Σxy
Σx^2     Σx^3       Σx^4       ...  Σx^(m+2)   | Σx^2·y
...      ...        ...        ...  ...        | ...
Σx^m     Σx^(m+1)   Σx^(m+2)   ...  Σx^(2m)    | Σx^m·y
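The sketch below illustrates how such an augmented matrix can be built in JavaScript. It is only an approximation of the idea, not the library's internal implementation; the function name buildNormalEquations is hypothetical.

    // Sketch: augmented normal-equations matrix for a polynomial of degree m.
    function buildNormalEquations(xArray, yArray, m) {
      const rows = m + 1;
      // Each row has m + 2 entries: m + 1 coefficients plus the augmented column.
      const equations = Array.from({ length: rows }, () => new Array(rows + 1).fill(0));
      for (let row = 0; row < rows; row++) {
        for (let col = 0; col < rows; col++) {
          // Σ x^(row + col)
          equations[row][col] = xArray.reduce((sum, x) => sum + Math.pow(x, row + col), 0);
        }
        // Augmented column: Σ x^row · y
        equations[row][rows] = xArray.reduce((sum, x, i) => sum + Math.pow(x, row) * yArray[i], 0);
      }
      return equations;
    }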

Gauss-Jordan

This method is used to obtain the solution for the model: once the equation matrix has been built, Gauss-Jordan elimination is applied to find the coefficients of the equations, which are then stored in the array of solutions.

After the elimination, the left block of the augmented matrix becomes the identity and the augmented column contains the coefficients c_0 ... c_m of the polynomial:

1    0    0    ...  0  | c_0
0    1    0    ...  0  | c_1
0    0    1    ...  0  | c_2
...  ...  ...  ...  ...| ...
0    0    0    ...  1  | c_m
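A minimal sketch of Gauss-Jordan elimination over that augmented matrix, again as an illustration rather than the library's exact code:

    // Sketch: Gauss-Jordan elimination over the augmented normal-equations matrix.
    function gaussJordan(equations) {
      const n = equations.length;
      for (let pivot = 0; pivot < n; pivot++) {
        // Normalize the pivot row so the pivot element becomes 1.
        // (No partial pivoting: this assumes non-zero pivots.)
        const pivotValue = equations[pivot][pivot];
        for (let col = 0; col <= n; col++) {
          equations[pivot][col] /= pivotValue;
        }
        // Eliminate the pivot column from every other row.
        for (let row = 0; row < n; row++) {
          if (row === pivot) continue;
          const factor = equations[row][pivot];
          for (let col = 0; col <= n; col++) {
            equations[row][col] -= factor * equations[pivot][col];
          }
        }
      }
      // The augmented column now holds the solutions (the polynomial coefficients).
      return equations.map((row) => row[n]);
    }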

Usage

Using the PolynomialRegression library is simple; you just have to follow these steps.

Import the library

In order to use the library you must import it into your HTML code. The library can be found in the dist folder as the PolynomialRegression.js file.
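For example, assuming the page is served next to the dist folder, the import is a single script tag:

    <script src="dist/PolynomialRegression.js"></script>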

Training data

After importing the library into your HTML code you need to train the model. To do so, you must create two training data arrays that will be used to train the model.
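For example, with purely hypothetical sample values:

    // Hypothetical training data: y roughly follows a quadratic curve in x
    const xTrain = [1, 2, 3, 4, 5, 6, 7];
    const yTrain = [2.1, 4.3, 9.2, 16.5, 24.9, 36.3, 48.8];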

Training the model

To train the model you must create an instance of the PolynomialRegression class and call its fit method, passing the xArray and yArray of training data followed by the desired degree for the model.
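A minimal sketch, assuming the class exposes a no-argument constructor (the constructor is not shown in the class diagram above):

    const model = new PolynomialRegression();
    // fit(xArray, yArray, degree) as described in the class diagram; degree 2 is just an example
    model.fit(xTrain, yTrain, 2);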

Prediction

To create a prediction you must build an array of x values to be predicted; then you only have to call the predict function and pass it the values to predict.
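Continuing the sketch above:

    const xTest = [8, 9, 10];                 // hypothetical values to predict
    const yPredicted = model.predict(xTest);  // predicted y values for xTest
    // Optionally, evaluate the fit on the training data:
    model.calculateR2(xTrain, yTrain);
    console.log(model.getError());            // r^2 of the trained model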

Decision Tree - ID3

Class Diagram

classDiagram DecisionTreeID3 -- NodeTree DecisionTreeID3 -- Feature DecisionTreeID3 -- Atrribute class NodeTree { +id : string = "" +tag : string = "" +value : string = "" +child : NodeTree[] = [] } class Feature { +attribute : string = "" +entropy : Float = -1 +gain : Float = -1 +primaryCount : Integer = 0 +secondaryCount : Integer = 0 +primaryPosibility : string = "" +secondPosibility : string = "" +updateFeature(_posibility : string ) bool +calculateEntropy(_p : Integer, _n: Integer ) Float } class Atrribute { +attribute : string = "" +features : any[] = [] +infoEntropy : = -1 +gain : = -1 +index : = -1 } class DecisionTreeID3 { +dataset : any[] = [] +generalEntropy : float = 0 +primaryCount : float = 0 +secondaryCount : float = 0 +primaryPosibility : string = "" +secondaryPosibility : string = "" +root : NodeTree = null +calculateEntropy(_p : Integer, _n : Integer) Number +train(_dataset : any[], _start : Integer) NodeTree +predict(_predict : any[], _root : NodeTree) NodeTree +recursivePredict(_predict : any[],_node :NodeTree) NodeTree +calculateGeneralEntropy(_dataset : any[], indexResult : Integer) Float +classifierFeatures(_dataset : any[], indexFeature : Integer, indexResult : Integer) Feature[] +calculateInformationEntropy(_features : Feature[]) Number +calculateGain(_generalEntropy : Number, _infoEntropy : Number) Number +selectBestFeature(_attributes : Attribute[]) Integer +generateDotString(_root : NodeTree) string +recursiveDotString(_root : NodeTree, _idParent : string) string }

DecisionTreeID3 Class

Properties

Name Description
dataset 2-dimensional matrix that contains the data header in the first row and the last column contains the 2 possible classes.
generalEntropy Stores the overall entropy of the data set.
primaryCount Stores how many times the first class found appears in the data set.
secondaryCount Stores how many times the second class found appears in the data set.
primaryPosibility Stores the first class found in the data set.
secondaryPosibility Stores the second class found in the data set.
root It is the root of the tree resulting from training

Methods

Name Description
calculateEntropy Receives 2 parameters: the number of times the first label appears and the number of times the second label appears; it returns the result of the entropy equation (see the sketch after this table).
train This method is in charge of generating the decision tree from the data set it receives; it returns the root node of the generated tree.
predict This function classifies the 2xm matrix it receives as a parameter. It starts the search from the node it receives as a parameter and returns the node with the class the data belongs to, or null if it is not able to classify it.
recursivePredict This function is used as an aid to traversing the decision tree.
calculateGeneralEntropy This function analyzes the received data set, counting the number of times the first and second class appear; the index parameter it receives indicates the column in which the count must be performed. It makes use of the calculateEntropy function and returns the entropy of the entire data set.
classifierFeatures This function analyzes the received data set, using the index parameter, separates each data into the corresponding characteristic, and returns a list of characteristics.
calculateInformationEntropy This function receives a list of characteristics and returns the value of the entropy of the information for the received characteristic.
calculateGain This function receives the general entropy and the information entropy and returns the value of the gain.
selectBestFeature This function receives a set of characteristics and returns the index of the characteristic with the highest gain.
generateDotString This function is used to generate the string in the format that the visjs tool accepts to generate a tree type graph.
recursiveDotString This function is auxiliary to traverse the tree and generate the string for visjs
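For reference, calculateEntropy corresponds to the standard two-class entropy used by ID3. The following JavaScript is a sketch of that formula, not necessarily the library's exact code:

    // Standard ID3 two-class entropy, where p and n are the counts of each class:
    // H(p, n) = -(p/(p+n))·log2(p/(p+n)) - (n/(p+n))·log2(n/(p+n))
    function calculateEntropy(p, n) {
      const total = p + n;
      if (total === 0 || p === 0 || n === 0) return 0; // a pure (or empty) split has zero entropy
      const pp = p / total;
      const pn = n / total;
      return -(pp * Math.log2(pp)) - (pn * Math.log2(pn));
    }
    // Information gain of an attribute: gain = generalEntropy - informationEntropy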

Overview

This Machine Learning process works with the ID3 algorithm. It is a simple implementation to classify data matrices with the following characteristics:

  1. The data set is a matrix and contains the header.
  2. The result column must be the last column of the matrix.
  3. Each header entry is considered an Attribute.
  4. Each data value is considered a Feature.

Example Data Set:

[
    ["Attr1", "Attr2", "Attr3", "Result"],
    ["lorem", "lorem", "lorem", "Class1"],
    ["lorem", "lorem", "lorem", "Class2"],
    ["lorem", "lorem", "lorem", "Class1"],
    ["lorem", "lorem", "lorem", "Class2"],
    ["lorem", "lorem", "lorem", "Class1"],
]
Note: Attr#, lorem, Result and Class# can be any string.

Data to Predict:

The data to be predicted consists of a matrix with 2 rows and m columns. The first row will be the header, the second row will consist of the data for decision making.

[
    ["Attr1", "Attr2", "Attr3"],
    ["lorem", "lorem", "lorem"],
]
Note: The header must be in the data

Result of predict:

The result of the classification consists of an object of type NodeTree which has in its value attribute the classification of the data entered.

{
    childs: []
    id: "3eb4d4228163"
    tag: "Overcast"
    value: "Yes"
}

Usage

  1. Import the library inside the HTML page in a script tag.
  2. Prepare a matrix to use as a data set.
  3. Train the algorithm.
  4. Predict data!
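A minimal sketch of those steps with hypothetical data, assuming a no-argument constructor and that the second parameter of train is a starting index (its exact meaning is not documented above):

    const dataset = [
      ["Outlook",  "Windy", "Play"],   // header row (Attributes)
      ["Sunny",    "false", "No"],
      ["Overcast", "false", "Yes"],
      ["Rain",     "true",  "No"],
    ];

    const id3 = new DecisionTreeID3();
    const root = id3.train(dataset, 0);          // returns the root NodeTree

    // 2 x m matrix: header row plus the row to classify
    const toPredict = [
      ["Outlook",  "Windy"],
      ["Overcast", "false"],
    ];
    const result = id3.predict(toPredict, root);
    console.log(result ? result.value : "unable to classify");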

KMeans - Linear and 2D Data


Class Diagram

classDiagram KMeans<|-- LinearKMeans KMeans<|-- _2DKMeans class KMeans { +k : int = 3 } class LinearKMeans { +data : int[] = [] +clusterize(k : int, data : int[], iterations : int ) any[] +distance(point_a : int, point_b : int) int +calculateMeanVariance(arr : int) any[] } class _2DKMeans { +data : any[] = [] +clusterize(k : int, data : any[], iterations : int ) any[] +distance(point_a : any[], point_b : any[]) int +calculateMeanVariance(arr : int) any[] }

LinearKMeans Class

Properties

Name Description
k Number of clusters in which the points will be grouped.
data Array of numbers representing each point on the X axis.
iterations Number of repetitions that the algorithm will perform. The iterations apply specifically to the randomization of possible cluster points.

Methods

Name Parameters Description
clusterize
  • k
  • data
  • iterations
This is the main method of the KMeans algorithm, which follows these steps:
  1. k random points are selected from the data array without repetition. These are the potential clusters.
  2. The distance between each point in the data array and each potential cluster is calculated and stored in the same array.
  3. Each point is assigned to the closest (smallest distance) potential cluster point.
  4. The mean and variance of each cluster group are calculated and stored.
  5. The points are reassigned to the potential clusters, but this time the distance is calculated between the point and the cluster group mean. This step is repeated until there are no more changes in the cluster groups.
  6. The sum of each group's variance (total variance) of the potential cluster is stored.
  7. Steps 1 through 6 are repeated "iterations" times.
  8. The potential cluster set with the lowest total variance is selected as the optimal solution and returned.
distance
  • point_a
  • point_b
This method returns the distance between two points (point_b - point_a)
calculateMeanVariance
  • arr
This method returns the mean and variance of an array of values. The mean of arr is the sum of all the data divided by the item count. The variance of arr is the sum of the squared differences between each point and the mean, divided by the item count.
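As a reference, the calculation described above corresponds to the following sketch (not necessarily the library's exact code):

    // Population mean and variance of a numeric array, as described above
    function calculateMeanVariance(arr) {
      const mean = arr.reduce((sum, x) => sum + x, 0) / arr.length;
      const variance = arr.reduce((sum, x) => sum + (x - mean) ** 2, 0) / arr.length;
      return [mean, variance];
    }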

Example Data

data = [
    -99,  -92,  -89,  -87,  -83,  -82,  -78,
    -76,  -70,  -62,  -57,  -55,  -50,  -42,
    -35,  -33,  -32,  -30,  -27,  -17,  -12,
    -10,  0,  1,  2,  25,  29,  33,  39,
    41,  53,  54,  67 ]
k = 3
iterations = 3
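With that data, a minimal call could look like this (the no-argument constructor is an assumption; the diagram only declares clusterize(k, data, iterations)):

    const km = new LinearKMeans();
    const clusters = km.clusterize(k, data, iterations);
    console.log(clusters);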

Result of Linear KMeans clustering:

The result is shown on a graph; each point has a color which identifies the cluster group it is assigned to. The cluster points are drawn as red dots.

_2DKMeans Class

Properties

Name Description
k Number of clusters in which the points will be grouped.
data Array of coordinates [x,y] representing each point.
iterations Number of repetitions that the algorithm will perform. The iterations apply specifically to the randomization of possible cluster points.

Methods

Name Parameters Description
clusterize
  • k
  • data
  • iterations
This is the main method of the KMeans algorithm, which follows these steps:
  1. k random points are selected from the data array without repetition. These are the potential clusters.
  2. The distance between each point in the data array and each potential cluster is calculated and stored in the same array.
  3. Each point is assigned to the closest (smallest distance) potential cluster point.
  4. The mean and variance of each cluster group are calculated and stored.
  5. The points are reassigned to the potential clusters, but this time the distance is calculated between the point and the cluster group mean. This step is repeated until there are no more changes in the cluster groups.
  6. The sum of each group's variance (total variance) of the potential cluster is stored.
  7. Steps 1 through 6 are repeated "iterations" times.
  8. The potential cluster set with the lowest total variance is selected as the optimal solution and returned.
distance
  • point_a
  • point_b
This method returns the distance between two points
calculateMeanVariance
  • arr
This method returns the mean and variance of an array of values. The mean of arr is the sum of all the data divided by the item count. The variance of arr is the sum of the squared differences between each point and the mean, divided by the item count.

Example Data

data = [
    [11,6],  [4,2],  [15,0],  [10,6],  [7,8],
    [9,12],  [13,0],  [5,1],  [0,13],  [7,5],
    [6,1],  [3,6],  [0,10],  [14,10],  [6,14],
    [6,4],  [4,9],  [5,14],  [9,9],  [13,8] ]
k = 3
iterations = 10
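Analogously for the 2D case (again, the no-argument constructor is an assumption):

    const km2d = new _2DKMeans();
    const clusters2d = km2d.clusterize(k, data, iterations);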

Result of 2D KMeans clustering:

The result is shown on a graph; each point has a color which identifies the cluster group it is assigned to. The cluster points are drawn as red dots.


Bayes Method

classDiagram class MethodBayes { +attributes : any[] = [] +classes : any[] = [] +frecuencyTables : any[] = [] +attributesNames : any[] = [] +className : String = null +addAttribute(values : [], attributesNames : []) +addClass(values : [], className: String) +train() +probability(attributesName : String, cause: String, effect: String) +predict(causes : [], effect: String) +toFrecuencyTable(values:[]) }

Bayes Class

Properties

Name Description
attributes Attributes that the model to evaluate will contain
classes Classes that the model to evaluate will contain
frecuencyTables Table containing the data of the probabilities of each event
attributesNames Stores the name of each attribute registered in the model
className Stores the value of the last class that was added

Methods

Name Parameters Description
addAttribute
  • values
  • attributeName
Method used to add an attribute to the array of attributes that belong to a specific model. Its values parameter contains the values to add for the attribute, and attributeName is the name used to associate it.
addClass
  • values
  • className
Method used to add a class to the array of classes that belong to a specific model. Its values parameter contains the values to add for the class, and className is the name used to associate it.
train Method used to train our model with its corresponding attributes and classes. This method also makes use of the frequency table to store the probabilities found.
probability
  • attributeName
  • cause
  • effect
Method used to calculate the probabilities of an attribute using its cause and effect
predict
  • cause
  • effect
Method used to predict an outcome through a given event. To predict these events, the cause and effect parameters are used.
isModelValid Method used to validate that a model meets its minimum requirements, such as containing classes and attributes.
toFrecuencyTable
  • values
Method used to store and return the frequency table that contains the probabilities, frequencies and values of the model
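A minimal usage sketch based on the methods above (the data is hypothetical and the no-argument constructor is an assumption):

    const bayes = new MethodBayes();

    // Hypothetical attributes and class for a "play outside?" model
    bayes.addAttribute(["Sunny", "Rainy", "Sunny", "Overcast"], "Outlook");
    bayes.addAttribute(["Hot", "Cold", "Hot", "Mild"], "Temperature");
    bayes.addClass(["Yes", "No", "Yes", "Yes"], "Play");

    bayes.train();                                             // builds the frequency tables
    // probability(attributeName, cause, effect) as declared above
    console.log(bayes.probability("Outlook", "Sunny", "Yes"));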

Example Data

Result of Method Bayes:

The result is shown below


NaiveBayes


Class Diagram

classDiagram class NaiveBayes { +causes : any[] = [] +insertCause(name : string, array : any[]) +predict(effect : string, events : []) any[] +getSimpleProbability(event : []) +getConditionalProbability(eventA : [],eventB : []) +getCauseByName(cause_name:string) }

NaiveBayes Class

Properties

Name Description
causes Array of the causes to be taken into account for the prediction

Methods

Name Parameters Description
insertCause
  • effect (string)
  • events (array)
This method fills the data used to make the predictions. Each cause added must have an events array that matches the others already inserted; otherwise an error is reported in the console.
  • The effect parameter represents the name of the column/cause added, and it must be unique
  • The events parameter receives the data used for the column/cause
predict
  • effect (string)
  • events (array)
This is the main method that runs the prediction for the data entered through the parameters.
  • The effect parameter represents the name of the column/cause for which you want to make the prediction
  • The events parameter receives an array of tuples [column_name, value] for known events,
    e.g. [["name", value], ["name", value], ["name", value] ...]
getSimpleProbability
  • event (array)
This method returns the probability that an event occurs. Events are represented by tuples with the format ["column_name", value]. In other words, it returns the probability of finding "value" in cause "column_name".
getConditionalProbability
  • event_A (array)
  • event_B (array)
Returns probability of event_A occurring given that event_B occurs. Events are represented by tuples with the format ["column_name", value].
getCauseByName
  • cause_name (string)
Returns the array of values corresponding to the column/cause identified by the value received in the parameter.
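A minimal usage sketch based on the tuple format described above (hypothetical data; the no-argument constructor is an assumption):

    const nb = new NaiveBayes();

    // Hypothetical columns/causes; every insertCause call uses arrays of the same length
    nb.insertCause("Outlook", ["Sunny", "Rainy", "Sunny", "Overcast"]);
    nb.insertCause("Windy",   ["false", "true",  "false", "true"]);
    nb.insertCause("Play",    ["Yes",   "No",    "Yes",   "Yes"]);

    // Predict the "Play" column given the known events, using the tuple format above
    const prediction = nb.predict("Play", [["Outlook", "Sunny"], ["Windy", "false"]]);
    console.log(prediction);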

Logistic Regression


Class Diagram

classDiagram LogisticModel <|-- LogisticRegression LogisticModel <|-- MultiClassLogistic class LogisticModel { } class LogisticRegression { + alpha: int , 0.001 + lambda: int, 0 + iterations: int, 100 + fit(data) + computeThreshold(X,Y) + grad(X,Y,theta) + h(x_i, theta) + transform(x) + cost(X,Y,theta) } class MultiClassLogistic { + alpha: int , 0.001 + lambda: int, 0 + iterations: int, 100 + fit(data,classes) + transform(x) }

Logistic Regression

Properties

Name Description
alpha Learning rate used when updating the coefficients on each iteration
lambda Regularization parameter of the function
iterations Number of iterations that the algorithm must perform

Methods

Name Parameters Description
fit
  • data (matrix) for simple
  • classes (array) for multiclass
This method separates the input matrix into the respective arrays of X, Y, and the current value(s), and then carries out the transformation and evaluation of the function.
  • The data parameter represents the matrix of data to analyze
  • The classes parameter receives the array of classes for the multiclass case
computeThreshold
  • X (int)
  • Y (int)
This method obtains the threshold from the coordinates obtained in X and Y
grad
  • X (int)
  • Y (int)
  • theta (int)
This method calculates and obtains the gradient with which the function will be evaluated
h
  • x_i (int)
  • theta (int)
This method evaluates the X coordinate with the theta coefficient in the logistic regression function.
transform
  • X (int)
This method returns the transformation of the prediction: in the binary case it checks whether the result is 1 or 0, and in the multiclass case it compares the classes with the array.
cost
  • X (int)
  • Y (int)
  • theta (int)
This method returns the cost of the function, which is obtained by evaluating the function through the h() method.
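For reference, the hypothesis evaluated by h(x_i, theta) is the standard logistic (sigmoid) function. The sketch below shows that formula and a hypothetical call; the no-argument constructor and the layout of data (feature values followed by a 0/1 label in each row) are assumptions, not details stated above:

    // Standard logistic hypothesis: h = 1 / (1 + e^(-theta · x))
    function sigmoid(z) {
      return 1 / (1 + Math.exp(-z));
    }

    // Hypothetical usage of the class described above
    const logistic = new LogisticRegression();
    // Assumed layout: each row holds the feature values followed by its 0/1 label
    logistic.fit([
      [0.5, 1.2, 0],
      [1.5, 2.3, 1],
      [2.0, 3.1, 1],
    ]);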