# | Method or Attribute | Scope | Description |
---|---|---|---|
1 | isFit | PolynomialModel Class | Attribute that defines if the model is already trained or not |
2 | solutions | PolynomialRegression Class | Array of floats that stores the solutions for the polynomial model |
3 | fit | PolynomialRegression Class | Method that computes the solutions for the polynomial model using the logic described below |
4 | xArray | Method fit - PolynomialModel Class | Array parameter of x values that are used to train the model |
5 | yArray | Method fit - PolynomialRegression Class | Array parameter of y values that are used to train the model |
6 | degree | Method fit - PolynomialRegression Class | Parameter with the desired degree for the model |
7 | equationSize | Method fit - PolynomialRegression Class | Attribute that defines the number of equations used in the model, based on the degree of the model |
8 | nElements | Method fit - PolynomialRegression Class | Attribute that defines the number of columns used in the matrix for the model |
9 | equations | Method fit - PolynomialRegression Class | Matrix that stores the equations and solutions for the polynomial model |
10 | predict | PolynomialRegression Class | Function that returns the predicted values based on the training previously done by the model |
11 | xArray | Method predict - PolynomialRegression Class | Parameter with the array of x values to be predicted |
12 | yArray | Method predict - PolynomialRegression Class | Array with the y values predicted by the model |
13 | calculateR2 | Method calculateR2 - PolynomialRegression Class | Method that computes the r^2 value of the trained model |
14 | getError | Method getError - PolynomialRegression Class | Method that returns the r^2 value for the trained model |
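The r^2 value mentioned for calculateR2 and getError is usually the coefficient of determination. The following is a minimal sketch of that calculation, assuming the common definition r^2 = 1 - SS_res / SS_tot. The function name `rSquared` is hypothetical and is not part of the library.

```javascript
// Hypothetical helper, not the library's calculateR2 implementation.
// observed: the training y values; predicted: the model output for the same x values.
function rSquared(observed, predicted) {
  const mean = observed.reduce((sum, y) => sum + y, 0) / observed.length;
  let residualSum = 0; // sum of squared residuals (SS_res)
  let totalSum = 0; // total sum of squares (SS_tot)
  for (let i = 0; i < observed.length; i++) {
    residualSum += (observed[i] - predicted[i]) ** 2;
    totalSum += (observed[i] - mean) ** 2;
  }
  return 1 - residualSum / totalSum;
}

// A perfect fit gives 1; always predicting the mean gives 0.
console.log(rSquared([1, 2, 3], [1, 2, 3]));
console.log(rSquared([1, 2, 3], [2, 2, 2]));
```

Note that the result is undefined when all observed values are equal, because the total sum of squares is zero.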
To train the model and get an accurate prediction for the requested degree, the library relies on the following mathematical concepts:
The least squares method is used to build the coefficient matrix for the system of equations. The matrix is symmetric and is built from the input degree, which we call "m". Each row of the matrix is one equation, and the powers of x increase by one from each row to the next. The matrix is later solved to obtain the coefficients of the regression model.
n | Σx | Σx^2 | Σx^3 | ... | Σx^m | Σy |
Σx | Σx^2 | Σx^3 | Σx^4 | ... | Σx^(m+1) | Σxy |
Σx^2 | Σx^3 | Σx^4 | Σx^5 | ... | Σx^(m+2) | Σx^2*y |
... | ... | ... | ... | ... | ... | ... |
Σx^m | Σx^(m+1) | Σx^(m+2) | Σx^(m+3) | ... | Σx^(2m) | Σx^m*y |
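The matrix above can be built by accumulating the power sums Σx^p (for p from 0 to 2m) and the sums Σx^p*y (for p from 0 to m). The following is a minimal sketch of that construction. The function name `buildAugmentedMatrix` is hypothetical, and the code is an illustration of the technique, not the library's fit method.

```javascript
// Hypothetical helper: builds the (degree + 1) x (degree + 2) least squares matrix.
function buildAugmentedMatrix(xArray, yArray, degree) {
  const size = degree + 1; // number of equations
  const powerSums = new Array(2 * degree + 1).fill(0); // powerSums[p] = sum of x^p
  const momentSums = new Array(size).fill(0); // momentSums[p] = sum of x^p * y
  for (let i = 0; i < xArray.length; i++) {
    let power = 1; // x^0
    for (let p = 0; p <= 2 * degree; p++) {
      powerSums[p] += power;
      if (p <= degree) momentSums[p] += power * yArray[i];
      power *= xArray[i];
    }
  }
  const equations = [];
  for (let row = 0; row < size; row++) {
    const equation = [];
    // The entry at (row, col) is the sum of x^(row + col), so the matrix is symmetric.
    for (let col = 0; col < size; col++) equation.push(powerSums[row + col]);
    equation.push(momentSums[row]); // last column: right-hand side
    equations.push(equation);
  }
  return equations;
}

// Points on the line y = 1 + 2x, fitted with degree 1.
console.log(buildAugmentedMatrix([0, 1, 2], [1, 3, 5], 1)); // [[3, 3, 9], [3, 5, 13]]
```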
Gauss-Jordan elimination is used to solve the model. Once the matrix of equations is built, it is reduced with Gauss-Jordan to obtain the coefficients, which are then stored in the solutions array.
1 | 2 | 3 | 4 | ... | n | m |
0 | 1 | 2 | 3 | ... | n | m |
0 | 0 | 1 | 2 | ... | n | m |
... | ... | ... | ... | ... | ... | ... |
0 | 0 | 0 | 0 | ... | n | m |
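The elimination step can be sketched as follows. This is a generic Gauss-Jordan routine with partial pivoting, written under the assumption that the input is an augmented matrix with n rows and n + 1 columns. The function name `gaussJordan` is hypothetical, and the library's own implementation may differ.

```javascript
// Hypothetical helper: solves an n x (n + 1) augmented matrix and returns the n unknowns.
function gaussJordan(augmented) {
  const m = augmented.map((row) => row.slice()); // work on a copy
  const n = m.length;
  for (let col = 0; col < n; col++) {
    // Partial pivoting: pick the row with the largest absolute value in this column.
    let pivot = col;
    for (let row = col + 1; row < n; row++) {
      if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) pivot = row;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) throw new Error("Singular system");
    const swapped = m[col];
    m[col] = m[pivot];
    m[pivot] = swapped;
    // Scale the pivot row so the pivot becomes 1.
    const divisor = m[col][col];
    for (let k = col; k <= n; k++) m[col][k] /= divisor;
    // Clear this column in every other row, above and below the pivot.
    for (let row = 0; row < n; row++) {
      if (row === col) continue;
      const factor = m[row][col];
      for (let k = col; k <= n; k++) m[row][k] -= factor * m[col][k];
    }
  }
  return m.map((row) => row[n]); // the last column now holds the solutions
}

// Solves 3a + 3b = 9 and 3a + 5b = 13, which gives a = 1 and b = 2.
console.log(gaussJordan([[3, 3, 9], [3, 5, 13]]));
```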
Using the PolynomialRegression library is simple; just follow these steps:
To use the library, import it into your HTML code. You can find it in the dist folder as the PolynomialRegression.js file.
After importing the library into your HTML code, you need to train the model. To do so, create two arrays of training data (x values and y values) that will be passed to the library.
To train the model, create an instance of the PolynomialRegression library and call the fit method, passing the xArray and yArray of training data followed by the degree for the model.
To make a prediction, create an array of the x values to be predicted and pass it to the predict function.
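Conceptually, a prediction evaluates the fitted polynomial at each x value. The following sketch shows that evaluation with Horner's method. It assumes the solutions array stores the coefficients from the constant term upward, which the text above does not confirm, and the function names are hypothetical stand-ins rather than the library's predict method.

```javascript
// Hypothetical helpers. Assumes solutions = [c0, c1, ..., cm] for c0 + c1*x + ... + cm*x^m.
function evaluatePolynomial(solutions, x) {
  let result = 0;
  // Horner's method: start from the highest-degree coefficient.
  for (let i = solutions.length - 1; i >= 0; i--) {
    result = result * x + solutions[i];
  }
  return result;
}

function predictSketch(solutions, xArray) {
  return xArray.map((x) => evaluatePolynomial(solutions, x));
}

// With coefficients for y = 1 + 2x:
console.log(predictSketch([1, 2], [0, 1, 2, 3])); // [1, 3, 5, 7]
```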
Name | Description |
---|---|
dataset | 2-dimensional matrix that contains the data header in the first row and the last column contains the 2 possible classes. |
generalEntropy | Stores the overall entropy of the data set. |
primaryCount | Stores how many times the first class found appears in the data set. |
secondaryCount | Stores how many times the second class found appears in the data set. |
primaryPosibility | Stores the first class found in the data set. |
secondaryPosibility | Stores the second class found in the data set. |
root | It is the root of the tree resulting from training |
Name | Description |
---|---|
calculateEntropy | Receives 2 parameters: the first is how many times the first label appears and the second is how many times the second label appears. Returns the result of the entropy equation. |
train | Generates the decision tree from the data set it receives and returns the root node of the generated tree. |
predict | Classifies the 2xm matrix it receives as a parameter, starting the search from the node it receives as a parameter. Returns the node with the class the data belongs to, or null if it is not able to classify it. |
recursivePredict | This function is used as an aid to traversing the decision tree. |
calculateGeneralEntropy | This function analyzes the received data set and counts the number of times the first and second class appear. The index parameter indicates the column in which the count must be performed. It uses the calculateEntropy function and returns the entropy of the entire data set. |
classifierFeatures | This function analyzes the received data set, using the index parameter, separates each data into the corresponding characteristic, and returns a list of characteristics. |
calculateInformationEntropy | This function receives a list of characteristics and returns the value of the entropy of the information for the received characteristic. |
calculateGain | This function receives the general entropy and the information entropy and returns the value of the information gain. |
selectBestFeature | This function receives a set of characteristics and returns the index of the characteristic with the highest gain. |
generateDotString | This function is used to generate the string in the format that the visjs tool accepts to generate a tree type graph. |
recursiveDotString | This function is auxiliary to traverse the tree and generate the string for visjs |
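The entropy and gain calculations described in the table can be sketched as follows, assuming the standard ID3 definitions for two classes. The function names and the shape of the `features` argument (one object with two counts per attribute value) are hypothetical and do not reflect the library's internal data structures.

```javascript
// Entropy of a two-class split: -p1*log2(p1) - p2*log2(p2).
function binaryEntropy(primaryCount, secondaryCount) {
  const total = primaryCount + secondaryCount;
  if (total === 0) return 0;
  let entropy = 0;
  for (const count of [primaryCount, secondaryCount]) {
    if (count > 0) {
      const p = count / total;
      entropy -= p * Math.log2(p);
    }
  }
  return entropy;
}

// Weighted entropy of one attribute; each item counts both classes for one attribute value.
function informationEntropy(features) {
  const total = features.reduce((sum, f) => sum + f.primary + f.secondary, 0);
  return features.reduce(
    (sum, f) => sum + ((f.primary + f.secondary) / total) * binaryEntropy(f.primary, f.secondary),
    0
  );
}

// Information gain: general entropy minus the information entropy of the attribute.
function gain(generalEntropy, attributeEntropy) {
  return generalEntropy - attributeEntropy;
}

// Illustrative counts: 9 rows of one class, 5 of the other, split by a 3-valued attribute.
const general = binaryEntropy(9, 5);
const outlook = [
  { primary: 2, secondary: 3 },
  { primary: 4, secondary: 0 },
  { primary: 3, secondary: 2 },
];
console.log(gain(general, informationEntropy(outlook)));
```

An attribute with the highest gain is the one selectBestFeature would be expected to return.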
This Machine Learning process uses the ID3 algorithm. It is a simple implementation that classifies data matrices with the following structure:
Note: Attr#, lorem, Result and Class# can be any string.
[
["Attr1", "Attr2", "Attr3", "Result"],
["lorem", "lorem", "lorem", "Class1"],
["lorem", "lorem", "lorem", "Class2"],
["lorem", "lorem", "lorem", "Class1"],
["lorem", "lorem", "lorem", "Class2"],
["lorem", "lorem", "lorem", "Class1"],
]
The data to be predicted consists of a matrix with 2 rows and m columns. The first row will be the header, the second row will consist of the data for decision making.
Note: The header must be in the data.
[
["Attr1", "Attr2", "Attr3"],
["lorem", "lorem", "lorem"],
]
The result of the classification consists of an object of type NodeTree which has in its value attribute the classification of the data entered.
{
childs: []
id: "3eb4d4228163"
tag: "Overcast"
value: "Yes"
}
Name | Description |
---|---|
k | Number of clusters in which the points will be grouped. |
data | Array of numbers representing each point on the X axis. |
iterations | Number of repetitions that the algorithm will perform. The iterations apply specifically to the randomization of possible cluster points. |
Name | Parameters | Description |
---|---|---|
clusterize | | This is the main method of the KMeans algorithm. |
distance | | Returns the distance between two points (point_b - point_a). |
calculateMeanVariance | | Returns the mean and variance of an array of values. The mean of arr is the sum of all the values divided by the item count. The variance of arr is the sum of the squared differences between each value and the mean, divided by the item count. |
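The mean and variance definitions above translate directly into code. The following is a minimal sketch; the function name `meanVarianceSketch` and the shape of the returned object are hypothetical and may not match what calculateMeanVariance returns.

```javascript
// Hypothetical helper: population mean and variance of an array of numbers.
function meanVarianceSketch(arr) {
  const mean = arr.reduce((sum, value) => sum + value, 0) / arr.length;
  // Sum of squared differences to the mean, divided by the item count.
  const variance = arr.reduce((sum, value) => sum + (value - mean) ** 2, 0) / arr.length;
  return { mean, variance };
}

console.log(meanVarianceSketch([2, 4, 4, 4, 5, 5, 7, 9])); // { mean: 5, variance: 4 }
```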
data = [
-99,
-92,
-89,
-87,
-83,
-82,
-78,
-76,
-70,
-62,
-57,
-55,
-50,
-42,
-35,
-33,
-32,
-30,
-27,
-17,
-12,
-10,
0,
1,
2,
25,
29,
33,
39,
41,
53,
54,
67
]
k = 3
iterations = 3
The result is shown on a chart. Each point has a color which identifies the cluster group it is assigned to. The cluster points are drawn as red dots.
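The exact steps of clusterize are not listed here, so the following is only one plausible reading of the parameters: on each iteration, k data points are picked at random as candidate cluster points, every point is assigned to its nearest candidate, and the candidate set with the lowest total squared distance is kept. All names are hypothetical, and this is not a full Lloyd-style K-Means because the cluster points are never moved to the cluster means.

```javascript
// Hypothetical sketch: random candidate selection with nearest-point assignment (1D).
// `random` is injectable so results can be made reproducible; it must return values in [0, 1).
function clusterizeSketch(k, data, iterations, random = Math.random) {
  let best = null;
  for (let iter = 0; iter < iterations; iter++) {
    // Pick k distinct data points with a partial Fisher-Yates shuffle (requires k <= data.length).
    const pool = data.slice();
    const centers = [];
    for (let c = 0; c < k; c++) {
      const j = c + Math.floor(random() * (pool.length - c));
      const tmp = pool[c];
      pool[c] = pool[j];
      pool[j] = tmp;
      centers.push(pool[c]);
    }
    // Assign each point to the index of its nearest candidate.
    const assignments = data.map((point) => {
      let nearest = 0;
      for (let c = 1; c < k; c++) {
        if (Math.abs(point - centers[c]) < Math.abs(point - centers[nearest])) nearest = c;
      }
      return nearest;
    });
    // Score the candidate set by the total squared distance to the assigned cluster point.
    const score = data.reduce((sum, point, i) => sum + (point - centers[assignments[i]]) ** 2, 0);
    if (best === null || score < best.score) best = { centers, assignments, score };
  }
  return best;
}

const sampleData = [-99, -92, -89, -87, -83, -82, -78, -76, -70, -62, -57, -55, -50, -42, -35,
  -33, -32, -30, -27, -17, -12, -10, 0, 1, 2, 25, 29, 33, 39, 41, 53, 54, 67];
const result = clusterizeSketch(3, sampleData, 3);
console.log(result.centers, result.assignments);
```

Because the candidates are random, the output changes between runs unless a seeded generator is passed as the fourth argument.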
Name | Description |
---|---|
k | Number of clusters in which the points will be grouped. |
data | Array of coordinates [x,y] representing each point. |
iterations | Number of repetitions that the algorithm will perform. The iterations apply specifically to the randomization of possible cluster points. |
Name | Parameters | Description |
---|---|---|
clusterize | | This is the main method of the KMeans algorithm. |
distance | | Returns the distance between two points. |
calculateMeanVariance | | Returns the mean and variance of an array of values. The mean of arr is the sum of all the values divided by the item count. The variance of arr is the sum of the squared differences between each value and the mean, divided by the item count. |
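For points given as [x,y] coordinates, the distance is commonly the Euclidean distance. The sketch below assumes that metric, which the table does not state explicitly, and shows how a point can be matched to its nearest cluster point. Both function names are hypothetical.

```javascript
// Hypothetical helper: Euclidean distance between two [x, y] points (assumed metric).
function distance2D(pointA, pointB) {
  return Math.hypot(pointB[0] - pointA[0], pointB[1] - pointA[1]);
}

// Hypothetical helper: index of the cluster point closest to `point`.
function nearestCenter(point, centers) {
  let nearest = 0;
  for (let c = 1; c < centers.length; c++) {
    if (distance2D(point, centers[c]) < distance2D(point, centers[nearest])) nearest = c;
  }
  return nearest;
}

console.log(distance2D([0, 0], [3, 4])); // 5
console.log(nearestCenter([4, 2], [[5, 1], [9, 9], [0, 13]])); // 0
```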
data = [
[11,6],
[4,2],
[15,0],
[10,6],
[7,8],
[9,12],
[13,0],
[5,1],
[0,13],
[7,5],
[6,1],
[3,6],
[0,10],
[14,10],
[6,14],
[6,4],
[4,9],
[5,14],
[9,9],
[13,8]
]
k = 3
iterations = 10
The result is shown on a chart. Each point has a color which identifies the cluster group it is assigned to. The cluster points are drawn as red dots.
Name | Description |
---|---|
k | Number of clusters in which the points will be grouped. |
data | Array of numbers representing each point on the X axis. |
iterations | Number of repetitions that the algorithm will perform. The iterations apply specifically to the randomization of possible cluster points. |
Name | Parameters | Description |
---|---|---|
clusterize | | This is the main method of the KMeans algorithm. |
distance | | Returns the distance between two points (point_b - point_a). |
calculateMeanVariance | | Returns the mean and variance of an array of values. The mean of arr is the sum of all the values divided by the item count. The variance of arr is the sum of the squared differences between each value and the mean, divided by the item count. |
Name | Description |
---|---|
attributes | Attributes that the model to evaluate will contain |
classes | Classes that the model to evaluate will contain |
frecuencyTables | Table containing the data of the probabilities of each event |
attributesNames | Stores the name of each attribute registered in the model |
className | Stores the value of the last class that was added |
Name | Parameters | Description |
---|---|---|
addAttribute | | Method used to add an attribute to the array of attributes that belong to a specific model. Its parameters contain the values of the attribute to add and the name to associate with it. |
addClass | | Method used to add a class to the array of classes that belong to a specific model. Its parameters contain the values of the class to add and the name to associate with it. |
train | | Method used to train our model with its corresponding attributes and classes. This method also makes use of the frequency table to store the probabilities found. |
probability | | Method used to calculate the probabilities of an attribute using its cause and effect. |
predict | | Method used to predict an outcome for a given event. To predict these events, the cause and effect parameters are used. |
isModelValid | | Method used to check that a model meets its minimum requirements, such as containing classes, attributes, etc. |
toFrecuencyTable | | Method used to store and return the frequency table that contains the probabilities, frequencies and values of the model. |
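The idea behind the frequency tables and the prediction can be sketched with a small categorical Naive Bayes example: the score of each class is its prior probability multiplied by the conditional probability of every observed attribute value. The function `naiveBayesPredict`, its arguments and the sample data are hypothetical; this is not the API of the class described above.

```javascript
// Hypothetical sketch of categorical Naive Bayes.
// attributes: { name: [values...] }, classValues: [labels...], event: { name: value }.
function naiveBayesPredict(attributes, classValues, event) {
  const total = classValues.length;
  const classCounts = {};
  for (const label of classValues) classCounts[label] = (classCounts[label] || 0) + 1;

  const scores = {};
  for (const label of Object.keys(classCounts)) {
    let score = classCounts[label] / total; // prior P(class)
    for (const [name, value] of Object.entries(event)) {
      const column = attributes[name];
      let matches = 0;
      for (let i = 0; i < total; i++) {
        if (classValues[i] === label && column[i] === value) matches++;
      }
      score *= matches / classCounts[label]; // P(value | class)
    }
    scores[label] = score;
  }
  // The prediction is the class with the highest score.
  const prediction = Object.keys(scores).reduce((a, b) => (scores[b] > scores[a] ? b : a));
  return { prediction, scores };
}

const attributes = {
  outlook: ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
  windy: ["f", "t", "f", "f", "t", "t"],
};
const play = ["no", "no", "yes", "yes", "no", "yes"];
console.log(naiveBayesPredict(attributes, play, { outlook: "rain", windy: "t" }));
```

This sketch applies no smoothing, so a value never seen with a class drives that class score to zero.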
The result is shown below
Name | Description |
---|---|
causes | Array of the causes to be taken into account for the prediction |
Name | Parameters | Description |
---|---|---|
insertCause | | This method fills the data used to make the predictions. Each cause added must have the same events array as the others already inserted; otherwise an error is logged to the console. |
predict | | This is the main method; it runs the prediction for the data passed in its parameters. |
getSimpleProbability | | Returns the probability that an event occurs. Events are represented by tuples with the format ["column_name", value]. In other words, it returns the probability of finding "value" in the cause "column_name". |
getConditionalProbability | | Returns the probability of event_A occurring given that event_B occurs. Events are represented by tuples with the format ["column_name", value]. |
getCauseByName | | Returns the array of values corresponding to the column/cause identified by the value received in the parameter. |
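The two probability methods can be illustrated by counting over columns of equal length. The sketch below uses the ["column_name", value] event format described in the table, but the way causes are stored (an array of objects with `name` and `events`) and all function names are assumptions made for this example.

```javascript
// Hypothetical storage: each cause is { name, events }, with events arrays of equal length.
function causeByName(causes, name) {
  const cause = causes.find((c) => c.name === name);
  if (!cause) throw new Error("Unknown cause: " + name);
  return cause.events;
}

// P(column_name = value): share of rows where the value appears in that column.
function simpleProbability(causes, event) {
  const events = causeByName(causes, event[0]);
  return events.filter((e) => e === event[1]).length / events.length;
}

// P(eventA | eventB): among the rows where eventB holds, the share where eventA also holds.
function conditionalProbability(causes, eventA, eventB) {
  const columnA = causeByName(causes, eventA[0]);
  const columnB = causeByName(causes, eventB[0]);
  let given = 0;
  let both = 0;
  for (let i = 0; i < columnB.length; i++) {
    if (columnB[i] === eventB[1]) {
      given++;
      if (columnA[i] === eventA[1]) both++;
    }
  }
  return given === 0 ? 0 : both / given;
}

const causes = [
  { name: "rain", events: [1, 1, 0, 0] },
  { name: "wet", events: [1, 1, 1, 0] },
];
console.log(simpleProbability(causes, ["wet", 1])); // 0.75
console.log(conditionalProbability(causes, ["wet", 1], ["rain", 1])); // 1
```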
Name | Description |
---|---|
alpha | Percentage of certainty |
lambda | Lambda value of the function |
iterations | Number of iterations that the algorithm must perform |
Name | Parameters | Description |
---|---|---|
fit | | Separates the input matrix into the respective arrays of X, Y, and the current value(s), in order to later carry out the transformation and evaluation of the function. |
computeThreshold | | Obtains the threshold of the coordinates obtained in X and Y. |
grad | | Calculates the gradient with which the function will be evaluated. |
h | | Evaluates the X coordinate with the theta coefficient in the logistic regression function. |
transform | | Returns the transformation of the prediction: if it is binary, it checks whether it is 1 or 0; if it is multiclass, it compares the classes with the array. |
cost | | Returns the cost of the function, obtained by evaluating the function in the equation of the method h(). |
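The roles of h, cost and grad can be illustrated with a textbook binary logistic regression trained by gradient descent. The sketch below is written under that assumption and omits any lambda term. The `learningRate` parameter, the `fitSketch` function and the sample data are hypothetical, and the signatures are not taken from the library.

```javascript
// h: logistic (sigmoid) function applied to the dot product of theta and x.
function h(theta, x) {
  const z = theta.reduce((sum, t, j) => sum + t * x[j], 0);
  return 1 / (1 + Math.exp(-z));
}

// cost: mean log loss over all rows, with probabilities clamped to avoid log(0).
function cost(theta, X, y) {
  const eps = 1e-12;
  let total = 0;
  for (let i = 0; i < X.length; i++) {
    const p = Math.min(Math.max(h(theta, X[i]), eps), 1 - eps);
    total -= y[i] * Math.log(p) + (1 - y[i]) * Math.log(1 - p);
  }
  return total / X.length;
}

// grad: mean of (h(x) - y) * x_j for each coefficient j.
function grad(theta, X, y) {
  const g = new Array(theta.length).fill(0);
  for (let i = 0; i < X.length; i++) {
    const error = h(theta, X[i]) - y[i];
    for (let j = 0; j < theta.length; j++) g[j] += error * X[i][j];
  }
  return g.map((value) => value / X.length);
}

// Hypothetical training loop: plain gradient descent starting from zeros.
function fitSketch(X, y, learningRate, iterations) {
  let theta = new Array(X[0].length).fill(0);
  for (let iter = 0; iter < iterations; iter++) {
    const g = grad(theta, X, y);
    theta = theta.map((t, j) => t - learningRate * g[j]);
  }
  return theta;
}

// Each row starts with 1 so that theta[0] acts as the intercept.
const X = [[1, 0], [1, 1], [1, 2], [1, 3]];
const y = [0, 0, 1, 1];
const theta = fitSketch(X, y, 0.5, 1000);
console.log(h(theta, [1, 0]) < 0.5, h(theta, [1, 3]) > 0.5); // true true
```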