Simple prediction using linear regression with Python

    data2 = pd.DataFrame(data1['kwh'])
    data2

                              kwh
    date
    2012-04-12 14:56:50  1.256400
    2012-04-12 15:11:55  1.430750
    2012-04-12 15:27:01  1.369910
    2012-04-12 15:42:06  1.359350
    2012-04-12 15:57:10  1.305680
    2012-04-12 16:12:10  1.287750
    2012-04-12 16:27:14  1.245970
    2012-04-12 16:42:19  1.282280
    2012-04-12 16:57:24  1.365710
    2012-04-12 17:12:28  1.320130
    2012-04-12 17:27:33  1.354890
    2012-04-12 17:42:37  1.343680
    2012-04-12 17:57:41  1.314220
    2012-04-12 18:12:44  1.311970
    2012-04-12 18:27:46  1.338980
    2012-04-12 18:42:51  1.357370
    2012-04-12 18:57:54  1.328700
    2012-04-12 19:12:58  1.308200
    2012-04-12 19:28:01  1.341770
    2012-04-12 19:43:04  1.278350
    2012-04-12 19:58:07  1.253170
    2012-04-12 20:13:10  1.420670
    2012-04-12 20:28:15  1.292740
    2012-04-12 20:43:15  1.322840
    2012-04-12 20:58:18  1.247410
    2012-04-12 21:13:20  0.568352
    2012-04-12 21:28:22  0.317865
    2012-04-12 21:43:24  0.233603
    2012-04-12 21:58:27  0.229524
    2012-04-12 22:13:29  0.236929
    2012-04-12 22:28:34  0.233806
    2012-04-12 22:43:38  0.235618
    2012-04-12 22:58:43  0.229858
    2012-04-12 23:13:43  0.235132
    2012-04-12 23:28:46  0.231863
    2012-04-12 23:43:55  0.237794
    2012-04-12 23:59:00  0.229634
    2012-04-13 00:14:02  0.234484
    2012-04-13 00:29:05  0.234189
    2012-04-13 00:44:09  0.237213
    2012-04-13 00:59:09  0.230483
    2012-04-13 01:14:10  0.234982
    2012-04-13 01:29:11  0.237121
    2012-04-13 01:44:16  0.230910
    2012-04-13 01:59:22  0.238406
    2012-04-13 02:14:21  0.250530
    2012-04-13 02:29:24  0.283575
    2012-04-13 02:44:24  0.302299
    2012-04-13 02:59:25  0.322093
    2012-04-13 03:14:30  0.327600
    2012-04-13 03:29:31  0.324368
    2012-04-13 03:44:31  0.301869
    2012-04-13 03:59:42  0.322019
    2012-04-13 04:14:43  0.325328
    2012-04-13 04:29:43  0.306727
    2012-04-13 04:44:46  0.299012
    2012-04-13 04:59:47  0.303288
    2012-04-13 05:14:48  0.326205
    2012-04-13 05:29:49  0.344230
    2012-04-13 05:44:50  0.353484
    ...

    65701 rows × 1 columns

I have this DataFrame with a date index and one column. I want to make a simple prediction using linear regression with sklearn, but I am confused about how to set X and y (I want the X values to be the time and the y values to be kwh). I am new to Python, so any help is appreciated. Thanks.

+5
5 answers

The first thing you need to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the kwh value recorded at that date.
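A minimal sketch of that split, using only the standard library and the first few rows from the question. The helper name `to_elapsed_seconds` is made up for illustration, and measuring time as seconds since the first reading is just one possible numeric encoding (scikit-learn needs numbers, not date strings).

```python
from datetime import datetime

# First few readings from the question, as (timestamp, kwh) pairs.
rows = [
    ("2012-04-12 14:56:50", 1.256400),
    ("2012-04-12 15:11:55", 1.430750),
    ("2012-04-12 15:27:01", 1.369910),
    ("2012-04-12 15:42:06", 1.359350),
]


def to_elapsed_seconds(stamps):
    """Convert timestamp strings to seconds elapsed since the first one.

    Hypothetical helper, not part of any library.
    """
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    start = parsed[0]
    return [(t - start).total_seconds() for t in parsed]


seconds = to_elapsed_seconds([stamp for stamp, _ in rows])

# X is 2-dimensional: one row per sample, one column per feature.
X = [[s] for s in seconds]
# y is 1-dimensional: one target value per sample.
y = [kwh for _, kwh in rows]

print(X)
print(y)
```

Whatever encoding is chosen, the same one has to be applied later to the dates passed to the model for prediction.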

After that, you will want to use sklearn.linear_model.LinearRegression to perform the regression. See the scikit-learn documentation for LinearRegression for details.

As with every sklearn model, there are two steps. First, fit the model to your data. Then put the dates at which you want to predict kwh in another array, X_predict, and predict kwh using the predict method.

    from sklearn.linear_model import LinearRegression

    X = []  # put your dates in here (as numbers, one row per sample)
    y = []  # put your kwh in here

    model = LinearRegression()
    model.fit(X, y)

    X_predict = []  # put the dates for which you want to predict kwh here
    y_predict = model.predict(X_predict)
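To make the two steps concrete, here is a small self-contained sketch on synthetic data (the readings below are invented, not taken from the question). It assumes a pandas DataFrame with a DatetimeIndex and a `kwh` column. The helper `hours_since` is hypothetical, and encoding time as hours since the first reading is one choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the question's data: one reading every 15 minutes,
# rising by exactly 0.01 kwh per hour from a base of 1.0.
index = pd.date_range("2012-04-12 15:00:00", periods=8, freq="15min")
hours = np.arange(8) * 0.25
data = pd.DataFrame({"kwh": 1.0 + 0.01 * hours}, index=index)


def hours_since(stamps, start):
    """Encode timestamps as an (n_samples, 1) array of hours since `start`.

    Hypothetical helper for this sketch.
    """
    elapsed = (pd.DatetimeIndex(stamps) - start).total_seconds() / 3600.0
    return np.asarray(elapsed, dtype=float).reshape(-1, 1)


start = data.index[0]
X = hours_since(data.index, start)
y = data["kwh"].to_numpy()

# Step 1: fit the model.
model = LinearRegression()
model.fit(X, y)

# Step 2: predict at new timestamps, encoded the same way as the training data.
X_predict = hours_since(["2012-04-12 18:00:00"], start)
y_predict = model.predict(X_predict)
print(y_predict)
```

Because the synthetic data is exactly linear, the fitted slope matches the 0.01 kwh per hour used to generate it. Real consumption data like the series in the question will not follow a straight line this closely.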
+15

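The predict() method takes a 2-dimensional array as its argument. So, if you want to predict a value with simple linear regression, you must pass the input inside a 2-dimensional array. A timestamp also has to be converted to a number first (for example, seconds since the epoch), using the same encoding the model was fitted with:

    model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])

For multiple linear regression, pass one value per feature:

    model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])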
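The shape requirement can be checked without a fitted model. This sketch assumes NumPy and uses made-up numeric values (seconds since some reference time). `reshape(-1, 1)` turns a flat array of samples into the single-column 2-dimensional layout that scikit-learn estimators expect.

```python
import numpy as np

# Three samples of a single feature, as a flat (1-dimensional) array.
seconds = np.array([0.0, 905.0, 1811.0])
print(seconds.shape)  # (3,)

# One column, as many rows as needed: the layout fit() and predict() expect.
X = seconds.reshape(-1, 1)
print(X.shape)  # (3, 1)

# A single sample still needs both dimensions: one row, one column.
single = np.array([[2716.0]])
print(single.shape)  # (1, 1)

# With two features per sample, each row holds both values.
two_features = np.array([[2716.0, 0.327433]])
print(two_features.shape)  # (1, 2)
```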

+1

Linear regression:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    # train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed)
    from sklearn.model_selection import train_test_split

    data = pd.read_csv('Salary_Data.csv')
    X = data.iloc[:, :-1].values  # every column except the last, as a 2-D array
    y = data.iloc[:, 1].values    # second column as the target

    # split the dataset into training and test sets (10 rows held out for testing)
    X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=10, random_state=0)

    regressor = LinearRegression()
    regressor.fit(X_train, Y_train)
    y_pre = regressor.predict(X_test)
0

You can take a look at my code on GitHub, where I predict temperature from the chirping of crickets with a simple linear regression model. The code is explained in the comments.

    # Import the required libraries
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    # train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed)
    from sklearn.model_selection import train_test_split

    # Import the Excel data (raw string so the backslashes in the Windows path are kept literally)
    dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning AZ Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')
    x = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 1].values

    # Split the data into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=42)

    # Fit a simple linear regression model to the training set
    regressorObject = LinearRegression()
    regressorObject.fit(x_train, y_train)

    # Predict the test set
    y_pred_test_data = regressorObject.predict(x_test)

    # Visualise the training set results in a scatter plot
    plt.scatter(x_train, y_train, color='red')
    plt.plot(x_train, regressorObject.predict(x_train), color='blue')
    plt.title('Cricket Chirps vs Temperature (Training set)')
    plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket)')
    plt.ylabel('Temperature (in degrees Fahrenheit)')
    plt.show()

    # Visualise the test set results against the same fitted line
    plt.scatter(x_test, y_test, color='red')
    plt.plot(x_train, regressorObject.predict(x_train), color='blue')
    plt.title('Cricket Chirps vs Temperature (Test set)')
    plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket)')
    plt.ylabel('Temperature (in degrees Fahrenheit)')
    plt.show()

For more information, please visit

https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-

0

You can use the following code.

    import pandas as pd
    from sklearn.linear_model import LinearRegression     # to build the linear regression model
    from sklearn.model_selection import train_test_split  # to split the dataset

    data2 = pd.DataFrame(data1['kwh'])
    data2 = data2.reset_index()  # new integer index (0 to 65700), so the dates become an ordinary column

    dates = data2.iloc[:, 0]  # date column (assumed to have a datetime dtype)
    # LinearRegression needs a 2-D numeric X, so encode each date as seconds since the first reading
    X = (dates - dates.min()).dt.total_seconds().to_frame()
    y = data2.iloc[:, -1]  # kwh column

    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)

    linearModel = LinearRegression()
    linearModel.fit(Xtrain, ytrain)
    ypred = linearModel.predict(Xtest)

Here ypred holds the predicted kwh values for the test set. These are continuous estimates, not probabilities.
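To judge how close the predictions are to the held-out targets, one option is the mean absolute error. Below is a minimal hand-written sketch with numbers invented for illustration; scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out kwh readings and the model's predictions for them.
ytest = [1.25, 0.5, 0.25, 0.75]
ypred = [1.0, 0.5, 0.5, 1.0]

mae = mean_absolute_error(ytest, ypred)
print(mae)
```

The result is in the same unit as the target (kwh here), which makes it easy to interpret.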

0
