Forecasting on Sample Super Store Data set using Python – Pandas & ARMA

Sales Forecasting in Python using ARMA and Pandas on Sample Super Store Data Set

In this blog we will forecast sales in SampleSuper Store Data set. The forecasting algorithm we will use is Moving Average. For this we will use statsmodel library of python.
For Pre Data Processing we will use Pandas Library of Python

So, the first is to read Sample – Super store Data from a pre stored Excel File.

We will use monthly average sale as our aggregating standard

In [1]:
from statsmodels.tsa.arima_model import ARMA
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_excel('C:\\My\\Video\\sales_forecast\\Sample - Superstore.xls',header=0)
data
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
Out[1]:
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}

Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City Postal Code Region Product ID Category Sub-Category Product Name Sales Quantity Discount Profit
0 1 CA-2017-152156 2017-11-08 2017-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson 42420.0 South FUR-BO-10001798 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2 0.00 41.9136
1 2 CA-2017-152156 2017-11-08 2017-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson 42420.0 South FUR-CH-10000454 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,… 731.9400 3 0.00 219.5820
2 3 CA-2017-138688 2017-06-12 2017-06-16 Second Class DV-13045 Darrin Van Huff Corporate United States Los Angeles 90036.0 West OFF-LA-10000240 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b… 14.6200 2 0.00 6.8714
3 4 US-2016-108966 2016-10-11 2016-10-18 Standard Class SO-20335 Sean O’Donnell Consumer United States Fort Lauderdale 33311.0 South FUR-TA-10000577 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5 0.45 -383.0310
4 5 US-2016-108966 2016-10-11 2016-10-18 Standard Class SO-20335 Sean O’Donnell Consumer United States Fort Lauderdale 33311.0 South OFF-ST-10000760 Office Supplies Storage Eldon Fold ‘N Roll Cart System 22.3680 2 0.20 2.5164
5 6 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West FUR-FU-10001487 Furniture Furnishings Eldon Expressions Wood and Plastic Desk Access… 48.8600 7 0.00 14.1694
6 7 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West OFF-AR-10002833 Office Supplies Art Newell 322 7.2800 4 0.00 1.9656
7 8 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West TEC-PH-10002275 Technology Phones Mitel 5320 IP Phone VoIP phone 907.1520 6 0.20 90.7152
8 9 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West OFF-BI-10003910 Office Supplies Binders DXL Angle-View Binders with Locking Rings by S… 18.5040 3 0.20 5.7825
9 10 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West OFF-AP-10002892 Office Supplies Appliances Belkin F5C206VTEL 6 Outlet Surge 114.9000 5 0.00 34.4700
10 11 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West FUR-TA-10001539 Furniture Tables Chromcraft Rectangular Conference Tables 1706.1840 9 0.20 85.3092
11 12 CA-2015-115812 2015-06-09 2015-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles 90032.0 West TEC-PH-10002033 Technology Phones Konftel 250 Conference phone – Charcoal black 911.4240 4 0.20 68.3568
12 13 CA-2018-114412 2018-04-15 2018-04-20 Standard Class AA-10480 Andrew Allen Consumer United States Concord 28027.0 South OFF-PA-10002365 Office Supplies Paper Xerox 1967 15.5520 3 0.20 5.4432
13 14 CA-2017-161389 2017-12-05 2017-12-10 Standard Class IM-15070 Irene Maddox Consumer United States Seattle 98103.0 West OFF-BI-10003656 Office Supplies Binders Fellowes PB200 Plastic Comb Binding Machine 407.9760 3 0.20 132.5922
14 15 US-2016-118983 2016-11-22 2016-11-26 Standard Class HP-14815 Harold Pawlan Home Office United States Fort Worth 76106.0 Central OFF-AP-10002311 Office Supplies Appliances Holmes Replacement Filter for HEPA Air Cleaner… 68.8100 5 0.80 -123.8580
15 16 US-2016-118983 2016-11-22 2016-11-26 Standard Class HP-14815 Harold Pawlan Home Office United States Fort Worth 76106.0 Central OFF-BI-10000756 Office Supplies Binders Storex DuraTech Recycled Plastic Frosted Binders 2.5440 3 0.80 -3.8160
16 17 CA-2015-105893 2015-11-11 2015-11-18 Standard Class PK-19075 Pete Kriz Consumer United States Madison 53711.0 Central OFF-ST-10004186 Office Supplies Storage Stur-D-Stor Shelving, Vertical 5-Shelf: 72″H x… 665.8800 6 0.00 13.3176
17 18 CA-2015-167164 2015-05-13 2015-05-15 Second Class AG-10270 Alejandro Grove Consumer United States West Jordan 84084.0 West OFF-ST-10000107 Office Supplies Storage Fellowes Super Stor/Drawer 55.5000 2 0.00 9.9900
18 19 CA-2015-143336 2015-08-27 2015-09-01 Second Class ZD-21925 Zuschuss Donatelli Consumer United States San Francisco 94109.0 West OFF-AR-10003056 Office Supplies Art Newell 341 8.5600 2 0.00 2.4824
19 20 CA-2015-143336 2015-08-27 2015-09-01 Second Class ZD-21925 Zuschuss Donatelli Consumer United States San Francisco 94109.0 West TEC-PH-10001949 Technology Phones Cisco SPA 501G IP Phone 213.4800 3 0.20 16.0110
20 21 CA-2015-143336 2015-08-27 2015-09-01 Second Class ZD-21925 Zuschuss Donatelli Consumer United States San Francisco 94109.0 West OFF-BI-10002215 Office Supplies Binders Wilson Jones Hanging View Binder, White, 1″ 22.7200 4 0.20 7.3840
21 22 CA-2017-137330 2017-12-09 2017-12-13 Standard Class KB-16585 Ken Black Corporate United States Fremont 68025.0 Central OFF-AR-10000246 Office Supplies Art Newell 318 19.4600 7 0.00 5.0596
22 23 CA-2017-137330 2017-12-09 2017-12-13 Standard Class KB-16585 Ken Black Corporate United States Fremont 68025.0 Central OFF-AP-10001492 Office Supplies Appliances Acco Six-Outlet Power Strip, 4′ Cord Length 60.3400 7 0.00 15.6884
23 24 US-2018-156909 2018-07-16 2018-07-18 Second Class SF-20065 Sandra Flanagan Consumer United States Philadelphia 19140.0 East FUR-CH-10002774 Furniture Chairs Global Deluxe Stacking Chair, Gray 71.3720 2 0.30 -1.0196
24 25 CA-2016-106320 2016-09-25 2016-09-30 Standard Class EB-13870 Emily Burns Consumer United States Orem 84057.0 West FUR-TA-10000577 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 1044.6300 3 0.00 240.2649
25 26 CA-2017-121755 2017-01-16 2017-01-20 Second Class EH-13945 Eric Hoffmann Consumer United States Los Angeles 90049.0 West OFF-BI-10001634 Office Supplies Binders Wilson Jones Active Use Binders 11.6480 2 0.20 4.2224
26 27 CA-2017-121755 2017-01-16 2017-01-20 Second Class EH-13945 Eric Hoffmann Consumer United States Los Angeles 90049.0 West TEC-AC-10003027 Technology Accessories Imation 8GB Mini TravelDrive USB 2.0 Flash Drive 90.5700 3 0.00 11.7741
27 28 US-2016-150630 2016-09-17 2016-09-21 Standard Class TB-21520 Tracy Blumstein Consumer United States Philadelphia 19140.0 East FUR-BO-10004834 Furniture Bookcases Riverside Palais Royal Lawyers Bookcase, Royal… 3083.4300 7 0.50 -1665.0522
28 29 US-2016-150630 2016-09-17 2016-09-21 Standard Class TB-21520 Tracy Blumstein Consumer United States Philadelphia 19140.0 East OFF-BI-10000474 Office Supplies Binders Avery Recycled Flexi-View Covers for Binding S… 9.6180 2 0.70 -7.0532
29 30 US-2016-150630 2016-09-17 2016-09-21 Standard Class TB-21520 Tracy Blumstein Consumer United States Philadelphia 19140.0 East FUR-FU-10004848 Furniture Furnishings Howard Miller 13-3/4″ Diameter Brushed Chrome … 124.2000 3 0.20 15.5250
9964 9965 CA-2017-146374 2017-12-05 2017-12-10 Second Class HE-14800 Harold Engle Corporate United States Newark 19711.0 East FUR-FU-10002671 Furniture Furnishings Electrix 20W Halogen Replacement Bulb for Zoom… 13.4000 1 0.00 6.4320
9965 9966 CA-2017-146374 2017-12-05 2017-12-10 Second Class HE-14800 Harold Engle Corporate United States Newark 19711.0 East OFF-PA-10000349 Office Supplies Paper Easy-staple paper 4.9800 1 0.00 2.3406
9966 9967 CA-2017-146374 2017-12-05 2017-12-10 Second Class HE-14800 Harold Engle Corporate United States Newark 19711.0 East OFF-EN-10004483 Office Supplies Envelopes #10 White Business Envelopes,4 1/8 x 9 1/2 109.6900 7 0.00 51.5543
9967 9968 CA-2018-153871 2018-12-11 2018-12-17 Standard Class RB-19435 Richard Bierner Consumer United States Plainfield 7060.0 East OFF-BI-10004209 Office Supplies Binders Fellowes Twister Kit, Gray/Clear, 3/pkg 40.2000 5 0.00 18.0900
9968 9969 CA-2018-153871 2018-12-11 2018-12-17 Standard Class RB-19435 Richard Bierner Consumer United States Plainfield 7060.0 East OFF-BI-10004600 Office Supplies Binders Ibico Ibimaster 300 Manual Binding System 735.9800 2 0.00 331.1910
9969 9970 CA-2018-153871 2018-12-11 2018-12-17 Standard Class RB-19435 Richard Bierner Consumer United States Plainfield 7060.0 East OFF-AP-10003622 Office Supplies Appliances Bravo II Megaboss 12-Amp Hard Body Upright, Re… 22.7500 7 0.00 6.5975
9970 9971 CA-2016-103772 2016-06-28 2016-07-02 Standard Class MP-17470 Mark Packer Home Office United States Smyrna 30080.0 South OFF-BI-10002867 Office Supplies Binders GBC Recycled Regency Composition Covers 119.5600 2 0.00 54.9976
9971 9972 CA-2016-103772 2016-06-28 2016-07-02 Standard Class MP-17470 Mark Packer Home Office United States Smyrna 30080.0 South OFF-AR-10000538 Office Supplies Art Boston Model 1800 Electric Pencil Sharpener, Gray 140.7500 5 0.00 42.2250
9972 9973 CA-2017-130225 2017-09-11 2017-09-17 Standard Class RC-19960 Ryan Crowe Consumer United States Houston 77041.0 Central OFF-EN-10000056 Office Supplies Envelopes Cameo Buff Policy Envelopes 99.5680 2 0.20 33.6042
9973 9974 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West TEC-PH-10004080 Technology Phones Avaya 5410 Digital phone 271.9600 5 0.20 27.1960
9974 9975 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West OFF-AR-10004752 Office Supplies Art Blackstonian Pencils 18.6900 7 0.00 5.2332
9975 9976 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West OFF-PA-10000743 Office Supplies Paper Xerox 1977 13.3600 2 0.00 6.4128
9976 9977 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West TEC-PH-10002496 Technology Phones Cisco SPA301 249.5840 2 0.20 31.1980
9977 9978 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West OFF-FA-10003467 Office Supplies Fasteners Alliance Big Bands Rubber Bands, 12/Pack 13.8600 7 0.00 0.0000
9978 9979 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West OFF-BI-10003727 Office Supplies Binders Avery Durable Slant Ring Binders With Label Ho… 13.3760 4 0.20 4.6816
9979 9980 US-2017-103674 2017-12-06 2017-12-10 Standard Class AP-10720 Anne Pryor Home Office United States Los Angeles 90032.0 West OFF-BI-10002026 Office Supplies Binders Ibico Recycled Linen-Style Covers 437.4720 14 0.20 153.1152
9980 9981 US-2016-151435 2016-09-06 2016-09-09 Second Class SW-20455 Shaun Weien Consumer United States Lafayette 70506.0 South FUR-TA-10001039 Furniture Tables KI Adjustable-Height Table 85.9800 1 0.00 22.3548
9981 9982 CA-2018-163566 2018-08-03 2018-08-06 First Class TB-21055 Ted Butterfield Consumer United States Fairfield 45014.0 East OFF-LA-10004484 Office Supplies Labels Avery 476 16.5200 5 0.20 5.3690
9982 9983 US-2017-157728 2017-09-22 2017-09-28 Standard Class RC-19960 Ryan Crowe Consumer United States Grand Rapids 49505.0 Central OFF-PA-10002195 Office Supplies Paper RSVP Cards & Envelopes, Blank White, 8-1/2″ X … 35.5600 7 0.00 16.7132
9983 9984 US-2017-157728 2017-09-22 2017-09-28 Standard Class RC-19960 Ryan Crowe Consumer United States Grand Rapids 49505.0 Central TEC-PH-10001305 Technology Phones Panasonic KX TS208W Corded phone 97.9800 2 0.00 27.4344
9984 9985 CA-2016-100251 2016-05-17 2016-05-23 Standard Class DV-13465 Dianna Vittorini Consumer United States Long Beach 11561.0 East OFF-LA-10003766 Office Supplies Labels Self-Adhesive Removable Labels 31.5000 10 0.00 15.1200
9985 9986 CA-2016-100251 2016-05-17 2016-05-23 Standard Class DV-13465 Dianna Vittorini Consumer United States Long Beach 11561.0 East OFF-SU-10000898 Office Supplies Supplies Acme Hot Forged Carbon Steel Scissors with Nic… 55.6000 4 0.00 16.1240
9986 9987 CA-2017-125794 2017-09-29 2017-10-03 Standard Class ML-17410 Maris LaWare Consumer United States Los Angeles 90008.0 West TEC-AC-10003399 Technology Accessories Memorex Mini Travel Drive 64 GB USB 2.0 Flash … 36.2400 1 0.00 15.2208
9987 9988 CA-2018-163629 2018-11-17 2018-11-21 Standard Class RA-19885 Ruben Ausman Corporate United States Athens 30605.0 South TEC-AC-10001539 Technology Accessories Logitech G430 Surround Sound Gaming Headset wi… 79.9900 1 0.00 28.7964
9988 9989 CA-2018-163629 2018-11-17 2018-11-21 Standard Class RA-19885 Ruben Ausman Corporate United States Athens 30605.0 South TEC-PH-10004006 Technology Phones Panasonic KX – TS880B Telephone 206.1000 5 0.00 55.6470
9989 9990 CA-2015-110422 2015-01-21 2015-01-23 Second Class TB-21400 Tom Boeckenhauer Consumer United States Miami 33180.0 South FUR-FU-10001889 Furniture Furnishings Ultra Door Pull Handle 25.2480 3 0.20 4.1028
9990 9991 CA-2018-121258 2018-02-26 2018-03-03 Standard Class DB-13060 Dave Brooks Consumer United States Costa Mesa 92627.0 West FUR-FU-10000747 Furniture Furnishings Tenex B1-RE Series Chair Mats for Low Pile Car… 91.9600 2 0.00 15.6332
9991 9992 CA-2018-121258 2018-02-26 2018-03-03 Standard Class DB-13060 Dave Brooks Consumer United States Costa Mesa 92627.0 West TEC-PH-10003645 Technology Phones Aastra 57i VoIP phone 258.5760 2 0.20 19.3932
9992 9993 CA-2018-121258 2018-02-26 2018-03-03 Standard Class DB-13060 Dave Brooks Consumer United States Costa Mesa 92627.0 West OFF-PA-10004041 Office Supplies Paper It’s Hot Message Books with Stickers, 2 3/4″ x 5″ 29.6000 4 0.00 13.3200
9993 9994 CA-2018-119914 2018-05-04 2018-05-09 Second Class CC-12220 Chris Cortes Consumer United States Westminster 92683.0 West OFF-AP-10002684 Office Supplies Appliances Acco 7-Outlet Masterpiece Power Center, Wihtou… 243.1600 2 0.00 72.9480

9994 rows × 21 columns

Now in this data set we have data for the year 2015, 2016, 2017 & 2018.
So, we will use the data from 2015 to 2017 as our training Data Set. And we will preidct 2018 Sales and compare it with Actual Sales value.

In this data set there are many products with different product categories. So, considering that each product category has different consumer behaviour and different seasonality effect we will select products for a particular category only.

So, to built a forecasting model we would only require below three columns :

(1) Order Date — > In order to compute the month and Year of transaction

(2) Cateogory — > To select transactions for only particular cateogory

(3) Sales — > Actual Value of sales

In [2]:
sales_data=data[['Order Date','Category','Sales']]
sales_data
Out[2]:
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}

Order Date Category Sales
0 2017-11-08 Furniture 261.9600
1 2017-11-08 Furniture 731.9400
2 2017-06-12 Office Supplies 14.6200
3 2016-10-11 Furniture 957.5775
4 2016-10-11 Office Supplies 22.3680
5 2015-06-09 Furniture 48.8600
6 2015-06-09 Office Supplies 7.2800
7 2015-06-09 Technology 907.1520
8 2015-06-09 Office Supplies 18.5040
9 2015-06-09 Office Supplies 114.9000
10 2015-06-09 Furniture 1706.1840
11 2015-06-09 Technology 911.4240
12 2018-04-15 Office Supplies 15.5520
13 2017-12-05 Office Supplies 407.9760
14 2016-11-22 Office Supplies 68.8100
15 2016-11-22 Office Supplies 2.5440
16 2015-11-11 Office Supplies 665.8800
17 2015-05-13 Office Supplies 55.5000
18 2015-08-27 Office Supplies 8.5600
19 2015-08-27 Technology 213.4800
20 2015-08-27 Office Supplies 22.7200
21 2017-12-09 Office Supplies 19.4600
22 2017-12-09 Office Supplies 60.3400
23 2018-07-16 Furniture 71.3720
24 2016-09-25 Furniture 1044.6300
25 2017-01-16 Office Supplies 11.6480
26 2017-01-16 Technology 90.5700
27 2016-09-17 Furniture 3083.4300
28 2016-09-17 Office Supplies 9.6180
29 2016-09-17 Furniture 124.2000
9964 2017-12-05 Furniture 13.4000
9965 2017-12-05 Office Supplies 4.9800
9966 2017-12-05 Office Supplies 109.6900
9967 2018-12-11 Office Supplies 40.2000
9968 2018-12-11 Office Supplies 735.9800
9969 2018-12-11 Office Supplies 22.7500
9970 2016-06-28 Office Supplies 119.5600
9971 2016-06-28 Office Supplies 140.7500
9972 2017-09-11 Office Supplies 99.5680
9973 2017-12-06 Technology 271.9600
9974 2017-12-06 Office Supplies 18.6900
9975 2017-12-06 Office Supplies 13.3600
9976 2017-12-06 Technology 249.5840
9977 2017-12-06 Office Supplies 13.8600
9978 2017-12-06 Office Supplies 13.3760
9979 2017-12-06 Office Supplies 437.4720
9980 2016-09-06 Furniture 85.9800
9981 2018-08-03 Office Supplies 16.5200
9982 2017-09-22 Office Supplies 35.5600
9983 2017-09-22 Technology 97.9800
9984 2016-05-17 Office Supplies 31.5000
9985 2016-05-17 Office Supplies 55.6000
9986 2017-09-29 Technology 36.2400
9987 2018-11-17 Technology 79.9900
9988 2018-11-17 Technology 206.1000
9989 2015-01-21 Furniture 25.2480
9990 2018-02-26 Furniture 91.9600
9991 2018-02-26 Technology 258.5760
9992 2018-02-26 Office Supplies 29.6000
9993 2018-05-04 Office Supplies 243.1600

9994 rows × 3 columns

Now using Data Frame we will select data where the value of Category is “Technology”

In [4]:
tech_sales_data = sales_data[(sales_data['Category'] == "Technology")]
tech_sales_data
Out[4]:
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}

Order Date Category Sales
7 2015-06-09 Technology 907.152
11 2015-06-09 Technology 911.424
19 2015-08-27 Technology 213.480
26 2017-01-16 Technology 90.570
35 2017-12-08 Technology 1097.544
40 2016-12-27 Technology 371.168
41 2018-09-10 Technology 147.168
44 2017-03-11 Technology 45.980
47 2017-06-20 Technology 45.000
48 2017-06-20 Technology 21.800
54 2017-12-11 Technology 1029.950
59 2017-06-17 Technology 30.000
62 2016-11-24 Technology 13.980
68 2015-12-05 Technology 167.968
86 2018-10-26 Technology 19.990
90 2017-09-17 Technology 73.584
100 2017-08-29 Technology 95.976
103 2016-11-13 Technology 238.896
106 2018-11-23 Technology 74.112
107 2018-11-23 Technology 27.992
109 2016-10-15 Technology 339.960
123 2017-06-12 Technology 68.040
130 2018-02-02 Technology 59.970
147 2017-12-05 Technology 384.450
148 2017-12-05 Technology 149.970
152 2017-03-13 Technology 203.184
159 2017-11-20 Technology 944.930
161 2016-12-28 Technology 54.384
165 2015-09-08 Technology 8159.952
170 2015-09-08 Technology 143.976
9839 2017-09-25 Technology 479.976
9840 2017-09-25 Technology 44.736
9860 2018-01-14 Technology 67.800
9862 2018-01-14 Technology 12.120
9864 2018-01-14 Technology 159.800
9878 2018-04-20 Technology 122.382
9882 2015-08-12 Technology 272.610
9884 2015-04-03 Technology 62.310
9885 2015-04-03 Technology 159.980
9888 2016-08-08 Technology 79.990
9891 2017-07-25 Technology 124.250
9892 2017-07-25 Technology 297.550
9895 2015-09-26 Technology 153.824
9904 2015-11-12 Technology 127.984
9922 2018-05-06 Technology 191.968
9926 2016-03-20 Technology 199.980
9929 2017-09-04 Technology 2799.960
9933 2015-07-11 Technology 164.850
9941 2018-11-24 Technology 223.580
9946 2015-03-02 Technology 151.200
9949 2018-06-01 Technology 83.970
9950 2018-06-01 Technology 39.890
9960 2018-09-16 Technology 18.000
9973 2017-12-06 Technology 271.960
9976 2017-12-06 Technology 249.584
9983 2017-09-22 Technology 97.980
9986 2017-09-29 Technology 36.240
9987 2018-11-17 Technology 79.990
9988 2018-11-17 Technology 206.100
9991 2018-02-26 Technology 258.576

1847 rows × 3 columns

Now, we will create three new column in Pandas Data Frame :

(1) Date – Contains Orde Date in datetime format form which Month & Year can be fetched

(2) year – Fetched from Date Column created above

(3) month – Fetched from Date column created above

In [5]:
tech_sales_data['date']=pd.to_datetime(tech_sales_data['Order Date'])
tech_sales_data['year'],tech_sales_data['month'] = tech_sales_data['date'].dt.year , tech_sales_data['date'].dt.month
tech_sales_data
C:\Sony\Downloads\WPy-3670\python-3.6.7.amd64\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
C:\Sony\Downloads\WPy-3670\python-3.6.7.amd64\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
Out[5]:
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}

Order Date Category Sales date year month
7 2015-06-09 Technology 907.152 2015-06-09 2015 6
11 2015-06-09 Technology 911.424 2015-06-09 2015 6
19 2015-08-27 Technology 213.480 2015-08-27 2015 8
26 2017-01-16 Technology 90.570 2017-01-16 2017 1
35 2017-12-08 Technology 1097.544 2017-12-08 2017 12
40 2016-12-27 Technology 371.168 2016-12-27 2016 12
41 2018-09-10 Technology 147.168 2018-09-10 2018 9
44 2017-03-11 Technology 45.980 2017-03-11 2017 3
47 2017-06-20 Technology 45.000 2017-06-20 2017 6
48 2017-06-20 Technology 21.800 2017-06-20 2017 6
54 2017-12-11 Technology 1029.950 2017-12-11 2017 12
59 2017-06-17 Technology 30.000 2017-06-17 2017 6
62 2016-11-24 Technology 13.980 2016-11-24 2016 11
68 2015-12-05 Technology 167.968 2015-12-05 2015 12
86 2018-10-26 Technology 19.990 2018-10-26 2018 10
90 2017-09-17 Technology 73.584 2017-09-17 2017 9
100 2017-08-29 Technology 95.976 2017-08-29 2017 8
103 2016-11-13 Technology 238.896 2016-11-13 2016 11
106 2018-11-23 Technology 74.112 2018-11-23 2018 11
107 2018-11-23 Technology 27.992 2018-11-23 2018 11
109 2016-10-15 Technology 339.960 2016-10-15 2016 10
123 2017-06-12 Technology 68.040 2017-06-12 2017 6
130 2018-02-02 Technology 59.970 2018-02-02 2018 2
147 2017-12-05 Technology 384.450 2017-12-05 2017 12
148 2017-12-05 Technology 149.970 2017-12-05 2017 12
152 2017-03-13 Technology 203.184 2017-03-13 2017 3
159 2017-11-20 Technology 944.930 2017-11-20 2017 11
161 2016-12-28 Technology 54.384 2016-12-28 2016 12
165 2015-09-08 Technology 8159.952 2015-09-08 2015 9
170 2015-09-08 Technology 143.976 2015-09-08 2015 9
9839 2017-09-25 Technology 479.976 2017-09-25 2017 9
9840 2017-09-25 Technology 44.736 2017-09-25 2017 9
9860 2018-01-14 Technology 67.800 2018-01-14 2018 1
9862 2018-01-14 Technology 12.120 2018-01-14 2018 1
9864 2018-01-14 Technology 159.800 2018-01-14 2018 1
9878 2018-04-20 Technology 122.382 2018-04-20 2018 4
9882 2015-08-12 Technology 272.610 2015-08-12 2015 8
9884 2015-04-03 Technology 62.310 2015-04-03 2015 4
9885 2015-04-03 Technology 159.980 2015-04-03 2015 4
9888 2016-08-08 Technology 79.990 2016-08-08 2016 8
9891 2017-07-25 Technology 124.250 2017-07-25 2017 7
9892 2017-07-25 Technology 297.550 2017-07-25 2017 7
9895 2015-09-26 Technology 153.824 2015-09-26 2015 9
9904 2015-11-12 Technology 127.984 2015-11-12 2015 11
9922 2018-05-06 Technology 191.968 2018-05-06 2018 5
9926 2016-03-20 Technology 199.980 2016-03-20 2016 3
9929 2017-09-04 Technology 2799.960 2017-09-04 2017 9
9933 2015-07-11 Technology 164.850 2015-07-11 2015 7
9941 2018-11-24 Technology 223.580 2018-11-24 2018 11
9946 2015-03-02 Technology 151.200 2015-03-02 2015 3
9949 2018-06-01 Technology 83.970 2018-06-01 2018 6
9950 2018-06-01 Technology 39.890 2018-06-01 2018 6
9960 2018-09-16 Technology 18.000 2018-09-16 2018 9
9973 2017-12-06 Technology 271.960 2017-12-06 2017 12
9976 2017-12-06 Technology 249.584 2017-12-06 2017 12
9983 2017-09-22 Technology 97.980 2017-09-22 2017 9
9986 2017-09-29 Technology 36.240 2017-09-29 2017 9
9987 2018-11-17 Technology 79.990 2018-11-17 2018 11
9988 2018-11-17 Technology 206.100 2018-11-17 2018 11
9991 2018-02-26 Technology 258.576 2018-02-26 2018 2

1847 rows × 6 columns

We will built a model to forecast monthly average sales of a product.

So, in data preparation we will group data based on year and month and aggregate it using mean() function on Pandas Data Frame

After aggregating we will select the data exclusing the year 2018 to get the training Data set

In [8]:
mean_sales=tech_sales_data.groupby(['year', 'month'],as_index=False).mean()
means_sales_train = mean_sales[(mean_sales['year']!=2018)]
means_sales_train
Out[8]:
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}

year month Sales
0 2015 1 349.254444
1 2015 2 201.063750
2 2015 3 1477.780636
3 2015 4 353.670538
4 2015 5 355.550963
5 2015 6 401.712619
6 2015 7 285.861929
7 2015 8 341.098519
8 2015 9 663.860261
9 2015 10 397.933933
10 2015 11 495.105049
11 2015 12 417.863840
12 2016 1 420.486727
13 2016 2 246.355000
14 2016 3 369.425786
15 2016 4 301.647351
16 2016 5 529.227273
17 2016 6 238.346889
18 2016 7 471.407545
19 2016 8 500.794581
20 2016 9 396.190167
21 2016 10 261.094878
22 2016 11 391.370508
23 2016 12 531.821313
24 2017 1 255.457545
25 2017 2 766.182125
26 2017 3 582.914919
27 2017 4 676.841000
28 2017 5 686.492643
29 2017 6 454.782000
30 2017 7 390.264706
31 2017 8 261.416270
32 2017 9 457.668120
33 2017 10 1087.357724
34 2017 11 371.795329
35 2017 12 365.946918

In order to feed the traniang Data set in to ARMA function imported from statsmodel library we need to convert it in to python list

There are two parameters passed to ARMA function. First is tranining Data set and second is order. Here order is taken as (0,10) based on our training Data.

Model is then fitted or trained.

yhat is a varaible on which predictions or forecasts are stored. In the preict fucniton we will pass on two indices based on which the number of future forecasts are made. Like in below code the difference between two indices plus one will give the number of forecats. In below code 12 future forecats will be made

In [9]:
sales = means_sales_train['Sales'].tolist()
model = ARMA(sales, order=(0, 10))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(len(sales), len(sales)+11)
print(yhat)
[550.56875682 333.18444765 491.0191603  736.02698125 255.98230972
 323.15082923 399.21656013 467.50835595 478.7777673  405.95991681
 463.35587301 463.35587301]
C:\Sony\Downloads\WPy-3670\python-3.6.7.amd64\lib\site-packages\statsmodels\base\model.py:488: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
  'available', HessianInversionWarning)

Now we have yhat where the forecated sales is stored.

In the varibale actual we will store the sales for the year 2018 which we can compare with forecasted sales value.

Since we have 12 predicted values so the varibale x_axis will have value for x_axis where we will plot actual and forecasted sales.

We will use matplotlib library to plot actual vs forecasted sales.

In [10]:
actual = mean_sales[(mean_sales['year']==2018)]['Sales'].tolist()
x_axis = [1,2,3,4,5,6,7,8,9,10,11,12]
plt.plot(x_axis,actual,color='blue')
plt.plot(x_axis,yhat,color='green')
Out[10]:
[<matplotlib.lines.Line2D at 0x1b1b8c94358>]