Accurate Forecast Improvement Approach for Short Term Load Forecasting Using Hybrid Filter-Wrap Feature Selection

An accurate hybrid filter-wrapper approach is important for short-term load forecasting (STLF): it not only improves forecasting accuracy but also helps avoid premature convergence. The selection of input features is an essential part of model development. In today's dynamic environment, energy demand, quantities, and prices are becoming unpredictable and progressively volatile, and a growing number of decision-making procedures in the energy industry require a wide-ranging outlook on an uncertain future. This paper presents a hybrid filter-wrapper selection method whose primary components are a Partial Mutual Information (PMI) based filter technique and a Firefly Algorithm (FA) based wrapper. The filter stage selects the best candidate inputs using a predefined, model-free measure of the relationship between each input variable and the output. The FA-based wrapper technique is more effective than the filter procedure alone. Mutual Information (MI) is a widely preferred measure of the dependency between variables and is commonly used for input feature selection and in other key fields.


The Technique for the Proposed Hybrid Filter-Wrapper
This section explains the selection method for the proposed hybrid filter-wrapper, whose primary components are a Partial Mutual Information (PMI) based filter technique and a Firefly Algorithm (FA) based wrapper. The filter stage selects the best candidate inputs using a predefined, model-free measure of the relationship between each input variable and the output. The FA-based wrapper technique is more effective than the filter procedure alone. Mutual Information (MI) is a widely preferred measure of the dependency between variables and is commonly used for input feature selection and in other key fields.
The PMI method makes no assumption about the nature of the relationships between the input variables. MI is therefore often preferred, and is more accurate, for determining the dependency of input variables with nonlinear relations than estimators (for instance, correlation analysis and principal component analysis) that consider only linear relations between the input variables. A remaining issue, however, is redundancy, since MI alone cannot account for interdependency among the candidate variables.
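To make this point concrete, the following Python sketch (an illustration, not part of the original study) shows a case where the linear correlation is near zero while a simple histogram estimate of mutual information still detects a strong nonlinear dependence:

```python
import numpy as np

def hist_mi(x, y, bins=16):
    """Histogram estimate of mutual information I(X; Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20000)
y = x ** 2 + 0.05 * rng.normal(size=x.size)   # symmetric, nonlinear dependence

corr = np.corrcoef(x, y)[0, 1]   # near zero: correlation misses the relation
mi = hist_mi(x, y)               # clearly positive: MI detects it
```

Because the dependence y = x² is symmetric around zero, the linear correlation nearly vanishes even though y is almost fully determined by x; the MI estimate exposes this.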
PMI is a nonlinear, nonparametric method based on partial correlation, and its primary use has been demonstrated in fields such as water resource analysis, humidity and rainfall analysis, environmental modeling, and streamflow forecasting. Following Bowden et al. (2005), the probability densities are estimated using the city block distance kernel. The residual variables x' and y' denote what remains of x and y after removing the effect of the existing predictor(s) z. In PMI-centered feature selection, the input variable with the highest PMI value is added as a new predictor. The following is a brief outline of the PMI-based input selection procedure in our proposed technique.
Step 1: Let Z denote the set of selected input variables, initialized as the empty set at the start of the algorithm, and let Z_out denote the set of candidate input variables not yet selected.
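The stepwise procedure can be sketched as follows. This is an illustrative approximation: the residual step here uses a simple linear regression rather than the kernel estimators of the PMI literature, and the toy data, function names, and stopping threshold are assumptions of this sketch.

```python
import numpy as np

def hist_mi(x, y, bins=12):
    """Histogram estimate of mutual information in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

def residual(v, Z):
    """Residual of v after removing the part explained by the selected set Z
    (a linear stand-in for the kernel regression used for PMI)."""
    if Z.shape[1] == 0:
        return v - v.mean()
    A = np.c_[np.ones(len(v)), Z]
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def pmi_select(X, y, k, eps=1e-3):
    selected = []                        # Z: selected inputs (starts empty)
    remaining = list(range(X.shape[1]))  # Z_out: candidates not yet selected
    while remaining and len(selected) < k:
        Z = X[:, selected]
        ry = residual(y, Z)
        scores = {j: hist_mi(residual(X[:, j], Z), ry) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < eps:           # stop when no candidate adds information
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# toy check: the output depends on columns 0 and 2 only
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 2] + 0.1 * rng.normal(size=4000)
sel = pmi_select(X, y, k=2)
```

The greedy loop mirrors the step structure above: at each pass the candidate with the highest (partial) MI against the output residual moves from Z_out into Z.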
The algorithm uses MAPE-based fitness with FA feature selection, performing its search through a population of individuals referred to as fireflies, which are updated from one step to the next. Each firefly corresponds to a particular feature subset. The search space is modeled as a multi-dimensional Boolean region, reflecting the nature of feature selection; an example of an individual's representation is shown in Figure 1. Every firefly is assumed to be attracted to the others regardless of sex, with attractiveness determined by visibility and light intensity. The attractiveness of a firefly is derived from its MAPE, so a firefly whose associated feature subset yields a smaller MAPE is brighter and has a greater chance of survival. The movement of firefly i attracted toward a brighter firefly j is given by Eq. (1).
where rand is a random number drawn uniformly from [0, 1], and the hyperbolic tangent sigmoid serves as the transfer function.
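A minimal sketch of one binary firefly move, with a tanh-based transfer function mapping the continuous update back to a 0/1 feature mask. The parameter names and values (beta0, gamma, alpha) are illustrative assumptions and do not reproduce the paper's exact Eq. (1):

```python
import numpy as np

rng = np.random.default_rng(42)

def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.1):
    """One move of binary firefly i towards a brighter firefly j.
    Positions are 0/1 feature masks; a tanh-based transfer function maps
    the continuous update back to per-feature selection probabilities."""
    r2 = float(np.sum((xi - xj) ** 2))        # squared Hamming-style distance
    beta = beta0 * np.exp(-gamma * r2)        # attractiveness decays with distance
    step = beta * (xj - xi) + alpha * (rng.random(xi.size) - 0.5)
    prob = np.abs(np.tanh(xi + step))         # transfer function into [0, 1)
    return (rng.random(xi.size) < prob).astype(int)

xi = np.array([0, 1, 0, 0, 1])   # current feature subset of firefly i
xj = np.array([1, 1, 0, 1, 0])   # subset of a brighter (lower-MAPE) firefly j
new_xi = move_towards(xi, xj)
```

The stochastic thresholding at the end is what keeps the search inside the Boolean region described above.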

Our Suggested Hybrid Feature Selection Method
The feature selection task in STLF has two objectives. The first is to locate the essential features of the loads that can appropriately improve future load forecasts. The second is to determine a minimal group of key attributes with minimal redundancy, which can greatly reduce the computational cost of the entire modeling procedure without degrading forecasting accuracy. A high-quality hybrid filter-wrapper technique for STLF is developed to meet these goals. First, the PMI-based filter is used to remove redundant and irrelevant features and thereby reduce the complexity of the search region. Then, on the reduced feature subset obtained by the PMI-based filter, an FA-based wrapper is applied to determine a small group of features for the forecasting model. The hybrid technique thus combines the efficiency of the filter method and the effectiveness of the wrapper method, exploiting the advantages of each. Figure 2 illustrates the proposed hybrid filter-wrapper feature selection method.
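The two-stage flow can be sketched generically as follows; the stand-in filter and wrapper functions below are toy assumptions (the paper uses a PMI filter and an FA/SVR wrapper scored by MAPE):

```python
import numpy as np

def hybrid_filter_wrapper(X, y, filter_select, wrapper_select):
    """Two-stage selection: a model-free filter prunes the candidate pool,
    then a wrapper searches the reduced pool with the forecasting model."""
    coarse = filter_select(X, y)            # stage 1: filter (PMI in the paper)
    fine = wrapper_select(X[:, coarse], y)  # stage 2: wrapper (FA + SVR in the paper)
    return [coarse[i] for i in fine]        # map back to original column indices

# toy stand-ins for demonstration only
def filter_top2(X, y):
    corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])   # |corr| of each column with y
    return list(np.argsort(-corr)[:2])

def wrapper_first(X, y):
    return [0]   # pretend the wrapper keeps the single best column

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 3] + 0.1 * rng.normal(size=200)
sel = hybrid_filter_wrapper(X, y, filter_top2, wrapper_first)   # selects column 3
```

The design point is the interface: the wrapper only ever sees the filter's reduced pool, which is what shrinks its search space.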

Description of Dataset
To validate the proposed hybrid filter-wrapper feature selection technique for STLF, two real-world datasets, one obtained from a prominent North American electric utility and one from GEFCom2014, were used for validation and verification in this study, with SVR as the forecasting model. The primary dataset contains the North American electric utility's hourly temperature in degrees Fahrenheit (°F) and hourly load in MW, collected from January 1, 1986, to October 14, 1994. Four months of 1991 (January, April, July, and October) were selected as the test periods. These months were chosen to represent winter, spring, summer, and autumn, and therefore provide a sound basis for testing the selected method.
Data from the preceding three months, together with the same months in previous years, was used as the training set to fit the selected forecasting model. Although much of the data used to compute the results comes from historical information, the study found that including more data adds computational cost without improving forecasting accuracy. To replicate real-world application scenarios, the test datasets were kept completely distinct and independent from the training data and were not used in the learning process.
The second dataset consists of hourly temperatures (in °F) from 12 weather stations located in regions of the United States and hourly loads (in kW) from 21 distinct geographical areas. The data were originally organized for the load forecasting track of the GEFCom competition; after the competition, they were published in the appendix of Hong et al. (2014), covering the period from January 1, 2005, to July 7, 2009. The key objective of this research is to assess the performance of the proposed filter-wrapper feature selection technique against well-established techniques for load forecasting, rather than to contribute to the competition itself. Thus, the study focused on next-day hourly load forecasts for the final month (July 2, 2009, to July 30, 2009) and ignored the competition task. The research determined that demand patterns change significantly over the years; thus, the three corresponding months over the last three years were selected as the reference set to assess the forecasting approach.
Because the locations of the weather stations are not identified in this dataset, the researchers had to determine which weather station to use for each zone before developing the model. Following Ben & Hyndman (2015), a separate validation dataset was used to determine the station for each zone, and the weather station yielding the best result was retained to build the forecasting approach for the test month. Some preprocessing procedures were required before modeling the datasets. For instance, missing load values in the original data were filled with the average of the neighboring values. Zone 11 of the second dataset shows a significant step increase in demand in 2009; using the information from Ben & Hyndman (2015), we computed the mean level before and after the increase and subtracted the difference, eliminating the jump in 2009.
In addition, some outliers were detected and removed. To avoid scaling and unit issues, every input variable was linearly mapped to the range [0, 1] using its minimum and maximum values, respectively. Forecast values are rescaled back to the original units when forecasting performance is measured, with the linear transformation computed from the preliminary scaling of the collected data. To develop the required pool of candidate features for the prediction approach, the electricity load from the first dataset was used as a reference and the study examined the characteristics of the hourly loads across different days. Nevertheless, forecasting on holidays is a very challenging task, since their conditions are rare and distinct from regular workdays and from other holidays. In this study, weekends and holidays were treated the same for simplicity.
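The scaling step and its inverse can be sketched as follows (a minimal min-max transform; the load values are made up for illustration):

```python
import numpy as np

def minmax_scale(x, lo, hi):
    """Linearly map values from [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

def minmax_inverse(z, lo, hi):
    """Rescale forecasts from [0, 1] back to the original units."""
    return z * (hi - lo) + lo

load = np.array([3100.0, 2950.0, 3600.0, 4100.0])   # made-up hourly loads (MW)
lo, hi = float(load.min()), float(load.max())
z = minmax_scale(load, lo, hi)        # model inputs in [0, 1]
back = minmax_inverse(z, lo, hi)      # recover the original scale exactly
```

Keeping lo and hi from the training data is what allows forecasts to be rescaled consistently at evaluation time.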
Figure 4: Hourly loads during each day of a week (x-axis: Time (Hour)).
Figure 4 portrays the mean hourly load for each day of the week. The load varies from hour to hour according to expected consumer behavior. It is also easy to see that the load curves of all working days except Friday share similar magnitudes and shapes. This is clear evidence that working days have higher load demand than weekends, while Friday's load level lies between those of the weekends and the other working days. Moreover, the relationship between temperature and load demand, shown in Figure 5 below, is observed to be nonlinear, confirming that the temperatures are worth including among the input variables.
Figure 5: Relationship between temperature (°F) and load.
Considering the short-run behavior and daily periodicity of hourly loads, this study selects the hourly loads of the preceding 12 hours and the same hours in the previous week as the primary inputs of the prediction model. Temperature variables were added for each period, with the predicted temperature paired with each forecasted hour. The average measured temperature at the same hour and date over the previous five years was used to represent the predicted temperature; many GEFCom2014 participants, for instance Ben & Hyndman (2015) and Charlton, used average temperature to represent the predicted temperature in their approaches. In addition, hourly and daily event indicators were used to encode the hours of the day and the seven days of the week. The candidate set of input variables used to predict the load L(t) is summarized in Eq. (2).
where L(t - i) denotes the lagged load at hour (t - i); T(t - i) denotes the lagged temperature at hour (t - i); T(t), the case with lag 0, represents the predicted temperature for the forecasting hour; and HI(t) and DI(t) denote the hourly and daily event indicators, respectively. DI(t) is set to -1 for weekends and public holidays, 0 for Fridays, and 1 for the remaining working days, while HI(t) takes the values 1, 2, ..., 24 for the corresponding hours of the day. In total, there are 41 candidate inputs in the set. Based on the same reasoning, the pool of input variables for the load series of the second dataset is also given by Eq. (2), with one exception: Zone 9 of the second dataset exhibits markedly different demand patterns that do not appear to follow temperature, so the temperature variables were excluded from Eq. (2) when modeling the Zone 9 load series.
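Assuming the input structure described above, a single candidate input vector could be assembled as in the sketch below; the exact lag sets and indicator encodings are illustrative and do not reproduce the paper's full 41-input pool:

```python
import numpy as np

def build_inputs(load, temp, t):
    """Assemble one candidate input vector for forecasting L(t): recent lagged
    loads, the same hour last week, lagged/predicted temperatures, and the
    hourly/daily indicators HI(t) and DI(t)."""
    load_lags = [load[t - i] for i in range(1, 13)]   # preceding 12 hourly loads
    weekly = [load[t - 168]]                          # same hour, previous week
    temp_lags = [temp[t - i] for i in range(0, 4)]    # lag 0 = predicted temperature
    hi_t = [(t % 24) + 1]                             # HI(t) in 1..24
    dow = (t // 24) % 7                               # toy day-of-week index
    di_t = [-1 if dow >= 5 else (0 if dow == 4 else 1)]
    return np.array(load_lags + weekly + temp_lags + hi_t + di_t)

hours = 24 * 30
load = 3000 + 500 * np.sin(2 * np.pi * np.arange(hours) / 24)   # synthetic series
temp = 60 + 15 * np.sin(2 * np.pi * np.arange(hours) / 24)
x = build_inputs(load, temp, t=300)   # 12 + 1 + 4 + 1 + 1 = 19 inputs here
```

The same assembly would simply be repeated for every forecasting hour to form the training matrix.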

Performance Metrics
This study evaluates forecasting performance using the MAPE in Eq. (3) and the MASE in Eq. (4), defined as follows. The MAPE, MAPE = (100/N) * sum_{i=1..N} |L_{t+i} - L̂_{t+i}| / L_{t+i}, is one of the most widely used metrics and measures the percentage difference between the actual and predicted values, where N denotes the prediction horizon, L_{t+i} is the actual load at period t + i, and L̂_{t+i} is the predicted load at period t + i. Since the next-day (24-hour) load is forecast recursively in this article, the prediction horizon N equals 24.
Smaller MAPE values indicate predictions closer to the actual values. The MASE, in turn, is the forecast error scaled by that of a naive benchmark model; its value is below one when the prediction is better than the benchmark, and the smaller the MASE, the better the prediction of the technique in question. The MASE is often highly recommended because it is less sensitive to outliers and simple to interpret.
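The two metrics can be computed as follows. The seasonal naive benchmark used to scale the MASE (same hour of the previous day, m = 24) is one common convention and an assumption of this sketch, since the text does not spell out its benchmark:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (Eq. (3))."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs(actual - forecast) / actual)

def mase(actual, forecast, train, m=24):
    """Mean absolute scaled error (Eq. (4)): forecast MAE divided by the
    in-sample MAE of a seasonal naive benchmark (same hour, previous day)."""
    actual, forecast, train = (np.asarray(a, float) for a in (actual, forecast, train))
    naive_mae = np.mean(np.abs(train[m:] - train[:-m]))
    return np.mean(np.abs(actual - forecast)) / naive_mae

actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 380.0])
m_ape = mape(actual, forecast)    # (10/100 + 10/200 + 20/400) / 3 * 100 ≈ 6.67
m_ase = mase(actual, forecast, train=np.arange(48.0))
```

An MASE below one in this setup means the model beats the seasonal naive benchmark on average.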

Identified Counterparts and their Implementations
Five other feature selection techniques were selected as baselines to demonstrate the benefits of the proposed hybrid filter-wrapper technique for STLF using SVR. These counterparts and the proposed approach are abbreviated as follows:
(1) Full: the SVR forecasting model using all candidate inputs.
(2) MI: the SVR approach with MI-based filter feature selection.
(3) Two Stage: the SVR approach with two-stage feature selection.
(4) PMI: the SVR approach with the PMI-based filter technique for feature selection.
(5) FA: the SVR approach with FA-based wrapper feature selection.
(6) H-FW: the SVR approach with the proposed hybrid filter-wrapper feature selection procedure.
SVR serves as the forecasting model in all the methods above. An essential step in SVR training and implementation is the choice of kernel function and hyperparameters; the kernel function was fixed after preliminary experimentation. The efficiency of SVR is not a major concern here, since the data series are not long, so a grid search was conducted over the parameter C and the kernel parameter. SVR was evaluated on one-day-ahead prediction, with multi-step-ahead forecasts obtained via recursion.
Note that, although several studies have investigated strategies for multi-step-ahead forecasting, only the recursive strategy is used in this study, in which each predicted hourly load is fed back as an input for the next hour. All experiments were carried out in MATLAB 2014 on a computer with a Compaq Core 4 Duo CPU T6850, 6.00 GHz, and 4 GB RAM. The six SVR models differed only in the feature selection method applied; the parameters for 'MI' and 'Two Stage' were set according to their original references.
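The recursive multi-step strategy can be sketched as follows; predict_one stands in for the trained SVR model, and the persistence model used in the demo is purely illustrative:

```python
import numpy as np

def recursive_forecast(history, predict_one, horizon=24, n_lags=12):
    """Recursive multi-step forecasting: each one-step-ahead prediction is fed
    back as a lagged input when predicting the next hour."""
    buf = list(history)
    out = []
    for _ in range(horizon):
        x = np.array(buf[-n_lags:])   # most recent n_lags values
        yhat = predict_one(x)
        out.append(yhat)
        buf.append(yhat)              # feed the prediction back
    return out

# toy one-step model: persistence (predict the latest value seen)
def persist(x):
    return float(x[-1])

fc = recursive_forecast([1.0, 2.0, 3.0] * 5, persist, horizon=4, n_lags=3)
# fc == [3.0, 3.0, 3.0, 3.0]
```

Because predictions are reused as inputs, errors can accumulate over the horizon, which is why the day-ahead horizon is kept at 24 steps.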
The maximum number of inputs selected by PMI was set to twenty. For the FA wrapper, the population size was initialized at 31, with the attractiveness and absorption coefficients set as indicated, and the stopping criterion was either reaching 151 generations or no improvement in fitness for twenty consecutive steps. Finally, each model was run 12 times and the mean results are reported, including the MAPE of each model ('MAPE'); the smallest value is shown in bold. As the tables indicate, although all candidate inputs were used to build it, the 'Full' model gives the worst forecasting results, with higher MAPE than the others, which underlines the need for feature selection in STLF. This results from irrelevant features causing the model to overfit. The other five feature selection techniques reduce the dimensionality, and smaller model spaces are obtained without reducing prediction accuracy.

Results: The North American Electric Utility
Among the three filter techniques, 'MI' and 'Two Stage' retain relatively more inputs than 'PMI'. This occurs because the MI-based approach removes unwanted features by calculating the MI value between each individual variable and the output, so several redundant features remain among the inputs identified by 'MI'.

Samuel Atuahene, Yukun Bao, Patricia Semwaah Gyan, Yao Yevenyo Ziggah
'Two Stage' removes redundant features in its second step, but it does not account for predictive value already captured. By contrast, 'PMI' obtains essential features (those that are important and not redundant) stage by stage based on partial information. 'FA' retains relatively more inputs than the hybrid 'H-FW' and 'PMI', mainly because of the difficulty of searching the 41-dimensional search space.
The feature selection time ('Tfs (min)') reported in Table 1 is longest for the wrapper, owing to the SVR evaluations and the high-dimensional search region; 'H-FW' can identify the ideal feature subset efficiently thanks to its smaller search region. Selecting more inputs increases the training and forecasting time ('Ttf (min)'), so the H-FW-based model is somewhat slower than some counterparts, but this is acceptable for daily decision making. Figure 6 compares the monthly prediction accuracy obtained with the input subsets selected by the different feature selection methods. The reduced input sets in Table 1 allow an intuitive interpretation. For instance, among the previous 12 hours, the most recent hourly loads (the preceding 1-6 hours), which relate strongly to the current load, are essential for prediction. Beyond the recent hourly loads, the loads at the same hour of previous days are essential in forecasting, including the lagged loads from the previous 36 and 168 hours. The temperature at the forecasting hour (lag 0) was also selected as an input by all five feature selection criteria. (One selected subset in Table 1, for example, comprises load lags 1, 2, 3, 4, 24, 48, 72, 96, 120, 144, and 168, and temperature lags 0-12 and 168.) Figure 7 presents the average prediction accuracy for each day of the week during the testing period. The results confirm the superiority of the proposed hybrid filter-wrapper forecasting method over the other techniques across the week. Moreover, the load forecasts show larger errors on weekends than on workdays, because the load patterns of Friday and the weekend are distinct from normal days, as Figure 4 also indicated.

Statistical Test Analysis
Two-stage statistical analyses were employed to determine whether the results obtained by the six forecasting models differ significantly, for both the four tested months (Case 1) and the 20 zonal loads (Case 2). The analysis uses the Friedman test and a step-down procedure, both of which provide strong statistical evidence, applied to the combined data from the two cases. The Friedman test computes the mean rank obtained by each model over the data in order to determine whether remarkable differences exist between the forecasting approaches based on the mean rank values. Table 2 compares the MASE of the various models for Case 2. Table 3 shows the Friedman ranks and the p-value of the Iman-Davenport test for MAPE, with the best technique (lowest rank) highlighted in bold. The table shows that the p-value of the Iman-Davenport test is lower than the significance level of 0.05, demonstrating that the results of the six models are statistically different. To determine the statistical significance between the best result and each of the other approaches, a step-down test, the Holm test, was applied. Table 4 shows that the first four hypotheses are rejected (at the 0.05 significance level) because their p-values are less than the adjusted thresholds (alpha/i); the p-value for the final comparison, 'FA', is small enough that it is rejected at the 0.01 level.
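A sketch of the omnibus Friedman test and the rank statistics underlying the step-down comparison, using SciPy on made-up scores (the model offsets and noise level are illustrative assumptions, not the paper's results):

```python
import numpy as np
from scipy.stats import friedmanchisquare, norm, rankdata

# made-up MAPE scores: rows = test cases, columns = 6 forecasting models;
# model 5 (rightmost) is constructed to be the best
rng = np.random.default_rng(0)
base = rng.uniform(2.0, 4.0, size=(24, 1))
offsets = np.array([0.9, 0.7, 0.5, 0.4, 0.3, 0.0])
scores = base + offsets + 0.05 * rng.normal(size=(24, 6))

stat, p = friedmanchisquare(*scores.T)              # omnibus test across models
mean_ranks = rankdata(scores, axis=1).mean(axis=0)  # lower mean rank = better

# z statistics for step-down (Holm-style) comparisons against the best model
k, n = scores.shape[1], scores.shape[0]
best = int(np.argmin(mean_ranks))
se = np.sqrt(k * (k + 1) / (6.0 * n))
z = np.abs(mean_ranks - mean_ranks[best]) / se
pvals = 2 * norm.sf(z)
# Holm step-down: compare the sorted p-values to alpha/(k-1), alpha/(k-2), ...
```

The omnibus test only establishes that some models differ; the per-comparison p-values with the step-down correction are what justify declaring one model superior.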

Conclusion
Feature selection is an important step in STLF to simplify the interpretation of the dataset and the learning process of the forecasting model. This study proposes a high-quality hybrid filter-wrapper feature selection technique to address the feature selection problem. The proposed hybrid technique comprises a PMI-based filter technique and an FA-based wrapper approach. First, the filter procedure is used to discard redundant and irrelevant features and produce a reduced input set. A wrapper technique is then applied to the reduced subset to determine a small set of features with high prediction accuracy. Experimental results indicate that the proposed approach identifies fewer inputs than the older methods and is a more efficient wrapper method.
Thus, the hybrid filter-wrapper technique is an excellent alternative for feature selection in STLF. In this study, only the most common input variables were considered in the model selection procedure; future work should analyze other essential factors and additional lagged variables to improve prediction accuracy. Since the load patterns of special days are distinct from regular weekdays, building a dedicated prediction model for such special days is a worthwhile research effort. Other areas for future research include a more comprehensive comparison of the proposed approach with other established state-of-the-art models, and applying the proposed technique to long-term electricity price forecasting and load prediction.