D1.c Pattern or feature prediction using machine learning

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed by humans. It usually refers to the changes in systems that perform tasks associated with artificial intelligence (AI) [Nilsson 1996]. One of the most popular machine learning algorithms is the Support Vector Machine (SVM), a set of supervised learning methods for classification, regression and outlier detection. According to the scikit-learn documentation, the major advantages of the SVM are that it is: i) effective in high-dimensional spaces, ii) still effective when the number of dimensions is greater than the number of samples, and iii) memory efficient. In addition, it is particularly well suited to small- or medium-sized datasets [Geron2017].

The common steps for making predictions with the SVM are: a) building a database containing a number of previously observed samples; b) determining input features that are physically related to the variable to be predicted; c) finding optimal SVM parameters; d) building a prediction engine with the determined input features and optimal parameters; and e) performing predictions using the established engine.

This project focused on two major aspects:
A1. Taking advantage of numerous observations of different solar patterns and activities (e.g., the emergence of Active Regions, the Solar Cycle, flare eruptions) and employing modern ML algorithms, we developed fast and accurate predictions of such solar activities and patterns.
A2. Building prediction engines and providing well-designed open-source Python examples and user interfaces (UI), giving the community a tool to build their own applications for future predictions.
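Steps a) to e) above can be illustrated with a minimal scikit-learn sketch. This is not the PUMA source code: the data are synthetic, the three input features and the target are hypothetical stand-ins, and the choice of an RBF-kernel regressor, the parameter grid and the 5-fold cross-validation are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# (a) Build a database of previously observed samples.
#     Synthetic stand-in: 200 samples with 3 hypothetical input features.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

# (b) Determine the input features. Here all three columns are kept;
#     in practice only physically relevant quantities would be selected.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# (c) Find good SVM parameters with a cross-validated grid search.
#     The grid values are illustrative assumptions, not tuned choices.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": [0.01, 0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# (d) The best estimator, refitted on the full training set,
#     plays the role of the prediction engine.
engine = search.best_estimator_

# (e) Perform predictions on samples the engine has not seen.
y_pred = engine.predict(X_test)
score = engine.score(X_test, y_test)  # coefficient of determination R^2
print("best parameters:", search.best_params_)
print("R^2 on held-out samples:", round(score, 3))
```

Scaling the features inside the pipeline keeps the held-out samples from influencing the scaler during cross-validation, which is one reasonable design choice among several.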

Tool name: PUMA (Prediction Using Machine Learning)
Developers: J. Liu, C. Nelson, R. von Fay-Siebenburgen
Main Contact: Robertus (Robertus@sheffield.ac.uk)
Basic description: Predict patterns of features using DKIST data
Language: Python/SunPy
Resources needed to use: laptop or desktop
Host location: Sheffield
Current status: Fully developed
6-month plan to availability: Complete full documentation of the underlying algorithm
Status of documentation: The concept and the testing of the algorithm are under review. The source code and the PDF of the submitted paper are available upon request.
Test status: Tested on CME arrival.
How to reference tool in publication: Cite the associated scientific paper in ApJ by Liu et al. (2017) and this award (SP2RC/Sheffield)