# Paper 1

Paper title: Detection of Alfvén Eigenmodes on COMPASS with Generative Neural Networks

Link: https://www.tandfonline.com/doi/full/10.1080/15361055.2020.1820805

1. Introduction

This paper addresses the challenge of automating the detection of chirping Alfvén eigenmodes (AEs) observed on the COMPASS tokamak. These modes are believed to be driven by runaway electrons (REs) and are important for studying the nonlinear interaction between REs and electromagnetic instabilities, including RE mitigation and loss. Currently, detection relies on manual analysis of spectrograms from magnetic probes, a process that is labor-intensive due to the rarity of these events. The authors explore the use of machine learning techniques, specifically generative neural networks, to automate detection using both a small manually labeled database and a larger unlabeled database of COMPASS experiments.

Key Ideas:

- Problem: Manual detection of rare chirping Alfvén eigenmodes (AEs) in tokamak spectrograms is inefficient and requires significant effort.
- Significance of AEs: AEs driven by runaway electrons (REs) offer a unique opportunity to study RE physics, which is crucial for RE mitigation and for understanding RE losses in fusion reactors. AEs can also serve as a diagnostic tool for plasma equilibrium parameters.
- COMPASS observations: Chirping AEs on COMPASS occur in the frequency range of 0.5 to 2 MHz, have a bursty character, often appear after sawtooth crashes with increased RE losses, and their frequency can chirp up or down by approximately 0.1 MHz within about 1 ms.
- Proposed solution: Employ machine learning, specifically generative neural networks (based on variational autoencoders, VAEs), to automate the detection of these chirping AEs in spectrograms.
- Approach: Train models using a small labeled dataset and a large unlabeled dataset to overcome the scarcity of labeled data.

Quotes:

“Chirping Alfvén eigenmodes were observed at the COMPASS tokamak. They are believed to be driven by runaway electrons (REs), and as such, they provide a unique opportunity to study the physics of nonlinear interaction between REs and electromagnetic instabilities, including important topics of RE mitigation and losses.” “So far, their detection has required much manual effort since they occur rarely.” “We strive to automate this process using machine learning techniques based on generative neural networks.”

2. Chirping Modes on the COMPASS Tokamak

The paper explains that chirping modes on COMPASS are observed indirectly in spectrograms of a magnetic U-probe. Each spectrogram covers a roughly 0.3-second experiment and a frequency range of 0 to 2.5 MHz. Because experiments vary in length and full spectrograms are large, the data are preprocessed into fixed-size square patches (128×128 pixels) for labeling, training, and validation. Each patch spans 6.54 ms in time and 0.62 MHz in frequency, a size deemed sufficient to capture most chirping modes. Patches manually labeled as containing a chirping mode constitute the positively labeled data.

Key Ideas:

- Observation method: Chirping AEs are identified in spectrograms derived from magnetic U-probe signals on the COMPASS tokamak.
- Data preprocessing: Spectrograms are divided into 128×128-pixel patches to create a uniform input size for the neural networks.
- Labeling: Patches are manually labeled by experts as either containing (positive label) or not containing (negative label) a chirping mode.

Quote:

“Chirping modes on the COMPASS tokamak can be observed indirectly in spectrograms of a magnetic U-probe.” “Therefore, for labeling, training, and validation purposes, we have split spectrograms into square patches of the same size. We have chosen the size of a single spectrogram patch to be 128×128 pixels, which covers 6.54 ms in the time axis and 0.62 MHz in the frequency axis and which is a feasible input size for current convolutional neural network architectures. It is also enough to capture most of a typical chirping mode as can be seen in Fig. 1d.”
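As a rough illustration of this preprocessing step (not the authors' Julia pipeline; the array shape and names are assumed), a Python sketch that tiles a spectrogram into non-overlapping 128×128 patches:

```python
import numpy as np

PATCH = 128  # patch side length in pixels (≈ 6.54 ms × 0.62 MHz per the paper)

def extract_patches(spectrogram: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Tile a 2-D spectrogram (frequency × time) into non-overlapping patch×patch blocks.

    Edges that do not fill a complete patch are discarded, so every returned
    patch has the fixed shape expected by the convolutional networks.
    """
    n_freq, n_time = spectrogram.shape
    rows, cols = n_freq // patch, n_time // patch
    patches = [
        spectrogram[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        for r in range(rows)
        for c in range(cols)
    ]
    return np.stack(patches)[..., np.newaxis]  # shape (n_patches, 128, 128, 1)

# Example with a dummy spectrogram covering 0-2.5 MHz over ~0.3 s
dummy = np.random.rand(512, 6000)
print(extract_patches(dummy).shape)  # (184, 128, 128, 1)
```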

3. Model Structure

The authors implemented two main approaches based on generative neural networks, specifically variational autoencoders (VAEs).

3.1 One-Class Model:

This approach treats the detection problem as anomaly or outlier detection. The model learns a representation of one class of data (either spectrogram patches with chirping modes or patches without them). During testing, the model computes the negative log-likelihood of a new sample under the learned generative distribution, or the mean squared error (MSE) of its reconstruction. High values of these metrics indicate potential out-of-class samples (i.e., chirping modes if the model was trained on non-chirping data, and vice versa). Training can be done either with labeled data of the class of interest or with unlabeled data (assuming the target anomaly is rare); the latter approach leverages the large unlabeled database.
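A minimal sketch of the one-class scoring idea, assuming a trained generative autoencoder exposed as `encode`/`decode` callables (a NumPy-based illustration, not the paper's Julia implementation):

```python
import numpy as np

def anomaly_scores(patches: np.ndarray, encode, decode) -> np.ndarray:
    """Score patches by reconstruction error under a model trained on ONE class.

    `encode` and `decode` stand for a trained generative autoencoder; patches
    that the model reconstructs poorly (high MSE) are likely out-of-class,
    i.e. candidate chirping modes when the model was fit on non-chirping data.
    """
    reconstructions = decode(encode(patches))
    # per-sample mean squared error over all pixels
    return np.mean((patches - reconstructions) ** 2, axis=(1, 2, 3))

# Ranking for manual inspection: highest scores first
# scores = anomaly_scores(X_test, encoder, decoder)
# top_k = np.argsort(scores)[::-1][:k]
```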

3.2 Two-Stage Model:

This model aims to make more effective use of both labeled and unlabeled data.

- Stage 1: A convolutional generative autoencoder (a VAE, or variants using different divergence measures such as MMD and JSD together with a VampPrior) is trained on the large unlabeled dataset. This stage learns a low-dimensional representation of the general topology of the input space.
- Stage 2: A classifier (k-nearest neighbors, kNN, or a Gaussian mixture model, GMM) is trained on the encoded labeled data, i.e., the latent-space representation learned in the first stage.

By using MMD or JSD and the VampPrior in the first stage, the authors aim to enforce separation of the encoded data into clusters, making the classification task in the second stage easier.
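A sketch of the second stage, assuming the first-stage encoder is available as an `encode` function returning latent vectors (scikit-learn kNN used for illustration; the paper's implementation is in Julia):

```python
from sklearn.neighbors import KNeighborsClassifier

def two_stage_fit_predict(encode, X_train, y_train, X_test, k: int = 5):
    """Second stage of the two-stage model: classify in the latent space.

    `encode` is the first-stage encoder already trained on the large unlabeled
    set; it maps 128x128x1 patches to low-dimensional latent vectors. A kNN
    classifier is then fit on the encoded *labeled* patches.
    """
    z_train = encode(X_train)  # (n_train, d) latent codes
    z_test = encode(X_test)    # (n_test, d)
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(z_train, y_train)
    # probability of the positive (chirping-mode) class, usable for ranking
    return clf.predict_proba(z_test)[:, 1]
```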

Key Ideas:

- Generative autoencoders (VAEs): Used as powerful estimators of high-dimensional distributions and for learning low-dimensional representations.
- One-class approach: Trains a model on one type of data (e.g., non-AE spectrograms) to identify the other (AE spectrograms) as anomalies based on reconstruction error or likelihood.
- Two-stage approach: Separates feature learning (using unlabeled data in an autoencoder) from classification (using labeled data on the learned features).
- Divergence measures: Kullback-Leibler divergence (KLD), maximum mean discrepancy (MMD), and Jensen-Shannon divergence (JSD) are explored within the VAE framework to shape the latent-space representation (the standard objective they modify is sketched after the quotes below).
- VampPrior: Used as a flexible prior distribution in the VAE, allowing the model to learn the structure of the latent space more effectively.

Quotes:

- On VAEs: “Generative models based on the variational autoencoder (VAE) paradigm have been used because they are powerful estimators of high-dimensional distributions, suitable for modeling image data. They do not require labels for training, which is a limiting factor for classification neural networks that overfit when not supplied with enough labeled data.”
- On One-Class Model: “A generative autoencoder can be readily used for this task if we set pθ(x) ≈ p(x) to be the distribution of the class of our interest. Then, there are two modes of training the autoencoder. In the first mode, we model the distribution of patches that contain a chirping mode… In the second mode, we can choose the class of interest to be of the patches that do not contain a chirping mode. This is closer to an anomaly detection formulation of the problem…”
- On Two-Stage Model: “The second model is designed to make the most use of both labeled and unlabeled data. It exploits the ability of generative autoencoders to produce a low-dimensional uncorrelated representation of high-dimensional image data. It consists of two stages. The first stage is a convolutional generative autoencoder trained with unlabeled data. Its task is to learn the general topology of the input space and encode input data. The second stage is a classifier that is trained on encoded labeled data.”
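For orientation only (standard VAE background, written schematically; not quoted from the paper): the first-stage objective balances a reconstruction term against a divergence D between the encoded distribution and the prior p(z), weighted by a regularization weight λ (one of the tuned hyperparameters):

$$
\mathcal{L}(\theta, \phi) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] \;-\; \lambda\, D\big(q_\phi(z \mid x)\,\big\|\,p(z)\big),
$$

where D is the Kullback-Leibler divergence in the plain VAE and is replaced by MMD or JSD in the variants, and p(z) is a standard normal prior or the VampPrior.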

4. Experimental Setup

The experimental setup used 40 preprocessed spectrograms, from which 370 non-overlapping 128×128-pixel patches were extracted and manually labeled; this formed the labeled training dataset. In addition, a larger unlabeled database of 330,000 patches from 2000 spectrograms was created.

For the one-class model, training was performed both on a subset of positively labeled spectrograms and on the large unlabeled dataset. Cross-validation was used with ten different train/test splits.

For the two-stage model, the labeled data was split into 80/20 training/testing subsets, repeated ten times for cross-validation.

The paper details the architectures of the encoders and decoders used in both models, involving convolutional layers, maxpooling, dense layers, transposed convolutions, and upscaling. Residual blocks (ResNets) and batch normalization were also employed. Hyperparameter optimization was performed for both models, with parameters like the dimensionality of the latent space (d), the scaling parameter of the IMQ kernel (γ), regularization weights (λ), the number of prior components (N), the number of neighbors (k) in kNN, and the number of GMM components (M) being tuned. The models were implemented in Julia and trained on an NVIDIA Titan V GPU.
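The models themselves were written in Julia; purely as an illustration, here is a minimal PyTorch sketch of a convolutional encoder/decoder with the general shape described above (layer counts, channel widths, and the latent dimension are assumptions, not the paper's exact architecture):

```python
import torch.nn as nn

D_LATENT = 16  # latent dimensionality d (tuned in the paper; value assumed here)

class Encoder(nn.Module):
    """Convolutions with max pooling, then dense layers producing mean and log-variance."""
    def __init__(self, d=D_LATENT):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 16 * 16, d)
        self.logvar = nn.Linear(64 * 16 * 16, d)

    def forward(self, x):
        h = self.features(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Dense layer followed by transposed convolutions upscaling back to 128x128."""
    def __init__(self, d=D_LATENT):
        super().__init__()
        self.fc = nn.Linear(d, 64 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),   # 32 -> 64
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(), # 64 -> 128
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 16, 16)
        return self.deconv(h)
```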

Key Ideas:

- Datasets: A small manually labeled dataset and a large unlabeled dataset of spectrogram patches were used.
- Training strategies: The one-class model was trained either on labeled data of a single class or on the unlabeled data; the two-stage model used unsupervised pre-training followed by supervised classification.
- Cross-validation: Ten repeated train/test splits were used to evaluate the performance and robustness of the models.
- Network architectures: Convolutional autoencoders with specific layer configurations, ReLU activations, and the RMSProp optimizer were used.
- Hyperparameter optimization: A range of hyperparameters was tuned to find the best-performing models.

Quote:

“Every spectrogram was divided into patches of size 128×128 pixels. Out of 40 preprocessed spectrograms, 370 nonoverlapping patches were extracted and labeled. This results in a labeled training data set {X_l, Y}, X_l = {x_i}_i, x_i ∈ ℝ^(128×128×1), Y = {y_i}, y_i ∈ {0, 1} of samples X_l and labels Y, where y_i = 1 if a patch contains a chirping mode. Also, an unlabeled database X_u of 330 000 patches coming from 2000 spectrograms was created.”

5. Results

Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the precision@k score (precision among the k highest-scoring samples), as these metrics are relevant for ranking potential chirping-mode events for manual inspection.
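A minimal Python sketch of these two metrics (the authors' implementation is in Julia; `y_true` and `scores` are assumed names):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def precision_at_k(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """Fraction of true chirping-mode patches among the k highest-scoring samples."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(y_true[top_k]))

# y_true: 0/1 labels, scores: model outputs (higher = more likely chirping mode)
# auc = roc_auc_score(y_true, scores)
# p_at_20 = precision_at_k(y_true, scores, k=20)
```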

5.1 One-Class Model Optimization:

Modeling the distribution of non-Alfvén (no chirping mode) spectrograms generally yielded better results than modeling Alfvén spectrograms, except for the KLD-based VAE, which failed for the Alfvén target class. A plain autoencoder (MSE loss) achieved performance comparable to the more complex VAE variants in some cases. Precision among the top-scoring samples was generally low for the Alfvén target class.

5.2 Two-Stage Model:

The two-stage model generally outperformed the one-class model. The kNN classifier in the second stage was superior to the GMM classifier. Using MMD regularization in the first-stage autoencoder seemed to produce the best results, possibly due to its ability to create a well-separated latent space. The authors also showed that the autoencoding step helps overcome the curse of dimensionality: the subsequent classifier performs better on the latent codes than a kNN applied directly to the high-dimensional input space.

5.3 Influence of Train/Test Splitting Methodology:

The study highlighted the importance of splitting data at the spectrogram level rather than the patch level to avoid overoptimistic performance estimates. Splitting at the patch level produced higher but unrealistic AUC values, indicating that patches within the same spectrogram are more similar to each other than to patches from different spectrograms.
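To illustrate spectrogram-level splitting (not the authors' code), a sketch using scikit-learn's GroupShuffleSplit, where each patch carries the ID of its source spectrogram:

```python
from sklearn.model_selection import GroupShuffleSplit

def spectrogram_level_splits(X, y, spectrogram_ids, n_splits=10, test_size=0.2, seed=0):
    """Yield train/test indices so that all patches from one spectrogram stay on one side.

    Splitting by `spectrogram_ids` (one group per source spectrogram) avoids the
    optimistic bias of patch-level splitting, since patches from the same
    spectrogram are strongly correlated.
    """
    splitter = GroupShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    yield from splitter.split(X, y, groups=spectrogram_ids)

# for train_idx, test_idx in spectrogram_level_splits(X, y, ids):
#     ...train and evaluate one fold...
```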

Key Ideas:

- Evaluation metrics: AUC and precision@k were used to assess how well the models identify chirping modes.
- Model comparison: The two-stage model generally performed better than the one-class model.
- Importance of latent-space learning: The autoencoder in the two-stage model reduces dimensionality and learns meaningful features, improving classification performance.
- Data-splitting bias: Splitting data at the patch level can lead to biased, overly optimistic performance estimates because of correlations among patches within a spectrogram.

Quotes:

- On Evaluation: “Because of this, we evaluate the model performance by computing the area under the receiver operating curve (AUC), which is a standard measure for binary classification problems and also by precision@k score, which is the precision at the k-highest scoring samples.”
- On Two-Stage Superiority: “…it has been shown that both models are viable options in chirping mode identification, although the latter one proved to be superior.”
- On Data Splitting: “This indicates that the positively labeled patches in a single spectrogram are much more similar to each other than to those in different spectrograms, as only a relatively low number of neighbors are sufficient for optimal performance. Also, the variance of Fig. 5a is much higher, again indicating larger differences across spectrograms. If we continued with the splitting on the level of patches, we would have a biased and too optimistic estimate of performance before putting the framework into production.”

6. Conclusion

The paper concludes that both the one-class and two-stage models based on generative autoencoders are viable options for automated detection of chirping Alfvén eigenmodes in tokamak spectrograms. However, the two-stage model, which combines unsupervised feature learning with supervised classification in the latent space, demonstrated superior performance.

The authors also emphasize the critical importance of proper cross-validation splitting at the spectrogram level for obtaining realistic performance estimates and the need for careful evaluation using metrics relevant to the operational context (e.g., precision at the top-ranked samples).

Future work includes exploring more appropriate evaluation measures, incorporating data from other magnetic diagnostics, and expanding the labeled dataset for a more thorough evaluation and improved applicability to COMPASS operation.

Key Ideas:

- Viability of generative models: Generative autoencoders show promise for automating the detection of rare events like chirping AEs.
- Superiority of the two-stage approach: Separating feature learning from classification appears to be the more effective strategy for this task.
- Importance of realistic evaluation: Proper data splitting and relevant evaluation metrics are crucial for assessing the real-world utility of the models.
- Future directions: Incorporate more data sources, refine evaluation measures, and expand the labeled dataset.

Quote:

“Our task was identification of anomalous phenomena, i.e., chirping AEs, in graphical representations of signals measured during the operation of a tokamak. To this end, we have proposed two models based on generative autoencoders. The first model learned the distribution of normal data and identified chirping modes as out-of-class samples of this distribution. The second model implemented a two-stage learning approach. A regularized convolutional VAE trained on unlabeled data was successfully combined with a classifier trained with a smaller labeled data set. It has been shown that both models are viable options in chirping mode identification, although the latter one proved to be superior.” “We have also shown the need for proper cross-validation splitting of data in the evaluation phase and outlined the need for careful evaluation in order for the model to be useful in real-world application.”