TBSSvis: Visual Analytics for Temporal Blind Source Separation

Temporal Blind Source Separation (TBSS) is used to obtain the true underlying processes from noisy temporal multivariate data, such as electrocardiograms. TBSS has similarities to Principal Component Analysis (PCA) as it separates the input data into univariate components and is applicable to suitable datasets from various domains, such as medicine, finance, or civil engineering. Despite TBSS's broad applicability, the involved tasks are not well supported in current tools, which offer only text-based interactions and single static images. Analysts are limited in analyzing and comparing obtained results, which consist of diverse data such as matrices and sets of time series. Additionally, parameter settings have a strong impact on separation performance, but as a consequence of improper tooling, analysts currently do not consider the whole parameter space. We propose to solve these problems by applying visual analytics (VA) principles. Our primary contribution is a design study for TBSS, which so far has not been explored by the visualization community. We developed a task abstraction and visualization design in a user-centered design process. Our secondary contribution is the task-specific assembly of well-established visualization techniques and algorithms to gain insights into the TBSS processes. We present TBSSvis, an interactive web-based VA prototype, which we evaluated extensively in two interviews with five TBSS experts. Feedback and observations from these interviews show that TBSSvis supports the actual workflow and that its combination of interactive visualizations facilitates the tasks involved in analyzing TBSS results.


Introduction
Multivariate measurements of a phenomenon are common in many domains.
Medical doctors place electrodes on a patient's body to analyze processes such as brain activity, eye movements, or heart rhythm. Civil engineers measure vibrations on different parts of a structure, such as a bridge, to detect possible faults. Financial managers invest money in stocks, which are in a way sensors of economic processes, to gain wealth. Common to all these examples is the time-oriented data and the assumption that data from different sensors is in some way correlated and/or influenced by noise. However, analysts are usually only interested in the "true" underlying processes.
To obtain these processes, analysts turn to Blind Source Separation (BSS).
BSS comprises established methods for signal separation that have been applied, among others, in the mentioned domains of medicine [1,2,3], civil engineering [4] and finance [5]. Temporal Blind Source Separation (TBSS) refers to a subset of BSS methods that specifically account for temporal correlation. TBSS is similar to Principal Component Analysis (PCA) in the sense that i) TBSS methods work on any multivariate dataset with quantitative variables, ii) they work on measured data only (hence "blind") and iii) they separate it into a linear combination of uncorrelated components, like PCA. Unlike PCA, TBSS accounts for temporal correlation and often requires complex tuning parameters. As both TBSS and PCA can be considered forms of dimension reduction, analysts use TBSS and PCA for similar reasons, like data analysis or modeling/prediction. During these activities, it is at some point necessary to inspect components visually. Like with PCA, components are hidden until the separation algorithm is executed, but TBSS's complex parameter space severely complicates the issue: It is known that parameter settings greatly influence the result, but not how a change in parameters translates to a change in components. Experts do not regard automated analysis by extensive sampling [6] as a feasible option, and there is little guidance in the literature on which parameters to pick. Because a ground truth is rarely available, TBSS analysis is inherently open-ended and exploratory, as there are no known insights to confirm. The workflow of TBSS analysts can broadly be described as i) pick a parameter setting, ii) see if the obtained components are useful or interesting and, if not, return to i).
Some challenges make TBSS difficult to use in practice. Despite the important role of visualization in their workflow, the current tool used by the analysts does not support them well in this regard. Analysts need to manually program static visualizations, which requires time they could otherwise spend on data analysis. Another challenge is the number of components. Each parametrization on a p-variate dataset yields a set of p components that need inspection and comparison to previous sets. Analysts are, for example, interested in commonly found components, but are very quickly confronted with hundreds of components to consider. This is a common task in ensemble visualization [7], but made more difficult by components appearing in sets instead of one by one. Also, when comparing multiple results, analysts will eventually find competing options for their final choice. As there is usually no ground truth available to compare the result to, analysts need detailed ways to compare individual results to make an informed decision.
Visual analytics (VA [8]) as defined by Keim et al. [9] "combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets". Considering the strong focus of BSS analysis on visual inspection at multiple levels of detail, in combination with the mentioned challenges, we propose applying VA principles to overcome them. We designed TBSSvis according to Munzner's Nested Model [10] for the TBSS method "generalized Second Order Blind Identification" (gSOBI) [11]. We chose gSOBI because it is recent and well suited to real-world datasets due to its flexibility (see Section 3). The source code of TBSSvis is available at https://github.com/npiccolotto/tbss-vis.
Our primary research contribution is a design study [12] for TBSS, which improves the visualization community's knowledge about an area it has not explored so far. Specifically, we provide:
• A task abstraction for TBSS, which we obtained through a user-centered visualization design process with TBSS experts (Section 5).
• A VA design for gSOBI, a TBSS method, that supports the abstracted tasks by combining visualizations, interactions, and guidance methods (Section 6).
• Confirmation of the effectiveness of our design in two interviews with five TBSS experts (Section 8).
As part of this design study we put well-established visualization techniques together to support the identified tasks. These, together with a set-aware clustering scheme (Section 6.3), are our secondary contribution. They include a multivariate autocorrelation function plot and the application of a slope graph to sets of time series.

Related Work
In the following we elaborate on different approaches to visualize and compare time series, ensembles, and models.

Time Series Visualization
Temporal data is ubiquitous in many domains such as finance, health, or biology, and has been visualized for centuries since the first line graph was introduced by Playfair [13]. Various other visual encodings have been proposed afterwards, such as tile maps, sparklines, or horizon graphs [14]. They use different visual variables [15] such as position, color, or slope, and therefore exhibit different perceptual properties, which makes them suitable for different analysis tasks. E.g., Gogolou et al. [16] investigated the relation between different time series visualization idioms and perceived similarity. They recommend using horizon graphs when local variation in temporal position or speed is important, while others (line graph, color band) are better suited for notions of similarity where amplitude is less important.
Many time series can be visualized by juxtaposition, as is the case in Live-RAC [17]. Various system measures (columns) are displayed per machine (rows) in a space-filling table design, using semantic zooming to change the level of detail between color bars, sparklines, and labeled line graphs. When not using all available space, one could use small multiples [13] in different arrangements.
For instance, Stitz et al. [18] arrange small multiples of stocks by price and price change in a user-selected time frame. Liu et al. [19], on the other hand, lay them out with a modified Multidimensional Scaling (MDS) algorithm such that similar items are near each other.
Superimposed encodings save display space at the cost of legibility, as they do not scale well beyond a couple of variables due to occlusion. An example besides the well-known superimposed line graph is the braided graph [20], which superimposes multiple area-based marks. Because of the varying data dimensionality in TBSS, superimposition is generally not a promising strategy.
To keep features of long time series visible, designers often turn to focus-and-context techniques such as lenses [21]. In the simplest case, a lens mainly enlarges an area of interest, such as in SignalLens [22]. But more complex interactions are possible, such as in ChronoLenses [23], where users can combine and stack multiple lenses. An alternative to interaction is to reduce the visualized data either in a data-driven [24] or visualization-driven way, e.g., by line simplification [25]. However, if expected features in the data are not known in advance, as is the case in TBSS, one risks that important features are removed.

Ensemble Visualization
The goal in ensemble visualization is to make sense of a set of similar complex data items, such as trajectories, often produced by a simulation with perturbed parameter settings. Component sets obtained from different TBSS parametrizations constitute such an ensemble. Ensemble visualization has its origin in meteorology [26], but has since expanded to more domains [7]. Analytic tasks for ensemble data [7] indicate popular strategies, such as comparing members or grouping them by similarity, to support the stated goal. Existing works [27,28] often use popular clustering techniques (with domain-specific distance functions) to support the latter task. This is not straightforward in TBSS as one has to take care not to mix members of different sets into the same cluster. We discuss our approach in Section 6.3.
Time is a common part of ensemble data, but not a requirement [29,30,31,32]. One possible case is when ensemble members are univariate time series, such as for Köthur et al. [33], who encoded the correlation between members in a heatmap to support comparison of two ensembles. More commonly, other data types have an associated time dimension such as multivariate data [34], particle data [27], network security data [35], or spatial data [36].

VA for Construction and Comparison of Models
Sedlmair et al. [6] devise a framework for visual parameter analysis of models, but when applied, analysts still have to make themselves familiar with the solution space. As there is often no a-priori goal, model construction can be characterized as exploratory analysis.
VA has supported the construction of different kinds of models such as linear regression [37], logistic regression [38], time series [39], dimension reduction [40], or classification [41] and also related tasks like variable selection [42].
Reducing available models to a set of candidates and choosing the final model are tasks that require comparisons of models. The final choice depends on many factors. Recently, VA tools have been developed to compare machine learning models, to ensure their predictions are fair and free of bias [43,44]. Comparison tools for demand forecasting [45], decision tree [46], and regression [47] models exist too.

Temporal Blind Source Separation
The statistical analysis of multiple measurements taken at different times is a challenging task. Often, such multivariate time series are analyzed by transforming the data in certain simple ways to uncover latent processes which generated the data. Probably the most used method for such a task is classical PCA, which uses linear transformations of the data that result in components that have the highest variance and are uncorrelated. Uncorrelated implies that the covariance between the found linear combinations is zero. The linear combinations are given by diagonalizing the covariance matrix. Furthermore, as the nature of the transformation is linear, interpretations of the results can be carried out by the simple and well-studied loadings-scores scheme. However, PCA might not be the best choice when the data at hand shows dependencies in time, as the main source of information is in that case not covariance, but rather serial dependence. Serial dependence is characterized by autocovariance, i.e., covariance between measurements separated in time by a given lag. In analogy to PCA, it would be desirable to find linear combinations of the multivariate time series data which are not only uncorrelated marginally (zero covariance between variables at each time step), but also uncorrelated in time (zero autocovariance between variables for any lag). TBSS is a field of multivariate statistics that studies methods delivering these desired properties. Generally, BSS is a well-established model-based framework. It assumes that the observed data are a linear mixture of latent components, which are usually considered easier to model and/or more meaningful for interpretation than multivariate models.
The goal of BSS is to recover these components based on the observed data alone. BSS is formulated and used for many types of data, as outlined in recent reviews [1,48,49,50]. In the following we outline the concept of TBSS.
The model of TBSS considered here is x_t = A c_t, where x_t denotes the observed p-variate time series, A is the full-rank p × p mixing matrix, and c_t = (c_{1,t}, ..., c_{p,t})^T is the set of p latent components, which should be estimated.
Thus the goal is to find a p × p unmixing matrix W = (w_1, ..., w_p)^T, such that c_t = W x_t up to sign and order of the components in c_t. To facilitate the recovery, the assumption is made that the components in c_t have Cov(c_t) = I_p and are uncorrelated (or independent) with mutually distinct serial dependence. This means, for example, that all cross-moment matrices, such as autocovariance matrices, of c_t are diagonal matrices.
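To make the model concrete, here is a minimal Python sketch (the tooling discussed later is R; NumPy is used here purely for illustration) that mixes two latent AR(1) components with a known matrix A and recovers them with W = A⁻¹. In practice A is unknown and W must be estimated from x_t alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two latent AR(1) components with mutually distinct serial dependence (toy data)
n = 1000
c = np.zeros((n, 2))
for t in range(1, n):
    c[t, 0] = 0.9 * c[t - 1, 0] + rng.normal()
    c[t, 1] = -0.5 * c[t - 1, 1] + rng.normal()

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # full-rank p x p mixing matrix
x = c @ A.T                  # observed series: x_t = A c_t

# Recovery: find an unmixing matrix W such that c_t = W x_t.
# With the true A known, W = A^-1 recovers the components exactly.
W = np.linalg.inv(A)
c_rec = x @ W.T
print(np.allclose(c_rec, c))  # True
```

Note that any permutation or sign flip of the rows of W would also satisfy the model, which is why components are only recovered up to sign and order.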
A first approach for TBSS is the Second-Order Blind Identification (SOBI) algorithm [51,52,53]. It finds the linear combinations of the data which make the autocovariance matrices for several lags as diagonal as possible. Hence, found components are uncorrelated marginally and uncorrelated in time. It is well known in the statistical analysis of time series data that time series emerging from different scientific fields have different key characteristics.
For example, financial time series are not well characterized by autocovariance matrices; instead, higher-order moments carry the most information. This is denoted as stochastic volatility, and the TBSS literature shows that SOBI fails for such time series [54]. Higher-order moments often relate to skewness and kurtosis (for example, in our context, to the covariance of the squared data) and are meant to detect more unusual observations (heavy tails). To overcome this issue, a new TBSS method, denoted as a variant of SOBI (vSOBI) [54], was introduced. Similar to SOBI, vSOBI finds the latent time series by joint diagonalization, in this case of matrices of lagged fourth moments. Uncovered latent components are uncorrelated marginally and additionally have zero fourth-order dependence.
Generally, time series might carry information both in the autocovariance and in the higher-order time dependence, thus a combination of SOBI and vSOBI might deliver the best results. Indeed, Miettinen et al. [11] suggested such a method, referred to as generalized SOBI (gSOBI). It diagonalizes several autocovariance matrices (SOBI part) and several matrices of lagged fourth moments (vSOBI part). This method has three rather involved tuning parameters.
The first one, b ∈ [0, 1], weighs SOBI versus vSOBI, where SOBI (b = 1) and vSOBI (b = 0) are the extreme cases. The second (k_1) and third (k_2) tuning parameters provide the sets of lags used for the SOBI and vSOBI parts, respectively. A lag is a time interval given by a number of time steps, the size of which is determined by the resolution of the underlying time series. For instance, a lag of 6 in an hourly observed thermometer refers to an interval of 6 hours. Common default values for gSOBI are b = 0.9, k_1 = {1, ..., 12} and k_2 = {1, 2, 3}, but Miettinen et al. [11] also show that the selection of lag sets and weight has a huge impact on the performance. Vague guidelines for these tuning parameters exist in the community, such as that lag sets should be neither too small nor too large, and that the lags should be chosen so that the corresponding (cross-)moment matrices for the latent components have diagonal values far apart. Thus, parameter selection in the context of SOBI is a highly complex problem with no practical solution yet [55,56].
The R implementation of gSOBI used in the following is available in the package tsBSS [57]. We call one execution of gSOBI a run. As outlined before, it yields a set of p univariate time series, which we call components. The outcomes of multiple runs with varying parameter settings form an ensemble, where each member corresponds to a single run. A member has the used parameters k_1, k_2 and b associated, as well as the output of gSOBI. The latter is either the component set c_t and the estimated unmixing matrix Ŵ, or nothing, in case the (cross-)moment matrices could not be diagonalized in a predefined number of iterations. We call a run succeeding or failing, depending on the outcome.
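The run/ensemble terminology can be captured by a small record type. The following Python sketch is purely illustrative; the field names are our assumption, not the tsBSS API:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Run:
    """One gSOBI execution (illustrative structure, not the tsBSS API)."""
    b: float                                  # SOBI/vSOBI weight in [0, 1]
    k1: frozenset                             # lag set for the SOBI part
    k2: frozenset                             # lag set for the vSOBI part
    W: Optional[np.ndarray] = None            # estimated unmixing matrix, None if failed
    components: Optional[np.ndarray] = None   # p x n component set, None if failed

    @property
    def succeeded(self) -> bool:
        # A run fails when the matrices could not be diagonalized in time
        return self.W is not None

ensemble = [
    Run(b=0.9, k1=frozenset(range(1, 13)), k2=frozenset({1, 2, 3})),  # failed run
    Run(b=0.5, k1=frozenset({1, 2}), k2=frozenset({1}),
        W=np.eye(2), components=np.zeros((2, 100))),                  # succeeded
]
print([r.succeeded for r in ensemble])  # [False, True]
```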

Datasets
In this section we introduce two datasets, one from the medical domain and one from the financial domain, along with reasons why TBSS analysis of them can be desired. Analysis of both datasets shares similar tasks. For instance, analysts are interested in relevant parameter subspaces, common components and alternatives to them, as well as the stability of obtained results. We formalize typical tasks and questions involved in TBSS analysis in Section 5.

Financial data
Goods, currencies, and company stocks are traded every day at high frequencies. In simple terms, investors make money by buying something at a price X and selling it later at a price Y larger than X. To maximise Y − X in a short time frame the idea here is to find a volatile collection of currencies or stocks (a portfolio), i.e., one that is subject to sudden and extreme changes in value.
To do so, we look at the daily exchange rate of 23 currencies to Euro between the years 2000-2012 (23 variables, 3 139 time steps). We preprocess the data to get logarithmic returns, a common measure in quantitative finance when the temporal behavior of return is of interest. The first three variables are shown in Figure 1a.
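Logarithmic returns are the lag-one differences of the log-transformed series, r_t = log(p_t) − log(p_{t−1}). A minimal sketch of this preprocessing step with hypothetical rates:

```python
import numpy as np

# Hypothetical daily exchange rates (three time steps; the real data has 3139)
rates = np.array([1.00, 1.02, 0.99])

# Logarithmic returns: r_t = log(p_t) - log(p_{t-1})
log_returns = np.diff(np.log(rates))
```

Log returns are preferred over raw differences because they are additive over time and approximately symmetric for small changes.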

Medical data
An electrocardiogram (ECG) is a recording of the heart's electrical activity. To obtain it, electrodes are placed on the patient's skin. These electrodes detect small electrical changes which occur due to muscle de- and repolarization. ECGs are important for medical analysis as many cardiac abnormalities show deviations from the normal ECG pattern. Analysis of fetal ECGs may detect problems during fetal development, such as fetal distress. While invasive methods exist to measure the fetal ECG directly, a non-invasive method is often preferred as it harms neither mother nor fetus. The fetal ECG is visible in the mother's ECG, but it is weak and mixed with, e.g., respiratory noise or frequency interference (compare first three rows in Figure 1b). Using TBSS on ten seconds of the ECG of a pregnant woman (8 dimensions, 2 500 time steps), we try to extract the fetal ECG following previous work [2].

Task Abstraction
In this section we present a task abstraction for TBSS. We structure it according to the data-users-tasks triangle by Miksch and Aigner [58] and use the terminology by Brehmer and Munzner [59] for tasks. We developed the abstraction together with the visualizations in an iterative design process following Munzner's Nested Model [10] with three collaborators, who are co-authors of this paper and experts in BSS. In this user-centered design process model, we first conducted unstructured interviews in order to understand their problems and made ourselves familiar with literature they provided. After that, we discussed our assumptions and ideas regularly with them over a course of nine months. We discussed iteratively developed prototypes ranging from hand-drawn sketches, to static digital images, to an interactive application which is described in Section 6. During these sessions, we also questioned our current understanding of their tasks either implicitly through visualization designs or explicitly through discussions. In the end, we interviewed five TBSS experts, who did not collaborate with us on the design, to further validate our abstracted tasks (Section 8). The presented task abstraction is a reflection on this process.
We already touched upon the data involved in TBSS in Section 3. These are a multivariate time series (input data), one real number and two sets of integers (TBSS parameters b, k_1 and k_2), and a set of univariate time series (latent components). The temporal dimension is discrete and linear.

Users
Our users are data analysts or data scientists with formal education in statistics/math and basic knowledge of BSS. They may also be experts in a specific application domain, like medicine or finance. They work mostly with R [60], a language and environment for statistical computation in which most BSS researchers publish their implementations. The preferred work environment is RStudio, a popular text-based development environment for R. Currently, they use built-in plotting functionality, and sometimes they use, for example, ggplot2 to build customized visualizations. The output of either option is a static visualization, of which RStudio by default displays only one at a time. Because of this, our users are accustomed to well known static statistics visualizations such as histograms, line graphs, box plots, etc.

Tasks
During this user-centered design process we identified the following tasks, which we describe using the abstraction terminology by Brehmer and Munzner [59].
The high-level workflow can be separated into three phases, which are depicted in Figure 2: Analysts first inspect the raw input data, continue to finding parameter settings, and then analyze obtained components. Given the exploratory nature of their analysis process, analysts switch between the latter two phases until they feel they exhausted the parameter space or obtained a useful result.
Generally, analysts want to discover observations or derive a modified dataset with reduced dimensionality. There are two main targets of analysis. The first are the components, which are mostly analyzed as sets. Still, analysts want to discover and explore interesting components, whatever interesting means in the data domain. The other analysis target are the parameters. Analysts look for a "stable" result, i.e., one that can be obtained with rather diverse parameter settings. The assumption is that its components then more likely represent real processes. To this end, they need to compare components and parameters. Analysts also need guidance through the parameter space and the ability to compare possible parameters in some meaningful way to find a promising setting.

Visualization Design & Justification
In this section, we present the visualization design we obtained based on the task abstraction (Section 5) and implemented in a web-based prototype for gSOBI. A design goal was to make TBSSvis generic enough to allow its use in many application domains, because, like PCA, TBSS is a domain-independent method. We designed TBSSvis for inputs with the length of up to 5 000 time steps, and up to 50 dimensions. While these limits do not accommodate extreme cases, like fMRI data (100+ dimensions, 100 000+ time steps), we expect it is enough for many applications.
While we implemented visualizations for all abstracted tasks, for brevity we will focus on an illustrative subset of those. Specifically, we will discuss visualizations for tasks that pertain to
• identifying and comparing components (or sets thereof),
• identifying and comparing used parameters, and
• comparing possible parameters.
TBSSvis is built of three screens, which are depicted with their connection to analysis phases in Figure 2. The Input Visualization screen shows the raw input data, a feature requested by our collaborators. The Ensemble screen allows exploration of parameter settings and components. Finally, the Parameter Selection screen is used to select new parameter settings. We will focus on the latter two. How presented visualizations work together is illustrated in the usage scenarios (Section 7).

Time Series Visualization and Interactions
Time series are plotted vertically aligned to facilitate comparison and ordered by variable name (for input variables) or by an interestingness function (for latent components, see below). The display of and interaction with all time series in TBSSvis is handled by the same logic, as shown in Figure 3. Due to the length and number of time series, we employ semantic zooming and at first save display space by drastically reducing their Y axes and omitting any labels by default. This can be changed with interaction: On hover, we display axis labels for the hovered time series. The Y axis can be enlarged individually by another interaction. If an analyst is interested in a contiguous subset of the time series, it is possible to zoom in with brushing, which will affect all time series in the application. Both the semantic and temporal zoom can be reset with interactions recommended by Schwab et al. [62].
As described in Section 3, the order of components is not defined. In practice, this means that analysts use measures which are sign-independent to compare components, such as absolute Pearson correlation, and impose an order by sorting components according to a function. We will call this a degree-of-interestingness function (DOI), and require it to be any function f : R^n → R that maps a time series of length n to a single number. Because TBSS is a domain-independent method, many DOI functions could be useful [63] depending on what the domain's interesting features are. E.g., for detailed cardiac analysis, different widths and types of ECG wave patterns could be mined.
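To illustrate the concept, the following sketch shows one plausible DOI function, absolute sample skewness, used to rank components (any f : R^n → R would work):

```python
import numpy as np

def doi_abs_skewness(series: np.ndarray) -> float:
    """DOI f: R^n -> R, here the absolute sample skewness of a component."""
    z = (series - series.mean()) / series.std()
    return abs(np.mean(z ** 3))

rng = np.random.default_rng(1)
components = [rng.normal(size=500),        # roughly symmetric -> low DOI
              rng.exponential(size=500)]   # strongly skewed   -> high DOI

# Rank components so the most "interesting" one comes first
order = sorted(range(len(components)),
               key=lambda i: doi_abs_skewness(components[i]), reverse=True)
print(order)  # [1, 0]: the skewed component ranks first
```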
Based on discussions with our collaborators, we use the absolute third moment (skewness) as the DOI function.

Color
According to Mackinlay [15], color is the most effective visual variable for nominal data after position and is, therefore, often used to encode different data classes. In multiple views, the same classes should be encoded with the same palette [65]. Because humans can only reasonably distinguish a few different colors, we cannot statically assign colors to all ensemble members. We, therefore, use a user-controlled dynamic assignment of colors from a qualitative palette to encode data related to user-selected members.

Set-Aware Clustering

Clustering techniques are commonly used to group similar ensemble members [67,68], but using them with all components as they are has a major drawback: The clustering scheme will group components from the same set, which our collaborators found undesirable. The grouping should respect the set structure in the data and group components only between sets, not within them.
Additional requirements we gathered for the clustering scheme are that it should not depend on a distance metric (unlike, e.g., k-means) and produce an existing data case as cluster representative (again unlike, e.g., k-means). The former is related to the similarity measure for components suggested by our collaborators, the difference in absolute Pearson correlation dist cor = 1 − |cor(c i , c j )|. Since we do not know if it supports the triangle inequality, we should not rely on it.
The latter requirement stems from the design principle to show actual data over visual abstractions.
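The dissimilarity measure can be sketched directly; note its sign-independence, which matches how analysts compare components:

```python
import numpy as np

def dist_cor(ci: np.ndarray, cj: np.ndarray) -> float:
    """Sign-independent dissimilarity between components: 1 - |cor(c_i, c_j)|."""
    return 1.0 - abs(np.corrcoef(ci, cj)[0, 1])

t = np.linspace(0, 1, 100)
a = np.sin(2 * np.pi * t)
print(dist_cor(a, -a))  # 0.0: a sign-flipped component counts as identical
```

Because this measure is not known to satisfy the triangle inequality, algorithms that assume a metric space cannot safely be used with it.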
We developed a custom clustering scheme to achieve these requirements.
Starting from the realization that we basically want k-medoids, as it does not need a distance metric and produces existing representatives (medoids), we looked for a way to constrain the clustering process to obey the set structure. Constrained versions exist for k-means [69], but we did not find one for k-medoids. However, it was possible to adapt the constrained approach using a k-means-like formulation of k-medoids [70].
Constraints in our case are of the type cannot-link, i.e., they express which data cases must not be grouped into the same cluster. We add one cannot-link constraint per pair of elements that belong to the same set. For m sets containing p data cases each this amounts to mp(p − 1)/2 constraints in total.
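Generating the constraints amounts to enumerating all within-set pairs, as the sketch below shows:

```python
from itertools import combinations

def cannot_link(sets):
    """One cannot-link constraint per pair of components from the same run's set."""
    return [pair for s in sets for pair in combinations(s, 2)]

# m = 2 sets of p = 3 components each -> m * p * (p - 1) / 2 = 6 constraints
sets = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
print(len(cannot_link(sets)))  # 6
```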
Algorithm 1 shows pseudocode of our custom clustering scheme. We show the clustering result to the analyst with the following visualizations.
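Since Algorithm 1 is not reproduced here, the following sketch only illustrates the core idea of the assignment step under cannot-link constraints. It is a greedy simplification, not the paper's algorithm: within each set, every component must be assigned to a different medoid.

```python
import numpy as np

def assign_with_cannot_link(dist, set_ids, medoids):
    """Assignment step of a constrained k-medoids variant: components sharing
    a set_id must land in different clusters (greedy sketch)."""
    labels = -np.ones(dist.shape[0], dtype=int)
    for s in np.unique(set_ids):
        members = np.where(set_ids == s)[0]
        used = set()
        # handle components with a close medoid first
        order = members[np.argsort(dist[members][:, medoids].min(axis=1))]
        for i in order:
            for m in sorted(medoids, key=lambda m: dist[i, m]):
                if m not in used:  # cannot-link: one medoid per set member
                    labels[i] = m
                    used.add(m)
                    break
    return labels

# 2 sets x 2 components; medoids are components 0 and 2
dist = np.array([[0.0, 0.1, 0.9, 0.2],
                 [0.1, 0.0, 0.9, 0.8],
                 [0.9, 0.9, 0.0, 0.3],
                 [0.2, 0.8, 0.3, 0.0]])
set_ids = np.array([0, 0, 1, 1])
labels = assign_with_cannot_link(dist, set_ids, medoids=[0, 2])
print(labels)  # [0 2 2 0]: same-set components end up in different clusters
```

Without the constraint, component 3 would join medoid 0 together with its set sibling; the constraint forces siblings apart.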

Clustering Quality and Number of Partitions
The constrained k-medoids clustering takes one user-provided parameter, which is the desired number of partitions. We use a scented widget [72] to allow setting this parameter in an informed way (Figure 4, A). The bar chart in the widget shows the average cluster separation as a clustering quality measure for each number of clusters. Numbers of clusters with high bars therefore suggest how many meaningfully different components exist across all currently available sets.

Component Overview
The cluster medoids are shown underneath vertically aligned in a list, sorted by the DOI rank of the medoid (Figure 4, B). To further support Task C4, we show a histogram to the left of the medoid. The histogram shows the rank distribution of the contained components in their respective sets. Additionally, we encode dist cor to the cluster medoid with opacity. This way, stable (stacked bars with high opacity) and unstable (scattered bars with low opacity) components have distinct histogram shapes.
Analysts can inspect components in a cluster by clicking the "eye" icon, after which the list item expands and lists contained components in the same fashion as cluster medoids. Clicking a bar in the histogram or a time series label selects the associated ensemble member.

Slope Graph
Components of selected sets are visible in a separate view, again vertically aligned and sorted by DOI (Figure 4, C). Each selected set has a unique assigned color and all associated data is shown in this color. Multiple selections are juxtaposed horizontally in columns, which can be rearranged by the analyst. Analysts can inspect components visually as they are, or they can additionally display a slope graph between columns. Lines of the slope graph connect similar components, and thickness encodes similarity from high correlation (thick) to low (thin). This way, it is easy to see stable components (thick, single, mostly straight lines) and unstable components (no or thin, multiple, tilted lines), their rank changes, and set similarity at a glance.
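The links of such a slope graph can be derived from pairwise absolute correlations between two component sets. A sketch with a hypothetical threshold:

```python
import numpy as np

def slope_links(set_a, set_b, threshold=0.7):
    """Connect components of two sets whose absolute correlation exceeds a
    (hypothetical) threshold; the weight w maps to line thickness."""
    links = []
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            w = abs(np.corrcoef(a, b)[0, 1])
            if w >= threshold:
                links.append((i, j, w))
    return links

t = np.linspace(0, 1, 200)
set_a = [np.sin(2 * np.pi * t), np.cos(6 * np.pi * t)]
set_b = [-np.sin(2 * np.pi * t),           # sign-flipped copy -> thick link
         np.sign(np.sin(2 * np.pi * t))]   # distorted copy    -> thinner link
links = slope_links(set_a, set_b)
print(sorted((i, j) for i, j, _ in links))  # [(0, 0), (0, 1)]
```

The first component of set A is stable (it reappears, up to sign and distortion, in set B), while the second has no counterpart and receives no link.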

Tasks I1/C2: Identify/Compare Used Parameters
Parameter space analysis [6] is another important task for BSS experts, where they are mainly interested in sensitivity analysis and partitioning. We facilitate these tasks with tailored visualizations (Figure 5).

Similarity Views
Similarities of the component sets obtained so far, as well as of selected parameters, are shown in three separate dimensionally-reduced views. Marks that are close to each other suggest similar components and k_1/k_2 parameters. Multidimensional Scaling (MDS) is an appropriate dimension reduction technique for global cluster analysis according to recent publications [73,74]. We use non-metric MDS [75] as we do not always have a distance metric. As MDS will project elements with the same values in high-dimensional space to the same low-dimensional points, we would soon run into an occlusion problem: consider an analyst who keeps lag sets the same, but varies only the weight. There are a couple of ways to deal with occlusion, most notably lenses [21]. However, our users are not used to complex interactions, so we changed the tradeoff between position accuracy and occlusion. As an implementation of CorrelatedMultiples [19] was not available, we simply rasterize the MDS plot and move overlapping points to the next free cell. When hovering over a point, the other points change their size proportionally to the original dissimilarity, thereby allowing analysts to investigate projection errors.
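The overlap resolution can be sketched as snapping points to grid cells and moving collisions to a free cell in the nearest surrounding ring (a simplification of the actual layout logic):

```python
def rasterize(points, cell=1.0):
    """Snap 2D points to grid cells; a point landing on an occupied cell is
    moved to a free cell in the nearest surrounding ring (sketch)."""
    occupied = {}
    for idx, (x, y) in enumerate(points):
        cx, cy = round(x / cell), round(y / cell)
        r = 0
        while True:
            # cells on the ring of Chebyshev radius r around the target cell
            ring = [(cx + dx, cy + dy)
                    for dx in range(-r, r + 1) for dy in range(-r, r + 1)
                    if max(abs(dx), abs(dy)) == r]
            free = [c for c in ring if c not in occupied]
            if free:
                occupied[free[0]] = idx
                break
            r += 1
    return {idx: cell for cell, idx in occupied.items()}

# Two nearly identical projections and one distant point
cells = rasterize([(0.1, 0.1), (0.2, -0.1), (5.0, 5.0)])
print(cells[0], cells[1])  # both near (0, 0), but in different cells
```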

Parameter Comparison
To compare the weights of different parametrizations, we encode triangle marks on a shared axis. Triangles are stacked in case they would otherwise completely occlude each other. To compare lag sets, we use interwoven histograms in which the color saturation of a bar encodes the lag size, both to give an additional visual hint of the lag distribution and to be consistent with the encoding in the lag selection (Section 6.5.1). Figure 6 shows how they are generated. First, individual bars
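A minimal sketch of the shared-binning step behind such interwoven histograms, assuming all lag sets share one set of bin edges so their bars can be drawn side by side (per-bar saturation would then be derived from each bin's lag range); the function and parameters are illustrative, not taken from the paper:

```python
import numpy as np

def interleaved_bins(lag_sets, bin_width=10, max_lag=None):
    """Shared-bin counts for several lag sets so their bars can be
    interleaved in one histogram.

    lag_sets: iterable of lag collections (one per parametrization).
    Returns (bin_edges, counts), where counts has one row per lag set.
    """
    if max_lag is None:
        max_lag = max(max(s) for s in lag_sets)
    edges = np.arange(0, max_lag + bin_width, bin_width)
    counts = np.vstack([np.histogram(s, bins=edges)[0] for s in lag_sets])
    return edges, counts
```

Because all rows share the same edges, hiding a parametrization simply hides its bars without reflowing the others, which is consistent with the "hidden bins" reading reported in the evaluation.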

Task C5: Compare Possible Parameters
To obtain a new result, analysts need to select parameters. In the case of gSOBI, these consist of two lag sets and one weight (Section 3).
To facilitate this selection process, we used the guidance design framework [76] to design appropriate guidance [61]. Analysts do not know which lags to select and are generally aware of this knowledge gap. As discussed in Section 5.2, the analysis goal is to obtain a new/interesting result. Issues occur in the phase of lag selection because the space of possible lag sets is huge. Analysts currently do not use additional information about lags, mostly due to time constraints.
The knowledge gap lies in the execution and relates to the input data. We opt for orienting guidance for two reasons: analysts select lags also based on past experience and domain knowledge, so stronger guidance could be detrimental, and our guidance input is not (and cannot be) the "true" data: we compute it from the input data, which, per the BSS model, are a linear combination of the components we are interested in. Based on the input data, we calculate guidance outputs per lag that help relate lags to each other. Guidance Output (GO) 1: Calendar relation. We compute which lag fits best to intervals in bigger calendar granules. The benefit of this is two-fold.
First, lags are abstract and do not consider the calendar used in the data, so thinking in terms of days, weeks, etc., is a more intuitive alternative for someone familiar with the data. Second, it allows us to organize lags by filtering to those which correspond to a difference in a given calendar granule, thereby reducing the number of lags to reason about. GO4: Cross-moment matrix diagonality. This can only be computed when the parametrization of a successful run is refined, i.e., when an unmixing matrix estimate exists. It shows the analyst which selected lags had an impact on the diagonality of the autocovariance and fourth cross-cumulant matrices. It can be understood as feedback into the guidance system.
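One plausible way to compute such a per-lag diagonality score, assuming an unmixing estimate W and input data X; the paper's exact measure for the autocovariance and fourth cross-cumulant matrices may differ from this sketch:

```python
import numpy as np

def diagonality(M):
    """Share of a matrix's squared mass on the diagonal: 1.0 for a
    perfectly diagonal matrix, approaching 0 as off-diagonal terms grow."""
    total = np.sum(M ** 2)
    return float(np.sum(np.diag(M) ** 2) / total) if total > 0 else 1.0

def lagged_autocov(W, X, lag):
    """Autocovariance matrix of the unmixed series Z = W X at a given lag.

    X: (p, n) input data, W: (p, p) unmixing matrix estimate.
    """
    Z = W @ X
    Z = Z - Z.mean(axis=1, keepdims=True)
    n = Z.shape[1]
    return (Z[:, :-lag] @ Z[:, lag:].T) / (n - lag)
```

A per-lag guidance output would then be `diagonality(lagged_autocov(W, X, lag))`: lags for which this value is high contributed to a good joint diagonalization.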

Lag Selection
We support the selection of a single lag set with multiple coordinated views (see Figure 7). The lag size is encoded with color saturation to make long, medium, and short lags distinguishable in all views, which is roughly how analysts reason about lag sets.
A parallel coordinates plot (PCP) displays all lags corresponding to a selected calendar granule (Figure 7, A), which can be configured by the user. When analysts see interesting patterns, they can select cells, and the respective input data and components are shown underneath the matrices (Figure 8). This allows them to investigate the relationship between inputs and components. Task C3 is also supported, for which we encode a BSS-specific similarity measure [77] in a heatmap with a univariate color scale.

Usage Scenarios
In this section, we describe how the designed visualizations (Section 6) allow insights into the presented datasets (Section 4). The financial dataset was used in our user studies (Section 8), while we added the medical dataset ourselves to provide broader context to the reader. The usage scenarios we describe are based on what we learned during the aforementioned user studies and during discussions with our collaborators.

Financial data
We load the financial dataset (Section 4.1) of 23 currency exchange rates and turn to Parameter Selection. Following our initial hypothesis, we set the weight b to zero and do not use the SOBI part (k1 parameter) at all. In the Lag Selection (Section 6.5.1) for k2, we quickly select lags that correspond to 1-3 day, 1-4 week, 1-3 month, and 1 year intervals in the underlying calendar. We proceed this way because the other guidance outputs do not seem informative due to the amount of noise in the dataset. The newly computed result is colored green in TBSSvis and automatically selected. We look at its components and compare it to the two identical results. The Slope Graph (Section 6.3.3) shows many thick lines that connect identical components. As we want to find currencies to invest in, we turn to the Component Overview again. The histograms suggest that the first couple of components are common to all results, i.e., are stable.
We therefore pick three that have volatile segments outside of 2008/2009 to rule out the global financial crisis as the cause for volatility. The Unmixing Matrix visualization (Section 6.6) shows which currencies are associated with these components (Figure 8). We will ask our financial advisor about investing in Thai baht, US dollars, Turkish lira, or Philippine pesos.
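The calendar-based lag selection in this scenario can be sketched as follows, assuming business-daily data and the usual financial conversion conventions (5 days/week, 21 days/month, 252 days/year); these factors and the function are our assumptions, not values from the paper:

```python
# Approximate lag counts per calendar granule for a business-daily series.
GRANULES = {"day": 1, "week": 5, "month": 21, "year": 252}

def calendar_lags(spec):
    """Expand a granule specification, e.g. {"day": range(1, 4)},
    into a sorted, de-duplicated lag set."""
    lags = set()
    for granule, counts in spec.items():
        for c in counts:
            lags.add(c * GRANULES[granule])
    return sorted(lags)

# Lags for 1-3 days, 1-4 weeks, 1-3 months, and 1 year, as in the scenario.
k2 = calendar_lags({
    "day": range(1, 4),
    "week": range(1, 5),
    "month": range(1, 4),
    "year": [1],
})
```

This mirrors GO1 (Section 6.5): reasoning in calendar granules keeps the lag set small and interpretable instead of hand-picking from hundreds of raw lags.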

Medical data
We load the ECG dataset (Section 4.2) from a pregnant woman into TBSSvis.
Looking at the raw inputs in the Input Visualization, we can confirm that the fetal heart signal is visible in the mother's ECG. We start with 9 precomputed parameter settings, 6 of which succeed. The Clustering Quality (Section 6.3.1) suggests that 8-10 meaningfully different components were obtained (Figure 4, A). We set the clustering to 10 partitions. A healthy fetus has a heart rate of 110-160 beats/minute on average, which is higher than that of an adult (60-100). A candidate component for the fetal heart signal, which shows peaks of increased frequency, is readily visible as the 4th in the Component Overview (Section 6.3.2). The rank histogram next to the cluster medoid shows that components in the cluster are very similar, which is confirmed by looking at them directly (Figure 4, B). We select a couple of results containing this component to compare their parameters. In the Parameter Comparison (Section 6.4.2) we see that the parameters vary wildly, and the fetal heart signal was found using long and short lags for either lag set with different weights. This, along with the absence of other candidate components, suggests that we found the correct signal. A medical doctor would be able to inspect the obtained fetal ECG wave patterns in detail and determine whether or not it is healthy. Looking at the values of the three parameter settings that did not produce results, we can also form an initial hypothesis about the useful parameter subspace (Figure 5). What they had in common was i) a weight b between 0.25 and 0.6 and ii) lags that were distributed over the whole range instead of sticking to either the short or long end. Thus, when trying to find new parameters for this dataset, we would steer clear of those properties.

Evaluation
To assess the usefulness of our visualization design, we conducted two interviews with five TBSS experts external to the project. Our research questions were:
RQ1 What are the advantages and disadvantages of TBSSvis in comparison to their current tools?
RQ2 Does TBSSvis in fact support the analysis tasks?
RQ3 What are possible improvements to TBSSvis?
We opted for an Expert Review [78] using interviews, as no comparable tool exists for a quantitative evaluation, and qualitative data allows much deeper insights. We conducted two interview cycles: the first to gather initial external feedback and supporting evidence for our task abstraction, and the second to verify that this feedback was integrated accordingly. They lasted 2.5 hours and 1 hour, respectively.

Participants
Participants were the same for both interviews and were previous collaborators of our TBSS experts. One participant described their current practice: "(...) box plots. I tend to stick with these basic kinds of plots (...)".

Methodology
The interviews were conducted and recorded via Zoom with the explicit consent of participants.¹ Two researchers were involved in each interview, one tasked with moderation and one with taking notes. Participants used TBSSvis on their own machines and shared their screen during usage. We used Zoom annotations to point out relevant parts of TBSSvis when necessary.
Both sessions were structured identically. We compiled a text explanation with images of TBSSvis so that participants could familiarize themselves with it beforehand. This tutorial document was sent to participants together with the consent form ahead of the interview. The steps during the interview were as follows: 1. (Only in the first session.) We conducted a structured interview about their background and experiences with (T)BSS.

2. We gave participants a structured introduction to interactions and visualizations in TBSSvis. The dataset used was synthetic and unfamiliar to them. We asked participants to solve small tasks to practice what we explained. We skipped these tasks when we either saw that they understood it, or when we were short on time.
3. (Optional.) Participants were allowed to further use TBSSvis for some minutes on their own.

4. We asked participants to conduct an open analysis on the dataset used in Section 4.1, which most had worked on in the past, and to articulate their thoughts and plans ("think aloud"). We pointed out parts of TBSSvis they did not use or consider so far.
5. We discussed tasks, visualizations, interactions, and possible further improvements in an unstructured fashion. Before we finished the session, we encouraged participants to use TBSSvis more without our supervision.

¹ As of manuscript submission, TU Wien has a Pilot Research Ethics Committee. Approaching it for peer review of research with human participants is not required by TU Wien, and its response is non-binding. Therefore, we do not provide an official ethics approval. Nonetheless, we believe we conducted our research adhering to sufficient ethical standards.
To answer RQ2, we found it sufficient to check whether participants could interpret our visualizations, and whether the visualizations show the necessary data at the right moment to support their tasks. To do so, we analyzed the recorded video and notes after each session. We looked for articulated suggestions, discussions, and situations where users interacted with visualizations. These instances were transcribed and grouped by task (Section 5.2). Feedback and possible issues raised by participants were noted, deduplicated, and presented to our collaborators.
Subsequent discussions then informed changes to the first design, which we confirmed in the second interview.
The interview guide, tutorial documents, datasets, and our transcripts of the interviews are available as supplementary material.

Expert Feedback
We describe evidence for our research questions in this section.

RQ1: Advantages and Disadvantages
Our participants agreed that TBSSvis has clear advantages compared to currently used tools and greatly improves the analysis process. E5 even said that TBSSvis is "an absolute time saver" and "very useful for applied work". The majority of them mentioned that it is easier than in RStudio to compare components, matrices, and parameters. As for disadvantages, there is one very basic one: RStudio allows more flexible and specialized computations than TBSSvis. However, this was not explicitly mentioned by participants. Some said it took time to put everything together, but all our participants managed to do so quickly. A few plots were difficult to understand at first, but after explanations TBSSvis was relatively easy to use for all participants. In addition, we observed some participants having trouble with idioms that are common and popular in the visualization community, such as PCPs and multiple linked views, which could be overcome by visual literacy efforts.

RQ2: Supported Tasks
In this section, we discuss how TBSSvis supports analysis tasks (Section 5).
We provide quotes from participants to let them speak for themselves; their sentiment is shared by the majority and is not an isolated opinion.
Identify used parameters (I1): The tabular overview (Figure 5, A) was considered "really useful" (E2) and participants thought it "makes a lot of sense" (E4).
Identify cross-moment diagonality (I3): It is "something I don't usually have the time and energy to compute" (E5) and "very interesting" (E1), but also something they do not regularly use for their analysis today.
Identify components (I4): Our participants found the added interactivity compared to RStudio very useful.
Compare success (C1): They had no trouble with visual encodings, but participants sometimes forgot that failure is an option.
Compare parameters (C2): While the interwoven lag histograms were easy to interpret, it took some time for participants to realize that they form a regular histogram with hidden bins (Figure 6). Similarity projections of parameters (Figure 5, B) were rarely used by our participants. A possible explanation is that the histograms show more data and that, with only 5-7 parametrizations, participants could rely on their working memory. We believe the projections' benefits would have become apparent with more parametrizations.
Compare unmixing matrices (C3): Some (E3-E5) mentioned that interpret-
Compare possible parameters (C5): After we introduced participants to the individual views and interactions, they learned quickly how to use them and found them useful and convenient. They understood how and why to filter visualized lags, but were not sure about the data-driven calendar-based approach, presumably because they currently analyze data detached from any calendar. Participants appreciated the PCP with its dimensions, even though they sometimes did not know right away how to interpret all of them: for example, E2 asked what the eigenvalue metric means, what the optimal choice is, and whether lower or higher is better. Participants were also sometimes irritated by the number of dimensions, as they depend on the outcome of the refined run.

RQ3: Possible Improvements
When asked about improvements to TBSSvis, we received responses mainly pertaining to the parameter selection. E4 would prefer if the syntax to directly select lags matched the commands available in R. E2-E4 often ended up with an empty selection in the PCP because they expected brushes to be combined with union instead of intersection. They also want to select all filtered lags and remove all selected lags at once. Aside from the lag selection improvements, more DOI functions would be appreciated. We added one measure for periodicity [64] following one participant's suggestion. E5 suggested supporting the loading of precomputed results, possibly from other TBSS methods. E2 asked for more legends, explanations, and a stronger guidance degree. E1 suggested the ability to freely reorder components everywhere, and providing alternative color palettes. With E1 we also discussed the option of showing correlations between input data in the Input Visualization screen as another sanity check.

Reflection and Discussion
Reflecting on our findings and lessons learned during our design study with experts in BSS, we claim that TBSSvis supports tasks involved with TBSS analysis (Section 5) and encourages usage of TBSS in various application domains.
Despite differences in what an application domain considers interesting in latent dimensions (e.g., doctors might search for specific wave patterns, while investors look for sudden and extreme changes), many tasks are the same. We showed this transferability to financial and medical datasets in Section 7. We developed and evaluated TBSSvis with TBSS experts, who are our primary intended users. They worked with many domain experts in the past to apply TBSS in their respective fields. Their practical experience with different use cases for TBSS informed our visualization design (Section 6). Therefore, based on the mostly positive feedback by our interview participants, we expect that TBSSvis can be useful in many application domains.
In line with the design study methodology [79], we used well-known visualization idioms and data mining algorithms, applied them in a new context, and extended them as necessary. As a consequence, individual parts of TBSSvis will be useful to other visualization researchers and designers. For instance, a slope graph usually shows categorical data cases and their change of rank by line slope. We adapted it to time series by encoding similarity in line thickness. In our user studies, it was considered an easy-to-understand visualization for visually comparing sets of time series. The clustering scheme (Section 6.3) is useful whenever members of sets should be clustered and set membership must be taken into account. It works with any dissimilarity measure because it is based on k-medoids. Set-typed data is prevalent [80], so we expect this to be useful to others.
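A bare-bones k-medoids over a precomputed dissimilarity matrix illustrates why the scheme is measure-agnostic; note that the set-membership handling described above is not reproduced in this sketch, which is a generic PAM-style loop:

```python
import numpy as np

def k_medoids(D, k, iters=100, seed=0):
    """Plain k-medoids on a precomputed dissimilarity matrix D (n x n).

    Only pairwise dissimilarities are needed, so any measure works;
    no vector-space assumptions (unlike k-means).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        # assign each element to its nearest medoid
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # new medoid: the member minimising total dissimilarity
            new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)
```

Swapping in, say, a time-series-specific dissimilarity requires only changing how D is computed, which is the property the clustering scheme relies on.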

Design Process
Following the recommendations of the data-users-tasks design triangle [58], our proposed visualizations are close to what TBSS experts are used to and therefore quite simple. We also did not include interactions more advanced than highlighting, filtering, hovering, or brushing, because TBSS experts come from text-based software where even these do not exist. Looking back, we think this was a good decision, as some participants in our interviews initially had trouble using, e.g., the PCP.
What was difficult for us as visualization researchers during the design was the domain-independence of TBSS. Our goal, therefore, was to make TBSSvis applicable in a wide range of domain-specific contexts, e.g., in medicine or finance.
But both size and complexity of the data vary considerably among the domains, as do the definitions of "interesting" features and the location and role of TBSS in the data processing pipeline [81]. Therefore, we opted in the end for simple interactions and generic/extendable approaches, such as the use of DOI functions, to avoid a "lock-in" to any specific application domain.

Limitations and Future Work
Our work has some limitations. Most study participants used the financial dataset (Section 4.1) at some point in the past to test varying TBSS methods. Although the participants fit our user description well (Section 5.1), they were not as intimately familiar with the dataset as is often the case in visualization-related evaluations. Had this been the case, we might have found additional analysis goals and insights. Nevertheless, we maintain that our study methodology and participant selection were sufficient and appropriate to investigate how TBSSvis impacts the involved tasks (Section 5.2). Participants used TBSSvis for around 45 minutes in total on their own terms. More time using it may have surfaced additional analysis tasks or improvement suggestions.
As part of our future work, we would like to integrate the improvements suggested by our experts, support larger datasets, and allow the provision of custom DOI functions.

Summary and Conclusion
We presented TBSSvis, a VA solution for TBSS. TBSS is similar to PCA in that it can be used to analyze suitable datasets from any application domain, such as biomedical analysis, finance, or civil engineering. Unlike PCA, TBSS properly accounts for temporal correlation, but requires complex tuning parameters. Because of these parameter settings, TBSS analysis is inherently open-ended and exploratory, as there are no known insights to confirm. TBSSvis is based on a task abstraction and visualization design that we developed in a user-centered design process together with TBSS experts. We evaluated the final interactive prototype in two interviews with five other TBSS experts who did not participate in the design process. Feedback from these interviews shows that TBSSvis supports the actual workflow and provides a combination of interactive visualizations that facilitate the tasks involved in analyzing TBSS results; this process was previously a laborious back-and-forth for which analysts had to manually program static visualizations and data mining algorithms. TBSSvis also provides guidance that facilitates the analysis of the data at hand and informed parameter selection, which was previously mostly a guessing game.