Workflow

In this section you will learn how to determine the most sufficient state configuration from histograms, to obtain the relative state populations and to estimate the associated cross-sample variability.

Initial data are the trajectories from Trace processing’s output.

The procedure includes five steps:

  1. Import histogram
  2. Build histogram
  3. Determine the most sufficient state configuration
  4. Estimate state populations and associated cross-sample variability
  5. Export data

Import histogram

Initial histograms come from Trace processing’s output for simulation-, video-, and trajectory-based projects, or from an external ASCII file.

In the first three cases, you can skip this step and go directly to the next one.

In the last case, a new histogram-based project must be created. This involves importing the histogram file and defining how data are structured in the file. Once the project is created, it is recommended to save it to a .mash file and to overwrite this file regularly in order to keep traceability and access to the results.

To create a new histogram-based project:

  1. Open the experiment settings window by pressing New project in the project management area and selecting import histogram.

  2. Import the histogram file and define your experiment setup by configuring tabs:

    Import
    Divers

    If necessary, modify settings in Divers any time after project creation.

  3. Define how data are structured in the file by configuring tab File structure.

  4. Finalize the creation of your project by pressing Save; the experiment settings window now closes and the interface switches to module Histogram analysis.

  5. Save modifications to a .mash file by pressing Save project in the project management area.

Notes: Only one histogram can be imported in a histogram-based project. Besides, histogram-based projects only have access to the module Histogram analysis.


Build histogram

To build a histogram, data are limited to specific boundaries and sorted into bins of specific size. Ideally, each state population appears as a Gaussian-shaped peak in the histogram.

The bin size has a substantial influence on the histogram shape: large bins increase the overlap between neighbouring peaks, up to the extreme case where all peaks are merged into one, whereas small bins flatten the peaks, up to the extreme case where no peak is distinguishable.

Histogram boundaries are important as they define the range of data considered for analysis. Ranges that are too wide can include outliers that bias the state analysis, whereas ranges that are too narrow can exclude data that contribute to the state populations.

The histogram limits and bin size have to be carefully chosen in order to enhance the natural shape of data distribution without altering it.

Figure: Effect of histogram bin size
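The influence of boundaries and bin size can be illustrated with a minimal sketch. The function name `build_histogram`, its arguments and the sample values are hypothetical and are not part of the software; the sketch assumes that values outside the boundaries are simply excluded, which is only one possible way to treat them.

```python
def build_histogram(values, lower, upper, bin_width):
    """Sort values into equal-width bins between lower and upper.

    Returns (edges, counts, n_excluded).
    Assumption: (upper - lower) is a multiple of bin_width, and values
    outside [lower, upper] are excluded from the histogram.
    """
    n_bins = max(1, round((upper - lower) / bin_width))
    edges = [lower + i * bin_width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    n_excluded = 0
    for v in values:
        if v < lower or v > upper:
            n_excluded += 1
            continue
        # A value equal to the upper limit goes into the last bin.
        index = min(int((v - lower) / bin_width), n_bins - 1)
        counts[index] += 1
    return edges, counts, n_excluded


# Made-up values, for illustration only.
values = [-0.5, 0.05, 0.12, 0.18, 0.95, 1.7]
_, fine_counts, n_out = build_histogram(values, 0.0, 1.0, 0.1)
_, coarse_counts, _ = build_histogram(values, 0.0, 1.0, 0.5)
print(fine_counts, coarse_counts, n_out)
```

With the larger bin width, the three low values fall into a single bin, which is the merging effect described above at a very small scale.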

To build the histogram:

  1. Select the data and molecule subgroup to analyze in the Data list and Molecule subgroup list, respectively

  2. Set parameters:

    Histogram binning
    Overflow bins

    The histogram is instantly built with the new parameters and displayed in the Visualization area


Determine the most sufficient state configuration

In histogram analysis, states are identified as histogram peaks that are ideally modelled by a Gaussian distribution. Therefore, the overall histogram can be described by the sum of J Gaussian distributions, with J the number of states.

In the case of well-separated peaks, J is easily determined by eye. Most of the time, however, the histogram peaks overlap and cannot be accurately distinguished.

Figure: Histogram peak overlap

One way to objectively identify the number of overlapping peaks in a histogram is to first find, for each value of J, the Gaussian mixture that best describes the data, and then to compare these optimum models with each other. Because the goodness of fit, or model likelihood, increases with the number of components, the likelihood alone would always favour the most complex model. Inferred models are therefore compared via:

  • the improvement in goodness of fit
  • the Bayesian information criterion (BIC)

In the first case, adding a new component to the model is expected to improve the model likelihood by a minimum amount, e.g. an increase of 20%. Here, the most sufficient model is the least complex model for which adding a component does not fulfil this requirement.

In the second case, the BIC is used to rank models according to their sufficiency, with the most sufficient model having the lowest BIC.
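The BIC ranking can be sketched as follows, using the standard definition BIC = k ln(N) - 2 ln(L). The function names and the log-likelihood values are hypothetical. The sketch assumes that a mixture of J Gaussians has k = 3J - 1 free parameters and that N is the number of data points; the software may count parameters or data differently.

```python
import math


def gaussian_mixture_bic(log_likelihood, n_components, n_points):
    """BIC = k*ln(N) - 2*ln(L) for a 1D mixture of J Gaussians.

    Assumption: k = 3J - 1 free parameters (J means, J widths and
    J - 1 independent weights).
    """
    k = 3 * n_components - 1
    return k * math.log(n_points) - 2.0 * log_likelihood


def most_sufficient(log_likelihoods, n_points):
    """Return (best J, {J: BIC}) from a dict {J: ln(L) of the best fit}."""
    bics = {
        j: gaussian_mixture_bic(ll, j, n_points)
        for j, ll in log_likelihoods.items()
    }
    # The most sufficient model is the one with the lowest BIC.
    best_j = min(bics, key=bics.get)
    return best_j, bics


# Hypothetical log-likelihoods, not results from a real data set.
log_likelihoods = {1: -1500.0, 2: -1380.0, 3: -1375.0, 4: -1373.0}
best_j, bics = most_sufficient(log_likelihoods, n_points=1000)
print(best_j, bics)
```

In this made-up example the likelihood keeps increasing with J, but beyond J = 2 the gain is smaller than the penalty for the extra parameters, so the BIC selects two components.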

To determine the most sufficient state configuration:

  1. If not already done, select the data and molecule subgroup to analyze in the Data list and Molecule subgroup list, respectively

  2. Set parameters:

    Maximum number of Gaussians
    Model penalty

  3. Start inference of state configurations by pressing Start analysis; after completion, the display is instantly updated with the most sufficient Gaussian mixture


Estimate state populations and associated cross-sample variability

The relative population Xj of a state j corresponds to the probability of finding a molecule of the sample in state j at any given time.

It can be estimated from the histogram by integrating each peak to a value Sj and calculating the ratio:

X_{j} = \frac{S_{j}}{\sum_{j'=1}^{J}( S_{j'})}

For well-separated histogram peaks, Sj values are calculated using thresholds between peaks. For overlapping peaks, a mixture of J Gaussians is fitted to the histogram and Gaussian integrals are used as Sj values.
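Both ways of obtaining the Sj values can be sketched as below. The function names and numbers are hypothetical. The sketch assumes that thresholds are sorted in ascending order and applied to bin centres, and that each fitted Gaussian is parametrized by an amplitude and a standard deviation, so that its integral is amplitude × σ × √(2π).

```python
import math
from bisect import bisect_right


def populations_from_thresholds(centers, counts, thresholds):
    """X_j from histogram counts summed between consecutive thresholds."""
    sums = [0.0] * (len(thresholds) + 1)
    for center, count in zip(centers, counts):
        # bisect_right gives the index of the region containing the bin centre.
        sums[bisect_right(thresholds, center)] += count
    total = sum(sums)
    return [s / total for s in sums]


def populations_from_gaussians(amplitudes, sigmas):
    """X_j from the integrals of the fitted Gaussians."""
    areas = [a * s * math.sqrt(2.0 * math.pi) for a, s in zip(amplitudes, sigmas)]
    total = sum(areas)
    return [area / total for area in areas]


# Made-up histogram and fit parameters, for illustration only.
centers = [0.1, 0.3, 0.5, 0.7, 0.9]
counts = [10, 30, 5, 40, 15]
x_thresh = populations_from_thresholds(centers, counts, thresholds=[0.6])
x_gauss = populations_from_gaussians(amplitudes=[2.0, 1.0], sigmas=[0.05, 0.1])
print(x_thresh, x_gauss)
```

In the Gaussian case, a narrow peak that is twice as high as a peak twice as wide has the same integral, so both states get the same relative population.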

Figure: Estimation of relative state populations

The outcome of such an analysis is a single estimate of each relative state population, which carries no information about the variability across the molecule sample.

One way to evaluate the cross-sample variability of relative state populations is to use the bootstrap-based analysis called BOBA-FRET. BOBA-FRET applies to both threshold and Gaussian-fitting methods, and infers the bootstrap mean μX,j and bootstrap standard deviation σX,j of state populations for the given sample.

Figure: Estimation of cross-sample variability with BOBA-FRET
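The general idea of bootstrapping over molecules can be sketched as follows. This is a simplified illustration under stated assumptions, not the BOBA-FRET implementation: molecules are resampled with replacement with equal probability, populations are obtained by thresholding, and all names and data are hypothetical.

```python
import random
import statistics
from bisect import bisect_right


def populations(values, thresholds):
    """Fraction of values in each region delimited by sorted thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[bisect_right(thresholds, v)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bootstrap_populations(molecules, thresholds, n_replicates=200, seed=0):
    """Bootstrap mean and standard deviation of the state populations.

    molecules: list of per-molecule lists of data points.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw as many molecules as in the sample, with replacement.
        sample = rng.choices(molecules, k=len(molecules))
        pooled = [v for molecule in sample for v in molecule]
        replicates.append(populations(pooled, thresholds))
    per_state = list(zip(*replicates))
    means = [statistics.mean(x) for x in per_state]
    stds = [statistics.stdev(x) for x in per_state]
    return means, stds


# Made-up sample of four molecules, for illustration only.
molecules = [[0.2] * 10, [0.8] * 10, [0.2] * 5 + [0.8] * 5, [0.8] * 10]
means, stds = bootstrap_populations(molecules, thresholds=[0.5])
print(means, stds)
```

Resampling whole molecules, rather than individual data points, is what makes the resulting standard deviation reflect the variability between molecules of the sample.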

To calculate relative state populations:

  1. If not already done, select the data and molecule subgroup to analyze in the Data list and Molecule subgroup list, respectively

  2. Set parameters in Method settings

  3. Import parameters of one of the inferred state configurations by selecting the configuration in Inferred models and pressing >>

  4. Adjust parameters in Thresholding or Gaussian fitting and press Start or Fit, respectively, to calculate relative state populations


Export data

Histograms, analysis results and analysis parameters can be exported to ASCII files and PDF figures; see Remarks for more information.

To export data to files:

  1. If not already done, select the data and molecule subgroup to export in the Data list and Molecule subgroup list, respectively

  2. Press EXPORT... and select the desired destination to start writing files.


Remarks

The type of exported files depends on which analysis was carried out; see Export analysis results for more information.