Analysis of high-dimensional flow-cytometry using UMAP and Phenograph-clustering

Before starting:

1. Acquire data on Flow cytometer of interest

2. Export to flowjo/Fix compensation

3. Gate on live events – To increase granularity of analyses you can specifically gate on a sub-population (i.e. Tregs, CD4s etc.)

4. Export gated events/sample as individual FCS files: In the export window choose format as: FCS

Installation

First run these steps if the pipeline is not installed on your computer (expand code by clicking the Code button in the lower right)

REQUIREMENTS

if(!require(devtools)){
  install.packages("devtools") 
}
library(devtools)
install_github("sekalylab/unsupflowhelper")

Load libraries

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(unsupflowhelper))
suppressPackageStartupMessages(library(flowCore))
suppressPackageStartupMessages(library(patchwork)) ## to combine multiple plots in a single plot
knitr::opts_chunk$set(echo = TRUE,time_it = TRUE)


set.seed(1)

library(future)
plan("multicore", workers = 8)
options(future.globals.maxSize = 8000 * 1024^2)

Flow Object

The flow object is a new data structure that will contain:

  1. A flowSet
  2. Additional parameters (i.e. excluded markers, transformation cofactors, statistical test results, etc)

It is the basis of how our pipeline functions

The CreateFlowObject function has multiple optional arguments that can be supplied. By default, it has preset values for subsampling, minimum number of events, equalization across samples.

You can either supply a flowSet from flowCore (generated from FCS files), or a list of filenames (CSV or FCS) as input to create a Flow Object.

If you have csv files exported from an older analysis, you can use them as well. Files are assumed to be from the same experiment, and markers in the same order and spelled the same way. WARNING: this is not the main intended way to run the analysis, FCS files are preferable going forward.

The FlowSet

FlowSets (and the individual flowframes within for each sample) can interact with multiple R packages and are thus one of the best ways to managing flow data in R. This will allow to:

  • Adjust compensations
  • Plot histograms/scatter plots
  • Batch correct
  • etc.

R can directly read FCS files with the flowCore package. All you need to do is supply the path of a list of fcs files. and run read.FlowSet(). If you wish to skip directly to the Flow Object, you can simply supply file paths.

Option 1: Interactive Mode

## Default call (BASIC)
flow_object <- CreateFlowObject(name = "test_stain", interactive =T) ## identifier for the dataset
Interactive mode GUI
Interactive mode GUI

Option2: Provide files

#1. List files 

fileLS <- list.files("path/to/your/inputfiles", 
                     pattern = "*.fcs$",  #or '*.csv$` if your data is in csv. 
                     full.names = T)

flow_object <- CreateFlowObject(files = fileLS, 
                            name = "test_stain") ## identifier for the dataset

Test dataset

The pipeline includes a test dataset to get familiar with the functions.

data("cd4_flow")

HOW-TO : Interact with a flow object

See embedded markers

Get_MarkerNames(cd4_flow, select = "all") 
##      FJComp-APC-A  FJComp-APC-Cy7-A FJComp-APC-R700-A    FJComp-BB515-A 
##             "ID2"            "CD39"            "CD28"            "CCR5" 
##    FJComp-BB630-A    FJComp-BB660-A    FJComp-BB700-A    FJComp-BB750-A 
##          "SLAMF6"           "CD101"            "IFNg"           "NR4A1" 
##   FJComp-BUV395-A   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV615-A 
##            "BCL2"             "CD3"             "CD4"          "CD45RA" 
##   FJComp-BUV661-A   FJComp-BUV737-A   FJComp-BUV805-A    FJComp-BV421-A 
##            "CD27"             "CD8"          "CD45RO"            "TCF7" 
##    FJComp-BV480-A    FJComp-BV570-A    FJComp-BV605-A    FJComp-BV650-A 
## "DUMP(CD14_CD19)"            "Ki67"            "IRF4"            "CD69" 
##    FJComp-BV711-A    FJComp-BV750-A    FJComp-BV786-A     FJComp-DAPI-A 
##           "EOMES"            "CCR7"            "TBET"              "LD" 
##       FJComp-PE-A FJComp-PE-CF594-A   FJComp-PE-Cy5-A   FJComp-PE-Cy7-A 
##             "TOX"             "TNF"            "CD95"             "PD1"
# You can also show only the 'excluded' or 'included' markers. see `?Get_MarkerNames()`

Get sample metadata

Get_MetaData(cd4_flow)
filename name discrete_group discrete_group2 continuous_var
Sample_1 export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs Sample_1 Control Time1 8
Sample_2 export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs Sample_2 Test Time1 10
Sample_3 export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs Sample_3 Control Time2 3
Sample_4 export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs Sample_4 Test Time2 15
Sample_5 export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs Sample_5 Control Time1 7
Sample_6 export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs Sample_6 Test Time1 18
Sample_7 export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs Sample_7 Control Time2 8

Exclude additional parameters (OPTIONAL)

By default, the FSC/SSC/Time parameters are excluded.

However, if you pregated using additional markers, such as CD3, or Live/dead stains, it’s highly recommended that you exclude them from the unsupervised analysis.

If you wish to exclude additional parameters, you can use the exclude_parameters function.

If you need a reminder of the available marker names in your flow object, you can still look with Get_MarkerNames(cd4_flow)

## View excluded markers
Get_MarkerNames(cd4_flow, select = "excluded", show_channel_name = T)
##   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV737-A    FJComp-BV480-A 
##             "CD3"             "CD4"             "CD8" "DUMP(CD14_CD19)" 
##     FJComp-DAPI-A 
##              "LD"
# Exclude additional markers
cd4_flow <- exclude_parameters(flow_object = cd4_flow,
                              exclude_markers = c("CD3", "LD", "DUMP(CD14_CD19)", "CD4", "CD8") )

## View excluded markers
Get_MarkerNames(cd4_flow, select = "excluded", show_channel_name = T)
##   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV737-A    FJComp-BV480-A 
##             "CD3"             "CD4"             "CD8" "DUMP(CD14_CD19)" 
##     FJComp-DAPI-A 
##              "LD"

Add metadata

If you have sample metadata, such as group, timepoints, treatments, or otherwise, this is where you add it to your flow object. It can be added at any time, or not at all, but some options for plotting and statistics will require it.

You need to have a dataframe, usually imported from a csv or spreadsheet, OR you can use interactive mode to load a file.

  • If you have continuous (numerical) variables, make sure R read them as numeric (class(metadata_df[["continuous_variable"]]) should be numeric, not character).
  • If you have discrete groups, you can factor them if you wish them ordered in a certain way, otherwise R will treat them in alphabetical order. e.g. metadata_df[["discrete_group"]] <- factor(metadata_df[["discrete_group"]], levels = c("Control", "Acute", "Chronic"))
  • The dataframe REQUIRES a column that matches the samplenames in the flow object, to be specificed with the ‘key_column’ parameter.

Option 1: Interactive mode

cd4_flow <- add_sample_metadata(flow_object = cd4_flow, interactive = T)

test <- CreateFlowObject(name = "test", interactive = T)
Add Metadata Interactive Mode
Add Metadata Interactive Mode

Option 2: Provide data.frame

#read_metadata
metadata_df <- read.csv("input/metadata_flow_vignette_demo.csv")

## Add  metadta to flow_object
cd4_flow <- add_sample_metadata(flow_object = cd4_flow, metadata = metadata_df, key_column = "filename", flowobj_column = "filename")

#View phenoData
Get_MetaData(cd4_flow)
filename name discrete_group discrete_group2 continuous_var
Sample_1 export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs Sample_1 Control Time1 1
Sample_2 export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs Sample_2 Test Time2 10
Sample_3 export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs Sample_3 Test Time1 14
Sample_4 export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs Sample_4 Test Time1 20
Sample_5 export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs Sample_5 Control Time1 4
Sample_6 export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs Sample_6 Control Time2 9
Sample_7 export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs Sample_7 Control Time2 2

Find cofactors (OPTIONAL)

For the unsupervised analysis, we can use an inverse hyperbolic sine transformation of the data. This requires division by a cofactor, but different channels require different cofactors depending on how strongly they are expressed.

This method attempts at finding an optimal cofactor per channel, by incrementally increasing the cofactor value until only a small portion (0.1%) of events fall under zero when transformed.

## Find cofactors function
cd4_flow <- find_cofactors(flow_object = cd4_flow)

#View cofactors
cd4_flow$parameters$cofactors
##    ID2   CD39   CD28   CCR5 SLAMF6  CD101   IFNg  NR4A1   BCL2 CD45RA   CD27 
##    735    885    720    295    110    830   1595   2215     50    570    840 
## CD45RO   TCF7   Ki67   IRF4   CD69  EOMES   CCR7   TBET    TOX    TNF   CD95 
##   2895    345   1220    140   1475   2495   2685   2640   2765   3320    810 
##    PD1 
##    910

Add reference controls (OPTIONAL)

While staining controls will not get used for clustering and dimension reduction, they will be useful when comparing cluster expression to the baseline, and interpreting not only marker relative expression, but whether it is positive or negative.

Controls are not necessary for EVERY marker: a FMO for a strong marker such as CD4 on T cells is rarely warranted. So, in the absence of a given control for its corresponding marker, the Unstained control will be used. The Unstained control is therefore necessary.

## [1] "Sample A8 (export_Unstained_A8_A08_T Lymphocytes.fcs) only contains 691 events. Using all 691 events"
# List control files
## Names must fit with markernames in the flowSet's `markernames()`

## Option 1
#You can run the function interactively: it will open up a browser window to select the 
#files on your computer for each desired marker, and then adds them to your flow_object. 
#You can also add controls for excluded markers for some plotting functions.
cd4_flow <- add_staining_controls(cd4_flow)


## Option 2
# Add staining controls manually from a list 
control_LS <- list("Unstained" = "input/controls/export_Unstained_A8_A08_T Lymphocytes.fcs",
                   "EOMES" = "path_to_your_EOMES_file.fcs",
                   "T-BET" = "path_to_your_T-BET_file.fcs")

cd4_flow <- add_staining_controls(cd4_flow, control_list = control_LS, interactive = F)

Run UMAP and clustering

# run UMAP
cd4_flow <- UMAP_flow(cd4_flow,transformation = "asinh" )
## [1] "Performing dimension reduction using UMAP..."
## Alternative: use another transformation on the data. 
## Available options are `log`, `cytofAsinh`, or `none`. 
## The latter uses the biexponential data as is. 
cd4_flow_biexp <- UMAP_flow(flow_object = cd4_flow,
                     transformation = "none")
## [1] "Performing dimension reduction using UMAP..."
# Run Phenograph
cd4_flow <- Cluster_flow(cd4_flow, k = 60)
## [1] "Performing clustering using RPhenograph..."
##   Finding nearest neighbors...DONE ~ 53.449 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.263 s
##   Build undirected graph from the weighted links...DONE ~ 3.835 s
##   Run louvain clustering on the graph ...DONE ~ 15.798 s
##   Return a community class
##   -Modularity value: 0.6643943 
##   -Number of clusters: 12
cd4_flow_biexp <- Cluster_flow(cd4_flow_biexp, k = 60)
## [1] "Performing clustering using RPhenograph..."
##   Finding nearest neighbors...DONE ~ 47.58 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.265 s
##   Build undirected graph from the weighted links...DONE ~ 3.783 s
##   Run louvain clustering on the graph ...DONE ~ 12.127 s
##   Return a community class
##   -Modularity value: 0.6806686 
##   -Number of clusters: 14

Cluster annotation (OPTIONAL)

Having louvain labels is a good basis for unbiased analysis. But perhaps, after looking at the data, you are able to discern the biological identity of those louvain clusters and wish to have your plots reflect that instead of just numbers.

You can provide a data.frame, where one column (called louvain) contains ALL the louvain clusters from your analysis. All additional columns (that you name however you see fit) contain your labels of choice.

The function provided for this is called annotate_clusters and can be used multiple times

cd4_flow <- annotate_clusters(cd4_flow, interactive = T)
#View cluster annotations
Get_ClusterAnnotations(cd4_flow)
louvain Cell Type Granular population
8 1 CD4 Memory CD4 Tcm
1 2 CD4 Effector CD4 Tte
9 3 CD4 Memory CD4 Tem
11 4 CD4 Naive CD4 Naive
12 5 CD4 Memory CD4 Tem
5 6 CD4 Memory CD4 Tem
3 7 CD4 Naive CD4 Naive
2 8 CD4 Memory CD4 Tem
6 9 CD4 Naive CD4 Naive
10 10 CD4 Naive CD4 Naive
4 11 CD4 Memory CD4 RARO Tscm
NA NA NA NA

Plotting

Many visualizations are available within the pipeline, without needing to resort to FlowJo. That being said, if that’s your preferred way of doing things, look for how to export the flow object to FCS at the end.

HOW-TO: Save a plot

Before exploring different visual options, you need to be familiar with how to save your plots to file. All plotting options in the pipeline use ggplot2 and thus can use ggsave for exporting visuals to file.

Alternatively, you can use R’s default image engines: pdf(), svg(), tiff(), jpeg(), png(), then the plot, followed by dev.off()

When using a plotting function, you can either save it to a variable (p <- plotting_function(...)), or use it directly within ggsave (ggsave(plotting_function(...), device = "pdf")).

For more details on ggsave, look in the documentation using ?ggsave

RECOMMENDED: when exporting UMAP plots, you should export to a jpeg format, to avoid very large files.

#1. Option 1
p <- plotting_function(...)

ggsave(p, filename = "output/UMAP_plot.tiff", device = "tiff", width = 7, height = 7, units = "in", dpi = 300))


#. Option 2
tiff("output/UMAP_plot.tiff", width = 7, height = 7, units = "in", dpi = 300)
print(p)
dev.off()

UMAP plot

The raw visualization of UMAP dimension reduction. Often known as DimPlot for Seurat aficionados.

Most of the options you find here will also be found

  • You can split.by metadata discrete columns (up to 3 at a time)
  • You can group.by a metadata discrete column OR louvain clusters to color cells.
  • You can choose to display only a few selected cluster labels when group.by is set to "louvain" by passing a vector of clusters, e.g. c(1,4,6)
## BASIC UMAP plot: simple call
# Example 1: Raw plot
plot_umap(flow_object = cd4_flow) 


# Example 2a: Louvain clusters
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain")

# Example 2b: Louvain clusters
plot_umap(flow_object = cd4_flow_biexp, 
          group.by = "louvain")

# Example 3: Louvain clusters separated by a metadata variable
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          split.by = "discrete_group")

# Example 4: Select only a subset of cluster labels to show
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          label_filter = c(1,4,5))

# Example 5: Color by cluster annotations OR experimental groups
plot_umap(flow_object = cd4_flow, 
          group.by = "Granular population")

# Example 6: Remove labels, and replace with legend
plot_umap(flow_object = cd4_flow, 
          group.by = "Granular population",
          label = F)

# Example 7: Louvain clusters separated by multiple metadata variables
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          split.by = c("discrete_group","discrete_group2"))

 # Example 8: Color by one metadata variable and split by another. 
plot_umap(flow_object = cd4_flow,
          group.by = "discrete_group",
          split.by = "discrete_group2",
          dot_size = 1,
          ncols = 1)

Examples

1: Default

2a: Louvain clusters (asinh transform)

2b: Louvain clusters (biexponential transform)

3: Split by metadata variable

4: Filter cluster labels

5: Color by cluster annotation

6: Color by experimental groups; no label

7: Split by multiple metadata variables

8: Combine split.by and group.by

Density Plot

Instead of displaying clusters or metadata groups, these plots are meant to display point density, in a pseudocolour distribution similar to FlowJo.

Similar to UMAP_plot, you can split.by up to 3 variables.

#1
plot_density_umap(cd4_flow) 

#2

plot_density_umap(cd4_flow,  
                  split.by = "discrete_group",
                  dot_size = 0.5)

Examples

1: Default

2: Split

UMAP heat plot

Visualization of the expression of individual markers on the UMAP plot

You can also visualize fluorescent markers that were excluded.

#1
plot_umap_heat(cd4_flow, markers = "CD45RA" ) 

#2
plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 100) + 
  plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 10000)


#3
plot_umap_heat(cd4_flow, markers = c("CD45RA", "CD45RO", "CCR7", "EOMES", "CD27"), legend_height = 8) 




#3 Combine heatmaps into a complex arrangement

Examples

1: Default

2: Quantile Saturation

3: Complex combination

Cluster visuals

Cluster heatmap

# 1 Basic relative expression heatmap

plot_cluster_heatmap(cd4_flow)


#2 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T)

#3 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T, annotation = "Granular population")

#4 Subset louvain clusters
plot_cluster_heatmap(cd4_flow, show_control = T, louvain_select = c(1,5,7,12,9))

#5 Include Absolute expression annotation
plot_cluster_heatmap(cd4_flow, show_control = T, add_mfi = T)

Examples

1: Default

2: With Controls

3: With cluster annotations

4: Louvain cluster subset

5: + MFI annotation

Cluster histogram

#1
plot_cluster_histograms(cd4_flow, marker = "CD45RA") #1

#2

plot_cluster_histograms(cd4_flow, marker = "EOMES", show_control = T) #2

#3
plot_cluster_histograms(cd4_flow, marker = "CD45RA", clusters = c(1,2,5)) #3

Examples

1: Default

2: Include control

3: Cluster Filter

Cluster jitterplot

#1
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain")

#2
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain", 
                   louvain_select = c(1,5,7), 
                   value = "absolute")

#3

cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain",
                   group = "discrete_group")


#4
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "Granular population", 
                   group = "discrete_group")

Examples

1: Default

2: Absolute counts + cluster filter

3: Compare metadata groups

4: Using cluster annotations

Cluster stacked barplot

#1


cluster_stacked_barplot(cd4_flow) 

#2
cluster_stacked_barplot(cd4_flow, group = "discrete_group", annotation = "Granular population") 

Examples

1: Default

2: Group.by and split.by

Extract data

Retrieve expression data

df <- get_flowSet_matrix(cd4_flow,add_sample_id = T, annotations = T )

freq_df <- get_cluster_frequencies(cd4_flow)

Statistical comparisons

While an exhaustive statistical support for all possible experimental designs is beyond the scope of this pipeline, we can provide simple support for simpler designs, such as correlation or discrete statistics with metadata outcomes.

These tests are meant for initial quick exploration of your data: they are unlikely to be the perfect test for your specific hypothesis. For instance, in outcomes with more than 2 levels, consider one-way ANOVA with a post-hoc TukeyHSD test if doing a parametric analysis. Or in a paired design, adapt the test accordingly.

These are not testing for statistical assumptions either: normality, equality of variances, etc.

R package rstatix is very helpful in quickly doing those additional tests.

Regression

#1: Correlate louvain cluster relative_frequencies with a given continuous outcome
correlate_clusters(cd4_flow, group.by = "louvain", outcome = "continuous_var", method = "spearman")

#2 Correlate cluster annotations instead of louvain
correlate_clusters(cd4_flow, group.by = "Granular population", outcome = "continuous_var", method = "pearson")

Examples

1: Louvain vs outcome
louvain outcome rho statistic p.value method padj
1 continuous_var 0.220 43.89140 0.641 Spearman 0.9328
10 continuous_var -0.036 58.01810 0.939 Spearman 0.9390
11 continuous_var 0.160 46.91855 0.728 Spearman 0.9328
2 continuous_var -0.090 61.04525 0.848 Spearman 0.9328
3 continuous_var -0.180 66.09050 0.699 Spearman 0.9328
4 continuous_var -0.230 69.11765 0.613 Spearman 0.9328
5 continuous_var 0.130 48.93665 0.788 Spearman 0.9328
6 continuous_var -0.180 66.09050 0.699 Spearman 0.9328
7 continuous_var 0.250 41.87330 0.585 Spearman 0.9328
8 continuous_var -0.490 83.24435 0.268 Spearman 0.9328
9 continuous_var -0.110 62.16540 0.814 Spearman 0.9328
2: Cluster annotation vs outcome
Granular population outcome p statistic p.value conf.low conf.high method padj
CD4 Naive continuous_var -0.051 -0.1139557 0.914 -0.7742781 0.7301467 Pearson 0.95
CD4 RARO Tscm continuous_var 0.150 0.3388990 0.748 -0.6799366 0.8113503 Pearson 0.95
CD4 Tcm continuous_var 0.031 0.0686768 0.948 -0.7394540 0.7660474 Pearson 0.95
CD4 Tem continuous_var 0.029 0.0659301 0.950 -0.7400100 0.7655396 Pearson 0.95
CD4 Tte continuous_var 0.057 0.1268631 0.904 -0.7274442 0.7765762 Pearson 0.95

Discrete statistics

Akin to correlation against a continuous variable, you can also provide a discrete metadata variable for contrast statistics (T-tests, Wilcoxon tests, etc).

#1: Louvain cluster relative_frequencies non-parametric with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "non-parametric")

#2: Louvain cluster relative_frequencies T-test with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "parametric")

Examples

1: Non-Parametric
louvain group1 group2 n1 n2 statistic p.value Test outcome
1 Control Test 4 3 7.0 0.857 Non-parametric Wilcoxon Test discrete_group
10 Control Test 4 3 4.0 0.629 Non-parametric Wilcoxon Test discrete_group
11 Control Test 4 3 4.0 0.629 Non-parametric Wilcoxon Test discrete_group
2 Control Test 4 3 7.0 0.857 Non-parametric Wilcoxon Test discrete_group
3 Control Test 4 3 6.0 1.000 Non-parametric Wilcoxon Test discrete_group
4 Control Test 4 3 10.0 0.229 Non-parametric Wilcoxon Test discrete_group
5 Control Test 4 3 8.0 0.629 Non-parametric Wilcoxon Test discrete_group
6 Control Test 4 3 10.0 0.229 Non-parametric Wilcoxon Test discrete_group
7 Control Test 4 3 2.0 0.229 Non-parametric Wilcoxon Test discrete_group
8 Control Test 4 3 10.0 0.229 Non-parametric Wilcoxon Test discrete_group
9 Control Test 4 3 4.5 0.719 Non-parametric Wilcoxon Test discrete_group
2: Parametric
louvain group1 group2 n1 n2 statistic df p.value Test outcome
1 Control Test 4 3 0.6475306 4.995724 0.5460 Parametric 2 tailed T-test discrete_group
10 Control Test 4 3 -0.8266370 2.711678 0.4750 Parametric 2 tailed T-test discrete_group
11 Control Test 4 3 -1.0853855 3.962754 0.3390 Parametric 2 tailed T-test discrete_group
2 Control Test 4 3 0.4935197 4.510478 0.6450 Parametric 2 tailed T-test discrete_group
3 Control Test 4 3 -0.0672052 3.422912 0.9500 Parametric 2 tailed T-test discrete_group
4 Control Test 4 3 1.3411261 4.998986 0.2380 Parametric 2 tailed T-test discrete_group
5 Control Test 4 3 0.5560583 4.430742 0.6050 Parametric 2 tailed T-test discrete_group
6 Control Test 4 3 1.7331882 4.986497 0.1440 Parametric 2 tailed T-test discrete_group
7 Control Test 4 3 -1.7002722 4.945435 0.1500 Parametric 2 tailed T-test discrete_group
8 Control Test 4 3 2.1301794 4.929100 0.0872 Parametric 2 tailed T-test discrete_group
9 Control Test 4 3 -0.4398776 3.708202 0.6840 Parametric 2 tailed T-test discrete_group

Given that the function provided for this only looks at one column, this would not work if you wanted to look at the interaction of 2 variables (e.g. a 2 group design at 2 timepoints).

To circumvent this, you can create an aggregated column in the metadata like in the following example, and perform the test on that new column

new_meta <- pData(cd4_flow$flowSet) %>%
            unite(col = "aggregate_outcome", discrete_group, discrete_group2, sep = "::" ) %>%
            mutate(aggregate_outcome = as.factor(aggregate_outcome)) %>%
            rownames_to_column("SampleID")
cd4_flow <- add_sample_metadata(cd4_flow, metadata = new_meta)


clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "aggregate_outcome", method = "parametric")

Exporting your analysis

Exporting to a FCS file

A FCS file can hold the new UMAP and louvain values generated. It can also hold the metadata, albeit not as words. A workaround is to encode the discrete variables as number, and export a csv key as a dictionary.

Thus, you’d be able to explore your data in an environment you might be more familiar with.

export_to_fcs(cd4_flow,
              filename = "output/data/demo_unsupervised.fcs")
## [1] "output/data/demo_unsupervised.fcs"

Exporting to RDS

If you want to save your analysis as a whole, you can export your flow object to R’s RDS format, to then either load at a later time or share.

saveRDS(cd4_flow, file = "output/demo_flow_object.RDS")
#You can read it later using
cd4_flow <- readRDS("output/demo_flow_object.RDS")

Export to spreadsheet

If you wish to perform your statistical analyses or some plotting outside of R, and wish to access the values embedded in the flow object, you can use the export_to_spreadsheet function.

Each tab will represent a different type of values, i.e.:

  • The fluorescence values annotated with metadata variables
  • Cluster counts/frequencies
  • Cluster Median Fluorescence Intensity
export_to_spreadsheet(cd4_flow, filename = "output/demo_object.xlsx")

Session info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
##  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
##  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Toronto
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.2     flowCore_2.12.0     unsupflowhelper_1.2
##  [4] forcats_1.0.0       stringr_1.5.0       dplyr_1.1.2        
##  [7] purrr_1.0.1         readr_2.1.4         tidyr_1.2.0        
## [10] tibble_3.2.1        ggplot2_3.4.2       tidyverse_1.3.2    
## 
## loaded via a namespace (and not attached):
##   [1] mathjaxr_1.6-0              RColorBrewer_1.1-3         
##   [3] rstudioapi_0.15.0           jsonlite_1.8.7             
##   [5] magrittr_2.0.3              farver_2.1.1               
##   [7] rmarkdown_2.23              fs_1.6.2                   
##   [9] zlibbioc_1.46.0             vctrs_0.6.3                
##  [11] ggpointdensity_0.1.0        base64enc_0.1-3            
##  [13] rstatix_0.7.0               htmltools_0.5.5            
##  [15] haven_2.5.0                 broom_1.0.0                
##  [17] cellranger_1.1.0            googlesheets4_1.1.1        
##  [19] sass_0.4.6                  KernSmooth_2.23-22         
##  [21] bslib_0.5.0                 plyr_1.8.8                 
##  [23] lubridate_1.9.2             cachem_1.0.8               
##  [25] igraph_1.5.0                mime_0.12                  
##  [27] lifecycle_1.0.3             pkgconfig_2.0.3            
##  [29] Matrix_1.6-0                R6_2.5.1                   
##  [31] fastmap_1.1.1               shiny_1.7.4.1              
##  [33] numDeriv_2016.8-1.1         digest_0.6.33              
##  [35] ggnewscale_0.4.9            colorspace_2.1-0           
##  [37] S4Vectors_0.38.1            irlba_2.3.5.1              
##  [39] ggpubr_0.4.0                labeling_0.4.2             
##  [41] metadat_1.2-0               cytolib_2.12.0             
##  [43] fansi_1.0.4                 colorRamps_2.3.1           
##  [45] timechange_0.2.0            polyclip_1.10-4            
##  [47] httr_1.4.6                  abind_1.4-5                
##  [49] compiler_4.3.1              gargle_1.5.1               
##  [51] withr_2.5.0                 ConsensusClusterPlus_1.64.0
##  [53] backports_1.4.1             metafor_4.2-0              
##  [55] carData_3.0-5               DBI_1.1.3                  
##  [57] hexbin_1.28.3               highr_0.10                 
##  [59] ggforce_0.4.1               ggsignif_0.6.4             
##  [61] MASS_7.3-60                 FlowSOM_2.4.0              
##  [63] tools_4.3.1                 googledrive_2.1.1          
##  [65] httpuv_1.6.11               glue_1.6.2                 
##  [67] nlme_3.1-162                promises_1.2.0.1           
##  [69] grid_4.3.1                  checkmate_2.2.0            
##  [71] Rtsne_0.16                  cluster_2.1.4              
##  [73] generics_0.1.3              gtable_0.3.3               
##  [75] tzdb_0.4.0                  MetaCyto_1.18.0            
##  [77] rmdformats_1.0.4            data.table_1.14.8          
##  [79] hms_1.1.3                   xml2_1.3.5                 
##  [81] car_3.1-0                   utf8_1.2.3                 
##  [83] BiocGenerics_0.46.0         RcppAnnoy_0.0.21           
##  [85] ggrepel_0.9.3               RANN_2.6.1                 
##  [87] pillar_1.9.0                later_1.3.1                
##  [89] tweenr_2.0.2                lattice_0.21-8             
##  [91] RProtoBufLib_2.12.0         CytoML_2.8.0               
##  [93] tidyselect_1.2.0            RBGL_1.76.0                
##  [95] ggcyto_1.24.1               pbapply_1.7-2              
##  [97] knitr_1.43                  gridExtra_2.3              
##  [99] bookdown_0.34               flowWorkspace_4.12.0       
## [101] Rphenograph_0.99.1          scattermore_0.8            
## [103] stats4_4.3.1                xfun_0.39                  
## [105] Biobase_2.60.0              matrixStats_1.0.0          
## [107] pheatmap_1.0.12             stringi_1.7.8              
## [109] ncdfFlow_2.46.0             yaml_2.3.7                 
## [111] evaluate_0.21               codetools_0.2-19           
## [113] Rgraphviz_2.44.0            graph_1.78.0               
## [115] cli_3.6.1                   uwot_0.1.16                
## [117] RcppParallel_5.1.7          xtable_1.8-4               
## [119] munsell_0.5.0               jquerylib_0.1.4            
## [121] modelr_0.1.8                Rcpp_1.0.11                
## [123] readxl_1.4.3                dbplyr_2.2.1               
## [125] fastcluster_1.2.3           XML_3.99-0.14              
## [127] parallel_4.3.1              ellipsis_0.3.2             
## [129] assertthat_0.2.1            reprex_2.0.2               
## [131] scales_1.2.1                ggridges_0.5.4             
## [133] writexl_1.4.2               crayon_1.5.2               
## [135] rlang_1.1.1                 cowplot_1.1.1              
## [137] rvest_1.0.3

  1. RPM Bioinfo Solutions, ↩︎

  2. Emory University, ↩︎

  3. Allen Institute for Immunology↩︎