Analysis of high-dimensional flow-cytometry using UMAP and Phenograph-clustering

Before starting:

1. Acquire data on Flow cytometer of interest

2. Export to flowjo/Fix compensation

3. Gate on live events – To increase granularity of analyses you can specifically gate on a sub-population (i.e. Tregs, CD4s etc.)

4. Export gated events/sample as individual FCS files: In the export window choose format as: FCS

Installation

First run these steps if the pipeline is not installed on your computer (expand code by clicking the Code button in the lower right)

REQUIREMENTS

if(!require(devtools)){
  install.packages("devtools") 
}
library(devtools)
install_github("sekalylab/unsupflowhelper")

Load libraries

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(unsupflowhelper))
suppressPackageStartupMessages(library(flowCore))
suppressPackageStartupMessages(library(patchwork)) ## to combine multiple plots in a single plot
knitr::opts_chunk$set(echo = TRUE,time_it = TRUE)


set.seed(1)

library(future)
plan("multicore", workers = 8)
options(future.globals.maxSize = 8000 * 1024^2)

Flow Object

The flow object is a new data structure that will contain:

A flowSet
Additional parameters (i.e. excluded markers, transformation cofactors, statistical test results, etc)

It is the basis of how our pipeline functions

The CreateFlowObject function has multiple optional arguments that can be supplied. By default, it has preset values for subsampling, minimum number of events, equalization across samples.

You can either supply a flowSet from flowCore (generated from FCS files), or a list of filenames (CSV or FCS) as input to create a Flow Object.

If you have csv files exported from an older analysis, you can use them as well. Files are assumed to be from the same experiment, and markers in the same order and spelled the same way. WARNING: this is not the main intended way to run the analysis, FCS files are preferable going forward.

The FlowSet

FlowSets (and the individual flowframes within for each sample) can interact with multiple R packages and are thus one of the best ways to managing flow data in R. This will allow to:

Adjust compensations
Plot histograms/scatter plots
Batch correct
etc.

R can directly read FCS files with the flowCore package. All you need to do is supply the path of a list of fcs files. and run read.FlowSet(). If you wish to skip directly to the Flow Object, you can simply supply file paths.

Option 1: Interactive Mode

## Default call (BASIC)
flow_object <- CreateFlowObject(name = "test_stain", interactive =T) ## identifier for the dataset

Interactive mode GUI

Option2: Provide files

#1. List files 

fileLS <- list.files("path/to/your/inputfiles", 
                     pattern = "*.fcs$",  #or '*.csv$` if your data is in csv. 
                     full.names = T)

flow_object <- CreateFlowObject(files = fileLS, 
                            name = "test_stain") ## identifier for the dataset

Test dataset

The pipeline includes a test dataset to get familiar with the functions.

data("cd4_flow")

HOW-TO : Interact with a flow object

See embedded markers

Get_MarkerNames(cd4_flow, select = "all")

##      FJComp-APC-A  FJComp-APC-Cy7-A FJComp-APC-R700-A    FJComp-BB515-A 
##             "ID2"            "CD39"            "CD28"            "CCR5" 
##    FJComp-BB630-A    FJComp-BB660-A    FJComp-BB700-A    FJComp-BB750-A 
##          "SLAMF6"           "CD101"            "IFNg"           "NR4A1" 
##   FJComp-BUV395-A   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV615-A 
##            "BCL2"             "CD3"             "CD4"          "CD45RA" 
##   FJComp-BUV661-A   FJComp-BUV737-A   FJComp-BUV805-A    FJComp-BV421-A 
##            "CD27"             "CD8"          "CD45RO"            "TCF7" 
##    FJComp-BV480-A    FJComp-BV570-A    FJComp-BV605-A    FJComp-BV650-A 
## "DUMP(CD14_CD19)"            "Ki67"            "IRF4"            "CD69" 
##    FJComp-BV711-A    FJComp-BV750-A    FJComp-BV786-A     FJComp-DAPI-A 
##           "EOMES"            "CCR7"            "TBET"              "LD" 
##       FJComp-PE-A FJComp-PE-CF594-A   FJComp-PE-Cy5-A   FJComp-PE-Cy7-A 
##             "TOX"             "TNF"            "CD95"             "PD1"

# You can also show only the 'excluded' or 'included' markers. see `?Get_MarkerNames()`

Get sample metadata

Get_MetaData(cd4_flow)

	filename	name	discrete_group	discrete_group2	continuous_var
Sample_1	export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs	Sample_1	Control	Time1	8
Sample_2	export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs	Sample_2	Test	Time1	10
Sample_3	export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs	Sample_3	Control	Time2	3
Sample_4	export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs	Sample_4	Test	Time2	15
Sample_5	export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs	Sample_5	Control	Time1	7
Sample_6	export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs	Sample_6	Test	Time1	18
Sample_7	export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs	Sample_7	Control	Time2	8

Exclude additional parameters (OPTIONAL)

By default, the FSC/SSC/Time parameters are excluded.

However, if you pregated using additional markers, such as CD3, or Live/dead stains, it’s highly recommended that you exclude them from the unsupervised analysis.

If you wish to exclude additional parameters, you can use the exclude_parameters function.

If you need a reminder of the available marker names in your flow object, you can still look with Get_MarkerNames(cd4_flow)

## View excluded markers
Get_MarkerNames(cd4_flow, select = "excluded", show_channel_name = T)

##   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV737-A    FJComp-BV480-A 
##             "CD3"             "CD4"             "CD8" "DUMP(CD14_CD19)" 
##     FJComp-DAPI-A 
##              "LD"

# Exclude additional markers
cd4_flow <- exclude_parameters(flow_object = cd4_flow,
                              exclude_markers = c("CD3", "LD", "DUMP(CD14_CD19)", "CD4", "CD8") )

## View excluded markers
Get_MarkerNames(cd4_flow, select = "excluded", show_channel_name = T)

##   FJComp-BUV496-A   FJComp-BUV563-A   FJComp-BUV737-A    FJComp-BV480-A 
##             "CD3"             "CD4"             "CD8" "DUMP(CD14_CD19)" 
##     FJComp-DAPI-A 
##              "LD"

Add metadata

If you have sample metadata, such as group, timepoints, treatments, or otherwise, this is where you add it to your flow object. It can be added at any time, or not at all, but some options for plotting and statistics will require it.

You need to have a dataframe, usually imported from a csv or spreadsheet, OR you can use interactive mode to load a file.

If you have continuous (numerical) variables, make sure R read them as numeric (class(metadata_df[["continuous_variable"]]) should be numeric, not character).
If you have discrete groups, you can factor them if you wish them ordered in a certain way, otherwise R will treat them in alphabetical order. e.g. metadata_df[["discrete_group"]] <- factor(metadata_df[["discrete_group"]], levels = c("Control", "Acute", "Chronic"))
The dataframe REQUIRES a column that matches the samplenames in the flow object, to be specificed with the ‘key_column’ parameter.

Option 1: Interactive mode

cd4_flow <- add_sample_metadata(flow_object = cd4_flow, interactive = T)

test <- CreateFlowObject(name = "test", interactive = T)

Add Metadata Interactive Mode

Option 2: Provide data.frame

#read_metadata
metadata_df <- read.csv("input/metadata_flow_vignette_demo.csv")

## Add  metadta to flow_object
cd4_flow <- add_sample_metadata(flow_object = cd4_flow, metadata = metadata_df, key_column = "filename", flowobj_column = "filename")

#View phenoData
Get_MetaData(cd4_flow)

	filename	name	discrete_group	discrete_group2	continuous_var
Sample_1	export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs	Sample_1	Control	Time1	1
Sample_2	export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs	Sample_2	Test	Time2	10
Sample_3	export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs	Sample_3	Test	Time1	14
Sample_4	export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs	Sample_4	Test	Time1	20
Sample_5	export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs	Sample_5	Control	Time1	4
Sample_6	export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs	Sample_6	Control	Time2	9
Sample_7	export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs	Sample_7	Control	Time2	2

Find cofactors (OPTIONAL)

For the unsupervised analysis, we can use an inverse hyperbolic sine transformation of the data. This requires division by a cofactor, but different channels require different cofactors depending on how strongly they are expressed.

This method attempts at finding an optimal cofactor per channel, by incrementally increasing the cofactor value until only a small portion (0.1%) of events fall under zero when transformed.

## Find cofactors function
cd4_flow <- find_cofactors(flow_object = cd4_flow)

#View cofactors
cd4_flow$parameters$cofactors

##    ID2   CD39   CD28   CCR5 SLAMF6  CD101   IFNg  NR4A1   BCL2 CD45RA   CD27 
##    735    885    720    295    110    830   1595   2215     50    570    840 
## CD45RO   TCF7   Ki67   IRF4   CD69  EOMES   CCR7   TBET    TOX    TNF   CD95 
##   2895    345   1220    140   1475   2495   2685   2640   2765   3320    810 
##    PD1 
##    910

Add reference controls (OPTIONAL)

While staining controls will not get used for clustering and dimension reduction, they will be useful when comparing cluster expression to the baseline, and interpreting not only marker relative expression, but whether it is positive or negative.

Controls are not necessary for EVERY marker: a FMO for a strong marker such as CD4 on T cells is rarely warranted. So, in the absence of a given control for its corresponding marker, the Unstained control will be used. The Unstained control is therefore necessary.

## [1] "Sample A8 (export_Unstained_A8_A08_T Lymphocytes.fcs) only contains 691 events. Using all 691 events"

# List control files
## Names must fit with markernames in the flowSet's `markernames()`

## Option 1
#You can run the function interactively: it will open up a browser window to select the 
#files on your computer for each desired marker, and then adds them to your flow_object. 
#You can also add controls for excluded markers for some plotting functions.
cd4_flow <- add_staining_controls(cd4_flow)


## Option 2
# Add staining controls manually from a list 
control_LS <- list("Unstained" = "input/controls/export_Unstained_A8_A08_T Lymphocytes.fcs",
                   "EOMES" = "path_to_your_EOMES_file.fcs",
                   "T-BET" = "path_to_your_T-BET_file.fcs")

cd4_flow <- add_staining_controls(cd4_flow, control_list = control_LS, interactive = F)

Run UMAP and clustering

# run UMAP
cd4_flow <- UMAP_flow(cd4_flow,transformation = "asinh" )

## [1] "Performing dimension reduction using UMAP..."

## Alternative: use another transformation on the data. 
## Available options are `log`, `cytofAsinh`, or `none`. 
## The latter uses the biexponential data as is. 
cd4_flow_biexp <- UMAP_flow(flow_object = cd4_flow,
                     transformation = "none")

## [1] "Performing dimension reduction using UMAP..."

# Run Phenograph
cd4_flow <- Cluster_flow(cd4_flow, k = 60)

## [1] "Performing clustering using RPhenograph..."
##   Finding nearest neighbors...DONE ~ 53.449 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.263 s
##   Build undirected graph from the weighted links...DONE ~ 3.835 s
##   Run louvain clustering on the graph ...DONE ~ 15.798 s
##   Return a community class
##   -Modularity value: 0.6643943 
##   -Number of clusters: 12

cd4_flow_biexp <- Cluster_flow(cd4_flow_biexp, k = 60)

## [1] "Performing clustering using RPhenograph..."
##   Finding nearest neighbors...DONE ~ 47.58 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.265 s
##   Build undirected graph from the weighted links...DONE ~ 3.783 s
##   Run louvain clustering on the graph ...DONE ~ 12.127 s
##   Return a community class
##   -Modularity value: 0.6806686 
##   -Number of clusters: 14

Cluster annotation (OPTIONAL)

Having louvain labels is a good basis for unbiased analysis. But perhaps, after looking at the data, you are able to discern the biological identity of those louvain clusters and wish to have your plots reflect that instead of just numbers.

You can provide a data.frame, where one column (called louvain) contains ALL the louvain clusters from your analysis. All additional columns (that you name however you see fit) contain your labels of choice.

The function provided for this is called annotate_clusters and can be used multiple times

cd4_flow <- annotate_clusters(cd4_flow, interactive = T)

#View cluster annotations
Get_ClusterAnnotations(cd4_flow)

	louvain	Cell Type	Granular population
8	1	CD4 Memory	CD4 Tcm
1	2	CD4 Effector	CD4 Tte
9	3	CD4 Memory	CD4 Tem
11	4	CD4 Naive	CD4 Naive
12	5	CD4 Memory	CD4 Tem
5	6	CD4 Memory	CD4 Tem
3	7	CD4 Naive	CD4 Naive
2	8	CD4 Memory	CD4 Tem
6	9	CD4 Naive	CD4 Naive
10	10	CD4 Naive	CD4 Naive
4	11	CD4 Memory	CD4 RARO Tscm
NA	NA	NA	NA

Plotting

Many visualizations are available within the pipeline, without needing to resort to FlowJo. That being said, if that’s your preferred way of doing things, look for how to export the flow object to FCS at the end.

HOW-TO: Save a plot

Before exploring different visual options, you need to be familiar with how to save your plots to file. All plotting options in the pipeline use ggplot2 and thus can use ggsave for exporting visuals to file.

Alternatively, you can use R’s default image engines: pdf(), svg(), tiff(), jpeg(), png(), then the plot, followed by dev.off()

When using a plotting function, you can either save it to a variable (p <- plotting_function(...)), or use it directly within ggsave (ggsave(plotting_function(...), device = "pdf")).

For more details on ggsave, look in the documentation using ?ggsave

RECOMMENDED: when exporting UMAP plots, you should export to a jpeg format, to avoid very large files.

#1. Option 1
p <- plotting_function(...)

ggsave(p, filename = "output/UMAP_plot.tiff", device = "tiff", width = 7, height = 7, units = "in", dpi = 300))


#. Option 2
tiff("output/UMAP_plot.tiff", width = 7, height = 7, units = "in", dpi = 300)
print(p)
dev.off()

UMAP plot

The raw visualization of UMAP dimension reduction. Often known as DimPlot for Seurat aficionados.

Most of the options you find here will also be found

You can split.by metadata discrete columns (up to 3 at a time)
You can group.by a metadata discrete column OR louvain clusters to color cells.
You can choose to display only a few selected cluster labels when group.by is set to "louvain" by passing a vector of clusters, e.g. c(1,4,6)

## BASIC UMAP plot: simple call
# Example 1: Raw plot
plot_umap(flow_object = cd4_flow) 


# Example 2a: Louvain clusters
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain")

# Example 2b: Louvain clusters
plot_umap(flow_object = cd4_flow_biexp, 
          group.by = "louvain")

# Example 3: Louvain clusters separated by a metadata variable
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          split.by = "discrete_group")

# Example 4: Select only a subset of cluster labels to show
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          label_filter = c(1,4,5))

# Example 5: Color by cluster annotations OR experimental groups
plot_umap(flow_object = cd4_flow, 
          group.by = "Granular population")

# Example 6: Remove labels, and replace with legend
plot_umap(flow_object = cd4_flow, 
          group.by = "Granular population",
          label = F)

# Example 7: Louvain clusters separated by multiple metadata variables
plot_umap(flow_object = cd4_flow, 
          group.by = "louvain",
          split.by = c("discrete_group","discrete_group2"))

 # Example 8: Color by one metadata variable and split by another. 
plot_umap(flow_object = cd4_flow,
          group.by = "discrete_group",
          split.by = "discrete_group2",
          dot_size = 1,
          ncols = 1)

Examples

1: Default

2a: Louvain clusters (asinh transform)

2b: Louvain clusters (biexponential transform)

3: Split by metadata variable

4: Filter cluster labels

5: Color by cluster annotation

6: Color by experimental groups; no label

7: Split by multiple metadata variables

8: Combine split.by and group.by

Density Plot

Instead of displaying clusters or metadata groups, these plots are meant to display point density, in a pseudocolour distribution similar to FlowJo.

Similar to UMAP_plot, you can split.by up to 3 variables.

#1
plot_density_umap(cd4_flow) 

#2

plot_density_umap(cd4_flow,  
                  split.by = "discrete_group",
                  dot_size = 0.5)

Examples

1: Default

2: Split

UMAP heat plot

Visualization of the expression of individual markers on the UMAP plot

You can also visualize fluorescent markers that were excluded.

#1
plot_umap_heat(cd4_flow, markers = "CD45RA" ) 

#2
plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 100) + 
  plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 10000)


#3
plot_umap_heat(cd4_flow, markers = c("CD45RA", "CD45RO", "CCR7", "EOMES", "CD27"), legend_height = 8) 




#3 Combine heatmaps into a complex arrangement

Examples

1: Default

2: Quantile Saturation

3: Complex combination

Cluster visuals

Cluster heatmap

# 1 Basic relative expression heatmap

plot_cluster_heatmap(cd4_flow)


#2 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T)

#3 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T, annotation = "Granular population")

#4 Subset louvain clusters
plot_cluster_heatmap(cd4_flow, show_control = T, louvain_select = c(1,5,7,12,9))

#5 Include Absolute expression annotation
plot_cluster_heatmap(cd4_flow, show_control = T, add_mfi = T)

Examples

1: Default

2: With Controls

3: With cluster annotations

4: Louvain cluster subset

5: + MFI annotation

Cluster histogram

#1
plot_cluster_histograms(cd4_flow, marker = "CD45RA") #1

#2

plot_cluster_histograms(cd4_flow, marker = "EOMES", show_control = T) #2

#3
plot_cluster_histograms(cd4_flow, marker = "CD45RA", clusters = c(1,2,5)) #3

Examples

1: Default

2: Include control

3: Cluster Filter

Cluster jitterplot

#1
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain")

#2
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain", 
                   louvain_select = c(1,5,7), 
                   value = "absolute")

#3

cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "louvain",
                   group = "discrete_group")


#4
cluster_jitterplot(flow_object = cd4_flow, 
                   annotation = "Granular population", 
                   group = "discrete_group")

Examples

1: Default

2: Absolute counts + cluster filter

3: Compare metadata groups

4: Using cluster annotations

Cluster stacked barplot

#1


cluster_stacked_barplot(cd4_flow) 

#2
cluster_stacked_barplot(cd4_flow, group = "discrete_group", annotation = "Granular population")

Examples

1: Default

2: Group.by and split.by

Extract data

Retrieve expression data

df <- get_flowSet_matrix(cd4_flow,add_sample_id = T, annotations = T )

freq_df <- get_cluster_frequencies(cd4_flow)

Statistical comparisons

While an exhaustive statistical support for all possible experimental designs is beyond the scope of this pipeline, we can provide simple support for simpler designs, such as correlation or discrete statistics with metadata outcomes.

These tests are meant for initial quick exploration of your data: they are unlikely to be the perfect test for your specific hypothesis. For instance, in outcomes with more than 2 levels, consider one-way ANOVA with a post-hoc TukeyHSD test if doing a parametric analysis. Or in a paired design, adapt the test accordingly.

These are not testing for statistical assumptions either: normality, equality of variances, etc.

R package rstatix is very helpful in quickly doing those additional tests.

Regression

#1: Correlate louvain cluster relative_frequencies with a given continuous outcome
correlate_clusters(cd4_flow, group.by = "louvain", outcome = "continuous_var", method = "spearman")

#2 Correlate cluster annotations instead of louvain
correlate_clusters(cd4_flow, group.by = "Granular population", outcome = "continuous_var", method = "pearson")

Examples

1: Louvain vs outcome

louvain	outcome	rho	statistic	p.value	method	padj
1	continuous_var	0.220	43.89140	0.641	Spearman	0.9328
10	continuous_var	-0.036	58.01810	0.939	Spearman	0.9390
11	continuous_var	0.160	46.91855	0.728	Spearman	0.9328
2	continuous_var	-0.090	61.04525	0.848	Spearman	0.9328
3	continuous_var	-0.180	66.09050	0.699	Spearman	0.9328
4	continuous_var	-0.230	69.11765	0.613	Spearman	0.9328
5	continuous_var	0.130	48.93665	0.788	Spearman	0.9328
6	continuous_var	-0.180	66.09050	0.699	Spearman	0.9328
7	continuous_var	0.250	41.87330	0.585	Spearman	0.9328
8	continuous_var	-0.490	83.24435	0.268	Spearman	0.9328
9	continuous_var	-0.110	62.16540	0.814	Spearman	0.9328

2: Cluster annotation vs outcome

Granular population	outcome	p	statistic	p.value	conf.low	conf.high	method	padj
CD4 Naive	continuous_var	-0.051	-0.1139557	0.914	-0.7742781	0.7301467	Pearson	0.95
CD4 RARO Tscm	continuous_var	0.150	0.3388990	0.748	-0.6799366	0.8113503	Pearson	0.95
CD4 Tcm	continuous_var	0.031	0.0686768	0.948	-0.7394540	0.7660474	Pearson	0.95
CD4 Tem	continuous_var	0.029	0.0659301	0.950	-0.7400100	0.7655396	Pearson	0.95
CD4 Tte	continuous_var	0.057	0.1268631	0.904	-0.7274442	0.7765762	Pearson	0.95

Discrete statistics

Akin to correlation against a continuous variable, you can also provide a discrete metadata variable for contrast statistics (T-tests, Wilcoxon tests, etc).

#1: Louvain cluster relative_frequencies non-parametric with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "non-parametric")

#2: Louvain cluster relative_frequencies T-test with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "parametric")

Examples

1: Non-Parametric

louvain	group1	group2	n1	n2	statistic	p.value	Test	outcome
1	Control	Test	4	3	7.0	0.857	Non-parametric Wilcoxon Test	discrete_group
10	Control	Test	4	3	4.0	0.629	Non-parametric Wilcoxon Test	discrete_group
11	Control	Test	4	3	4.0	0.629	Non-parametric Wilcoxon Test	discrete_group
2	Control	Test	4	3	7.0	0.857	Non-parametric Wilcoxon Test	discrete_group
3	Control	Test	4	3	6.0	1.000	Non-parametric Wilcoxon Test	discrete_group
4	Control	Test	4	3	10.0	0.229	Non-parametric Wilcoxon Test	discrete_group
5	Control	Test	4	3	8.0	0.629	Non-parametric Wilcoxon Test	discrete_group
6	Control	Test	4	3	10.0	0.229	Non-parametric Wilcoxon Test	discrete_group
7	Control	Test	4	3	2.0	0.229	Non-parametric Wilcoxon Test	discrete_group
8	Control	Test	4	3	10.0	0.229	Non-parametric Wilcoxon Test	discrete_group
9	Control	Test	4	3	4.5	0.719	Non-parametric Wilcoxon Test	discrete_group

2: Parametric

louvain	group1	group2	n1	n2	statistic	df	p.value	Test	outcome
1	Control	Test	4	3	0.6475306	4.995724	0.5460	Parametric 2 tailed T-test	discrete_group
10	Control	Test	4	3	-0.8266370	2.711678	0.4750	Parametric 2 tailed T-test	discrete_group
11	Control	Test	4	3	-1.0853855	3.962754	0.3390	Parametric 2 tailed T-test	discrete_group
2	Control	Test	4	3	0.4935197	4.510478	0.6450	Parametric 2 tailed T-test	discrete_group
3	Control	Test	4	3	-0.0672052	3.422912	0.9500	Parametric 2 tailed T-test	discrete_group
4	Control	Test	4	3	1.3411261	4.998986	0.2380	Parametric 2 tailed T-test	discrete_group
5	Control	Test	4	3	0.5560583	4.430742	0.6050	Parametric 2 tailed T-test	discrete_group
6	Control	Test	4	3	1.7331882	4.986497	0.1440	Parametric 2 tailed T-test	discrete_group
7	Control	Test	4	3	-1.7002722	4.945435	0.1500	Parametric 2 tailed T-test	discrete_group
8	Control	Test	4	3	2.1301794	4.929100	0.0872	Parametric 2 tailed T-test	discrete_group
9	Control	Test	4	3	-0.4398776	3.708202	0.6840	Parametric 2 tailed T-test	discrete_group

Given that the function provided for this only looks at one column, this would not work if you wanted to look at the interaction of 2 variables (e.g. a 2 group design at 2 timepoints).

To circumvent this, you can create an aggregated column in the metadata like in the following example, and perform the test on that new column

new_meta <- pData(cd4_flow$flowSet) %>%
            unite(col = "aggregate_outcome", discrete_group, discrete_group2, sep = "::" ) %>%
            mutate(aggregate_outcome = as.factor(aggregate_outcome)) %>%
            rownames_to_column("SampleID")
cd4_flow <- add_sample_metadata(cd4_flow, metadata = new_meta)


clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "aggregate_outcome", method = "parametric")

Exporting your analysis

Exporting to a FCS file

A FCS file can hold the new UMAP and louvain values generated. It can also hold the metadata, albeit not as words. A workaround is to encode the discrete variables as number, and export a csv key as a dictionary.

Thus, you’d be able to explore your data in an environment you might be more familiar with.

export_to_fcs(cd4_flow,
              filename = "output/data/demo_unsupervised.fcs")

## [1] "output/data/demo_unsupervised.fcs"

Exporting to RDS

If you want to save your analysis as a whole, you can export your flow object to R’s RDS format, to then either load at a later time or share.

saveRDS(cd4_flow, file = "output/demo_flow_object.RDS")
#You can read it later using
cd4_flow <- readRDS("output/demo_flow_object.RDS")

Export to spreadsheet

If you wish to perform your statistical analyses or some plotting outside of R, and wish to access the values embedded in the flow object, you can use the export_to_spreadsheet function.

Each tab will represent a different type of values, i.e.:

The fluorescence values annotated with metadata variables
Cluster counts/frequencies
Cluster Median Fluorescence Intensity

export_to_spreadsheet(cd4_flow, filename = "output/demo_object.xlsx")

Session info

sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
##  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
##  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Toronto
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.2     flowCore_2.12.0     unsupflowhelper_1.2
##  [4] forcats_1.0.0       stringr_1.5.0       dplyr_1.1.2        
##  [7] purrr_1.0.1         readr_2.1.4         tidyr_1.2.0        
## [10] tibble_3.2.1        ggplot2_3.4.2       tidyverse_1.3.2    
## 
## loaded via a namespace (and not attached):
##   [1] mathjaxr_1.6-0              RColorBrewer_1.1-3         
##   [3] rstudioapi_0.15.0           jsonlite_1.8.7             
##   [5] magrittr_2.0.3              farver_2.1.1               
##   [7] rmarkdown_2.23              fs_1.6.2                   
##   [9] zlibbioc_1.46.0             vctrs_0.6.3                
##  [11] ggpointdensity_0.1.0        base64enc_0.1-3            
##  [13] rstatix_0.7.0               htmltools_0.5.5            
##  [15] haven_2.5.0                 broom_1.0.0                
##  [17] cellranger_1.1.0            googlesheets4_1.1.1        
##  [19] sass_0.4.6                  KernSmooth_2.23-22         
##  [21] bslib_0.5.0                 plyr_1.8.8                 
##  [23] lubridate_1.9.2             cachem_1.0.8               
##  [25] igraph_1.5.0                mime_0.12                  
##  [27] lifecycle_1.0.3             pkgconfig_2.0.3            
##  [29] Matrix_1.6-0                R6_2.5.1                   
##  [31] fastmap_1.1.1               shiny_1.7.4.1              
##  [33] numDeriv_2016.8-1.1         digest_0.6.33              
##  [35] ggnewscale_0.4.9            colorspace_2.1-0           
##  [37] S4Vectors_0.38.1            irlba_2.3.5.1              
##  [39] ggpubr_0.4.0                labeling_0.4.2             
##  [41] metadat_1.2-0               cytolib_2.12.0             
##  [43] fansi_1.0.4                 colorRamps_2.3.1           
##  [45] timechange_0.2.0            polyclip_1.10-4            
##  [47] httr_1.4.6                  abind_1.4-5                
##  [49] compiler_4.3.1              gargle_1.5.1               
##  [51] withr_2.5.0                 ConsensusClusterPlus_1.64.0
##  [53] backports_1.4.1             metafor_4.2-0              
##  [55] carData_3.0-5               DBI_1.1.3                  
##  [57] hexbin_1.28.3               highr_0.10                 
##  [59] ggforce_0.4.1               ggsignif_0.6.4             
##  [61] MASS_7.3-60                 FlowSOM_2.4.0              
##  [63] tools_4.3.1                 googledrive_2.1.1          
##  [65] httpuv_1.6.11               glue_1.6.2                 
##  [67] nlme_3.1-162                promises_1.2.0.1           
##  [69] grid_4.3.1                  checkmate_2.2.0            
##  [71] Rtsne_0.16                  cluster_2.1.4              
##  [73] generics_0.1.3              gtable_0.3.3               
##  [75] tzdb_0.4.0                  MetaCyto_1.18.0            
##  [77] rmdformats_1.0.4            data.table_1.14.8          
##  [79] hms_1.1.3                   xml2_1.3.5                 
##  [81] car_3.1-0                   utf8_1.2.3                 
##  [83] BiocGenerics_0.46.0         RcppAnnoy_0.0.21           
##  [85] ggrepel_0.9.3               RANN_2.6.1                 
##  [87] pillar_1.9.0                later_1.3.1                
##  [89] tweenr_2.0.2                lattice_0.21-8             
##  [91] RProtoBufLib_2.12.0         CytoML_2.8.0               
##  [93] tidyselect_1.2.0            RBGL_1.76.0                
##  [95] ggcyto_1.24.1               pbapply_1.7-2              
##  [97] knitr_1.43                  gridExtra_2.3              
##  [99] bookdown_0.34               flowWorkspace_4.12.0       
## [101] Rphenograph_0.99.1          scattermore_0.8            
## [103] stats4_4.3.1                xfun_0.39                  
## [105] Biobase_2.60.0              matrixStats_1.0.0          
## [107] pheatmap_1.0.12             stringi_1.7.8              
## [109] ncdfFlow_2.46.0             yaml_2.3.7                 
## [111] evaluate_0.21               codetools_0.2-19           
## [113] Rgraphviz_2.44.0            graph_1.78.0               
## [115] cli_3.6.1                   uwot_0.1.16                
## [117] RcppParallel_5.1.7          xtable_1.8-4               
## [119] munsell_0.5.0               jquerylib_0.1.4            
## [121] modelr_0.1.8                Rcpp_1.0.11                
## [123] readxl_1.4.3                dbplyr_2.2.1               
## [125] fastcluster_1.2.3           XML_3.99-0.14              
## [127] parallel_4.3.1              ellipsis_0.3.2             
## [129] assertthat_0.2.1            reprex_2.0.2               
## [131] scales_1.2.1                ggridges_0.5.4             
## [133] writexl_1.4.2               crayon_1.5.2               
## [135] rlang_1.1.1                 cowplot_1.1.1              
## [137] rvest_1.0.3

RPM Bioinfo Solutions, adam.nicolas.pelletier@rpmbioinfo-solutions.com ↩︎
Emory University, ashish.arunkumar.sharma@emory.edu ↩︎
Allen Institute for Immunology↩︎