Analysis of high-dimensional flow-cytometry using UMAP and Phenograph-clustering
Before starting:
1. Acquire data on Flow cytometer of interest
2. Export to flowjo/Fix compensation
3. Gate on live events – To increase granularity of analyses you can specifically gate on a sub-population (i.e. Tregs, CD4s etc.)
4. Export gated events/sample as individual FCS files: In the export window choose format as: FCS
Installation
First run these steps if the pipeline is not installed on your computer (expand code by clicking the Code button in the lower right)
REQUIREMENTS
- If on Mac: XCode
- R 4.1 +
- BiocManager v3.14+
if(!require(devtools)){
install.packages("devtools")
}
library(devtools)
install_github("sekalylab/unsupflowhelper")
Load libraries
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(unsupflowhelper))
suppressPackageStartupMessages(library(flowCore))
suppressPackageStartupMessages(library(patchwork)) ## to combine multiple plots in a single plot
knitr::opts_chunk$set(echo = TRUE,time_it = TRUE)
set.seed(1)
library(future)
plan("multicore", workers = 8)
options(future.globals.maxSize = 8000 * 1024^2)
Flow Object
The flow object is a new data structure that will contain:
- A flowSet
- Additional parameters (i.e. excluded markers, transformation cofactors, statistical test results, etc)
It is the basis of how our pipeline functions
The CreateFlowObject
function has multiple optional
arguments that can be supplied. By default, it has preset values for
subsampling, minimum number of events, equalization across samples.
You can either supply a flowSet from flowCore (generated from FCS files), or a list of filenames (CSV or FCS) as input to create a Flow Object.
If you have csv files exported from an older analysis, you can use them as well. Files are assumed to be from the same experiment, and markers in the same order and spelled the same way. WARNING: this is not the main intended way to run the analysis, FCS files are preferable going forward.
The FlowSet
FlowSets (and the individual flowframes within for each sample) can interact with multiple R packages and are thus one of the best ways to managing flow data in R. This will allow to:
- Adjust compensations
- Plot histograms/scatter plots
- Batch correct
- etc.
R can directly read FCS files with the flowCore package. All you need
to do is supply the path of a list of fcs files. and run
read.FlowSet()
. If you wish to skip directly to the
Flow Object, you can simply supply file paths.
Test dataset
The pipeline includes a test dataset to get familiar with the functions.
HOW-TO : Interact with a flow object
See embedded markers
## FJComp-APC-A FJComp-APC-Cy7-A FJComp-APC-R700-A FJComp-BB515-A
## "ID2" "CD39" "CD28" "CCR5"
## FJComp-BB630-A FJComp-BB660-A FJComp-BB700-A FJComp-BB750-A
## "SLAMF6" "CD101" "IFNg" "NR4A1"
## FJComp-BUV395-A FJComp-BUV496-A FJComp-BUV563-A FJComp-BUV615-A
## "BCL2" "CD3" "CD4" "CD45RA"
## FJComp-BUV661-A FJComp-BUV737-A FJComp-BUV805-A FJComp-BV421-A
## "CD27" "CD8" "CD45RO" "TCF7"
## FJComp-BV480-A FJComp-BV570-A FJComp-BV605-A FJComp-BV650-A
## "DUMP(CD14_CD19)" "Ki67" "IRF4" "CD69"
## FJComp-BV711-A FJComp-BV750-A FJComp-BV786-A FJComp-DAPI-A
## "EOMES" "CCR7" "TBET" "LD"
## FJComp-PE-A FJComp-PE-CF594-A FJComp-PE-Cy5-A FJComp-PE-Cy7-A
## "TOX" "TNF" "CD95" "PD1"
Get sample metadata
filename | name | discrete_group | discrete_group2 | continuous_var | |
---|---|---|---|---|---|
Sample_1 | export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs | Sample_1 | Control | Time1 | 8 |
Sample_2 | export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs | Sample_2 | Test | Time1 | 10 |
Sample_3 | export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs | Sample_3 | Control | Time2 | 3 |
Sample_4 | export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs | Sample_4 | Test | Time2 | 15 |
Sample_5 | export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs | Sample_5 | Control | Time1 | 7 |
Sample_6 | export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs | Sample_6 | Test | Time1 | 18 |
Sample_7 | export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs | Sample_7 | Control | Time2 | 8 |
Exclude additional parameters (OPTIONAL)
By default, the FSC/SSC/Time parameters are excluded.
However, if you pregated using additional markers, such as CD3, or Live/dead stains, it’s highly recommended that you exclude them from the unsupervised analysis.
If you wish to exclude additional parameters, you can use the
exclude_parameters
function.
If you need a reminder of the available marker names in your flow
object, you can still look with
Get_MarkerNames(cd4_flow)
## FJComp-BUV496-A FJComp-BUV563-A FJComp-BUV737-A FJComp-BV480-A
## "CD3" "CD4" "CD8" "DUMP(CD14_CD19)"
## FJComp-DAPI-A
## "LD"
# Exclude additional markers
cd4_flow <- exclude_parameters(flow_object = cd4_flow,
exclude_markers = c("CD3", "LD", "DUMP(CD14_CD19)", "CD4", "CD8") )
## View excluded markers
Get_MarkerNames(cd4_flow, select = "excluded", show_channel_name = T)
## FJComp-BUV496-A FJComp-BUV563-A FJComp-BUV737-A FJComp-BV480-A
## "CD3" "CD4" "CD8" "DUMP(CD14_CD19)"
## FJComp-DAPI-A
## "LD"
Add metadata
If you have sample metadata, such as group, timepoints, treatments, or otherwise, this is where you add it to your flow object. It can be added at any time, or not at all, but some options for plotting and statistics will require it.
You need to have a dataframe, usually imported from a csv or spreadsheet, OR you can use interactive mode to load a file.
- If you have continuous (numerical) variables, make sure R read them
as numeric (
class(metadata_df[["continuous_variable"]])
should be numeric, not character). - If you have discrete groups, you can factor them if you
wish them ordered in a certain way, otherwise R will treat them in
alphabetical order.
e.g.
metadata_df[["discrete_group"]] <- factor(metadata_df[["discrete_group"]], levels = c("Control", "Acute", "Chronic"))
- The dataframe REQUIRES a column that matches the samplenames in the flow object, to be specificed with the ‘key_column’ parameter.
Option 1: Interactive mode
cd4_flow <- add_sample_metadata(flow_object = cd4_flow, interactive = T)
test <- CreateFlowObject(name = "test", interactive = T)
Option 2: Provide data.frame
#read_metadata
metadata_df <- read.csv("input/metadata_flow_vignette_demo.csv")
## Add metadta to flow_object
cd4_flow <- add_sample_metadata(flow_object = cd4_flow, metadata = metadata_df, key_column = "filename", flowobj_column = "filename")
#View phenoData
Get_MetaData(cd4_flow)
filename | name | discrete_group | discrete_group2 | continuous_var | |
---|---|---|---|---|---|
Sample_1 | export_Unstim_GE_Samples_2_B03_FlowAIGoodEvents_CD4.fcs | Sample_1 | Control | Time1 | 1 |
Sample_2 | export_Unstim_GE_Samples_3_B04_FlowAIGoodEvents_CD4.fcs | Sample_2 | Test | Time2 | 10 |
Sample_3 | export_Unstim_GE_Samples_4_B05_FlowAIGoodEvents_CD4.fcs | Sample_3 | Test | Time1 | 14 |
Sample_4 | export_Unstim_GE_Samples_5_B06_FlowAIGoodEvents_CD4.fcs | Sample_4 | Test | Time1 | 20 |
Sample_5 | export_Unstim_GE_Samples_6_B07_FlowAIGoodEvents_CD4.fcs | Sample_5 | Control | Time1 | 4 |
Sample_6 | export_Unstim_GE_Samples_7_B08_FlowAIGoodEvents_CD4.fcs | Sample_6 | Control | Time2 | 9 |
Sample_7 | export_Unstim_GE_Samples_8_B09_FlowAIGoodEvents_CD4.fcs | Sample_7 | Control | Time2 | 2 |
Find cofactors (OPTIONAL)
For the unsupervised analysis, we can use an inverse hyperbolic sine transformation of the data. This requires division by a cofactor, but different channels require different cofactors depending on how strongly they are expressed.
This method attempts at finding an optimal cofactor per channel, by incrementally increasing the cofactor value until only a small portion (0.1%) of events fall under zero when transformed.
## Find cofactors function
cd4_flow <- find_cofactors(flow_object = cd4_flow)
#View cofactors
cd4_flow$parameters$cofactors
## ID2 CD39 CD28 CCR5 SLAMF6 CD101 IFNg NR4A1 BCL2 CD45RA CD27
## 735 885 720 295 110 830 1595 2215 50 570 840
## CD45RO TCF7 Ki67 IRF4 CD69 EOMES CCR7 TBET TOX TNF CD95
## 2895 345 1220 140 1475 2495 2685 2640 2765 3320 810
## PD1
## 910
Add reference controls (OPTIONAL)
While staining controls will not get used for clustering and dimension reduction, they will be useful when comparing cluster expression to the baseline, and interpreting not only marker relative expression, but whether it is positive or negative.
Controls are not necessary for EVERY marker: a FMO for a strong marker such as CD4 on T cells is rarely warranted. So, in the absence of a given control for its corresponding marker, the Unstained control will be used. The Unstained control is therefore necessary.
## [1] "Sample A8 (export_Unstained_A8_A08_T Lymphocytes.fcs) only contains 691 events. Using all 691 events"
# List control files
## Names must fit with markernames in the flowSet's `markernames()`
## Option 1
#You can run the function interactively: it will open up a browser window to select the
#files on your computer for each desired marker, and then adds them to your flow_object.
#You can also add controls for excluded markers for some plotting functions.
cd4_flow <- add_staining_controls(cd4_flow)
## Option 2
# Add staining controls manually from a list
control_LS <- list("Unstained" = "input/controls/export_Unstained_A8_A08_T Lymphocytes.fcs",
"EOMES" = "path_to_your_EOMES_file.fcs",
"T-BET" = "path_to_your_T-BET_file.fcs")
cd4_flow <- add_staining_controls(cd4_flow, control_list = control_LS, interactive = F)
Run UMAP and clustering
## [1] "Performing dimension reduction using UMAP..."
## Alternative: use another transformation on the data.
## Available options are `log`, `cytofAsinh`, or `none`.
## The latter uses the biexponential data as is.
cd4_flow_biexp <- UMAP_flow(flow_object = cd4_flow,
transformation = "none")
## [1] "Performing dimension reduction using UMAP..."
## [1] "Performing clustering using RPhenograph..."
## Finding nearest neighbors...DONE ~ 53.449 s
## Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.263 s
## Build undirected graph from the weighted links...DONE ~ 3.835 s
## Run louvain clustering on the graph ...DONE ~ 15.798 s
## Return a community class
## -Modularity value: 0.6643943
## -Number of clusters: 12
## [1] "Performing clustering using RPhenograph..."
## Finding nearest neighbors...DONE ~ 47.58 s
## Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 13.265 s
## Build undirected graph from the weighted links...DONE ~ 3.783 s
## Run louvain clustering on the graph ...DONE ~ 12.127 s
## Return a community class
## -Modularity value: 0.6806686
## -Number of clusters: 14
Cluster annotation (OPTIONAL)
Having louvain labels is a good basis for unbiased analysis. But perhaps, after looking at the data, you are able to discern the biological identity of those louvain clusters and wish to have your plots reflect that instead of just numbers.
You can provide a data.frame
, where one column (called
louvain) contains ALL the louvain clusters from your
analysis. All additional columns (that you name however you see fit)
contain your labels of choice.
The function provided for this is called
annotate_clusters
and can be used multiple times
louvain | Cell Type | Granular population | |
---|---|---|---|
8 | 1 | CD4 Memory | CD4 Tcm |
1 | 2 | CD4 Effector | CD4 Tte |
9 | 3 | CD4 Memory | CD4 Tem |
11 | 4 | CD4 Naive | CD4 Naive |
12 | 5 | CD4 Memory | CD4 Tem |
5 | 6 | CD4 Memory | CD4 Tem |
3 | 7 | CD4 Naive | CD4 Naive |
2 | 8 | CD4 Memory | CD4 Tem |
6 | 9 | CD4 Naive | CD4 Naive |
10 | 10 | CD4 Naive | CD4 Naive |
4 | 11 | CD4 Memory | CD4 RARO Tscm |
NA | NA | NA | NA |
Plotting
Many visualizations are available within the pipeline, without needing to resort to FlowJo. That being said, if that’s your preferred way of doing things, look for how to export the flow object to FCS at the end.
HOW-TO: Save a plot
Before exploring different visual options, you need to be familiar with how to save your plots to file. All plotting options in the pipeline use ggplot2 and thus can use ggsave for exporting visuals to file.
Alternatively, you can use R’s default image engines:
pdf()
, svg()
, tiff()
,
jpeg()
, png()
, then the plot, followed by
dev.off()
When using a plotting function, you can either save it to a variable
(p <- plotting_function(...)
), or use it directly within
ggsave
(ggsave(plotting_function(...), device = "pdf")
).
For more details on ggsave, look in the documentation using
?ggsave
RECOMMENDED: when exporting UMAP plots, you should export to a jpeg format, to avoid very large files.
UMAP plot
The raw visualization of UMAP dimension reduction. Often known as
DimPlot
for Seurat aficionados.
Most of the options you find here will also be found
- You can
split.by
metadata discrete columns (up to 3 at a time) - You can
group.by
a metadata discrete column OR louvain clusters to color cells. - You can choose to display only a few selected cluster labels when
group.by
is set to"louvain"
by passing a vector of clusters, e.g.c(1,4,6)
## BASIC UMAP plot: simple call
# Example 1: Raw plot
plot_umap(flow_object = cd4_flow)
# Example 2a: Louvain clusters
plot_umap(flow_object = cd4_flow,
group.by = "louvain")
# Example 2b: Louvain clusters
plot_umap(flow_object = cd4_flow_biexp,
group.by = "louvain")
# Example 3: Louvain clusters separated by a metadata variable
plot_umap(flow_object = cd4_flow,
group.by = "louvain",
split.by = "discrete_group")
# Example 4: Select only a subset of cluster labels to show
plot_umap(flow_object = cd4_flow,
group.by = "louvain",
label_filter = c(1,4,5))
# Example 5: Color by cluster annotations OR experimental groups
plot_umap(flow_object = cd4_flow,
group.by = "Granular population")
# Example 6: Remove labels, and replace with legend
plot_umap(flow_object = cd4_flow,
group.by = "Granular population",
label = F)
# Example 7: Louvain clusters separated by multiple metadata variables
plot_umap(flow_object = cd4_flow,
group.by = "louvain",
split.by = c("discrete_group","discrete_group2"))
# Example 8: Color by one metadata variable and split by another.
plot_umap(flow_object = cd4_flow,
group.by = "discrete_group",
split.by = "discrete_group2",
dot_size = 1,
ncols = 1)
Examples
1: Default
2a: Louvain clusters (asinh transform)
2b: Louvain clusters (biexponential transform)
3: Split by metadata variable
4: Filter cluster labels
5: Color by cluster annotation
6: Color by experimental groups; no label
7: Split by multiple metadata variables
8: Combine split.by and group.by
Density Plot
Instead of displaying clusters or metadata groups, these plots are meant to display point density, in a pseudocolour distribution similar to FlowJo.
Similar to UMAP_plot
, you can split.by
up
to 3 variables.
#1
plot_density_umap(cd4_flow)
#2
plot_density_umap(cd4_flow,
split.by = "discrete_group",
dot_size = 0.5)
Examples
1: Default
2: Split
UMAP heat plot
Visualization of the expression of individual markers on the UMAP plot
You can also visualize fluorescent markers that were excluded.
#1
plot_umap_heat(cd4_flow, markers = "CD45RA" )
#2
plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 100) +
plot_umap_heat(cd4_flow, markers = "ID2", quantile_saturation = 10000)
#3
plot_umap_heat(cd4_flow, markers = c("CD45RA", "CD45RO", "CCR7", "EOMES", "CD27"), legend_height = 8)
#3 Combine heatmaps into a complex arrangement
Examples
1: Default
2: Quantile Saturation
3: Complex combination
Cluster visuals
Cluster heatmap
# 1 Basic relative expression heatmap
plot_cluster_heatmap(cd4_flow)
#2 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T)
#3 Include staining control
plot_cluster_heatmap(cd4_flow, show_control = T, annotation = "Granular population")
#4 Subset louvain clusters
plot_cluster_heatmap(cd4_flow, show_control = T, louvain_select = c(1,5,7,12,9))
#5 Include Absolute expression annotation
plot_cluster_heatmap(cd4_flow, show_control = T, add_mfi = T)
Examples
1: Default
2: With Controls
3: With cluster annotations
4: Louvain cluster subset
5: + MFI annotation
Cluster histogram
#1
plot_cluster_histograms(cd4_flow, marker = "CD45RA") #1
#2
plot_cluster_histograms(cd4_flow, marker = "EOMES", show_control = T) #2
#3
plot_cluster_histograms(cd4_flow, marker = "CD45RA", clusters = c(1,2,5)) #3
Examples
1: Default
2: Include control
3: Cluster Filter
Cluster jitterplot
#1
cluster_jitterplot(flow_object = cd4_flow,
annotation = "louvain")
#2
cluster_jitterplot(flow_object = cd4_flow,
annotation = "louvain",
louvain_select = c(1,5,7),
value = "absolute")
#3
cluster_jitterplot(flow_object = cd4_flow,
annotation = "louvain",
group = "discrete_group")
#4
cluster_jitterplot(flow_object = cd4_flow,
annotation = "Granular population",
group = "discrete_group")
Examples
1: Default
2: Absolute counts + cluster filter
3: Compare metadata groups
4: Using cluster annotations
Extract data
Statistical comparisons
While an exhaustive statistical support for all possible experimental designs is beyond the scope of this pipeline, we can provide simple support for simpler designs, such as correlation or discrete statistics with metadata outcomes.
These tests are meant for initial quick exploration of your data: they are unlikely to be the perfect test for your specific hypothesis. For instance, in outcomes with more than 2 levels, consider one-way ANOVA with a post-hoc TukeyHSD test if doing a parametric analysis. Or in a paired design, adapt the test accordingly.
These are not testing for statistical assumptions either: normality, equality of variances, etc.
R package rstatix is very helpful in quickly doing those additional tests.
Regression
#1: Correlate louvain cluster relative_frequencies with a given continuous outcome
correlate_clusters(cd4_flow, group.by = "louvain", outcome = "continuous_var", method = "spearman")
#2 Correlate cluster annotations instead of louvain
correlate_clusters(cd4_flow, group.by = "Granular population", outcome = "continuous_var", method = "pearson")
Examples
1: Louvain vs outcome
louvain | outcome | rho | statistic | p.value | method | padj |
---|---|---|---|---|---|---|
1 | continuous_var | 0.220 | 43.89140 | 0.641 | Spearman | 0.9328 |
10 | continuous_var | -0.036 | 58.01810 | 0.939 | Spearman | 0.9390 |
11 | continuous_var | 0.160 | 46.91855 | 0.728 | Spearman | 0.9328 |
2 | continuous_var | -0.090 | 61.04525 | 0.848 | Spearman | 0.9328 |
3 | continuous_var | -0.180 | 66.09050 | 0.699 | Spearman | 0.9328 |
4 | continuous_var | -0.230 | 69.11765 | 0.613 | Spearman | 0.9328 |
5 | continuous_var | 0.130 | 48.93665 | 0.788 | Spearman | 0.9328 |
6 | continuous_var | -0.180 | 66.09050 | 0.699 | Spearman | 0.9328 |
7 | continuous_var | 0.250 | 41.87330 | 0.585 | Spearman | 0.9328 |
8 | continuous_var | -0.490 | 83.24435 | 0.268 | Spearman | 0.9328 |
9 | continuous_var | -0.110 | 62.16540 | 0.814 | Spearman | 0.9328 |
2: Cluster annotation vs outcome
Granular population | outcome | p | statistic | p.value | conf.low | conf.high | method | padj |
---|---|---|---|---|---|---|---|---|
CD4 Naive | continuous_var | -0.051 | -0.1139557 | 0.914 | -0.7742781 | 0.7301467 | Pearson | 0.95 |
CD4 RARO Tscm | continuous_var | 0.150 | 0.3388990 | 0.748 | -0.6799366 | 0.8113503 | Pearson | 0.95 |
CD4 Tcm | continuous_var | 0.031 | 0.0686768 | 0.948 | -0.7394540 | 0.7660474 | Pearson | 0.95 |
CD4 Tem | continuous_var | 0.029 | 0.0659301 | 0.950 | -0.7400100 | 0.7655396 | Pearson | 0.95 |
CD4 Tte | continuous_var | 0.057 | 0.1268631 | 0.904 | -0.7274442 | 0.7765762 | Pearson | 0.95 |
Discrete statistics
Akin to correlation against a continuous variable, you can also provide a discrete metadata variable for contrast statistics (T-tests, Wilcoxon tests, etc).
#1: Louvain cluster relative_frequencies non-parametric with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "non-parametric")
#2: Louvain cluster relative_frequencies T-test with a given discrete outcome
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "discrete_group", method = "parametric")
Examples
1: Non-Parametric
louvain | group1 | group2 | n1 | n2 | statistic | p.value | Test | outcome |
---|---|---|---|---|---|---|---|---|
1 | Control | Test | 4 | 3 | 7.0 | 0.857 | Non-parametric Wilcoxon Test | discrete_group |
10 | Control | Test | 4 | 3 | 4.0 | 0.629 | Non-parametric Wilcoxon Test | discrete_group |
11 | Control | Test | 4 | 3 | 4.0 | 0.629 | Non-parametric Wilcoxon Test | discrete_group |
2 | Control | Test | 4 | 3 | 7.0 | 0.857 | Non-parametric Wilcoxon Test | discrete_group |
3 | Control | Test | 4 | 3 | 6.0 | 1.000 | Non-parametric Wilcoxon Test | discrete_group |
4 | Control | Test | 4 | 3 | 10.0 | 0.229 | Non-parametric Wilcoxon Test | discrete_group |
5 | Control | Test | 4 | 3 | 8.0 | 0.629 | Non-parametric Wilcoxon Test | discrete_group |
6 | Control | Test | 4 | 3 | 10.0 | 0.229 | Non-parametric Wilcoxon Test | discrete_group |
7 | Control | Test | 4 | 3 | 2.0 | 0.229 | Non-parametric Wilcoxon Test | discrete_group |
8 | Control | Test | 4 | 3 | 10.0 | 0.229 | Non-parametric Wilcoxon Test | discrete_group |
9 | Control | Test | 4 | 3 | 4.5 | 0.719 | Non-parametric Wilcoxon Test | discrete_group |
2: Parametric
louvain | group1 | group2 | n1 | n2 | statistic | df | p.value | Test | outcome |
---|---|---|---|---|---|---|---|---|---|
1 | Control | Test | 4 | 3 | 0.6475306 | 4.995724 | 0.5460 | Parametric 2 tailed T-test | discrete_group |
10 | Control | Test | 4 | 3 | -0.8266370 | 2.711678 | 0.4750 | Parametric 2 tailed T-test | discrete_group |
11 | Control | Test | 4 | 3 | -1.0853855 | 3.962754 | 0.3390 | Parametric 2 tailed T-test | discrete_group |
2 | Control | Test | 4 | 3 | 0.4935197 | 4.510478 | 0.6450 | Parametric 2 tailed T-test | discrete_group |
3 | Control | Test | 4 | 3 | -0.0672052 | 3.422912 | 0.9500 | Parametric 2 tailed T-test | discrete_group |
4 | Control | Test | 4 | 3 | 1.3411261 | 4.998986 | 0.2380 | Parametric 2 tailed T-test | discrete_group |
5 | Control | Test | 4 | 3 | 0.5560583 | 4.430742 | 0.6050 | Parametric 2 tailed T-test | discrete_group |
6 | Control | Test | 4 | 3 | 1.7331882 | 4.986497 | 0.1440 | Parametric 2 tailed T-test | discrete_group |
7 | Control | Test | 4 | 3 | -1.7002722 | 4.945435 | 0.1500 | Parametric 2 tailed T-test | discrete_group |
8 | Control | Test | 4 | 3 | 2.1301794 | 4.929100 | 0.0872 | Parametric 2 tailed T-test | discrete_group |
9 | Control | Test | 4 | 3 | -0.4398776 | 3.708202 | 0.6840 | Parametric 2 tailed T-test | discrete_group |
Given that the function provided for this only looks at one column, this would not work if you wanted to look at the interaction of 2 variables (e.g. a 2 group design at 2 timepoints).
To circumvent this, you can create an aggregated column in the metadata like in the following example, and perform the test on that new column
new_meta <- pData(cd4_flow$flowSet) %>%
unite(col = "aggregate_outcome", discrete_group, discrete_group2, sep = "::" ) %>%
mutate(aggregate_outcome = as.factor(aggregate_outcome)) %>%
rownames_to_column("SampleID")
cd4_flow <- add_sample_metadata(cd4_flow, metadata = new_meta)
clusters_compare_groups(cd4_flow, group.by = "louvain", outcome = "aggregate_outcome", method = "parametric")
Exporting your analysis
Exporting to a FCS file
A FCS file can hold the new UMAP and louvain values generated. It can also hold the metadata, albeit not as words. A workaround is to encode the discrete variables as number, and export a csv key as a dictionary.
Thus, you’d be able to explore your data in an environment you might be more familiar with.
## [1] "output/data/demo_unsupervised.fcs"
Exporting to RDS
If you want to save your analysis as a whole, you can export your flow object to R’s RDS format, to then either load at a later time or share.
Export to spreadsheet
If you wish to perform your statistical analyses or some plotting
outside of R, and wish to access the values embedded in the flow object,
you can use the export_to_spreadsheet
function.
Each tab will represent a different type of values, i.e.:
- The fluorescence values annotated with metadata variables
- Cluster counts/frequencies
- Cluster Median Fluorescence Intensity
Session info
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
## [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
## [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Toronto
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.2 flowCore_2.12.0 unsupflowhelper_1.2
## [4] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
## [7] purrr_1.0.1 readr_2.1.4 tidyr_1.2.0
## [10] tibble_3.2.1 ggplot2_3.4.2 tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] mathjaxr_1.6-0 RColorBrewer_1.1-3
## [3] rstudioapi_0.15.0 jsonlite_1.8.7
## [5] magrittr_2.0.3 farver_2.1.1
## [7] rmarkdown_2.23 fs_1.6.2
## [9] zlibbioc_1.46.0 vctrs_0.6.3
## [11] ggpointdensity_0.1.0 base64enc_0.1-3
## [13] rstatix_0.7.0 htmltools_0.5.5
## [15] haven_2.5.0 broom_1.0.0
## [17] cellranger_1.1.0 googlesheets4_1.1.1
## [19] sass_0.4.6 KernSmooth_2.23-22
## [21] bslib_0.5.0 plyr_1.8.8
## [23] lubridate_1.9.2 cachem_1.0.8
## [25] igraph_1.5.0 mime_0.12
## [27] lifecycle_1.0.3 pkgconfig_2.0.3
## [29] Matrix_1.6-0 R6_2.5.1
## [31] fastmap_1.1.1 shiny_1.7.4.1
## [33] numDeriv_2016.8-1.1 digest_0.6.33
## [35] ggnewscale_0.4.9 colorspace_2.1-0
## [37] S4Vectors_0.38.1 irlba_2.3.5.1
## [39] ggpubr_0.4.0 labeling_0.4.2
## [41] metadat_1.2-0 cytolib_2.12.0
## [43] fansi_1.0.4 colorRamps_2.3.1
## [45] timechange_0.2.0 polyclip_1.10-4
## [47] httr_1.4.6 abind_1.4-5
## [49] compiler_4.3.1 gargle_1.5.1
## [51] withr_2.5.0 ConsensusClusterPlus_1.64.0
## [53] backports_1.4.1 metafor_4.2-0
## [55] carData_3.0-5 DBI_1.1.3
## [57] hexbin_1.28.3 highr_0.10
## [59] ggforce_0.4.1 ggsignif_0.6.4
## [61] MASS_7.3-60 FlowSOM_2.4.0
## [63] tools_4.3.1 googledrive_2.1.1
## [65] httpuv_1.6.11 glue_1.6.2
## [67] nlme_3.1-162 promises_1.2.0.1
## [69] grid_4.3.1 checkmate_2.2.0
## [71] Rtsne_0.16 cluster_2.1.4
## [73] generics_0.1.3 gtable_0.3.3
## [75] tzdb_0.4.0 MetaCyto_1.18.0
## [77] rmdformats_1.0.4 data.table_1.14.8
## [79] hms_1.1.3 xml2_1.3.5
## [81] car_3.1-0 utf8_1.2.3
## [83] BiocGenerics_0.46.0 RcppAnnoy_0.0.21
## [85] ggrepel_0.9.3 RANN_2.6.1
## [87] pillar_1.9.0 later_1.3.1
## [89] tweenr_2.0.2 lattice_0.21-8
## [91] RProtoBufLib_2.12.0 CytoML_2.8.0
## [93] tidyselect_1.2.0 RBGL_1.76.0
## [95] ggcyto_1.24.1 pbapply_1.7-2
## [97] knitr_1.43 gridExtra_2.3
## [99] bookdown_0.34 flowWorkspace_4.12.0
## [101] Rphenograph_0.99.1 scattermore_0.8
## [103] stats4_4.3.1 xfun_0.39
## [105] Biobase_2.60.0 matrixStats_1.0.0
## [107] pheatmap_1.0.12 stringi_1.7.8
## [109] ncdfFlow_2.46.0 yaml_2.3.7
## [111] evaluate_0.21 codetools_0.2-19
## [113] Rgraphviz_2.44.0 graph_1.78.0
## [115] cli_3.6.1 uwot_0.1.16
## [117] RcppParallel_5.1.7 xtable_1.8-4
## [119] munsell_0.5.0 jquerylib_0.1.4
## [121] modelr_0.1.8 Rcpp_1.0.11
## [123] readxl_1.4.3 dbplyr_2.2.1
## [125] fastcluster_1.2.3 XML_3.99-0.14
## [127] parallel_4.3.1 ellipsis_0.3.2
## [129] assertthat_0.2.1 reprex_2.0.2
## [131] scales_1.2.1 ggridges_0.5.4
## [133] writexl_1.4.2 crayon_1.5.2
## [135] rlang_1.1.1 cowplot_1.1.1
## [137] rvest_1.0.3
RPM Bioinfo Solutions, adam.nicolas.pelletier@rpmbioinfo-solutions.com↩︎
Emory University, ashish.arunkumar.sharma@emory.edu↩︎
Allen Institute for Immunology↩︎