Running ONTraC on a Xenium dataset

Download the data

The human breast cancer Xenium dataset originally published by Janesick, et al. is available at 10X genomics

For running this example, download the In Situ Sample 1, Replicate 1 (Xenium_FFPE_Human_Breast_Cancer_Rep1_outs.zip) and the Cell_Barcode_Type_Matrices.xlsx file

Data preparation

For this step, we will use the R package Giotto .

  • Read the Xenium data and create a Giotto object

library(Giotto)

expression_matrix <- get10Xmatrix("data_xenium/outs/cell_feature_matrix/",
                     gene_column_index = 2)

expression_matrix <- expression_matrix$`Gene Expression`

spatial_locs <- data.table::fread("data_xenium/outs/cells.csv.gz")
spatial_locs <- spatial_locs[, c("x_centroid", "y_centroid", "cell_id")]
colnames(spatial_locs) <- c("sdimx", "sdimy", "cell_ID")

instructions <- createGiottoInstructions(python_path = NULL,
                                       save_plot = TRUE,
                                       show_plot = FALSE,
                                       return_plot = FALSE,
                                       save_dir = "results_xenium")

xenium <- createGiottoObject(expression = expression_matrix,
                           spatial_locs = spatial_locs,
                           instructions = instructions)

cell_types <- cell_types$Cluster

xenium <- addCellMetadata(xenium,
                        new_metadata = cell_types)
  • Filter the cells with low counts

xenium <- filterGiotto(xenium,
                     feat_det_in_min_cells = 100,
                     min_det_feats_per_cell = 10)
  • Create the input table for ONTraC

metadata <- pDataDT(xenium)

coordinates <- getSpatialLocations(xenium,
                                 output = "data.table")

# create data table
dataset_table <- data.frame(Cell_ID = metadata$cell_ID,
                           Sample = rep("sample1_rep1", 163565),
                           Cell_Type = metadata$cell_types,
                           x = coordinates$sdimx,
                           y = coordinates$sdimy)


# subset region
write.table(dataset_table[dataset_table$x > 3000 & dataset_table$x < 4500  & dataset_table$y > 1000 & dataset_table$y < 2500, ],
            "dataset.csv", sep = ",",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

Running ONTraC

If your default shell is not Bash, please adjust this code.

ONTraC will run on CPU if CUDA is not available.

conda activate ONTraC
ONTraC --meta-input dataset.csv \
--NN-dir selection/output_xenium/NN \
--GNN-dir selection/output_xenium/GNN \
--NT-dir selection/output_xenium/NT \
--device cpu --epochs 300 -s 42 --patience 100 \
--min-delta 0.001 --min-epochs 50 --lr 0.03 --hidden-feats 4 \
-k 4 --n-neighbors 50 --modularity-loss-weight 0.3 \
--regularization-loss-weight 0.1 --purity-loss-weight 300 \
--beta 0.03 > xenium_final.log

Results visualization

Please see the visualization tutorials for details.

  • Create all the plots with the ONTraC_analysis command.

ONTraC_analysis --meta-input dataset.csv \
--NN-dir selection/output_xenium/NN \
--GNN-dir selection/output_xenium/GNN \
--NT-dir selection/output_xenium/NT \
-o selection/analysis_xenium -l xenium_final.log
Cell level NT score