FAQ

  1. What expression platforms are compatible with CellNet?
  2. Can I use RNA-Seq data in CellNet?
  3. I have lots of microarray data for my favorite cell type/tissue, which is not already part of CellNet. How can I use this data to train another cell or tissue type for CellNet?
  4. Can I add a different microarray platform or a different species?
  5. The tissues used as training data are made up of many different cell types, but cell engineering seeks to derive relatively pure cell populations. How does cellular heterogeneity of tissues influence CellNet results?
  6. Most of the training data is from in vivo derived samples, but CellNet is typically applied to in vitro derived samples. How does cell culture affect CellNet results?
  7. How do I cite CellNet?
  8. What is the Network Influence Score (NIS)?
  9. How was CellNet made?
  10. How was the complete training data set generated?
  11. How were the C/T GRNs reconstructed?
  12. How were the C/T classifiers made?
  13. I just ran CellNet. How do I interpret the results?
  14. I am having problems running CellNet through the web site. What should I do?
Q: What expression platforms are compatible with CellNet?
A: The following microarray platforms can be immediately analyzed in CellNet (as of June 21st 2014):
  • Affymetrix Mouse Genome 430 2.0 Array (GPL1261)
  • Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version] (GPL6246)
  • Illumina MouseRef-8 v2.0 expression beadchip (GPL6885) (only available in stand-alone version)
  • Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)
  • Affymetrix Human Gene 1.0 ST Array [transcript (gene) version] (GPL6244)
Q: Can I use RNA-Seq data in CellNet?
A: We are focusing our current efforts on training an RNA-Seq version of CellNet, but this functionality is not available yet. Please stay tuned. If you have unpublished RNA-Seq data and are interested in cooperating with us to expand CellNet to RNA-Seq for your cell type or tissue, please contact us directly.
Q: I have lots of microarray data for my favorite cell type/tissue, which is not already part of CellNet. How can I use this data to train another cell or tissue type for CellNet?
A: If you have a sufficient number of profiles for a cell type or tissue on one of the platforms for which CellNet is already trained, then there are several ways to incorporate this data into CellNet. However, this is only possible by using the stand-alone CellNet code. Please see our GitHub project page for more details.
Q: Can I add a different microarray platform or a different species?
A: If you have sufficient data for your platform or species, then you can train up your own CellNet. However, this is only possible by using the stand-alone CellNet code. Please see our GitHub project page for more details.
Q: The tissues used as training data are made up of many different cell types, but cell engineering seeks to derive relatively pure cell populations. How does cellular heterogeneity of tissues influence CellNet results?
A: To address this question, we applied CellNet to primary, purified cells. We found that the classification scores and gene regulatory network status were very high, suggesting that the cell and tissue specific gene regulatory networks capture generic aspects of the tissues that distinguish them from the other cells and tissues in the training data set. Please see Figure 2A-F and the associated text in Cahan et al 2014 for more details.
Q: Most of the training data is from in vivo derived samples, but CellNet is typically applied to in vitro derived samples. How does cell culture affect CellNet results?
A: To address this question, we applied CellNet to neurons and hepatocytes cultured for up to eight days and 72 hours, respectively. We found that the classification and GRN status strongly reflected the tissue of origin. There was a slight degradation of classification and GRN status in cultured cells, but this was not as substantial as what we saw in engineered cell populations. Please see Figure 2G-I and associated text in Cahan et al 2014 for more details.
Q: How do I cite CellNet?
A: Please see Cahan et al 2014.
Q: What is the Network Influence Score (NIS)?
A: The network influence score is meant to quantify the extent to which a transcriptional regulator (TR) and its targets are dysregulated with respect to a specific tissue or cell type. It integrates the following variables in this rating of a TR:
  • the number of predicted targets
  • the expression level of the TR in the C/T of the training data
  • the extent to which the transcriptional regulator is dysregulated (either too high or too low)
  • the extent to which its targets are dysregulated (also too high or too low)
Please see Figure 1F in Cahan et al 2014 and associated text for more details.
Q: How was CellNet made?
A: Making CellNet entailed the following three activities. First, we had to generate the complete training data that would serve to estimate the expression distributions of genes from each platform. Second, we had to reconstruct gene regulatory networks. Finally, we had to use the GRNs to train cell and tissue type classifiers. You can see below for more details on each of these steps.
Q: How was the complete training data set generated?
A: Gene Expression Omnibus was queried for expression profiles on the Affymetrix Mouse 430.2, Illumina8v2, HG133 plus 2, MoGene 1.0, and HuGene 1.0 platforms. Affymetrix microarrays not passing the GNUSE quality control metric were not used to train CellNet. Probe intensities from raw .CEL files were background corrected and summarized into probeset values. Then, values of probesets mapping to the same gene were averaged. Each array was normalized by dividing each gene expression value by the total gene expression per array. The end result of this step was one expression matrix per platform.
  1. Gene Expression Omnibus
  2. Quality control
  3. Background correct
  4. Summarize probes
  5. Gene average probesets
  6. Normalize (proportional)
  7. Complete training data set
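
The last two processing steps (gene averaging and proportional normalization) can be illustrated with a minimal Python sketch. This is not the CellNet code itself; the function names, the probeset-to-gene mapping, and the toy values are hypothetical and only meant to show the arithmetic described above.

```python
from collections import defaultdict


def average_probesets_per_gene(probeset_values, probeset_to_gene):
    """Average the values of all probesets that map to the same gene."""
    grouped = defaultdict(list)
    for probeset, value in probeset_values.items():
        gene = probeset_to_gene.get(probeset)
        if gene is not None:  # skip probesets without a gene annotation
            grouped[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in grouped.items()}


def normalize_proportional(gene_values):
    """Divide each gene's value by the total expression of the array."""
    total = sum(gene_values.values())
    if total == 0:
        raise ValueError("array has zero total expression")
    return {gene: value / total for gene, value in gene_values.items()}


# Hypothetical toy array: two probesets map to GeneA, one to GeneB.
probeset_values = {"ps1": 10.0, "ps2": 30.0, "ps3": 60.0}
probeset_to_gene = {"ps1": "GeneA", "ps2": "GeneA", "ps3": "GeneB"}

gene_values = average_probesets_per_gene(probeset_values, probeset_to_gene)
normalized = normalize_proportional(gene_values)
print(gene_values)   # per-gene averages
print(normalized)    # per-gene fractions of the array total
```

After this normalization the values of each array sum to 1, so arrays with different overall intensities become comparable.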
Q: How were the C/T GRNs reconstructed?
A: First, approximately equal numbers of arrays were sampled from each cell and tissue (C/T) and GEO accession and combined into one data set. The resulting expression matrix was quantile normalized to create a data set for input to the Context Likelihood of Relatedness GRN reconstruction algorithm. Then, GRN-based predictions were assessed by comparison to the gold standards, a GRN threshold of 4 was selected as the optimal cutoff, and the resulting trimmed GRNs were input to InfoMap to identify communities, or sub-networks. Gene set enrichment analysis was performed on each sub-network to identify C/T enriched sub-networks, and sub-networks enriched in the same C/Ts were merged into single C/T GRNs.
  1. Complete training data set
  2. Select subset of arrays
  3. Quantile normalize
  4. Context Likelihood of Relatedness GRN reconstruction (all arrays)
  5. Repeat (3) after limiting data set to only samples derived from ectoderm
  6. Repeat (4) for mesoderm, endoderm, and germ-cell associated lineages
  7. ENCODE-based GRN calibration and trimming
  8. Apply InfoMap sub-network detection algorithms to each trimmed GRN
  9. Find cell and tissue type enriched sub-networks by gene set enrichment
  10. Merge C/T sub-networks into C/T GRNs
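
Two of the steps above, quantile normalization and trimming the GRN at a cutoff, can be sketched in a few lines of Python. This is an illustration only, not the CellNet implementation: ties are broken by position, the edge scores are made-up numbers, and treating the cutoff as "keep scores greater than or equal to 4" is an assumption.

```python
def quantile_normalize(arrays):
    """Quantile normalize equal-length arrays (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n_genes = len(arrays[0])
    if any(len(a) != n_genes for a in arrays):
        raise ValueError("all arrays must have the same number of genes")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_genes)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def trim_edges(edge_scores, threshold=4.0):
    """Keep only regulator-target edges whose score reaches the cutoff."""
    return {edge: s for edge, s in edge_scores.items() if s >= threshold}


# Hypothetical toy data: two arrays, three genes.
qn = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])

# Hypothetical edge scores keyed by (regulator, target).
edges = {("TF1", "g1"): 5.2, ("TF1", "g2"): 3.1, ("TF2", "g1"): 4.0}
trimmed = trim_edges(edges)
```

After quantile normalization every array has the same set of values, so differences between arrays reflect only the ranking of genes within each array.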
Q: How were the C/T classifiers made?
A: For each platform, a binary Random Forest classifier was trained for each C/T. Initially, approximately 50% of arrays from the complete data set were randomly selected and used to train the classifiers. The performance of these classifiers was evaluated by classifying the remaining, independent arrays, which were not used to train the classifiers and were from different studies. The performance of these classifiers is presented in Figure 1 and Figure S1. To improve the overall power of the classifiers, we used the complete data set to train the classifiers that are used in the final version of CellNet.
  1. Complete training data set
  2. Select subset of arrays
  3. Train one binary Random Forest classifier per C/T
  4. Assess performance of classifiers using independent, held-out samples
  5. Train one binary Random Forest classifier per C/T using the complete training data set
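
The evaluation design above (roughly half of the arrays for training, held-out arrays from different studies, one binary label vector per C/T) can be sketched as follows. This is a hedged illustration rather than the CellNet code: the sample tuples, study identifiers, and function names are hypothetical, and the Random Forest training itself is not shown.

```python
import random
from collections import defaultdict


def split_by_study(samples, train_fraction=0.5, seed=0):
    """Split samples so that no study contributes to both sets.

    samples: list of (sample_id, study_id, cell_type) tuples.
    Whole studies are assigned to the training set until it holds about
    train_fraction of all samples; remaining studies are held out.
    """
    by_study = defaultdict(list)
    for sample in samples:
        by_study[sample[1]].append(sample)
    studies = sorted(by_study)
    random.Random(seed).shuffle(studies)
    target = train_fraction * len(samples)
    train, held_out = [], []
    for study in studies:
        bucket = train if len(train) < target else held_out
        bucket.extend(by_study[study])
    return train, held_out


def binary_labels(samples, cell_type):
    """One-vs-rest labels: 1 for the given C/T, 0 for everything else."""
    return [1 if s[2] == cell_type else 0 for s in samples]


# Hypothetical sample annotations.
samples = [
    ("s1", "study_A", "liver"), ("s2", "study_A", "liver"),
    ("s3", "study_B", "heart"), ("s4", "study_B", "heart"),
    ("s5", "study_C", "liver"), ("s6", "study_D", "heart"),
]
train, held_out = split_by_study(samples)
liver_labels = binary_labels(train, "liver")
```

Splitting by study rather than by array keeps study-specific batch effects out of the performance estimate.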
Q: I just ran CellNet. How do I interpret the results?
A: We highly recommend that you read Cahan et al. Cell 2014, which describes in detail how CellNet was made and several example analyses of cell engineering experiments. In brief, CellNet compares your expression data to a large compendium of expression data sets to determine the extent to which cell and tissue specific gene regulatory networks (GRNs) were established. The results file names are prepended with the analysis name that you entered in Step 4. The three main outputs of CellNet included in your results are:
  1. Cell and tissue type classification. This is a data table in which each row represents one cell or tissue type, and each column represents one of your samples. The file name is your analysis name (which you entered in Step 4) followed by _classRes.csv. The values represent the classification score, and reflect the probability that a sample is indistinguishable from a cell or tissue type by gene expression. We typically represent this as a heatmap on a black->green->yellow scale (0.0->0.5->1.0), where the rows and columns are ordered as in the data table. This heatmap and all other plots are provided in PDF format, named [date of analysis_plots.pdf]. The panels can easily be imported into graphics programs such as Adobe Illustrator for re-formatting.
  2. GRN status. This is a data table in which each row represents one GRN associated with a particular cell or tissue type, and each column represents one of your samples. The values represent the extent to which the GRN is established to a level equivalent to that seen in the associated cell or tissue type. The file is named grnScores.csv in your results. We represent this as a bar plot, where replicates are combined into one bar (colored light blue), and the training data of the starting and target cell types are also shown as points of reference (dark blue).
  3. Network influence score. The network influence score (NIS) is computed for each transcriptional regulator in the target cell type GRN according to the extent to which it is either too highly or too lowly expressed in your terminal sample. The NIS also integrates other information into scoring transcriptional regulators, including the extent to which predicted target genes are dysregulated, the number of transcriptional targets, and the expression level of the regulator in the target cell type. Positive values indicate that the regulator is too highly expressed, and negative values indicate that the regulator is too lowly expressed. The results include a file named NIS.csv that contains the NIS of every transcriptional regulator of the target cell type GRN in all of the samples. In the paper, we visualized the NIS as a bar plot. In your results, we represent the NIS as a heatmap, and only display the 50 lowest-scoring transcriptional regulators.

Other output. We also include a log file that lists the steps in your analysis, and expNorm.csv, which contains your normalized data.
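
If you want to rank regulators yourself rather than rely on the heatmap, a short Python sketch along these lines may help. The table layout assumed here (one row per regulator, a first column named "regulator", one column per sample) and all of the values are hypothetical; check the header of your own NIS.csv before adapting it.

```python
import csv
import io


def lowest_scoring_regulators(csv_text, sample, n=50, name_column="regulator"):
    """Return the n regulators with the lowest score in one sample.

    Assumes one row per regulator and one column per sample; the column
    names are assumptions, not a documented file format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row[name_column], float(row[sample])) for row in reader]
    rows.sort(key=lambda item: item[1])  # most negative first
    return rows[:n]


# Made-up example table with three regulators and two samples.
example = """regulator,sample1,sample2
TR_A,-12.5,-3.0
TR_B,-8.0,1.5
TR_C,6.0,0.5
"""
lowest = lowest_scoring_regulators(example, "sample1", n=2)
print(lowest)
```

To use it on real results, read the file contents first, for example with open("NIS.csv").read(), and pass the name of one of your sample columns.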

Q: I am having problems running CellNet through the web site. What should I do?
A: There are several issues that could impede the analysis of your data:
  • Data upload: Were all of the files listed in your sample table selected for upload, or included in the compressed file? Did you include at least two samples?
  • Sample annotation file: We strongly recommend that you use the provided template file to make your own sample annotation table. If you make it in Microsoft Excel, it is crucial that you save the file as a .csv.
  • Platform: Does the platform that you selected in Step 6 match the platform that was used to generate all of your data files?
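
The data upload checks above can be automated before you submit. The following Python sketch is only an illustration: the column name "file_name" and the example file names are assumptions, so adjust them to match the provided template.

```python
import csv
import io


def check_upload(sample_table_csv, uploaded_files, file_column="file_name"):
    """Return a list of problems found in a sample table and upload set.

    file_column is a hypothetical column name; use whatever column in your
    sample annotation table holds the data file names.
    """
    problems = []
    rows = list(csv.DictReader(io.StringIO(sample_table_csv)))
    if len(rows) < 2:
        problems.append("fewer than two samples listed")
    uploaded = set(uploaded_files)
    for row in rows:
        name = (row.get(file_column) or "").strip()
        if not name:
            problems.append("row with no file name")
        elif name not in uploaded:
            problems.append(f"listed but not uploaded: {name}")
    return problems


# Hypothetical sample table and upload set.
table = """sample_name,file_name
ctrl_1,a.CEL
ctrl_2,b.CEL
"""
print(check_upload(table, ["a.CEL"]))
print(check_upload(table, ["a.CEL", "b.CEL"]))
```

An empty list means that every file named in the table was found and that at least two samples are listed.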

If you have addressed all of these potential issues and are still having problems, please visit the CellNet web application user group at https://groups.google.com/forum/#!forum/cellnet_webapp.