Data analysis workflow.
The Dharmacon SMARTpool protein-coding library comprised 18,120 genes (RefSeq v.27) and was screened in 384-well format, with duplicate plates per transfection (i). Raw cell count (the total number of cells identified from the Hoechst stain per well) and Ecad score were averaged over the duplicate plates for all controls and SMARTpool siRNAs. Mock control wells were averaged per plate (16 wells per primary screen plate and 31 wells per deconvolution screen plate). The raw cell count and Ecad score for every SMARTpool siRNA and the remaining control siRNAs were then normalised to the mock control from the same plate (ii).
siRNAs were excluded from further analysis based on low cell count (iii). siPLK1 was used as a toxicity control to define the cut-off for low cell count and to confirm that transfection conditions were reproducible between transfections. siRNAs were binned into three cell viability categories based on cell count: CV1 (≥0.7-fold vs mock), CV2 (≥0.5-fold and <0.7-fold vs mock) and Low Count (LC; <0.5-fold vs mock). The target cell count per well was set to 3000, with a maximum of 25 fields, for binning into the CV1 category. The minimum number of cells per field was set to 14, and the maximum number of consecutive sparse fields (i.e. fields containing fewer than 14 cells) was set to 6. siRNAs in the LC category (i.e. <1500 cells in 25 fields of view) were excluded from further analysis.
siRNAs were then filtered on Ecad score (iv). siZEB1 and siCDH1 were used as Ecad score positive controls to define the high and low Ecad thresholds. siRNAs were binned into three Ecad score categories: High (siZEB1-like siRNAs; Ecad score ≥1.6-fold vs mock), NC (Ecad score >0.2-fold and <1.6-fold vs mock) and Low (siCDH1-like siRNAs; Ecad score ≤0.2-fold vs mock). siRNAs in the NC category were not analysed further (v).
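The normalisation and binning rules above can be sketched as a small filter over one plate. This is a minimal illustration, not the authors' analysis code: it assumes each siRNA's duplicate-plate values are supplied as plain lists, that the mock mean is taken over all mock wells passed in, and that siRNA names and values are hypothetical. Only the fold-change thresholds come from the text; CV2 siRNAs are kept because only the LC category is excluded.

```python
from statistics import mean

# Fold-change thresholds vs. the per-plate mock mean, as stated in the text.
CV1_MIN = 0.7        # CV1: >= 0.7-fold
CV2_MIN = 0.5        # CV2: >= 0.5 and < 0.7-fold; below this is Low Count (LC)
ECAD_HIGH_MIN = 1.6  # High (siZEB1-like): >= 1.6-fold
ECAD_LOW_MAX = 0.2   # Low (siCDH1-like): <= 0.2-fold


def fold_vs_mock(replicates, mock_wells):
    """Average duplicate-plate values, then divide by the mean of the mock wells."""
    return mean(replicates) / mean(mock_wells)


def viability_bin(count_fold):
    """Assign a cell viability category from the normalised cell count."""
    if count_fold >= CV1_MIN:
        return "CV1"
    if count_fold >= CV2_MIN:
        return "CV2"
    return "LC"


def ecad_bin(ecad_fold):
    """Assign an Ecad score category from the normalised Ecad score."""
    if ecad_fold >= ECAD_HIGH_MIN:
        return "High"
    if ecad_fold <= ECAD_LOW_MAX:
        return "Low"
    return "NC"


def primary_hits(sirnas, mock_counts, mock_ecad):
    """Apply steps (ii)-(v) to one plate.

    sirnas maps a (hypothetical) siRNA name to a pair of lists:
    (replicate cell counts, replicate Ecad scores).
    """
    hits = {}
    for name, (counts, ecad) in sirnas.items():
        if viability_bin(fold_vs_mock(counts, mock_counts)) == "LC":
            continue  # (iii): drop low-count siRNAs
        category = ecad_bin(fold_vs_mock(ecad, mock_ecad))
        if category != "NC":
            hits[name] = category  # (iv)/(v): keep only High or Low
    return hits


if __name__ == "__main__":
    # Made-up values for illustration only.
    mock_counts = [3000, 3000, 2800, 3200]
    mock_ecad = [0.5, 0.5]
    plate = {
        "siA": ([2900, 3100], [0.9, 0.9]),    # 1.0-fold count, 1.8-fold Ecad
        "siB": ([1200, 1400], [0.05, 0.05]),  # ~0.43-fold count -> LC
        "siC": ([1800, 1800], [0.05, 0.07]),  # 0.6-fold count (CV2), ~0.12-fold Ecad
        "siD": ([3000, 3000], [0.5, 0.6]),    # 1.1-fold Ecad -> NC
    }
    print(primary_hits(plate, mock_counts, mock_ecad))  # {'siA': 'High', 'siC': 'Low'}
```

Boundary values follow the stated inequalities: exactly 0.7-fold is CV1, exactly 1.6-fold is High and exactly 0.2-fold is Low.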
RNA from SW480 cells was sequenced and analysed. siRNAs targeting genes with an RPKM below 1 were removed from further consideration, on the premise that any change in Ecad score upon transfection with these siRNAs could be attributed to off-target effects (v). MicroRNA seed sequence analysis was carried out on the SMARTpool siRNA sequences of 454 genes and compared against three times as many genes that had a Z score of 0 in the primary screen (Dharmacon RNAi Technologies, unpublished program). siRNAs with sequence identity to the seed sequence of the miRNA-200 family were removed (vi). These miRNAs have a defined role in E-cadherin regulation, so any change in Ecad score caused by these siRNAs is likely to reflect a miRNA-like, seed-mediated effect rather than silencing of the specific target gene. siRNAs that passed these filtering steps were then screened in the deconvolution validation screen (vii).
The primary screen identified a large number of genes that scored Ecad Low, but their dynamic range was relatively small compared with that of the Ecad High genes (see S1I Table). In our experience, and that of other laboratories culturing SW480 cells, a higher proportion of cells have junctional E-cadherin when cells are grown to a high passage number and at increased cell density. Increasing the Ecad score dynamic range between mock and siCDH1 made differences in Ecad score between individual siRNAs easier to identify. For the Ecad Low screen, we increased the dynamic range by transfecting cells at a higher passage number and density (3000 cells/well), without affecting the siCDH1 Ecad score, which remained at zero. The Ecad High screen transfection was carried out under the same conditions as the primary screen. Each gene was tested with 4 individual siRNAs for Ecad score and was removed from further analysis if fewer than 2 of its siRNAs were active (vii). In total, 34 genes had a High Ecad score (siZEB1-like genes) and 167 had a Low Ecad score (siCDH1-like genes).
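The expression, seed and deconvolution filters can be sketched as below. This is a simplified illustration only: the seed analysis in the text used an unpublished Dharmacon program and a comparison against Z = 0 genes, neither of which is modelled here. The sketch instead assumes an exact match between positions 2-8 of each siRNA guide strand and the two commonly cited miR-200 family seeds (these sequences should be checked against miRBase), treats genes absent from the RNA-seq table as unexpressed, and uses hypothetical gene names and sequences.

```python
# Thresholds taken from the text.
MIN_RPKM = 1.0          # (v): genes with RPKM < 1 are dropped
MIN_ACTIVE_SIRNAS = 2   # (vii): genes need >= 2 of 4 active individual siRNAs

# Seed sequences (nt 2-8 of the mature miRNA, RNA alphabet) of the two miR-200
# family subgroups; an assumption to check against miRBase.
MIR200_SEEDS = {"AAUACUG", "AACACUG"}  # miR-200b/200c/429 and miR-200a/141


def is_expressed(gene, rpkm_by_gene):
    """Genes missing from the RNA-seq table are treated as not expressed."""
    return rpkm_by_gene.get(gene, 0.0) >= MIN_RPKM


def has_mir200_seed(guide_strand):
    """Exact match of guide-strand positions 2-8 to a miR-200 family seed."""
    rna = guide_strand.upper().replace("T", "U")
    return rna[1:8] in MIR200_SEEDS


def filter_for_deconvolution(genes, rpkm_by_gene, pool_guides):
    """Keep genes that pass the RPKM (v) and miR-200 seed (vi) filters."""
    kept = []
    for gene in genes:
        if not is_expressed(gene, rpkm_by_gene):
            continue  # low expression: Ecad change likely off-target
        if any(has_mir200_seed(g) for g in pool_guides[gene]):
            continue  # any pool member with a miR-200 seed flags the gene
        kept.append(gene)
    return kept


def passes_deconvolution(active_calls):
    """active_calls holds one True/False per individual siRNA tested for a gene."""
    return sum(active_calls) >= MIN_ACTIVE_SIRNAS


if __name__ == "__main__":
    rpkm = {"GENE_A": 12.0, "GENE_B": 0.4, "GENE_C": 5.0}  # hypothetical genes
    guides = {
        "GENE_A": ["UCGAAGUACUCAGCGUAAGUU"],
        "GENE_B": ["UCGAAGUACUCAGCGUAAGUU"],
        "GENE_C": ["UAAUACUGCCGGGUAAUGAUG"],  # carries the AAUACUG seed
    }
    print(filter_for_deconvolution(["GENE_A", "GENE_B", "GENE_C"], rpkm, guides))
    # ['GENE_A']
    print(passes_deconvolution([True, True, False, False]))  # True
```

GENE_B is dropped for low expression and GENE_C for its miR-200 seed match, leaving only GENE_A for the deconvolution screen.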