The pancreatic cancer datasets also include a variety of platform

The pancreatic cancer datasets also include a variety of platforms and clinical focuses [27–31]. We identify genes to discriminate pancreatic cancer versus noncancer patient samples. These datasets contain different numbers of probes (or probesets in the case of Affymetrix datasets) due to differences in microarray platform. Within each dataset group, we reduce the number of probes in each dataset to a common shared set based on probe sequence similarity.

Table 1: Microarray datasets.

2.2. Rank Average Meta-Analysis

The meta-analysis-based FS method proposed in this paper ranks genes individually in each dataset and computes the average rank of each gene. Gene rank order is determined by a measure of differential expression (which can be any of a number of basic FS methods such as fold change or t-test), and we assume that this rank order is invariant to batch effects.

Using the average rank of a gene across several datasets to obtain the final multidataset rank order, we can infer (1) the relative strength of that gene in differentiating the patient samples of interest and (2) the consistency of the gene's differential expression across multiple studies.

The remainder of this section uses the following mathematical notation. $K$ is the total number of datasets, $M$ is the total number of genes in each dataset, and $N_k$ is the number of samples in dataset $k$, where $k = 1, \ldots, K$, and $N$ is the total number of samples in all datasets. We denote a gene $i$ in dataset $k$ as a vector

$$\vec{g}_{i,k} = \left(x_{1}^{i,k}, x_{2}^{i,k}, \ldots, x_{N_k}^{i,k}\right), \tag{1}$$

where $x_{j}^{i,k}$ is the expression value of gene $i$ of sample $j$ in dataset $k$.

In the case of sample aggregation (i.e., the naive method of meta-analysis), we denote a gene $i$ across all datasets with

$$\vec{g}_{i,*} = \left(\left(x_{1}^{i,1}, x_{2}^{i,1}, \ldots, x_{N_1}^{i,1}\right), \left(x_{1}^{i,2}, x_{2}^{i,2}, \ldots, x_{N_2}^{i,2}\right), \ldots, \left(x_{1}^{i,K}, x_{2}^{i,K}, \ldots, x_{N_K}^{i,K}\right)\right). \tag{2}$$

Using this notation, we can define a function, $r_{i,k,\Lambda} = R_{\Lambda}(\vec{g}_{i,k})$, to compute the rank, $r_{i,k,\Lambda}$, of a gene, $\vec{g}_{i,k}$, using a ranking algorithm denoted by $\Lambda$. A smaller rank indicates a greater degree of differential expression. In the case of sample aggregation, the ranking function takes the form $r_{i,*,\Lambda} = R_{\Lambda}(\vec{g}_{i,*})$. The average rank, $\bar{r}_i$, of a gene $i$ across all datasets, weighted by the number of samples in each dataset, $N_k$, is

$$\bar{r}_i = \frac{1}{N} \sum_{k=1}^{K} N_k R_{\Lambda_k}(\vec{g}_{i,k}). \tag{3}$$

Weighting gives preference to ranks from datasets with larger sample sizes.

We consider several basic FS, or gene ranking, methods as follows: fold change (FC), t-test (T), significance analysis of microarrays (SAM) [32], rank-sum (RS), minimum redundancy maximum relevance using the difference formulation (mRMRD), and mRMR using the quotient formulation (mRMRQ) [33]. We explicitly define the rank algorithm for the $k$th dataset as $\Lambda_k \in \{\mathrm{FC}, \mathrm{T}, \mathrm{SAM}, \mathrm{RS}, \mathrm{mRMRD}, \mathrm{mRMRQ}\}$.
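The sample-size-weighted rank average can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that fold change is scored as the absolute difference of class means on log-scale expression values, that ties are broken by gene index, and that every dataset lists the same M genes in the same order. The function names (`fold_change_scores`, `ranks_from_scores`, `rank_average`) and the toy data are hypothetical.

```python
from statistics import mean

def fold_change_scores(expr, labels):
    """Score each gene by |mean(class 1) - mean(class 0)|.

    expr: list of M gene rows, each with N_k expression values
          (assumed to be on a log scale).
    labels: list of N_k class labels, 1 = cancer, 0 = noncancer.
    """
    scores = []
    for gene_values in expr:
        pos = [v for v, y in zip(gene_values, labels) if y == 1]
        neg = [v for v, y in zip(gene_values, labels) if y == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores

def ranks_from_scores(scores):
    """Rank 1 goes to the largest score; ties are broken by gene index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, gene in enumerate(order, start=1):
        ranks[gene] = rank
    return ranks

def rank_average(datasets):
    """Weighted average rank per gene: (1/N) * sum_k N_k * rank_{i,k}.

    datasets: list of (expr, labels) pairs, one pair per dataset.
    """
    total_samples = sum(len(labels) for _, labels in datasets)
    n_genes = len(datasets[0][0])
    weighted = [0] * n_genes
    for expr, labels in datasets:
        n_k = len(labels)
        ranks = ranks_from_scores(fold_change_scores(expr, labels))
        for i in range(n_genes):
            weighted[i] += n_k * ranks[i]
    return [w / total_samples for w in weighted]

# Toy data: 3 genes, one dataset with 4 samples and one with 2 samples.
dataset_a = ([[5, 5, 1, 1], [2, 2, 1, 1], [3, 3, 3, 3]], [1, 1, 0, 0])
dataset_b = ([[1, 1], [4, 1], [2, 1]], [1, 0])
avg_ranks = rank_average([dataset_a, dataset_b])
print(avg_ranks)
```

In this toy case the larger dataset ranks gene 0 first and the smaller dataset ranks it last, so the weighting pulls its average rank toward the larger dataset's verdict. Any of the other ranking methods could be substituted for `fold_change_scores`, provided it returns one score per gene with larger values meaning stronger differential expression.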
