Identification of Novel Genes associated with Rheumatoid Arthritis using Differential Gene Co-Expression Analysis
Large RNA-Seq and DNA microarray data sets have enabled the study of relationships between genes through co-expression analyses. Changes in gene co-expression patterns are often related to changes in biological function, and differential co-expression network analyses have become an important step in the comparison of co-expression profiles of different conditions. The CSD method for differential co-expression analysis is used to identify conserved, differentiated and specific correlation patterns between gene pairs over multiple conditions, analyzing pair-wise correlations in gene expression and the variability of these correlations within each data set. The variability in correlation is found from the analysis of independent sub-samples from the full data set. The minimum sub-sample size and the total number of sub-samples required together give a lower bound for the data set size. An alternative sub-sampling approach allowing dependence between the sub-samples was investigated in this study. The aim was to determine how the estimated variability was affected by sub-sample dependence compared to a low number of sub-samples, and to investigate whether sub-sample overlap could be justified as a means to allow for smaller data sets in a CSD analysis. The results of the study were clear: sub-sample dependence had a much smaller negative impact on the estimated variability than having too few sub-samples. Fewer than 150 sub-samples should therefore be avoided, and for data sets smaller than 60 data points per gene, sub-sample dependence was suggested as a valid alternative to the original sub-sampling approach. Rheumatoid Arthritis (RA) is an autoimmune disease affecting about 1% of the worldwide population. The disease involves a chronic inflammation of the joints, followed by progressive articular remodelling and damage, as well as many common comorbidities.
The CSD method was applied to gene expression data from the synovial fluid of RA patients and healthy controls, with the aim of identifying novel genes that could be central in RA development. The alternative sub-sampling algorithm was implemented on the small control data set. The CSD analysis resulted in a differential co-expression network clearly enriched in genes with functions related to the disease. Eleven network hubs were identified: three genes, PDCD1, CTLA4 and PRDM1, had previously been associated with RA, while eight genes, ZNF205-AS1, GPR18, LINC00426, SEPT1, ASAP2, ENPP1, IL2RG and GBP5, were new in this context. Like many genes in the network, most of them were related to immune function or growth and tissue/organ development, and due to their high network connectivity, all of them were considered to be possible candidates for further research on RA. The three most highly connected genes, PDCD1, ZNF205-AS1 and GPR18, contributed to a highly disassortative sub-network and were hypothesized to have a coordinating role in RA. PDCD1 was also a strong connector between parts of the network with different correlation relationships. This is consistent with previous reports of RA association for this gene. The eight other genes were found in a sub-network dominated by differentiated co-expression and particularly enriched with genes related to cell differentiation and tissue and organ development and morphology. These were hypothesized to be involved in later stages of rheumatoid arthritis.
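The sub-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the thesis. The function names, the use of Pearson correlation, and the modelling of "independent" sub-samples as disjoint sets of data points are all assumptions made for illustration; the abstract does not state the exact independence criterion. Under this disjoint assumption, the data set needs at least (sub-sample size × number of sub-samples) data points, whereas overlapping sub-samples remove that bound at the cost of dependence between the estimates.

```python
import numpy as np


def subsample_correlations(x, y, size, n_sub, overlap, rng):
    """Return the correlation of x and y within each of n_sub sub-samples.

    overlap=False: sub-samples are disjoint (assumed stand-in for
                   "independent" sub-samples).
    overlap=True:  each sub-sample is drawn on its own without replacement,
                   so different sub-samples may share data points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if size > n:
        raise ValueError("sub-sample size exceeds data set size")
    if overlap:
        idx = [rng.choice(n, size=size, replace=False) for _ in range(n_sub)]
    else:
        # Disjoint sub-samples need size * n_sub data points in total.
        if size * n_sub > n:
            raise ValueError("data set too small for disjoint sub-samples")
        perm = rng.permutation(n)
        idx = [perm[k * size:(k + 1) * size] for k in range(n_sub)]
    # Pearson correlation per sub-sample (an assumption; a rank-based
    # correlation could be substituted here).
    return np.array([np.corrcoef(x[i], y[i])[0, 1] for i in idx])


def correlation_variability(rhos):
    """Sample variance of the sub-sample correlations."""
    return float(np.var(rhos, ddof=1))


# Synthetic example: two correlated "genes" measured in 60 samples.
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 0.8 * x + 0.6 * rng.normal(size=60)

# Disjoint sub-samples: only 6 sub-samples of size 10 fit in 60 points.
rho_disjoint = subsample_correlations(x, y, size=10, n_sub=6, overlap=False, rng=rng)
# Overlapping sub-samples: 150 sub-samples are possible on the same data.
rho_overlap = subsample_correlations(x, y, size=10, n_sub=150, overlap=True, rng=rng)

print("variability, 6 disjoint sub-samples:    ", correlation_variability(rho_disjoint))
print("variability, 150 overlapping sub-samples:", correlation_variability(rho_overlap))
```

Comparing the two printed estimates over repeated runs with different seeds gives a rough feel for the trade-off the study examined: few independent sub-samples versus many dependent ones.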