data_store directory structure
This page explains the output directory structure (i.e. data_store) of code_for_paper.
Please read code_for_paper.rst before proceeding.
In the first layer of data_store there are three directories:
data
juicer_tools
outputs
The data directory stores the data required for the experiments of Lieberman, 2009 (GSE18199) and Rao, 2014 (GSE63525),
including the Pearson matrices and PC1s for each cell line at resolutions of 1Mb and 100Kb.
For the experiments of Lieberman, 2009, the data are downloaded directly from GSE18199.
For the experiments of Rao, 2014, the .hic data are downloaded from GSE63525
and processed with juicer_tools 1.22.01 to create the Pearson matrices and PC1s.
The outputs directory stores the experiment results, including the Estimated PC1-pattern .txt files, the scatter and relative_magnitude plots, and the summary information for all the experiments.
Note that we created the Estimated PC1-pattern by selecting the cxmax or cxmin of the covariance matrix of the Pearson matrix, at resolutions of 1Mb and 100Kb.
In addition, since GSE63525 doesn't provide the .hic files for HeLa, we skip this cell line.
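The selection step can be sketched as follows. This is a minimal sketch, not the code_for_paper implementation. It assumes that cxmax (cxmin) denotes the column of the covariance matrix with the largest (smallest) sum of absolute values; that definition, the function name estimate_pc1_pattern and the toy input are all assumptions made for illustration.

```python
import numpy as np


def estimate_pc1_pattern(pearson, pick="cxmax"):
    """Return one column of the covariance matrix as a PC1-pattern estimate.

    Assumption (not confirmed by this page): "cxmax" / "cxmin" is the column
    whose sum of absolute covariance values is the largest / smallest.
    """
    if pick not in ("cxmax", "cxmin"):
        raise ValueError("pick must be 'cxmax' or 'cxmin'")
    pearson = np.asarray(pearson, dtype=float)
    # Covariance between the columns of the Pearson matrix.
    cov = np.cov(pearson, rowvar=False)
    # Sum of absolute values of every column.
    abs_sums = np.abs(cov).sum(axis=0)
    idx = int(abs_sums.argmax() if pick == "cxmax" else abs_sums.argmin())
    return cov[:, idx]


# Toy Pearson matrix with a two-compartment sign pattern (hypothetical data).
signs = np.array([1.0, 1.0, -1.0, -1.0])
toy_pearson = np.outer(signs, signs)
pattern = estimate_pc1_pattern(toy_pearson, pick="cxmax")
```

Only the sign pattern of the returned column is meant to be compared with a PC1 track; its scale is not that of an eigenvector. Real Pearson matrices may contain empty bins that need to be filtered out first, which this sketch does not handle.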
Here we further explain the details of the outputs directory structure. The first layer of outputs contains three directories:
est_pc1_pattern
plots
summary
The est_pc1_pattern directory contains the text files of the Estimated PC1-pattern.
The plots directory contains the scatter and relative_magnitude plots.
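To confirm that a local copy of data_store follows the layout described so far, a small check such as the one below may help. It is only a sketch: the helper name missing_directories and the EXPECTED_LAYOUT mapping are hypothetical and cover just the directory names listed on this page, not the files inside them.

```python
from pathlib import Path

# Directory names taken from this page; keys are paths relative to data_store.
EXPECTED_LAYOUT = {
    "": ["data", "juicer_tools", "outputs"],
    "outputs": ["est_pc1_pattern", "plots", "summary"],
}


def missing_directories(data_store):
    """Return the expected sub-directories that do not exist under data_store."""
    root = Path(data_store)
    missing = []
    for parent, children in EXPECTED_LAYOUT.items():
        for child in children:
            rel = Path(parent) / child
            if not (root / rel).is_dir():
                missing.append(rel.as_posix())
    return missing
```

Calling missing_directories("data_store") returns an empty list when all six directories are present, and otherwise lists the relative paths that were not found.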
In the summary directory there are (2009 means using the data from GSE18199; 2014 means using the data from GSE63525):
summary_similarity_2009.xlsx, summary_similarity_2014.xlsx and summary_similarity_2014_sample10.xlsx, which compare the similar_rate between the juicer_tools calculated PC1 and the Estimated PC1-pattern, with and without the sampling method.
summary_self_pca_2009.xlsx and summary_self_pca_2014.xlsx, which record the explained variance ratio of the first 3 principal components of the Pearson matrix, and the similar_rate between the self-calculated PC1 (NOT the juicer_tools calculated PC1) and the Estimated PC1-pattern.
summary_similar_rate_percentage_2014.xlsx, which summarizes the percentage of columns in the covariance matrix that have a similar_rate over 90%, 95% or 99%.
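The quantities recorded in these summary files can be illustrated with the sketch below. It is not the code that produced the .xlsx files. It assumes that similar_rate is the fraction of bins whose signs agree between two tracks, taking the better of the two orientations because the sign of a PC1 is arbitrary. That definition and all function names here are assumptions made for illustration.

```python
import numpy as np


def similar_rate(a, b):
    """Fraction of entries with matching sign (assumed definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    same = np.mean(np.sign(a) == np.sign(b))
    flipped = np.mean(np.sign(a) == np.sign(-b))
    # The orientation of a principal component is arbitrary, so keep the best.
    return float(max(same, flipped))


def percent_columns_over(cov, pc1, thresholds=(0.90, 0.95, 0.99)):
    """Percentage of covariance-matrix columns whose similar_rate exceeds each threshold."""
    cov = np.asarray(cov, dtype=float)
    rates = np.array([similar_rate(cov[:, j], pc1) for j in range(cov.shape[1])])
    return {t: 100.0 * float(np.mean(rates > t)) for t in thresholds}


def explained_variance_ratio(pearson, n_components=3):
    """Explained variance ratio of the leading principal components."""
    cov = np.cov(np.asarray(pearson, dtype=float), rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    return eigvals[:n_components] / eigvals.sum()


# Toy two-compartment example (hypothetical data).
signs = np.array([1.0, 1.0, -1.0, -1.0])
toy_pearson = np.outer(signs, signs)
toy_cov = np.cov(toy_pearson, rowvar=False)
percentages = percent_columns_over(toy_cov, signs)
ratios = explained_variance_ratio(toy_pearson)
```

In this toy case every column of the covariance matrix has the same sign pattern as the reference track, so all three percentages are 100 and the first component carries essentially all of the variance.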