User Manual
1. Tool Overview
This tool is designed for plasma proteomics data analysis and provides the following core functions:
- Multi-dimensional contamination assessment and adaptive contamination indexing
- Mathematical model-based contamination correction
- Data recovery evaluation with visualization
2. User Guide
2.1 Data Input
- Data source selection: Choose the example data for reference, or upload two CSV files (an expression matrix and a group information file).
- File format requirements:
  - Expression matrix: The first column contains protein names; the remaining columns represent samples. Missing values must be imputed before upload. Do NOT log2-transform the data (the software applies the log2 transformation automatically).
  - Group information: Must contain an id column (matching the expression matrix column names) and a group column.
- Parameter settings: Select the comparison groups and set the correlation coefficient threshold (default: 0.9).
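The format requirements above can be checked before upload. The following is a minimal sketch, not part of the tool: the function name `validate_inputs` and the sample data are hypothetical, and it assumes that a negative intensity is a sign of data that were already log-transformed.

```python
import csv
import io
import math


def validate_inputs(matrix_csv, groups_csv):
    """Return a list of problems found in the two CSV texts (empty if none)."""
    problems = []
    matrix_rows = list(csv.reader(io.StringIO(matrix_csv)))
    header, body = matrix_rows[0], matrix_rows[1:]
    sample_names = header[1:]  # the first column holds protein names

    group_rows = list(csv.DictReader(io.StringIO(groups_csv)))
    if not group_rows or not {"id", "group"} <= set(group_rows[0]):
        problems.append("group file must contain 'id' and 'group' columns")
        return problems

    # Every sample column needs a matching id in the group file.
    group_ids = {row["id"] for row in group_rows}
    unmatched = sorted(set(sample_names) - group_ids)
    if unmatched:
        problems.append(f"samples without group information: {unmatched}")

    for row in body:
        protein, values = row[0], row[1:]
        for sample, raw in zip(sample_names, values):
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"{protein}/{sample}: missing or non-numeric value")
                continue
            if math.isnan(value):
                problems.append(f"{protein}/{sample}: missing value (NaN)")
            elif value < 0:
                # Assumption: raw intensities are non-negative, so a negative
                # value suggests the matrix was already log-transformed.
                problems.append(f"{protein}/{sample}: negative value, possibly log-transformed")
    return problems


# Hypothetical example data with three deliberate problems.
matrix = "protein,S1,S2,S3\nHBB,1200.5,980.1,NA\nALB,50000,-3.2,48000\n"
groups = "id,group\nS1,control\nS2,case\n"
issues = validate_inputs(matrix, groups)
for issue in issues:
    print(issue)
```

An empty result means the files passed these particular checks only; the tool itself may apply further requirements.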
2.2 Contamination Assessment
- Quality assessment: View quality control plots, including PCA, heatmap, and correlation coefficient distribution.
- Marker selection:
  - Select contamination panels with high CV (coefficient of variation) values from the contamination type list.
  - Filter effective markers through correlation analysis and differential expression.
- Contamination level:
  - Assess the degree of impact through the CV distribution.
  - Evaluate sample-specific contamination through the expression of contaminant markers.
  - If the CV of a contamination panel is not significantly higher than that of other proteins, or if the markers are not highly correlated, the dataset has no significant contamination.
  - If contaminant markers are significantly differentially expressed between the comparison groups, correction cannot be performed, because the differences may originate from either contamination or biological variation.
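To make the CV and correlation criteria concrete, here is a small sketch. It is not the tool's implementation: the manual does not say how markers are compared, so the rule used here (keep a marker if it correlates at or above the threshold with at least one other panel member) is an assumption, as are the function names and the example values.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson(x, y):
    """Pearson correlation coefficient (constant inputs are not handled)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_markers(panel, threshold=0.9):
    """Assumed rule: keep markers correlated >= threshold with another panel member."""
    kept = []
    for name, values in panel.items():
        others = [v for other, v in panel.items() if other != name]
        if any(pearson(values, v) >= threshold for v in others):
            kept.append(name)
    return kept


# Hypothetical intensities across four samples.
panel = {
    "HBB": [100, 200, 400, 800],
    "HBA1": [110, 190, 420, 790],
    "UNRELATED": [500, 100, 450, 120],
}
stable_protein = [1000, 1010, 990, 1000]

kept_markers = filter_markers(panel, threshold=0.9)
print(kept_markers)
print(cv(panel["HBB"]) > cv(stable_protein))
```

Lowering `threshold` in this sketch admits more markers, which mirrors the advice on the correlation coefficient threshold elsewhere in this manual.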
2.3 Data Correction
- Correction type: Select the contamination types to correct (RBC, platelets, coagulation system). Do NOT select types without available markers.
- Constraint factor: Adjust the correction strength using the slider (recommended range: 0.8-1.2; default: 1).
- Quality control: Compare quality metrics before and after correction: PCA and changes in contaminant marker CV.
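This manual does not describe the correction model itself. Purely to illustrate how a constraint factor can scale correction strength, the sketch below assumes a simple linear model: each protein's log2 values are regressed on a per-sample contamination index, and the fitted contamination component is subtracted after multiplication by the factor. The function name, the index, and the model are all assumptions and should not be read as the tool's actual method.

```python
import statistics


def correct_protein(log2_values, contamination_index, constraint_factor=1.0):
    """Subtract a scaled, contamination-associated linear component (assumed model)."""
    mean_idx = statistics.mean(contamination_index)
    mean_val = statistics.mean(log2_values)
    centered = [c - mean_idx for c in contamination_index]
    sxx = sum(c * c for c in centered)
    sxy = sum(c * (v - mean_val) for c, v in zip(centered, log2_values))
    slope = sxy / sxx  # least-squares slope of protein vs. contamination index
    return [v - constraint_factor * slope * c for v, c in zip(log2_values, centered)]


# Hypothetical data: the protein rises one log2 unit per unit of contamination.
index = [0.0, 1.0, 2.0, 3.0]
protein = [10.0, 11.0, 12.0, 13.0]

full = correct_protein(protein, index, constraint_factor=1.0)
partial = correct_protein(protein, index, constraint_factor=0.8)
print(full)
print(partial)
```

Under this assumed model, a factor of 1 removes the fitted trend entirely, a factor below 1 leaves part of it in place, and a factor above 1 over-corrects.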
2.4 Differential Analysis
- Analysis method: Differential expression analysis based on limma
- Result interpretation:
  - Compare overlapping differential proteins pre/post correction using Venn diagrams
  - Visualize significant differential proteins via volcano plots
- Data export: Download results in CSV format
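The Venn diagram comparison amounts to set operations on the two lists of differential proteins. A minimal sketch, assuming the exported CSV results have already been reduced to lists of protein names (the function name and the protein lists are hypothetical):

```python
def compare_differential_sets(before, after):
    """Split differential proteins into the three regions of a two-set Venn diagram."""
    before, after = set(before), set(after)
    return {
        "shared": sorted(before & after),
        "removed_by_correction": sorted(before - after),
        "new_after_correction": sorted(after - before),
    }


# Hypothetical differential proteins before and after correction.
before_correction = ["HBB", "CA1", "APOA1", "CRP"]
after_correction = ["APOA1", "CRP", "SAA1"]

regions = compare_differential_sets(before_correction, after_correction)
for region, proteins in regions.items():
    print(region, proteins)
```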
3. Important Notes
- Data preprocessing: Perform missing value imputation before uploading.
- Marker validation: Ensure the selected contamination markers show stable expression in the dataset.
- Parameter optimization: Adjust the constraint factor based on the CV distribution, correlation plots, and PCA results. The default values suffice for most cases.
- Result validation: After correction, the CV values of contaminant markers should be significantly reduced and the proportion of high correlation coefficients should decrease.
- Technical support: Report issues at: https://github.com/The-Hong-Wang-Lab-a-bloodomics-group/CAT-APP
4. Frequently Asked Questions
- Q1: Why do negative values appear after correction?
  A: This is normal. Because the software applies a log2 transformation automatically, extremely small values can become negative.
- Q2: How do I determine the optimal correlation coefficient threshold?
  A: The default of 0.9 works for most cases. Lower the threshold if too few markers are identified.
- Q3: Is a significant change in the differential proteins after correction normal?
  A: Yes. Proteins that drop out are typically associated with contamination pathways, while newly identified differential proteins are often related to biological pathways.
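As a small illustration of Q1: a log2 transformation maps any value below 1 to a negative number, so negative entries are expected for very small intensities. The input values below are arbitrary examples, not data from the tool.

```python
import math

# Arbitrary example intensities on the original (untransformed) scale.
raw_values = (1024.0, 1.0, 0.25)
log2_values = {raw: math.log2(raw) for raw in raw_values}

for raw, transformed in log2_values.items():
    print(raw, transformed)
# 1024.0 10.0
# 1.0 0.0
# 0.25 -2.0
```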