Contamination Analysis and Tempering: An Automated Online Platform for Plasma Proteomics

Welcome to CAT-APP

This interactive tool allows you to analyze and correct for contamination in plasma proteomics data.

Key features:

  • Multi-dimensional contamination assessment and adaptive contamination indexing
  • Mathematical model-based contamination correction
  • Data recovery evaluation with visualization

How to use:

  1. Upload your protein expression data and group information
  2. Check data quality and select contamination markers
  3. Run correction for selected contamination types
  4. Perform differential expression analysis

For any questions or feedback, please contact us via email: zhangdong_0121@foxmail.com

© 2025 CAT-APP - Contamination Analysis Tool for Plasma Proteomics


Data File Preview

Group Information Preview

Data Quality Assessment

Contamination Summary

Quality Control

Contamination Marker Expression

Relevance of contamination markers

Contamination Levels

Correction Outcomes

Post-correction QC

Post-correction Contamination

Post-correction Data Matrix

Download Post-correction Data

User Manual

1. Tool Overview

This tool is designed for plasma proteomics data analysis and provides the following core functions:

  • Multi-dimensional contamination assessment and adaptive contamination indexing
  • Mathematical model-based contamination correction
  • Data recovery evaluation with visualization

2. User Guide

2.1 Data Input

  1. Data source selection: Choose the example data for reference, or upload two CSV files (a protein expression matrix and a group information file)
  2. File format requirements:
    • Expression matrix: The first column contains protein names and each remaining column represents one sample. Missing values must be imputed before upload. Do NOT log2-transform the data (the software applies the log2 transformation automatically)
    • Group information: Must contain an id column (matching the column names of the expression matrix) and a group column
  3. Parameter settings: Select the comparison groups and set the correlation coefficient threshold (default: 0.9)
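
The format requirements above can be checked before upload. The following is a minimal sketch using only the Python standard library; the file contents, protein names, sample names, and the function name check_inputs are hypothetical and are not part of CAT-APP.

```python
import csv
import io

# Hypothetical file contents; protein and sample names are made up.
EXPRESSION_CSV = """protein,S1,S2,S3,S4
HBB,1200.5,300.2,950.0,410.7
ALB,50000.1,48000.3,51000.9,49500.0
PF4,80.4,,75.2,60.1
"""

GROUP_CSV = """id,group
S1,control
S2,control
S3,case
S4,case
"""


def check_inputs(expression_text, group_text):
    """Return a list of human-readable problems found in the two files."""
    problems = []

    expr_rows = list(csv.reader(io.StringIO(expression_text)))
    header, body = expr_rows[0], expr_rows[1:]
    sample_names = header[1:]  # the first column holds protein names

    reader = csv.DictReader(io.StringIO(group_text))
    fields = reader.fieldnames or []
    if "id" not in fields or "group" not in fields:
        problems.append("group file needs 'id' and 'group' columns")
        return problems

    # Every sample column should have a group entry, and vice versa.
    group_ids = [row["id"] for row in reader]
    unmatched = sorted(set(sample_names) ^ set(group_ids))
    if unmatched:
        problems.append("ids not shared by both files: " + ", ".join(unmatched))

    # Missing values should be imputed before upload.
    for row in body:
        if not row:
            continue  # skip blank lines
        protein, values = row[0], row[1:]
        for sample, value in zip(sample_names, values):
            if value.strip() == "" or value.strip().upper() in ("NA", "NAN"):
                problems.append(f"missing value for {protein} in {sample}")

    return problems


for problem in check_inputs(EXPRESSION_CSV, GROUP_CSV):
    print(problem)
```

In this example the only reported problem is the empty PF4 entry for sample S2, which would need to be imputed before the file is uploaded.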

2.2 Contamination Assessment

  1. Quality assessment: View quality control plots, including PCA, heatmap, and correlation coefficient distribution
  2. Marker selection:
    • Select contamination panels with high CV values from the contamination type list
    • Filter effective markers through correlation analysis and differential expression
  3. Contamination level:
    • Assess the degree of impact through the CV distribution
    • Evaluate sample-specific contamination through the expression of contaminant markers
    • If the CV of a contamination panel is not significantly higher than that of other proteins, or if the markers are not highly correlated with each other, the dataset has no significant contamination
    • If contaminant markers are significantly differentially expressed between the two groups, correction cannot be performed, because the differences may originate from either contamination or biological variation
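
The two quantities used in this step, the coefficient of variation (CV) and the correlation between markers, can be illustrated as follows. This is a minimal sketch, not CAT-APP's implementation: the intensities are invented, and the rule of keeping a marker when it reaches the threshold with at least one other panel member is an assumption made for illustration.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, on raw (non-log) intensities."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_markers(panel, threshold=0.9):
    """Keep markers correlated with at least one other marker at >= threshold.

    Assumed selection rule, for illustration only.
    """
    kept = []
    for name, values in panel.items():
        others = (v for other, v in panel.items() if other != name)
        if any(pearson(values, v) >= threshold for v in others):
            kept.append(name)
    return sorted(kept)


# Hypothetical intensities across four samples.
panel = {
    "HBB": [100, 400, 150, 800],
    "HBA1": [90, 380, 160, 790],
    "CA1": [50, 210, 70, 400],
    "ALB": [500, 510, 495, 505],
}

for name, values in panel.items():
    print(name, round(coefficient_of_variation(values), 3))
print(correlated_markers(panel, threshold=0.9))
```

Here the three markers that rise and fall together across samples pass the 0.9 threshold and have a much higher CV than the stable protein, which is the pattern the assessment step looks for.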

2.3 Data Correction

  1. Correction type: Select the contamination types to correct (RBC, platelets, coagulation system). Do NOT select types for which no markers are available
  2. Constraint factor: Adjust the correction strength using the slider (recommended range: 0.8-1.2, default: 1)
  3. Quality control: Compare quality metrics before and after correction, such as PCA plots and changes in contaminant marker CV
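
The manual does not specify the correction model, so the following sketch only illustrates how a constraint factor can scale the strength of a correction. It assumes a generic linear regression of a protein's log2 intensity on a per-sample contamination index; the function names, the index, and the data are all hypothetical, and this should not be read as CAT-APP's algorithm.

```python
import statistics


def fit_slope(index, values):
    """Least-squares slope of values regressed on the contamination index."""
    mi, mv = statistics.mean(index), statistics.mean(values)
    num = sum((i - mi) * (v - mv) for i, v in zip(index, values))
    den = sum((i - mi) ** 2 for i in index)
    return num / den


def correct(values, index, constraint_factor=1.0):
    """Subtract the fitted contamination trend, scaled by the constraint factor.

    The index is centred so that the protein's mean level is preserved.
    A factor of 1 removes the full fitted trend; values below 1 remove
    less of it and values above 1 remove more.
    """
    slope = fit_slope(index, values)
    mi = statistics.mean(index)
    return [v - constraint_factor * slope * (i - mi) for v, i in zip(values, index)]


# Hypothetical log2 intensities that rise with the contamination index.
contamination_index = [0.0, 1.0, 2.0, 3.0]
log2_intensity = [5.0, 7.0, 9.0, 11.0]

print(correct(log2_intensity, contamination_index, constraint_factor=1.0))
print(correct(log2_intensity, contamination_index, constraint_factor=0.8))
```

Under these assumptions, a factor of 1 flattens the protein completely with respect to the index, while 0.8 leaves part of the trend in place. Whatever model is actually used, the pre/post PCA and marker CV comparison is the way to judge whether the chosen factor is appropriate.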

2.4 Differential Analysis

  1. Analysis method: Differential expression analysis based on limma
  2. Result interpretation:
    • Compare overlapping differential proteins pre/post correction using Venn diagrams
    • Visualize significant differential proteins via volcano plots
  3. Data export: Download results in CSV format
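
The Venn diagram comparison amounts to set operations on the two lists of significant proteins. A minimal sketch, assuming the pre- and post-correction hits have already been exported (the protein lists and the function name compare_hits are hypothetical):

```python
def compare_hits(pre_correction, post_correction):
    """Split differential proteins into shared, lost, and gained sets."""
    pre, post = set(pre_correction), set(post_correction)
    return {
        "shared": sorted(pre & post),  # significant before and after correction
        "lost": sorted(pre - post),    # significant only before correction
        "gained": sorted(post - pre),  # significant only after correction
    }


# Hypothetical lists of significant proteins.
pre_hits = ["HBB", "HBA1", "CRP", "SAA1"]
post_hits = ["CRP", "SAA1", "APOA1"]

for region, proteins in compare_hits(pre_hits, post_hits).items():
    print(region, proteins)
```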

3. Important Notes

  • Data preprocessing: Perform missing value imputation before uploading
  • Marker validation: Ensure selected contamination markers show stable expression in the dataset
  • Parameter optimization: Adjust constraint factor using CV distribution, correlation plots and PCA results. Default values suffice for most cases
  • Result validation: After correction, the CV values of contaminant markers should be significantly reduced and fewer marker pairs should show high correlation
  • Technical support: Report issues at: https://github.com/The-Hong-Wang-Lab-a-bloodomics-group/CAT-APP

4. Frequently Asked Questions

  • Q1: Why do negative values appear after correction? A: This is normal. The software log2-transforms the data automatically, and any intensity below 1 becomes negative after log2 transformation
  • Q2: How do I determine the optimal correlation coefficient threshold? A: The default of 0.9 works for most cases. Lower the threshold if too few markers are identified
  • Q3: Is a significant change in the differential proteins after correction normal? A: Yes. Proteins that are no longer significant are typically associated with contamination pathways, while newly significant proteins often relate to biological pathways
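
As a short illustration of Q1, log2 maps values above 1 to positive numbers, 1 to zero, and values between 0 and 1 to negative numbers. The intensities below are arbitrary examples, and log2_transform is a hypothetical helper name:

```python
import math


def log2_transform(values):
    """Apply log2 to each strictly positive intensity."""
    return [math.log2(v) for v in values]


print(log2_transform([8.0, 1.0, 0.25]))  # [3.0, 0.0, -2.0]
```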