CD-HIT


CD-HIT is a program for clustering large protein databases at a high sequence identity threshold. The program removes redundant sequences and generates a database containing only the representative sequences. It can be applied to protein family classification, domain analysis, organizing large protein databases, and improving the performance of database searches, among other uses.

SW Documentation: 

To run this software interactively in a Linux environment, run the following commands:

module load cd-hit/4.3matlab
cd-hit

cd-hit can be called as: cd-hit [options]
More information is available here
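
As an illustration of the call form above, here is a minimal sketch of a typical clustering run. The file names proteins.fasta and proteins_nr90 are hypothetical placeholders, and the 0.9 identity threshold is only an example value. The flags used (-i, -o, -c, -n) are standard cd-hit options, but check the program's own help output for the version installed on your system.

```shell
# Hypothetical file names; replace with your own data.
input=proteins.fasta
output=proteins_nr90

# -i  input FASTA file
# -o  output name for the representative sequences
# -c  sequence identity threshold (0.9 = 90%)
# -n  word length (5 is the usual choice for thresholds of 0.7 and above)
cmd="cd-hit -i $input -o $output -c 0.9 -n 5"
echo "$cmd"

# Run only if cd-hit is on the PATH (for example after 'module load')
# and the input file actually exists.
if command -v cd-hit >/dev/null 2>&1 && [ -f "$input" ]; then
    $cmd
fi
```

A successful run writes the representative sequences to the output file and a cluster listing to a file with the same name plus a .clstr suffix.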

Short Name: 
cdhit
SW Module: 
cdhit
Service Level: 
Minimal
SW Category: