Minnesota Supercomputing Institute
4.3, 4.6.1, 4.8.1
Friday, February 26, 2021
CD-HIT is a program for clustering large protein databases at a high sequence identity threshold. The program removes redundant sequences and generates a database containing only the representative sequences. It can be applied to protein family classification, domain analysis, organizing large protein databases, and improving the performance of database searches, among other tasks.
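To illustrate what keeping "only the representatives" means, the following is a minimal Python sketch of greedy incremental clustering, the general approach CD-HIT is built on: sequences are processed from longest to shortest, and each one either joins the first representative it matches at or above the threshold or becomes a new representative. This is not CD-HIT's implementation. The function names `identity` and `cluster_representatives` are hypothetical, and the identity measure here is a toy position-by-position comparison, whereas CD-HIT itself relies on short-word filtering and sequence alignment.

```python
def identity(a, b):
    """Toy identity (an assumption for illustration): fraction of matching
    positions over the shorter sequence, with no alignment performed."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def cluster_representatives(seqs, threshold=0.9):
    """Greedy incremental clustering over a dict of name -> sequence.

    Returns a dict mapping each representative's name to the list of
    member names in its cluster (the representative is listed first).
    """
    # Process the longest sequences first so they become representatives.
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    clusters = {}
    for name, seq in ordered:
        for rep in clusters:
            if identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(name)  # redundant: join existing cluster
                break
        else:
            clusters[name] = [name]  # no match: becomes a new representative
    return clusters


if __name__ == "__main__":
    # Made-up example sequences, not real database entries.
    example = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFS",
        "c": "GSHMLEDPVDAFKLQWERTY",
    }
    for rep, members in cluster_representatives(example, 0.9).items():
        print(rep, members)
```

In this sketch the keys of the returned dictionary play the role of the non-redundant output database, while the member lists correspond to the cluster report.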