Dataset

[1] test/enzyme.fasta

    FASTA sequences of the test enzymes
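
    A minimal sketch of reading such a FASTA file in Python. The helper
    name read_fasta is hypothetical, and the header layout is an
    assumption: the first whitespace-delimited token after ">" is taken
    as the record identifier.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its full sequence.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' in the header line; the real header layout may differ.
    """
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines; join them.
            records[name] += line
    return records
```

    For example, with open("test/enzyme.fasta") as fh: seqs = read_fasta(fh)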

[2] test/enzyme.tsv

    TSV table of the test enzymes with the following columns:
    - UniProt accession
    - EC number. If a protein has multiple EC numbers,
      each one is listed on a separate line.
    - evidence code
    - ChEBI IDs of substrates and products
    - ligand-binding residues
    - active site residues
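
    Because a protein with several EC numbers spans several lines, the
    table usually has to be regrouped by accession before use. The
    sketch below shows one way to do that. It assumes the columns are
    tab-separated in the order listed above and that there is no header
    row (neither is confirmed here); the function name
    ec_numbers_by_accession is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def ec_numbers_by_accession(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect all EC numbers listed for each UniProt accession.

    Assumptions: column 1 is the accession, column 2 is the EC number,
    and the file has no header row. If a header row exists, skip it
    before calling this function.
    """
    ec_by_acc: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        accession, ec = row[0], row[1]
        if ec not in ec_by_acc[accession]:
            ec_by_acc[accession].append(ec)
    return dict(ec_by_acc)
```

    For example, with open("test/enzyme.tsv") as fh:
    labels = ec_numbers_by_accession(fh)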

[3] test/enzyme.class.txt

    classification of test enzymes into easy/medm/hard

[4] test/nonenzyme.fasta

    FASTA sequences of the non-enzyme test proteins

[5] test/nonenzyme.tsv

    TSV table of the test non-enzyme proteins with the following columns:
    - UniProt accession
    - non-enzyme label 0.-.-.-
    - list of Molecular Function GO terms

[6] test/pdb
    
    AlphaFold2-predicted structures for the test set

[7] train/enzyme.fasta
    
    FASTA sequences of the training enzymes

[8] train/enzyme.tsv

    TSV table of the training enzymes

[9] train/nonenzyme.fasta

    FASTA sequences of the training non-enzyme proteins

[10] train/nonenzyme.tsv
    
    TSV table of the training non-enzyme proteins with the following columns:
    - UniProt accession
    - Molecular Function GO term
    - name of GO term
    - evidence code
