A collaboration between EBI and Google Research has annotated ~49 million previously uncharacterized proteins that are now part of the UniProt database. The annotations are based on a Protein Natural Language Model called 'ProtNLM'.

Three datasets were created to facilitate use by the maize research community:

---------------------------------------------
Uniprot_ProtNLM_maize_annotations.txt

A tab-separated file filtered for maize genes. The original annotation file (https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/evidencer_sorted.csv.gz) was filtered for only annotations of UniProt IDs with B73 RefGen_v5 gene models at MaizeGDB.

---------------------------------------------
Uniprot_ProtNLM_maize_v5.txt

A tab-separated file for B73 genes and annotations with the following columns:

    B73 RefGen_v5 gene model ID
    Previous UniProt description
    New ProtNLM description

---------------------------------------------
Uniprot_ProtNLM_maize_v5_uncharacterized.txt

A tab-separated file for B73 genes with previous "Uncharacterized protein" annotations with the following columns:

    B73 RefGen_v5 gene model ID
    Previous UniProt description
    New ProtNLM description

---------------------------------------------
ProtNLM description:

Website: https://www.uniprot.org/help/ProtNLM

Gane, A. et al. ProtNLM: Model-based Natural Language Protein Annotation (2022)
Preprint: https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/protnlm_preprint_draft.pdf

Last update 10/13/2022 - C. Andorf
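
---------------------------------------------
Example: reading the three-column files

The three-column files described above (Uniprot_ProtNLM_maize_v5.txt and Uniprot_ProtNLM_maize_v5_uncharacterized.txt) could be read with a short Python script along the following lines. This is only a sketch, not part of the original release. It assumes the columns appear in the order listed above. The names Annotation, read_annotations and load_file are hypothetical and chosen for illustration. Lines with fewer than three fields are skipped. If the files turn out to begin with a header row, that row would come back as the first record and should be dropped by the caller.

```python
import csv
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """One row of a three-column file (hypothetical field names)."""
    gene_model_id: str          # B73 RefGen_v5 gene model ID
    uniprot_description: str    # Previous UniProt description
    protnlm_description: str    # New ProtNLM description


def read_annotations(lines: Iterable[str]) -> List[Annotation]:
    """Parse tab-separated lines into Annotation records.

    Assumes the column order given in this README. Rows with fewer
    than three fields (for example blank lines) are skipped.
    """
    # QUOTE_NONE keeps any quote characters in descriptions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) < 3:
            continue
        records.append(Annotation(row[0].strip(), row[1].strip(), row[2].strip()))
    return records


def load_file(path: str) -> List[Annotation]:
    """Read one of the three-column files from disk (UTF-8 is assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_annotations(handle)
```

For instance, load_file("Uniprot_ProtNLM_maize_v5.txt") would return a list of records, and comparing each record's uniprot_description with its protnlm_description shows where the ProtNLM annotation differs from the previous UniProt one.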