—- TO INVESTIGATE FURTHER —- #### OVERALL #### OVERALL
Intro Topics
- outline the rough product that you are aiming for, an integrative platform that has a generally applicable library in the backend and a visualization platform.
- it’s important to index all avaialble data in a way that is easily accessible and searchable
- would be nice for the method of indexing to be future-proof/extendable to new datasets.
- enumerate the datatypes/databases that you’ll be speaking about to draw some boundaries
Sequence Block
Isotypes characterization by species & cell types PTMS characterization by species, cell types and isotypes
- Start with broad family HMMs (alpha, beta, etc.) For each family.
- Collect all matching sequences
- Perform clustering (try several thresholds)
- Build phylogenetic trees to identify natural groups
- Extract C-terminal regions and cluster them separately
- Compare these different clustering results
Where clusters agree across methods, define these as strong sub-families Build HMMs for these sub-families Test with known examples to evaluate discrimination power Iteratively refine as needed
The above for families of tubulins and, separately, for families of MAPs. Stathmin-lke
- PTMs possibly included into clustering/families/searches via ProForma Notation
- search/discovery in a single modality instead of searching across various dbs, fasta files, mass spec records etc.
Helices as domain “views” into families.
Control Layer
- introduce neo4j graph database as a way to keep track of semantic data and connect it to the structural models
Structural Domain
Index PDB structures and incorporate them into the graph according to the sequence block.
- introduce capabilities of molstar as the the visualizing applications.
- Raise interactivity-vs-generality tradeoff
- search/navigation/visualization/comparison of the above
- comparison/alignment (across what? sequences)
- ligand binding sites
Then go more indepth to augmneting each of the individual following aspects of the data at scale:
Creating 3D structures of proteins from from sequences via AlphaFold
PTM Reconstruction workflows
Reconstructing PTMs on top of a template structure via:
- Rosetta/
- PSIptm
Ligand Binding Sites
Fragments
Ligands Classification via
Binking Pokets, Sites
Applications for Model building:
- HMM families-based deep learning models for automating CryoEM model building :
https://www.nature.com/articles/s41586-024-07215-4
https://www.biorxiv.org/content/10.1101/2025.03.16.643561v1
https://www.biorxiv.org/content/10.1101/2024.11.13.623164v1.full.pdf