Rewriting the file “on the fly” is hard due to lack of modularity, among other issues:

Background | The Unfurling Landscape of Structural Biology

What’s the problem that needs solving?

Data organization at the confluence of cryo-ET, protein and RNA engineering, and LLMs’ ability to manipulate types/ontologies.

  • gargantuan-scale images at angstrom resolution
  • digital twins
  • molecular force fields
  • near-perfect polymer folding
  • ligand/binding affinity prediction

Interfaces between

  • MD
  • EM/Crystallography
  • atomic/crystallographic data encoding
  • sequence

What our part in it might be:

The implicit proposition here is that a common substrate (what is it? a format? a framework? a type system? a library? an application?) reduces the friction at these interfaces.

Does any kind of study benefit from this improved substrate?

Yes, I think compositional and conformational heterogeneity studies would be impossible without a framework under which to track the artifacts. By that, I mean studies of the type “motion of molecule X in the presence of Y” or “conformational change of Z in the presence of W”, the spliceosome being one example.

  • “modularity at biological hierarchy boundaries”
  • Who is going to use it?
  • Who is going to pay you for it?
  • What is the job here that won’t need doing in 5-10 years?
  • What is the job that will need doing in 5 years but doesn’t exist now?

entity_poly_seq can’t be mandatory, since you can produce an mmCIF file without any polymeric molecular entity. You could write an mmCIF file with a single ion in it, no protein, no nucleic acid, and it would still be a valid mmCIF file, yet that file can’t have entity_poly_seq because… no polymer ;) I guess once you have a linear polymer in an mmCIF file, entity_poly_seq should be in there, too. That can’t be reflected in the mmCIF dictionaries, since they have no way to express conditionals.
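Since the dictionary cannot state "entity_poly_seq is required only if a polymer is present", that rule would have to live in code outside the dictionary. The following is a minimal sketch of such a check, not an existing validator. It assumes the file has already been parsed into a plain dict of category name to rows; the `Block` alias and the function name `missing_poly_seq` are hypothetical, and treating `entity.type == "polymer"` as the trigger is an assumption (the remark above speaks of linear polymers specifically).

```python
from typing import Dict, List

# Hypothetical in-memory view of one parsed mmCIF data block:
# category name -> list of rows, each row mapping item name -> value.
Block = Dict[str, List[Dict[str, str]]]


def missing_poly_seq(block: Block) -> List[str]:
    """Return ids of polymer entities that have no entity_poly_seq rows.

    Assumption: an entity counts as a polymer when entity.type == "polymer".
    An empty result means the conditional rule is satisfied.
    """
    polymer_ids = [
        row["id"]
        for row in block.get("entity", [])
        if row.get("type") == "polymer"
    ]
    # Entity ids that have at least one sequence row.
    sequenced = {row["entity_id"] for row in block.get("entity_poly_seq", [])}
    return [eid for eid in polymer_ids if eid not in sequenced]


# A single ion: no polymer, so entity_poly_seq is legitimately absent.
ion_only: Block = {"entity": [{"id": "1", "type": "non-polymer"}]}

# A polymer entity without any sequence rows: the rule is violated.
protein_without_seq: Block = {"entity": [{"id": "1", "type": "polymer"}]}

# A polymer entity with a sequence row: the rule is satisfied.
protein_with_seq: Block = {
    "entity": [{"id": "1", "type": "polymer"}],
    "entity_poly_seq": [{"entity_id": "1", "num": "1", "mon_id": "MET"}],
}

for name, block in [
    ("ion_only", ion_only),
    ("protein_without_seq", protein_without_seq),
    ("protein_with_seq", protein_with_seq),
]:
    print(name, missing_poly_seq(block))
```

The check is deliberately a function over already-parsed data rather than part of a parser, so the conditional rule stays separate from whatever library reads the file.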