FASTA format for DNA/RNA and amino acid sequences
periodictable.fasta
Biomolecule support.
Molecule lets you define biomolecules with labile hydrogen atoms specified using H[1] in the chemical formula. The biomolecule object creates forms with natural isotope ratio, all hydrogen and all deuterium. Density can be provided as natural density or cell volume. A %D2O contrast match value is computed for matching the molecule SLD in the presence of labile hydrogens.
Molecule.D2Osld() computes the neutron SLD for the solvated molecule in a %D2O solvent.
Sequence lets you read amino acid and DNA/RNA sequences from FASTA files.
Tables for common molecules are provided[1]:
AMINO_ACID_CODES : amino acids indexed by FASTA code
RNA_CODES, DNA_CODES : nucleic bases indexed by FASTA code
RNA_BASES, DNA_BASES : individual nucleic acid bases
NUCLEIC_ACID_COMPONENTS, LIPIDS, CARBOHYDRATE_RESIDUES
Neutron SLD for water at 20 C is also provided as H2O_SLD and D2O_SLD.
For unmodified protein an H and an OH are added for terminations.
Assumes that proteins were created in an environment with the usual H/D isotope ratio on the nonlabile hydrogen.
The residue volumes differ from those used by the bio scattering calculators from ISIS and ORSO, which will lead to different values for SLD. There are small differences in the number of hydrogens in His and Cys residues, where one table considers them present but labile and the other considers them absent.
DNA and RNA residues from the source[1] included sodium in the chemical formula, but these have been removed and will not appear in the sequence. Volumes for DNA and RNA residues come from Buckin (1989) as reported in Durchschlag (1997), with correction for phosphorylation and dehydration. The correction value of 30.39 comes from comparison of the volume given in Harroun (2006) to the volumes of the RNA ACGU and DNA T nucleosides given in Buckin (1989) after correcting for units. Harroun doesn’t give volumes for DNA AGC nucleosides despite them being different (especially guanosine). This code uses the values from Buckin for these as well, rather than the RNA nucleoside values given in Harroun. Note that the computed density for equal parts AGCT is 1.67, compared to the measured average of 1.70 given in Arrighi (1970).
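As a rough illustration of the termination rule noted above (an H and an OH added to an unmodified protein), the following sketch sums residue atom counts for a short peptide and then adds the two end groups. The residue table and the helper peptide_formula are hypothetical, written only for this example; they are not the library's tables or API, and only glycine and alanine are included.

```python
from collections import Counter

# Hypothetical mini residue table (amino acid minus one water).
# This is NOT the library's AMINO_ACID_CODES; it is just enough for the example.
RESIDUES = {
    "G": Counter({"C": 2, "H": 3, "N": 1, "O": 1}),  # glycine residue
    "A": Counter({"C": 3, "H": 5, "N": 1, "O": 1}),  # alanine residue
}

def peptide_formula(codes):
    """Sum residue atom counts, then add H and OH for the two chain ends."""
    total = Counter()
    for code in codes:
        total.update(RESIDUES[code])
    total.update({"H": 1})          # H on the N-terminus
    total.update({"O": 1, "H": 1})  # OH on the C-terminus
    return dict(total)

formula = peptide_formula("GA")
print(formula)
```

For the dipeptide Gly-Ala this gives the same atom counts as glycine plus alanine minus one water, which is what the termination rule is meant to guarantee.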
[1] Perkins, S.J. (1988). Chapter 6 X-Ray and Neutron Solution Scattering, in: New Comprehensive Biochemistry. Elsevier, pp. 143-265. https://doi.org/10.1016/S0167-7306(08)60575-X
[2] Buckin, V. A., B. I. Kankiya, and R. L. Kazaryan (1989). Hydration of nucleosides in dilute aqueous solutions: ultrasonic velocity and density measurements. Biophysical chemistry 34.3 211-223. https://doi.org/10.1016/0301-4622(89)80060-2
[3] Durchschlag, H. and Zipper, P. (1997). Calculation of partial specific volumes and other volumetric properties of small molecules and polymers. Journal of Applied Crystallography 30 803-807. https://doi.org/10.1107/S0021889897003348
[4] Harroun, T.A., Wignall, G.D., Katsaras, J. (2006). Neutron Scattering for Biology. In: Neutron Scattering in Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-29111-3_1
[5] Arrighi, F.E., Mandel, M., Bergendahl, J. et al. (1970). Buoyant densities of DNA of mammals. Biochem Genet 4, 367–376. https://doi.org/10.1007/BF00485753
AMINO_ACID_CODES:
-: gap
A: alanine
B: aspartic acid/asparagine
C: cysteine
D: aspartic acid
E: glutamic acid
F: phenylalanine
G: glycine
H: histidine
I: isoleucine
J: leucine/isoleucine
K: lysine
L: leucine
M: methionine
N: asparagine
P: proline
Q: glutamine
R: arginine
S: serine
T: threonine
V: valine
W: tryptophan
X: any
Y: tyrosine
Z: glutamic acid/glutamine
NUCLEIC_ACID_COMPONENTS:
adenine: C5H2H[1]2N5
cytosine: C4H2H[1]2N3O
deoxyribose: C5H7O2
guanine: C5HH[1]3N5O
phosphate: NaPO3
ribose: C5H6H[1]O3
thymine: C5H4H[1]N2O2
uracil: C4H2H[1]N2O2
CARBOHYDRATE_RESIDUES:
Fuc (terminal): C6H7H[1]3O4
Gal: C6H7H[1]3O5
GalNAc: C8H10H[1]3NO5
Glc: C6H7H[1]3O5
GlcNAc: C8H10H[1]3NO5
Man: C6H7H[1]3O5
Man (terminal): C6H7H[1]4O5
NeuNac (terminal): C11H11H[1]5NO8
chondroitin sulphate: C14H15H[1]4NO14SNa
hyaluronate: C14H15H[1]5NO11Na
keratan sulphate: C14H17H[1]5NO13SNa
LIPIDS:
DLPE: C29H55H[1]3NO8P
DMPC: C36H72NO8P
DMPC-D52: C36H20D52NO8P
cholesteral: C27H45H[1]O
methylene: CH2
methylene-D: CD2
oleate: C45H78O2
palmitate ester: C39H77H[1]2N2O2P
phospholipid headgroup: C10H18NO8P
triglyceride headgroup: C6H5O6
trioleate form: C57H104O6
RNA_BASES:
A:adenosine
C:cytidine
G:guanosine
T:uridine
DNA_BASES:
A:adenosine
C:cytidine
G:guanosine
T:thymidine
- class periodictable.fasta.Molecule(name, formula, cell_volume=None, density=None, charge=0)
Bases: object
Specify a biomolecule by name, chemical formula, cell volume and charge.
Labile hydrogen positions should be coded using H[1] rather than H. H[1] will be substituted with H for solutions with natural water or D for solutions with heavy water. Any deuterated non-labile hydrogen can be marked with D, and it will stay as D regardless of the solvent.
name is the molecule name.
formula is the chemical formula as string or atom dictionary, with H[1] for labile hydrogen.
cell_volume is the volume of the molecule. If None, cell volume will be inferred from the natural density of the molecule. Cell volume is assumed to be independent of isotope.
density is the natural density of the molecule. If None, density will be inferred from cell volume.
charge is the overall charge on the molecule.
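The relation between density and cell_volume described above can be sketched as a unit conversion. This is a minimal illustration, not the library's code: the helper names are hypothetical, and the units (mass in u, volume in Å^3, density in g/cm^3) are an assumption, since this page does not state them.

```python
# Assumed units: mass in u, cell volume in Å^3, density in g/cm^3.
AMU_G = 1.66053907e-24   # atomic mass unit in grams
A3_CM3 = 1e-24           # one Å^3 expressed in cm^3

def volume_from_density(mass_u, density):
    """Cell volume (Å^3) implied by a molecular mass and a density."""
    return mass_u * AMU_G / (density * A3_CM3)

def density_from_volume(mass_u, cell_volume):
    """Density (g/cm^3) implied by a molecular mass and a cell volume."""
    return mass_u * AMU_G / (cell_volume * A3_CM3)

# Water at roughly 20 C: mass 18.01528 u, density 0.9982 g/cm^3.
water_volume = volume_from_density(18.01528, 0.9982)
water_density = density_from_volume(18.01528, water_volume)
print(round(water_volume, 2), round(water_density, 4))
```

Either quantity determines the other once the mass is known, which is why only one of the two needs to be supplied.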
Attributes
labile_formula is the original formula, with H[1] for the labile H. You can retrieve the deuterated form using:
molecule.labile_formula.replace(elements.H[1], elements.D)
natural_formula has H substituted for H[1] in labile_formula.
D2Omatch is percentage of D2O by volume in H2O required to match the SLD of the molecule, including substitution of labile hydrogen in proportion to the D/H ratio in the solvent. Values will be outside the range [0, 100] if the contrast match is impossible.
sld/Dsld are the SLDs of the molecule with H[1] replaced by naturally occurring H/D ratios and pure D respectively.
mass/Dmass are the masses for natural H/D and pure D respectively.
charge is the charge on the molecule
cell_volume is the estimated cell volume for the molecule
density is the estimated molecule density
Change 1.5.3: drop Hmass and Hsld. Move formula to labile_formula. Move Hnatural to formula.
- D2Osld(volume_fraction=1.0, D2O_fraction=0.0)
Neutron SLD of the molecule in a deuterated solvent.
Changed 1.5.3: fix errors in SLD calculations.
- class periodictable.fasta.Sequence(name, sequence, type='aa')
Bases: Molecule
Convert FASTA sequence into chemical formula.
name is the sequence name.
sequence is the code string.
type is one of:
aa: amino acid sequence
dna: dna sequence
rna: rna sequence
Note: rna sequence files treat T as U and dna sequence files treat U as T.
- D2Osld(volume_fraction=1.0, D2O_fraction=0.0)
Neutron SLD of the molecule in a deuterated solvent.
Changed 1.5.3: fix errors in SLD calculations.
- static load(filename, type=None)
Load the first FASTA sequence from a file.
- static loadall(filename, type=None)
Iterate over sequences in FASTA file, loading each in turn.
Yields one FASTA sequence each cycle.
- periodictable.fasta.D2Omatch(Hsld, Dsld)
Find the D2O% concentration of solvent such that neutron SLD of the material matches the neutron SLD of the solvent.
Hsld, Dsld are the SLDs for the hydrogenated and deuterated forms of the material respectively, where D includes all the labile protons swapped for deuterons. Water SLD is calculated at 20 C.
Note that the resulting percentage is only meaningful between 0% and 100%. Beyond 100% you will need an additional contrast agent in the 100% D2O solvent to increase the SLD enough to match.
Deprecated since version 1.5.3: Use periodictable.nsf.D2O_match(formula) instead.
Change 1.5.3: corrected D2O sld, which will change the computed match point.
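The match condition described above can be worked through as a linear interpolation. The sketch below assumes that both the solvent SLD and the molecule SLD vary linearly with the D2O fraction; the function name d2o_match_percent is hypothetical, the water SLD constants are the values quoted on this page, and the Hsld/Dsld inputs are made-up numbers for illustration. It is not presented as the library's implementation.

```python
# Water SLD values as quoted in this module's documentation.
H2O_SLD = -0.5595112084983276
D2O_SLD = 6.390934026937301

def solvent_sld(x):
    """SLD of an H2O/D2O mixture with D2O fraction x (assumed linear)."""
    return H2O_SLD + x * (D2O_SLD - H2O_SLD)

def molecule_sld(x, Hsld, Dsld):
    """SLD of the molecule when labile H exchange in proportion to x."""
    return Hsld + x * (Dsld - Hsld)

def d2o_match_percent(Hsld, Dsld):
    """Percent D2O at which solvent_sld(x) equals molecule_sld(x)."""
    # Set the two linear expressions equal and solve for x.
    x = (H2O_SLD - Hsld) / ((Dsld - Hsld) - (D2O_SLD - H2O_SLD))
    return 100.0 * x

# Illustrative inputs only, not measured values.
match = d2o_match_percent(Hsld=2.0, Dsld=3.0)
print(round(match, 1))
```

A result below 0 or above 100 indicates that the two lines cross outside the physically reachable range of solvent compositions.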
- periodictable.fasta.fasta_table()
- periodictable.fasta.isotope_substitution(formula, source, target, portion=1)
Substitute one atom/isotope in a formula with another in some proportion.
formula is the formula being updated.
source is the isotope/element to be substituted.
target is the replacement isotope/element.
portion is the proportion of source which is substituted for target.
Deprecated since version 1.5.3: Use formula.replace(source, target, portion) instead.
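To make the meaning of portion concrete, here is a minimal sketch of partial substitution on a plain dictionary of atom counts. The function substitute and the string keys are hypothetical stand-ins chosen for this example; the library operates on its own formula objects, so this shows the idea rather than the actual implementation.

```python
def substitute(atoms, source, target, portion=1.0):
    """Return a new atom-count dict with `portion` of source moved to target."""
    if source == target:
        return dict(atoms)
    result = dict(atoms)
    count = result.pop(source, 0)
    moved = count * portion
    if moved:
        result[target] = result.get(target, 0) + moved
    remaining = count - moved
    if remaining:
        result[source] = remaining
    return result

# Glycine residue with one labile hydrogen marked as "H[1]".
glycine = {"C": 2, "H": 3, "H[1]": 1, "N": 1, "O": 1}

fully_exchanged = substitute(glycine, "H[1]", "D")
half_exchanged = substitute(glycine, "H[1]", "D", portion=0.5)
print(fully_exchanged)
print(half_exchanged)
```

With portion=0.5 the result carries fractional counts, which corresponds to an average over molecules in a mixed H2O/D2O solvent.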
- periodictable.fasta.read_fasta(fp)
Iterate over the sequences in a FASTA file.
Each iteration is a pair (sequence name, sequence codes).
Change 1.5.3: Now uses H[1] rather than T for labile hydrogen.
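For readers unfamiliar with the file layout, the iteration over (sequence name, sequence codes) pairs can be sketched as follows. The generator iter_fasta is a hypothetical, simplified reader written for this example; it assumes header lines start with ">" and that sequence codes may wrap over several lines, and it is not the library's read_fasta.

```python
import io

def iter_fasta(fp):
    """Yield (name, codes) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in fp:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence codes may span several lines
    if name is not None:
        yield name, "".join(chunks)

sample = io.StringIO(">pep1 example\nGAG\nAA\n>pep2\nMK\n")
records = list(iter_fasta(sample))
print(records)
```

Because the reader is a generator, a large file can be processed one sequence at a time without loading every record into memory.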
- periodictable.fasta.test()
- periodictable.fasta.D2O_SLD = 6.390934026937301
real portion of D2O sld at 20 C
Change 1.5.2: Use correct density in SLD calculation
- periodictable.fasta.H2O_SLD = -0.5595112084983276
real portion of H2O sld at 20 C
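The H2O value can be roughly cross-checked by hand from the bound coherent scattering lengths and the molecular volume. This sketch assumes the constant is expressed in units of 1e-6/Å^2 (the page does not state the units) and uses commonly tabulated scattering lengths and a water density of 0.9982 g/cm^3 at 20 C; the function water_sld is hypothetical, and small differences from the library value are expected because the inputs are rounded.

```python
AMU_G = 1.66053907e-24  # atomic mass unit in grams

# Bound coherent scattering lengths in fm (commonly tabulated values).
B_H = -3.7390
B_O = 5.803

def water_sld(density=0.9982, mass_u=18.01528):
    """Approximate real SLD of H2O, in units of 1e-6 / Å^2 (assumed units)."""
    volume_A3 = mass_u * AMU_G / density * 1e24  # molecular volume in Å^3
    b_sum_A = (2 * B_H + B_O) * 1e-5             # total scattering length, fm -> Å
    return b_sum_A / volume_A3 * 1e6

sld = water_sld()
print(round(sld, 3))
```

The estimate agrees with the documented H2O_SLD to within about 0.001, which supports the assumed units.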