One common application of curve-fitting occurs in the analysis of ultracentrifuge data. One of the most common uses of an ultracentrifuge is for molecular weight determination. Suppose we load the cell (a wedge-like container with circular-profile surfaces at each end) of an ultracentrifuge with a solution of a substance with an unknown molecular weight, and then spin the cell until the molecules in solution distribute themselves in the cell according to the forces that act upon them. This equilibrium arrangement will have the solute distributed in the cell in an exponential fashion with relatively more of the solute at the bottom (which is the outer end) of the cell. Therefore the concentration of the solute varies at different positions along the cell ( i.e., at different radial positions or distances from the center of the ultracentrifuge.)
Let us denote a given radial position by , measured in centimeters from the center of rotation, where the cell ranges from the top position at to the bottom position at the outer end of the cell at . Then the concentration of the solute, , measured in grams per liter, at the radial position is a function of given as:
is the concentration in grams per liter of the solute at a given reference radial position, . The value can be computed from and the total amount of solute in the cell measured in grams per liter, however, in practice, can be determined by curve-fitting.
is the given radial reference position (it is not necessary that be or , indeed is generally a poor choice).
is the apparent molecular weight of the solute.
is a certain constant, defined as: where is the partial specific volume of the solute ( is the density of the anhydrous solute, so is measured in cm/gram), is the solution density in grams/cm is the angular velocity of the rotor in radians per second, is the gas constant ( ergs/degree/mole), and is the absolute temperature.
Due to the non-ideal diffusion behavior of most materials, the observable or apparent molecular weight , at a given radial position , is approximately the following non-linear function of the actual molecular weight, approximately equals where the parameter is the so-called virial coefficient, with B > 0.
Thus is a function of , and we have:
The notation denotes a value, in the interval which is a root of the expression i.e., is a value such that MLAB provides the root operator as a built-in function ROOT. The value is any value greater than which is an upper bound for
Now, if we measure the concentration distribution with a UV scanner or with a Rayleigh interferometer attached to the ultracentrifuge and express the results as a table of points which gives the observed concentration at the radial position in the cell, then the root-form equation for given above can be used as a model to fit these points. In this way, can be determined by curve-fitting. The virial coefficient is also generally treated as a fitting parameter. Often is not exactly known, and can also be a fitting parameter. Concentration can be measured in any of a number of different units. Fundamentally, we have fringe numbers or optical densities, and such observations are readily convertible by scaling to grams-per-liter concentration units.
Here is an example showing how the fit is done in MLAB. We first define the model, and then read-in some data, set initial-guess values of the parameters to be fitted, and fit the model to the given data. Finally, we draw pictures of the data and the fitted model.
fct c(r)=root(x, 0.5,20,(ch*exp(A*M*(r^2-rh^2)/(1+M*B*x))-x)) data = read(centri,51,2) /* read in the data from file centri.dat */ q = 0.26 /* this term is 1-zbar*rho */ omega = 1047.2 /* corresponding to 10000 rpm machine rotation */ R = 8.314*10^7 /* gas concentration */ t = 298.15 /* absolute temprature */ A = (q*omega^2)/(2*R*t); /* A = 5.75119093E-6 */ rh = 6.7 /* reference radius in cm */ /* set initial guesses for B, ch, M */ B = 10^(-7) ch = 1.25; M = 60000 fit(M,B,ch), c to data final parameter values value error dependency parameter 68743.0466061385 1414.898785691 0.9928704243 M 2.283270139e-07 1.901124729e-08 0.9729368851 B 0.9799669867 0.0201841151 0.9789628624 ch 4 iterations CONVERGED best weighted sum of squares = 3.946440e-01 weighted root mean square error = 9.067387e-02 weighted deviation fraction = 9.884748e-03 R squared = 9.989925e-01 draw data, lt none, pt star, ptsize 0.01 draw points(c,(data col 1)) top title "MLAB fit of C(r) vs. r" view
Note that our model is an implicit function. One of the features of MLAB is the ability to handle such models. The MLAB curve-fitting process involves computing the derivatives of with respect to and . Here are the derivative functions generated by MLAB. When differentiating an implicit function, MLAB makes use of the eval operator defined so that denotes the value of the expression with the value substituted for within
* type c'B FUNCTION c'B(r) = EVAL(x,ROOT(x,0.5,20,ch*EXP((A*M*(r^2-rh^2)/(1+M*B*x))-x), -(((M*x*A*M*(r^2-rh^2))/((1+M*B*x)*(1+M*B*x)))*EXP((A*M*(r^2-rh^2)) /(1+M*B*x))*ch)/(((M*B*A*M*(r^2-rh^2))/((1+M*B*x)*(1+M*B*x))) *EXP((A*M*(r^2-rh^2))/(1+M*B*x))*ch+1)) * * type c'M FUNCTION c'M(r) = EVAL(x,ROOT(x,0.5,20,ch*EXP((A*M*(r^2-rh^2))/(1+M*B*x))-x), (((A*(r^2-rh^2)*(1+M*B*x)-B*x*A*M*(r^2-rh^2))/((1+M*B*x)*(1+M*B*x))) *EXP((A*M*(r^2-rh^2))/(1+M*B*x))*ch)/(((M*B*A*M*(r^2-rh^2)) /((1+M*B*x)*(1+M*B*x)))*EXP((A*M*(r^2-rh^2))/(1+M*B*x))*ch+1)) * * type c'ch FUNCTION c'ch(r) = EVAL(x,ROOT(x,0.5,20,ch*EXP((A*M*(r^2-rh^2))/(1+M*B*x))-x), EXP((A*M*(r^2-rh^2))/(1+M*B*x))/(((M*B*A*M*(r^2-rh^2))/((1+M*B*x)*(1+M*B*x))) *EXP((A*M*(r^2-rh^2))/(1+M*B*x))*ch+1))
If we have several solutes with unknown molecular weights, then the equation for the combined concentration distribution is merely the sum of the separate concentration distributions. Thus, curve-fitting can be used to determine several molecular weights simultaneously.
In general, we have: where the concentration of the th solute is is the apparent molecular weight of the th solute, and is the grams per liter concentration of the th solute at the reference position , and is the usual constant described above, particularized for the th solute.
As before, is a function of and the true molecular weight, namely: approximately equals In fact this is an approximation to the relationship:
Usually assuming each value is the same, namely for and for is adequate, and this results in the simple form used above.
Suppose, for example, we have two solutes with molecular weights and Then
It is often of interest to compute
the amounts of each component present in the ultracentrifuge
cell directly from the gram-per-liter concentration profile function.
This can be done by using the
equation:
MLAB can be used to compute the equilibrium constant for the simple binding reaction occurring within the ultracentrifuge cell in many cases.
Let denote the grams-per-liter concentration of the first solute substance at radial position , let denote the grams-per-liter concentration of the second solute substance at radial position , and let denote the grams-per-liter concentration of the composite third solute substance at radial position . Then the combined concentration distribution function is where the grams-per-liter concentration distribution of the th solute is for
Also, at any fixed radial position
Let
Generally we cannot successfully estimate the equilibrium constant
There are several more ``rules of thumb'' that we should remember
when estimating the equlibrium constant
Here is an example of using MLAB to estimate the equilibrium
constant for a simple binding reaction.
Note we have also used the conservation-of-mass relations
for substance 1 (X) and for substance 2 (Y). Choosing
parameters to fit the computed masses of X and Y to the known
initially-loaded masses is a valuable device to improve our
parameter estimates. This can only be done, however, when
accurate values of
* rm = 6.7 /* minimum radius in cm */ * rb = 7.2 /* maximum radius in cm */ * rh = rm /* reference radius in cm */ * vc = 2*asin(.33/(2*rb)) /*angle of the cell wedge */ * vc=vc*1.2 /*volume constant, total solution vol.= (vc/2)*(rb^2-rm^2) cm^3 */ * r1 = rm:rb:.01 /* radial positions for curve-plotting */ * qv = 0.26 /* this term is 1 - zbar*rho */ * omega = 1047.2 /* corresponding to 10000 rpm machine rotation */ * R = 8.314*10^7 /* gas concentration */ * t = 298.15 /* absolute temperature */ * A1 = (qv*omega^2)/(2*R*t); A2 = A1 /* Both A1 and A2 = 5.75119093E-6 */ * /* predetermined values for known parameters */ * B1 = 2*10^(-8); B2=10^(-9); * M1 = 67000; M2=6800 * constraints q={90, ch2>0, ch1+ch2<1.4} * fct g1(r,x)=ch1*exp(A1*M1*(r^2-rh^2)/(1+M1*B1*x)) * fct g2(r,x)=ch2*exp(A2*M2*(r^2-rh^2)/(1+M2*B2*x)) * fct f(c1,c2)=c1+c2+exp(lk)*c1*c2*(M1+M2)/(M1*M2) * fct c(r) = root(x,(ch1+ch2)/3,100000, f(g1(r,x),g2(r,x))-x) * fct c1(r)=g1(r,c(r)) * fct c2(r)=g2(r,c(r)) * fct c3(r)=c3h(r,c(r)) * fct c3h(r,x)=x-g1(r,x)-g2(r,x) * /*tm1, tm2 = total mass in grams of substances 1 and 2 */ * fct tm1()=integral(r,rm,rb,tm1z(r,c(r))*vc*r) * fct tm1f(c1,c2,c)=c1+(M1/(M1+M2))*(c-c1-c2) * fct tm1z(r,x) = tm1f(g1(r,x),g2(r,x),x) * fct tm2()=integral(r,rm,rb,tm2z(r,c(r))*vc*r) * fct tm2f(c1,c2,c)=c2+(M2/(M1+M2))*(c-c1-c2) * fct tm2z(r,x) = tm2f(g1(r,x),g2(r,x),x) * /*specify the total mass in grams of substances 1 and 2 */ * mass1[1]= .113689839; * mass2[1]= .00613855051 * /* read-in the r vs. c(r) data to be modeled. */ * data = read(data.dat,1000,2) * n=nrows(data); * * /* define the weights for fitting and normalize them to sum to 1 */ * zw=ewt(data); zw=zw/rowsum(zw) * /*set initial guesses for k, ch1,and ch2 */ * k=10000; lk=log(k) * ch1=.15; * ch2=.035 * fit(lk,ch1,ch2), c to data with weight zw, tm1 to mass1, tm2 to mass2,\ : constraints q final parameter values value error dependency parameter 10.92859862 0.1533653124 0.9980490348 LK 0.09847696449 0.003231742687 0.9986427864 CH1 0.01982622037 0.001226384415 0.9914474897 CH2 4 iterations CONVERGED best weighted sum of squares = 2.386287e-05 weighted root mean square error = 6.908382e-04 weighted deviation fraction = 5.217909e-03 R squared = 9.998792e-01 no active constraints * exp(lk) /* k=exp(lk) */ = 55748.1007 * tm1() /* estimated total mass in grams of X */ = .113740813 * tm2() /* estimated total mass in grams of Y */ = 6.25476406E-3 * draw data, lt none, pt star, ptsize 0.01 * draw points(c,r1) * top title "fit of c(r) to data" * draw points(c1,r1), color red * draw points(c2,r1), color green * draw points(c3,r1), color yellow * title "Y" at (7.15,.95) world * title "Z" at (7.18,.36) world * title "X" at (7.15,.095) world * title "X+Y'2T='RZ" at (6.75,1.55) world * view
When we have self-associating molecules in an ultracentrifuge that self-associate in a more complex manner than simple dimer formation, we can still determine the monomer molecular weight, and the equilibrium constants of the known association reactions by curve-fitting in MLAB.
For example, suppose we have the self-associating reactions:
If we let these reactions take place in an ultracentrifuge cell, and
measure the weight-average molecular weight,
We have
Note that
Thus, the corresponding equilibrium constants for gram-per-liter concentrations can be defined as:
Define
Now,
Also,
Due to the non-ideal gas behavior of most materials, the observed
weight-average molecular weight
Thus, given data:
*FUNCTION MA(C) = NID(C,M1*MWX(C,E12,E14,E18),B) *FUNCTION NID(C,MW,B) = MW/(1+B*C*MW) *FUNCTION MWX(C,A1,A2,A3) = P(C1(C,A1,A2,A3),A1,A2,A3)/C *FUNCTION P(C1,A1,A2,A3) = C1+2*A1*C1^2+4*A2*C1^4+8*A3*C1^8 *FUNCTION C1(C,A1,A2,A3) = ROOT(X,0,C,X+A1*X^2+A2*X^4+A3*X^8-C)
In order to obtain the data
Now one can take a few neighboring observations, say five points
Alternatively, one may use the
In MLAB we have:
*FUNCTION C(R) = P(C1(R,B,M1,C1H),E12,E14,E18) *FUNCTION P(X,A1,A2,A3) = X+A1*X^2+A2*X^4+A3*X^8 *FCT C1(R,B,M1,CH1) = ROOT(Y,0,MAXC1,Y-CH1*EXP(A*(R*R-RH*RH)*M1/(1+B*M1*Y)))
Although using this model to fit
In either case,
Suppose we have a non-specific isodesmic ``stacking'' reaction involving
a monomer
An isodesmic association is one where the free energy for adding one
more monomer,
As before, let
Now,
Thus
Also,
Now,
Also, since
The equations above are examples of the type of models involved in the analysis of data obtained from the use of an ultracentrifuge. There are many other useful experiments, with other equations (or differential equations) describing the phenomena involved. A good reference on the use of the ultracentrifuge and the associated mathematical modeling is the chapter by K.E.\ Van Holde in the book The Proteins, 3rd edition, Vol.~1, 1975. Another source is {\em Foundations of Ultracentrifugical Analysis}, by Hiroshi Fujita, Wiley, 1975.