pyhande.analysis
Analysis of data from FCIQMC and CCMC calculations.
- pyhande.analysis.projected_energy(reblock_data, covariance, data_length, sum_key='\\sum H_0j N_j', ref_key='N_0', col_name='Proj. Energy')
Calculate the projected energy estimator and associated error.
The projected energy estimator is given by
\[E = \frac{\sum_j H_{0j} N_j}{N_0}\]
The numerator and denominator are correlated and so their covariance must be taken into account.
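To illustrate why the covariance matters, here is a minimal sketch of first-order (delta-method) error propagation for the ratio of two correlated sample means. It is plain Python rather than pyhande's own code: the helper `ratio_with_error` and its `include_cov` flag are hypothetical, and the samples are assumed to be blocked already so that successive values are effectively independent.

```python
import math


def ratio_with_error(num, den, include_cov=True):
    """Return (ratio of the means, its standard error) for paired samples.

    Hypothetical helper: num holds samples of the numerator (e.g.
    sum_j H_0j N_j) and den samples of the denominator (e.g. N_0). The
    samples are assumed to be blocked, i.e. effectively independent.
    """
    n = len(num)
    mean_num = sum(num) / n
    mean_den = sum(den) / n
    d_num = [x - mean_num for x in num]
    d_den = [y - mean_den for y in den]
    # Sample (co)variances divided by n give the (co)variances of the means.
    var_num = sum(a * a for a in d_num) / (n - 1) / n
    var_den = sum(b * b for b in d_den) / (n - 1) / n
    cov = 0.0
    if include_cov:
        cov = sum(a * b for a, b in zip(d_num, d_den)) / (n - 1) / n
    ratio = mean_num / mean_den
    # First-order relative variance of a ratio, including the covariance term.
    rel_var = (var_num / mean_num ** 2 + var_den / mean_den ** 2
               - 2.0 * cov / (mean_num * mean_den))
    # Rounding can leave rel_var fractionally below zero when the samples are
    # almost perfectly correlated, so clip it before the square root.
    return ratio, abs(ratio) * math.sqrt(max(rel_var, 0.0))


# Made-up samples in which the numerator is exactly -2 times the denominator.
num = [-2.0, -2.2, -1.9, -2.1]
den = [1.0, 1.1, 0.95, 1.05]
print(ratio_with_error(num, den))                     # error is (close to) zero
print(ratio_with_error(num, den, include_cov=False))  # error spuriously inflated
```

With perfectly correlated samples, as here, the covariance term cancels the two variance terms and the ratio carries essentially no error; dropping it (`include_cov=False`) reports a spurious error of roughly 0.09.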
- Parameters:
reblock_data (pandas.DataFrame) – reblock data for (at least) the numerator and denominator in the projected energy estimator.
covariance (pandas.DataFrame) – covariance at each reblock iteration between (at least) the numerator and denominator in the projected energy estimator.
data_length (pandas.DataFrame) – number of data points in each reblock iteration.
sum_key (string) – column name in reblock_data containing \(\sum_j H_{0j} N_j\), i.e. the sum of the population weighted by the Hamiltonian matrix element with the trial wavefunction.
ref_key (string) – column name in reblock_data containing \(N_0\), i.e. the population of the trial wavefunction (often/originally just a single determinant).
- Returns:
proje – The projected energy estimator at each reblock iteration.
- Return type:
See also
- pyhande.analysis.qmc_summary(data, keys=('\\sum H_0j N_j', 'N_0', 'Shift', 'Proj. Energy'), summary_tuple=None)
Summarise a reblocked data set by the optimal block.
- Parameters:
data (pandas.DataFrame) – reblocked data (i.e. data with the reblock iteration as the index).
keys (list of strings) – columns (by top-level index) of the data table to inspect. Each top-level column must contain an optimal block column.
summary_tuple ((pandas.DataFrame, list of strings)) – Optionally append summary data to this tuple. Allows repeated calling of this function.
- Returns:
opt_data (pandas.DataFrame) – Data for each column from the optimal block size of that column.
no_opt (list of strings) – list of columns for which no optimal block size was found.
- pyhande.analysis.extract_pop_growth(data, ref_key='N_0', shift_key='Shift', min_ref_pop=10)
Select QMC data during which the population was allowed to grow.
We define the region of population growth as the period in which the shift is held constant.
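A minimal sketch of this selection, assuming the data is a list of per-iteration dicts rather than a pandas.DataFrame, and assuming the shift starts at a fixed value that only changes once shift variation begins. The name `pop_growth_region` is hypothetical; its `min_ref_pop` filter mirrors the argument of the same name in the signature.

```python
def pop_growth_region(rows, ref_key='N_0', shift_key='Shift', min_ref_pop=10):
    """Return the rows before the shift first changes, dropping rows whose
    reference population is below min_ref_pop (hypothetical helper)."""
    if not rows:
        return []
    initial_shift = rows[0][shift_key]
    growth = []
    for row in rows:
        if row[shift_key] != initial_shift:
            break  # shift has started to vary: growth phase is over
        growth.append(row)
    return [row for row in growth if row[ref_key] >= min_ref_pop]


rows = [
    {'N_0': 1, 'Shift': 0.0},
    {'N_0': 12, 'Shift': 0.0},
    {'N_0': 40, 'Shift': 0.0},
    {'N_0': 80, 'Shift': -0.05},
]
print([row['N_0'] for row in pop_growth_region(rows)])  # [12, 40]
```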
- Parameters:
data (pandas.DataFrame) – HANDE QMC data. pyhande.extract.extract_data_sets() can be used to extract this from a HANDE output file.
ref_key (string) – column name in data containing the number of psips on the reference determinant.
shift_key (string) – column name in data containing the shift.
min_ref_pop (int) – discard data entries with fewer than min_ref_pop psips on the reference.
- Returns:
pop_data – The subset of data prior to the shift being varied.
- Return type:
- pyhande.analysis.plateau_estimator(data, total_key='# H psips', ref_key='N_0', shift_key='Shift', min_ref_pop=10, pop_data=None)
Estimate the (plateau) shoulder from an FCIQMC/CCMC calculation.
The population on the reference starts to grow exponentially during the plateau, whilst the total population grows exponentially from the start of the calculation before stabilising (perhaps only briefly) during the plateau phase. As a result, the ratio of the total population to the population on the reference is at a maximum at the start of the plateau.
The shoulder estimator is defined to be the mean of the ten points with the smallest proportion of the population on the reference (excluding points where the population on the reference is below min_ref_pop excips (psips)). The shoulder height is the total population at this point.
Credit to Alex Thom for original implementation.
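The shoulder estimator described above can be sketched in plain Python as follows. The rows are assumed to be per-iteration dicts already restricted to the population-growth phase (for example by extract_pop_growth); the helper name `shoulder_estimate`, its `npoints` argument (ten in the description above) and the use of the standard error of the mean over the selected points are assumptions of this sketch, not pyhande's code.

```python
import math
import statistics


def shoulder_estimate(rows, total_key='# H psips', ref_key='N_0',
                      min_ref_pop=10, npoints=10):
    """Return ((mean ratio, s.e.), (mean height, s.e.)) over the npoints rows
    with the largest total/reference ratio (hypothetical helper)."""
    valid = [r for r in rows if r[ref_key] >= min_ref_pop]
    # Largest total/reference ratio == smallest fraction on the reference.
    top = sorted(valid, key=lambda r: r[total_key] / r[ref_key],
                 reverse=True)[:npoints]

    def mean_and_error(values):
        mean = statistics.mean(values)
        if len(values) < 2:
            return mean, float('nan')
        return mean, statistics.stdev(values) / math.sqrt(len(values))

    ratios = [r[total_key] / r[ref_key] for r in top]
    heights = [float(r[total_key]) for r in top]
    return mean_and_error(ratios), mean_and_error(heights)


rows = [{'# H psips': 100, 'N_0': 5},    # excluded: below min_ref_pop
        {'# H psips': 200, 'N_0': 10},
        {'# H psips': 300, 'N_0': 10},
        {'# H psips': 400, 'N_0': 25},
        {'# H psips': 500, 'N_0': 50}]
(ratio, ratio_err), (height, height_err) = shoulder_estimate(rows, npoints=2)
print(ratio, height)  # 25.0 250.0
```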
- Parameters:
data (pandas.DataFrame) – HANDE QMC data. pyhande.extract.extract_data_sets() can be used to extract this from a HANDE output file.
total_key (string) – column name in data containing the total number of psips.
ref_key (string) – column name in data containing the number of psips on the reference determinant.
shift_key (string) – column name in data containing the shift.
min_ref_pop (int) – exclude points with fewer than min_ref_pop psips on the reference.
pop_data (pandas.DataFrame) – The subset of data prior to the shift being varied. Calculated using extract_pop_growth if not supplied.
- Returns:
plateau_data – An estimate of the shoulder (plateau) from an FCIQMC (CCMC) calculation, along with the associated standard error.
- Return type:
- pyhande.analysis.plateau_estimator_hist(data, total_key='# H psips', shift_key='Shift', pop_data=None, bin_width_fn=None)
Estimate the plateau height via a histogram of the population.
The population (approximately) stabilises during the plateau phase. By taking a histogram of the population, the plateau can be estimated from the histogram bin with the greatest frequency. Due to the exponential population growth outside of the plateau, we histogram the logarithm of the population.
This tends to give similar numbers to plateau_estimator, though it may be less useful for shoulder-like plateaus. Detecting a plateau automatically is tricky, so having multiple approaches for comparison helps with corner cases.
Used in [Shepherd14].
Credit to James Shepherd for the idea and original (perl) implementation.
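A sketch of the histogram approach in plain Python. It assumes a natural logarithm and bins that start at the smallest log-population, and it returns the population at the centre of the most populated bin. The helpers `plateau_from_histogram` and `default_bin_width` are hypothetical; the default simply reuses the empirical 12500/len(data)^2 formula quoted for bin_width_fn, applied here to the list of log-populations.

```python
import math
from collections import Counter


def default_bin_width(data):
    """Empirical default quoted for bin_width_fn: 12500/len(data)**2."""
    return 12500.0 / len(data) ** 2


def plateau_from_histogram(populations, bin_width_fn=None):
    """Estimate the plateau as the centre of the modal bin of log(population).

    Hypothetical helper: populations is a sequence of total populations
    taken before the shift is varied.
    """
    if bin_width_fn is None:
        bin_width_fn = default_bin_width
    logs = [math.log(p) for p in populations if p > 0]
    width = bin_width_fn(logs)
    lowest = min(logs)
    # Histogram: count the log-populations falling into each bin of this width.
    counts = Counter(int((x - lowest) // width) for x in logs)
    modal_bin, _ = counts.most_common(1)[0]
    # Convert the centre of the most frequent bin back to a population.
    return math.exp(lowest + (modal_bin + 0.5) * width)


growth = [2.0 ** k for k in range(11)]   # exponential growth
plateau = [5000, 5050, 4980, 5020, 5010]  # population stalls
later = [10000, 20000]                    # growth resumes
pops = growth + plateau + later
print(plateau_from_histogram(pops, bin_width_fn=lambda data: 0.1))
```

Binning the logarithm spreads the rapidly growing phases over many sparsely filled bins while the stalled population piles into one; with a bin width of 0.1 the estimate here lands a few percent above 5000 because it reports the bin centre.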
- Parameters:
data (pandas.DataFrame) – HANDE QMC data. pyhande.extract.extract_data_sets() can be used to extract this from a HANDE output file.
total_key (string) – column name in data containing the total number of psips.
shift_key (string) – column name in data containing the shift.
pop_data (pandas.DataFrame) – The subset of data prior to the shift being varied. Calculated using extract_pop_growth if not supplied.
bin_width_fn (function) – A function which calculates the bin width in the histogram based upon pop_data. If not supplied, 12500/len(data)^2 (obtained empirically) is used.
- Returns:
plateau – An estimate of the population at the plateau.
- Return type:
References
- Shepherd14
J.J. Shepherd et al., Phys. Rev. B 90, 155130 (2014).
- pyhande.analysis.inefficiency(opt_block, dtau, iterations, sum_key='\\sum H_0j N_j', ref_key='N_0', total_key='# H psips', proje_key='Proj. Energy')
Estimate the inefficiency of a calculation from the blocked data.
The statistical error of an ideal FCIQMC calculation decreases as the inverse square root of the number of steps, \(N\), the total number of particles, \(N_p\), and (at sufficiently small values) the timestep, \(\delta\tau\).
We define the inefficiency, \(a\), as a quantity independent of these, which depends purely on the algorithm and the system studied, and can be used to determine the expected runtime to achieve a given error. We provide an estimate of this from the best estimate of the error in the projected energy, \(\sigma_E\):
\[a = \sigma_E \sqrt{N_p N \delta\tau}\]
Error bars are estimated with simple error propagation. Since no information about the covariance of the error estimates is available, they will always be overestimates.
Used in [Vigor16].
Credit to William Vigor for the original pyhande implementation.
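The formula can be used in both directions: to obtain \(a\) from a finished calculation, and then to estimate how many iterations a run with the same \(N_p\) and \(\delta\tau\) would need to reach a target error. The sketch below does both; the function names are hypothetical, and the error estimate for \(a\) assumes the uncertainties in \(\sigma_E\) and \(N_p\) are independent (first-order propagation, neglecting covariance), which may differ from what pyhande actually does.

```python
import math


def inefficiency_estimate(sigma_e, sigma_e_err, n_p, n_p_err, n_iter, dtau):
    """Return (a, error in a) with a = sigma_E * sqrt(N_p * N * dtau).

    Hypothetical helper; N and dtau are treated as exact, and the errors in
    sigma_E and N_p are propagated to first order assuming independence.
    """
    a = sigma_e * math.sqrt(n_p * n_iter * dtau)
    # a ~ sigma_E * N_p**0.5, so relative errors add in quadrature,
    # with the N_p contribution halved.
    rel_err = math.hypot(sigma_e_err / sigma_e, 0.5 * n_p_err / n_p)
    return a, a * rel_err


def iterations_for_error(a, target_sigma, n_p, dtau):
    """Invert sigma_E = a / sqrt(N_p * N * dtau) for the number of iterations N."""
    return (a / target_sigma) ** 2 / (n_p * dtau)


# Made-up numbers: sigma_E = 1e-3 (+/- 1e-4) from 1e4 iterations
# with N_p = 1e6 and dtau = 0.01.
a, a_err = inefficiency_estimate(1e-3, 1e-4, 1e6, 0.0, 10_000, 0.01)
print(a, a_err)                                   # approximately 10 and 1
print(iterations_for_error(a, 5e-4, 1e6, 0.01))   # halving the error: ~4e4 iterations
```

Because the error scales as \(1/\sqrt{N}\), halving \(\sigma_E\) at fixed \(N_p\) and \(\delta\tau\) requires four times as many iterations.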
- Parameters:
opt_block (pandas.DataFrame) – Optimally blocked HANDE QMC data. pyhande.analysis.qmc_summary() can be used to extract this from reblocked HANDE data.
dtau (float) – length of an imaginary-time timestep.
iterations (integer) – number of iterations (timesteps) in the reblocked data.
sum_key (string) – column name in reblock_data containing \(\sum_j H_{0j} N_j\), i.e. the sum of the population weighted by the Hamiltonian matrix element with the trial wavefunction.
ref_key (string) – column name in reblock_data containing \(N_0\), i.e. the population of the trial wavefunction (often/originally just a single determinant).
total_key (string) – column name in reblock_data containing the total number of psips.
proje_key (string) – key for the projected energy index in opt_block.
- Returns:
ineff – A data frame with index ‘inefficiency’ and columns ‘mean’ and ‘standard error’, or None if the appropriate data is not available.
- Return type:
References
- Vigor16
W. Vigor et al., J. Chem. Phys. 144, 094110 (2016); doi: 10.1063/1.4943113