pyhande.data_preparing

Mappings/preparation of output data column names.

pyhande.data_preparing.hande_ccmc_fciqmc

CCMC and FCIQMC HANDE data preparation for analysis.

class pyhande.data_preparing.hande_ccmc_fciqmc.PrepHandeCcmcFciqmc

Bases: AbsDataPreparator

Prepare HANDE CCMC/FCIQMC data for analysis.

property observables: Dict[str, str]

Access observables, key mapping.

Raises:

AttributeError – If data has not been prepared yet.

Returns:

Map of observables property to column/observables name.

Return type:

Dict[str,str]

property data: List[DataFrame]

Access (prepared) data.

Raises:

AttributeError – If data has not been prepared yet.

Returns:

QMC Data. Cleaned list over merged calculations.

Return type:

List[pd.DataFrame]

property complex_data: bool

True if data is complex.

Raises:

AttributeError – If preparation has not been done yet.

Return type:

bool

property replica_data: bool

True if replica tricks were used.

Raises:

AttributeError – If preparation has not been done yet.

Return type:

bool

exe(data: List[DataFrame], make_copy: bool = True)

Prepare data: handle complex and replica data and add the instantaneous projected energy.

Parameters:
  • data (List[pd.DataFrame]) – List of output data. Should be all of same type (complex/non-complex, replica-tricks/no replica tricks, calc_type).

  • make_copy (bool, optional) – If true, deepcopy data so that passed in data are not altered by any changes here. The default is True.

pyhande.error_analysing

Classes for analysis of CCMC/FCIQMC observables.

pyhande.error_analysing.analysis_utils

Shared helper functions for analysers.

pyhande.error_analysing.analysis_utils.check_data_input(data: List[DataFrame], cols: Optional[List[str]], eval_ratio: Optional[Dict[str, str]], hybrid_col: Optional[str], start_its: Union[List[int], str], end_its: Optional[List[int]]) None

Check data input against other, previous, input.

Parameters:
  • data (List[pd.DataFrame]) – List of QMC data.

  • cols (Union[List[str], Optional[List[str]]]) – Columns to be analysed when blocking/ finding starting iteration with ‘blocking’.

  • eval_ratio (Optional[Dict[str, str]]) – Contains information to evaluate ratio (e.g. projected energy) when doing blocking analysis.

  • hybrid_col (Union[Optional[str], str]) – Column name when doing hybrid analysis/ finding starting iteration with ‘mser’.

  • start_its (Union[List[int], str]) – Starting iterations for analysis or information on type of find_starting_it function.

  • end_its (Optional[List[int]]) – Last iterations for analysis.

Raises:

ValueError – If cols, eval_ratio or hybrid_col are specified but not present in data. Also raised if start_its/end_its are lists of iterations whose length differs from the length of data.

pyhande.error_analysing.analysis_utils.set_cols(observables: Dict[str, str], it_key: str, cols: Optional[List[str]], replica_col: str, eval_ratio: Optional[Dict[str, str]], hybrid_col: Optional[str]) Tuple[str, Optional[List[str]], str, Optional[Dict[str, str]], str]

Set various columns and observable names.

Either the input is simply returned or set to observables[input] if input starts with ‘obs:’.

Parameters:
  • observables (Dict[str, str]) – Map of key to column/observable name, e.g. {‘ref_key’: ‘N_0’}

  • it_key (str) – Key or actual name for iterations.

  • cols (Union[Optional[List[str]], List[str]]) – Keys or actual names of columns/observables to be analysed in blocking.

  • replica_col (str) – Key or actual name for replica column.

  • eval_ratio (Optional[Dict[str, str]]) – Keys or actual names of elements in observable ratio to be evaluated.

  • hybrid_col (Union[Optional[str], str]) – Key or actual name of column/observable to be analysed in hybrid analysis.

Return type:

(Set) values from above (except observables).
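The ‘obs:’ convention described above can be illustrated with a minimal sketch. The helper name `resolve_name` is hypothetical and not part of pyhande; it only mirrors the documented rule that an input starting with ‘obs:’ is replaced by the matching entry in observables, and any other input is returned unchanged.

```python
from typing import Dict


def resolve_name(observables: Dict[str, str], name: str) -> str:
    """Return observables[key] if name is 'obs:key', else name unchanged."""
    prefix = "obs:"
    if name.startswith(prefix):
        # Strip the prefix and look the key up in the observables map.
        return observables[name[len(prefix):]]
    return name


# Example map, in the style of {'ref_key': 'N_0'} from the parameter docs.
observables = {"ref_key": "N_0", "it_key": "iterations"}
resolved_ref = resolve_name(observables, "obs:ref_key")
resolved_plain = resolve_name(observables, "Shift")
```

Here `resolved_ref` is looked up through the map, while `resolved_plain` is passed through as an actual column name.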

pyhande.error_analysing.blocker

Analyse Monte Carlo correlated output using reblocking.

class pyhande.error_analysing.blocker.Blocker(it_key: str, cols: List[str], replica_col: str, eval_ratio: Optional[Dict[str, str]] = None, hybrid_col: Optional[str] = None, start_its: Union[List[int], str] = 'blocking', end_its: Optional[List[int]] = None, find_start_kw_args: Optional[Dict[str, Union[bool, float, int]]] = None)

Bases: AbsErrorAnalyser

Reblock specified columns from HANDE output using pyblock.

Can be used instead of HybridAnalyser.

This uses pyblock [1] to do reblocking, see Ref. [2] for more details on the reblocking algorithm.

[1] - pyblock, James Spencer, http://github.com/jsspencer/pyblock

[2] - Flyvbjerg, H., Petersen, H. G., 1989, J. Chem. Phys. 91, 461.
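For readers unfamiliar with reblocking, the idea can be sketched in a few lines of standard-library Python. This is an illustration of the general pair-averaging scheme only, not pyblock's implementation; the function name `reblock` and its return layout are assumptions made for this example.

```python
import math
from typing import List, Tuple


def reblock(series: List[float]) -> List[Tuple[int, float, float]]:
    """Return (number of blocks, mean, standard error) for each reblock step."""
    results = []
    data = list(series)
    while len(data) >= 2:
        n = len(data)
        mean = sum(data) / n
        # Unbiased sample variance of the current block averages.
        var = sum((x - mean) ** 2 for x in data) / (n - 1)
        results.append((n, mean, math.sqrt(var / n)))
        # Average neighbouring pairs; a trailing unpaired point is dropped.
        data = [(data[i] + data[i + 1]) / 2 for i in range(0, n - 1, 2)]
    return results


steps = reblock([1.0, 2.0, 3.0, 4.0])
```

For correlated data the standard error typically grows with each step until the blocks are long enough to be effectively independent; the plateau value is then the error estimate.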

classmethod inst_hande_ccmc_fciqmc(start_its: Union[List[int], str] = 'blocking', end_its: Optional[List[int]] = None, find_start_kw_args: Optional[Dict[str, Union[bool, float, int]]] = None)

Return Blocker instance for a HANDE CCMC/FCIQMC calculation.

Parameters:

See __init__().

Returns:

Instance of the Blocker class, customised for a HANDE CCMC/ FCIQMC calculation.

Return type:

Blocker

property start_its: List[int]

Access _start_its attribute if available, else error.

property end_its: List[int]

Access _end_its attribute if available, else error.

property opt_block: DataFrame

Access _opt_block attribute if available. Else error.

property no_opt_block: Union[List[List[str]], List[List[List[str]]]]

Access _no_opt_block attribute if available. Else error.

property reblock: List[DataFrame]

Access _reblock attribute if available. Else raise error.

property data_len: List[DataFrame]

Access _data_len attribute if available. Else raise error.

property covariance: List[DataFrame]

Access _covariance attribute if available. Else error.

exe(data: List[DataFrame], observables: Dict[str, str]) None

Do reblocking (first finding starting iteration if required).

Parameters:
  • data (List[pd.DataFrame]) – HANDE calculation Monte Carlo output data.

  • observables (Dict[str, str]) – Mapping of column key to column name, e.g. ‘ref_key’: ‘N_0’. The default is None. If any of it_key, cols, eval_ratio, hybrid_col, replica_col were instantiated as ‘obs:key’ to be overwritten with observables[‘key’], observables can’t be None and those keys have to be present.

Raises:

ValueError – If not all columns to be blocked appear in ‘data’ or if the length of ‘data’ is different to length of ‘start_its’ or ‘end_its’ if they are defined.

pyhande.error_analysing.find_starting_iteration

Functions to find starting iteration for analysis.

pyhande.error_analysing.find_starting_iteration.find_starting_iteration_blocking(data: DataFrame, end_it: int, it_key: str, cols: List[str], hybrid_col: str, start_max_frac: float = 0.8, grid_size: int = 10, number_of_reblocks_to_cut_off: int = 1, show_graph: bool = False) int

Find the best iteration to start analysing CCMC/FCIQMC data.

It first excludes data from before the point where all columns specified in cols have started varying, as well as data after end_it. Then it searches for the starting iteration using an adaptive grid search on a log scale, since we assume that the starting iteration is closer to the beginning than to the end of the available data. During the search, a loss function is minimised. The loss is the fractional error over the number of data points involved in the blocking for each data column in cols.

This implementation is based on an older version in pyhande/lazy.py. V. A. Neufeld thanks the EPSRC CDT CMMS cohort 1 in Cambridge for helpful discussions.

Warning

Use with caution, check whether output is sensible and adjust parameters if necessary.

Parameters:
  • data (pd.DataFrame) – QMC data, e.g. as extracted by extract.py. Has to contain columns with key it_key and columns in cols, used for blocking.

  • end_it (int) – Last iteration to be considered in blocking.

  • it_key (str) – Key of column containing MC iterations.

  • cols (List[str]) – List of keys of columns involved in blocking.

  • hybrid_col (str) – Ignored here, for common interface.

  • start_max_frac (float, optional) – The starting iteration found has to be in the first start_max_frac fraction of the data between the point where all columns in cols have started varying and end_it. This prevents finding a starting iteration too close to the end. Has to be between 0.00001 and 1.0. The default is 0.8.

  • grid_size (int, optional) – Number of logarithmically spaced grid points per run. The default is 10.

  • number_of_reblocks_to_cut_off (int, optional) – To be extra sure, cut off a few reblocks to make sure data after starting iteration is truly in equilibrium. Cannot be negative. The default is 1.

  • show_graph (bool, optional) – If True, show a graph showing the columns with key cols[0] as a function of iterations. The suggested starting iteration is highlighted. The default is False.

Raises:
  • ValueError – If start_max_frac or number_of_reblocks_to_cut_off are out of range.

  • RuntimeError – If not all columns with keys in cols have started varying in data or if suitable starting iteration was not found.

Returns:

Suggested iteration in column it_key where analysis should start.

Return type:

int
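The reason for searching on a log scale can be seen from a small sketch of a logarithmically spaced grid. This is not the routine's actual code: the helper `log_grid` and the way the points are rounded are assumptions for illustration. Only the default of 10 points per run comes from the grid_size parameter above.

```python
import math
from typing import List


def log_grid(first: int, last: int, grid_size: int = 10) -> List[int]:
    """Integer grid points from first to last, denser near first."""
    span = last - first
    points = {
        # log1p/expm1 keep the end points exact: i = 0 gives first,
        # i = grid_size - 1 gives last.
        first + round(math.expm1(math.log1p(span) * i / (grid_size - 1)))
        for i in range(grid_size)
    }
    return sorted(points)


grid = log_grid(0, 1000)
```

Most candidate points fall early in the data, which matches the stated assumption that the starting iteration lies closer to the beginning than to the end.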

pyhande.error_analysing.find_starting_iteration.find_starting_iteration_mser_min(data: DataFrame, end_it: int, it_key: str, cols: List[str], hybrid_col: str, start_max_frac: float = 0.84, n_blocks: int = 100) int

Estimate starting iteration with MSER minimization scheme.

Warning

Use with caution, check whether output is sensible and adjust parameters if necessary.

This function estimates the starting iteration using the MSER minimization heuristic. The method chooses the starting iteration \(d\) that minimizes the evaluation function MSER(\(d\)) = \(\Sigma_{i=1}^{n-d} ( X_{i+d} - X_{mean}(d) )^2 / (n-d)^2\). Here, \(n\) is the length of the time series, \(X_i\) is eval_ratio[‘num’] / eval_ratio[‘denom’] at the \(i\)-th step, and \(X_{mean}(d)\) is the average of \(X_i\) after the \(d\)-th step.

This is a reformatted and altered version of a previous implementation in lazy.py by Tom Ichibha. See Ichibha, T., Hongo, K., Maezono, R., Thom, A. J. W., 2019 arXiv:1904.09934 [physics.comp-ph]

Parameters:
  • data (pandas.DataFrame) – Calculation output of a FCIQMC or CCMC calculation.

  • end_it (int) – Last iteration to be considered in blocking.

  • it_key (str) – Key of column containing MC iterations.

  • cols (List[str]) – Ignored here. Keep for common interface.

  • hybrid_col (str) – Column in data to be analysed here, e.g. ‘Inst. Proj. Energy’.

  • start_max_frac (float) – MSER(\(d\)) may oscillate or become unreasonably small when \(n-d\) is small. Thus, we calculate MSER(\(d\)) only for \(d\) < (\(n\) * start_max_frac) and estimate the optimal starting iteration within this range of \(d\). The default is 0.84.

  • n_blocks (int) – This analysis takes a long time when \(n\) is large. Thus, we pick \(d\) for every ‘n_blocks’ samples, calculate MSER(\(d\)), and estimate the optimal starting iteration only from these \(d\). The default is 100.

Returns:

starting_it – Iteration from which to start reblocking analysis for this calculation.

Return type:

int
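The MSER heuristic can be sketched in plain Python as follows. This is a minimal illustration, not the pyhande implementation: the names `mser`, `mser_min_start` and the `stride` argument are hypothetical, the input is a plain list rather than a DataFrame, and the statistic is written with squared deviations, as in the standard MSER definition.

```python
from typing import List


def mser(series: List[float], d: int) -> float:
    """Sum of squared deviations after discarding d points, over (n - d)^2."""
    tail = series[d:]
    m = len(tail)
    mean = sum(tail) / m
    return sum((x - mean) ** 2 for x in tail) / m ** 2


def mser_min_start(series: List[float], start_max_frac: float = 0.84,
                   stride: int = 1) -> int:
    """Return the d < n * start_max_frac that minimizes MSER(d)."""
    n = len(series)
    # Only every stride-th candidate is evaluated (hypothetical thinning).
    candidates = range(0, int(n * start_max_frac), stride)
    return min(candidates, key=lambda d: mser(series, d))


# Three transient points followed by a stationary, alternating signal.
series = [10.0, 5.0, 2.0] + [1.1, 0.9] * 20
start = mser_min_start(series)
```

Discarding the transient lowers the variance term sharply, while discarding stationary points only shrinks the denominator, so the minimum sits at the end of the transient.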

pyhande.error_analysing.find_starting_iteration.select_find_start(key: str)

Select find_starting_iteration function to use.

Parameters:

key (str) – Key linked to find_starting_iteration.

Return type:

Find_starting_iteration function.

pyhande.error_analysing.hybrid_ana

Analyse Monte Carlo correlated output with hybrid analyser.

class pyhande.error_analysing.hybrid_ana.HybridAna(it_key: str, hybrid_col: str, replica_col: str, cols: Optional[List[str]] = None, start_its: Union[List[int], str] = 'mser', end_its: Optional[List[int]] = None, batch_size: int = 1, find_start_kw_args: Optional[Dict[str, Union[bool, float, int]]] = None)

Bases: AbsErrorAnalyser

Analyse ratio observable, such as projected energy.

Can be used instead of Blocker.

This scheme hybridizes two different post-analysis methods, the AR model and Straatsma. The former is comparatively good at estimating the statistical error for shorter time series, the latter for longer ones. This method simply picks the larger of the statistical errors given by the two methods. The mathematical details of both methods are explained in (please cite if you use this):

Ichibha, T., Hongo, K., Maezono, R., Thom, A. J. W., 2019 arXiv:1904.09934 [physics.comp-ph]

classmethod inst_hande_ccmc_fciqmc(start_its: Union[List[int], str] = 'mser', end_its: Optional[List[int]] = None, batch_size: int = 1, find_start_kw_args: Optional[Dict[str, Union[bool, float, int]]] = None)

Return HybridAna instance for a HANDE CCMC/FCIQMC calc.

Parameters:

See __init__().

Returns:

Instance of the HybridAna class, customised for a HANDE CCMC/FCIQMC calculation.

Return type:

HybridAna

property start_its: List[int]

Access _start_its attribute if available, else error.

property end_its: List[int]

Access _end_its attribute if available, else error.

property opt_block: DataFrame

Access _opt_block attribute if available. Else error.

property no_opt_block: List[List[str]]

Access _no_opt_block attribute if available. Else error.

exe(data: List[DataFrame], observables: Dict[str, str]) None

Do analysis (first finding starting iteration if required).

Parameters:
  • data (List[pd.DataFrame]) – HANDE calculation Monte Carlo output data.

  • observables (Dict[str, str]) – Mapping of column key to column name, e.g. ‘ref_key’: ‘N_0’. The default is None. If any of it_key, cols, eval_ratio, hybrid_col, replica_col were instantiated as ‘obs:key’ to be overwritten with observables[‘key’], observables can’t be None and those keys have to be present.

Raises:

ValueError – If not all columns to be blocked appear in ‘data’ or if the length of ‘data’ is different to length of ‘start_its’ or ‘end_its’ if they are defined.

pyhande.extracting

Classes for extracting metadata and data from output files.

pyhande.extracting.extractor

Extract and merge (meta)data from (multiple) HANDE output files.

class pyhande.extracting.extractor.Extractor(merge: Optional[Dict[str, Union[List[str], str]]] = None)

Bases: AbsExtractor

Extract data/metadata from HANDE output files and merge.

Merge if desired/sensible, e.g. when calculation was restarted. This expands the functionality of extract.py and is more compactly represented as a class.

property out_files: List[str]

Access (read only) out_files property.

Raises:

AttributeError – If data has not been extracted yet, i.e. output files have not been passed yet.

Returns:

List of out_files names the data is extracted from.

Return type:

List[str]

property data: List[DataFrame]

Access (extracted) data property.

Raises:

AttributeError – If data has not been extracted yet.

Returns:

QMC Data. List over merged calculations.

Return type:

List[pd.DataFrame]

property metadata: List[List[Dict]]

Access (extracted) metadata property.

Raises:

AttributeError – If metadata has not been extracted yet.

Returns:

Metadata. List over merged calculations where each element is a list over the metadata of the calculations that got merged.

Return type:

List[List[Dict]]

property calc_to_outfile_ind: List[List[int]]

Map index of calculation to output file.

This maps which HANDE output file the data and metadata belong to. E.g. [[0], [0], [1, 2]] with three output files shows that the first output file (index 0) contained two calculations and the second and third output files (indices 1 and 2) were merged into the third calculation.

Raises:

AttributeError – If data has not been extracted yet.

Returns:

Outer list has length equal to the length of the data/metadata lists and contains lists of indices of the output files containing them (see above).

Return type:

List[List[int]]
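As a small illustration of how this mapping reads, the sketch below inverts the example value [[0], [0], [1, 2]] to list, for each output file, the calculations it contributed to. The variable names are chosen for this example only.

```python
from collections import defaultdict

# Example value from the description above: three output files,
# three (merged) calculations.
calc_to_outfile_ind = [[0], [0], [1, 2]]

outfile_to_calcs = defaultdict(list)
for calc_ind, file_inds in enumerate(calc_to_outfile_ind):
    for file_ind in file_inds:
        # Record that this calculation draws on this output file.
        outfile_to_calcs[file_ind].append(calc_ind)

outfile_to_calcs = dict(outfile_to_calcs)
```

Output file 0 maps to calculations 0 and 1, while output files 1 and 2 both map to the merged calculation 2.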

property all_ccmc_fciqmc: bool

Whether all extracted calculations are either CCMC or FCIQMC.

This will affect what postprocessing can be done.

Raises:

AttributeError – If data has not been extracted yet.

Returns:

True if all calculations extracted are either CCMC or FCIQMC. False if at least one is of another type, such as FCI or Hilbert space estimation.

Return type:

bool

exe(out_files: List[str])

Extract and merge.

The merge code was inspired by an older implementation in the deprecated/removed lazy.py file. [todo] Test with a calculation where a file contains more than one calculation.

Parameters:

out_files (List[str]) – List of HANDE output filenames to be extracted here.

pyhande.helpers

Helpful generic callables.

pyhande.helpers.simple_callables

Simple, useful callables when selecting. *args are ignored.

pyhande.helpers.simple_callables.do_nothing(*args)

Do nothing.

class pyhande.helpers.simple_callables.RaiseValueError(message)

Bases: object

Raise ValueError with message.

pyhande.results_viewer

Further analysis and data viewing.

pyhande.results_viewer.get_results

Helper functions to run analysis and get results object.

pyhande.results_viewer.get_results.define_objects_common(merge_type: str = 'uuid', analyser: str = 'blocking', start_its: Union[List[int], str] = 'blocking') Tuple[Extractor, PrepHandeCcmcFciqmc, Union[Blocker, HybridAna]]

Create extractor, preparator and analyser with common options.

Parameters:
  • merge_type (str, optional) – How to merge: ‘uuid’, ‘legacy’ or ‘no’. Note that this differs from the fuller options available when instantiating the extractor object directly. By default ‘uuid’.

  • analyser (str, optional) – ‘blocking’ for doing reblocking or ‘hybrid’, by default ‘blocking’

  • start_its (Union[List[int], str], optional) – Either a list of integers giving the start iterations, or ‘blocking’ or ‘hybrid’, defining which find starting iteration function to use. By default ‘blocking’.

Returns:

Instantiated objects for extracting, preparing and analysing data.

Return type:

Tuple[Extractor, PrepHandeCcmcFciqmc, Union[Blocker, HybridAna]]

pyhande.results_viewer.get_results.analyse_data(out_files: List[str], extractor: Extractor, preparator: Optional[PrepHandeCcmcFciqmc] = None, analyser: Optional[Union[Blocker, HybridAna]] = None) Union[Results, ResultsCcmcFciqmc]

Execute objects to extract data, prepare and analyse it.

Parameters:
  • out_files (List[str]) – Output files with data to extract, prepare and analyse.

  • extractor (Extractor) – Instance to extract data from files.

  • preparator (PrepHandeCcmcFciqmc) – Instance to prepare data, e.g. calculate inst. proj. energy or deal with complex/replica tricks. The default is None.

  • analyser (Union[Blocker, HybridAna]) – Instance to analyse data, e.g. blocking. The default is None.

Returns:

Results object to view and further analyse results.

Return type:

Union[Results, ResultsCcmcFciqmc]

pyhande.results_viewer.get_results.get_results(out_files: List[str], merge_type: str = 'uuid', analyser: str = 'blocking', start_its: Union[List[int], str] = 'blocking') Union[Results, ResultsCcmcFciqmc]

Lazy function to combine defining objects and executing them.

Parameters:

See define_objects_common and analyse_data.

Return type:

See analyse_data.
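A typical call might look like the sketch below, which uses only the function and keyword names listed in this reference. The wrapper `summarise` and the file names are hypothetical, and the import is placed inside the function so that the sketch can be loaded without pyhande installed.

```python
from typing import List


def summarise(out_files: List[str]):
    """Run extraction, preparation and blocking analysis in one call."""
    # Import path as documented above; requires pyhande to be installed.
    from pyhande.results_viewer.get_results import get_results

    results = get_results(
        out_files, merge_type="uuid", analyser="blocking", start_its="blocking"
    )
    # `summary` is the property documented for the results classes.
    return results.summary


# Hypothetical file names; replace with real HANDE output files.
# summary = summarise(["calc_1.out", "calc_2.out"])
```

When finer control is needed, define_objects_common and analyse_data can be called separately, so that the extractor, preparator and analyser objects are available for inspection.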

pyhande.results_viewer.results

Access and investigate generic results from HANDE QMC.

class pyhande.results_viewer.results.Results(extractor: Extractor)

Bases: object

Show and allow investigation of HANDE QMC results.

Extraction has already happened. This is a base class, currently used for all calculations other than CCMC and FCIQMC, which use a more specific class.

property extractor: Extractor

Access extractor used to supply these results.

property summary: DataFrame

Access summary.

get_metadata(meta_keys: Union[str, List[str]]) DataFrame

Get part(s) of metadata in pandas DataFrame.

Parameters:

meta_keys (Union[str, List[str]]) – List of metadata items to put into DataFrame. Each item as ‘keyOuter:keyInner:…’, e.g. [‘qmc:tau’, ‘system:ueg:r_s’] adds extractor.metadata[:][‘qmc’][‘tau’] as well as extractor.metadata[:][‘system’][‘ueg’][‘r_s’].

Returns:

Contains metadata requested for all calculations.

Return type:

pd.DataFrame
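The colon-separated key format can be illustrated with a short sketch that walks a nested dictionary. The helper `lookup` and the example metadata values are assumptions for illustration and are not taken from pyhande.

```python
from functools import reduce
from typing import Any, Dict


def lookup(metadata: Dict[str, Any], meta_key: str) -> Any:
    """Follow 'keyOuter:keyInner:...' through nested dictionaries."""
    return reduce(lambda level, key: level[key], meta_key.split(":"), metadata)


# Made-up metadata for a single calculation.
metadata = {"qmc": {"tau": 0.01}, "system": {"ueg": {"r_s": 2.0}}}
tau = lookup(metadata, "qmc:tau")
r_s = lookup(metadata, "system:ueg:r_s")
```

Each colon descends one level, so ‘system:ueg:r_s’ corresponds to metadata[‘system’][‘ueg’][‘r_s’].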

add_metadata(meta_keys: List[str])

Add metadata to summary. Overwritten in ResultsCcmcFciqmc.

Parameters:

meta_keys (List[str]) – List of metadata to add in strings where different level keys are separated by colons. E.g. [‘qmc:tau’, ‘system:ueg:r_s’] adds extractor.metadata[:][‘qmc’][‘tau’] as well as extractor.metadata[:][‘system’][‘ueg’][‘r_s’] to summary (if they exist).

pyhande.results_viewer.results_ccmc_fciqmc

Access and investigate CCMC/FCIQMC results from HANDE QMC.

class pyhande.results_viewer.results_ccmc_fciqmc.ResultsCcmcFciqmc(extractor: Extractor, preparator: Optional[PrepHandeCcmcFciqmc] = None, analyser: Optional[Union[Blocker, HybridAna]] = None)

Bases: Results

Show CCMC and FCIQMC HANDE results and allow further analysis.

property preparator: Optional[PrepHandeCcmcFciqmc]

Access preparator used to prepare data for analysis.

property analyser: Optional[Union[Blocker, HybridAna]]

Access analyser used to supply the analysed results.

property summary_pretty: DataFrame

Access self._summary but prettify for viewing data.

Combine the value in the “value/mean” column with the “standard error” column for easy viewing, e.g. ‘0.123(4)’. If this is not possible, because of the type or missing values, fill in the value from “value/mean”.

Returns:

Prettified summary table for viewing (not further analysis).

Return type:

pd.DataFrame
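The ‘0.123(4)’ notation puts the uncertainty in the last quoted digit in brackets. The sketch below shows one way to produce it with a single significant figure of error. It is an assumption-laden illustration: the helper `pretty` is hypothetical and the rounding rules pyhande actually applies may differ.

```python
import math


def pretty(value: float, error: float) -> str:
    """Format as value(error) with one significant figure of error."""
    if not (math.isfinite(value) and math.isfinite(error) and error > 0):
        # Fall back to the plain value, as described for missing errors.
        return str(value)
    # Number of decimals needed to reach the leading digit of the error.
    decimals = -math.floor(math.log10(error))
    err_digit = round(error * 10 ** decimals)
    if err_digit == 10:
        # Rounding carried the error into the next decade, e.g. 0.0096.
        decimals -= 1
        err_digit = 1
    if decimals < 0:
        # Errors of 10 or more are not handled by this sketch.
        return str(value)
    return f"{value:.{decimals}f}({err_digit})"


formatted = pretty(0.1234, 0.004)
```

With these inputs the value is rounded to three decimals and the error digit 4 is appended in brackets.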

compare_obs(observables: List[str]) DataFrame

Compare observables from .summary where obs are columns.

Parameters:

observables (List[str]) – Observables from .summary to compare.

Returns:

DataFrame where easier comparisons are possible.

Return type:

pd.DataFrame

property shoulder: DataFrame

Access shoulder. For now, not hist shoulder [todo].

See J. S. Spencer and A. J. W. Thom (2016), J. Chem. Phys. 144, 084108.

add_shoulder()

Add shoulder to summary. [todo]: allow hist shoulder.

property inefficiency: DataFrame

Access inefficiency.

See W. A. Vigor et al. (2016), J. Chem. Phys. 144, 094110.

add_inefficiency()

Add inefficiency to summary.

add_metadata(meta_keys: List[str])

Overwritten version of Results.add_metadata.

Parameters:

meta_keys (List[str]) – List of metadata to add in strings where different level keys are separated by colons. E.g. [‘qmc:tau’, ‘system:ueg:r_s’] adds extractor.metadata[:][‘qmc’][‘tau’] as well as extractor.metadata[:][‘system’][‘ueg’][‘r_s’] to summary (if they exist).

do_reweighting(max_weight_history: int = 300) None

Do reweighting to check for population bias if done blocking.

For each independent shift value, this shows a graph of weight_history against (weighted) projected energy/ eval_ratio. If the (weighted) projected energies (eval_ratio[‘name’]) do not agree with each other, this is a sign of population control bias. Note that this is only tested if eval_ratio[‘name’] contains the projected energy. See references. Very first implementation credit to Will Vigor.

Parameters:

max_weight_history (int, optional) – The maximum value of weight_history. Weight_history is done in steps of 2**n with 2**n < max_weight_history. The default is 300.

Raises:
  • TypeError – If analyser is not the blocking analyser.

  • ValueError – If eval_ratio was not specified when analysing.

References

Umrigar93

C.J. Umrigar et al. (1993), J. Chem. Phys. 99, 2865.

Vigor15

W.A. Vigor, et al. (2015), J. Chem. Phys. 142, 104101.

plot_shoulder(inds: Optional[List[int]] = None, show_shoulder: bool = True, log_scale: bool = True) None

Plot shoulder.

Parameters:
  • inds (List[int]) – Indices of calculations to plot. If None, plot all. The default is None.

  • show_shoulder (bool) – Show positions of shoulder height with vertical lines.

  • log_scale (bool) – Set x and y axis on log scale.