How To Show Mrbayes Pp Values?
https://www.biostars.org/p/44350/
The new version of MrBayes outputs a single .con.tre file with the consensus tree. The posterior probabilities are located within the nexus comments. Most programs, not being aware of the new MrBayes format, will not display the posterior probabilities (TreeView is quite an old program). I tend to use FigTree for most of my tree visualisation, and according to the manual (page 34) it supports this format. One way to work around this, if you wish to use different tree viewing software, is to enter the command conformat=simple in order to revert to the more simple form of tree output (see page 35 of the manual). This will output two trees; one with branch lengths and support values and one with only branch lengths.
**Steve Moss**
Ronquist (http://www.molecularevolution.org/molevolfiles/presentations/RonquistBayesianInference_LabI_2011-07-27.pdf)
: If you use ModelTest or MrModelTest: Do not fix parameters in MrBayes
Save time by running the analysis without heating if it works for your analysis
**Checking you runs (not in order of importance)
1. Check if burnin size is appropriate (using Tracer, look at trace of Likelihood; careful because Tracer uses generations not sampled generations, the latter is the unit entered into MrBayes' burnin)
2. Check plot of the generation versus the log probability (generated by sump, within MrBayes)
3. Check Average standard deviation of split frequencies, when performing multiple simultaneous MCMCs. "Below 0.01 is very good indication of convergence, while values between 0.01 and 0.05 may be adequate depending on the purpose of your analysis."
4. Check the
a. Acceptance rates for the moves in the "cold" chain
b. Chain swap information
As a rough rule of thumb, an efficient Metropolis-Hastings MCMC sampler will have
acceptance rates somewhere in the range 10 % to 70 %. If the acceptance rate is too high,
then the proposal mechanism is making too modest suggestions of new states. If the rate
is too low, on the other hand, then the proposals are too bold. Both situations are likely to
lead to slow mixing, which means that you will have to run the chain longer to obtain
convergence. The acceptance rates can be changed by modifying the tuning parameters: Swapfreq, Nswaps, Nchains, and Temp. (low swap fre » lower temp parameter » increase likelihood of swap
**If you have difficulties with convergence:
- Change relative proposal probabilities or tuning parameters. The default is Swapfreq=1, resulting in Nswaps (default=1) swaps being tried each generation of the run. If Swapfreq is set to 10, then Nswaps swaps will be tried every tenth generation of the run.
- If you use heated chains and there are few swaps between chains, try to lower the temperature coefficient which will increase the likelihood of a swap. By default, Temp = 0.1
- Increase the number of heated chains
- Run the analysis longer
- Make the model more realistic
- Start with randomly perturbed good trees
**Setting the priors
There are six types of parameters in the model:
1) the topology,
2) the branch lengths,
3) the four stationary frequencies of the nucleotides,
4) the six different nucleotide substitution rates,
5) the proportion of invariable sites, and
6) the shape parameter of the gamma distribution of rate variation.
The default priors in MrBayes work well for most analyses. The default prior probability density is a flat Dirichlet (all values are 1.0) for both Revmatpr and Statefreqpr. This is appropriate if we want to estimate these parameters from the data assuming no prior knowledge about their values. It is possible to fix the rates and nucleotide frequencies but this is generally not recommended. However, it is occasionally necessary to fix the nucleotide frequencies to be equal, for instance in specifying the JC and SYM models. This would be achieved by typing prset statefreqpr=fixed(equal).
*Note that MrBayes does not support some submodels of the GTR model.
The Shapepr parameter determines the prior for the shape parameter of the gamma
distribution of rate variation. Default setting: a uniform distribution
spanning a wide range of values. The prior for the proportion of invariable sites is set
with Pinvarpr. The default setting is a uniform distribution between 0 and 1, an
appropriate setting if we don’t want to assume any prior knowledge about the proportion
of invariable sites. For topology, the default Uniform setting for the Topologypr parameter puts equal probability on all distinct, fully resolved topologies. The Brlenspr parameter can either be set to unconstrained or clock-constrained. For trees without a molecular clock (unconstrained) the branch length prior can be set either to exponential or uniform. *The default exponential prior with parameter 10.0 should work well for most analyses.
* However, recent work has shown that in some circumstances, MrBayes can produce trees with artificially-long branch lengths (Brown et al., in press). In these cases, one would need to modify the branch length prior to put more emphasis on shorter branch lengths. A good way to check for abnormally long branch lengths is to compare ML tree lengths with those output from your Bayes analysis-they should be similar.
—-
The sump command will first generate a plot of the generation versus the log probability of the data (the log likelihood values). If we are at stationarity, this plot should look like "white noise", that is, there should be no tendency of increase or decrease over time. The plot should look something like this:
+------------------------------------------------------------+ -5718.96
| 2 12 |
| 2 |
| 1 2 2 2 |
| 1 22 1*1 2 22 2 1 2 |
| 2 2 2 1 2 2 2 2 1 |
| 11 1 2 1 2 2 2 2 2 2 |
| 1 2 1 1 12 1 1 * 2 2 |
| 1 11 2 * 2 1 1 2 2 1 *|
| *2 2 1 22 2 1 211 22 |
| 2 1 1 1 11 1 22 1 |
|* 1 2 2 1 1 2 1* |
| 2 1 1 1 |
| 1 111 2 1 1 1 |
| 22 1 1 |
| 1 1 |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ -5729.82
^ ^
2500 10000
If you see an obvious trend in your plot, either increasing or decreasing, you probably need to run the analysis longer to get an adequate sample from the posterior probability distribution.
CHOOSING PARAMETERS FOR MCMC
when choosing use mcmcp (instead of mcmc) which does not start the analysis.
TEMP
lowering temp parameter approximates the temperature of each chain, increases probability of swaps
The effect of the heating (lowering the 'temp' parameter) is to flatten out the posterior probability, such that the heated chains more easily find isolated peaks in the posterior distribution and can help the cold chain move more rapidly between these peaks.
Chain information:
ID -- Heat
-----------
1 -- 1.00 (cold chain)
2 -- 0.83
3 -- 0.71
4 -- 0.63
Heat = 1 / (1 + T * (ID - 1))
(where T = 0.20 is the temperature and ID is the chain number)
ID -- Heat
-----------
1 -- 1.00 (cold chain)
2 -- 0.91
3 -- 0.83
4 -- 0.77
Heat = 1 / (1 + T * (ID - 1))
(where T = 0.10 is the temperature and ID is the chain number)
Acceptance rates for the moves in the "cold" chain:
With prob. Chain accepted changes to
30.74 \% param. 1 (revmat) with Dirichlet proposal
10.98 \% param. 2 (state frequencies) with Dirichlet proposal
33.33 \% param. 3 (gamma shape) with multiplier
16.24 \% param. 4 (topology and branch lengths) with LOCAL
16.28 \% param. 4 (topology and branch lengths) with extending TBR
25.11 \% param. 5 (branch lengths) with nodeslider
[this output comes out in MrBayes window]
As a rough rule of thumb, an efficient Metropolis-Hastings MCMC sampler will have
acceptance rates somewhere in the range 10 % to 70 %. If the acceptance rate is too high,
then the proposal mechanism is making too modest suggestions of new states. If the rate
is too low, on the other hand, then the proposals are too bold. Both situations are likely to
lead to slow mixing, which means that you will have to run the chain longer to obtain
convergence. The acceptance rates can be changed by modifying the tuning parameters
using the props command. The topology proposals are difficult, however, and one often
has to be satisfied with relatively low acceptance rates for these.
1 2 3 4
--------------------------
1 | 0.48 0.28 0.14
2 | 1644 0.58 0.32
3 | 1716 1680 0.56
4 | 1589 1680 1691
look at values in the "major diagonal", eg. .48 .58 .56
The bottom "row" of the upper diagonal contains the acceptance rates for the swaps
between chains separated by only one heating step. Again, a rough rule of thumb is that
these acceptance rates should lie in the range 10 % to 70 %. If the acceptance rates are
too low, you can try to lower the temperature difference; if the rates are too high, the
temperature should be increased instead.
The seed is the same for each MrBayes window. When comparing different runs, open new windows for each, or set a new seed.
when fine-tunning the length of a ideal chain, and the necessary burnin, use a large sample freq, as this frees up memory.
Differences between version 2.01 and 3.0
If you are new to version 3, note that there are several important changes between version 2 and version 3 of MrBayes. A change affecting most users is the new way of specifying models using 'lset' and 'prset', which more consistently separates the specification of model structure (lset) from the specification of priors (prset). Type 'help lset' and 'help prset' in the program to get detailed information about the current options; you can also discover some of the new features by examining the example files.
Data partitions
Another important change is a new mechanism for specifying data partitions. This allows you to specify very complex mixed models for heterogeneous data, for instance datasets including both morphology and molecules or different types of genes. The new mechanism also means that the site specific model of rate variation is set differently in version 3 than in version 2. In version 3, you set a model of site specific rates by invoking a rate multiplier spanning the relevant partitions, which first need to be defined with the 'partition' command and invoked by the 'set' command. For more info, see 'help prset' under 'Ratepr'.
From bglobin.nex
[The following block illustrates how to set up two data partitions
and use different models for the different partitions.]
charset non_coding = 1-90 358-432;
charset coding = 91-357;
partition region = 2:non_coding,coding;
set partition = region;
[The following lines set a codon model for the second data partition (coding) and
allows the non_coding and coding partitions to have different overall rates.]
lset applyto=(2) nucmodel=codon;
prset ratepr=variable;
From cynmix.nex
Begin data;
Dimensions ntax=32 nchar=3246;
Format datatype=mixed(Standard:1-166,DNA:167-3246) interleave=yes gap=- missing=?;
Matrix
begin mrbayes;
[This block sets up several different partitions that could be used in the analysis
of this dataset]
outgroup Ibalia;
charset morphology = 1-166;
charset molecules = 167-3246;
charset COI = 167-1244;
charset COI_1st = 167-1244\3;
charset COI_2nd = 168-1244\3;
charset COI_3rd = 169-1244\3;
charset EF1a = 1245-1611;
charset EF1a_2nd = 1245-1611\3;
charset EF1a_3rd = 1246-1611\3;
charset EF1a_1st = 1247-1611\3;
charset LWRh = 1612-2092;
charset LWRh_2nd = 1612-2092\3;
charset LWRh_3rd = 1613-2092\3;
charset LWRh_1st = 1614-2092\3;
charset 28S = 2093-3246;
charset 28S_Stem = 2160-2267 2361-2401 2489-2528 2539-2565 2577-2647 2671-2760 2768-2827 2848-3194 3220-3246;
charset 28S_Loop = 2093-2159 2268-2360 2402-2488 2529-2538 2566-2576 2648-2670 2761-2767 2828-2847 3195-3219;
partition Names = 5: morphology, COI, EF1a, LWRh, 28S;
partition Nopart = 2: morphology, molecules;
partition Morph_mito_nucl_ribo = 4: morphology, COI, EF1a LWRh, 28S;
[ note lack of comma between EF1a and LWRh, which thus are in same partition, corresponding to nuclear genes]
partition Extreme = 12: morphology, COI_1st, COI_2nd, COI_3rd, EF1a_2nd, EF1a_3rd, EF1a_1st, LWRh_2nd, LWRh_3rd, LWRh_1st, 28S_Stem, 28S_Loop;
end;
begin mrbayes;
[The following lines set up a model in which all four genes have their unique GTR + gamma
+ propinv model]
set partition=Names;
prset ratepr=variable;
lset applyto=(2,3,4,5) nst=6 rates=invgamma;
unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);
BEGIN MRBAYES;
charset morph= 1-326;
charset ribRNA= 327-509;
charset ribgaps = 510-538;
charset RAG1 = 539-2068;
Lset applyto=(1) coding=variable rates=gamma;
Lset applyto=(2) Nst=6 rates=invgamma;
Lset applyto=(3) coding=variable rates=gamma;
Lset applyto=(4) Nst=6 rates=invgamma;
ctype ord:58 117 131 142 146 196 208 211 220 223 232 237 256 258 264
270 284 295 323 326;
ctype unord: 1-57 59-116 118-130 132-141 143-145 147-195 197-207
209-210 212-219 221-222 224-231 233-236 238-255 257 259-263 265-269
271-283 285-294 296-322 324-325;
unlink shape=(all) revmat=(all) statefreq=(all);
mcmc ngen=500 printfreq=100 samplefreq=100 nchains=4 savebrlens=yes;
END;
Functions not yet implemented
The current release of MrBayes is beta 4. Some functions are not yet implemented in this beta, notably topological constraints and relaxed clock models. Although you can define constraints, the program will not take the constraints into account in the calculations and you may get spurious results if you try to enforce constraints.
New features in beta 4
You can now unlink topology. Default priors have been changed for branch lengths from uniform to exponential(10). For rate ratio parameters like the rate multiplier, transition/transversion ratio, and revmat rates, we now consistently use a Dirichlet / beta prior. These changes were introduced to give more reasonable posteriors for over-parameterized models. MrBayes 3 allows you to create very complex models easily and this has inspired some users to introduce many more parameters than can possibly be estimated from their data. Carefully chosen priors can save you from some over-parameterization but we urge you to use common sense and not introduce more parameters into your models than supported by your data. Always check that the posteriors for all model parameters are reasonable. A 'report' command has been added, which allows you to control the format in which rate ratio parameters are reported. Important changes and additions have been made to amino acid and nucleotide models. Several important bugs have been corrected; although they are unlikely to affect most analyses, we encourage you to confirm results obtained with previous beta versions using beta 4.
BURNIN
Determine after how many generations the graph becomes stationary. The burnin value is that number of generations divided by 50 (since only every 50th generation was sampled; i.e. the burnin value roughly is equal to the number of rows - not quite because there is a header). To more accurately determine the burnin, you need to rescale the Y-axis.
The burnin can also be detected using sump command:
sump filename=Fitch_HA.nex;
The result of the latter command is an ASCII graph of the number of generations vs. log-likelihood
Type "sumt filename=testseq1c.t burnin=20", where you need to substitute burnin value with the number you obtained in the previous step of the exercise (Note: that burnin values is the # of generations divided by 50, since we sample every 50th generation). This command creates a consensus tree, and shows posterior probabilities of the clades.
Load Fitch_HA.nex.p into Excel. (You need to load the data into 2 sheets, since Excel does not allow to load more than 256 columns per sheet. To load the data, create a new Excel spreadsheet. Go to Data->Get External Data->Import Text File, and load first 256 columns.
If you use a German version of Microsoft, in the import wizard's section on Format, select advanced and then select the correct decimal point symbol, i.e. ".".
Go to a separate sheet, and repeat "Get External Data" command to load the rest of the data, you need to block out (=select =make black) and exclude the first 256 columns). This file contains information for posterior probabilities for each site (columns) at each sampled generation (rows).