

To rename all variables whose names start with firm so that they start with Firm instead, use the wildcard * in both the old and the new name. * in old selects the variables to be renamed; it means that zero or more characters go here. * in new corresponds with * in old and stands for the text that * in old matched. ? means exactly one character, ?? means exactly two characters, etc. You may specify more than one wildcard in old and in new. For example, removing jan wherever it occurs in a name changes janstat to stat, injanstat to instat, and subjan to sub.
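The wildcard rules can be tried out on a small throwaway dataset. The following is only a sketch: all variable names are made up for illustration, and it assumes a Stata version in which rename accepts wildcards (Stata 12 or later).

```stata
* Minimal sketch; every variable name here is hypothetical.
clear
set obs 1
generate firmsize  = 1
generate firmage   = 2
generate janstat   = 3
generate injanstat = 4
generate subjan    = 5
generate x1        = 6
generate x2        = 7

* One wildcard: firmsize -> Firmsize, firmage -> Firmage
rename firm* Firm*

* ? stands for exactly one character: x1 -> y1, x2 -> y2
rename x? y?

* Two wildcards in old and in new: drop "jan" wherever it occurs,
* so janstat -> stat, injanstat -> instat, subjan -> sub
rename *jan* **

* Inspect the resulting variable names
describe
```

The first * in new picks up what the first * in old matched, and the second * in new picks up what the second * in old matched.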

Of course, a total over the entire dataset is something you will not normally wish to compute. However, a common situation is that you have data which were collected on households, and in each household all adult persons were interviewed concerning, e.g., their individual income. In such a situation, you may wish to compute the household income, i.e., the sum of all individual incomes in each household. This is easily accomplished with the by prefix, provided that a variable indicates to which household each person belongs. A number of keywords may be used in this way, e.g., mean, median, std, min, max, or pctile, p(#) (for percentiles other than the median).
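The household case might look as follows. This is a sketch under assumptions: the variable names hhid, income, hhinc and hhmean and the data values are invented for illustration, and the sum is requested with egen's total() function, which is the name current Stata versions use for what older versions called sum().

```stata
* Hypothetical data: hhid identifies the household,
* income is the individual income of each person.
clear
input hhid income
1 1000
1 1500
2 2000
2  500
2  300
end

* Household income: sum of individual incomes within each household
bysort hhid: egen hhinc = total(income)

* Other keywords work the same way, e.g. the household mean
bysort hhid: egen hhmean = mean(income)

* Show the result, with a separator line between households
list, sepby(hhid)
```

Every person in a household receives the same value of hhinc, namely the total for that household.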

With a prefix such as by country:, whatever is achieved by "some Stata command(s)" is accomplished separately for all groups defined by variable "country". Note, however, that this presupposes that the data are sorted by "country". If this is not the case, you may use the sort command prior to executing the command beginning with by. But you may also build the sorting into the by prefix, as in:

bysort country: some Stata command(s)

Data transformation

Some Stata commands that may be useful for data transformation do not relate to a single row of the data, but rather to the dataset in its entirety. For instance, egen can compute the sum of income over the entire dataset and store the result in a new variable called tinc. Note that each case (=row) in the dataset will then have the same value in this variable, to wit, the total of all incomes.
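Both ideas, the by prefix and a total over the whole dataset, can be sketched on a tiny example. The data values and the use of summarize as the "some Stata command(s)" part are assumptions chosen purely for illustration.

```stata
* Hypothetical data with a grouping variable and an income variable
clear
input str2 country income
"DE" 1000
"FR" 1500
"DE" 2000
"FR"  500
end

* Variant 1: sort first, then use the by prefix
sort country
by country: summarize income

* Variant 2: bysort sorts and applies the prefix in one step
bysort country: summarize income

* Total over the entire dataset (total() is called sum() in older Stata);
* every row of tinc holds the same value
egen tinc = total(income)
list
```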

To do something not on the entire dataset, but rather on subgroups, keyword by is used. It is most useful for data transformations, but of course it may also be used to do analyses by subgroups. The general way to deal with by is to use it as a prefix.
