Shane A. Corwin and Paul Schultz
A simple way to estimate bid-ask spreads from daily high and low prices
Journal of Finance | Vol 67, Issue 2 (Apr 2012), 719–759

Our paper derives and tests a new way to estimate bid-ask spreads from high and low prices. The estimator is simple to compute and accurate, allowing it to be used in a variety of research contexts. The idea behind the estimator is straightforward. As shown by Beckers (JB 1983) and Parkinson (JB 1980), the expected value of the log of the high-low price ratio is proportional to the standard deviation of the true value of the security. However, in the presence of bid-ask spreads, the highest transaction price over a trading day will be a buyer-initiated trade at the ask price and the lowest transaction price over a trading day will be a seller-initiated trade at the bid price. As a result, the expected value of the high-low price ratio is a function of the standard deviation and the bid-ask spread. To disentangle the spread and variance portions of the high-low price range, we calculate the sum of the squared log price ranges over two consecutive days,

$$\beta = \sum_{j=0}^{1} \left[ \ln\left( \frac{H_{t+j}^O}{L_{t+j}^O} \right) \right]^2 ,$$

and the squared log of the two-day price range,

$$\gamma = \left[ \ln\left( \frac{H_{t,t+1}^O}{L_{t,t+1}^O} \right) \right]^2 ,$$

where $H_i^O$ is the observed high price on day $i$, $L_i^O$ is the observed low price on day $i$, and $H_{t,t+1}^O$ and $L_{t,t+1}^O$ are the observed high and low prices over the two-day period from day $t$ to day $t+1$. The sum of the squared log price ratios over two days contains twice the daily variance and twice the bid-ask spread. The squared log price ratio for the two-day period contains twice the daily variance, but only one bid-ask spread. Making use of previous work on high-low price ratios, we can set up two equations to solve for two unknowns: the security's standard deviation and its bid-ask spread. These equations can be solved numerically.
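As a hedged sketch of what these two equations look like, assume the observed high and low equal the true high and low marked up and down by half the spread, so that each observed log range is the true log range plus $\alpha = \ln[(2+S)/(2-S)]$. The symbols $\sigma$ (daily standard deviation), $k_1$ and $k_2$ are notation chosen here for illustration; the constants are the standard moments of the log range of a driftless diffusion. Taking expectations then gives:

```latex
\begin{align*}
E[\beta]  &= 2 k_1 \sigma^2 + 4 k_2 \sigma \alpha + 2 \alpha^2, \\
E[\gamma] &= 2 k_1 \sigma^2 + 2\sqrt{2}\, k_2 \sigma \alpha + \alpha^2, \\
k_1 &= 4 \ln 2, \qquad k_2 = \sqrt{8/\pi}.
\end{align*}
```

The variance terms are identical in both lines, while the spread term enters twice in the first and once in the second. That difference is what allows the system to separate $\sigma$ from $\alpha$.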
Alternatively, if we ignore Jensen's inequality, we obtain a closed-form solution for the bid-ask spread (S), as follows:

$$S = \frac{2\,(e^{\alpha} - 1)}{1 + e^{\alpha}} ,$$

where

$$\alpha = \frac{\sqrt{2\beta} - \sqrt{\beta}}{3 - 2\sqrt{2}} - \sqrt{\frac{\gamma}{3 - 2\sqrt{2}}} .$$

(Simulation results suggest that this simplification has little impact on the performance of the estimator. At the same time, the resulting closed-form solutions lead to a substantial reduction in estimation complexity. It is also important to note that this derivation produces an estimate of the standard deviation, in addition to the bid-ask spread. See the original paper for details.) Using these equations, spread estimates can be obtained for each consecutive two-day period. We find that averaging two-day estimates across a month produces reasonably accurate spread estimates for U.S. common stocks. Averaging across longer periods may reduce sampling error, but assumes that the spread and volatility are constant over the longer window.
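The closed-form calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `two_day_spread` and `average_spread` are hypothetical, the averaging uses overlapping pairs of consecutive days, and the `floor_at_zero` option (replacing negative two-day estimates with zero before averaging) is an assumption made here rather than something stated in this summary.

```python
import math

SQRT2 = math.sqrt(2.0)
DENOM = 3.0 - 2.0 * SQRT2  # algebraically equal to (sqrt(2) - 1) ** 2


def two_day_spread(high1, low1, high2, low2):
    """Closed-form high-low spread estimate for one pair of consecutive days."""
    # beta: sum of the squared single-day log ranges
    beta = math.log(high1 / low1) ** 2 + math.log(high2 / low2) ** 2
    # gamma: squared log range over the combined two-day window
    gamma = math.log(max(high1, high2) / min(low1, low2)) ** 2
    alpha = (math.sqrt(2.0 * beta) - math.sqrt(beta)) / DENOM - math.sqrt(gamma / DENOM)
    return 2.0 * (math.exp(alpha) - 1.0) / (1.0 + math.exp(alpha))


def average_spread(highs, lows, floor_at_zero=True):
    """Average the two-day estimates over all overlapping pairs of days.

    floor_at_zero is an illustrative assumption: negative two-day
    estimates are replaced by zero before averaging.
    """
    estimates = []
    for t in range(len(highs) - 1):
        s = two_day_spread(highs[t], lows[t], highs[t + 1], lows[t + 1])
        estimates.append(max(s, 0.0) if floor_at_zero else s)
    return sum(estimates) / len(estimates)
```

When both days share the same high and low, gamma is half of beta, alpha reduces to the single-day log range, and the entire range is attributed to the spread. When the price moves sharply between the two days, gamma is large relative to beta and the two-day estimate can turn negative, which is why some treatment of negative values is needed before averaging.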

This technique for estimating spreads can be used with high and low prices over any time interval, not just trading days. In fact, because variances increase with the length of the interval while spreads do not, the signal-to-noise ratio from intraday periods is greater than that from daily periods. We also note that the use of the high-low ratio does not require that trades be reported in the correct sequence, only that high and low prices are reported in the correct time interval.

Complications in using the estimator in practice

The high-low spread estimator relies on the assumption that the variance over a two-day period is equal to the sum of two consecutive single-day variances. One reason this assumption may be violated is that markets for most securities are closed overnight. Hence the high-low ratio over a two-day period includes the overnight variance, while the single-day ratios do not. In our work with CRSP data, we find that a simple adjustment for overnight returns works well. Specifically, if the low on the second day exceeds the close on the first day, we assume the price rose overnight by the difference between the close and the low. Conversely, if the high on the second day is below the close on the first day, we assume the price fell overnight by the difference between the close and the high. In either case, we adjust the day 2 high and low by this estimated price change.
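The overnight adjustment can be sketched as follows. This is one reading of the rule described above, not the authors' implementation: the function name `adjust_for_overnight` is hypothetical, and the direction of the shift (moving the day 2 range back toward the day 1 close) is an interpretation of "adjust the day 2 high and low by this estimated price change".

```python
def adjust_for_overnight(prev_close, high, low):
    """Shift a day's high and low to remove an apparent overnight price move.

    prev_close is the previous day's closing price; high and low are the
    current day's observed high and low. Returns the adjusted (high, low).
    """
    if low > prev_close:
        # The whole day traded above the prior close: treat the gap as an
        # overnight rise and shift the range down by that amount.
        shift = low - prev_close
        return high - shift, low - shift
    if high < prev_close:
        # The whole day traded below the prior close: treat the gap as an
        # overnight fall and shift the range up by that amount.
        shift = prev_close - high
        return high + shift, low + shift
    # The prior close lies inside the day's range: no adjustment.
    return high, low
```

The shift leaves the day 2 single-day range unchanged and only narrows the two-day range, so it affects the two-day term but not the sum of single-day terms.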

There are other reasons why the two-day variance may differ from the sum of two one-day variances. Infrequent trading of some securities may cause the observed high-low range to be narrower than the true high-low range. In extreme cases, the high and low price for a day can be equal if a security trades only once or only a handful of times during a day. For some assets, price limits may also result in two-day variances that are more than twice as large as one-day variances. Our Journal of Finance paper discusses ways to deal with these complications.

It is important to note that the high-low estimator may capture price pressure in addition to the bid-ask spread. Specifically, if the high price results from a large trade that executes above the quoted ask (or the low price results from a large trade that executes below the quoted bid), this price pressure will be captured by the high-low spread estimator. Price pressure effects may be particularly important during illiquid periods or periods when quoted depths are very small.

Results

The high-low spread estimator can be used for any market and any time period for which high and low trade prices are available. One important advantage of the estimator is that it can be used to obtain transaction cost estimates during historical periods for which intraday trade and quote data are not available. To illustrate this potential application, we examine high-low spread estimates during the period from the Great Depression through World War II. Another important use of the estimator is to estimate transaction costs during periods when intraday data are available, but are difficult to use. The increase over time in the use of computerized trading systems has resulted in a substantial increase in the number of quotes and a corresponding increase in the size of quote databases. This data proliferation makes handling quote data difficult and may also lead to significant problems with matching trades to quotes. In this setting, the high-low spread estimator provides a simple estimate of transaction costs that does not require the use of quote data. To illustrate this application, we examine high-low spread estimates during the recent financial crisis.

For both periods, we use daily high and low prices from CRSP to calculate monthly spread estimates for all available exchange-listed common stocks. We categorize stocks into quintiles based on market capitalization, where cutoffs are defined at the beginning of each month based on NYSE breakpoints. Panel A of Figure 1 shows the average monthly high-low spread estimates across all stocks in each quintile.
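To make the role of NYSE breakpoints concrete, the sketch below computes size cutoffs from NYSE-listed stocks only and then applies them to stocks from any exchange. The function names `nyse_breakpoints` and `size_quintile` are hypothetical, and the percentile interpolation (Python's default for `statistics.quantiles`) and the treatment of ties are assumptions, since the summary does not specify them.

```python
import bisect
import statistics


def nyse_breakpoints(nyse_market_caps):
    """Four cutoffs (20th, 40th, 60th, 80th percentiles) from NYSE stocks only."""
    return statistics.quantiles(nyse_market_caps, n=5)


def size_quintile(market_cap, breakpoints):
    """Return 1 (smallest) through 5 (largest) for a stock from any exchange.

    A stock whose market cap equals a cutoff is placed in the higher quintile
    (an illustrative tie-breaking choice).
    """
    return bisect.bisect_right(breakpoints, market_cap) + 1
```

Because the cutoffs come from NYSE stocks alone, the quintiles need not contain equal numbers of stocks once smaller stocks from other exchanges are included; the smallest quintile will typically hold the most names.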

As shown in Figure 1, transaction costs rose sharply during the Great Depression. Spreads begin to rise in October 1929 and exhibit a sharp spike from mid-1932 through mid-1933. Average spreads rise as high as 30% for the smallest quintile of stocks and to more than 8% for the middle size quintile. This figure illustrates that the market exhibited a sharp decrease in liquidity during the Great Depression that continued through much of World War II. The high-low estimator provides researchers with a simple and accurate means to study transaction costs during historical periods, such as the Great Depression, where intraday data are not available.

Panel B of Figure 1 plots mean transaction costs during the recent financial crisis. For all size quintiles, spreads begin to rise in August of 2007, peaking from about October 2008 through March 2009. Average spreads for the smallest quintile reach as high as 4.0%. However, increased transaction costs are also evident for the largest stocks, with spreads for these stocks reaching over 2.0%. Spreads appear to decrease by late 2009, though there are sharp increases around the “Flash Crash” in May 2010 and again in August 2011. Notably, the high-low spread estimator allows us to study these patterns without making use of any intraday quote data.

Figure 1: Estimated bid-ask spreads
[Figure: Panel A and Panel B]
The figure plots average high-low spread estimates across all available common stocks, where stocks are categorized into quintiles based on market capitalization using NYSE breakpoints. Panel A graphs spread estimates through the Great Depression and World War II. Panel B graphs spread estimates through the financial crisis.

As a simple illustration of the estimator's performance, we examine the time-series correlation between market-wide average spread measures based on the high-low estimator and intraday TAQ data. (In our original paper, we provide a detailed analysis of the accuracy and performance of the high-low spread measure relative to alternative transaction cost proxies. We find that the high-low spread estimator generally dominates other low frequency spread estimators at capturing both the cross-section and time-series of individual stock spreads. We also find that the estimator works best for small, illiquid stocks and during time periods when the minimum tick size is wide.) The spread measure from TAQ is a monthly average of daily time-weighted NBBO quoted spreads. For both measures, we calculate an equal weighted average each month across all available NYSE, Amex, and Nasdaq listed common stocks. Table 1 reports summary statistics and time-series correlations between the market-wide measures for the period from 1993 through 2011 and for various subperiods.
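The market-wide comparison can be sketched as an equal-weighted average across stocks each month, followed by a time-series correlation between the two monthly series. The function names and the numbers below are made up for illustration and are not data from the paper.

```python
import math


def equal_weighted_mean(spreads):
    """Equal-weighted average of one month's stock-level spreads."""
    return sum(spreads) / len(spreads)


def pearson_correlation(x, y):
    """Pearson correlation between two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up stock-level spreads (in percent) for three months; illustrative only.
high_low_by_month = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [0.8, 1.6, 2.4]]
taq_by_month = [[1.2, 2.4, 3.6], [1.9, 2.9, 3.9], [1.0, 2.0, 2.7]]

# One market-wide value per month for each measure.
high_low_series = [equal_weighted_mean(m) for m in high_low_by_month]
taq_series = [equal_weighted_mean(m) for m in taq_by_month]

# Time-series correlation between the two market-wide measures.
correlation = pearson_correlation(high_low_series, taq_series)
```

A high correlation indicates that the two measures move together over time even if their levels differ, which is the distinction between the correlation column and the mean and median columns in the summary statistics.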

Across the full sample period, the mean (median) high-low spread is 2.18% (2.06%). This compares to a mean (median) TAQ quoted spread of 2.74% (2.54%). This suggests that the high-low estimator slightly underestimates the market-wide quoted spread across the full sample period. The estimator captures the time-series variation in market-wide quoted spreads very well, with a full-period correlation of 0.978. The subperiod results suggest that the high-low estimator underestimates TAQ quoted spreads in the 1990s, when the minimum tick size was $0.125, and overestimates quoted spreads during the financial crisis. The overestimation in the latter period may reflect, in part, the increased price pressure effects that are captured by the high-low estimator but are not reflected in the quoted spread. As a whole, though, the high-low spread estimator provides an accurate and simple way to estimate spreads from daily data.

Table 1: Summary statistics for market-wide spread measures

             Number      Time-Series   TAQ Spread (%)     High-Low Spread (%)
Period       of Months   Correlation   Mean     Median    Mean     Median
1993–2011    228         0.978         2.74     2.54      2.18     2.06
1993–2000    96          0.972         4.44     4.69      3.09     3.07
2001–2006    72          0.988         1.64     1.14      1.51     1.22
2007–2011    60          0.972         1.33     1.19      1.52     1.39

The table provides summary statistics for market-wide spread estimates based on intraday TAQ data and the high-low spread estimator. Monthly time-weighted quoted spreads based on intraday TAQ data and monthly high-low spreads are estimated for each NYSE, Nasdaq, and Amex listed common stock as described in Corwin-Schultz (JF 2012). Market-wide spreads are then defined each month as an equal weighted average across all available securities.
