Many resources are needed to download a project. Please understand that we have to compensate our server costs. Thank you in advance. Project price only 1 $
You can buy this project and download/modify it how often you want.
package com.conveyal.r5.analyst;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.MessageAttributeValue;
import com.amazonaws.services.sqs.model.SendMessageRequest;
import com.conveyal.r5.analyst.cluster.GridRequest;
import com.conveyal.r5.analyst.cluster.Origin;
import com.conveyal.r5.api.util.LegMode;
import com.conveyal.r5.profile.FastRaptorWorker;
import com.conveyal.r5.profile.PerTargetPropagater;
import com.conveyal.r5.profile.StreetMode;
import com.conveyal.r5.streets.LinkedPointSet;
import com.conveyal.r5.streets.StreetRouter;
import com.conveyal.r5.transit.TransportNetwork;
import gnu.trove.iterator.TIntIntIterator;
import gnu.trove.list.TIntList;
import gnu.trove.list.array.TIntArrayList;
import gnu.trove.map.TIntIntMap;
import org.apache.commons.math3.random.MersenneTwister;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Base64;
import java.util.stream.DoubleStream;
/**
* Computes an accessibility indicator at a single cell in a Web Mercator grid, using destination densities from
* a Web Mercator density grid. Both grids must be at the same zoom level. This class computes accessibility given median
* travel time (described below and in Conway, Byrd and van Eggermond 2017).
*
* The accessibility is calculated using
* median travel time to each destination (there are plans to change this to allow use of arbitrary percentiles in the future). In order
* to facilitate probabilistic comparison of scenarios, many accessibility values are returned, representing the
* sampling distribution of the accessibility. The first value is the value computed using all the Monte Carlo frequency
* draws with equal weight (i.e. it is our best point estimate).
*
* Subsequent values are produced by bootstrapping the iterations (combinations of a departure minute and a Monte Carlo draw) to simulate
* the sampling distribution. For each value returned, we select
* _with replacement_ a different set of Monte Carlo draws to use to approximate the effects of repeating the analysis
* with completely new Monte Carlo draws. This yields the same number of Monte Carlo draws, but some of the draws from
* the original analysis are not present, and some are duplicated. Sounds crazy but it works.
*
* First, some terminology: a bootstrap replication is the statistic of interest defined on one bootstrap sample,
* which is a collection of values sampled with replacement from the original Monte Carlo draws.
* For each of the bootstrap samples we compute, we define which Monte Carlo draws should be included. We want
* to choose the Monte Carlo draws included in a particular bootstrap sample here, rather than choosing different
* Monte Carlo draws at each destination. The most intuitive explanation for this is that this matches the
* way our accessibility indicator is defined: each Monte Carlo draw is used at every destination, rather than
* computing separate Monte Carlo draws for each destination. Sampling Monte Carlo draws independently at
* each destination causes insufficient variance in the resulting sampling
* distribution. Extreme values of travel time tend to be correlated across many destinations; a particular Monte
* Carlo draw is likely to affect the travel time to all destinations reached by it simultaneously; in the most
* extreme case, a change in the Monte Carlo draw on the line serving the origin could make the whole network
* reachable, or not, within the travel time threshold.
*
* This leads us to a more theoretical justification. One of the tenets of the bootstrap is that the
* bootstrap samples be independent (Efron and Tibshirani 1993, 46). In situations where the data are not
* independent and identically distributed, a number of techniques, e.g. the moving block bootstrap, have been developed
* (see Lahiri 2003) Most of these methods
* consist of changes to the bootstrap sampling technique to ensure the samples are independent, and the dependence
* structure of the data is wrapped up within a sample. While none of the off-the-shelf approaches for dependent
* data appear to be helpful for our use case, since we know (or can speculate) the correlation properties of our
* data, we can come up with a sampling technique. Since there is a dependence among destinations (many or all
* being affected by the same Monte Carlo draw), we use the same bootstrap samples (consisting of Monte Carlo draws)
* to compute the travel times to each destination.
*
* There is also dependence in the departure minutes; Monte Carlo draws from the same departure minute will have
* more similar travel times and therefore accessibility values than those from different departure minutes due to the deterministic and constant effect of the
* scheduled network. Adjacent departure minutes will additionally have correlated travel times and accessibility values
* because transit service is similar (you might have to wait one more minute to catch the same bus).
* There are also likely periodic effects at the transit service frequency (e.g. 15 minutes,
* 30 minutes, etc.). In testing in Atlanta we did not find ignoring this dependence to be an issue, Patching this up is
* fairly simple. Rather than drawing n Monte Carlo draws with replacement for
* each bootstrap sample, we draw n / m draws for each departure minute, where n is the number of Monte Carlo draws done in
* the original analysis, and m is the number of departure minute. That is, we ensure that the number of Monte Carlo draws
* using a particular departure minute is identical in each bootstrap sample and the original, non-bootstrapped point
* estimate. This also means that all the variation in the sampling distributions is due to the effect of frequency-based
* lines, rather than due to variation due to departure time (which is held constant).
*
* We initially did not think that the change to sample from within the departure minutes to be that significant (see
* http://rpubs.com/mattwigway/bootstrap-dependence). However, it in fact is. It produces narrower sampling distributions
* that result entirely from the Monte Carlo simulation of frequency lines, without mixing in systematic variation that
* comes from variation due to different departure times. This also should yield p-values that are more correct; because
* there is no systematic influence from scheduled lines, 5% of the values should in fact be found statistically significant
* when testing the same scenario (which suggests we should probably be using a higher significance threshold, since we are
* doing thousands of computations; see Wasserstein and Lazar 2016, "The ASA's statement on p-values,"
* http://dx.doi.org/10.1080/00031305.2016.1154108, and Ioanndis 2005, "Why Most Published Research Findings are False",
* http://dx.doi.org/10.1371/journal.pmed.0020124.
*
* For more information on the bootstrap and its application to the computation of accessibility given median accessibility,
* see
*
* Efron, Bradley, and Robert J Tibshirani. An Introduction to the Bootstrap, Boca Raton, FL: Chapman and Hall/CRC, 1993.
* Lahiri, S. N. Resampling Methods for Dependent Data, New York: Springer, 2003.
* Conway, M. W., Byrd, A. and van Eggermond, M. "A Statistical Approach to Comparing Accessibility Results: Including
* Uncertainty in Public Transport Sketch Planning," paper presented at the 2017 World Symposium of Transport and Land
* Use Research, Brisbane, Queensland, Australia, Jul 3-6.
*
* The results are placed on an Amazon SQS queue for collation by a GridResultConsumer and a GridResultAssembler.
*/
public class GridComputer {
private static final Logger LOG = LoggerFactory.getLogger(GridComputer.class);
/** The number of bootstrap replications used to bootstrap the sampling distribution of the percentiles */
public static final int N_BOOTSTRAP_REPLICATIONS = 1000;
/** SQS client. TODO: async? */
private static final AmazonSQS sqs = new AmazonSQSClient();
private static final Base64.Encoder base64 = Base64.getEncoder();
private final GridCache gridCache;
public final GridRequest request;
private static WebMercatorGridPointSetCache pointSetCache = new WebMercatorGridPointSetCache();
public final TransportNetwork network;
public GridComputer(GridRequest request, GridCache gridCache, TransportNetwork network) {
this.request = request;
this.gridCache = gridCache;
this.network = network;
}
public void run() throws IOException {
final Grid grid = gridCache.get(request.grid);
// ensure they both have the same zoom level
if (request.zoom != grid.zoom) throw new IllegalArgumentException("grid zooms do not match!");
// use the middle of the grid cell
request.request.fromLat = Grid.pixelToCenterLat(request.north + request.y, request.zoom);
request.request.fromLon = Grid.pixelToCenterLon(request.west + request.x, request.zoom);
// Use the extent of the opportunity density grid as the destinations; this avoids having to convert between
// coordinate systems, and avoids including thousands of extra points in the weeds where there happens to be a
// transit stop. It also means that all data will be included in the analysis even if the user is only analyzing
// a small part of the city.
WebMercatorGridPointSet destinations = pointSetCache.get(grid);
// TODO recast using egress mode
StreetMode accessMode = LegMode.getDominantStreetMode(request.request.accessModes);
StreetMode directMode = LegMode.getDominantStreetMode(request.request.directModes);
if (request.request.transitModes.isEmpty()) {
// non-transit search
LinkedPointSet linkedDestinations = destinations.link(network.streetLayer, directMode);
StreetRouter sr = new StreetRouter(network.streetLayer);
sr.timeLimitSeconds = request.request.maxTripDurationMinutes * 60;
sr.streetMode = directMode;
sr.profileRequest = request.request;
sr.setOrigin(request.request.fromLat, request.request.fromLon);
sr.dominanceVariable = StreetRouter.State.RoutingVariable.DURATION_SECONDS;
sr.route();
int offstreetWalkSpeedMillimetersPerSecond = (int) (request.request.getSpeed(directMode) * 1000);
int[] travelTimes = linkedDestinations.eval(sr::getTravelTimeToVertex, offstreetWalkSpeedMillimetersPerSecond).travelTimes;
double accessibility = 0;
for (int y = 0, index1d = 0; y < grid.height; y++) {
for (int x = 0; x < grid.width; x++, index1d++) {
if (travelTimes[index1d] < request.cutoffMinutes * 60) {
accessibility += grid.grid[x][y];
}
}
}
finish(new int[] { (int) accessibility }); // no uncertainty so no bootstraps
} else {
// always walk at egress
LinkedPointSet linkedDestinationsEgress = destinations.link(network.streetLayer, StreetMode.WALK);
// if the access mode is also walk, the link function will use its cache to return the same linkedpointset
// NB Should use direct mode but then we'd have to run the street search twice.
if (!request.request.directModes.equals(request.request.accessModes)) {
LOG.warn("Disparate direct modes and access modes are not supported in analysis mode.");
}
LinkedPointSet linkedDestinationsDirect = destinations.link(network.streetLayer, accessMode);
// Transit search, run the raptor algorithm to get times at each destination for each iteration
LOG.info("Maximum number of rides: {}", request.request.maxRides);
LOG.info("Maximum trip duration: {}", request.request.maxTripDurationMinutes);
// first, find the access stops
StreetRouter sr = new StreetRouter(network.streetLayer);
sr.distanceLimitMeters = 2000; // TODO hardwired same as traveltimesurfacecomputer
sr.streetMode = accessMode;
sr.profileRequest = request.request;
sr.setOrigin(request.request.fromLat, request.request.fromLon);
sr.dominanceVariable = StreetRouter.State.RoutingVariable.DISTANCE_MILLIMETERS;
sr.route();
TIntIntMap reachedStops = sr.getReachedStops();
// convert millimeters to seconds
int millimetersPerSecond = (int) (request.request.getSpeed(accessMode) * 1000);
for (TIntIntIterator it = reachedStops.iterator(); it.hasNext(); ) {
it.advance();
it.setValue(it.value() / millimetersPerSecond);
}
FastRaptorWorker router = new FastRaptorWorker(network.transitLayer, request.request, reachedStops);
// Run the raptor algorithm
int[][] timesAtStopsEachIteration = router.route();
// compute bootstrap weights, see comments in Javadoc detailing how we compute the weights we're using
// the Mersenne Twister is a fast, high-quality RNG well-suited to Monte Carlo situations
MersenneTwister twister = new MersenneTwister();
// This stores the number of times each Monte Carlo draw is included in each bootstrap sample, which could be
// 0, 1 or more. We store the weights on each iteration rather than a list of iterations because it allows
// us to easily construct the weights s.t. they sum to the original number of MC draws.
int[][] bootstrapWeights = new int[N_BOOTSTRAP_REPLICATIONS + 1][router.nMinutes * router.monteCarloDrawsPerMinute];
Arrays.fill(bootstrapWeights[0], 1); // equal weight to all observations for first sample
for (int bootstrap = 1; bootstrap < bootstrapWeights.length; bootstrap++) {
for (int minute = 0; minute < router.nMinutes; minute++) {
for (int draw = 0; draw < router.monteCarloDrawsPerMinute; draw++) {
int iteration = minute * router.monteCarloDrawsPerMinute + twister.nextInt(router.monteCarloDrawsPerMinute);
bootstrapWeights[bootstrap][iteration]++;
}
}
}
// the minimum number of times a destination must be reachable in a single bootstrap sample to be considered
// reachable.
int minCount = (int) (router.nMinutes * router.monteCarloDrawsPerMinute * (request.travelTimePercentile / 100d));
// Do propagation of travel times from transit stops to the destinations
int offstreetTravelSpeedMillimetersPerSecond = (int) (request.request.getSpeed(accessMode) * 1000);
int[] nonTransitTravelTimesToDestinations =
linkedDestinationsDirect.eval(sr::getTravelTimeToVertex, offstreetTravelSpeedMillimetersPerSecond).travelTimes;
PerTargetPropagater propagater =
new PerTargetPropagater(timesAtStopsEachIteration, nonTransitTravelTimesToDestinations, linkedDestinationsEgress, request.request, request.cutoffMinutes * 60);
// store the accessibility results for each bootstrap replication
double[] bootstrapReplications = new double[N_BOOTSTRAP_REPLICATIONS + 1];
// The lambda will be called with a boolean array of whether the target is reachable within the cutoff at each
// Monte Carlo draw. These are then bootstrapped to create the sampling distribution.
propagater.propagate((target, reachable) -> {
int gridx = target % grid.width;
int gridy = target / grid.width;
double opportunityCountAtTarget = grid.grid[gridx][gridy];
// as an optimization, don't even bother to compute the sampling distribution at cells that contain no
// opportunities.
if (opportunityCountAtTarget < 1e-6) return;
// index the Monte Carlo iterations in which the destination was reached within the travel time cutoff,
// so we can skip over the non-reachable ones in bootstrap computations.
// this improves computation speed (verified)
TIntList reachableInIterationsList = new TIntArrayList();
for (int i = 0; i < reachable.length; i++) {
if (reachable[i]) reachableInIterationsList.add(i);
}
int[] reachableInIterations = reachableInIterationsList.toArray();
boolean isAlwaysReachableWithinTravelTimeCutoff =
reachableInIterations.length == timesAtStopsEachIteration.length;
boolean isNeverReachableWithinTravelTimeCutoff = reachableInIterations.length == 0;
// Optimization: only bootstrap if some of the travel times are above the cutoff and some below.
// If a destination is always reachable, it will perforce be reachable always in every bootstrap
// sample, so there is no need to compute the bootstraps, and similarly if it is never reachable.
if (isAlwaysReachableWithinTravelTimeCutoff) {
// this destination is always reachable and will be included in all bootstrap samples, no need to do the
// bootstrapping
// possible optimization: have a variable that persists between calls to this lambda and increment
// that single value, then add that value to each replication at the end; reduces the number of additions.
for (int i = 0; i < bootstrapReplications.length; i++)
bootstrapReplications[i] += grid.grid[gridx][gridy];
} else if (isNeverReachableWithinTravelTimeCutoff) {
// do nothing, never reachable, does not impact accessibility
} else {
// This origin is sometimes reachable within the time window, do bootstrapping to determine
// the distribution of how often
for (int bootstrap = 0; bootstrap < N_BOOTSTRAP_REPLICATIONS + 1; bootstrap++) {
int count = 0;
for (int iteration : reachableInIterations) {
count += bootstrapWeights[bootstrap][iteration];
}
// TODO sigmoidal rolloff here, to avoid artifacts from large destinations that jump a few seconds
// in or out of the cutoff.
if (count > minCount) {
bootstrapReplications[bootstrap] += opportunityCountAtTarget;
}
}
}
});
// round (not cast/floor) these all to ints.
int[] intReplications = DoubleStream.of(bootstrapReplications).mapToInt(d -> (int) Math.round(d)).toArray();
finish(intReplications);
}
}
private void finish (int[] samples) throws IOException {
// now construct the output
// these things are tiny, no problem storing in memory
ByteArrayOutputStream baos = new ByteArrayOutputStream();
new Origin(request, samples).write(baos);
// send this origin to an SQS queue as a binary payload; it will be consumed by GridResultConsumer
// and GridResultAssembler
SendMessageRequest smr = new SendMessageRequest(request.outputQueue, base64.encodeToString(baos.toByteArray()));
smr = smr.addMessageAttributesEntry("jobId", new MessageAttributeValue().withDataType("String").withStringValue(request.jobId));
sqs.sendMessage(smr);
}
}