One of the biggest challenges in geospatial Big Data analytics is translating results generated from a varying sample of mobile devices into insights about the full population. AirSage has developed the most efficient extrapolation methodologies to do so. This is done by maximizing and validating the correlation with independent sources such as updated census data, high-quality traffic counts, and attendance reports.
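AirSage's extrapolation methodology is proprietary and is not described here. To illustrate only the general idea, the sketch below uses a simple per-area expansion factor: the census population of an area divided by the number of sampled devices whose home is in that area. Each observed device then stands in for that many people. The function names, area codes, and counts are hypothetical.

```python
def expansion_factors(census_population, sampled_devices):
    """Map each area to census population / sampled devices homed there.

    Areas with no sampled devices are skipped, since they cannot be scaled.
    """
    return {
        area: population / sampled_devices[area]
        for area, population in census_population.items()
        if sampled_devices.get(area, 0) > 0
    }


def extrapolate_visitors(visitor_home_areas, factors):
    """Estimate total visitors by weighting each observed device by its home-area factor.

    Devices whose home area has no factor contribute nothing to the estimate.
    """
    return sum(factors.get(area, 0.0) for area in visitor_home_areas)


# Hypothetical inputs: two home areas and three observed visiting devices.
census = {"A": 10_000, "B": 5_000}
sampled = {"A": 500, "B": 100}
factors = expansion_factors(census, sampled)  # {'A': 20.0, 'B': 50.0}
print(extrapolate_visitors(["A", "A", "B"], factors))  # 90.0
```

Estimates produced this way would still need to be checked against independent sources, such as the traffic counts or attendance reports mentioned above, before being relied on.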
Sourced data is normalized and archived in AirSage’s Big Data system in a secure, accessible format. Regardless of the final use case, proprietary pre-processing is run on the data. This includes some unique features such as:
AirSage supports ingesting data from multiple data providers (publishers, first-party data providers, and aggregators) and has also evaluated other providers that it does not support.
Our experience is that the data we currently use comes from one of the largest panels, with enough high-quality devices for us to select a sufficiently large sample of consistently high quality, which in turn lets us adjust for factors such as variable sample sizes.
We select our sample using an abstract, per-device monthly metric that measures both the visibility and the mobility of each device, ensuring that our sample consists of devices that behave consistently.
Our metric was defined by staff who had also worked with telecom data, which offered better visibility than app data.
This is a key differentiator between us and our competitors. Much of this is proprietary IP and therefore cannot be described in further detail.
With more than a decade of experience sourcing various types of anonymous location data (carrier data, connected car data, fleet data, smartphone data, and more), including five years specifically sourcing app data, AirSage has developed particular expertise in sourcing the best available data and building an optimal data panel.
AirSage has evaluated nearly all data available on the open market for large-scale sourcing as candidates for its panel. Each candidate goes through a thorough, efficient evaluation process that reveals its data volume, coverage, uniqueness, and several other quality metrics relevant to AirSage’s analytics use cases.
Data selected for the panel undergoes similar ongoing evaluation to ensure that the highest quality standards are maintained over time. Data feeds that fail to maintain these standards are removed from the panel.
AirSage cleanses data on ingest. We assign point types to sightings, along with various other metadata important to our individual product processing. Further, we don’t use bid-stream data, as some other providers do.
We provide our output as CSV files for maximum compatibility with our customers’ systems.
Our customers can convert our output into their own GeoJSON datasets to use it with Kepler and Superset. Both tools also support importing CSV data; Superset, for example, can import a CSV file into the database it is connected to.
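As a sketch of that conversion, the Python below turns CSV rows with point coordinates into a GeoJSON FeatureCollection using only the standard library. The column names (`poi_id`, `latitude`, `longitude`, `total_devices`) are hypothetical placeholders; adjust them to match the columns in the actual export.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_col="latitude", lon_col="longitude"):
    """Convert CSV rows with point coordinates into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Remaining columns are kept as (string) properties.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


sample = "poi_id,latitude,longitude,total_devices\nP1,33.749,-84.388,120\n"
print(json.dumps(csv_to_geojson(sample), indent=2))
```

CSV values arrive as strings, so numeric properties such as `total_devices` may need to be cast before loading the file into Kepler or Superset.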
Our data can easily be imported as attribute data and joined to standard Census shapefiles, so you can use your preferred GIS suite.
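A minimal sketch of such an attribute join, assuming the CSV carries a 12-digit block-group `GEOID` column (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) and that `census_records` stands in for the attribute table of a Census shapefile as exposed by a GIS suite or shapefile reader. A common pitfall is that spreadsheet tools strip leading zeros from GEOIDs, so the key is padded back before joining. The exact GEOID field name can vary by Census vintage, and the other field names are hypothetical.

```python
import csv
import io

BLOCK_GROUP_GEOID_WIDTH = 12  # state(2) + county(3) + tract(6) + block group(1)


def load_attributes(csv_text, key="GEOID"):
    """Index CSV rows by GEOID, restoring leading zeros lost by spreadsheet tools."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        geoid = row[key].strip().zfill(BLOCK_GROUP_GEOID_WIDTH)
        table[geoid] = row
    return table


def join_to_census(census_records, attributes, key="GEOID"):
    """Left join: keep every Census record; add matching CSV columns when present."""
    joined = []
    for record in census_records:
        extra = attributes.get(record[key], {})
        joined.append({**record, **{k: v for k, v in extra.items() if k != key}})
    return joined


# Hypothetical data: the CSV GEOID lost its leading zero (Alabama, state FIPS 01).
csv_text = "GEOID,total_devices\n10010201001,42\n"
census_records = [
    {"GEOID": "010010201001", "NAME": "Block Group 1"},
    {"GEOID": "010010201002", "NAME": "Block Group 2"},
]
for row in join_to_census(census_records, load_attributes(csv_text)):
    print(row)
```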
We can accept Shapefiles, GeoJSON, and delimited text files with geometries in WKT or hex-encoded WKB.
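For delimited text uploads, each row's geometry is a text string. The sketch below shows how a 2D point can be written as WKT and as hex-encoded little-endian WKB (a 1-byte byte-order flag, a 4-byte geometry type where 1 means Point, then two 8-byte doubles for x and y) using only Python's standard library. It handles points only; polygons and other geometry types, which POI boundaries usually need, are better produced with a dedicated geometry library.

```python
import struct

WKB_LITTLE_ENDIAN = 1
WKB_POINT = 1


def point_to_wkt(lon, lat):
    """WKT uses x y order, i.e. longitude then latitude."""
    return f"POINT ({lon} {lat})"


def point_to_hex_wkb(lon, lat):
    """Pack byte order, geometry type, x and y (21 bytes) and hex-encode them."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, lon, lat).hex().upper()


def hex_wkb_to_point(hex_wkb):
    """Decode a little-endian hex WKB Point back to (lon, lat)."""
    order, geom_type, x, y = struct.unpack("<BIdd", bytes.fromhex(hex_wkb))
    if order != WKB_LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("this sketch only handles little-endian 2D points")
    return x, y


print(point_to_wkt(1.0, 2.0))      # POINT (1.0 2.0)
print(point_to_hex_wkb(1.0, 2.0))  # 0101000000000000000000F03F0000000000000040
```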
We control sample bias by maintaining a diverse data panel that better represents the whole population. Our data panel includes tens of millions of unique devices and draws on apps in every category. After receiving the aggregated data, we apply an accuracy metric and a device quality score to filter out noise. Some of the factors we take into consideration:
There are some other known biases that are hard to avoid, such as age bias when looking at usage during particular times of day (waking and sleeping hours typically vary with age). There could also be a vacation bias, in which activity increases while people are on vacation compared with their regular daily routines. Another possible bias is income bias, where more affluent areas may show more devices (i.e., people in affluent areas may carry more than one device).
We distinguish between types of user behavior, for example, being at home or work vs. moving through a reported point vs. stopping at a stationary location.
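AirSage's behavior classification is not described in detail here. As a generic illustration, one simple way to separate stationary from moving sightings is to compute the implied speed between consecutive sightings of the same device and compare it with a threshold. The 0.5 m/s threshold and the `(unix_seconds, lat, lon)` input shape are hypothetical; a production approach would likely also weigh dwell time and location accuracy.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def label_sightings(sightings, max_stationary_speed_mps=0.5):
    """Label each sighting after the first as 'stationary', 'moving', or 'unknown'.

    sightings: list of (unix_seconds, lat, lon) for one device, sorted by time.
    """
    labels = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(sightings, sightings[1:]):
        dt = t1 - t0
        if dt <= 0:
            labels.append("unknown")  # duplicate or out-of-order timestamps
            continue
        speed = haversine_m(la0, lo0, la1, lo1) / dt
        labels.append("stationary" if speed <= max_stationary_speed_mps else "moving")
    return labels


pts = [(0, 33.749, -84.388), (600, 33.749, -84.388), (660, 33.759, -84.388)]
print(label_sightings(pts))  # ['stationary', 'moving']
```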
This is not an issue for us. AirSage uses the mobile advertising ID (MAID) to uniquely identify devices. Because AirSage’s data is coalesced at the device level, we do not distinguish between different apps or SDKs.
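To illustrate device-level coalescing, the sketch below groups sightings that arrive from different apps or SDKs under one device key. The field names (`maid`, `source`, `ts`, `lat`, `lon`) and the example ID are hypothetical, and lower-casing the ID is an assumption made here because the same advertising ID can appear in different letter cases across sources.

```python
from collections import defaultdict


def coalesce_by_device(sightings):
    """Group sightings by advertising ID, ignoring which app or SDK reported them."""
    devices = defaultdict(list)
    for s in sightings:
        device_key = s["maid"].strip().lower()  # assumption: normalize case/whitespace
        devices[device_key].append((s["ts"], s["lat"], s["lon"]))  # "source" is dropped
    for points in devices.values():
        points.sort()  # chronological order per device
    return dict(devices)


raw = [
    {"maid": "AAAAAAAA-BBBB-4CCC-8DDD-EEEEEEEEEEEE", "source": "app_1",
     "ts": 200, "lat": 33.75, "lon": -84.39},
    {"maid": "aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee", "source": "sdk_2",
     "ts": 100, "lat": 33.74, "lon": -84.38},
]
print(coalesce_by_device(raw))
```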
We also like to think of “home” and “work” locations as “evening” and “daytime” locations, respectively. These locations are based on where devices ping the most during the late evening and the daytime.
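A minimal sketch of that idea: for each device, count pings per grid cell within a daytime window and a late-evening window, and take the most frequent cell in each. The hour windows, the roughly 100 m grid (coordinates rounded to 3 decimals), and the `(local_hour, lat, lon)` input are hypothetical choices for illustration, not AirSage's actual parameters.

```python
from collections import Counter

DAYTIME_HOURS = set(range(9, 17))                      # hypothetical: 09:00-16:59
EVENING_HOURS = set(range(21, 24)) | set(range(0, 6))  # hypothetical: 21:00-05:59


def cell(lat, lon, decimals=3):
    """Snap a point to a coarse grid cell (roughly 100 m at 3 decimals)."""
    return (round(lat, decimals), round(lon, decimals))


def anchor_locations(pings):
    """Return the most-pinged daytime and evening cells for one device.

    pings: iterable of (local_hour, lat, lon); hours outside both windows are ignored.
    """
    day, evening = Counter(), Counter()
    for hour, lat, lon in pings:
        if hour in DAYTIME_HOURS:
            day[cell(lat, lon)] += 1
        elif hour in EVENING_HOURS:
            evening[cell(lat, lon)] += 1

    def top(counter):
        return counter.most_common(1)[0][0] if counter else None

    return {"daytime": top(day), "evening": top(evening)}


pings = [(10, 33.7491, -84.3881), (11, 33.7492, -84.3879),
         (22, 33.80, -84.40), (23, 33.8001, -84.4002), (2, 33.80, -84.40)]
print(anchor_locations(pings))
```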
Total Devices counts distinct devices present at the location of interest during the reporting period. Total Sightings counts the total number of individual records produced by all devices present at the location of interest during the reporting period.
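The difference between the two metrics can be shown with a few hypothetical records, assuming they have already been filtered to the location of interest and the reporting period: a device that reports twice adds two sightings but counts as only one device.

```python
def summarize(sightings):
    """sightings: (device_id, timestamp) records already inside the location and period."""
    total_sightings = len(sightings)                              # every record counts
    total_devices = len({device_id for device_id, _ in sightings})  # distinct IDs only
    return {"Total Devices": total_devices, "Total Sightings": total_sightings}


records = [("dev-a", 1), ("dev-a", 2), ("dev-b", 3)]
print(summarize(records))  # {'Total Devices': 2, 'Total Sightings': 3}
```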
Yes, either by drawing a new POI or by uploading them during study creation.
We do. However, this is possible only in the context of a Destination Location Analysis (DLA) study. DLA is used mainly by the travel, tourism, and hospitality industries.
Data can easily be extracted in CSV format.
We support Shapefile (.zip) and GeoJSON (.json, .geojson).