| Title: | R Client for Survey160 Data |
|---|---|
| Description: | Access Survey160 campaign data from R. Reads campaign results from Google Cloud Storage, triggers fresh exports via the Survey160 API, and computes per-campaign recipient-latency reports as in-memory R objects. Persistence and fleet orchestration live in consumer projects (see survey160-shiny). |
| Authors: | Lennon Shimokawa [aut, cre] |
| Maintainer: | Lennon Shimokawa <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.17.0 |
| Built: | 2026-06-11 00:25:00 UTC |
| Source: | https://github.com/survey160/survey160r |
Pure function. Derives flow.questions from the CSV column names
via campaign_discover_questions() and assembles the rest of the
config from the named arguments. No I/O, no API call, no auth precondition.
campaign_build_config(
  campaign_id,
  data,
  field_timezone = "UTC",
  project_id = NULL,
  date_filter = NULL,
  respondent_id_column = NULL
)
campaign_id |
Campaign id (numeric or character). |
data |
A data frame of CSV results (or a character vector of column names) used to discover the question flow. |
field_timezone |
Tz used to bucket the Parquet |
project_id |
Optional Survey160 project id; defaults to the campaign id as a placeholder. |
date_filter |
Optional character/Date vector restricting which
survey dates are processed (interpreted in |
respondent_id_column |
Optional column name used to dedupe rows by
respondent. Default |
A config list ready to pass to campaign_report(), which
calls campaign_validate_config() before consuming it.
Hashes a canonical form of the config so the same logical config always produces the same hash even across different YAML serializations.
campaign_config_hash(config)
config |
The config list. |
A hex sha256 string.
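The "canonical form" idea can be pictured with the minimal base-R sketch below. It is not the package's implementation: `canonicalise()` and `config_fingerprint_sketch()` are hypothetical helpers, and MD5 via `tools::md5sum()` stands in for sha256 only to keep the sketch free of extra packages.

```r
# Hypothetical helper: recursively sort named list elements so that
# key order (e.g. from different YAML serializations) does not matter.
canonicalise <- function(x) {
  if (is.list(x)) {
    if (!is.null(names(x))) x <- x[order(names(x))]
    lapply(x, canonicalise)
  } else {
    x
  }
}

# Hypothetical helper: fingerprint the canonical form.
# MD5 is a stand-in here; the documented function returns sha256.
config_fingerprint_sketch <- function(config) {
  canon <- paste(deparse(canonicalise(config)), collapse = "\n")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(canon, tmp)
  unname(tools::md5sum(tmp))
}

# Two configs that differ only in key order.
cfg_a <- list(campaign_id = 1980,
              flow = list(questions = c("intro", "q1")),
              field_timezone = "UTC")
cfg_b <- list(field_timezone = "UTC",
              flow = list(questions = c("intro", "q1")),
              campaign_id = 1980)

same_fingerprint <- identical(config_fingerprint_sketch(cfg_a),
                              config_fingerprint_sketch(cfg_b))
```

Under these assumptions `same_fingerprint` is `TRUE`, while changing any value (for example `field_timezone`) yields a different fingerprint.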
Scans the column names of an in-memory CSV data frame (as returned by
read.csv / s160_gcs_pull_csv) for id.<q>.scriptDate
columns and returns the question ids in their original column order.
Terminal flow states (refusal, ineligible) are dropped so
the result is usable directly as config$flow$questions.
campaign_discover_questions(data)
data |
A data frame or character vector of column names. |
Survey160 v2 CSV headers are emitted as id[<q>]scriptDate on disk;
read.csv converts the brackets to dots. Both forms are accepted so
callers can pass either a data frame or a character vector of raw header
tokens.
A character vector of question ids in flow order.
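As an illustration of the header scan described above, here is a minimal base-R sketch. `discover_questions_sketch()` is a hypothetical stand-in, not the package function, and the exact matching rules of `campaign_discover_questions()` may differ; the terminal states are taken from the description above.

```r
# Hypothetical sketch: extract question ids from scriptDate columns,
# accepting either raw bracket headers or read.csv-style dot headers.
discover_questions_sketch <- function(data,
                                      terminal = c("refusal", "ineligible")) {
  cols <- if (is.data.frame(data)) names(data) else data
  # make.names() turns id[q]scriptDate into id.q.scriptDate and leaves
  # headers that are already in dot form unchanged.
  cols <- make.names(cols)
  hits <- regmatches(cols, regexec("^id\\.(.+)\\.scriptDate$", cols))
  ids <- vapply(
    hits,
    function(m) if (length(m) == 2) m[2] else NA_character_,
    character(1)
  )
  ids <- ids[!is.na(ids)]
  # Drop terminal flow states, keep original column order.
  unique(ids[!ids %in% terminal])
}

header <- c("id[intro]scriptDate", "id.q1.scriptDate", "id.q1.batchDate",
            "id.refusal.scriptDate", "web_complete")
questions <- discover_questions_sketch(header)
# questions is c("intro", "q1")
```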
Pure function: same (data, config) always yields identical output.
Implements the algorithm in campaign_scripts.md §2.
campaign_report(data, config, run_at = NULL)
data |
A data frame with one row per respondent and the per-question
timestamp columns named |
config |
Config list from |
run_at |
Optional |
A list with consolidated (one row per
(campaign_id, date, hour_local, segment, threshold_min)),
latency_frame (one row per respondent x segment),
diagnostics (counts and breakdowns per spec §3.3), and
meta (algorithm_version, config_hash, run_at_utc).
Analyst-facing one-campaign runner. Given an in-memory campaign CSV
(already read by the caller), optionally builds the report config
from the CSV header and then runs the latency algorithm. No I/O; pair
with s160_gcs_pull_csv() for the GCS source path, or with
read.csv() / readr::read_csv() / anything else for
off-GCS sources.
campaign_run(
  campaign_id,
  data,
  config = NULL,
  run_at = NULL,
  run_by = NULL,
  ...
)
campaign_id |
Campaign id (numeric or character). |
data |
In-memory campaign CSV as a data frame (one row per
respondent, columns named |
config |
Optional pre-built config. When |
run_at |
Optional |
run_by |
Optional string stamped on every row's |
... |
Forwarded to |
Two call shapes:
- Convenience: omit config and pass any campaign_build_config() overrides through .... campaign_run() derives the config from the CSV header.
- Custom: pre-build the config with campaign_build_config() (mutating as needed) and pass it via config. ... must be empty in that case (passing both is an error).
Provenance: if data carries source_csv_hash or
source_csv_path attributes (set by s160_gcs_pull_csv
for GCS reads), campaign_report() surfaces them on
result$meta. Analysts pulling CSVs from other sources can
attach the attributes themselves before calling, e.g.
attr(df, "source_csv_path") <- "dropbox:campaign_1234.csv"
attr(df, "source_csv_hash") <- paste0(
  "sha256:", digest::digest(file = local_path, algo = "sha256"))
The list returned by campaign_report():
consolidated, latency_frame, diagnostics,
meta (with source_csv_hash and
source_csv_path from data's attributes, or NA
when absent).
## Not run:
# GCS source -- pair with s160_gcs_pull_csv().
s160_gcs_init(bucket = "campaign_results")
data <- s160_gcs_pull_csv(1980)
result <- campaign_run(1980, data, field_timezone = "America/New_York")
result$meta$source_csv_hash

# Off-GCS source -- bring your own CSV.
data <- read.csv("~/Dropbox/campaign_1980.csv", stringsAsFactors = FALSE)
result <- campaign_run(1980, data)

# Custom config (mutate before running).
config <- campaign_build_config(1980, data, field_timezone = "America/New_York")
config$flow$questions <- c("intro", "q1_custom")
campaign_run(1980, data, config = config)
## End(Not run)
Implements the fail-fast checks from spec §2.4. Aborts with a named error on the first failing rule.
campaign_validate_config(config, data)
config |
The config list (typically from
|
data |
The data frame the report will run against. |
Invisible TRUE on success; otherwise stops with an error.
Returns the (dot-form) column names campaign_report() touches for a
given config: the per-question scriptDate/batchDate
set, the population-filter columns (extracted from
config$filters$population), the campaign-id and optional
respondent-id columns, plus the fixed non-flow support columns
(id.intro.finalText, web_complete,
id.ineligible.scriptDate).
required_csv_columns(config, available = NULL)
config |
A config list from |
available |
Optional character vector of the actual (dot-form) column
names present in the file (e.g. from |
Some columns the report reads have data-dependent names – the close-message
Text columns (id.close*.scriptText/batchText) that
detect_survey_mode() greps to tell t2w_external from
sms. Pass available (e.g. the result of s160_csv_header())
so these are matched against the real header and retained; omitting it risks
projecting them away and misclassifying a t2w_external campaign as
sms.
Pass the result as columns = to s160_read_csv() /
s160_gcs_pull_csv() to parse only the columns the algorithm needs –
the projection yields output identical to a full read.
A character vector of unique dot-form column names.
## Not run:
header <- s160_csv_header(path)
config <- campaign_build_config(1980, header, field_timezone = "America/New_York")
data <- s160_read_csv(path, columns = required_csv_columns(config, header))
## End(Not run)
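To make the role of available concrete, the sketch below shows one way data-dependent close-message columns could be matched against a real header. `retain_close_text_columns()` is hypothetical, and the regular expression is an assumption read off the `id.close*.scriptText/batchText` description; the package's actual matching may differ.

```r
# Hypothetical sketch: add any close-message Text columns found in the
# real header to a fixed set of required columns.
retain_close_text_columns <- function(required, available) {
  # Assumed pattern for id.close*.scriptText / id.close*.batchText.
  close_cols <- grep("^id\\.close.*\\.(scriptText|batchText)$",
                     available, value = TRUE)
  union(required, close_cols)
}

available <- c("id.intro.scriptDate", "id.close_web.scriptText",
               "id.close.batchText", "id.q1.scriptText")
required  <- c("id.intro.scriptDate")

projected <- retain_close_text_columns(required, available)
# projected keeps id.intro.scriptDate plus the two id.close* Text columns
```

Without the header there is nothing to match the pattern against, which is why omitting available risks dropping these columns.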
One entry point, addressed by environment name so the base URL, the campaign-results GCS bucket, and the API key are resolved together and cannot be mismatched. It authenticates and returns a connection – an opaque handle holding the JWT, credentials, base URL, environment name, and paired bucket. How you use the return value gives single- or multi-environment behaviour from the same call:
s160_api_auth(env = c("prod", "staging"))
env |
Environment name: |
- Single environment: ignore the return value. The call also refreshes the package's default connection, so subsequent s160_api_campaign_results / s160_api_campaign_get calls with no conn use it, e.g. s160_api_auth(); df <- s160_api_campaign_results(744).
- Both environments at once: capture each connection and pass it as conn =, e.g. prod <- s160_api_auth("prod"); stg <- s160_api_auth("staging"), then s160_api_campaign_results(744, conn = stg). Each connection is independent, so prod and staging can be held live in the same session, for example to compare a campaign across both.
Credentials come from ~/.Renviron (never typed into code): the user ID
from S160_API_USERID, and the API key from a per-environment variable
– S160_STAGING_API_KEY for staging, and S160_PROD_API_KEY (or,
as a fallback, the legacy S160_API_KEY) for prod. Missing values prompt
once on an interactive run and are saved.
The in-session JWT refresh (tokens expire after 10 minutes; re-auth at 8) reuses the credentials stored on the connection, so a staging connection held alongside prod keeps refreshing against staging rather than the default.
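The refresh timing can be sketched as a simple age check. `token_needs_refresh()` and its arguments are hypothetical names used for illustration only; the 8-minute and 10-minute figures come from the paragraph above, and how the package tracks token age internally is not documented here.

```r
# Hypothetical sketch: decide whether a token issued at `issued_at`
# should be refreshed (re-auth at 8 minutes, expiry at 10 minutes).
token_needs_refresh <- function(issued_at,
                                now = Sys.time(),
                                refresh_after_secs = 8 * 60) {
  age_secs <- as.numeric(difftime(now, issued_at, units = "secs"))
  age_secs >= refresh_after_secs
}

issued <- as.POSIXct("2026-01-01 00:00:00", tz = "UTC")
fresh  <- token_needs_refresh(issued, now = issued + 7 * 60)  # FALSE
stale  <- token_needs_refresh(issued, now = issued + 8 * 60)  # TRUE
```

Refreshing two minutes before expiry leaves a margin so that a request started just before the check does not hit an expired token.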
A connection object (an environment) to pass as conn, returned
invisibly. As a side effect, the package's default connection is updated to
this one so conn-less calls use the most recent authentication.
## Not run:
# Single environment -- ignore the return, use the default connection:
s160_api_auth()                       # defaults to prod
df <- s160_api_campaign_results(744)

# Both environments at once -- capture each, pass conn =:
s160_gcs_init(bucket = "campaign_results")  # one GCS auth covers all buckets
prod <- s160_api_auth("prod")
stg <- s160_api_auth("staging")
df_prod <- s160_api_campaign_results(744, conn = prod)
df_stg <- s160_api_campaign_results(744, conn = stg)
## End(Not run)
Wraps the Survey160 API endpoint GET /campaigns/<campaign_id>, which
returns every column on the campaigns table for one campaign. Useful
for confirming attributes after a state-changing call (for example, reading
archive_scheduled_date after scheduling an archive) without dropping
to direct database access.
s160_api_campaign_get(campaign_id, conn = NULL)
campaign_id |
Campaign ID (numeric or character). |
conn |
Connection to use. Defaults to the package's default connection
(the most recent |
Enriched, API-only fields returned by the endpoint
(listlength, list, login, exports,
has_texting_started, sandbox_configuration, aggregator,
has_assigned_registration) are dropped; the result mirrors the
campaigns table only. JSON-valued columns (script, prompt,
quotas, ...) come back as length-1 list-columns holding the parsed
structure.
The endpoint runs several server-side subqueries on each call, so treat it as a per-campaign read; it is not appropriate for tight loops over hundreds of IDs. A batch variant would need a backend extension and is out of scope.
A single-row data frame. Scalar columns are scalar; ISO-8601
timestamp columns are coerced to POSIXct in UTC; JSON columns
are list-columns of length 1.
## Not run:
s160_api_auth()
info <- s160_api_campaign_get(2107)
info$active
info$script[[1]]  # parsed JSON
## End(Not run)
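The shape of the returned row (scalar columns, UTC POSIXct timestamps, length-1 list-columns for JSON) can be reproduced in base R as below. This is only a sketch of the result shape: `raw`, its values, and `parse_iso8601_utc()` are made up for illustration and say nothing about how the package parses the API response.

```r
# Hypothetical parsed API payload for one campaign (values invented).
raw <- list(
  id = 2107L,
  active = TRUE,
  archive_scheduled_date = "2026-07-01T00:00:00Z",
  script = list(steps = list("intro", "q1"))
)

# Hypothetical helper: ISO-8601 "Z" timestamps to POSIXct in UTC.
parse_iso8601_utc <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Scalar columns stay scalar.
info <- data.frame(id = raw$id, active = raw$active)
# Timestamp column coerced to POSIXct (UTC).
info$archive_scheduled_date <- parse_iso8601_utc(raw$archive_scheduled_date)
# JSON-valued column stored as a length-1 list-column.
info$script <- list(raw$script)

first_step <- info$script[[1]]$steps[[1]]
```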
Triggers a fresh campaign results export, polls GCS until the file is
updated, and returns the results as a data frame. Requires both API auth
(s160_api_auth) and GCS auth (s160_gcs_init).
s160_api_campaign_results(
  campaign_id,
  filter_open = FALSE,
  timeout = 300,
  poll_interval = 5,
  destdir = NULL,
  conn = NULL,
  ...
)
campaign_id |
Campaign ID (numeric or character). |
filter_open |
Logical. Exclude open/uncontacted conversations?
Default |
timeout |
Timeout in seconds for export completion. Default 300. |
poll_interval |
Maximum polling interval in seconds. Default 5. Polling uses exponential backoff starting at the smaller of 2s and this value, capped at this value. |
destdir |
Directory to save the downloaded CSV. |
conn |
Connection to use. Defaults to the package's default connection
(the most recent |
... |
Additional arguments passed to |
A data frame with one row per survey response.
## Not run:
s160_gcs_init(bucket = "campaign_results")
s160_api_auth()
df <- s160_api_campaign_results(1980)
df <- s160_api_campaign_results(1980, filter_open = TRUE, timeout = 600)

# Compare the same campaign across two environments concurrently:
prod <- s160_api_auth("prod")
stg <- s160_api_auth("staging")
df_prod <- s160_api_campaign_results(744, conn = prod)
df_stg <- s160_api_campaign_results(744, conn = stg)
## End(Not run)
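The interaction between poll_interval and timeout is easier to see as a schedule of waits. The sketch below is an assumption-laden illustration: `backoff_schedule_sketch()` is a hypothetical helper, and the doubling factor is assumed, since the documentation only states the starting value (the smaller of 2s and poll_interval) and the cap (poll_interval).

```r
# Hypothetical sketch: list the waits an exponential backoff would use,
# starting at min(2, poll_interval) and capped at poll_interval,
# until the cumulative wait would exceed the timeout.
backoff_schedule_sketch <- function(poll_interval = 5, timeout = 300,
                                    factor = 2) {  # factor is an assumption
  wait <- min(2, poll_interval)
  waits <- numeric(0)
  elapsed <- 0
  while (elapsed + wait <= timeout) {
    waits <- c(waits, wait)
    elapsed <- elapsed + wait
    wait <- min(wait * factor, poll_interval)
  }
  waits
}

short_run <- backoff_schedule_sketch(poll_interval = 5, timeout = 20)
# short_run is c(2, 4, 5, 5)
```

With the defaults, the wait reaches the 5-second cap after two polls, so raising poll_interval mainly reduces polling frequency on long exports.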
Peeks at the first line of a CSV and returns its column names in the same
make.names()-munged (dot-form) form the readers produce, without
parsing the body. Pair with campaign_build_config() +
required_csv_columns() to derive a column-projection set for a
large file before reading it:
s160_csv_header(path, encoding = "UTF-8")
path |
Path to the CSV. |
encoding |
File encoding for the header peek ( |
header <- s160_csv_header(path)
config <- campaign_build_config(id, header, field_timezone = tz)
data <- s160_read_csv(path, columns = required_csv_columns(config, header))
Character vector of dot-form column names.
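A header peek of this kind can be approximated in base R as follows. `csv_header_sketch()` is a hypothetical stand-in, not the package function; it assumes a comma-separated file with double-quote quoting and mirrors what `read.csv()` does to names by default.

```r
# Hypothetical sketch: read only the first line of a CSV and return
# the column names as read.csv() would munge them.
csv_header_sketch <- function(path, encoding = "UTF-8") {
  con <- file(path, open = "r", encoding = encoding)
  on.exit(close(con))
  first_line <- readLines(con, n = 1, warn = FALSE)
  raw_names <- scan(text = first_line, what = character(),
                    sep = ",", quote = "\"", quiet = TRUE)
  make.names(raw_names, unique = TRUE)
}

# Small demonstration file with a raw bracket-form header.
tmp <- tempfile(fileext = ".csv")
writeLines(c('id[intro]scriptDate,web_complete,"my col"',
             '2026-01-01,1,x'), tmp)
header_peek <- csv_header_sketch(tmp)
header_full <- names(read.csv(tmp))
```

In this example `header_peek` and `header_full` agree, which is the property that lets a projection set derived from the header be applied to the full read.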
Returns the file names inside a campaign's folder in the results bucket.
Returns character(0) with a message if the campaign has no files.
s160_gcs_campaign_results_files(campaign_id, bucket = NULL)
campaign_id |
Campaign ID (numeric or character). Must be a single value. |
bucket |
Source GCS bucket. |
Character vector of file names (without the campaign_id prefix).
## Not run:
s160_gcs_init(bucket = "campaign_results")
s160_gcs_campaign_results_files(1980)
## End(Not run)
Returns a sorted character vector of campaign IDs (top-level folder names) in the results bucket. Objects at the bucket root (not inside a folder) are excluded.
s160_gcs_campaign_results_list(bucket = NULL)
bucket |
Source GCS bucket. |
Character vector of campaign IDs, sorted.
## Not run:
s160_gcs_init(bucket = "campaign_results")
s160_gcs_campaign_results_list()
## End(Not run)
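The folder-to-ID rule described for the campaign list can be sketched on a plain vector of object names. `campaign_ids_from_objects()` and the object names below are hypothetical; the sketch only illustrates the documented rule (top-level folder names, root objects excluded, sorted as character).

```r
# Hypothetical sketch: derive campaign IDs from GCS object names.
campaign_ids_from_objects <- function(object_names) {
  # Objects at the bucket root have no "/" and are excluded.
  in_folder <- grepl("/", object_names, fixed = TRUE)
  # Keep everything before the first "/" as the top-level folder name.
  sort(unique(sub("/.*$", "", object_names[in_folder])))
}

objects <- c("1980/results.csv", "744/results.csv",
             "README.txt", "1980/other.csv")
ids <- campaign_ids_from_objects(objects)
# ids is c("1980", "744")
```

Because the result is a character vector, the sort is lexicographic: "1980" comes before "744".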
Downloads the CSV from GCS and reads it into R. By default, the file is
downloaded to a temporary location and cleaned up automatically. Set
destdir to keep a local copy.
s160_gcs_campaign_results_read(
  campaign_id,
  filename = NULL,
  destdir = NULL,
  bucket = NULL,
  columns = NULL,
  ...
)
campaign_id |
Campaign ID (numeric or character). Must be a single value. |
filename |
File name in the campaign folder. Defaults to
|
destdir |
Directory to save the downloaded file. When |
bucket |
Source GCS bucket. |
columns |
Optional character vector of (dot-form) column names to keep,
e.g. from |
... |
Additional arguments forwarded to the CSV reader
( |
GCS path: gs://<bucket>/<campaign_id>/<filename>
A data frame with one row per survey response.
## Not run:
s160_gcs_init(bucket = "campaign_results")
df <- s160_gcs_campaign_results_read(1980)
df <- s160_gcs_campaign_results_read(1980, destdir = ".")
df <- s160_gcs_campaign_results_read(1980, destdir = "~/data")
## End(Not run)
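For reference, the documented layout gs://<bucket>/<campaign_id>/<filename> can be assembled as in the sketch below. `gcs_results_path_sketch()` is a hypothetical helper, and "results.csv" is a made-up filename used only for illustration.

```r
# Hypothetical sketch: build the gs:// path for one campaign file.
gcs_results_path_sketch <- function(bucket, campaign_id, filename) {
  # campaign_id must be a single value, numeric or character.
  stopifnot(length(campaign_id) == 1)
  sprintf("gs://%s/%s/%s", bucket, as.character(campaign_id), filename)
}

path <- gcs_results_path_sketch("campaign_results", 1980, "results.csv")
# path is "gs://campaign_results/1980/results.csv"
```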
Returns GCS file metadata for the campaign's export file without
triggering a new export. Requires GCS auth (s160_gcs_init).
s160_gcs_campaign_results_status(campaign_id, bucket = NULL)
campaign_id |
Campaign ID (numeric or character). |
bucket |
Source GCS bucket. |
Named list with name, updated, and size,
or NULL if no export file exists.
## Not run:
s160_gcs_init(bucket = "campaign_results")
s160_gcs_campaign_results_status(1980)
## End(Not run)
Authenticates to GCS using the Survey160 Desktop OAuth client and sets the global bucket.
s160_gcs_init(bucket)
bucket |
GCS bucket name (e.g. |
On first run, prompts for the client secret (get it from your team lead) and saves
it to ~/.Renviron. Subsequent runs read it automatically. Also
opens a browser for Google sign-in on first use; the OAuth token is
cached in a platform-dependent directory (run
gargle::gargle_oauth_sitrep() to locate it).
The authenticated Google account needs Storage Object Viewer permission on the target bucket.
Invisible NULL. Sets the global bucket as a side effect.
## Not run:
s160_gcs_init(bucket = "campaign_results")
## End(Not run)
Thin wrapper over s160_gcs_campaign_results_read that also
computes a sha256 of the downloaded CSV bytes. The hash and the
canonical gs:// path travel back on the returned data frame
as the source_csv_hash and source_csv_path
attributes; campaign_report() reads them and copies them onto
result$meta so downstream consumers (e.g. persistence layers)
don't have to fish them off attributes.
s160_gcs_pull_csv(campaign_id, filename = NULL, bucket = NULL, columns = NULL)
campaign_id |
Campaign id (numeric or character). |
filename |
Optional override for the CSV filename. |
bucket |
Source GCS bucket. |
columns |
Optional character vector of (dot-form) column names to keep
(e.g. from |
A data frame with attributes source_csv_hash and
source_csv_path set.
Local-source sibling of s160_gcs_pull_csv(). Reads the CSV
via data.table::fread (falling back to utils::read.csv)
and stamps source_csv_hash and source_csv_path
attributes on the returned data frame so downstream
campaign_report() / campaign_run() surface them on
result$meta. Use for backfills (archived campaign CSVs stored
on disk, Dropbox, S3 mounts, etc.).
s160_read_csv(path, columns = NULL, hash = TRUE, ...)
path |
Path to the CSV. Recorded verbatim on
|
columns |
Optional character vector of (dot-form) column names to keep
(e.g. from |
hash |
When |
... |
Forwarded to the CSV reader ( |
A data frame with source_csv_hash and
source_csv_path attributes set.
## Not run:
data <- s160_read_csv("~/Dropbox/archive/campaign_500.csv")
attr(data, "source_csv_hash")
campaign_run(500, data, field_timezone = "America/New_York")
## End(Not run)