til/cloudrun/tailing-cloud-run-request-logs.md
# Tailing Google Cloud Run request logs and importing them into SQLite
The `gcloud` CLI tool has [the alpha ability to tail log files](https://cloud.google.com/logging/docs/reference/tools/gcloud-logging#live-tailing) - but it's a bit of a pain to set up.
You have to install two extras for it. First, run this command:
```
gcloud alpha logging tail
```
That installs the functionality, but as the documentation will tell you:
> To use `gcloud alpha logging tail`, you need to have Python 3 and the `grpcio` Python package installed.
Assuming you have Python 3, the problem you have to solve is *which Python* the `gcloud` tool uses to run. After digging around in the source code using `cat $(which gcloud)` I spotted the following:
```
CLOUDSDK_PYTHON=$(order_python python3 python2 python2.7 python)
```
So it looks like (on macOS at least) it prefers to use the `python3` binary if it can find it.
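If it resolves to the wrong interpreter on your machine, `CLOUDSDK_PYTHON` is a documented environment variable you can set to point it at a specific one - a minimal sketch (the interpreter path here is just an example):
```
CLOUDSDK_PYTHON=/usr/local/bin/python3 gcloud --version
```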
So installing `grpcio` for that `python3` puts it somewhere `gcloud` can see it:
```
python3 -m pip install grpcio
```
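You can verify the install worked by importing the module (`grpcio` installs as `grpc`):
```
python3 -c "import grpc; print(grpc.__version__)"
```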
Having done that, you can start running commands. `gcloud logging logs list` shows a list of logs:
```
~ % gcloud logging logs list
NAME
projects/datasette-222320/logs/cloudaudit.googleapis.com%2Factivity
projects/datasette-222320/logs/cloudaudit.googleapis.com%2Fdata_access
projects/datasette-222320/logs/cloudaudit.googleapis.com%2Fsystem_event
projects/datasette-222320/logs/cloudbuild
projects/datasette-222320/logs/clouderrorreporting.googleapis.com%2Finsights
projects/datasette-222320/logs/cloudtrace.googleapis.com%2FTraceLatencyShiftDetected
projects/datasette-222320/logs/run.googleapis.com%2Frequests
projects/datasette-222320/logs/run.googleapis.com%2Fstderr
projects/datasette-222320/logs/run.googleapis.com%2Fstdout
projects/datasette-222320/logs/run.googleapis.com%2Fvarlog%2Fsystem
```
Then you can use `gcloud alpha logging tail projects/datasette-222320/logs/run.googleapis.com%2Frequests` to start tailing the requests log. But you also need the `CLOUDSDK_PYTHON_SITEPACKAGES=1` environment variable so that `gcloud` knows to look for the `grpcio` dependency:
```
CLOUDSDK_PYTHON_SITEPACKAGES=1 \
  gcloud alpha logging tail projects/datasette-222320/logs/run.googleapis.com%2Frequests
```
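If you plan to run this more than once, one option is to export the variable for the whole shell session instead of prefixing every command:
```
export CLOUDSDK_PYTHON_SITEPACKAGES=1
gcloud alpha logging tail projects/datasette-222320/logs/run.googleapis.com%2Frequests
```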
The default format is verbose YAML. A log entry looks like this:
```yaml
httpRequest:
  latency: 0.123684963s
  remoteIp: 66.249.69.240
  requestMethod: GET
  requestSize: '510'
  requestUrl: https://www.niche-museums.com/browse/museums.json?_facet_size=max&country=United+States&_facet=osm_city&_facet=updated&_facet=osm_suburb&_facet=osm_footway&osm_city=Santa+Cruz
  responseSize: '6403'
  serverIp: 142.250.125.121
  status: 200
  userAgent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
insertId: 611171fe000a38a469d59595
labels:
  instanceId: 00bf4bf02dab164592dbbb9220b56c3ce64cb0f1c1f37812d1d61e851a931e9964ba539c2ede42886773c82662cc28aa858749d2697f537ff7a61e7b
  service: niche-museums
logName: projects/datasette-222320/logs/run.googleapis.com%2Frequests
receiveTimestamp: '2021-08-09T18:20:46.935658405Z'
resource:
  labels:
    configuration_name: niche-museums
    location: us-central1
    project_id: datasette-222320
    revision_name: niche-museums-00039-sur
    service_name: niche-museums
  type: cloud_run_revision
severity: INFO
timestamp: '2021-08-09T18:20:46.669860Z'
trace: projects/datasette-222320/traces/306a0d6e7e055ba66172003a74c926c2
```
I decided to import the logs into a SQLite database so I could use [Datasette](https://datasette.io/) to analyze them (hooray for facets).
Adding `--format json` switches the output to JSON - but it's a pretty-printed array of JSON objects, something like this:
```json
[
  {
    "httpRequest": {
      "latency": "0.112114537s",
      "remoteIp": "40.77.167.88",
      "requestMethod": "GET",
      "requestSize": "534",
      "requestUrl": "https://datasette.io/content/repos?forks=0&_facet=homepage&_facet=size&_facet=open_issues&open_issues=3&size=564&_sort=readme_html",
      "responseSize": "72757",
      "serverIp": "216.239.38.21",
      "status": 200,
      "userAgent": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    },
    "insertId": "6111722f000b5b4c4d4071e2",
    "labels": {
      "instanceId": "00bf4bf02d1d7fe4402c3aff8a34688d9a910e6ee6d2545ceebc1edefb99461481e6d9f9ae8de4e907e3d18b98ea9c7f57b2abb527c8857d9163ed193db766c349a1ee",
      "service": "datasette-io"
    },
    "logName": "projects/datasette-222320/logs/run.googleapis.com%2Frequests",
    "receiveTimestamp": "2021-08-09T18:21:36.061693305Z",
    "resource": {
      "labels": {
        "configuration_name": "datasette-io",
        "location": "us-central1",
        "project_id": "datasette-222320",
        "revision_name": "datasette-io-00416-coy",
        "service_name": "datasette-io"
      },
      "type": "cloud_run_revision"
    },
    "severity": "INFO",
    "timestamp": "2021-08-09T18:21:35.744268Z",
    "trace": "projects/datasette-222320/traces/016d640caf845fbf8709486bc8dff9c7"
  }
]
```
I want to stream the logs into `sqlite-utils` using [newline-delimited JSON](https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-newline-delimited-json), since that format can be inserted while the data is still being tailed.
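Newline-delimited JSON is simply one JSON object per line - for illustration, something like this (values invented):
```json
{"id": 1, "status": 200}
{"id": 2, "status": 404}
```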
I ended up using two new `jq` recipes:
```
cat example.json | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
```
This turns an `[{"array": "of objects"}, {"like": "this one"}]` into a stream of newline-delimited objects. I [found the recipe here](https://github.com/stedolan/jq/issues/1984#issuecomment-568918146) - I don't understand it.
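Here's a tiny demo of what it does:
```
echo '[{"a": 1}, {"b": 2}]' | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
# Outputs:
# {"a":1}
# {"b":2}
```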
As you can see above, the objects are nested. I want them as flat objects so that `sqlite-utils insert` will create a separate column for each nested value. I used [this recipe](https://til.simonwillison.net/jq/flatten-nested-json-objects-jq) for that.
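Applied to a small nested object, the flattening recipe works like this:
```
echo '{"a": {"b": 1, "c": {"d": 2}}}' \
  | jq -c '[leaf_paths as $path | {
      "key": $path | join("_"), "value": getpath($path)
    }] | from_entries'
# Outputs:
# {"a_b":1,"a_c_d":2}
```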
The end result was this:
```
CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud alpha logging tail \
  projects/datasette-222320/logs/run.googleapis.com%2Frequests \
  --format json \
  | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' \
  | jq -c '[leaf_paths as $path | {
      "key": $path | join("_"), "value": getpath($path)
    }] | from_entries' \
  | sqlite-utils insert /tmp/logs.db logs - --nl --alter --batch-size 1
```
That last line inserts the data into a `logs` table in the `/tmp/logs.db` database file. `--nl` means "expect newline-delimited JSON", `--alter` means "add new columns if they are missing" and `--batch-size 1` means "commit after every record" (so I can see them in Datasette while they are streaming in).
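While the tail is running you can check on progress from another terminal - `sqlite-utils` defaults to its `query` command, so this should work:
```
sqlite-utils /tmp/logs.db "select count(*) from logs"
```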
**UPDATE:** [sqlite-utils 3.15](https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-15) added a `--flatten` option which you can use instead of that second `jq` recipe, so this should work instead:
```
CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud alpha logging tail \
  projects/datasette-222320/logs/run.googleapis.com%2Frequests \
  --format json \
  | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' \
  | sqlite-utils insert /tmp/logs.db logs - --nl --alter --batch-size 1 --flatten
```
The resulting schema looks like this (via `sqlite-utils schema /tmp/logs.db`):
```sql
CREATE TABLE [logs] (
   [httpRequest_latency] TEXT,
   [httpRequest_remoteIp] TEXT,
   [httpRequest_requestMethod] TEXT,
   [httpRequest_requestSize] TEXT,
   [httpRequest_requestUrl] TEXT,
   [httpRequest_responseSize] TEXT,
   [httpRequest_serverIp] TEXT,
   [httpRequest_status] INTEGER,
   [httpRequest_userAgent] TEXT,
   [insertId] TEXT,
   [labels_instanceId] TEXT,
   [labels_service] TEXT,
   [logName] TEXT,
   [receiveTimestamp] TEXT,
   [resource_labels_configuration_name] TEXT,
   [resource_labels_location] TEXT,
   [resource_labels_project_id] TEXT,
   [resource_labels_revision_name] TEXT,
   [resource_labels_service_name] TEXT,
   [resource_type] TEXT,
   [severity] TEXT,
   [timestamp] TEXT,
   [trace] TEXT,
   [httpRequest_referer] TEXT
);
```
Then I ran `datasette /tmp/logs.db` to start exploring the logs. Faceting by `resource_labels_service_name` was particularly useful.
<img width="1340" alt="Screenshot of logs in Datasette" src="https://user-images.githubusercontent.com/9599/128755995-fab7e478-82a5-4d80-a959-f89f7dd39209.png">
The `httpRequest_latency` column contains text values like `0.012572683s` - thankfully, if you cast it to a `float` the trailing `s` is ignored. Here's an example query showing the services with the highest average latency:
```sql
select
  resource_labels_service_name,
  avg(cast(httpRequest_latency as float)) as avg_latency,
  count(*)
from
  logs
group by
  resource_labels_service_name
order by
  avg_latency desc
```
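Other aggregates work the same way against the flattened columns - for example, the distribution of status codes per service:
```sql
select
  resource_labels_service_name,
  httpRequest_status,
  count(*) as n
from
  logs
group by
  resource_labels_service_name,
  httpRequest_status
order by
  n desc
```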
## Using the Logs Explorer
Alternatively, you can use the Google Cloud Logs Explorer! It has pretty decent faceted search built in.
Here's a query showing results from that log file:
```
resource.type="cloud_run_revision"
log_name="projects/datasette-222320/logs/run.googleapis.com%2Frequests"
```
Run that at https://console.cloud.google.com/logs/query - or here's a link I can use to execute that directly (for the last 7 days): https://console.cloud.google.com/logs/query;query=resource.type%3D%22cloud_run_revision%22%0Alog_name%3D%22projects%2Fdatasette-222320%2Flogs%2Frun.googleapis.com%252Frequests%22;timeRange=P7D;?project=datasette-222320
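If you'd rather stay in the terminal, the same filter should also work with the stable (non-alpha) `gcloud logging read` command, which fetches historical entries rather than tailing:
```
gcloud logging read \
  'resource.type="cloud_run_revision" AND log_name="projects/datasette-222320/logs/run.googleapis.com%2Frequests"' \
  --limit 10 --format json
```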