Getting started

The CIM App Histories toolkit analyses Android APKs for digital-methods research: what an app declares (permissions, activities, locales), which A/B-testing SDKs it ships, what AI/ML runtimes and models it carries, and how data may flow from device inputs (microphone, camera, files) through those components to network endpoints. It is built to run the same way on a laptop and on an HPC cluster, over one app or a corpus of thousands.

The toolkit is alpha software under active development; commands and output fields may change between versions. Every output record carries the toolkit version that produced it, so results remain traceable.

Installation

Requires Python 3.10 or later.

git clone https://github.com/iaine/app_histories.git
cd app_histories
pip install -e .

This installs the runtime dependencies (androguard 4.x, pandas, networkx, loguru) and the cim-apps command. Two optional extras exist: pip install -e ".[viz]" adds the plotting stack (matplotlib, numpy, Pillow — deliberately not installed by default so cluster environments stay free of GUI toolkits), and ".[dev]" adds pytest and ruff for contributors.

Check the install:

cim-apps --version

For containerised installs (recommended on HPC), see Running at scale.

First run

Point a workflow at one APK and an output directory:

cim-apps metadata --apk MyApp.apk --outdir results/

This writes results/MyApp.metadata.jsonl: one JSON object containing the app's identity, versions, permissions, activities, intents, locale coverage, and detected A/B-testing SDKs. The other two workflows are classify (AI/ML runtimes and model files) and flows (input → module → endpoint graphs); all three take the same input and batching options.

To analyse a folder of apps — the common case for a scraped corpus — swap --apk for --apk-dir:

cim-apps metadata --apk-dir apps/ --outdir results/

One output file is written per app, already-completed apps are skipped on re-run, and the work is spread across your CPU cores automatically.

Where next

The command reference describes each workflow and option. Working with results shows how to load the JSONL outputs into pandas. Running at scale covers SLURM arrays, containers, and profiling for corpus-sized studies. Research methodology lives on the methods page.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search