Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem
Date
2023-01-10
Abstract
The ability to identify the author responsible for a given software
object is critical for many research studies
and for enhancing software transparency and accountability. However,
unlike in other application markets such as Apple's iOS App Store, attribution in the
Android ecosystem is known to be hard.
Prior research has leveraged market metadata and
signing certificates to identify software authors without questioning
the validity and accuracy of these attribution signals.
However, Android application (app) authors can, either intentionally or by mistake,
hide their true identity due to:
(1) the lack of policy enforcement by markets to ensure the
accuracy and correctness of the information disclosed
by developers in their market profiles during the app release process, and
(2) the use of self-signed certificates for signing apps instead of
certificates issued by trusted CAs.
In this paper, we perform the first empirical analysis
of the availability, volatility and overall aptness of publicly available
market and app
metadata for author attribution in Android markets.
To that end, we analyze a dataset of over
2.5 million market entries and apps extracted from five Android markets
for over two years. Our results show that widely used attribution signals are
often missing from market profiles and that they change over time. We also
refute the widespread belief that signing certificates are a valid signal
for author attribution. For instance, we find
that apps from different authors share signing certificates
due to the proliferation of app building frameworks and software factories.
Finally, we introduce the concept of an attribution graph and
we apply it to evaluate the validity of existing attribution signals
on the Google Play Store. Our results confirm that
the lack of control over publicly available signals can
confuse automatic attribution processes.