Suggested Telemetry System (collect, analyze and report)

GabrielXia · Mar 11, 2017

Related to this GSoC item: Issue reporting improvement
And possibly to the closely related item: Improve visibility of missing resource scenarios and general user reporting
Basic Idea
The idea of the telemetry system is to collect telemetry from players (subject to each user's choice), such as exceptions, OS, game version, video card, etc. This information would be gathered in some kind of telemetry storage where analysis jobs run. Their results should help us improve Terasology :gooey:

Here are some similar projects:

I think at first this telemetry system should concentrate specifically on crash and issue reporting; many further developments could follow.

Scenario
Stealing @Cervator's example:

A bunch of game clients encounter a bug with the specific log "Error X, Y, Z occurred". We use the telemetry system and find out "Hey, this says that a bug with log entry soandso happened x times, we should look at that". Or a bug happens on a specific OS several times, and we say "Hey, there is a bug specific to this OS, we should submit an issue on GitHub".
If we get a bunch of error reports caused by users enabling 200 different modules, this might suggest that we need a disclaimer telling users not to enable 200 modules.
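
The first scenario boils down to counting reports per log message and then per OS. Here is a minimal Java sketch of that analysis, assuming each report has already been reduced to an OS name and a log message. The `ErrorReportCounter` and `Report` names are hypothetical and are not part of Terasology or any telemetry framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical analysis helper: counts how often each error was reported. */
final class ErrorReportCounter {
    private ErrorReportCounter() { }

    /** Assumed minimal shape of one incoming report. */
    static final class Report {
        final String os;
        final String message;

        Report(String os, String message) {
            this.os = os;
            this.message = message;
        }
    }

    /** "Log entry soandso happened x times": occurrences per log message. */
    static Map<String, Long> countByMessage(List<Report> reports) {
        return reports.stream()
                .collect(Collectors.groupingBy(r -> r.message, TreeMap::new, Collectors.counting()));
    }

    /** "A bug specific to this OS": occurrences of one message per OS. */
    static Map<String, Long> countByOs(List<Report> reports, String message) {
        return reports.stream()
                .filter(r -> r.message.equals(message))
                .collect(Collectors.groupingBy(r -> r.os, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Report> reports = Arrays.asList(
                new Report("Windows", "Error X"),
                new Report("Windows", "Error X"),
                new Report("Linux", "Error Y"));
        System.out.println(countByMessage(reports));
        System.out.println(countByOs(reports, "Error X"));
    }
}
```

In a real deployment this kind of grouping would more likely be a query in the telemetry storage than client code; the sketch only shows the shape of the question being asked.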

Data collected

OS
Video card
Game version
Game mode (singleplayer, enabled modules, world generator, etc.)
Logs (especially warnings and errors)
Memory
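
Part of this list can be read straight from the JVM. The sketch below is a rough illustration using only standard `System` and `Runtime` calls; the `SystemSnapshot` class and the metric key names are made up for this example. The video card and game version are not available this way and would have to come from the rendering layer and the engine respectively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Terasology: gathers a few of the metrics listed above. */
final class SystemSnapshot {
    private SystemSnapshot() { }

    /** Returns OS, Java and memory information as simple key/value pairs. */
    static Map<String, String> collect() {
        Runtime runtime = Runtime.getRuntime();
        Map<String, String> metrics = new LinkedHashMap<>();
        metrics.put("os.name", System.getProperty("os.name"));
        metrics.put("os.version", System.getProperty("os.version"));
        metrics.put("os.arch", System.getProperty("os.arch"));
        metrics.put("java.version", System.getProperty("java.version"));
        // Memory figures describe the JVM heap, not the physical RAM of the machine.
        metrics.put("memory.maxBytes", Long.toString(runtime.maxMemory()));
        metrics.put("memory.totalBytes", Long.toString(runtime.totalMemory()));
        metrics.put("memory.freeBytes", Long.toString(runtime.freeMemory()));
        metrics.put("cpu.availableProcessors", Integer.toString(runtime.availableProcessors()));
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```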

Potential telemetry framework

Snap. I just ran their example code and it looks good. Although it seems to be a server metric tracker, I've made first contact with the Snap team; they said it's doable, and that we would need to write a plugin if we need specific hooks. They are nice, too. I'm not familiar with the Snap framework yet, so I need to build some small samples as soon as possible!
Google Analytics seems to have potential. It even has an API for Unity games, though I guess we don't use Unity. Atom's metrics use Google Analytics. There are also similar open source projects like Piwik, but they are aimed more at web or mobile applications.
Thanks to @Rostyslav Zatserkovnyi for suggesting another potential framework: Snowplow.
More suggestions are welcome here, thanks.

Need to do
Lots of things in this proposal should be more specific:

Automatic issue reporting: how does the new system fit with the game / CR?
What will the telemetry storage be? Where will it run?
What specific analysis jobs will be done in this system?

Future development
Some future developments, for motivation:

Feedback from users is important! Giving it should be easy, or even automatic, and safe. It will make our game easier to develop and more fun. I think this telemetry system could do a lot of fun stuff:

We can find out which kinds of blocks users like the most, the most popular tools, their favorite modules, etc.
The commands users use the most, which might give us ideas for new commands.
That's a lot; I probably need to focus more on the specific things to do than on the future.

I'd appreciate any feedback.

Cervator · Mar 12, 2017

I got really excited about Snowplow after realizing we have a "man on the inside" so to say, hehe. I love it when work and hobby can coincide like that. And it looks really well put together, not that I've had time to try it out.

Google Analytics and maybe Fabric (which Google bought out) have potential and probably would tie really well to Android (maybe first for Destination Sol, then one day for Terasology) but may be overly web or mobile centric. Plus there'll probably be some paranoid devs who'd prefer something more independent. Maybe I'm a bit too worried about conspiracy theories, but both devs and users can get really mad about opt-in / opt-out tracking/telemetry.

We can provide some server infrastructure for testing out whatever, just let me know.

Try to keep in mind that it would be nice to be able to reuse as much of this as possible for more than just Terasology. Destination Sol and our other tools at first, maybe other games in the future. That's sort of a secondary consideration, I think for GSOC the focus should be on Terasology, but it is a good thing to architect for early.

I don't know if our CrashReporter / IssueReporter would itself be replaced by some component in an existing framework. Probably it would just report to a different back-end. I don't know the frameworks well enough yet.

Skaldarnar · Mar 12, 2017

Probably very detailed analytics of resources (RAM, CPU, GPU usage) and framerate could also be interesting. Viewing graphs relating chunk processing to memory usage to framerate, etc. This could help us to focus on optimizations that have an impact for players. Also, loading times (of assets, world generation, ...) might be interesting. There are so many potential things to look at, we just have to start somewhere

GabrielXia · Mar 18, 2017

Hey

I did a simple test of the Snowplow tracker with the server set up in the cloud, and it worked. It's in a repository on my GitHub account, where I also wrote some docs. Thanks for any feedback.

Your suggestions are very important to me. Any input, such as what the next steps should be or whether this approach is a good one, would be valuable ~

Here are some ideas about implementing the tracker in Terasology:

Two goals in mind:

The telemetry system should change the code in Terasology as little as possible
This system could also be used for the mobile version of Terasology as well as DestinationSol

Implementation:

There will be an emitter that sends events to the server; it can be injected into other classes using `@In Emitter emitter`
The trackers will be used in several classes:
- There might be a MetricSystem that extends BaseComponentSystem and implements UpdateSubscriberSystem. While initializing, it could track basic information such as OS, video card, etc. While updating, it could track information like memory usage, framerate, etc.
- Other specific events might be tracked in the specific classes concerned
- A Logback appender sends `error` or `warn` logs to the server
The rest of the work will have to be done on the server
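
To make the emitter and tracker split more concrete, here is a minimal, engine-independent sketch. Everything in it is an assumption for illustration: the `Emitter` interface is a stand-in for whatever actually sends events (it is not the Snowplow API), `RecordingEmitter` only stores event names in memory, and `MetricTracker` merely mirrors the initialise/update shape of an update-driven system without extending any Terasology class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the component that sends events to the server. */
interface Emitter {
    void emit(String eventName, Map<String, String> payload);
}

/** In-memory emitter used here instead of a real network client. */
final class RecordingEmitter implements Emitter {
    final List<String> eventNames = new ArrayList<>();

    @Override
    public void emit(String eventName, Map<String, String> payload) {
        eventNames.add(eventName);
    }
}

/** Sends one-off system info at start-up and memory usage at a fixed interval. */
final class MetricTracker {
    private final Emitter emitter;
    private final float intervalSeconds;
    private float elapsedSeconds;

    MetricTracker(Emitter emitter, float intervalSeconds) {
        this.emitter = emitter;
        this.intervalSeconds = intervalSeconds;
    }

    /** Called once: tracks information that does not change during a session. */
    void initialise() {
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("os.name", System.getProperty("os.name"));
        emitter.emit("systemContext", payload);
    }

    /** Called every frame with the frame time; emits only once per interval. */
    void update(float deltaSeconds) {
        elapsedSeconds += deltaSeconds;
        if (elapsedSeconds < intervalSeconds) {
            return;
        }
        elapsedSeconds = 0f;
        Runtime runtime = Runtime.getRuntime();
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("memory.usedBytes", Long.toString(runtime.totalMemory() - runtime.freeMemory()));
        emitter.emit("memoryUsage", payload);
    }
}
```

Throttling inside `update` matters because a per-frame callback would otherwise send one event per frame. Keeping the emitter behind an interface also makes it easier to reuse the trackers in other projects, which is the second goal above.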

A new idea about future development:

The user might also want to see some graphs of the data. I think this system can be set up locally very easily, so we could then view the graphs in the browser.

Skaldarnar · Mar 18, 2017

In any case, you have to make sure that it can be disabled completely if users don't want such information to be tracked. Some people are really sensitive when it comes to tracking user data, and we should specify very clearly what kind of data we might observe, where it is sent, etc.
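
One way to guarantee that telemetry can be switched off completely is to route every event through a single gate that checks the user's setting. The sketch below is only an illustration: `TelemetryGate` is a hypothetical name, the sender is any `Consumer<String>`, and whether the default should be on or off is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical single choke point: nothing is sent unless the user allows it. */
final class TelemetryGate {
    private final AtomicBoolean enabled;
    private final Consumer<String> sender;

    TelemetryGate(boolean enabledByDefault, Consumer<String> sender) {
        this.enabled = new AtomicBoolean(enabledByDefault);
        this.sender = sender;
    }

    /** Would be wired to a settings screen or config entry. */
    void setEnabled(boolean value) {
        enabled.set(value);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    /** Forwards the event only when telemetry is enabled; returns whether it was sent. */
    boolean track(String event) {
        if (!enabled.get()) {
            return false;
        }
        sender.accept(event);
        return true;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        TelemetryGate gate = new TelemetryGate(false, sent::add);
        gate.track("dropped while disabled");
        gate.setEnabled(true);
        gate.track("sent after the user enabled telemetry");
        System.out.println(sent);
    }
}
```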

I think we already have a metric system of sorts which is responsible for the on-screen debug info. Afaik you can also hook up to that system, or rather add new debug screens to the overlay. Maybe (parts of) that system can be re-used/extended for the purpose of the telemetry system?

GabrielXia · Mar 25, 2017

Hi @Skaldarnar, thanks for your suggestions

I've set up a telemetry system locally. I used Logback's LogstashTcpSocketAppender on the client side and Logstash on the server to report warn and error logs, and it worked.

Here is the diagram of the system; @Rostyslav Zatserkovnyi might be able to help me check this structure.

In short, "Collector", "Enrich", "Sink good" and "Sink bad" are Snowplow tools that collect and select events. "logstash" collects log information. "Elasticsearch Cluster" stores the data. "Kibana" provides a browser-based analytics and search dashboard. Please see more details in the repositories: the server and the client.

To sum up, the preconditions of our telemetry system are:

Users can disable all the metric tracking functions!
The telemetry system should change the code in Terasology as little as possible
This system could also be used for the mobile version of Terasology as well as DestinationSol

And here is a summary of my ideas:

Apart from normal metrics, we could also provide a "light feedback report system": we create a custom event named feedbackEvent, users write their feedback in a "feedbackOverlay", and the feedback is sent to the server.
The user might also want to see some graphs of the data. This system can be set up locally very easily. Once it has been set up locally, we can view the graphs in the browser.

Did I make it clear? Are you happy with this system?

Please let me know what you think.

Rostyslav Zatserkovnyi · Mar 26, 2017

The entire setup looks amazing for an early prototype! ~~The only issue I could notice is that in your system overview the "good-events-pipe" should be connected to the "good" sink, but that's just a minor oversight - otherwise everything looks solid.~~ (This is fixed now, good job!) I'll be setting the entire pipeline up later this week and trying it out.

A few additional things I'd focus on:

What metrics will be sent, anyway? The prototype collects some system hardware/software info, which is a good starting point - it might make sense to use a custom structured event to collect all the info we can get using JVM system methods and obtain data similar to the Steam hardware & software survey. Beyond that, I'd think of some gameplay-related data that can be sent by Core (stats like blocks walked/destroyed/placed etc.) as well as an API that external module devs can use for custom metrics.
The metrics should be transparent and viewable by end users - Minecraft does a good job at this by having a menu that shows all the raw data sent to the devs. Replicating this UI would be nice (bonus feature: allow the user to enable/disable individual fields in the data - i.e. if the user would like to send info about screen resolution but not location they can disable the relevant fields).

GabrielXia · Mar 27, 2017

Thanks for your suggestions @Rostyslav Zatserkovnyi

I've updated the image; it was a mistake in my drawing. The metrics being sent and the users' authorization are exactly what I'm thinking about at present. I'm wondering how the authorization can be implemented in the code, e.g., how does a tracker know whether it should track a metric or not?

Rostyslav Zatserkovnyi · Mar 27, 2017

Either have a config to determine which fields are OK to send, or a blacklist of fields (the latter is better for external modules). Then just replace the blacklisted fields with something like "[REDACTED]" in an event. (Edited for clarity)
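
The blacklist variant could look roughly like the sketch below. It assumes an event is a flat map of field names to string values, which is a simplification; `FieldRedactor` and the example field names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: masks blacklisted fields before an event leaves the client. */
final class FieldRedactor {
    static final String REDACTED = "[REDACTED]";

    private FieldRedactor() { }

    /** Returns a copy of the event in which every blacklisted field is masked. */
    static Map<String, String> redact(Map<String, String> event, Set<String> blacklist) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : event.entrySet()) {
            boolean blocked = blacklist.contains(entry.getKey());
            result.put(entry.getKey(), blocked ? REDACTED : entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("screenResolution", "1920x1080");
        event.put("location", "somewhere");
        Set<String> blacklist = new HashSet<>(Arrays.asList("location"));
        System.out.println(redact(event, blacklist));
    }
}
```

Because the original map is copied rather than modified, the unredacted values never need to reach the emitter. Keeping the field name with a placeholder value also lets the server distinguish "withheld by the user" from "not collected".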

GabrielXia · Mar 29, 2017

Reborn after a terrible exam!!!

With the help of @Rostyslav Zatserkovnyi, @oniatus, @Skaldarnar and @Cervator, I've finally finished a draft!
https://docs.google.com/document/d/1N4zkThE7VG_LSR67XH8H7UkvMKrTSSAf7eoj1Esz66c/edit?usp=sharing
However, I know there are still lots of things to improve, and I'm open to any changes!

In general, I don't have many ideas about the gameplay metrics (the whole metric table is in section 6 of the doc). I may focus on that in the last four days. Thanks for your additions!

GabrielXia · Mar 31, 2017

Hihi

I updated my proposal, here is the link: https://docs.google.com/document/d/1g7RA1xMm7YODRhqLs31Jd9WXeMtASb5QG1H_dS2xe1Q/edit?usp=sharing
Thank you for reviewing!
