# /e/OS Assistant - proof of concept
This project is a proof of concept of a virtual assistant for /e/OS. It is licensed and distributed under [The GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).
## Technologies
- only **Kotlin**, without any Java
- **Jetpack Compose** for UI
- **Hilt** for dependency injection
- **DataStore** for user preferences
- **Material 2** design and theming, using /e/'s SDK
- [**Vosk**](https://alphacephei.com/vosk) for speech recognition
- [**llama.cpp**](https://github.com/ggerganov/llama.cpp) to run Large Language Models, along with Kotlin bindings written in C++ and built with the NDK, see [the note below](#llamacpp)
## Project structure
- `gradle.properties`, `settings.gradle.kts`, `build.gradle.kts` and `gradle/` are the usual Gradle project files; in particular, `gradle/libs.versions.toml` is the dependencies **Version Catalog** shared across all subprojects
- the `skillDataPlugin/` subproject is described [below](#skill-data-gradle-plugin)
- the `app/` subproject is the actual app
- `app/src/main/java` contains the Kotlin code for the app and the UI
- `app/src/main/cpp` hosts the C++ code for interacting with llama.cpp, see [the note below](#llamacpp)
- `app/src/main/skill_data` is the source folder for the [skill data plugin](#skill-data-gradle-plugin)
## Skill data Gradle plugin
The `skillDataPlugin/` folder hosts a Gradle plugin whose purpose is to take data for skills from `app/src/main/skill_data/` and compile it into Kotlin files under `app/build/generated/skill_data_plugin/foundation/e/assistant/skill_data/`.
### skills.json file
`app/src/main/skill_data/` must contain a `skills.json` file with an item for each skill. Each skill has a unique ID and a possibly empty list of fields that the skill may want to capture from sentences uttered by the user. Here's a schema:
```json
[
  {
    "id": "the_skill_id",
    "captures": ["information_fields_the_skill_can_process"]
  },
  {
    "id": "weather",
    "captures": ["place"]
  },
  // ...
]
```
### Skill data files
Every subfolder of `app/src/main/skill_data/` holds the skill data for one language, e.g. `en/`. Each file under these subfolders is named `the_skill_id.json`, matching a skill id defined in the root `skills.json`. All such files contain a map from example prompts to the corresponding captures. Here's a schema:
```json
{
  "User example query in natural language": { "capture1": "value1" },
  "What's the weather in New York?": { "place": "New York" },
  // ...
}
```
### Generated Kotlin files
All generated files are under `app/build/generated/skill_data_plugin/foundation/e/assistant/skill_data/`, in the package `foundation.e.assistant.skill_data`.
There is one `SkillData_the_skill_id.kt` file for each skill id defined in `skills.json`. Each of those files contains a Kotlin `object` named after the skill id, with an `llmData` field holding the id, the captures and the examples grouped by language, plus one string field per capture whose value is the capture's own name (e.g. `val place: String = "place"`).
Finally, there is `SkillData.kt` that just contains a list of the languages found and parsed in subfolders of `app/src/main/skill_data/`.
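For illustration, a generated file for the `weather` skill might look roughly like the following self-contained sketch. The `LlmSkillData` type and all field names beyond those mentioned above are assumptions made for this example, not the plugin's actual output:

```kotlin
// Hypothetical sketch of a generated SkillData_weather.kt file;
// LlmSkillData and the exact field names are illustrative assumptions.
data class LlmSkillData(
    val id: String,
    val captures: List<String>,
    // language code -> (example prompt -> (capture name -> captured value))
    val examplesByLanguage: Map<String, Map<String, Map<String, String>>>,
)

object SkillData_weather {
    // Data consumed when building the LLM prompt
    val llmData = LlmSkillData(
        id = "weather",
        captures = listOf("place"),
        examplesByLanguage = mapOf(
            "en" to mapOf(
                "What's the weather in New York?" to mapOf("place" to "New York"),
            ),
        ),
    )

    // Capture names as compile-time-checked constants
    val place: String = "place"
}
```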
The advantages of having a plugin that generates files like this are:
- to avoid hardcoding data in Kotlin files
- to avoid the (slight) overhead of loading things at runtime
- to allow having captures' names checked at compile-time
- to make it possible to publish the files on a community translation platform
## llama.cpp
The code for interacting with llama.cpp was taken from [their Android example](https://github.com/ggerganov/llama.cpp/blame/master/examples/llama.android). This includes the `foundation.e.assistant.llm.Llamacpp` class and the `app/src/main/cpp/llama-android.cpp` file. The `Llamacpp` class contains bindings for the `llama-android.cpp` native code, plus a few utilities to handle model state. The llama.cpp repository is declared as a dependency and included in the compilation by `app/src/main/cpp/CMakeLists.txt`; currently the latest commit on the master branch is always used.
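As a rough illustration of what such JNI bindings look like, here is a hypothetical sketch; the function names and signatures are assumptions and may differ from the actual `Llamacpp` class:

```kotlin
// Hypothetical sketch of a JNI bindings class in the style of
// llama.cpp's Android example; names and signatures are assumptions.
class Llamacpp {
    companion object {
        init {
            // In the real app, the native library built from
            // app/src/main/cpp by CMake would be loaded here:
            // System.loadLibrary("llama-android")
        }
    }

    // Entry points implemented in C++ (llama-android.cpp)
    external fun loadModel(path: String): Long
    external fun freeModel(model: Long)
    external fun completion(model: Long, prompt: String): String
}
```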
## TODO
- add more languages and possibly implement an option to change language dynamically (the code infrastructure is already in place)
- the weather skill could be made multilingual by just passing a different language code to the API
- the menu for selecting the LLM model should be moved to a settings screen, but it should still be possible to perform the initial download from the main screen
- download LLM models from a server owned by /e/, instead of directly from HuggingFace, which would also allow providing customized/fine-tuned models
- support resuming LLM download
- improve the calendar skill to obtain both begin and end dates
- currently [dicio-numbers](https://github.com/Stypox/dicio-numbers) is used for date and number parsing, but it only supports English and Italian and does not support date ranges, so an alternative might be needed, e.g. [facebook/duckling](https://github.com/facebook/duckling/) or [microsoft/Recognizers-Text](https://github.com/microsoft/Recognizers-Text)
- the colors used in the theme come from /e/'s SDK, but they may have been remapped to Material 2 colors incorrectly, so this needs to be checked by a designer
- currently the fine-tuning process described [above](#creating-fine-tuned-model) does not take into account e.g. the summary skill, which might negatively impact summary generation
- `llama.cpp` supports "grammars", which can force an LLM into generating output in a specific format (in our case JSON, with the "skill" field restricted to the set of supported skills)
- Vosk seems to crash on some devices when using 44100 Hz as the microphone sample rate (even though, according to the Android docs, it should be the only rate guaranteed to work on all devices). 16000 Hz seems to solve the problem in those cases. The value can be changed in [VoskManager.kt](./app/src/main/java/foundation/e/assistant/input/VoskManager.kt).
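To sketch the grammar idea from the TODO list, a hypothetical GBNF snippet (the syntax used by llama.cpp grammars) restricting output to a JSON object with a supported skill could look like this; the skill names here are illustrative only:

```gbnf
# Hypothetical GBNF grammar: output must be a JSON object whose
# "skill" field is one of the supported skills (names are examples)
root  ::= "{\"skill\": \"" skill "\"}"
skill ::= "weather" | "calendar" | "timer"
```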
## Setup
To build the project, just open it in Android Studio and build it from there. Alternatively, the usual Gradle commands (e.g. `gradle build` or `gradle assembleRelease`) work just fine.
Requirements:
- Java 17
- Set `OPEN_WEATHER_MAP_API_KEY=<your Open Weather Map API key>` inside `local.properties`
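For example, a minimal `local.properties` could look like this; the SDK path is an illustrative placeholder and is normally written by Android Studio:

```properties
# local.properties — machine-local settings, not committed to version control
sdk.dir=/home/user/Android/Sdk
OPEN_WEATHER_MAP_API_KEY=your_open_weather_map_api_key
```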