Pretty cool idea!
I like the "passive" approach.
A few questions: does it automatically take a screenshot every X seconds?
And which models does it run locally to analyze the images and transcribe the audio?
Ty!
It currently takes a screenshot every 30 seconds and whenever you switch applications.
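The capture rule described here (a fixed 30-second timer plus an extra capture on every app switch) can be sketched as a small decision function. This is a hypothetical illustration, not tasktrace's code: `CapturePolicy`, `simulate`, and the choice that an app-switch capture also restarts the timer are all assumptions, and real frontmost-app detection and screen capture are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

CAPTURE_INTERVAL_S = 30.0  # the interval mentioned in the reply


@dataclass
class CapturePolicy:
    """Hypothetical trigger logic: capture on a fixed timer or on an app switch."""

    interval_s: float = CAPTURE_INTERVAL_S
    last_capture_t: Optional[float] = None
    last_app: Optional[str] = None

    def should_capture(self, now: float, frontmost_app: str) -> Optional[str]:
        """Return why a capture should happen ("app_switch" or "timer"), or None."""
        reason: Optional[str] = None
        if self.last_app is not None and frontmost_app != self.last_app:
            reason = "app_switch"
        elif self.last_capture_t is None or now - self.last_capture_t >= self.interval_s:
            reason = "timer"
        self.last_app = frontmost_app
        if reason is not None:
            # Assumption: any capture, including an app-switch one, restarts the 30 s timer.
            self.last_capture_t = now
        return reason


def simulate(samples: Iterable[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Feed (timestamp, frontmost app) samples through the policy and list the captures."""
    policy = CapturePolicy()
    captures = []
    for t, app in samples:
        reason = policy.should_capture(t, app)
        if reason is not None:
            captures.append((t, reason))
    return captures


if __name__ == "__main__":
    samples = [(0, "Safari"), (10, "Safari"), (30, "Safari"),
               (45, "Xcode"), (60, "Xcode"), (75, "Xcode")]
    print(simulate(samples))
    # -> [(0, 'timer'), (30, 'timer'), (45, 'app_switch'), (75, 'timer')]
```

Resetting the timer on an app switch avoids a near-duplicate timer shot a few seconds after a switch capture. Keeping the timer independent would be an equally reasonable choice.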
I use https://huggingface.co/mlx-community/gemma-3-4b-it-qat-4bit for chat and image recognition, and Qwen/Qwen3-Embedding-0.6B-4bit and Qwen3-Reranker-0.6B-4bit for the search-related features.
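Pairing an embedding model with a reranker usually means two-stage retrieval: a cheap vector-similarity pass builds a shortlist, and the more expensive reranker orders only that shortlist. Below is a minimal sketch of that pattern under stated assumptions. `toy_embed` and `toy_rerank` are hypothetical word-overlap stand-ins, not the Qwen models or their APIs, and `search`, `shortlist`, and `top_n` are illustrative names.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model (e.g. Qwen3-Embedding-0.6B): bag-of-words counts."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def toy_rerank(query: str, doc: str) -> float:
    """Stand-in for a reranker (e.g. Qwen3-Reranker-0.6B): share of query words found in doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Vector] = toy_embed,
    rerank: Callable[[str, str], float] = toy_rerank,
    shortlist: int = 3,
    top_n: int = 1,
) -> List[str]:
    q_vec = embed(query)
    # Stage 1: cheap vector similarity over every doc narrows the candidates.
    # (A real app would embed docs once at capture time and store the vectors.)
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:shortlist]
    # Stage 2: the costlier reranker scores only the shortlisted candidates.
    return sorted(candidates, key=lambda d: rerank(query, d), reverse=True)[:top_n]


if __name__ == "__main__":
    docs = [
        "editing quarterly report in google docs",
        "watching a video about rust lifetimes",
        "reviewing pull request for the search feature",
        "reading rust documentation on borrowing",
    ]
    print(search("rust borrowing docs", docs))
    # -> ['reading rust documentation on borrowing']
```

Swapping real models in only requires passing different `embed` and `rerank` callables. The reranker's cost then scales with `shortlist` rather than with the whole capture history.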
For voice I use Apple's SFSpeechRecognizer. I'm thinking of switching to an open-source model, but the app's memory footprint is already very high.
Creator of tasktrace here, AMA!