Block programming

The block programming groups have worked together to make sure that the block server and the Blockly site can communicate with each other. We have also made sure that two of the groups’ computers can be used to host the block programming. This was done in preparation for the open house we will be hosting together with Luleå Makerspace on the 19th.

Block server

Aside from syncing with the Blockly group, we have mostly done some housekeeping during this sprint. We have started using Pylint, the common linter we agreed on. We have also started work on a Dockerfile, which should make the program easier to host once we are done; a sketch of what it could look like is shown below.
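
The following is only a sketch, not our final setup: the base image, file names, entry point, and port are placeholder assumptions.

FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy in the rest of the block server code.
COPY . .

# Placeholder port and entry point.
EXPOSE 8000
CMD ["python", "server.py"]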

A composite function, in the form of a dance, has been started. This function, when called via a block, should make Pepper dance, and it will probably start music as well. Once a few more composite functions are done, the only thing left to implement is a “joystick” mode, where Pepper can be controlled with, for example, the WASD keys; a sketch of what that could look like follows below. This mode was requested by Peter Parnes as a way to control Pepper more easily.
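
A minimal sketch of such a joystick mode, assuming the NAOqi Python SDK, could look like this; the connection details, key handling, and velocity values are illustrative, not our final design.

# Sketch of a "joystick" mode mapping WASD keys to Pepper's base movement
# via NAOqi's ALMotion.moveToward; connection details and velocities are
# illustrative placeholders.
import qi

# moveToward takes normalized velocities: x (forward), y (sideways) and
# theta (rotation), each in [-1.0, 1.0].
KEY_TO_VELOCITY = {
    "w": (0.5, 0.0, 0.0),   # forward
    "s": (-0.5, 0.0, 0.0),  # backward
    "a": (0.0, 0.0, 0.5),   # turn left
    "d": (0.0, 0.0, -0.5),  # turn right
}

def drive(motion, key: str) -> None:
    """Translate one key press into a Pepper movement command."""
    x, y, theta = KEY_TO_VELOCITY.get(key, (0.0, 0.0, 0.0))
    motion.moveToward(x, y, theta)  # unknown keys stop the robot

session = qi.Session()
session.connect("tcp://<pepper-ip>:9559")  # placeholder address
motion = session.service("ALMotion")
drive(motion, "w")  # example: start moving forward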

Blockly

There was a lot of bug fixing and rewriting of documentation during this sprint. As usual, new blocks have been added: not only blocks for Pepper’s built-in functions but a wait block as well. The connection with the block server had some issues when we tested more functionality, but they have now been resolved. The multi-language version of the site is not done yet, since the focus has been on creating a fairly stable version before the open house with LMS so that the children can test the programming interface. Next sprint should deliver both English and Swedish versions for the user to choose from, and a design that enables the addition of further languages.

Web interaction

The majority of this sprint’s focus has been on implementing multi-language functionality. Our primary goal has been to make it work for Swedish, but we also wanted the possibility to process speech-to-text in other languages in the future. We therefore chose OpenAI’s Whisper, running on a server.

A server was needed to make the transcription quicker. There is some round-trip delay, but a given request takes under 10 seconds, usually less; the round-trip time depends heavily on the server hardware. Whisper is a very capable model, but when transcribing single words, e.g. “football”, it can struggle to produce an accurate transcription. Our speculation is that this is due to the lack of context, since the accuracy noticeably improves as soon as one provides more words or complete sentences.
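
As an illustration, a minimal sketch of such a transcription endpoint is shown below, assuming a Flask server and the openai-whisper package; the endpoint name, model size, and file handling are illustrative rather than our exact implementation.

# Minimal sketch of a Whisper transcription endpoint; the endpoint name,
# model size, and temp-file handling are illustrative.
import tempfile

import whisper  # pip install openai-whisper
from flask import Flask, jsonify, request

app = Flask(__name__)
model = whisper.load_model("small")  # larger models: slower but more accurate

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expect the recorded audio in the "audio" form field.
    audio = request.files["audio"]
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        audio.save(f.name)
        # language="sv" forces Swedish; omit it to let Whisper auto-detect.
        result = model.transcribe(f.name, language="sv")
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)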

With this added Swedish functionality in place, we implemented a method that displays a picture from Google for a given word. This works well thanks to Google’s auto-correction, but also because we defined our own search engine, which allows us to enable SafeSearch. That was important to us because of the interactions with kids.
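
A sketch of how such a lookup can be done against the Google Custom Search JSON API follows; the credentials are placeholders and the helper function is our own illustration.

# Sketch of an image lookup via the Google Custom Search JSON API;
# API_KEY and SEARCH_ENGINE_ID are placeholders for real credentials.
import requests

API_KEY = "your-api-key"
SEARCH_ENGINE_ID = "your-cx-id"  # the search engine we defined ourselves

def first_image_url(query: str) -> str:
    """Return the URL of the top image hit for a given word."""
    params = {
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": query,
        "searchType": "image",
        "safe": "active",  # SafeSearch on, important when kids use the system
        "num": 1,
    }
    response = requests.get("https://www.googleapis.com/customsearch/v1",
                            params=params)
    response.raise_for_status()
    return response.json()["items"][0]["link"]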

Going forward, there are several different paths that could be followed. We intend to speak with our stakeholder to clarify which of them would be most interesting to delve further into.

Rock-paper-scissors

This sprint we added face tracking with Pepper, text-to-speech interactions that tell the player what is happening in the game, and functionality to decide the winner from an integer representing each move; a sketch of that logic is shown below. We have also started looking into packaging the system so that the block programming group can create a block for our game. In addition, we have started working on server communication with the image recognition group and begun discussing how that communication should be done.
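
The following is a minimal sketch of the winner logic; the 0/1/2 encoding of rock, paper, and scissors is one possible choice, not necessarily the encoding the image recognition group will end up sending us.

# Sketch of the winner logic; the 0/1/2 encoding is one possible choice.
ROCK, PAPER, SCISSORS = 0, 1, 2

def winner(player: int, pepper: int) -> str:
    """Return "player", "pepper" or "draw" for one round."""
    if player == pepper:
        return "draw"
    # Each move beats the move one step below it modulo 3: paper (1)
    # beats rock (0), scissors (2) beats paper (1), and rock (0) beats
    # scissors (2).
    return "player" if (player - pepper) % 3 == 1 else "pepper"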

In preparation for the open house event, we also have a minimal solution that can work if we are not able to establish a connection and receive results from the image recognition group.

Next sprint we will continue working on the server communication and the packaging of the system.

Image recognition

In this sprint we have been more divided in our work than in previous sprints, mostly due to conflicting schedules in the new study period and to sickness. Even so, we have made progress on our model, where we have focused on improving the data augmentation. This is an area that will need a lot more effort, as the dataset we are working with from TensorFlow is rather simplistic and none of the images have a background. At present we can automatically add randomized backgrounds; however, the hands appear with a distinct white outline, as seen below.

[Figure: an augmented hand on a randomized background, showing the distinct white outline]

Before these augmented images can be used effectively in model training we need to handle the outlines; otherwise it is very likely that the model will latch onto detecting the outline rather than the hand. We also apply Gaussian noise as an augmentation factor, which should alleviate the outline problem but is far from enough on its own.
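
As an illustration, the compositing and noise steps could look something like the sketch below, assuming each hand image comes with a binary mask. Eroding the mask a few pixels is one way to shave off the white fringe; the kernel size and noise level are arbitrary choices, not tuned values.

# Sketch of background compositing plus Gaussian noise; assumes each hand
# image has a binary mask and that the background has already been resized
# to the same shape as the hand image.
import cv2
import numpy as np

def augment(hand: np.ndarray, mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Paste a masked hand onto a background, then add Gaussian noise."""
    # Erode the mask a few pixels so the bright fringe around the hand is
    # cut away before compositing; the 5x5 kernel is an arbitrary choice.
    kernel = np.ones((5, 5), np.uint8)
    eroded = cv2.erode(mask, kernel, iterations=1)

    # Composite: hand pixels where the mask is set, background elsewhere.
    alpha = (eroded > 0)[..., None].astype(np.float32)
    composite = hand * alpha + background * (1.0 - alpha)

    # Add Gaussian noise as an extra augmentation factor.
    noise = np.random.normal(0.0, 10.0, composite.shape)
    return np.clip(composite + noise, 0, 255).astype(np.uint8)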

In addition, we have begun creating our own dataset that we can use to test the models in real-world situations. The dataset is captured with Pepper’s own hardware, with variations such as distance, grip, and supporting hands. The goal is both to create a valuable dataset and to thoroughly document the creation process, so that future work can add to the dataset with as few issues as possible.