Welcome back to my final project update. I still can’t believe the fellowship is over, and it feels surreal thinking back to when we started this project and how much we’ve achieved by the end of it. I first heard about DTSF during my freshman year from my international mentor, and I never thought I was capable of carrying out an independent project with so little experience with these technologies. But now that I’ve completed my project, I feel proud of and grateful for how far I’ve come. From dancing with a VR robot during lunch breaks to debugging the same code for an entire week straight, I can safely say it has been a tough but fun ride.
The project kicked off with brainstorming. From the start, I knew I wanted to work with drones. I’ve always wanted to fly one, and I thought it would be great if I could program one to do something genuinely useful. I came across an article about mind-controlled drones, and the idea of thinking about moving a drone in a certain direction and having it do exactly that seemed incredible. However, it was too complex and time-consuming for me to execute, so I settled on creating a voice-controlled drone instead. Now, let’s talk about how I turned this idea into reality!
First, I chose the DJI Tello EDU drone for my project because it is easily programmable in Python. It has pre-built APIs that are flexible to use, and it is small enough to be tested indoors. I then wrote a program to convert speech commands into text commands that the drone could receive and execute. To do so, I used Google’s speech-to-text API. While it is a decent API, it has some drawbacks too: it would sometimes return wildly inaccurate predictions, and the drone would receive the wrong command. Below is an example where I spoke the command “take off” and it thought I said “steve-o”, “Eva”, or “takeoff”.
To solve this issue, I used an external microphone. I also added a noise-cancellation step so that the voice input is clearer and the API can predict more accurately.
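For a sense of what this speech-to-text step can look like, here is a minimal sketch, assuming the SpeechRecognition Python package (which wraps Google’s speech API) rather than my exact code; the adjust_for_ambient_noise call plays the role of the noise-cancellation step:

```python
import speech_recognition as sr

def listen_for_command() -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample ambient noise so background hum doesn't drown out speech.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Listening for a command...")
        audio = recognizer.listen(source)
    try:
        # Send the recorded audio to Google's speech recognizer.
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""  # Speech was unintelligible; the caller can retry.

if __name__ == "__main__":
    print("Heard:", listen_for_command())
```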
Next step: operating the drone from my laptop. I used the official DJI Tello API to control the drone. This part didn’t take long, and I thought it was the easiest one; as you keep reading, you will realize how wrong I was. At this point, I realized that my laptop needed an internet connection for the speech-to-text API to work, but it also needed its Wi-Fi to connect to the drone, and it couldn’t do both at the same time. So I used a USB Wi-Fi adapter to create a second Wi-Fi interface, and the issue was resolved.
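To show how simple the control side is, here is a sketch of driving the Tello over its official text-command interface; the command strings and the 192.168.10.1:8889 address come from DJI’s Tello SDK documentation, while the rest is illustrative rather than my exact code:

```python
import socket

TELLO_ADDRESS = ("192.168.10.1", 8889)  # Tello's fixed command address

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 9000))  # Local port for the drone's replies
sock.settimeout(10)

def send(command: str) -> str:
    """Send one text command and wait for the drone's 'ok'/'error' reply."""
    sock.sendto(command.encode("utf-8"), TELLO_ADDRESS)
    try:
        reply, _ = sock.recvfrom(1024)
        return reply.decode("utf-8")
    except socket.timeout:
        return "timeout"

send("command")     # Enter SDK mode first
send("takeoff")
send("forward 50")  # Distances are in centimeters
send("land")
```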
Finally, the moment of truth: feeding voice commands to operate the drone. I tried passing a few commands, but sadly the drone refused to accept any command other than “takeoff”. It was as if the drone was confused about what to do once the first command had finished executing: my voice command took a while to be processed into a text command, and the drone couldn’t stabilize until the next command arrived. I had to completely rework my code and import a different API instead of the official one to solve this issue, which took quite a bit of time.
At this point, my drone could follow voice commands, but I had to wait for one command to finish executing before I could pass another. I wanted the drone to take continuous voice input, build a queue of commands, and execute them one by one. I did this using a programming concept called threading: with threads, two or more blocks of code can run at the same time. This took a huge chunk of the project time because I had limited experience with threading, especially in Python. Once I completed this part, a background thread continuously captured voice input while the main program executed the queued commands.
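The structure looked roughly like the sketch below, reusing the hypothetical listen_for_command() and send() helpers from the earlier snippets; Python’s queue.Queue is thread-safe, which is what lets the listener and executor share the command queue safely:

```python
import threading
import queue

commands = queue.Queue()

def listener():
    """Background thread: keep capturing voice commands into the queue."""
    while True:
        text = listen_for_command()
        if text:
            commands.put(text)  # A real version would map phrases to SDK commands

def executor():
    """Main loop: pop commands one by one and send them to the drone."""
    while True:
        command = commands.get()  # Blocks until a command is available
        send(command)

threading.Thread(target=listener, daemon=True).start()
executor()
```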
I was almost done with my project, but I wanted to add more functionality to make the product more user-friendly and interactive, so I decided to add text-to-speech as well. This lets the user ask questions such as “What’s the battery life?” or “What’s the internal temperature?”, and the drone answers through the external speaker. In addition, the user can supply their own value for movement commands, such as “go forward by 60” or “go up by 100”, instead of the drone always executing commands with the default value of 50. Putting everything together, the drone could now process the commands fed to it by the user and even talk back in certain situations.
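Here is a rough sketch of that talk-back and parameter handling, assuming the pyttsx3 package for text-to-speech and the send() helper from before; “battery?” and “temp?” are genuine Tello SDK query commands, but the parsing here is simplified for illustration:

```python
import pyttsx3

engine = pyttsx3.init()  # Text-to-speech engine

def speak(text: str):
    engine.say(text)
    engine.runAndWait()

def handle(command: str):
    if "battery" in command:
        speak(f"Battery is at {send('battery?')} percent")
    elif "temperature" in command:
        speak(f"The drone reports a temperature of {send('temp?')}")
    else:
        # Use a trailing number if the user gave one ("go forward by 60"),
        # otherwise fall back to the default distance of 50.
        words = command.split()
        distance = words[-1] if words and words[-1].isdigit() else "50"
        for direction in ("forward", "back", "up", "down"):
            if direction in command:
                send(f"{direction} {distance}")
                break
```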
That was the end of my original project idea. However, I still had about a week and a half to add more functionality and make the project even better, so I set out to make the drone scan QR codes and execute the commands encoded in them. I managed to write two programs to encode and decode QR codes, but the problem arose when I tried to take a video stream from the drone: the thread handling the video stream did not play well with the existing threads, and the program crashed entirely. I spent a week trying to figure this out, but I couldn’t, and I was running out of time, so I started preparing for my final presentation instead.
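For reference, the encode/decode pair on its own is short; here is a sketch assuming the qrcode and pyzbar packages plus OpenCV (in the full version, frames from the drone’s video stream would replace the image file):

```python
import qrcode
import cv2
from pyzbar.pyzbar import decode

# Encode a drone command into a QR image.
qrcode.make("forward 50").save("command_qr.png")

# Decode commands back out of an image frame.
frame = cv2.imread("command_qr.png")
for symbol in decode(frame):
    print("Decoded command:", symbol.data.decode("utf-8"))
```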
The presentation went great, and we all got really positive responses from the audience. As for the future, I would love to keep working on this idea and take it further: completing the QR detection, adding object recognition, and ultimately building a prototype of a device that helps physically challenged people move objects around their apartment.
Lastly, I would like to thank all my friends in the fellowship, along with Josh and Eric, for being an amazing support throughout the project. I learned so much this summer and hope to keep using the creative lab to expand my knowledge of different technologies.